The Oxford Handbook of Persian Linguistics

Introduction
Oxford Handbooks Online

Anousha Sedighi and Pouneh Shabani-Jadidi
The Oxford Handbook of Persian Linguistics
Edited by Anousha Sedighi and Pouneh Shabani-Jadidi
Print Publication Date: Aug 2018

Subject: Linguistics, Languages by Region, Historical Linguistics
Online Publication Date: Sep 2018
DOI: 10.1093/oxfordhb/9780198736745.013.1
Abstract and Keywords
This chapter offers an introduction to the volume by providing a brief description of its
thematic sections (Classification and History, The Sound System, Syntax, Language and
Words, Language and People, and Language, Mind, and Technology) and their related
chapters. These thematic sections cover both the theoretical and the applied aspects of
the Persian language and provide a comprehensive overview of the field of Persian
linguistics by discussing its development, capturing critical accounts of cutting-edge
research, outlining current debates, and suggesting productive lines of future
research. The chapter also discusses the goals and ambitions of the editors in putting
such a volume together. It also points out the logic behind the selection of the contributors
and addresses the challenges the editors endured throughout the project.
Keywords: Persian, phonetics, phonology, morphology, syntax, sociolinguistics, teaching Persian,
psycholinguistics, neurolinguistics, computational linguistics
In an ideal world, there is continuous communication and collaboration among all the
subfields of linguistics. In such a world, linguistic theories are tested and validated by
experimental studies, and research on language acquisition feeds into the hands of
language curriculum developers and educators, who in turn provide feedback for the theorists.
What inspired us to edit this volume is the vision of a world where scholars, researchers,
and students in all the fields and subfields of Persian linguistics, both theoretical and
applied, and in all the countries around the world are collaborating with one another. This
handbook is a step towards creating such a world.
Modern Persian belongs to the Western Iranian branch of the Indo-Iranian group of the
Indo-European language family. It is a descendant of Middle Persian, the official
language of the Sasanian Empire (third–seventh century CE), and Old
Persian, the language of the Achaemenid Empire (sixth–fourth century BCE). Currently,
Persian is spoken by more than 110 million people, mainly in the three countries of Iran,
Afghanistan, and Tajikistan. Persian is sometimes known by its endonym ‘Farsi’, which
was the term used by all its native speakers until the twentieth century. Currently, for
political reasons, it is called ‘Dari’ in Afghanistan and ‘Tajiki’ in Tajikistan, while in Iran,
the term ‘Farsi’ has remained the local name of the language.
These three dialects of Persian are mutually intelligible to their native speakers. The Persian of
Iran has borrowed many words from Arabic and French. Dari Persian, spoken in Afghanistan,
is closer to Middle Persian and has borrowed more words from English. Tajiki Persian,
because Tajikistan was part of the former Soviet Union, has borrowed many words from Russian.
Persian has used the Arabic writing system since the Arab conquest of Persia in the
seventh century CE, and this script is now used in Iran and Afghanistan. Tajiki Persian, on the
other hand, uses the Cyrillic alphabet. Beyond these three countries, where Persian is
an official language, Persian is also spoken in many other parts of the world. Persian has
influenced other languages such as the Turkic languages, Armenian, Georgian, and Urdu.
It has also had some influence on Arabic, especially Bahraini and Kuwaiti Arabic. For
centuries, Persian has been a prestigious cultural language in Central Asia, South Asia,
and Western Asia, and Persian literature has been described as one of the great
literatures of mankind.
Research on Persian linguistics had previously centred on historical linguistics, (p. 2)
mainly on ancient languages of greater Iran, including Old Persian, Avestan, Pahlavi,
and also Middle Persian. Within the past century, Modern Persian linguistics has become
an active topic of research and many noteworthy volumes on different subfields have
been published across the world. However, most of these volumes have been somewhat
fragmented, focusing on certain subfields, or have been gathered in a conference-proceedings
format, which does not necessarily cover all the subfields of theoretical and
applied linguistics. The present handbook is thus the first comprehensive attempt to
bring together all subfields of historical, theoretical, and applied Persian linguistics into
one cohesive volume.
We invited internationally renowned leading scholars of major subfields of
Persian linguistics to contribute to this project either as an author or a reviewer. At the
same time and as much as possible, the volume aims to maintain accessibility to those
outside the immediate specialization of the authors so that the book can also be
informative for students and non-specialist readers. In order to have a broader spectrum
of contributors, we have included scholars from across the world to collaborate on this
project.
Of course, no road that is worth travelling is smooth, and we have had our share of
challenges in the journey of editing this volume. One of the biggest challenges was to
persuade the older generation of scholars, to whom Persian linguistics owes a great deal,
to contribute to the volume. This caused our project to be postponed on several occasions,
as a few of the original contributors whom we had invited to write a chapter withdrew in
the middle of the journey, and we had to start again from square one. Another
challenge was to find contributors for certain chapters. As the readers might be aware,
the most-studied area of Persian linguistics is syntax, which is why we have allocated three
chapters to different topics in Persian syntax. However, several subfields of Persian
linguistics are deeply under-studied, as reflected in the short overviews of the literature within
those subfields. We hope that avid readers and students of Persian linguistics will
fill these gaps in the near future.
There are six parts to this volume, each of which contains several chapters.
Transliterations and transcriptions may vary based on the focus and topic of each
chapter. We have decided to honour each author’s choice of terminology, as
some of the authors feel strongly about it. For instance, some chapters use the
term ‘Tehrani Persian’ and some use ‘Tehran Persian’. The same goes for the terms
‘Tajiki’ and ‘Tajik’. The discussions move from more general to more specific topics
encompassing historical, theoretical, and applied linguistics. Part I is on the history and
classification of the Persian language, discussing the linguistic change from Old Persian
to Middle Persian and finally to New Persian, as well as typological approaches
and dialects of Persian. In Chapter 2, Mauro Maggi and Paola Orsatti look at the evolution
of Persian and provide a description of the most significant features of Old, Middle, and
New Persian, with an analysis of the main changes over time. In Chapter 3, Mohammad
Dabir-Moghaddam describes the main morphosyntactic typological features of Modern
Standard Persian and discusses various linguistic features of a number of currently
spoken Persian dialects, namely Tajiki Persian, Afghan (Dari) Persian, Isfahani Persian,
and Gha’eni Persian.
Part II is on the sound system of Persian, encompassing phonetics, phonology, and the
prosodic structure of the Persian language. In Chapter 4, Golnaz Modarresi Ghavami
investigates the phonetic aspects of the sound system of Modern Persian. She introduces
the phonemes of Persian and discusses their articulatory as well as acoustic properties.
The chapter also discusses the phonetic aspects of suprasegmental features of stress and
intonation. In (p. 3) Chapter 5, Mahmood Bijankhan investigates the phonology of
Modern Persian according to the formal and colloquial speech data in the Tehrani dialect.
The chapter presents a phoneme inventory and discusses the syllable structure, the
phonological processes and rules as well as the interaction of violating markedness and
faithfulness constraints. In Chapter 6, Arsalan Kahnemuyipour discusses Persian prosody
at various levels and its interaction with information structure. He also considers the
phonetic realization of prosodic prominence and intonation in Persian.
Part III is focused on Persian syntax, which, as mentioned earlier, is the most widely
studied area of Persian linguistics. Hence, we have assigned three chapters to cover all of
the existing work and to capture the essence of Persian syntax and its language-specific
features. In Chapter 7, Simin Karimi provides a descriptive overview of some of the major
syntactic and morphosyntactic properties of Persian and introduces the major literature
provided by grammarians and linguists inside and outside Iran.
While Chapter 7 is mainly a description of generative approaches to Persian syntax,
primarily from the perspective of the minimalist framework, other approaches to Persian
syntax have been introduced and discussed in Chapter 8. In this chapter, Jila Ghomeshi
presents an overview of some of the other theoretical approaches to Persian syntax and
showcases the way in which aspects of Persian syntax have been addressed within a
number of different approaches. While the focus is on Persian, some of the discussion of
this chapter extends to languages in the same family as well as contact languages. In
Chapter 9, Pollet Samvelian discusses three controversial features of Persian syntax:
the Ezāfe construction, the enclitic rā, and complex predicates. In this chapter, language-
specific challenging facts in each of these three phenomena are described and accounted
for. These language-specific phenomena can also be the topic of cross-linguistic
investigation and hence the Persian data is of crucial importance.
Part IV is on language and words, encompassing topics such as morphology,
lexicography, and the Academy of Persian Language and Literature. In Chapter 10,
Behrooz Mahmoodi-Bakhtiari discusses Persian morphology, including lexical and
functional morphemes, and looks at a large array of pronouns, nouns, adjectives, adverbs,
as well as tense and verbal morphology. He also looks at processes such as compounding
and minor word-formation types in Persian. In Chapter 11, Seyed Mostafa Assi offers a
concise chronological overview of the Persian lexicographic tradition and discusses
recent advances and developments in lexicographic publications. In Chapter 12,
Mohammad Dabir-Moghaddam introduces the Academy of Persian Language and
Literature by giving an overview of the developments that led to the establishment of the
first, second, and third Iranian academies for word selection and the activities and
contributions of this academy.
Part V focuses on language and people, including topics such as sociolinguistics, language
contact and identity, as well as pedagogy. The Persian-speaking countries, which
presently include Iran, Tajikistan, and Afghanistan, create a great laboratory for
sociolinguists. In Chapter 13, Yahya Modarresi introduces Persian sociolinguistics and
discusses concepts such as dialect studies, social variations, contacts, borrowing, and
code-switching. While the official language of Iran is Persian, the mother tongue of a
large group of Iranians is a language other than Persian. In Chapter 14, Shahrzad
Mahootian focuses on the indigenous languages of Iran and discusses the struggles faced
by minority languages, similar to those in other multilingual nations. The subsequent
chapter looks at the other side of the same coin. The Persian-speaking countries have
undergone much political turmoil within the last (p. 4) several decades, which has led to
mass migration of Persian-speaking populations to the West. Thus the issue of language
maintenance among second-generation Persian speakers gains importance. In
Chapter 15, Anousha Sedighi examines the characteristics of heritage Persian speakers in
terms of their linguistic and metalinguistic abilities, compares their profiles with those of
native speakers and second-language learners, and sheds light on the current challenges
within this field. In Chapter 16, Pouneh Shabani-Jadidi and Anousha Sedighi explore
teaching Persian to speakers of other languages from a variety of perspectives including
language acquisition, pedagogy, assessment, and curriculum development.
Part VI investigates language, mind, and technology through a more experimental lens.
The relevant topics include psycholinguistics, neurolinguistics, and computational
linguistics. In Chapter 17, Pouneh Shabani-Jadidi discusses existing studies in Persian
psycholinguistics and provides the grounds for comparison between Persian
psycholinguistics and psycholinguistic studies in other languages. In Chapter 18, Reza
Nilipour discusses current neurolinguistics research from two major subfields of
neurolinguistic studies on monolingual and bilingual speakers of Persian: the
patholinguistic studies in brain damaged speakers, and the first experimental fMRI
studies on healthy monolingual and bilingual speakers of Persian. This volume would be
incomplete without a discussion of language and machines. In the last chapter, Chapter 19,
Karine Megerdoomian provides an overview of Persian computational linguistics by
presenting the essential components of computational linguistic analysis, discussing the
main challenges of Persian within this field, and showcasing some of the important
resources and methodologies developed in the field.
Our intention as the editors of this volume has been to create a central reference for
theoretical and applied linguists interested in Persian as well as for scholars specializing
in comparative grammar. The contributions also aim to highlight crucial problems
and to present potential solutions or suggest further lines of research. An important goal
of this volume is to enhance communication and collaboration among the scholars of
different subfields of Persian linguistics across the globe. Today we are living in a world
where scholars are more and more specialized and less and less polymathic, which
reduces dialogue and unity within the field. The editors hope that the present volume
helps establish a platform where ideas for collaboration emerge. The Oxford Handbook of
Persian Linguistics, thus, in one volume, gives critical expression to this language and
builds a unifying bridge among its scholars.
Anousha Sedighi
Anousha Sedighi is Professor of Persian and Persian Program Head at Portland State
University. She received her PhD in Linguistics from the University of Ottawa in
2005. She has published on syntax, morphology, and teaching Persian as a heritage
language. Her first book, Agreement Restrictions in Persian, was published in 2008
by Rozenberg and Purdue University Press and republished in 2011 by Leiden
University Press and University of Chicago Press. Her second book, Persian in Use:
An Elementary Textbook of Language and Culture, was published in 2015 by Leiden
University Press and University of Chicago Press. She served as the President of the
American Association of Teachers of Persian (2014–16) and she currently sits on the
executive board of the National Council of Less Commonly Taught Languages.
Pouneh Shabani-Jadidi
Pouneh Shabani-Jadidi is Senior Lecturer of Persian Language and Linguistics at
McGill University. She holds a Ph.D. in Linguistics from the University of Ottawa
(2012) as well as a Ph.D. in Applied Linguistics from Tehran Azad University (2004).
She has taught Persian language and linguistics as well as Persian literature and
translation at McGill University, the University of Oxford, the University of Chicago,
and Tehran Azad University since 1997. She has published on morphology,
psycholinguistics, translation, teaching Persian as a second language, and second
language acquisition. Some of her representative publications are Processing
Compound Verbs in Persian: A Psycholinguistic Approach to Complex Predicates
(Leiden and University of Chicago Press, 2014), The Routledge Introductory Persian
Course and The Routledge Intermediate Persian Course (2010, 2012, with Dominic
Brookshaw), as well as What the Persian Media Says (Routledge, 2015). She is the
translator of The Thousand Families: Commentary on Leading Political Figures of
Nineteenth Century Iran, by Ali Shabani (Peter Lang, 2018, with Patricia Higgins).
She serves as reviewer for International Journal of Iranian Studies, International
Journal of Applied Linguistics, Sage Open Journal, Frontiers in Psychology,
International Journal of Psycholinguistic Research, and LINGUA. Currently, she is
president of the American Association of Teachers of Persian (2018–2020).

From Old to New Persian

Mauro Maggi and Paola Orsatti

Subject: Linguistics, Historical Linguistics, Languages by Region
This chapter looks at the evolution of Persian, the only language to be substantially
documented in all three periods of Old, Middle, and New Iranian on account of its close
association with political centres over the centuries: Old and Middle Persian with the
Achaemenids and the Sasanians, New Persian with Islamic powers. The chapter includes
two parts, preceded by a survey of research on the three stages of Persian. The first part
presents the documentation of Old and Middle Persian, discusses the innovations of Old
Persian, and considers the transition from Old to Middle Persian. The second part deals
with the rise of New Persian by taking into account Early Judaeo-Persian, Persian in
Syriac script, Manichaean New Persian, and the early texts in Arabic script. It then
discusses the main changes of the language in its literary and non-literary varieties until
Contemporary New Persian.
Keywords: history of the Persian language, Old Persian, Middle Persian, New Persian, Judaeo-Persian
2.1 Over 2,500 years of Persian

Persian had its cradle in and owes its name to the south-western region of Iran called
Pārsa in Old Persian (Middle Persian Pārs, New Persian Pārs, Fārs) and Persis in Greek.
Among the Iranian languages, which are conventionally divided into the three stages of
Old, Middle, and New Iranian, Persian occupies a special position in that it is the only one
to be substantially documented in all three periods as Old, Middle, and New Persian. This
depends on its close connection with the main political centres for most of the time over
the centuries. Old Persian was the language of the ruling dynasty of the Achaemenid
empire from the sixth to the fourth century BC and, after the long interval of Greek and
Parthian suzerainty over Iran, Middle Persian was the language of the ruling dynasty of
the Sasanian empire from the third to the seventh century AD. Subsequently, New Persian
was associated with Islamic powers: the Iranian, Persian-speaking, Islamized armies that
conquered eastern Iran and Transoxiana; the Tahirid, Saffarid, and Samanid courts under
the Abbasid caliphate at the very origins of the New Persian literary language between
the ninth and the tenth centuries (Lazard 1975a: 595–6, 601–2; Perry 2009a: 52–3); the
non-Iranian Persianate dynasties from the end of the tenth century with the Ghaznavids
to the early twentieth century with the Qajars; and finally the Persian Pahlavi ruling
house in the twentieth century. Though the wide area where Persian was spoken
underwent a significant reduction after the second half of the eleventh century due to the
spread of Turkic peoples (section 2.21), Persian was an important literary and prestige
language far beyond the Persian-speaking area throughout the Islamic period. The Turkic
dynasties that succeeded one another almost uninterruptedly for nine centuries in the
Persian-speaking territories had a major role in spreading Persian culture and
literature in large areas of Asia. Thus, an important chapter in the history of Persian
literature is comprised of works produced in India from the late Ghaznavid dominion over
north-western India in the eleventh and twelfth centuries; Persian kept its function as the
learned and official language in India until 1834; and it was the language of official
correspondence and diplomacy, as well as a literary language from Ottoman Turkey to
Indonesia. At present, the three major varieties of Persian are official (p. 8)
languages in the modern states of the Islamic Republic of Iran (Contemporary New
Persian of Iran), Afghanistan (Afghan Persian, officially called dari, Pashto being another
official language), and Tajikistan (Tajik Persian or tojikī) (see also Chapters 3, 11, 13, and
19). In addition, Persian is nowadays spoken by naturalized communities in neighbouring
countries, including Pakistan, Turkey, the United Arab Emirates, Azerbaijan, Uzbekistan,
and other Central Asian countries, as well as in Europe and North America. It is
accordingly possible to follow the historical development of the Persian language over the
centuries for more than 2,500 years.1
2.2 Research on Old Persian

A critical bibliography of linguistic studies on Old Persian in the last three decades is
offered by Rossi (2008: 95–111), and a quick general survey by Schmitt (2013: 233–5).
Schmitt (1989, 2004) and de Vaan and Lubotsky (2009) offer general presentations of the
language. Schmitt (2009) has an up-to-date edition and translation of the entire Old
Persian corpus, while Lecoq (1997) presents a complete translation of the inscriptions
accompanied by a thorough treatment of Achaemenid culture. Schmitt (2016) classifies
the stylistic phenomena of the inscriptions.
For grammar, traditionally approached in a historical perspective, Kent (1953) and
Brandenstein and Mayrhofer (1964) still prove useful. Skjærvø (2009b) gives a
comprehensive updated overview of Old Persian grammar in the framework of Old
Iranian (see also Testen 1997 on phonology and Skjærvø 2007 on morphology).
Schmitt (2014) updates and summarizes information on Old Persian vocabulary. Hinz
(1975) and Tavernier (2007) study the substantial Old Iranian element, including Persian,
in other languages (Old Persian is further discussed in Chapters 3 and 11).
2.3 Research on Middle Persian

Rossi (1975) offers a well-organized bibliography on Middle Persian but is limited to the
years 1966–73. Nawabi (1987: 262–384) has an exhaustive bibliography on Middle
Persian along with Parthian. More recently, Durkin-Meisterernst (2013) surveys the
studies on Middle Persian in the framework of Middle Iranian.
Sundermann (1989b) and Hale (2004) offer sketches of the language (see also Weber
1997 and 2007 on phonology and morphology). Klingenschmitt (2000) is important from
(p. 9) the standpoint of historical linguistics. Durkin-Meisterernst (2014) now provides a
thorough and up-to-date treatment of all aspects of the grammar of Middle Persian (see
also Rastorgueva and Molčanova 1981) and Parthian (see also Skjærvø 2009a), including
supplements to Brunner (1977) on syntax.
Standard Middle Persian dictionaries are MacKenzie (1971) for Zoroastrian texts (see also
Nyberg 1974), Durkin-Meisterernst (2004) for Manichaean texts, and Gignoux (1972: 3–
39) for the inscriptions. Proper names are dealt with by Gignoux (1986, 2003) and
Zimmer (1991). The compilation of a comprehensive Middle Persian dictionary is
underway (Shaked and Cereti 2005).
The sections on Middle Persian in the ground-breaking survey of Middle Iranian by
Henning (1958: esp. 21–7, 30–7, 43–52, 58–79, 89–92, 97–104) contribute a wealth of
information to the history of the language from its very first, sparse documentation in the
third century BC onwards.
See sections 2.9.1–2.9.4 for references and information on the corpus of Middle Persian
writings (see Chapters 3 and 11 for more discussion on Middle Persian).

2.4 Research on the history of New Persian

A comprehensive history of the New Persian language is still a desideratum. However,
there are studies on different stages or aspects of Persian, as well as good grammatical
descriptions. A critical discussion of New Persian grammatical studies is given by
Windfuhr (1979); a bibliography of linguistic studies is offered by Ahadi (2002), and a
critical survey by Ludwig Paul (2013a). As to historical grammar, scholars have at their
disposal only the one by Darmesteter (1883). Horn’s description of New Persian (1898–
1901) and Hübschmann’s work (1895) on Persian etymology and historical phonology are
still useful. On New Persian etymology, a reference work has been recently provided by
Ḥasandust (2014).
Among the three main periods considered below (section 2.12), only the first one (Early
New Persian) and the last one (Contemporary New Persian) have been studied to some
extent from a purely linguistic perspective, while for the second period (Classical New
Persian) one has mainly to rely on research based on a stylistic approach (Bahār 1942).
Early New Persian is further discussed in Chapters 3 and 4. For contemporary
literary or standard New Persian of Iran, a comprehensive reference grammar is provided
by Rubinčik (2001), to which the descriptions by Lazard (1989), Perry (2007), and
Windfuhr and Perry (2009) should be added. Phillott’s grammar (1919) is still useful. Less
attention has been paid to the spoken informal variety: apart from more or less detailed
information in some grammars (e.g. Meneghini and Orsatti 2012: 255–63) or independent
studies (e.g. Alfieri and Barbati 2010), the reference work for the spoken informal variety
is that by Lazard (1957).
For literary Early New Persian, Gilbert Lazard’s ample description (1963) of the language
of the most ancient prose texts of New Persian literature is destined to remain the
standard authority for many years to come; for the language of ancient New Persian
poetry see Lazard (1964: vol. 1, 41–6). In Persian, a comprehensive linguistic study of
both prose and verse New Persian texts up to the mid-thirteenth century is given by
Xānlari (1986). Very good (p. 10) studies of the language of single authors are offered by
editors of classical texts like Maḥjub (1959: 33–56) in the preface to his edition of
Gorgāni’s poem Vīs va Rāmīn, or Shafiʿi Kadkani (1987a: 181–209) in the preface to his
edition of Asrār al-tawḥīd by Muḥammad b. Munawwar. For the language of the Šāhnāme,
Wolff’s glossary (1935) is still a valuable research tool.
As to studies more specifically related to the history of the language, a comprehensive
and still useful reconstruction of the earliest attestations of New Persian was offered by
Henning (1958: 77–81, 86–9). Orsatti (2007b: 102–72) provides a critical survey of the
most ancient New Persian documents in Hebrew, Syriac, and Manichaean scripts. The
verbal system of literary New Persian from the eleventh to the sixteenth centuries is the
subject of a thorough study by Lenepveu-Hotz (2014), who also takes into account Early
Judaeo-Persian documents. For the actual phonetic reality of Classical New Persian, Meier
(1981) provides us with a true mine of information, mainly based on the analysis of
rhymes in early and classical poetry. Telegdi (1955) offers an important historical, mainly
lexical study of Persian verbs with the ‘prefixes’ bar, dar, farā, foru, and bāz ~ vā.
Two volumes gathering articles on various aspects of the history of Persian have been
recently edited by Paul (2003b) and Maggi and Orsatti (2011). Quite useful is the
publication of a volume collecting Lazard’s articles on the formation of the New Persian
language (1995) and a volume collecting Utas’s contributions to the history of Persian
(2013). Finally, mention should be made of recent multi-author works such as the one
edited by Karimi, Stilo, and Samiian (2008).
2.5 Old Persian: documentation, use, script, and parallel tradition
Old Persian is documented in the inscriptions of the Achaemenid kings (558–330 BC).
These epigraphic texts—which mark the beginning of writing in Iranian languages—are
free from modifications due to textual tradition, but form a comparatively small corpus
(Lecoq 1997; Schmitt 2009; cf. Huyse 2009: 73–83). Most inscriptions are from Fars,
ancient Elam, and Media, that is, from the first regions which the Persians occupied and
annexed in the seventh and sixth centuries BC after their immigration to south-western
Iran and which formed the core of their empire. The inscriptions date from the time of
Darius I (522–486 BC) to that of Artaxerxes III (359–338 BC), but most of them are from the
times of Darius I and Xerxes I (486–465 BC). Later texts are short, repetitive, and mostly
not accompanied by versions in other languages, unlike the earlier inscriptions, which
display Elamite, Babylonian, and, if produced in Egypt, Egyptian texts beside the Old
Persian ones as a mark of continuity with the previous powers whose territories had been
incorporated into the Persian empire. For more information on Elamite, refer to Chapter 3.
Though Old Persian was the Iranian dialect spoken in Fars and the native tongue of the
Achaemenids, the language of the inscriptions is a formal language with many loanwords
and an archaizing character. Old Persian as we know it from the inscriptions with its
special features was meant as a means to promote the prestige of the kings and their
feats. Its written use was, thus, very delimited. The same holds true for the quasi-
alphabetic writing system of the cuneiform Old Persian script (Hoffmann 1976; Lecoq
1997: 59–72, 285), which (p. 11) imitated earlier writing systems of the ancient Near East in
the use of wedge-shaped marks and was not conceived for everyday usage, but as a
prestige script for a prestige language. This is confirmed by the fact that, after an
incubation period, the script was first adopted extensively for adding the Old Persian text
(Schmitt 1991) to the original Elamite and Babylonian texts of Darius I’s Bisotun
inscription aimed at royal self-portrayal and propaganda following his contested
accession to the throne, and that it was employed for epigraphical texts partly
inaccessible and, thus, not intended to be actually read. This is precisely the case, for
instance, of the Bisotun inscription engraved into the rock more than 20 m above the
closest point reachable by climbing and 60 m above the nearby caravan trail.
Old Persian did not spread across the multiethnic Achaemenid empire, where a large
number of languages were in use (Schmitt 1993). The language of the central
administration, the official correspondence, and the local administration in some
provinces was the so-called Official Aramaic, while Persian had virtually no role in the
actual administration of the empire (see Chapter 3 for more information on Aramaic).
Even for the court administration in Persepolis the language used was Elamite, just like
Babylonian in Babylonia, Egyptian in Egypt, and Greek and other local languages in Asia
Minor. A consequence of the multilingualism of the Achaemenid empire is the occurrence,
in foreign language sources, of numerous Old Persian and other Iranian words and names
that are not preserved in the comparatively small textual corpus and form the so-called
parallel tradition (Hinz 1975; Tavernier 2007).
2.6 Old Persian innovations

A number of innovations characterize Old Persian as against the other Iranian languages
(Iranian languages are further discussed in Chapter 3). The most conspicuous
phonological changes—which enable one to distinguish in part genuine Persian words
from loanwords (section 2.7)—are the following (Schmitt 1989: 68–70):
1) Old Persian ϑ, d, d, as against s, z, z in the other Iranian languages, from Iranian
*ts, *dz, *dz resulting from the Indo-Iranian palatals *ć, *ȷ́, *ȷ́ʰ (cf. Vedic ś, j, h) < Indo-
European *ḱ, *ǵ, *ǵʰ: for instance, Old Persian *daϑa ‘ten’ (> Middle and New
Persian dah) indirectly attested in *daϑa-pati ‘decurion’, *daϑa-pa- ‘decury’, and
*daϑa-hva- ‘one tenth’ of the parallel tradition (Tavernier 2007: 419, 451, 455), but
Young Avestan dasa, cf. Vedic dáśa < Indo-Iranian *dáća; Old Persian present stem
dā-nā- ‘to know’ (> Middle and New Persian dān-), but Avestan zā-nā-, cf. Vedic jā-nā́-
< Indo-Iranian *ȷ́ā-nā́-; Old Persian adam ‘I’ (> Middle Persian an), but Avestan azəm,
cf. Vedic ahám < Indo-Iranian *aȷ́ʰám.
2) Old Persian ç (= [ss]?) from Iranian *ϑr resulting from Indo-Iranian *-tr- and
preserved elsewhere as such or in its continuations: for instance, Old Persian puça-
‘son’ (> Middle Persian pus and pusar [with -ar by analogy with other nouns of
relationship] > New Persian pesar), but Avestan puϑra-, cf. Vedic putrá- < Indo-
Iranian *putrá-; Old Persian xšaça- ‘kingdom, kingship, power’, but Avestan xšaϑra-
(or the borrowed Middle Persian > New Persian šahr), cf. Vedic kṣatrá- < Indo-
Iranian *kšatrá-. (p. 12)
3) Old Persian s, as against sp elsewhere apart from Khotanese and Wakhi š, from
Indo-Iranian *ću̯ < Indo-European *ḱu̯: for instance, Old Persian asa- ‘horse’ (also in
asa-bāra- ‘horseman’ > Middle Persian aswār > New Persian savār), but Avestan
aspa- (or the borrowed Middle Persian asp > New Persian asb) and Old Khotanese
aśśa- [aša-], cf. Vedic áśva- < Indo-Iranian *áću̯a-.
4) Old Persian šiy from the Iranian cluster *ϑi̯ resulting from Indo-Iranian *ti̯ and
preserved elsewhere: for instance, Old Persian hašiya- ‘true’, but Avestan haiϑiia- <
Iranian *haϑi̯a-, cf. Vedic satyá- < Indo-Iranian *sati̯á-.
In the nominal and pronominal inflections, the Indo-European and Indo-Iranian eight case
system (nominative, accusative, vocative, genitive, dative, ablative, instrumental,
locative), which is still preserved in Avestan, was reduced to six cases in Old Persian in
that all functions of the dative were subsumed by the genitive endings and the ablative
virtually merged with the instrumental. Moreover, several other originally differing
endings came to coincide because of the loss of most final consonants, so that, for
example, a single ending -āyā stands for the genitive-dative, ablative (< Iranian *-āyah),
locative, and instrumental singular (< Iranian *-āyā) of the ā-declension nouns (see
Chapter 9 for more discussion on case).
Likewise, the verbal system exhibits restructuring with losses and innovations (Skjærvø
1985). Notably, there is no longer any opposition of aspect between the rare aorist forms
and the prevailing imperfect, which denotes both progressive and completed action (see
section 2.6.1 on the inherited perfect and a new periphrastic past tense), so that the
formal third singular aorist active adā ‘he created’, preferred by Darius I and others in
the solemn formula seen in (1a), interchanges in otherwise virtually identical contexts
with Darius’s two occurrences of the more colloquial imperfect adadā (1b–c):2
(1)
Peculiar to Persian from its earliest stage are also some lexical items. Thus, the Indo-
Iranian verbs for ‘to speak, say’ *u̯ač- and *mrau̯H-/mruH- are continued in Avestan as
vac- and mrū- (cf. Vedic vac- and brav-), but are replaced in Old Persian by verbs that
(p. 13) apparently were originally used with honorific force: θanh- ‘to say’ (cf. Middle
Persian saxwan ‘word, speech’ > New Persian saxon, soxan) from an original meaning ‘to
praise, announce’ witnessed by Avestan saŋh- ‘to announce, declare’ and Vedic śaṃs- ‘to
praise, announce’; and gaub- ‘to say’, only attested in the middle diathesis with the
meaning ‘to call oneself’ (> Middle Persian gō(w)-, guftan ‘to say, speak’ > New Persian
gū(y)-, goftan), from an original meaning ‘to praise, announce’ witnessed by Sogdian γwβ-
‘to praise’, Choresmian γwβ(y)- ‘to praise oneself, boast, be proud’, etc. Similarly, Old
Persian does not continue Indo-Iranian *ćrau̯-/ćru- ‘to hear’ (cf. Avestan sru- and Vedic
śrav-),4 but has the vivid metaphorical ā-xšnu- ‘to hear’ ← ‘to sharpen (the ears)’ (>
Middle Persian āšnaw-, āšnūdan, cf. New Persian šenav-, šenudan), whose original
meaning is preserved in Avestan hu-xšnuta- ‘well-sharpened’ and Vedic kṣṇav- ‘to whet,
sharpen’ (cf. Schmitt 1989: 84; Cheung 2007: 113–14, 334, 456–7).
2.6.1 New perfect and pluperfect
The old synthetic perfect occurs only once in the optative (caxriyā third singular active to
kar- ‘to do’ in DB 1.50) and is actually replaced, in the indicative, by a new periphrastic
formation with resultative value (Skjærvø 2009b: 144–5). This new perfect consists of the
-ta- past participle and the auxiliary ah- ‘to be’, which is omitted in the third singular, and
occurs with both intransitive (2) and transitive verbs (3):
(2)
(3)
When the copula is in the imperfect, the formation has pluperfect value (4):
(4)
(p. 14)
Since the -ta- past participle has a passive meaning with transitive verbs, their new
perfect is also only passive and contrasts with the present and the imperfect that have
both active and passive constructions. When an agent is expressed, this is in the genitive-
dative (5):
(5)
This construction with the agent in the genitive-dative, often referred to as the ‘manā
kr̥tam construction’, is the systematization of inherited expressions occasionally found in
Avestan (Haig 2008: 23–88; Jügel 2015: 68–80, 322–4, 571–8) and is at the origin of the
Middle Persian ergative construction (section 2.10.5).
The so-called ‘potential construction’ consisting of a past participle and the verbs kar- ‘to
do’ (active) and bav- ‘to become’ (passive) expresses (successful) completion of an action
in Old Persian (Filippone 2015).
2.6.2 Beginnings of the ezafe construction
The relative pronoun haya-/taya-5 used to join a modifier to a usually preceding
substantive (Kent 1953: 85; Skjærvø 2009b: 100–1) is another construction which is also
found in Avestan and developed further as the Middle Persian relative particle (section
2.10.2) and the New Persian ezafe. For more discussion on ezafe, refer to Chapters 3, 6,
7, 9, and 19. The modifier can be an apposition, an adjective (6), or a modifying noun or
pronoun (7):
(6)
(7)
With appositions, the case of the relative pronoun and the apposition is the same (p. 15)
as that of the modified noun: nominative Gaumāta haya maguš (DB 1.44), accusative
Gaumātam tayam magum (DB 1.49–50) ‘Gaumāta the Magian’.
2.7 Loanwords in Old Persian

As there are Old Persian words and names in the parallel tradition, so there are a number
of loanwords in the Achaemenid inscriptions. A Semitic word such as Aramaic mašk ‘skin’
with the emphatic state suffix -ā (rather than Babylonian mašku) was borrowed as the -ā-
declension word maškā- (> Middle and New Persian mašk ‘leather bottle’) to refer to the
‘(inflated) skins’ used by Darius’s army as floats to cross the Tigris (DB 1.86: Schmitt
2014: 213).
Most loanwords, however, concern kingship and administration and are probably of
Median origin, though the phonological developments observed in these loanwords differ
from the Old Persian ones but are not specifically Median: because the Persians had been
subject to the Medes until the conquest of Media by Cyrus II (558–530 BC), it is only
natural that the Persians regarded themselves as their political heirs and took up their
political terminology. Thus, on the one hand, it is virtually certain that the epithets vispa-
zana- ‘having all (kinds of) men’ and uv-aspa- ‘having good horses’ that qualify the empire
(and contrast with everyday Old Persian visa- ‘all’ and asa- ‘horse’ < Indo-Iranian *u̯íću̯a-
and *áću̯a-) come from Median because the outcome sp of Indo-Iranian *ću̯ is documented
by the Median form spáka ‘bitch’ quoted by Herodotus (Histories 1.110.1; cf. Old Persian
*saka- ‘dog’ > Middle and New Persian sag). On the other hand, one can only postulate a
Median origin for xšāyaϑiya- ‘king’ (> Middle and New Persian šāh), with -ϑiy- instead of
expected Old Persian -šiy- < Iranian *-ϑi̯-, because the Median outcome of Iranian *-ϑi̯- is
not otherwise known (Schmitt 1984: 185–96).
Part of the political terminology adopted from Median goes ultimately back to earlier
Near Eastern formulas: for example, the expression vašnā Auramazdāha ‘by the
greatness/might of Ahuramazdā’ (Skjærvø 2007: 903, 935, instrumental of *vazar-/vašn-
‘greatness’, cf. vazr̥ka- ‘great’ > Middle Persian wuzurg > New Persian bozorg)
corresponds to Urartian Ḫaldinini alsuišini/ušmašini ‘by the greatness/might of Ḫaldi’; and
the title xšāyaϑiya xšāyaϑiyānām ‘king of kings’ corresponds to Babylonian šar šarrāni.
Both formulas betray a non-Iranian origin because the modifying genitives (Auramazdāha,
xšāyaϑiyānām) follow the modified nouns instead of preceding them, as is commonly the
case in Old Iranian. The regular word order of the title was restored in Middle Persian
šāhān šāh > New Persian šāhan-šāh (Meillet and Benveniste 1931: 14–15; Colditz 2003:
63–4).
2.8 From Old to Middle Persian

Grammar and spelling mistakes frequently found in the inscriptions of Artaxerxes I (465–
24), II (404–359), and III point to a language already approaching Middle Persian with
confusion and loss of endings and conflation of different antecedents. Some of the
mistakes are unsuccessful endeavours to restore the by then archaic forms (Kent 1953:
23‒4; Schmitt 1989: 60): for example, the genitive-dative singular (formed by appending
the a-declension (p. 16) ending -ahyā also to nominatives of other declensions) occurs
instead of the nominative and vice versa in genealogies from Artaxerxes I onwards and, in
an inscription of Artaxerxes III, the regular i-declension singular accusatives būmim
‘earth’ and šiyātim ‘happiness’ (e.g. DNa 2, 4) are replaced by būmām and šāyatām (A³Pa
2, 4) with the more common ā-declension ending. The latter mistake also reveals that the
word had become šāt or the like by the mid-fourth century (cf. the historical Pahlavi
spelling <šʾtˈ> for the Middle Persian adjective šād ‘happy’ < Old Persian šiyāta-) because
the -ā- resulting from -iyā- was erroneously restored as -āya- on account of the coincident
outcome -ā- from earlier -āya- found, for instance, in xšāyaϑiya- ‘king’ > Middle Persian
šāh (see Schmitt 1999: 59–118 for a detailed study of the features of ‘Late Old Persian’).
During the long Greek and Parthian domination of Iran by the Seleucids (305–125 BC)
and the Arsacids (247 BC–224 AD), the documentation of Persian is scarce and provides
little linguistic information: a damaged and hardly readable inscription on Darius I’s tomb
at Naqš-e Rostam near Persepolis, where the words ḥšʾyty wzrk ‘great king’, mʾhy
‘month’, and possibly slwk ‘Seleucus’ have been recognized, is thought to have been
written phonetically in Early Middle Persian in Aramaic script at the request of some
noble Persian in the early Seleucid period (Boyce and Grenet 1991: 118–20); the legends
on the third series of coins of the rulers of Fars, where <BRH> (from Aramaic <brh> br-
eh ‘his son’ with suffixed pronoun inappropriate to the context instead of <br> bar one
would expect if the legends were actually written in Aramaic) must stand for Middle
Persian pus ‘son’, attest to the use of aramaeograms (conventionally transliterated by
capital letters) for writing Persian from about the end of the second century BC (Henning
1958: 25); and an inscription on a bowl from the time of Ardašīr II, king of Fars in the
second half of the first century BC and a vassal of the Arsacids, is the first known and
readable Middle Persian inscription (Skjærvø 1997a).
Under the Arsacid dynasty, Parthian gained a dominant position in Iran as the vehicle of
Iranian culture, including oral epic poetry (Boyce 2003), and this caused a first batch of
Parthian words, recognizable from phonological changes contrasting with the Persian
ones, to enter Middle Persian (cf. section 2.9 on later Parthian loanwords), whence they
then reached New Persian as in the case of such a political term as Parthian šahr →
Middle Persian šahr ‘kingdom, country; city’ > New Persian šahr ‘city, town’ (as against
Old Persian xšaça- < Iranian *xšaϑra-: see Tedesco 1921: esp. 198–9 on -hr- and cf. section
2.6, no. 2) or such a term common in military and epic contexts as Parthian asp → Middle
Persian asp > New Persian asb ‘horse’ (as against Old Persian asa- < Indo-Iranian *áću̯a-:
cf. section 2.6, no. 3).
2.9 Middle Persian: documentation and scripts
Middle Persian formed in post-Achaemenian times as a development of Old Persian and
its use was confined to Fars until the rise of the Sasanian dynasty (224–651 AD), when it
began not only to be more substantially employed in writing, but also to spread outside
its region of origin, as it became the language of administration and communication in
the Sasanian empire. Middle Persian continued to be used as a living language for a while
in post-Sasanian times and as a church language by the Zoroastrians in Iran and India
and the Manichaeans (p. 17) in Chinese Central Asia. Major Middle Persian texts date
from the third century, on account of the connection of the language with the ruling
dynasty, and the ninth century, when it enjoyed a revival due to the endeavour on the part
of the Zoroastrians to preserve their religious tradition after the spread of Islam. The use
of Middle Persian spans, thus, over many centuries and well beyond Fars. This is the
reason why various linguistic stages and developments are mirrored by the different text
groups that document it (survey in Durkin-Meisterernst 2014: 15–25 with references).
2.9.1 Inscriptional Middle Persian
The first substantial text group are the inscriptions (Huyse 2009: 90–102). Most
important and comparatively extensive are the third-century ones of the Sasanian kings
Šābuhr I (241–72) on the Kaʿbe-ye Zardošt at Naqš-e Rostam and Narseh I (293–302) at
Pāikūlī in Iraqi Kurdistan, the prominent Zoroastrian priest Kerdīr, and the court
dignitary Abnūn. The other royal inscriptions (none are known after Šābuhr III, 383–8),
the coin legends, and the inscriptions on seals, gems, bullae, and vessels provide less
linguistic information.7 Similarly to what happened with the Achaemenids, only the first
Sasanians Ardašīr I (224–40) and Šābuhr I produced trilingual inscriptions in Middle
Persian as well as in Parthian and Greek in continuity with the previous imperial powers
of the Seleucids and the Arsacids, while Šābuhr I already gives up Greek in a few
inscriptions and Narseh in the Pāikūlī inscription uses only Parthian besides Middle
Persian. Inscriptions by subsequent, fourth-century kings are in Middle Persian only.
2.9.2 Manichaean Middle Persian
A second text group, essential for the study of Middle Persian phonology, is the
Manichaean literature in Middle Persian (Sundermann 2009) initiated in the third century
by Mani himself (216–77), the founder of the Manichaean religion. He was at the court of
King Šābuhr I and dedicated to him a description of his doctrine in Middle Persian titled
Šābuhragān, which survives in comparatively extensive fragments. A number of other
dogmatic, homiletic, and hymnic works composed by Mani and his disciples and followers
are known from Middle Persian manuscript fragments recovered from Turfan in Chinese
Central Asia. The bulk of them, either translations or original compositions, must go back
to the time when Persian-speaking Manichaeans were still in Iran, before escaping
persecution by the Zoroastrians, and bears witness to the language as spoken in the early
Sasanian centuries (Durkin-Meisterernst 2014: 9–11), though a few texts may have been
produced in Central Asia and some late features are occasionally detected (Durkin-
Meisterernst 2003).
(p. 18) 2.9.3 Zoroastrian Middle Persian
Since Narseh’s Pāikūlī inscription (between 293 and 296) is the last royal inscription with
a Parthian version and none of the few short private Parthian inscriptions in Parthia is
presumably later than the fourth century, it is likely that, at some point, the Sasanians
imposed Middle Persian as the only official and written language of Iran with the
consequence that it gradually spread everywhere from the fourth century on and even
became the only recognized language of Zoroastrianism as the state religion of the
Sasanian empire, thus marking the cultural triumph of Persia. The replacement of
Parthian by Persian outside Fars brought about, by reaction, the introduction of a further
number of Parthian loanwords in Persian (Boyce 1979: 116–17; see also section 2.14 with
n. 22). This is why, whereas the Manichaean texts basically represent genuine Middle
Persian in its provincial purity (Sundermann 1989b: 139), a large number of Parthian
loanwords characterizes conversely the late Sasanian speech varieties mirrored in the
literature in Zoroastrian Middle Persian (also called Book Pahlavi). Though no manuscript
is earlier than the fourteenth century, the Zoroastrian books were produced in the ninth
and tenth centuries also on the basis of earlier textual tradition and form the third and
largest text group of Middle Persian, which comprises, besides translations of large
portions of the Avesta, other religious, doctrinal, didactical, and juridical texts, as well as
a few non-religious ones (Macuch 2009).
It is noteworthy that, unlike later Zoroastrian Middle Persian, the early Avesta
translations are linguistically conservative and preserve a morphology and syntax
comparable to Inscriptional and Manichaean Middle Persian and the Pahlavi Psalter
(section 2.9.4) and mirror a comparably early stage of the language (Cantera Glera 1999,
who contrasts ‘Old Pahlavi’ with Book Pahlavi). For more information on Pahlavi, refer to
Chapters 3 and 11.
2.9.4 Christian Middle Persian
After their separation from the patriarchate of Antioch and the Western church chiefly for
political reasons in the fifth century, Christians in Sasanian Iran used Middle Persian both
in original texts and in translations before eventually abandoning it in favour of Syriac as
their sole church language (Henning 1958: 77–8; Sims-Williams 1992: 534). The scant
remnants of Middle Persian texts produced and used by Christians form a fourth, small
text group consisting just in fragments of the so-called Pahlavi Psalter (ed. Andreas and
Barr 1933) dated by scholars between the fourth and the seventh century (Durkin-
Meisterernst 2006: 6–78) and a fragmentary list of Pahlavi aramaeograms, both found in
Bulayïq (Turfan).
2.9.5 Middle Persian scripts
Apart from Manichaean texts, for which the clear and unambiguous Manichaean script is
used, all other text groups are written in varieties of the highly conservative Pahlavi
script derived from the script used in the Achaemenid period for writing Official Aramaic
(Skjærvø 1996; Durkin-Meisterernst 2014: 29–74).
The Pahlavi script is characterized by a heterographic writing system that (p. 19)
combines words and endings written phonetically with hundreds of frequently occurring
words (verbal and nominal stems, pronouns, prepositions, adverbs, and conjunctions)
written as aramaeograms (or heterograms), that is, written in their Aramaic or pseudo-
Aramaic shape but read as the corresponding Persian words (Utas 1988), much like the
Latin ligatures & <et> and @ <ad>, which one reads as ‘and’ and ‘at’ in English (cf. <BRH> pus ‘son’
in section 2.8). For more information on heterograms, refer to Chapter 11. The script also
abounds in historical spellings that mirror the phonology of the language in the last
centuries BC and probably no longer correspond to the evolution of the language at the
time when the texts were written from the third century on, as is shown by the
contemporary Manichaean spellings: thus, Manichaean <pyd>, <p(y)dr> indicate that
the word for ‘father’ was already pronounced pid, pidar (direct and oblique < Old Persian
nominative pitā and accusative *pitaram, section 2.10.1, no. 1) with voiced postvocalic d
in AD 300 ± 50 in contrast with Pahlavi <pytˈ>, <pytl> with historical <t>, used besides
the aramaeograms <ABˈ>, <ABYtl> (see the groundbreaking article by MacKenzie 1967,
and its implementation in MacKenzie 1971).
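The interplay of aramaeograms and historical spellings described above can be pictured as a simple lookup. The following Python sketch is only illustrative: the table names and the function read_pahlavi are hypothetical, the entries are restricted to the spellings cited in this section, and the pairing of <ABˈ> with pid and of <ABYtl> with pidar is assumed from the order in which the forms are listed.

```python
# Illustrative only: entries are limited to the spellings cited in the text.
# Aramaeograms are written in an Aramaic shape but read as Persian words.
HETEROGRAMS = {
    "BRH": "pus",      # 'son'
    "ABˈ": "pid",      # 'father', direct case (pairing assumed)
    "ABYtl": "pidar",  # 'father', oblique case (pairing assumed)
}

# Historical spellings keep an older <t> where the spoken form had d.
HISTORICAL_SPELLINGS = {
    "pytˈ": "pid",
    "pytl": "pidar",
}


def read_pahlavi(spelling):
    """Return (reading, kind) for a transliterated Pahlavi spelling."""
    if spelling in HETEROGRAMS:
        return HETEROGRAMS[spelling], "aramaeogram"
    if spelling in HISTORICAL_SPELLINGS:
        return HISTORICAL_SPELLINGS[spelling], "historical spelling"
    return None, "unknown"


if __name__ == "__main__":
    for token in ["BRH", "pytl", "ABYtl"]:
        print(token, read_pahlavi(token))
```

Both routes lead to the same spoken word, which is the point made by the contrast with the Manichaean spellings <pyd>, <p(y)dr>.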
In contrast to the conservative writing conventions of the Pahlavi script which lasted
unchanged until its demise and even introduced pseudo-historical spellings, its ductus
underwent a process of cursivization, which increased the intricacies of the script in that
the shapes of several letters came to coincide in the Zoroastrian books and especially the
papyri and the ostraka (Henning 1958: 46–9).9
2.10 A new language type: survivals and innovations
In comparison with Old Persian, Middle Persian (Table 2.1) is characterized by
phonological changes that resulted in a phonemic system very close to the Early New
Persian one:
1) lenition of consonants in non-initial position through (a) voicing of the old
voiceless occlusives p, t, k > b, d, g after voiced sounds10 (xšap- ‘night’ > šab;
nominative brātā ‘brother’ > direct brād; bandaka- ‘vassal, follower’ > bandag
‘servant’); (b) voicing of old č > *ǰ after vowels and subsequent assibilation and
depalatalization of secondarily voiced *ǰ and original ǰ > *ž > z in all positions (hacā
‘from’ > az; present stem jīva- ‘to live’ > zī(w)-); (c) spirantization of the old voiced
occlusives b, d, g > w, y, y (naiba- ‘good’ > nēw; pāda- ‘foot’ > pāy; baga- ‘god’ > bay);
2) contraction of the old diphthongs ai̯, au̯ > ē, ō (dai̯va- ‘demon’ > dēw; gauša- ‘ear’
> gōš) and introduction of short e and o (dahyu- ‘land, district’ > deh ‘land; village’;
Auramazdā- > Ohrmezd) (Durkin-Meisterernst 2014: 131–2); (p. 20)
3) general loss of vowel and coda in final syllables due to accentuation of the
previous syllables (pati ‘in, at’ > pad; mártiya- ‘man’ > mard; genitive-dative plural
martiyā́nām > oblique plural mardān).12
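As a rough illustration, the changes listed under 1a, 1c, 2, and 3 can be applied mechanically to a few of the Old Persian forms cited above. The Python sketch below is a toy model under stated assumptions: the function name old_to_middle, the restriction of lenition to postvocalic position, and the order in which the rules apply are simplifications made for the example, not claims about the actual relative chronology. Forms that require further changes (for example xšap- > šab or mártiya- > mard) are outside its scope.

```python
import re

# Vowel symbols used in the transcriptions handled by this toy model.
VOWELS = "aāiīuūeēoō"

# Ordered rewrite rules (order is an assumption made for the example).
RULES = [
    (r"ai", "ē"),                      # 2) contraction of diphthong ai
    (r"au", "ō"),                      # 2) contraction of diphthong au
    (rf"(?<=[{VOWELS}])b", "w"),       # 1c) spirantization b > w
    (rf"(?<=[{VOWELS}])[dg]", "y"),    # 1c) spirantization d, g > y
    (rf"(?<=[{VOWELS}])p", "b"),       # 1a) voicing p > b
    (rf"(?<=[{VOWELS}])t", "d"),       # 1a) voicing t > d
    (rf"(?<=[{VOWELS}])k", "g"),       # 1a) voicing k > g
    (rf"[{VOWELS}]$", ""),             # 3) loss of the final vowel
]


def old_to_middle(form):
    """Apply the toy rules to an Old Persian stem or word form."""
    form = form.rstrip("-")            # drop the stem-marking hyphen
    for pattern, replacement in RULES:
        form = re.sub(pattern, replacement, form)
    return form


if __name__ == "__main__":
    for word in ["brātā", "bandaka-", "naiba-", "pāda-", "baga-", "gauša-", "pati"]:
        print(word, ">", old_to_middle(word))
```

Spirantization is applied before voicing so that the newly voiced b, d, g (as in brād, pad, bandag) are not themselves turned into w, y.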
Table 2.1 Middle Persian phonemes (adapted from Durkin-Meisterernst 2014: 114ff.)11

ī i              ū u
ē e    ə (?)     ō o
       ā a

             Labial   Labiodental   Dental   Palatal   Velar        Laryngeal
Plosive      p b                    t d                k g
Affricate                                    c j
Spirant               f             s z      š (ž)     x xw γ (?)   h
Nasal        m                      n                  ŋ (?)
Liquid                              r l
Approximant  w                               y
The last mentioned change amounted to the loss of very many of the old nominal and
verbal endings and brought about the disintegration of the Old Persian morphological
system and its restructuring into a new Middle Persian one. Nouns and pronouns no
longer distinguish gender (as against Old Persian masculine, feminine, and neuter) and
their inflection is reduced first to two cases and then one case in the singular and plural,
the dual number having vanished (see Chapter 9 for more discussion on case). Also the
verbal inflection lost many forms and categories: the rare aorist, perfect, and future13
temporal stems, the middle diathesis, the dual number, and virtually all the secondary
endings are not continued, modal forms are much reduced, and rare imperfect and -ya-
passive forms survive for a short time. Each verb has just a present stem, from which
analytical forms are obtained, and a past participle in -t/-d (< Old Persian -ta-), which
combines with auxiliary verbs into past periphrastic formations.
The loss of the old inflectional richness affected heavily the morphology and syntax of
Persian, which greatly expanded the use of periphrastic verb forms (section 2.10.5) and
(p. 21) resorted more and more to preverbal particles (bē lit. ‘out’, hamē lit. ‘always’) to
express aspectual distinctions (Brunner 1977: 157–68; Durkin-Meisterernst 2014: 388–90)
and prepositions (especially pad ‘to, at, in, on, etc.’, ō ‘to, at, etc.’, az ‘from, because
of, etc.’, and the postposition rāy ‘for, for the sake of, etc.’) to express and disambiguate
syntactical functions of nouns and pronouns, including those of the agent through pad (8)
and az (9) and the direct object through ō (10), rarely used for inanimate objects (Paul
2003a: 188–90; Durkin-Meisterernst 2013: 251), and, in late texts, rāy (11), though agent
and direct object were basically expressed by the oblique case alone, which was mostly
endingless in the singular (see Brunner 1977: 116–55; Durkin-Meisterernst 2014: 298–
359, 386, for the sources of the examples; cf. section 2.17.8, with n. 32):
(8)
(9)
(10)
(11)
14
Word order is comparatively less free in Middle Persian and may contribute in part to
make clear the relationships of words in a clause (Mękarska 1981–4; Durkin-Meisterernst
2014: 262–3).
The language moved, thus, from the mainly synthetic morphological patterns of Old
Persian to the decidedly more analytic type of Middle Persian (Henning 1958: 89–90).
2.10.1 Two-case declension and shift from case to number opposition
Substantives, adjectives, and pronouns preserve conspicuous remnants of a two-case
system (Table 2.2) with a direct case used for the subject and the predicate noun, and an
oblique case (p. 22) used for direct object, indirect object, agent, to express possession,
and with prepositions, the postposition rāy, and the relative particle (in the plural, either
the direct or the oblique case could express the direct object). The two cases (Durkin-
Meisterernst 2014: 197–203, 206–8; on the prehistory of the system, see Huyse 2003;
Cantera 2009) are formally distinguished only in:
1) the singular and plural of the nouns of relationship in -dar < Old Persian -tar-
(singular: direct brād < nominative brātā, oblique brādar < accusative *brātaram;
plural: direct brādar < nominative *brātara, oblique brādarān < analogical genitive-
dative *brātarānām) and pus ‘son’ (Old Persian puça-) with analogical pusar;
2) the first person singular pronoun: direct an < *anam < Old Persian nominative
adam (Sims-Williams 1981: 166); oblique man < genitive-dative manā;15
3) the plural of all other substantives, adjectives, and non-personal pronouns
(oblique -ān, more rarely -īn, -ūn < Old Persian genitive -ānām, *-īnām, -ūnām).
Table 2.2 The early two-case system of Middle Persian

                         Singular            Plural
                         Direct   Oblique    Direct   Oblique
Nouns of relationship    -Ø       -ar        -ar      -arān (-arīn, -arūn)
Other nouns              -Ø       -Ø         -Ø       -ān (-īn, -ūn)
First singular pronoun   an       man
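The pattern of Table 2.2 can be restated as a small paradigm generator. The Python sketch below is a hedged illustration only: the function name decline and its parameter are hypothetical, and the rarer oblique plural variants in -īn and -ūn are left out.

```python
def decline(stem, noun_of_relationship=False):
    """Return the early Middle Persian two-case paradigm of a noun.

    `stem` is the direct singular form (e.g. "brād", "mard").
    Only the endings of Table 2.2 are modelled; the -īn and -ūn
    variants of the oblique plural are not covered.
    """
    if noun_of_relationship:
        return {
            ("singular", "direct"): stem,
            ("singular", "oblique"): stem + "ar",
            ("plural", "direct"): stem + "ar",
            ("plural", "oblique"): stem + "arān",
        }
    return {
        ("singular", "direct"): stem,
        ("singular", "oblique"): stem,
        ("plural", "direct"): stem,
        ("plural", "oblique"): stem + "ān",
    }


if __name__ == "__main__":
    for (number, case), form in decline("brād", noun_of_relationship=True).items():
        print(number, case, form)
    print(decline("mard")[("plural", "oblique")])
```

For ordinary nouns three of the four cells coincide, which makes it easy to see why the single marked form, the oblique plural in -ān, could be reinterpreted as a plain plural.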
The two-case system occurs coherently only in Inscriptional Middle Persian but is on the
verge of dissolution and vanishes in time. Already in the Pahlavi Psalter the oblique plural
is used in a few instances as a general plural form (Skjærvø 1983), as regularly happens
in Manichaean Middle Persian, which only distinguishes two cases in the nouns of
relationship and the first person singular pronoun (Sims-Williams 1981: 166–71). The two-
case system is still functional in the early Avesta translations, as is particularly clear for
the nouns of relationship (Cantera Glera 1999: 194–202). The further development during
the Sasanian period resulted ultimately in the simplification of the system in Zoroastrian
Middle Persian, where man is the only form of the first person pronoun, the old oblique
-ān is the general plural ending (so that an opposition of number prevails over the
opposition of case), and only the nouns of relationship keep the old singular direct and
oblique forms (brād, brādar) but without any functional distinction. The next step will be
taken by New Persian, where only originally oblique singular forms in -ar survive
(singular barādar, plural barādarān).
In late texts, both Manichaean and especially Zoroastrian, there also occurs the plural
ending -īhā (Durkin-Meisterernst 2014: 201), the antecedent of New Persian -hā (cf.
section 2.17.10, n. 35). (p. 23)
2.10.2 Relative pronouns and relative particle
The Middle Persian relative pronoun and particle ī (Manichaean also <ʿyg> īg with
suffixal -g < -ka-) continues Old Persian haya-/taya- in both its values as a pronoun proper
and as a device connecting substantives with modifiers (section 2.6.2). Middle Persian ī
follows head nouns or pronouns and connects them to modifying dependent nouns or
nominal phrases (12–13), prepositional phrases (14), adjectives and adjectival phrases
(15), and even clauses (16) (see Boyce 1964: 28–9, 37–47; Durkin-Meisterernst 2014: 268–
71, for the sources of the examples):
(12)
(13)
(14)
(15)
(16)
This construction largely compensates for the loss of the Old Persian genitive-dative and
other indirect cases, so that it occurs much more commonly in Middle Persian. It is the
direct antecedent of the New Persian ezafe construction with -e (doxtar-e bāhuš
‘intelligent girl’, mardom-e Gilān ‘the people of Gilan’, etc.; cf. section 2.17.9), though the
Middle and New Persian constructions have partly different functions and the New
Persian one occurs even more frequently.
In addition to the relative pronoun ī, Middle Persian, like the other Middle Iranian
languages, also uses the inherited interrogative pronouns kē ‘who?’ and čē ‘what?’ with
relative force, though, in this function, reference to living beings or inanimate things is
not always distinguished (Durkin-Meisterernst 2014: 216, 415–30).
2.10.3 Present and modal forms
Middle Persian present stems continue Old Iranian present stems formed by means of a
variety of suffixes, whose presence is partly obscured by phonological changes (p. 24)
(e.g., with suffix *-nau̯-, Old Iranian *kr̥-nau̯- ‘to do’ > Old Persian ku-nau̯- > Middle
Persian kun-; cf. section 2.6, no. 1 for an example [dān- ‘to know’] of the old suffix -nā- and
section 2.10.4 on the old suffix -sa-). It is commonly held that, in the inflection of the
present (Table 2.3), two suffixes came to prevail: Old Persian -aya- > Middle Persian -ē- for
indicative and imperative; Old Persian -a- with the addition of the subjunctive suffix -a- (>
-ā- > Middle Persian -ā-) for subjunctive and the optative suffix -i- (-ai̯- > Middle Persian
-ē-) for optative, only attested in the third singular -ē < -ēh < *-ait (?) (Sundermann 1989:
148–50). Recently, Durkin-Meisterernst (2014: 241) has suggested that also the
subjunctive contains the old suffix -aya- (-aya- + -a- > -ayā- > Middle Persian -ā-), which
provides a unified historical explanation of the inflection of the present with parallels in
Middle Indo-Aryan.
Table 2.3 Manichaean Middle Persian endings of the present (after Durkin-Meisterernst 2014: 232ff.)16

             Indicative     Imperative   Subjunctive   Optative
1 singular   ēm (am, om)                 ān
2            ēh > ē         -Ø (ē)       āy
3            ēd (ad)                     ād            ēh > ē
1 plural     om, ēm (am)                 ām
2            ēd             ēd           ād
3            ēnd                         ānd
A conspicuous exception is the third singular present of the copula ast ‘is’, which
continues the suffixless inherited Old Persian form as-ti with the ending added directly to
the root. The rest of the paradigm is levelled and based on the stem h- (hēm, hē, etc.).
Differently from Inscriptional and Manichaean Middle Persian, the Pahlavi Psalter, and
the early Avesta translations, later Zoroastrian Middle Persian only has subjunctive forms
for the third persons singular and plural (Cantera Glera 1999: 177–87; Durkin-
Meisterernst 2014: 232–9).
In the absence of a specific future form, the future is expressed by the indicative present
as the mood of plain statements and the subjunctive present as the mood of wish and
possibility. Combined with the particle ēw/hēb, the indicative acquires an exhortative
meaning similar to the imperative and optative (Durkin-Meisterernst 2014: 377–81).
2.10.4 New verb suffixes: causatives, denominatives, ‘inchoatives’, and passive
New present stem formations that compensate for the loss or change of function of old
ones (e.g. the suffix -aya- also forming causatives in Old Persian [Kent 1953: 72–3] but
turned into (p. 25) a general present suffix in Middle Persian) are produced by a few
suffixes with clearly defined functions (Durkin-Meisterernst 2014: 228–30):
1) -ēn- makes an intransitive verb transitive (rōzēn- ‘to make bright’ from rōz- ‘to
shine’), changes a transitive verb into a causative (zāmēn- ‘to send’ from zām- ‘to
lead’), and forms denominatives with causative meaning (pērōzēn- ‘to make
victorious’ from pērōz ‘victorious, victor’);
2) the so-called ‘inchoatives’ add synchronically the suffix -s- (< Old Persian -sa-, no
longer productive in its original inchoative value: Kent 1953: 71; Weber 1970) to the
past participle without final -t to form intransitive verbs (hanzafs- ‘come to an end,
become perfect’ from hanzām-, hanzaft- ‘to finish, fulfil’);
3) -īh- forms passives (dānīh- ‘to be known, recognized’ from dān- ‘to know,
recognize’; kēšīh- ‘to be taught’ from a suffixless denominative *kēš- ‘to teach’ from
kēš ‘(false) teaching’) and is possibly a transformed reflex of the Old Persian suffix -ya-.
2.10.5 Imperfect, periphrastic past tenses, ergative construction, and periphrastic passive
The old synthetic imperfect survives in only a few forms in the early inscriptions (ʾkylydy
/akirīy/ ‘was made’ < Old Persian akariya; Durkin-Meisterernst 2014: 244–6) and in the third
singular anād, plural anānd of the verb ‘to be’ (if based ultimately on forms of the Old
Persian imperfect with stem āh- < a-ah- developed by analogical and conflation processes:
see Skjærvø 1991 and 1997b: 171–2). All other past tenses (Table 2.4) are expressed by
periphrastic formations consisting of a past participle and inflected forms (including
periphrastic ones) of the indicative and, more rarely, the subjunctive or the optative of the
auxiliary verbs h- ‘to be’ (the third singular present being always omitted), ēst-, ēstād ‘to
stand’, and baw-, būd ‘to become’ as follows:17
Table 2.4 Middle Persian past tenses (PP = past participle)
Preterite      | PP + present of h     | šud hēm       | I went, have gone
               |                       | šud           | he went, has gone
Past preterite | PP + preterite of h   | šud būd hēm   | I had gone
               |                       | šud būd       | he had gone
Perfect        | PP + present of ēst   | šud ēstēm     | I have gone
               |                       | šud ēstēd     | he has gone
               |                       | nibišt ēstēd  | it is (stands) written
Pluperfect     | PP + preterite of ēst | šud ēstād hēm | I had gone
               |                       | šud ēstād     | he had gone
               |                       | nibišt ēstād  | it was (stood) written
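The formation rules of Table 2.4 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name past_tenses and the two dictionaries are hypothetical, and only the first and third persons singular shown in the table are covered. The third singular present of h- is represented by an empty string because it is always omitted.

```python
# Illustrative sketch only: names are hypothetical, forms are those of Table 2.4.

# Present of the auxiliary h- 'to be'; the third singular is always omitted.
H_PRESENT = {"1sg": "hēm", "3sg": ""}
# Present of the auxiliary ēst- 'to stand'.
EST_PRESENT = {"1sg": "ēstēm", "3sg": "ēstēd"}


def _join(*parts):
    """Join the non-empty parts of a periphrastic form with spaces."""
    return " ".join(p for p in parts if p)


def past_tenses(past_participle, person="1sg"):
    """Build the four periphrastic past tenses for a past participle (PP)."""
    h = H_PRESENT[person]
    return {
        "preterite": _join(past_participle, h),               # PP + present of h-
        "past preterite": _join(past_participle, "būd", h),   # PP + preterite of h-
        "perfect": _join(past_participle, EST_PRESENT[person]),  # PP + present of ēst-
        "pluperfect": _join(past_participle, "ēstād", h),     # PP + preterite of ēst-
    }


for person in ("1sg", "3sg"):
    print(person, past_tenses("šud", person))
```

With the past participle šud the sketch reproduces the forms listed in the table, for instance šud būd hēm for the first singular past preterite and šud ēstād for the third singular pluperfect.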
(p. 26)
The periphrastic past tenses are a development and an expansion of the new perfect and
pluperfect of Old Persian (section 2.6.1). The periphrastic formation is further discussed
in Chapter 3. On the one hand, periphrastic past tenses of intransitive verbs have an
active meaning: for example, āmad hēm ‘I came’. On the other hand, when they occur
with the passive past participle of transitive verbs (cf. the Old Persian ‘manā kr̥tam
construction’), they have a passive meaning and the logical subject, if expressed, is
grammatically an agent in the oblique case: thus, from paymōz-, paymōxt ‘to don, wear;
dress’, paymōxt hēm ‘I was dressed’, paymōxt būd hēm ‘I had been dressed’, man paymōxt
hēnd ‘I dressed them’ ← ‘they were (hēnd) dressed by me (man)’ (Sundermann 1989b:
152–3). This gives rise to a situation of split ergativity in that ergative alignment only
occurs in the past of transitive verbs but not in the past of intransitive verbs and the
present of all verbs (Haig 2008: 89–129; Durkin-Meisterernst 2014: 392–400; Jügel 2015:
81–95, 325–44, 626–806).
Besides the synthetic passives in -īh-, a periphrastic passive present can be formed by
combining a passive past participle with the present of baw- ‘to become’: paymōxt bawēm
‘I am (being) dressed’ (Sundermann 1989b: 152; Skjærvø 2009a: 221; cf. the Old Persian
passive ‘potential construction’, section 2.6.1).
2.11 The linguistic situation in late Sasanian Iran
An account by Ibn al-Muqaffaʿ (d. 757 AD), a native of Fars who translated numerous
works from Middle Persian into Arabic and may be accordingly regarded as a reliable
witness, makes it possible to outline the linguistic situation in late Sasanian Iran. The
account, which must refer to the end of the Sasanian period in the mid-seventh century,
has been transmitted by Ibn al-Nadīm in his Fihrist (about 987 AD) and other early Arabic
writers, and studied in detail by Lazard (1971a) in its implications for the subsequent
history of Persian. According to it, five languages were then in use in Iran, including two
non-Iranian ones:18 soryāni, that is, Aramaic; xuzi, possibly a survival of Elamite in
Khuzistan; pārsi, the language used in Fars and by the Zoroastrian priests (mowbad) and
the learned people; dari, used at the royal court (dar) and in the east up to Balkh
(present-day Afghanistan); and pahlavi, used in the historical region of ‘Fahlah’ (Pahle,
north-western Iran).
In this context, pahlavi refers to the Parthian language still spoken in north-western Iran
at that time (Middle Persian Pahlaw means ‘Parthia’), while pārsi and dari denote two
varieties of Middle Persian that must have coexisted during the formative period that
preceded the origin of New Persian (section 2.14). On the one hand, pārsi was Persian
proper, that is, the spoken language of Fars and southern Iran that also formed the basis
of the written religious and literary language. On the other hand, dari was a more
innovative variety that was spoken at the Sasanian court in Ctesiphon (al-Madā’in) in
Mesopotamia, but, as the prestigious language of the imperial capital, also spread east
and was to form the basis of New Persian.
2.12 Chronological and other divisions of New Persian
(p. 27)
The history of New Persian, or simply Persian, covers a period ranging from the time of
the oldest documents assumed to be written in New Persian, in the eighth century AD,
until now. This span of time can be divided on the basis of various criteria—linguistic,
historical, or a blending of both—and various divisions have been proposed (Windfuhr
1979: 166; Paul 2013a: 258). The very beginning of the New Persian linguistic period is a
controversial issue. It is usually connected with the historical change brought about by
the conquest of Iran by Muslim Arabs and the end of the Sasanian empire in the mid-
seventh century, but this is only a conventional starting point based on extra-linguistic
data. Indeed, it is unlikely that such a change, however epochal, which afterwards also
entailed a change of religion from Zoroastrianism to Islam and the adoption of the Arabic
script to write Persian, could have had any immediate consequences for the languages spoken
in Iran (see Chapter 11 for more on the influence of Arabic on Persian).
A possible division of the history of New Persian reckons three major periods, which
correspond to the traditional major periods in the history of Persian literature.
1) The first or archaic period, usually referred to as Early New Persian (Paul 2013b),
lasts from the first attestations of New Persian to the beginning of the thirteenth
century. It spans several historical epochs, from the incorporation of Iran into the
Arabic caliphate to the first Mongol incursions into Iran.
2) The period of Classical New Persian begins with the blossoming of Persian
classical literature in the thirteenth century, the century of Saʿdi, and is usually
considered to reach the eve of modern Iran.19 Starting from the thirteenth century,
literary New Persian reached a unitary form all over Iran, losing the dialectal
features still present in Early New Persian texts and giving rise to a canon, to which
the literary language was to adhere for centuries to come. In this period the
literary language exerted an increasing influence on the old non-Persian dialects,
which in some cases were even supplanted, or survived longer only among the
religious minorities of Iran (Yarshater 1974). Literary New Persian also exerted a
unifying influence on the spoken varieties of Persian, and the old Persian dialects
were replaced by new dialects that arose from the encounter of the literary language
with the old dialectal substratum. Examples are the old dialect of Isfahan studied
by Tafażżoli (1971) and the modern one (Smirnova 1978). For more discussion on
Persian dialects, see Chapters 3, 13, and 14 and for more information on Isfahani, see
Chapter 3.
3) Lastly, the period of Modern and Contemporary New Persian, from the mid-
eighteenth century to the present day, is characterized by an increasing influence, on
the development of the literary language as well as on Persian literature, of
European culture and languages: French, English, and—in the Central Asian
varieties of Persian—Russian.
(p. 28) Within each of these periods it is appropriate to distinguish between literary and
non-literary language varieties. Non-literary texts such as inscriptions, coins, and private
documents (letters, legal documents, etc.) are particularly important because, compared
to literary texts, they usually display linguistic features closely related to the everyday
language of a certain region and time. Moreover, inscriptions, coins, and private
documents are normally preserved in the original, while literary texts have mostly
undergone a long transmission that may have altered their linguistic reality because of
the normalizing intervention of copyists.
A last distinction concerns the presence of a high (or written) and a low (or spoken)
variety in Contemporary New Persian, as suggested by Jeremiás (1984) in a study on the
interpretation of the contemporary linguistic situation of Iran in terms of diglossia,
though this has been questioned by Perry (2003; summary of the matter in Rossi 2015).
For more information about diglossia, see Chapters 13 and 19. In actual fact, a distinction
between a spoken variety (with a further stylistic differentiation between a formal,
official, or educated spoken sub-variety, and an informal, familiar, or colloquial spoken
sub-variety) and a literary or, for modern times, a standard variety should be taken into
account not only for Contemporary New Persian, but, with due allowance for differences, also for
each period in the history of Persian (see Chapters 3, 4, 5, 6, 10, 11, and 15 for more on
colloquial forms).
2.13 Early New Persian texts in different scripts
Especially in the case of Early New Persian, it is important to take further into account
the differences between varieties reflected in documents in various scripts (Table 2.5).
Indeed, Early New Persian is documented not only by texts in Arabic script, but also by
texts in other scripts, emanating from the Persian-speaking religious minorities spread all
over Iran.
For Early New Persian the following documents should be considered, besides the texts in
Arabic script: Judaeo-Persian texts, that is, Persian texts in Hebrew script; Manichaean
New Persian texts; Persian texts in Syriac script; Zoroastrian New Persian texts in Pahlavi
script (on these, see de Blois 2000 and 2003). Judaeo-Persian undoubtedly represents the
most important corpus, both for the quantity and quality of its documents, and for their
antiquity. Apart from single studies and editions, an overall study of the language of
the Early Judaeo-Persian texts, including some unpublished private letters, has been
recently provided by Paul (2013c). Like New Persian in Syriac script, which, however,
offers a much smaller corpus,20 and unlike Manichaean and Zoroastrian New Persian,
Judaeo-Persian has a continuation into later periods. Later Judaeo-Persian texts are less
interesting from the viewpoint of linguistic history, however, as their language can be
considered ‘an offshoot of Classical Persian’ (Shaked 2010: 321) and their orthography
appears as a mere transliteration of Arabo-Persian orthography (Meier 1981: 108). (p. 29)
Table 2.5 New Persian documents in different scripts (with abbreviations)
Early New Persian in Arabic script

Marriage contract (Scarcia 1963, 1966)
Codex Vindobonensis (facsimile ed. Muwaffaq 1972)
QQ: Qur’ān-i Quds (Revāqi 1985; Lazard 1990)
AT: Asrār al-tawḥīd (Shafiʿi Kadkani 1987b)
Judaeo-Persian
Ar: Argument (MacKenzie 1968; Shaked 1971a: 178–80; MacKenzie 1999: 671–3;
MacKenzie 2011)
Du1: Letter from Dandān Uiliq 1, Central Asia, northeast of the Khotan oasis (Utas
1968; Lazard 1988)
Du2: Letter from Dandān Uiliq 2 (Zhang and Shi 2008)
Ez1: Tafsir of Ezechiel, first part (Gindin 2007)
Gen: Tafsir to Genesis (Shaked 2003)
Kd: Karaite document (Shaked 1971b)
Lr: Law report of Ahvaz (Asmussen 1965; MacKenzie 1966; Shaked 1971a: 180–2)
Ta (A, B, C): three inscriptions of Tang-i Azao, western Afghanistan (Henning 1957)
Manichaean New Persian

Ha: Bilawhar and Būdāsaf (Henning 1962: 91–8)
Hb: qaṣīda (Henning 1962: 98–104)
Lehrtext (Sundermann 2003)
Manichaean New Persian fragments (Provasi 2011)
New Persian in Syriac script

Baptism (Orsatti 2003a)
Glosses (Maggi 2003)
Matthew (Maggi 2005)
Psalms (Sundermann 1974; Sims-Williams 2011: 353–61)
New Persian in Latin script

Codex Cumanicus (Monchi-Zadeh 1969, Bodrogligeti 1971)
Five eighth-century Judaeo-Persian documents represent the earliest attestation of New
Persian and go back to a period when the Arabic alphabet had probably not yet been
adapted to writing Persian (section 2.19). These are three inscriptions from Tang-i Azao in
western Afghanistan dated 1064 of the Seleucid era corresponding to 752 AD (Henning
1957)21 and two letters from Dandan Uiliq in the Khotan region in Chinese Central Asia,
datable between 780 and 790 AD (Utas 1968 with references, and Zhang and Shi 2008).
To these one may add a number of New Persian glosses in Syriac texts basically from the
first half of the eighth century (Maggi 2003). (p. 30)
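The equivalence 1064 Seleucid = 752 AD, like the Seleucid dates of the Karaite document and the Law report of Ahvaz in Table 2.6, can be checked with a simple subtraction. The sketch below assumes the reckoning in which year 1 of the Seleucid era begins in autumn 312 BC (another common reckoning starts in spring 311 BC); the function name and the constant are hypothetical. Because a Seleucid year straddles two Julian years, the function returns both.

```python
# Illustrative sketch; assumes year 1 of the Seleucid era begins in autumn 312 BC.
SELEUCID_OFFSET = 312


def seleucid_to_ad(year_se):
    """Return the two AD years straddled by a Seleucid-era year."""
    if year_se <= SELEUCID_OFFSET:
        raise ValueError("this sketch only handles Seleucid years that fall in the AD period")
    start = year_se - SELEUCID_OFFSET
    return start, start + 1


for year_se in (1064, 1262, 1332):
    print(year_se, seleucid_to_ad(year_se))
```

The first year of each returned pair matches the AD equivalents quoted in this chapter (752, 950, and 1020).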
The Manichaean New Persian documents published so far (Henning 1962; Sundermann
1989a, 2003; Provasi 2011) can be dated to the tenth and eleventh centuries and come
from the territory formerly occupied by the Sogdian colonies of Chinese Turkestan in
Central Asia.
The earliest original documents of New Persian in Arabic script, both literary and not, go
back instead to the eleventh century. They are the so-called Codex Vindobonensis, a
pharmacological treatise of the end of the tenth century by Abū Manṣūr Muwaffaq b. ʿAlī
al-Hirawī, copied by the poet Asadī in Šawwāl 447/December 1055–January 1056
(facsimile editions: Muwaffaq 1972, 2009), and, among non-literary documents, the
Marriage contract from Bāmiyān, Afghanistan, dated 470/1078 (Scarcia 1963 and 1966)
as well as a deed concerning a sale of land from Khotan dated 501/1107 (Margoliouth
1903 with facsimile; Minorsky 1942 correcting the date as 501 instead of 401 of the
Hegira).
2.14 Dialectal classification of the Early New Persian documents: pārsi and dari
A major dialectal division of Early New Persian is that between pārsi, ‘Persian’ tout court,
diffused all over southern Iran and in the first centuries of Islam still linguistically close
to literary Middle Persian, and dari ‘(the language) of the court’, which covered the
regions of northern Iran from west to east (Lazard 1971a, 1975a, 1993; cf. section 2.11).
Each of these major dialectal varieties (p. 31) of Early New Persian is divided into
western and eastern sub-varieties (see Table 2.6): pārsi is known from documents
originating from south-western (Khuzistan) and south-eastern Iran (Sistan) respectively;
likewise, dari is known from documents originating from north-western or central Iran,
and north-eastern Iran and Transoxiana (Lazard 2014). One of the most important
dialectal features is the treatment of initial wi- (Lazard 1987: 174; 2014: 93–4), which is
gu- in north-eastern dari and hence in literary New Persian, and bi- (still close to Middle
Persian wi-) in documents from southern Iran.
Table 2.6 Dialectal and chronological classification of ENP documents
North/North-West Iran
Tafsir of Ezechiel, first part (Ez1): eleventh century
Tafsir of Genesis (Gen): eleventh century or after

North-East Iran
Inscriptions of Tang-i Azao (Ta): 1064 Seleucid/752 AD
Letters of Dandān Uiliq (Du1, Du2): datable to the second half of the eighth century
Codex Vindobonensis: dated Šawwāl 447/December 1055–January 1056
Marriage contract: dated 470/1078
Matthew: datable Herat eleventh century
Psalms in Syriac script: before mid-thirteenth century

South-West Iran
Glosses: first half of the eighth century
Argument (Ar): tenth century or earlier
Karaite document (Kd): 1262 Seleucid/950 AD
Law report of Ahvaz (Lr): Ahvāz (Khuzistan, south-western Iran), 1332 Seleucid/1020 AD
Baptism: before the thirteenth century

South-East Iran (Sistan)
Qur’ān-i Quds (QQ): datable to the second half of the eleventh century
The glottonyms dari and pārsi have different meanings in different periods:
1) At the end of the Sasanian epoch dari may have referred to the oral register of
Middle Persian, spoken at the Sasanian court (dar) and more broadly in the capital
city of Ctesiphon, on the Tigris. It was a variety of Middle Persian endowed with
prestige and probably more innovative compared to written or literary Middle
Persian.
2) In the first decades of Islam, dari, the spoken variety of Middle Persian, received a
strong impetus to expand thanks to the successive waves of conquest. Indeed,
Persian was the language of the Islamic conquests towards Central Asia; and the
glottonym dari came to refer to the northern and north-eastern varieties of Persian.
In this movement towards the north and north-east, dari superseded other Iranian
languages such as Parthian and Sogdian, also borrowing some features from them
and thereby increasingly differing from the Middle Persian (pārsi) still spoken in
southern Iran.22
3) When, in the ninth century, literary New Persian arose in the courts of north-
eastern Iran, dari was the linguistic variety at the basis of literary New Persian. Then
pārsi-e dari, or simply dari, came to mean ‘literary New Persian’.
Correspondingly, pārsi as opposed to dari means:
1) The written register of Middle Persian, that is, literary Middle Persian.
2) The more conservative south-western variety of Persian, which continued to be
spoken and written during the first centuries of Islam. When, by the beginning of the
thirteenth century, the new unitary literary language originated in north-eastern Iran
spread all over Iran, pārsi ceased to be attested.
3) Persian in general as opposed to its literary variety (dari or pārsi-e dari).
2.15 Later New Persian texts in other alphabets, the Codex Cumanicus, and Persian as a lingua franca in Asia
Later New Persian texts written in alphabets other than Arabic, for instance Armenian
and Latin, are also relevant for reconstructing the earlier stages of Persian. The so-called
Codex Cumanicus (Venice, Marciana Library, MS Lat. DXLIX 1597, dated 1330) is one of
the most ancient New Persian texts in Latin script and furnishes rich linguistic material.
Its first (p. 32) part contains a Latin–Persian–Cuman lexicon, whose original was probably
composed by Genoese merchants in Solghat, Crimea, in 1324–5 (Drimba 1981; Drüll 1980
proposes a somewhat earlier date). It represents a kind of manual for interpreters (Ligeti
1981). The Persian linguistic material, whose dialectal characterization poses problems that
are not easily solved (MacKenzie 1992), has been published and studied by Monchi-Zadeh
(1969) and Bodrogligeti (1971). As to the reasons for the presence of Persian in a manual
for interpreters to be used in fourteenth-century Crimea, a Turkish-speaking area, the
commonly accepted theory is that Persian functioned as a lingua franca in the Black Sea
region (see Vásáry 2005) as well as in large parts of Asia (for criticism of this theory,
see Orsatti 2003b; Orsatti 2007b: 51–5; and Bausani 1969a: 517 for the sea-trade
language in Asia). Transcriptions of Persian texts in Latin script made by Catholic
missionaries or European travellers to Safavid Iran multiply in the seventeenth century
and provide important linguistic data (Orsatti 1984; Perry 1996).
The more or less occasional rendering of Persian through the Latin alphabet may be
heavily conditioned by the spelling conventions the transcriber uses to write his own
language (Italian, Portuguese, Spanish, etc.). A trivial example is the use of <j> (jaber for
xabar) to write the uvular fricative x in the transcription of a Persian translation of the
Koran in Latin script made by a Spanish missionary at the beginning of the seventeenth
century (Vatican Library, MS Vat. Pers. 51: Bodrogligeti 1961).
If the occasional use of the Latin script entails the difficulty just described for
reconstructing the linguistic reality of a text, other writing systems more stably used for
writing New Persian and other Iranian languages pose two kinds of equally thorny
problems: the presence of historical spellings, which is particularly cumbersome in
Manichaean New Persian;23 and the coexistence of different orthographic layers, whereby
old and new spellings occur in the same text.
2.16 New Persian phonology in historical perspective
2.16.1 Contemporary New Persian of Iran
Before dealing with earlier stages of the phonology of Persian, it is appropriate to sketch
the phonological system of Contemporary New Persian as a term of comparison. For
literary Contemporary New Persian of Iran, Pisowicz (1985) establishes a phonological
system of 6 vowel and 24 consonant phonemes (Table 2.7).
For vowels, the distinctive character opposing /e/ to /i/, /a/ to /ā/,24 and /o/ to /u/ is timbre,
that is, the different tongue positions and rounding vs. non-rounding. A difference of
length between the two series /e a o/ and /i ā u/ is only perceptible in an open unstressed
syllable, whereas in a stressed position the length of all vowels is more or less identical.
The (p. 33) diphthongs ey and ow are interpreted as a sequence of two phonemes each, /e
+ y/ and /o + w/, the latter phoneme only occurring after /o/.
Table 2.7 Phonological system of literary Contemporary New Persian
Vowels
i   u
e   o
a   ā
Consonants (places of articulation: Labial, Labio-dental, Dental, Alveolar, Palatal, Uvular, Laryngeal)
Plosive:     p b t d k g q ʔ
Affricate:   c j
Fricative:   f v s z š ž x h
Nasal:       m n
Liquid:      l r
Approximant: w y
As to consonants, the opposition between the two series of plosives /p/ ~ /b/, /t/ ~ /d/, /k/
~ /g/, and affricates /c/ ~ /j/ is of tenseness rather than of voicing. This kind of opposition
had already been recognized for plosives and affricates in colloquial educated
Contemporary Tehrani New Persian by Provasi (1979). The realizations of /q/ (written
ġeyn and qāf) as [q], [ɢ], or [ʁ] (= γ) are mere allophones conditioned by position.
Pisowicz regards the palatal articulation of /k g/, that is, [kʲ gʲ], as their chief realizations
in Contemporary New Persian of Iran.25
2.16.2 Classical and Early New Persian
For Classical and Early New Persian Pisowicz reconstructs a vocalic system consisting of
three short vowels /i a u/, five long vowels /ī ē ā ō ū/, and two monophonemic diphthongs /
a͡y a͡w/ (Table 2.8). As to consonants, there is no /v/ phoneme, because present-day v- at
the head of a syllable was realized as /w/. The opposition between the two series of
plosives, affricates, and fricatives was of voicing, not of tenseness as in Contemporary
New Persian. The ancient ‘Iranian labiovelar’ /xw/, subsequently reduced to and merged
into /x/, was still preserved. The voiced uvular fricative /γ/ contrasted with the plosive /q/
of Arabic and Turkish loanwords.
The methods Pisowicz follows for his reconstruction are diverse, first and foremost the
study of Persian texts in the Latin alphabet, such as the Codex Cumanicus (section 2.15), and of
New Persian borrowings in other languages. The analysis of morphophonological
alternations (p. 34) in the contemporary language is also important. For example,
alternations like Xosrow ‘(King) Xosrow’ ~ xosravi ‘regal’ and miravam ‘I am going, I go’
~ berow ‘go!’ attest to an ancient pronunciation aw of the present-day diphthong ow. The
development aw > ow is confirmed by Arabic loanwords in Persian such as Arabic dawr >
Contemporary New Persian of Iran dowr ‘circle’.
Table 2.8 Phonological system of Early and Classical New Persian
Vowels
ī i      ū u
ē ay     ō aw
ā a
Consonants (places of articulation: Labial, Labio-dental, Dental, Alveolar, Velar, Uvular, Laryngeal)
Plosive:     p b t d k g q ʔ
Affricate:   c j
Fricative:   f s z š ž x xw γ
Nasal:       m n
Liquid:      l r
Approximant: w y
Comparison with present-day Persian of Afghanistan (also called Dari) is useful because
of its conservative character. Dari retains a vocalic system with three short vowels /æ e o/
(the two last ones still articulated close to /i u/) and five long vowels /ī ē ā ō ū/ matching
the Classical New Persian ones. Indeed, Afghan Persian still retains long /ē ō/ (the so-
called majhul vowels ‘unknown’ to Arabic), which in New Persian of Iran have merged
into /ī ū/ respectively (see below in this section on their possible outcomes ay and aw).
As to consonants, the opposition of voicing recognized for Early and Classical New
Persian between the two series of plosives, affricates, and fricatives is maintained in
Afghan Persian. Moreover, Classical New Persian presents an opposition between /γ/ and
the new phoneme /q/ introduced through the massive entrance of Arabic and Turkish
loanwords. This opposition has disappeared from New Persian of Iran but is preserved in
Afghan Persian.
Dari also suggests that the final -e of Iranian and Arabic words in Contemporary New
Persian of Iran was formerly -a. This is confirmed by spoken Contemporary New Persian
of Iran, where final -e alternates with -a before the postposition -rā, for instance, xāne-rā ~
xuna-ro ‘the house (direct object)’. Words such as ke ‘who’, ce ‘what’, se ‘three’, by
contrast, always retain final -e, thereby attesting to the original presence of a different
vowel.
Meier (1981: 86–103) gives a complete survey of the development of the majhul vowels /ē
ō/ of Early and Classical New Persian on the basis of the analysis of rhymes and especially
of infractions of the rule which prohibits rhyming /ē/ with /ī/ and /ō/ with /ū/. He
concludes that the merging of /ē ō/ into /ī ū/ (šēr ‘lion’ > šīr, now a homonym of šīr ‘milk’;
bō ‘smell’ > bū) originated in western Iran and spread eastwards, without reaching
Afghanistan, and that it was still in progress in central Iran in the first half of the
thirteenth century. Starting from (p. 35) cases where ē rhymes with the diphthong ay and
ō with aw (the latter rhyme being more frequent and even tolerated by theoreticians like
Šams-e Qeys in the thirteenth century), Meier shows that the real pronunciation of the
two diphthongs was very close to the majhul vowels and that /ē ō/ in some cases merged
with /ay aw/, as is indicated by Middle Persian nō ‘new’ > Classical New Persian naw, or
by double outcomes like Nēšābūr and Nayšābūr. Lastly, Meier (1981: 127–56) draws up a
list of suffixes and endings that had the form of long -ē in Classical New Persian:
indefinite -i (yā-ye nakere); relative or determinative -i (yā-ye ešārat or taʿrif); the verbal
suffix -i denoting unreal or habitual action (section 2.17.4); and diminutive -i. For more
information on the indefinite, see Chapter 6. By contrast, the verbal ending and the copula of
the second person singular, the gerundive ending, the suffix of abstraction (yā-ye
maṣdari), and adjectival -i (yā-ye nesbat) were long -ī.
Early New Persian knows fricative allophones for postvocalic /b d g/. In texts in Arabic
script, [β] is written ﭪ, a letter afterwards abandoned. In ancient manuscripts up to the mid-
thirteenth century, the allophone [δ] of /d/ is written with the letter ḏāl, also present in Arabic
loanwords. Early New Persian δ is not generally considered as a phoneme. However, de
Blois (2006: 94) thinks that, given the existence of a phoneme /ḏ/ in Arabic loanwords
(presumably pronounced δ in Early New Persian, as in Arabic), postvocalic δ in Persian
words should also be regarded as a separate phoneme. In both the Arabic and the few
Persian words which retained δ < d, such as gozaštan ‘to pass’ and gozāštan ‘to let pass, to
put’, the ancient interdental fricative merged afterwards with /z/.26 The fricative
allophone of postvocalic /g/ is not attested in texts in Arabic script, presumably because
the spelling with ġeyn would have given rise to confusion with the phoneme /γ/, but is
attested in other scripts (Maggi 2003: 118).
Delabialization of /xw/ > /x/ already occurred in the thirteenth century. Indeed, it appears
to be already complete in the Persian language reflected in the Codex Cumanicus (first half of
the fourteenth century; section 2.15), as is revealed by spellings such as <ghos>
‘pleasant’ for Early and Classical New Persian xwaš, Contemporary New Persian xoš.
Delabialization of /xw/ entailed the change a > o, as in the preceding example, but had no
effect before /ē/ or /ā/: Early New Persian /xwēš/, /xwāst/ > Contemporary New Persian
/xiš/, /xāst/ (Pisowicz 1985: 121–3; Meier 1981: 74–85; Cipriano 1998: 293–365).
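The sound correspondences described in this section (merger of the majhul vowels, aw > ow, delabialization of /xw/, and the timbre shift of the short vowels) can be collected in a small Python sketch that maps a Classical New Persian form onto its Contemporary counterpart. This is a deliberately simplified illustration, not a reconstruction method: the function name, the rule table, and the single-pass regular expression are hypothetical; the correspondence ay > ey is inferred from Tables 2.7 and 2.8 rather than stated in the text; and exceptions such as the retention of short i near /š k c j/ are not modelled.

```python
import re
import unicodedata

# Illustrative sketch only. Output uses the contemporary six-vowel notation
# (i e a ā o u), so the text's šīr appears here as šir.
VOWELS = "aāiīuūeēoō"

RULES = {
    "xwa": "xo",  # delabialization of xw with the change a > o
    "xw": "x",    # delabialization before other vowels (for example ē, ā)
    "aw": "ow",   # diphthong aw > ow
    "ay": "ey",   # assumption: inferred from Tables 2.7 and 2.8
    "ī": "i", "ē": "i",  # merger of majhul ē with ī
    "ū": "u", "ō": "u",  # merger of majhul ō with ū
    "i": "e", "u": "o",  # timbre shift of the old short vowels
    "w": "v",     # syllable-initial w corresponds to present-day v
}

# One left-to-right pass, so that the output of one rule never feeds another.
PATTERN = re.compile(rf"xwa|xw|aw(?![{VOWELS}])|ay(?![{VOWELS}])|[īēūōiuw]")


def to_contemporary(word):
    """Apply the correspondences above to a Classical New Persian form."""
    word = unicodedata.normalize("NFC", word)  # keep macron vowels as single characters
    return PATTERN.sub(lambda match: RULES[match.group(0)], word)


for classical in ("xwaš", "xwēš", "xwāst", "dawr", "šēr", "bō"):
    print(classical, ">", to_contemporary(classical))
```

Applying all rules in a single pass is a design choice that avoids feeding, for example a long ī first shortened to i and then wrongly lowered to e.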
Several questions are still debated. First of all, the existence and phonological status of
short e and o in Early New Persian, either as a possible continuation of Middle Persian /e
o/ (section 2.10, no. 2), or as allophones of /i/ (or possibly of /a/) and /u/ respectively. For
Early Judaeo-Persian the existence of [e] is generally accepted (Paul 2013c: 42–3, section
26). Another question concerns the time when the pairs /ī/ ~ /i/, /ā/ ~ /a/, and /ū/ ~ /u/
began to be contrasted through timbre, with an articulation shift of /i/ towards e and /u/
towards o, and with a back pronunciation of /ā/ contrasting with a slight palatalization of
/a/. The Codex Cumanicus already shows the beginning of the shift of /i/ towards e, /u/
towards o, and /a/ towards æ, written e (Bodrogligeti 1971: 43–45), a phenomenon more
widely attested in seventeenth-century Latin transcriptions of Persian. It is significant
that, in spoken Contemporary New Persian of Iran, short i is still retained near /š k c j/
and in the (p. 36) proximity of syllables containing etymologically long i. As to the back
articulation of /ā/, a pre-thirteenth-century text in Syriac script from South-West Iran
provides an early example in rwos for rās(t) ‘openly, truthfully’ (Baptism 10), where the
vowel is vocalized as o and final -t is lost as in the present-day spoken language (Orsatti
2003a: 166). Finally, the first instances of a closing and backing of /ā/ > o, u before nasals
go back to the fifteenth century according to Pisowicz (1985: 79); but Ṣādeqi (1984)
discusses earlier occurrences of this change in some toponyms (as Bisotun/Behistun),
which he traces back to the variety of Middle Persian spoken in Khuzistan between the
third and the fifth centuries AD.
2.17 From Middle to New Persian: morphosyntactic continuity and innovation
The earliest documents of New Persian display a language still close to Middle Persian,
but signals of later development are already visible. The most striking changes—some of
them already attested in late Pahlavi texts influenced by New Persian—concern the verbal
system.
2.17.1 Ergative construction
The ergative (passive) construction of transitive verbs in the past (section 2.10.5) gives
way to an accusative (active) construction (ergativity is further discussed in Chapter 8).
This happens in parallel with a gradual change of the value of the past participle of
transitive verbs, mainly acquiring an active meaning: Middle Persian dīd hēm ‘I was/have
been seen’ > New Persian dīdam ‘I saw/have seen’, with amalgamation of the ancient past
participle, in its new function as past stem, with the auxiliary verb. Early examples of the
New Persian active construction are (17) and (18):
(17) 27
(18)
28
Possible transitional forms (Paul 2008a: 192) are provided by combinations of an old past
participle like nibišt in (18) with the verbal endings, followed by the third singular present
or, more often, past of būdan ‘to be’ to obtain present perfect, as in (19–20), or past
perfect (= pluperfect) forms of transitive and intransitive verbs, as in (21–22) (Paul
2013c: 132, section 164c and 134 section 168b; Lenepveu-Hotz 2014: 56–7, qq.v. for the
sources of the examples):
(19)
(20)
(21)
(22)
2.17.2 Old subjunctive
There are only scant traces of the old subjunctive with long thematic -ā- (section 2.10.3)
in subordinate clauses like rasād ‘it shall arrive’ in (23):
(23)
29
In Early and Classical New Persian, only third singular forms, as bād < baw-ād in (24), of
the old subjunctive with precative value are found (Lazard 1963: 338–9 section 474,
where a single occurrence of the third plural is recorded). Today, only bād survives in set
combinations like zende bād ‘viva!’ or har ce bād-ā bād ‘what will be will be’.
(24)
2.17.3 Towards a new present subjunctive
Because of the virtual disappearance of the old subjunctive, Early and Classical New
Persian have no formally marked subjunctive distinct from the present indicative: the new
present subjunctive with grammaticalization of the prefix bi- did not appear before the
sixteenth century (Lenepveu-Hotz 2014: 249, 303).
The prefix bi-, used in Early and Classical New Persian with both present and past forms,
has probably to be identified with the Middle Persian adverb and preverb bē ‘outside; out,
away’ (Lazard 1975b). Its function ranged from cases where it retained its full lexical
meaning (raft ‘he went’ ~ bi-raft ‘he went away, left’) to cases where it seems to be only a
means for emphasizing the verb (Lazard 1963: 298–326, sections 394–448). Other
scholars instead recognize an aspectual perfective value in forms with bi- (MacKinnon
1977; Josephson 1993 and 1995 for late Zoroastrian Middle Persian), whereas Brunner
(1977: 159–60) had reached for Middle Persian a conclusion coinciding with Lazard’s
for New Persian.
Possibly as a consequence of the disappearance of the old subjunctive, which expressed
future time reference in both main and subordinate clauses, in Early and Classical New
Persian present forms with bi- also acquired a future value (section 2.22.1). The bi- +
present forms, as well as the unmarked present, i.e. present without bi- or (ha)mē, also
developed, especially in subordinate clauses, a modal value as irrealis (Jahani 2008: 159–
60). It is from these values of the present forms with or without bi- that the new
subjunctive was at last developed.
For Early and Classical New Persian only the present of ‘to be’ might show some clear
modal opposition between the three different forms -ast/buwad/bāšad ‘(s/he, it) is’, the
latter being a new form only developed in New Persian. Lenepveu-Hotz (2014: 251–68)
has studied the values of these forms from the tenth to the sixteenth centuries and
recognizes an opposition ‘permanent’ vs. ‘transitory’ between buwad and bāšad (some
scholars had attributed a future meaning to bāšad), and just an opposition of emphasis
between -ast and buwad (Lenepveu-Hotz 2014: 266). The form buwad disappeared almost
completely in the fifteenth and sixteenth centuries (Lenepveu-Hotz 2014: 267).
2.17.4 Optative
A new optative came into existence in Early and Classical New Persian. It inherited the
two values of unreal or habitual action (cf. English would) of the old optative (Lazard
1984a: 4–6, 10–11). Unlike the Middle Persian optative (section 2.10.3), the new optative
has a complete conjugation obtained by the suffix -ē appended to the verbal
endings combined with the past stem (unreal and habitual action) or, more rarely, with
the present stem (only unreal action): duzdī kardam-ē ‘I used to be a thief’, agar mā
dānistēm-ē ‘if we had known’.30 This verbal suffix (possibly originating from Middle
Persian hy hē < Old Iranian *hait, third singular optative of ah- ‘to be’) also has a form -ēd
in some Early New Persian texts in Arabic script from the region of Herat (Lazard 1963:
328, section 450). It is written -y and occasionally -yh in Judaeo-Persian (Paul 2013c: 115,
section 137). In Manichaean New Persian the verbal suffix -ē is written with the numeral
‘one’ (<I> in transliteration), like the indefinite article -ē and the final vowel of the adverb
and verbal prefix hmI hamē ‘always’. Subsequently, the verbal suffix -ē gradually fell into
disuse and its two values (apparently beginning with that of habitual action) were
subsumed by the prefix mi-. The suffix -ē survives today only in forms like bāyest-i ‘it was,
would be, would have been necessary’ and, perhaps, in other fixed expressions like guy-i
‘one would say’ (Lenepveu-Hotz 2014: 157–62).
2.17.5 Passive
Some Early Judaeo-Persian forms attest to a survival of the old synthetic passive in -īh-
(section 2.10.4, no. 3), with shortening of the vowel: ʾyʾryhynd ayār-ih-ind ‘they will be
helped’, bwrhʾd bur-ih-ād ‘it may be cut’, gwyhyd gōw-ih-id ‘it is (being) said’ (Paul 2013c:
136, section 171; Lenepveu-Hotz 2014: 61–6).
Periphrastic passives, formed mainly with the auxiliary āmadan ‘to come’, are very
frequent both in Early Judaeo-Persian (Paul 2013c: 136–7 section 172) and in Early New
Persian in Arabic script (Lazard 1963: 345, section 490). However, (25) provides an
interesting instance of an analytic present passive (karda buwad) formed with a past
participle still with passive meaning, and the auxiliary buwad as in Middle Persian:
(25)
Already at the end of the eleventh century, the auxiliary āmadan lost ground in favour of
šudan ‘to become’, originally ‘to go’, though the former appears still retained under
certain circumstances (for example, to indicate a process vs. a state) or in a stylistically
high level (Lenepveu-Hotz 2014: 66–72).
2.17.6 Hortative
Faint traces of the old hortative, consisting of the particle (h)ē (< Middle Persian hēb,
ē(w)) before a present (Lazard 1984a: 6–8; cf. section 2.10.3) as seen in (26) and (27), are
found in Manichaean New Persian and northern Early Judaeo-Persian (Du1, Gen),
but are unknown to Early New Persian in Arabic script:
(26)
(27)
31
Even in north-eastern Early Judaeo-Persian texts, the hortative seems to be marginal and
perhaps stylistically marked. Compare ē … bāšad in (27) with bād in (24), with the
hortative alternating with the old subjunctive.
2.17.7 Causative and past stem suffixes
Causatives with the suffix -in- (written yn or simply n, Middle Persian -ēn-, section 2.10.4,
no. 1), with the vowel probably already shortened, occur in the Early Judaeo-Persian texts
from south-western Iran (Paul 2013c: 137, section 173): by ʾngyzynyd bi-angēz-in-īd ‘He
(God) aroused’ (Ar E11). These causatives are an important dialectal feature
distinguishing the southern (pārsi) variety from the northern and north-eastern (dari) one
(section 2.14): texts in dari (and hence literary New Persian) instead have a northern
dialectal causative form in -ān- rather than -ēn-. The causative form is also discussed in
Chapter 3 and Chapter 7.
From the northern dialect also originate the past stems with -ād- instead of -īd- like firist-
ād-an ‘to send’ as against f(i)rist-īd-an in south-western Judaeo-Persian (Henning 1933:
213, 222–3; Paul 2013c: 110, section 125b).
The fragment from a Persian version of the Gospel of Matthew in Syriac script (eleventh
century) edited by Maggi shows an interesting causative in -ān- formed from the past
verbal stem instead of the present stem (košt-ān-īd-ēd ‘you have had murdered’, Matthew,
l. 12), which is a noteworthy dialectal feature typical of north-eastern (dari) Persian
(Maggi 2005: 639).
2.17.8 Prepositions, postposition -rā, and circumpositions
Besides the postposition -rā, Early Judaeo-Persian has a number of simple or compound
prepositions and a variety of circumpositions or frame prepositions (Paul 2003a; Paul
2013c: 143–50, sections 180–4).
As a remedy to the loss of a formal distinction between subject and object in Middle
Persian (at least in the singular), the direct object came to be marked—like the indirect
object and the beneficiary—by the postposition -rā (Middle Persian rāy), as in (28), or by
the directional preposition u/o ‘towards’, possibly also pronounced a (Middle Persian ō), as
in (29).32 The preposition u, unknown to Persian texts in Arabic script, is mainly attested
in south-western Judaeo-Persian texts (Lazard 2009), though it also occasionally occurs in
northern texts: ʾpyš a-pēš ‘near, before’ (Du2 8).33
(28)
(29) 34
Other prepositions and circumpositions could also mark the direct object: u … -rā (direct
or indirect object), az mar … (-rā) lit. ‘for, because of’ (30), or just mar … (-rā) (31), the
circumposition mar … -rā being a north-eastern form (Lazard 1963: 382–4 sections 575–
7):
(30)
(31)
After a long and slow evolution, among all these forms only -rā survived in the modern
language. By losing—though not completely even now—its old values as a marker of the
beneficiary and the indirect object, it specialized to express the direct object under
certain grammatical (determinate vs. indeterminate; human vs. non-human;
topicalization) and pragmatic constraints (Paul 2008b).
Besides u/o, Early Judaeo-Persian knows a directional preposition bē ‘to, towards’,
probably from the Middle Persian adverb bē ‘out’ used to reinforce ō in the compound
preposition bē ō ‘towards’. The preposition bē (Lazard 1986 and Shaked 1989) is attested
in southern texts like Ar and QQ, and in northern texts, as can be seen from (23). It
possibly merged afterwards with pa(d), later ba ‘to, at, in, on’, which also acquired a
directional meaning (sections 2.18–2.19).
2.17.9 Ezafe as a relative pronoun
A characteristic feature of Early Judaeo-Persian is the preservation of the ezafe, a particle
linking a noun with a following adjective or nominal determinant, in its old value as a
relative pronoun (section 2.10.2), as in (18) and (25). In Manichaean New Persian,
examples are to be found, inter alia, in Lehrtext: i xwad xuškī u sardī ‘which itself (is)
dryness and cold’ (Lehrtext d2), and i tarī u sōzāgī ‘which (is) moisture and
burning’ (Lehrtext d4).
In Early New Persian in Arabic script there are only scant traces of this value of the
ezafe, both written and not written (Lazard 1963: 490–1, sections 855–6). An instance of
the ezafe in its value as a relative pronoun, written with the letter yā attached to the preceding
word, is to be found in (32):
(32)
2.17.10 The plural ending -ihā/-hā
For the plural of nouns, continuations not only of the regular ending -ān (section 2.10.1),
but also of the late ending -īhā of Middle Persian are well attested in both Early Judaeo-
Persian (Paul 2013c: 73–6, sections 77–9) and in Early New Persian in Arabic script
(Lazard 1963: 195–6 sections 149–52).35 As for the latter ending, in Judaeo-Persian the
form -ihā (with a probably already shortened -i-) occurs mainly in religious texts, and -hā
in non-religious texts (Paul 2013c: 73 section 78a). In Early New Persian in Arabic script
the form -ihā, explicitly vocalized with short i, is seldom attested. The Marriage contract
of 1078 from Bāmiyān shows a form -hāy apparently unattested elsewhere. The early
distribution of the forms is not very different from today’s usage, with -ān being used for
nouns denoting animate beings though not exclusively (for a full description of the classical
and modern distribution of the endings, see Moʿin 1977: 28–81). The ‘exceptions’
are so many that the original distribution of the old ending -ān and the late ending -hā
might seem mainly a matter of use. When the Arabic loanwords entered Persian, however,
they received the Persian endings -ān or -hā depending on the opposition human vs. non-
human, which indicates that this opposition became at some time relevant in the choice
of the ending. When, much later, the European loanwords entered the Persian language,
only -hā was still a living and productive ending.
2.17.11 Comparative and superlative
Middle Persian had the superlative suffixes -tom and -ist, and the comparative suffix -tar
(also with superlative force); the material and identifying suffix -ēn (Darmesteter 1883:
139) could be added to obtain -tomēn and potentially *-tarēn when the adjectives were
used attributively (Durkin-Meisterernst 2014: 203–5 with n. 103). Among these suffixes,
literary New Persian only continues -tar and -tarīn, as well as -īn with elative force. In
Early New Persian, there is no clear distinction yet between the comparative and
superlative functions of -tar and -tarīn respectively, as these may express both meanings
(Lazard 1963: 206–14 sections 181–99). Old ‘irregular’ comparatives like mih ‘bigger’, kih
‘smaller’, and bih ‘better’, were re-marked with -tar (mih-tar, kih-tar, bih-tar) and both old
and new forms could receive the suffix -īn (mih-īn ~ mih-tar-īn, kih-īn ~ kih-tar-īn, bih-īn ~
bih-tar-īn), which is presumably also to be recognized in numerals (awwal-īn ‘first’,
duwwum-īn/duyyum-īn ‘second’, etc.).36 Only later -tar specialized to express the
comparative and -tarīn to express the relative superlative (‘the most … ’), albeit with
syntactic restrictions, because only -tar forms with superlative force are admitted when
used predicatively.
2.18 From Middle to New Persian: notable phonetic changes
Early New Persian documents from both north-eastern (Ta inscriptions) and south-
western Iran (Glosses) show that by the mid-eighth century final -g had already
disappeared after a long vowel: qy ʾyn nywy qnd kē īn niwē kand ‘who incised this
inscription’ (Ta A2), with niwē ‘inscription’ < Middle Persian nibīg.37 As a consequence,
the adjectival suffix -īg had become -ī as in gurgānī ‘[pistachio nut] of Gorgan’ (Glosses 4).
Instead, -g was still retained after -a, as in Glosses 5 banušag ‘middle-sized pistachio nut’,
10 drīmag ‘wormwood’, and 14 jāmag ‘cup’ (for a thorough discussion of the matter, see
Ciancaglini 2008: 54–7, 72–7).
The old form of the abstract suffix -īh is retained in Early Judaeo-Persian documents of
south-western origin. In north-eastern Iran, the very conservative Manichaean New
Persian orthography also keeps the Manichaean Middle Persian spelling -yẖ, but a
tenth-century poem proves that the abstract suffix had actually become -ī: in Ha22,
farāmōšīh ‘oblivion’ with final -h would not fit the meter. Conflation of the suffixes -īg and
-īh is also occasionally attested by Manichaean Middle Persian texts (Durkin-Meisterernst
2014: 175–6).
The metrics of Manichaean New Persian texts also shows that the third singular ending
-yd had a short vowel (Manichaean New Persian /ad/ and Judaeo-Persian /ed/ or /id/).
Manichaean New Persian texts, from north-eastern Iran, appear innovative in many
respects. The new form br bar ‘on, upon’, with loss of initial a-, alternates with older ʾbr
abar, and bʾ bā ‘with’ alternates with ʾbʾ(g) abā (de Blois 2006 s.vv.), thereby indicating
that new and old forms occurred together, unless the latter are mere historical spellings.
In Early Judaeo-Persian, the new form yār ‘friend’ already occurs in TA and Du1 instead of
older ayār, as can be seen in (24) and (27).
One of the Manichaean New Persian fragments (M 595a+; Provasi 2011: 161–62, 166)
shows a curious inverse spelling for the verbal prefix bi-, written <pd> like the
preposition pa(d), later ba < Middle Persian pad ‘to, at, in, on’. This may indicate that the
scribe of this fragment perceived the two morphemes as homophonous and confused
them, whereas south-western Judaeo-Persian texts still had pa(d) with voiceless initial
consonant.
The spellings kʾ, ky, and kw originally corresponding to Middle Persian ka ‘when, if’, kē
‘who, which’, and kū ‘where; that; than’ still occur in Manichaean New Persian (where the
spelling kʾ/qʾ prevails) as well as in the south-western Early Judaeo-Persian Argument; but
they tend to interchange and conflate probably on account of formal coalescence (de
Blois 2006: 106 s.v. kʾ; Provasi 2011: 165–66 s.vv. kʾ, kw, ky; MacKenzie 1968: 252).
2.19 The birth of literary New Persian and the adoption of the Arabic alphabet
The main languages of culture in Iran in the first centuries after the conquest were
Arabic and, still, literary Middle Persian (Zoroastrian Middle Persian literature was
entrusted to writing in the first centuries of Islam). From a piece of information provided
by the ninth-century Arab historian Balāḏurī, we know that Middle Persian in Pahlavi
script was used for administration until the late seventh or the early eighth century in
western Iran and even longer in eastern Iran, before being replaced by Arabic (Xānlari
1986: vol. 1, 307–14). In the same years, the coinage reform of 77/696 under caliph ʿAbd
al-Malik, directed at removing all symbols associated with the former Sasanian rule, put
an end to the so-called Arab-Sasanian coinage (Mochiri 1981: 168; Bates 1987), a very
interesting example of co-occurrence of such literary languages as Arabic and Middle
Persian in the early decades after the conquest.
The birth of literary New Persian, which entailed a new literature in the vernacular
language of Iran, is a major cultural issue rather than a merely linguistic one. It is connected
with the rise of courts more or less independent from the Arabic Abbasid caliphate in
eastern and north-eastern Iran and the emergence of a new Persian ruling class not
sufficiently assimilated to Arabic culture (Lazard 1971b, 1975a). The variety of Persian
spoken in north-eastern Iran (dari) came, thus, to be the basis of the literary language
(section 2.14).
We do not know when, where, and for what purposes (administration, literature,
private documents, etc.) the Arabic script was first adapted for writing Persian. When,
towards the mid-ninth century, the new poetry in the vernacular language of Iran
emerged in the courts of eastern and north-eastern Iran, it was certainly written down in
Arabic script. As this poetry consisted of substituting Persian for Arabic within the
pattern of Arabic poetry (Bausani 1960: 307–11), one can suppose that the establishment
of a New Persian orthography in Arabic script was a part of this experiment. What is sure
is that New Persian in Arabic script is exempt from the historical spellings which hamper
the study of New Persian texts in other writings including, perhaps, Judaeo-Persian (one
cannot exclude that an adaptation of the Hebrew alphabet to write Persian had already
begun in Sasanian times, as was claimed by Bacher 1904).
The Arabo-Persian orthography betrays a clear normalizing aim. Middle Persian ka ‘when,
if’, kē ‘who, which’, and kū ‘where; that; than’ (section 2.18) merged in what had probably
become a single new form ki, so that they were no longer distinguished in writing and were
spelled ky or kh (or simply k- joined on to the following word, and -k after ān ‘that’) in
early Arabo-Persian orthography. Likewise, of the prepositions bē ‘to, towards’ and pa(d)
‘to, at, in, on’ (sections 2.17.8 and 2.18), only the latter survived, also subsuming the
directional meaning of bē. Its initial voiceless labial, perhaps also under the influence of Arabic
bi- ‘with, for, by’, became voiced and was written b- (generally joined on to the following
word) even in manuscripts that use the four letters <p c ž g> added to the Arabic
alphabet for writing Persian phonemes (Lazard 1963: 387 section 582). The preposition u/
o < Middle Persian ō, apparently not very frequent in dari (section 2.17.8), was dropped
from pronunciation and from the literary language. The suffixes -īh of abstract nouns and
-īg of adjectives, which had formally merged (section 2.18), were both represented by -y -ī
and the latter also merged, both formally and functionally, with the Arabic relation suffix
-iyyun (-ī of nisba).
The ezafe disappeared from writing (though of course not from pronunciation), apart
from rare cases where it is written <y> even after words ending in a consonant (Lazard
1963: 200 section 162). Though it is generally admitted that the ezafe had already been
shortened in New Persian, these occasional spellings, as well as its metrical value as
either short or long, point to the presence of a long variant of the ezafe in Early New
Persian (Meier 1981: 131–2). The use of the ezafe as a relative pronoun (section 2.17.9),
probably already marginal in north-eastern Persian (dari), was ousted from the literary
language, though some memory of it may survive until now in such expressions as
vaqt-i(-i) ānjā residam … ‘(in) the time (in which) I arrived there … ’ or in ce kār-i bud(-i) kardi
‘what kind of work was this (that) you did?’, where one can postulate the loss of a
relative ezafe that is no longer written or pronounced.
The Early New Persian conjunction u ‘and’ < Middle Persian ud, though being a short
vowel, was written <w> as an independent word. However, in the non-literary Marriage
contract of 1078 the conjunction was regularly written only at clause beginning, where it
supposedly began to be pronounced wa as in Arabic (Orsatti 2018, forthcoming).
2.20 The Arabic element in Persian

In the orthography of the earliest, eighth-century Judaeo-Persian documents (Ta, Du1,
Du2), Persian /k/ is written <q>, so that kaph was left to represent Persian /x/, for which
no special Hebrew letter was available.38 Only a couple of Arabic loanwords are to
be found in Du2: hqym ḥakīm ‘doctor’ (Du2 4, 13) and hrb ḥarb ‘war’ (Du2 33). They are
written without any attempt to transliterate their Arabic spelling by distinguishing Arabic
emphatic ḥā from non-emphatic hā, as happens in later Judaeo-Persian.39 This suggests
that Arabic loanwords had not yet massively entered the current Persian language in the
eighth century.
The situation is significantly different from the tenth and eleventh centuries onwards,
when texts in Hebrew (especially the legal documents Kd and Lr) and Manichaean scripts
are full of Arabic loanwords. Their orthography shows a careful attempt to represent the
original Arabic spelling by means of the possibilities offered by the relevant alphabets
(Orsatti 2007b: 110–13, 158–63). A Manichaean New Persian text datable to the eleventh
century (Sundermann 2003: 251) testifies to the spread, precisely in ‘this time’, of a new
philosophical lexicon of Arabic origin, when it says that the body is dominated by jhl
‘ignorance, foolishness’ (Arabic jahl), ‘which the people of this time call lust (Arabic hawā)
and temptations (Arabic waswās)’ (qš40 xlg ʾyg ʿyn zmʾng hwʾ ʾwd wswʾshʾ hmI xwʾnʾnd k-
aš xalq-i īn zamāna hawā wa waswāshā hamē xwānand, Lehrtext, c10–11).
Scholars generally agree that the Arabic element entered Persian as learned loanwords
from the written Arabic language (Telegdi 1973: 52; Bausani 1978: 13–14; Pisowicz 1985:
19). In Persian texts in Arabo-Persian script and, as far as possible, Hebrew, Syriac, and
Manichaean scripts, Arabic loanwords retained their original spelling, though they were
probably pronounced according to Persian phonology, as they are today. This seems to be
evidence of their origin from books. However, Perry has recently suggested that a number
of Arabic loanwords, which he terms ‘pre-literate Arabisms’, entered Persian by way of
speech and that the Arab settlements before and after Islam were a major contributing
factor to the Arabicization of Persian (Perry 2009a: 54; Windfuhr and Perry 2009: 419). In
his view, among Arabisms of this kind there are words assimilated to Persian morphology
and phonology like mosalmān ‘Muslim’ (perhaps a plural with metathesis from Arabic
muslim41) and such onomastic elements as mir from amir ‘prince’ or Bu from Abu ‘father’,
which underwent the same loss of initial a- as Persian words at the beginning of the New
Persian linguistic period (Perry 2014; cf. section 2.18).
The percentage of Arabic lexicon varies according to the literary genre and increases
over time at least until the twelfth century (Skalmowski 1961; Lazard 1965; Bausani
1969b; Telegdi 1973; Utas 1978). Lyrical poetry of the new type, i.e. composed according
to Arabic prosody, is from the beginning rich in Arabic words and phrases referring to the
common Islamic culture (for example, Koranic quotations). What is considered one of the
most ancient pieces of New Persian poetry, the six-line panegyric that Muḥammad b.
Waṣīf presented to the Saffarid Yaʿqūb-i Lays in the aftermath of his victory in 251/865—
preserved in the anonymous Tārīx-i Sīstān (eleventh century, with later additions)—is
already Arabicized in its lexicon and prosody (ed. Lazard 1964: vol. 2, 13–14). The
vocabulary of epic poetry is less Arabicized.
Though Arabic loanwords very often received Persian suffixes, with Arabic broken plurals
even re-pluralized (Moʿin 1977: 81–87), the preservation of the original spelling of Arabic
loans may have entailed the awareness of their non-Iranian origins, at least in a
learned context. Indeed, poetry seems to betray a sort of artificial and scholarly
pronunciation of Arabic loanwords. For example, the letters which in Persian have and
probably had one and the same phonetic reference (<z z ż ẓ> /z/, <s s̱ ṣ> /s/, <h ḥ> /h/,
and <ʾ ʿ> /ʔ/) never rhyme together (Meier 1981: 103). Nowadays, Arabic words or
expressions of common use like baʿd, baʿd az ‘after’ are felt as of a lower stylistic register
in comparison to their Persian counterparts (pas, pas az).
The orthography of the Arabic loanwords has remained unchanged throughout the history
of the New Persian written tradition, which suggests the idea of the Arabic vocabulary of
Persian as an immutable set. Only one morphological class of Arabic loanwords has
undergone a change since its embedding into New Persian: the class of Arabic loanwords
with tā marbūṭa (about 1500 items), which entered Persian either with the ending -a (later
-e) or -at, according to semantic features and stylistic choices: -a is felt as ‘more
concrete’, -at as ‘more abstract’. A considerable part of the words originally borrowed with
-at (about 200 out of 800) shifted to -a in the course of the past thousand years and some
40 items present a double sorting with different meanings: qovve ‘(military) force,
(industrial) energy; faculty’ is felt as a concrete, countable noun, whereas qovvat
‘strength, power’ is felt as an abstract noun (Perry 1991, 1995).
The massive entrance of Arabic loanwords has sometimes been considered responsible
for the falling into disuse of the ancient Iranian verbs in New Persian, and their gradual
replacement by ‘compound verbs’ or verbal periphrases formed by an Arabic noun and a
Persian infinitive, as andišidan ~ fekr kardan ‘to think’. However, both Telegdi (1950–
1951: 321) and Ciancaglini (2011: 3) have noticed that such periphrases are also based
on Persian words, as in the cases of por kardan ‘to fill’ or kušeš kardan ‘to strive’, the
latter alternating with the corresponding simple verb kušidan. Ciancaglini (2011) has
shown that the verbal periphrases of the type noun + kardan are very ancient, and must
be traced back to Indo-Iranian. Compound verbs are further discussed in Chapters 3, 7, 8,
9, 10, 15, 17, and 19.
2.21 Turkish influence on Persian

Starting from the second half of the eleventh century, Turkish peoples moved from
Central Asia to Iran, furnishing the basis for a long series of Turkish dynasties. This led to
the Turkization of wide areas of Iran, particularly in western and north-eastern Iran,
where different varieties of Turkish supplanted Persian first in rural areas and later also
in towns, thereby gradually reducing the area where Persian was spoken. In Azerbaijan
and eastern Transcaucasia, this process may be considered accomplished around the
fourteenth century (Lornejad and Doostzadeh 2012: 18–19, 143–88).
Turkish was widely spoken in Iran. For the Safavid epoch, a number of European
travelers attest to the diffusion of Turkish as a language spoken both at the court in
Isfahan and largely by the population (Orsatti 2003b). Turkish loanwords, mainly relating
to the domains of power, politics, and popular culture, are less numerous than the Arabic
ones.42 However, the influence of Turkish languages and dialects on Persian of
Iran, especially on its phonology, was very strong. The Turkish adstratum has been
considered responsible for the replacement of the opposition of length between /i a u/
and /ī ā ū/ by an opposition of timbre, as well as for the fronting of /a/ (Pisowicz 1985: 90,
93); though the front articulation of /a/, represented by e in seventeenth-century
European transcriptions in the Latin alphabet, might also be due to the influence of the
coeval dialect of Isfahan (Pisowicz 1985: 97–8; Smirnova 1978: 11–12). The Turkish
adstratum has also been regarded as a contributing cause for the replacement of the
opposition of voicing between /p/ ~ /b/, /t/ ~ /d/, /k/ ~ /g/, and /c/ ~ /j/ by an opposition of
tenseness, and for the dephonologization of the opposition between /q/ and /γ/ (Pisowicz
1985: 106, 113).
Grammatical influences are more difficult to prove. The particular syntactic construction
seen in (33) has been explained as a calque on Turkish (Pisowicz 1985: 91), but the
unsoundness of the Turkish hypothesis has been shown by Rubinčik (2001: 355–58) in a
thorough discussion of this construction in the framework of Persian syntax:
(33)
However, Turkish influence can certainly be seen in expressions with the modifier before
the head noun like Nāder Šāh as against Šāh ʿAbbās, Mirzā Ṣādeq as against ʿAbbās
Mirzā, or juje kabāb as against kabāb-e barre (Perry 2001).
In the case of lexical doublets like Persian xar ~ Turkish olāɣ ‘donkey’, kārd ~ cāqu ‘knife’,
bānu ~ xānom ‘lady’, āheste ~ yavāš ‘slowly’, the Turkish loanwords tend to occupy a
lower sociolinguistic register than their Persian counterparts (Perry 2001: 196).
2.22 Post-classical developments
2.22.1 Periphrastic future
Apart from the loss of the New Persian optative (section 2.17.4), a major development in
the verbal system of post-classical New Persian is the rise of a new periphrastic future
with the auxiliary xwāstan ‘to want, will’. While the phrase xwāham raft(an) had both a
volitional and a future force in Early and Classical New Persian, xwāham raft, with the
shortened form of the infinitive (raft), is grammaticalized in post-classical New Persian to
express only the future: ‘I will go’ (Jahani 2008; Lenepveu-Hotz 2014: 183–97).
In Middle Persian, future time reference was mainly expressed by the subjunctive in both
main and subordinate clauses (Lazard 1984a: 2). The disappearance of the old
subjunctive may be the reason why future meaning came to be expressed by the present
alone. On the one hand, Jahani (2008: 158–63) has shown that, of the three present
forms of Early and Classical New Persian, namely the unmarked present (without prefix),43 the
present with bi-, and the present with (ha)mē, only unmarked forms and less often forms with
bi- can have future time reference, whereas she found no examples of present with
(ha)mē with future force in her corpus of Early and Classical New Persian. On the other
hand, Lenepveu-Hotz (2014: 190) has shown that only present forms occur in subordinate
clauses to express the future, the periphrastic forms with xwāstan being restricted to
principal clauses. These remarks clearly suggest that the two concurrent ways of
expressing the future in Early and Classical New Persian—present with or without be- and
periphrasis with xwāstan—specialized in later Persian as the new subjunctive and the new
future respectively.
2.22.2 Progressive present and past
The post-classical creation of a progressive present and past with the auxiliary dāštan ‘to
have’ is also relevant: dār-am mi-rav-am ‘I am going’, dāšt-am mi-raft-am ‘I was going’.
This periphrasis probably originated from northern or central Persian dialects and is little
attested in the literary language (Jeremiás 1993).
2.22.3 Possessive expressions
In Early and Classical New Persian, there occur various possessive expressions such as
ān-i … (34), az … (35), ān-i … -rā (36), ... -rā (37):
(34)
(35)
(36)
(37)
These expressions have been replaced in post-classical New Persian of Iran by an ezafe
construction with māl ‘property’: māl-e man ‘property of me, mine’ (see Chapters 3, 6, 7,
8, 9, 10, and 19 for more discussion on this topic). The latter is still used with possessive
meaning in standard Contemporary New Persian but, in the colloquial variety, also has
other meanings, as in (38), where it expresses origin, and is being replaced
by the complex preposition barāy-e ‘for’ to express possession, as in (39):
(38)
(39)
2.22.4 The birth of the ‘relative -i’
In Persian a bare noun without any article may refer either to a whole class of items
(esm-e jens ‘generic noun’ in traditional grammar: ketāb ‘book’ instead of other things) or to a
fully determinate referent (ketāb ‘the book’, already known or referred to). Very early,
however, a need was felt for a more unambiguous reference. Already in Late Middle
Persian (Josephson 2011: 36–7) and Early New Persian (Orsatti 2011: 75–80), the suffix -ē
of the yā-ye nakere, the ‘indefinite article’, developed a strong individualizing meaning
and was redundantly used with nouns and nominal phrases endowed with a determinate
value to highlight the individuated or fully determinate meaning of the referent. This
particular individualizing value of the yā-ye nakere (Daniel Paul 2008 for Contemporary
New Persian), which can be identified as the yā-ye ešārat, the ‘deictic -i’ of Persian
grammarians, probably originated in the spoken language as a means to emphasize the
individuated or fully determinate reference to a denotatum. Indeed, it is only occasionally
attested in literary texts, as in (40):
(40)
In such usage, the suffix -ē had the same value as the modern suffix -e/-he of the spoken
language, the so-called ‘definite article’, which redundantly indicates an individuated or
fully determinate referent: ye tork-e ‘a Turk’ (not any Turk, but a certain one), doxtar-e
‘the girl’, ān āqā-he ‘that Sir’. An accented variant of the yā-ye nakere in this particular
individualizing value still survives as a facultative and stylistically marked suffix in
pronominal or adverbial expressions of the spoken language like digar-i ~ digar ‘the
other’, un-var-i ~ ān var ‘on the other side’, in-jur-i ~ in jur ‘in this way’, kodum-yek-i ~
kodām yek ‘which one’ (Orsatti 2005; 2011: 53–65).
At a later stage in the history of Persian the individualizing value of the ancient -ē suffix
gave rise to the ‘relative -i’, the suffix marking the head-noun of determinative relative
clauses (Orsatti 2011: 81–5). Indeed, it has been noted that in Early and Classical New
Persian the ‘relative -i’ before a determinative relative clause was less frequent than
today: its usage was optional and, after a substantive with specific or determinate
reference, especially if preceded by a demonstrative, it was altogether omitted: waqt-ē ki
… ‘when’ (AT 19.20), dar ān waqt ki … ‘at that time, when …’ (AT 25.1). The
grammaticalization process of suffix -i (<-ē) as a marker of the head noun of
determinative relative clauses was brought to completion only in modern times (Jahani
2000b).
2.23 Summary
In this chapter, we looked at the evolution of Persian and provided a brief description of
the most significant features of Old, Middle, and New Persian, with an analysis of the
main changes over time. Apart from an introductory section (2.1) and a quick survey
of research on the three stages of the language (sections 2.2–2.4), the chapter falls
broadly into two parts: the first part discusses the transition from Old Persian to Middle
Persian (sections 2.5–2.11) and the second how Middle Persian became New Persian and
finally Modern and Contemporary Persian (sections 2.12–2.22).
Notes:
(*) Sections 2.1–2.3 and 2.5–2.11 (Old and Middle Persian) by Mauro Maggi; sections 2.4
and 2.12–2.22 (New Persian) by Paola Orsatti.
(1) Because of space constraints, for the modern and contemporary periods this chapter is
basically restricted to New Persian of Iran and does not cover the other national varieties.
For dari and the Persian dialects of Afghanistan see Kieffer (1985: 505–10); Farhadi
(1955); Farhadi and Perry (2011); Kieffer (2004). For tojikī see Lazard (1956);
Rastorgueva (1964); Perry (2005); Perry (2009b). References in this chapter generally
privilege recent publications or the most recent ones on a given subject, where
references to earlier literature can be found.
(2) Labels with specialized meanings and additional labels used in this chapter for
glossing are: AOR = aorist, EZF = ezafe (also for the Middle Persian relative particle,
2.10.2), GEN = genitive-dative, HORT = hortative particle, INS = instrumental-ablative,
IPRF = imperfect, IPRT = imperative.
(3) The now standard system of sigla for referring to the Old Persian inscriptions was
introduced by Kent (1953) and expanded by others (Schmitt 2009: 7). Old Persian
quotations in this chapter are basically from Schmitt’s edition (note r̥ = [ər]), but no dot is
used in a.u = aʰu etc., as no ambiguity with the diphthong au̯ is possible, and accent
marks are added when appropriate.
(4) The conservative Old Persian past participle passive çuta- ‘famous’ ← ‘heard of’ is only
found as the first member in proper names attested in the parallel tradition (Tavernier
2007: 161–2).
(5) The stem haya- is used for the nominative singular masculine and feminine, the stem
taya- for all other cases. The Old Persian relative pronoun is an innovation which resulted
from the univerbation of the Indo-Iranian demonstrative pronoun *sá-/tá- and relative
pronoun *i̯á- (Avestan ha-/ta- and ya-).
(6) So spelled.
(7) The same applies to the legal and administrative documents (a few third-century texts
from Dura-Europos in present-day Syria, various papyri from seventh-century Egypt,
documents on parchment and linen from seventh-century Iran (Weber 2008), and a
number of ostraka from post-Sasanian Iran) and the late private inscriptions in cursive
script mostly on tombs (Huyse 2009: 100–5), including the Middle Persian-Chinese one
from Xi’an in China (Rezai Baghbidi 2011).
(8) Carbon-14 dating of the Pahlavi Psalter now shows it to be not earlier than the late
eighth or ninth century (Dieter Weber, lecture at a workshop in Berlin in 2010).
(9) After the Pahlavi script was restricted to Zoroastrian circles in Islamic times, Middle
Persian texts in Pahlavi script were occasionally transposed into Avestan script (Pāzand,
twelfth century) or even Arabo-Persian script (Pārsī, twelfth and thirteenth centuries)
with adaptation to contemporary spoken Persian and replacement of aramaeograms and
difficult words (Durkin-Meisterernst 2014: 23–5). On the possibility of obtaining
information on Early New Persian from Pāzand texts, see Lazard (1991); Klingenschmitt
(2000: 195–6); and the criticism by de Jong (2003).
(10) Cantillations and transcriptions in Sogdian script of Manichaean Middle Persian texts
use <β δ γ> to record the late allophones [β δ γ] of postvocalic /b d g/ that Middle Persian
shares with Early New Persian (Durkin-Meisterernst 2014: 58, 116–17; cf. section 2.16.2).
(11) In this chapter, vowels with macrons are used in the phonological and the
conventional scholarly transcriptions for long vowels. Likewise, ’ is used for [ʔ], c j for [ʧ]
[ʤ], š ž for [ʃ] [ʒ], and y for [j].
(12) On the transformations of the accentual systems from Proto-Iranian to Old and
Middle Persian, see Klingenschmitt (2000: 210–15); Huyse (2003: esp. 47–61, 95).
(13) The only possible but debated future form in Old Persian is the ‘historical future’
patiyāvanhyai̯ ‘I will/was to implore’ in DB 1.55, if read correctly (Schmitt 2014: 275–6).
However, later forms like Middle Persian paywah- (Inscriptional <ptwḥ->, Manichaean
<pywh->) ‘to implore, entreat’, without -n- and with -h- as part of the present stem,
rather point to an Iranian root *u̯ah- ‘to venerate, implore, pray’ (which also continues in
Avestan, Parthian, and Bactrian) and seem incompatible with the idea that
patiy-ā-van-hy-ai contains the future suffix -hy- (< Indo-Iranian *-si̯-) added to the root
van- < Indo-Iranian *u̯an- ‘to desire’, otherwise unattested in Iranian (Cheung 2007: 405–6).
(14) See section 2.18 on the late confusion of ka ‘when, if’, kē ‘who, which’, and kū ‘where;
that; than’.
(15) The enclitic personal pronouns (singular -m, -t, -š; plural -mān, -tān, -šān), most often
suffixed to the first word in a clause, only function as oblique in all text groups (Durkin-
Meisterernst 2014: 208–10, 291–6), as is still the case in New Persian.
(16) Alternative endings only attested in Inscriptional and Zoroastrian Middle Persian are
enclosed in parentheses.
(17) Skjærvø’s terminology (2009a: 218–19) is adopted here for past tenses in Middle
Persian.
(18) New Persian equivalents are substituted here for al-Nadīm’s arabicized terms.
(19) Criticism of the concept of ‘Classical Persian’ as a term referring to any linguistically
based definition of any period of the history of the New Persian language has been voiced
by Paul 2002. However, in the absence of a better definition, the term has been retained
here.
(20) Christians used the Pahlavi script in Sasanian Iran (section 2.9.4). In the Islamic
period, they normally used the Arabic script, even dating their manuscripts according to
the Hegira, the Islamic era. On this phenomenon and the cultural dynamics among the
various ethnic-religious minorities in ancient Iran, see Orsatti 2007a.
(21) Rapp (1967: 55–6) unconvincingly questioned Henning’s dating and proposed the
much later date of 1299–300 CE.
(22) See Lentz (1926) on Parthian elements in the Šāhnāme and Henning (1939) on
Sogdian loanwords in New Persian.
(23) To write New Persian, the followers of Manichaeism used the Manichaean alphabet.
Because Middle Persian and Parthian were the languages of Manichaean liturgy, New
Persian Manichaean orthography shows influence from both these languages’
orthography. On the adaptation of the Manichaean script to write New Persian, see
Henning 1958: 73–5; Henning 1962: 90–1; Orsatti 2007b: 150–64.
(24) The symbol ā indicates a mid, back, labialized vowel [ɒ], distinct from mid, central,
unlabialized a. This symbol is used here in analogy with the symbol used in the scholarly
transcription of Persian.
(25) For colloquial educated Contemporary Tehrani Persian, Provasi (1979) reconstructs a
system of only twenty-two consonantal phonemes, not recognizing as phonemes /ʔ/ and /ž/
of the literary variety. As to vowels, the distinctive character opposing /i u ā/ to /e o a/ is
tenseness vs. laxness.
(26) Recent summaries of the question of ‘dāl and ḏāl’ are de Blois 2006: 94–6; Orsatti
2007b: 94–8; Filippone 2011: 185–6. Orsatti (2018, forthcoming) regards the
complementary distribution of dāl and ḏāl in literary manuscripts mainly as a rule
intended to bring order in the multifarious dialectal realizations of /d/ in Early New
Persian. Therefore, in what follows the fricative allophones of /d b g/ are mirrored in
transcriptions of Early New Persian texts only if they are so recorded in writing.
(27) This text also offers older constructions, as: kyš ʾyn kʾrhʾ ʾdst yšmʾ dʾd kē-š īn kārhā
u-dast-i šumā dād ‘who gave these matters into your hands’ (lit. ‘who (by) him these matters
(were) given into your hands’) (Ar P5–6), where traces of ergativity may be seen (cf.
Orsatti 2007b: 121 with n. 197 and Paul 2013c: 127, section 156). On possible traces of
ergativity in literary Early New Persian texts, cf. Lazard 1963: 257–8, sections 319–20 and
Lenepveu-Hotz 2014: 57. Modern constructions, already attested in Early New Persian
literary texts, like goft-eš ‘he said’ (also applied to intransitive verbs: raft-eš ‘he went
(out)’) are considered as remnants of the ergative construction (Maḥjub 1959: 49).
(28) Transliterations in the examples reproduce the word division found in the
manuscripts, so that alignment with transcription and glossing is at times approximate.
(29) In Du1 and Du2 the second person plural pronoun is išmā.
(30) On these forms in Early New Persian, see Lazard 1963: 327–38, sections 449–72; Paul
2013c: 126, section 153 and 130–1, section 162. For occurrences of the verbal suffix -ē
directly after the stem, with or without an enclitic personal pronoun, see Lazard 1963:
329–31, sections 452–4.
(31) On this passage, cf. Lazard 1988: 205–9.
(32) Middle Persian rāy marked the cause, purpose, beneficiary, indirect object, and hence
possession; its use for the direct object is a late development. Likewise, Middle Persian ō
‘to’ could also mark the indirect and—in late Manichaean Middle Persian—the direct
object (Lazard 2009: 169–70). See (10) and (11), section 2.10.
(33) The preposition u is well-known in Persian dialects of western Iran (Filippone 2011:
198). Lazard (1986: 252) has suggested that a survival of this preposition—reduced to a
short vowel and then disappeared from pronunciation—can be detected when a
complement has no preposition in the contemporary spoken language: mira(va)m šahr ‘I
am going to the town’.
(34) For a discussion of this reading, see Orsatti (2007b: 111–13). The same reading is
given by Paul (2013c: 144, section 180b).
(35) On the origin of Middle Persian -īhā, see Salemann 1895–1901: 282, 284–5; Horn
1898–1901: 100; Henning 1958: 81, 90, n. 1; Sundermann 1989b: 155.
(36) The suffix -in was, and still partially is, in co-occurrence with and has been gradually
replaced by adjectival -i in both material adjectives (Paul 2007) and numerals (Orsatti
2005: 791): bolurin ~ boluri ‘of crystal’, avvalin ~ avvali ‘first’. See (15) for Middle Persian
examples of an -ist-ēn superlative and regular and irregular comparatives.
(37) For a discussion of this word, cf. Henning 1957: 337; Provasi 2011: 150.
(38) Afterwards, Hebrew qōph came to be used for Arabic /q/, and kaph for both /k/ and /x/
with or without diacritic.
(39) In Judaeo-Persian, Arabic <ḥ> is transliterated by Hebrew ḥēth and Arabic <h> by
Hebrew hē.
(40) In Manichaean orthography, <q> and <k> alternate to write /k/, while the new letter
<q̈> was created for Arabic /q/. On the spelling hmI for hamē, see section 2.17.4.
(41) On this word and other possible explanations of its origins, see Moʿin 1977: 80–1.
(42) On Turkish loanwords in Persian, see Doerfer 1963–75; for Turkish words in classical
poetry, see Ganjei 1986, and Lornejad and Doostzadeh 2012: 93–108.
(43) ‘Non-past’ in Jahani’s terminology.
Mauro Maggi
Mauro Maggi (PhD 1992) is Associate Professor of Iranian philology at La Sapienza
University (Rome) and was Associate Professor of Indo-Iranian philology at
L’Orientale University (Naples) until 2008. His areas of expertise are the Khotanese
language and literature, Central Asian Buddhism, the history of the Iranian
languages, and early and sub-standard New Persian. He has published The
Khotanese Karmavibhaṅga (1995), Pelliot Chinois 2928: A Khotanese Love Story
(1997), and many scholarly articles. Among his edited books are The Persian
Language in History (with Paola Orsatti, 2011) and Buddhism among the Iranian
Peoples of Central Asia (with Matteo De Chiara and Giuliana Martini, 2013).
Paola Orsatti
Paola Orsatti (Laurea in Lettere, 1979) entered university as a researcher in 1983
and is currently Associate Professor of Persian Language and Literature at La
Sapienza University (Rome), where she formerly also taught history of the Persian
language. In addition to her Persian studies, she specialized as keeper of
manuscripts at the Scuola Speciale per Archivisti e Bibliotecari of La Sapienza
University in 1992. Her research focuses on the history of Persian, Persian classical
literature, the history of Persian studies in Europe, palaeography, and codicology of
Islamic manuscripts. Besides a number of scholarly articles, she has published Il
fondo Borgia della Biblioteca Vaticana e gli studi orientali a Roma tra Sette e
Ottocento (1996), Appunti per una storia della lingua neopersiana. 1: Parte generale,
fonologia, la più antica documentazione (2007), Corso di lingua persiana (with
Daniela Meneghini, 2012), and edited The Persian Language in History (with Mauro
Maggi, 2011).

Typological Approaches and Dialects

Mohammad Dabir-Moghaddam

Modern Persian reveals interesting typological properties. In terms of word order
parameters, it has grammaticalized a number of OV-type and a number of VO-type
parameters. As this mixed typological behaviour can be attested in Old Persian and
Middle Persian, the implications of this observation for typology, formal linguistics, and
theories of language change are worth pursuing. The agreement system of Modern
Persian is Nominative-Accusative. However, the majority of Modern Iranian languages are
split in this respect. Morphologically, Modern Persian is analytic. This morphological type
can be observed in Middle Persian as well. This two-millennium-old typological property
gives Persian a distinct place within the Indo-European languages. As Persian is spoken in
a widespread geographical area, there are many Persian dialects currently in use. A
number of grammatical features of Tajik Persian, Afghan (Dari) Persian, Isfahani Persian,
and Gha’eni Persian are briefly mentioned.
Keywords: Persian typology, word order, agreement, analytic morphology, Persian dialects
3.1 Introduction
THIS chapter intends to describe (a) the main morphosyntactic typological features of
Modern Standard Persian and (b) some of the linguistic features of a number of currently
spoken Persian dialects. In order to meet these objectives, I have divided this chapter into
two core sections each containing a number of subsections (sections 3.2 and 3.3). The
chapter ends with a summary section (section 3.4).
The morphosyntactic typological features of Modern Standard Persian which will be
described are word order parameters, word-formation peculiarities as well as tendencies,
and agreement system. The Persian dialects surveyed here are Tajik Persian, Afghan
(Dari) Persian, Isfahani Persian, and Gha’eni Persian.
3.2 Typological features

In this section, three typological characteristics of Persian will be elaborated on. In one
subsection, the previous approaches to Persian word order will be highlighted (3.2.1) (see
Chapters 7, 8, 10, and 15 for more on word order). In another subsection, the findings of
my own research will be presented (3.2.2). Agreement is the second typological
characteristic of Persian which has received independent attention here (3.2.3).
Morphological traits of Persian from a typological perspective will be the topic of another
subsection (3.2.4). For more discussion on typology, see Chapters 8 and 9.
My motivations for choosing the mentioned typological topics are as follows. I have
chosen word order typology because this research area has been a major concern of
typology since the revival of this discipline in the 1960s up until the present time (e.g.
Greenberg 1966; Hawkins 1983; Dryer 1992, 2007, 2011, 2013a). Furthermore, as I have
shown in Dabir-Moghaddam (2013), Persian as well as other Iranian languages
present interesting behaviours in word order typology which have implications for areal
typology, contact linguistics, formal linguistics, and language change. For more
information about areal typology, refer to Chapter 8.
As Croft (2003: 79) has noted, ‘[t]he study of typological patterns of word order variation
is a relatively new area, and will be increasingly important in typological word order
research.’ I have chosen agreement because agreement/cross-referencing/indexation is a
very strong morphosyntactic peculiarity of all of the Iranian languages of Iran (see Dabir-
Moghaddam 2012, 2013). Among the Iranian languages of Iran, Modern Persian is one of
the few languages which have grammaticalized a uniformly nominative-accusative type in
terms of agreement. Other Iranian languages of this type are Gilaki, Mazandarani,
Shahmirzadi, Southern Kurdish, Lori, and Sistani Baluchi (more specifically Baluchi of the
city Zabol). Other Iranian languages of Iran have a split agreement system (see Dabir-
Moghaddam 2013 for details). For further discussion of Iranian languages, see
Chapter 2. Furthermore, agreement has been a major research topic in both typological
studies (e.g. Corbett 2006) and minimalist studies (e.g. Baker 2008). Finally, I have
addressed morphological typology of Persian because in one study, this language is
described as ‘agglutinative’ (Frommer 1981) and in another study it is characterized as
more analytic than the rest of the Indo-European languages (Darmesteter 1883). I will rely on
typological evidence to cast light on the morphological typology of Modern Persian. For
further discussion on typological studies, see Chapter 9.
Although the three typological topics chosen for discussion in this chapter are not
directly related to each other, they reveal the most fundamental typological traits
of the morphosyntax of a language: word order characteristics, agreement/cross-referencing
as an alignment mechanism, and the general morphological type.
3.2.1 Previous approaches: word order
In this subsection, first I will briefly introduce generative analyses which have addressed
word order in Persian (3.2.1.1), second I will mention statistical surveys on this topic
(3.2.1.2), and third I will review typological studies which have touched on Persian word
order and have revealed its importance for linguistic typology (3.2.1.3).
3.2.1.1 Generative analyses

An overview of the descriptions of the linear word order of Modern Standard Persian
shows that within generative frameworks a number of studies have postulated the SOV
order as the canonical word order of this language (e.g. Soheili-Isfahani 1976; Hajati
1977; Dabir-Moghaddam 1982). Marashi (1970) is the only study which postulates an
underlying SVO order for Persian (pp. 11, 38–40). He then proposes an obligatory ‘verb-
object inversion’ transformational rule which preposes the object and as a result the SOV
order of the surface structure is formed (p. 26). Tabaian (1974) proposes two phrase
structure rules for the verb phrase (VP) in Persian. For simple sentences of this language
he assumes the (NP)(PP)V order for the extension of VP, but for verbs which
subcategorize a ‘complement clause’ the (NP)V[Ke1 S] order is suggested for the
extension of VP. Karimi (1989) also argues for two basic word orders for spoken Standard
Persian. She presents the order (S)(PP)(DO)V(clause) as the canonical word order in
Persian (p. 192). In this pattern, the ‘phrasal arguments’, including the direct object (DO),
precede the verb and the ‘sentential arguments’, more specifically the object complement
clauses, follow the verb (pp. 141, 168, 189). The same general assumptions and
conclusions about the word order of simple and complex sentences of Persian are also
adopted in Karimi (2005: 4, 7, and 10). Darzi (1996) adopts Government and Binding
as his theoretical framework and basically repeats the same positions expressed by
Tabaian and Karimi, namely he accepts the underlying SOV order for simple sentences
but in complex sentences he base-generates the object complement clause postverbally
(pp. 23, 62, 70). For more information on the generative approach, refer to Chapter 7.
3.2.1.2 Statistical surveys

Shafa’i (1985)2 reports a study which analysed 1,218 simple declarative sentences in
two groups of literary contemporary Persian stories and concludes that in this genre
‘subject always tends to be initial in the sentence and predicate also tends to appear
sentence-finally’ (p. 61, my translation).
Frommer (1981) assumes the canonical word order of formal Standard Persian to be as
follows (pp. 30, 31):
(Subject) + (Adverbs) + (Direct Object) + (Indirect Object) + Verb
His corpus consisted of ‘spontaneous colloquial speech’ (p. 68) as well as dialogue and
monologue broadcasts of a Persian radio station in Southern California (p. 74), a play
designed for children (p. 76), and the dialogue parts of two contemporary Persian stories
written by the well-known author Sādeq Čubak (pp. 77–9). Frommer’s main concern was
to provide a survey of the constituents which occur postverbally in simple sentences in
contemporary colloquial Persian. The total number of sentences in his corpus was 5,784
(p. 84). The total number of postposed constituents, namely those appearing postverbally,
was 682. These constituents are classified into different syntactic categories and the
results are presented in numerous tables (pp. 126–72). For more discussion on corpus
studies, see Chapters 11 and 19, and for more discussion on colloquial speech, see
Chapters 2, 4, 5, 6, 10, 11, and 15.
Roberts (2009) is a text-based study which draws upon a corpus of sixteen narratives
consisting of two ‘oral texts and the rest are written’ (p. 1). In Chapter 4, where he
describes ‘Constituent Order in the Clause’, he mentions Mahootian (1997: 50–1) who has
given the following pattern as ‘the neutral order of constituents’ (Roberts 2009: 97):
subject-temporal-direct object-source-locative-benefactive/goal-instrumental-verb
Roberts basically accepts the mentioned pattern but provides a minor qualification to the
following effect: ‘when the direct object is indefinite, then the default position is
immediately preceding the verb’ (p. 98).
3.2.1.3 Typological studies

Greenberg’s (1966) Universal 4 states: ‘[w]ith overwhelmingly greater than chance
frequency, languages with normal SOV order are postpositional’ (p. 79). He mentions
that there are ‘exceptions’ to this universal (pp. 78–9). In endnote 8, he names ‘Standard
Persian’ as one exception because it has SOV order yet it is prepositional (p. 105).
This ‘exceptional’ behaviour of Persian was also noted in Comrie (1989: 19 and 91). He
points out that Persian has the subject-object-verb order at sentence level, namely it is OV
in terms of the linear order of its constituents. However, since (a) the language has many
prepositions, (b) the head noun precedes the genitive (namely it has NG order), (c) the
adjective follows the noun (namely it has N Adj order), and (d) the relative clause follows
the head noun (i.e. it has N Rel order), it should be classified as a VO-type language (p.
96). In order to avoid the possible confusion between OV/VO linear order and OV/VO type,
Comrie uses the terms operand-operator, less technically head-dependent, when he
intends to refer to type. With respect to Persian, he notes that it is an operand-operator
language whose exception is that it has the order OV (p. 98).
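Comrie's observation can be made concrete with a small tally. The following Python sketch is offered only as an illustration: the feature labels, the function name `summarize`, and the simple majority count are assumptions introduced here and are not Comrie's own procedure. The feature values themselves are those reported in the passage above.

```python
# Order features of Persian as reported above (after Comrie 1989).
# "VO" / "OV" mark the type each feature is associated with.
persian_features = {
    "clause order": "OV",              # subject-object-verb
    "adposition type": "VO",           # prepositions
    "noun and genitive": "VO",         # NG order
    "noun and adjective": "VO",        # N Adj order
    "noun and relative clause": "VO",  # N Rel order
}


def summarize(features):
    """Return the majority type plus the VO-like and OV-like feature names.

    The majority rule is a hypothetical simplification for illustration.
    """
    vo = sorted(name for name, kind in features.items() if kind == "VO")
    ov = sorted(name for name, kind in features.items() if kind == "OV")
    if len(vo) > len(ov):
        majority = "VO"
    elif len(ov) > len(vo):
        majority = "OV"
    else:
        majority = "mixed"
    return majority, vo, ov


majority_type, vo_like, ov_like = summarize(persian_features)
print("Overall type:", majority_type)
print("Exceptional features:", ov_like)
```

On this toy tally the four phrase-level features outnumber the single clause-level one, which mirrors the description of Persian as a VO-type language with OV clause order as its exception.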
3.2.2 Current research
In this subsection, I will deal with two topics in two separate parts. In the first part, I will
introduce the typological approach to word order that I have adopted and applied
to Persian (3.2.2.1). In this part, twenty-four typological parameters are relied on. In the
second part, I will discuss the implications of the findings in part one for linguistic
typology (3.2.2.2).
3.2.2.1 Typological parameters

In Dabir-Moghaddam (2013), I have adopted a typological framework and have relied on
the twenty-four correlation pairs and the statistical results discussed in Dryer3 (1992,
2007, 2011, and 2013a) to cast light on the word order peculiarities of Persian. Dryer
(1992) has proposed the following definition for a correlation pair:
If a pair of elements X and Y is such that X tends to precede Y significantly more
often in VO languages than in OV languages, then <X, Y> is a CORRELATION
PAIR, and X is a VERB PATTERNER and Y an OBJECT PATTERNER with respect to
this pair.
(Dryer 1992: 87)
As an illustration of how the notions used in the above quotation are implemented in his
detailed study, he provides the following example:
For example, since OV languages tend to be postpositional and VO languages
prepositional, we can say that the ordered pair <adposition, NP> is a correlation
pair, and that, with respect to this pair, adpositions are verb patterners and the
NPs that they combine with are object patterners.
(Dryer 1992: 82)
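The logic of this definition can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Dryer's statistical procedure: the class `OrderCounts`, the function `is_correlation_pair`, the `margin` threshold standing in for 'significantly more often', and all counts are hypothetical and invented for the example.

```python
from dataclasses import dataclass


@dataclass
class OrderCounts:
    """Toy counts of languages showing each order of the elements X and Y."""
    x_before_y: int
    y_before_x: int

    def share_x_first(self) -> float:
        """Proportion of languages in which X precedes Y."""
        total = self.x_before_y + self.y_before_x
        return self.x_before_y / total if total else 0.0


def is_correlation_pair(vo: OrderCounts, ov: OrderCounts, margin: float = 0.2) -> bool:
    """True if X precedes Y markedly more often in VO than in OV languages.

    `margin` is an assumed stand-in for a proper significance test.
    """
    return vo.share_x_first() - ov.share_x_first() > margin


# Invented counts for the pair <adposition, NP>; not taken from any survey.
vo_sample = OrderCounts(x_before_y=90, y_before_x=10)  # mostly prepositional
ov_sample = OrderCounts(x_before_y=10, y_before_x=90)  # mostly postpositional

if is_correlation_pair(vo_sample, ov_sample):
    print("<adposition, NP> is a correlation pair:")
    print("adposition = verb patterner, NP = object patterner")
```

With these toy figures the adposition precedes the NP far more often in the VO sample than in the OV sample, so the pair qualifies; swapping the two samples would not.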
In the rest of this subsection I will introduce the twenty-four correlation pairs proposed in
the mentioned publications of Dryer. Under each parameter I will present relevant
examples from Persian.
Parameter (1): Adposition type
Persian is a prepositional language. Its single postposition is -ra whose overall syntactic
function is that it marks direct objects (for different theories on the behaviour and
distribution of this postposition see Browne 1970; Lazard 1982; Karimi 1989, 2003, 2005;
Dabir-Moghaddam 1992, 2005; Ghomeshi 1997). Some of its prepositions are be ‘to’,
bæraye ‘for’, æz ‘from’, ba ‘with’, dær ‘in’, ta ‘until’, tævæssot-e ‘by’, kenar-e ‘next to’,
ruy-e ‘on’, miyan-e ‘between’. Example (1) illustrates the use of adpositions in this language.
For more information on Persian adpositions, see Chapter 8.
(1)
Parameter (2): Order of noun and relative clause
In Persian, a relative clause always follows its head noun. Example (2) shows this
structure.
(2)
Parameter (3): Order of noun and genitive
In this language, the head noun always precedes the genitive as exemplified in (3). The
Ezafe morpheme -e is a head marker linker in Persian.
(3)
The Ezafe construction is further discussed in Chapters 2, 6, 7, 8, 9, 10, and 19.
Parameter (4): Order of adjective and standard in comparative construction
Persian allows both options here. In example (4) the adjective precedes the
standard and in example (5) the adjective follows the standard. Both express the
same propositional meaning.
(4)
(5)
Parameter (5): Order of verb and adpositional phrase
The adpositional phrase predominantly precedes the verb, as exemplified in (6). It may be
noted that the occurrence of Ezafe morpheme on the preposition ruy ‘on’ is due to its
historical origin as a noun. In other words, this preposition is an instance of
grammaticalization.
(6)
In colloquial Persian, example (6) has a counterpart in which the noun zæmin loses its
preposition and follows the verb. This option is presented in (7).
(7)
Parameter (6): Order of verb and manner adverb
The manner adverb precedes the verb. Example (8) shows this pattern.
(8)
Parameter (7): Order of copula and predicate
The predicate always precedes the copula in Persian. Example (9) supports this
observation.
(9)
Parameter (8): Order of ‘want’ and subordinate verb
In Persian the verb xastæn ‘to want’ always appears before the subordinate verb, as
exemplified in (10). The verbal prefix mi- is an incomplete aspect marker and the verbal
prefix be- in the subordinate clause marks the subjunctive mood.
(10)
Parameter (9): Order of noun and adjective
Modifying adjectives in Persian follow their head nouns. Example (11) illustrates this
structure. The Ezafe morpheme -e, as a head marker linker, is also used in this structure.
(11)
Parameter (10): Order of demonstrative and noun
The demonstratives in ‘this’ and an ‘that’ always precede the noun, as exemplified in (12).
(12)
Parameter (11): Order of intensifier and adjective
Intensifiers always come before adjectives, as shown in (13).
(13)
Parameter (12): Order of content verb and auxiliary verb
The auxiliary verbs intended in this parameter are the tense/aspect auxiliary
verbs. In the section which deals with this parameter, Dryer (1992) says:
This section deals only with auxiliary verbs whose stem conveys tense or aspect.
Applied to English, this includes will, have, and progressive be, but excludes the
passive auxiliary be and modal auxiliaries like can and should.
(Dryer 1992: 100)
Examples (14) and (15) show that the future tense auxiliary and the progressive auxiliary
are grammaticalized preverbally. Example (16), on the other hand, indicates that the past
perfect auxiliary is grammaticalized postverbally. It should be pointed out that the future
tense auxiliary precedes ‘the Short Infinitive’ form of the main verb (Lambton 1984
[1953]: 18), called ‘l’infinitif apocopé’ by Lazard (2006: 145). Windfuhr and Perry (2009:
449) note that ‘[t]he “short infinitive” is identical with the past stem’.
(14)
(15)
(16)
Persian has therefore grammaticalized two slots for the occurrence of the
tense/aspect auxiliaries: one preverbal and one postverbal.
Parameter (13): Order of question particle and sentence
The question particle aya, which is used to mark yes/no questions in Persian, appears in
sentence-initial position. In Japanese, by contrast,
the question particle ka appears in sentence-final position. Example (17) contains
the question particle aya.
(17)
Parameter (14): Order of adverbial subordinator and clause
Adverbial subordinators ‘like although and when in English, are called ‘subordinate
conjunctions’ in traditional grammar, and that [they] mark adverbial subordinate clauses
for their semantic relationship to the main clause.’ (Dryer 1992: 103). For further
information about subordinate clauses, see Chapters 6 and 7. In Persian, these adverbial
subordinators are positioned clause-initially. Examples (18) and (19) support this
observation.
(18)
(19)
Parameter (15): Order of article and noun
The indefinite article -i ‘a(n)’ and the colloquial definite article -e appear after the noun.
Examples (20) and (21) illustrate these uses.
(20)
(21)
Parameter (16): Order of verb and subject
In Persian, the subject overwhelmingly4 and typically precedes the verb. Example (22) shows
this pattern.
(22)
Additional examples supporting this pattern can be found under a number of previous
parameters (more specifically, examples (1), (6)–(10), (17)–(19)).
Parameter (17): Order of numeral and noun
The numeral is positioned before the noun, as exemplified in (23).
(23)
Parameter (18): Order of tense-aspect affix and verb stem
As a rule, Persian contains the past tense suffixes -id, -ad, -d, -t, and -est which attach to
the present stem of verbs. Examples (24)–(28) are formed in this way.
(24)
(25)
(26)
(27)
(28)
This past tense formation pattern is the regular pattern. Persian also has irregular past
tense formations. In this group, the past tense stem is a suppletive form which always
ends with the dental consonant d or t found in the past tense suffixes of the regular past
tense formation pattern. Lazard (2006: 119) has made the following remarks about the
two verb stems of Persian:
Every Persian verb has two stems; on one of them (stem I) are formed: the
present, the simple subjunctive, the present participle; on the other (stem II): the
preterite, the imperfect, the past participle, the two infinitives.
He then adds:
Persian verbs are ordinarily divided into two large groups, regular verbs and
irregular verbs, according to the relationship between their two stems.
In regular verbs, stem II is formed by adding the suffix id to stem I:
E.g. ‘to buy’, stem I xar- (present … ), stem II xar-id- (preterite … infinitive … xar-id-an
…
In irregular verbs, stem II always ends in -d or -t (infinitive in -dan or -tan …
E.g. … mordan ‘to die’ …
… goftan ‘to say’.
(Lazard 2006: 120)
A similar analysis supported with relevant examples is provided in Windfuhr and Perry
(2009: 446–447).
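The two-stem pattern described in the passages quoted from Lazard can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: it uses the transliterated forms cited in the quotation (xar-, xar-id-, xar-id-an, mordan, goftan), it covers only the -id suffix that Lazard names for regular verbs, and the function names are hypothetical. It is not a morphological analyser of Persian.

```python
# Illustrative sketch of the two-stem pattern described by Lazard (2006).
# Function names are hypothetical; the forms are those cited in the quotation.


def regular_past_stem(present_stem: str) -> str:
    """Regular verbs: stem II is stem I plus the suffix -id."""
    return present_stem + "id"


def infinitive_from_past_stem(past_stem: str) -> str:
    """The full infinitive is built on stem II with the ending -an."""
    return past_stem + "an"


def past_stem_from_infinitive(infinitive: str) -> str:
    """Recover stem II from an infinitive by removing the final -an.

    Stem II is expected to end in the dental d or t, as stated for
    both the regular and the irregular group.
    """
    if not infinitive.endswith("an"):
        raise ValueError(f"not an infinitive in -an: {infinitive!r}")
    stem = infinitive[:-2]
    if not stem.endswith(("d", "t")):
        raise ValueError(f"stem II does not end in d or t: {stem!r}")
    return stem


if __name__ == "__main__":
    stem_two = regular_past_stem("xar")
    print(stem_two, infinitive_from_past_stem(stem_two))
    for inf in ("mordan", "goftan"):
        print(inf, "->", past_stem_from_infinitive(inf))
```

The irregular verbs are handled only in the direction infinitive to stem II, because the quotation does not give their stem I forms.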
Persian also has the verbal prefix mi-, which marks incomplete aspect. This prefix is used
in example (29).
(29)
In example (30), the past tense suffix and the verbal incomplete aspect prefix are both
used.
(30)
Parameter (19): Order of noun and possessive
The possessive morphemes in Persian are always enclitics as exemplified in (31) and (32).
(31)
(32)
Parameter (20): Order of verb and auxiliary verb ‘able’
The auxiliary verb ‘able’ in Persian always precedes the main verb. Example (33) shows
this pattern.
(33)
Parameter (21): Order of complementizer and sentence
The Persian complementizer ke ‘that’ always appears at the beginning of the embedded
sentence. Example (34) substantiates this point.
(34)
Example (10) is a similar token.
Parameter (22): Not-obligatory-initial-wh or obligatory-initial-wh
In Persian, wh-words typically appear in situ; that is, Persian is a not-obligatory-initial-wh
language. Examples (35) and (36) illustrate this characteristic.
(35)
(36)
Parameter (23): Order of object and verb
In Persian, if the direct object is a noun phrase, its canonical position is before the verb.5
However, if the object is an embedded clause, its canonical position is after the verb.
Examples (37) and (38) support the first pattern and example (39) substantiates the
second pattern.
(37)
(38)
(39)
Further examples which are in line with the first pattern are items (1) and (35). Examples
(10), (34), and (36) are additional items supporting the second pattern.
Parameter (24): Order of verb stem and negative affix
In Persian, the negative affix is always a prefix. Example (40) contains a simple verb with a
negative prefix and example (41) has a compound verb whose light verb carries the
negative prefix.
(40)
(41)
3.2.2.2 Discussion
3.2.2.2.1 Modern Persian
Through the application of the correlation pairs and the statistical results discussed by
Dryer to Persian data, which were presented under the twenty-four parameters
introduced in the previous section, I arrived at the following notable conclusions. Modern
Persian has grammaticalized a number of parameters typical of OV-type languages and a
greater number of parameters typical of VO-type languages. More specifically, in terms of
parameter (5), order of verb and adpositional phrase (example (6)), parameter (6), order
of verb and manner adverb (example (8)), parameter (7), order of copula and predicate
(example (9)), and parameter (22), not-obligatory-initial-wh or obligatory-initial-wh
(examples (35) and (36)) Persian is a strict OV-type language. In terms of parameter (1),
adposition type (example (1)), parameter (2), order of noun and relative clause (example
(2)), parameter (3), order of noun and genitive (example (3)), parameter (8), order of
‘want’ and subordinate verb (example (10)), parameter (13), order of question particle
and sentence (example (17)), parameter (14), order of adverbial subordinator and clause
(examples (18) and (19)), parameter (20), order of verb and auxiliary verb ‘able’ (example
(33)), parameter (21), order of complementizer and sentence (example (34)), and
parameter (24), order of verb stem and negative affix (examples (40) and (41)) Persian is
a strict VO-type language. These results clearly show that the number of VO correlation
pairs is more than twice the number of OV correlation pairs (nine versus four). There
are also a number of parameters which are found equally in both OV-type and VO-type
languages. These are parameter (9), order of noun and adjective (example (11)),
parameter (10), order of demonstrative and noun (example (12)), parameter (11), order of
intensifier and adjective (example (13)), parameter (15), order of article and noun
(examples (20) and (21)), parameter (16), order of verb and subject (example (22)),
parameter (17), order of numeral and noun (example (23)), and parameter (19), order of
noun and possessive (examples (31) and (32)). Furthermore, there are four parameters
for which Persian contains two options. Therefore, in regard to these parameters the
language remains indeterminate between the OV type and the VO type. The first one is
parameter (4), order of adjective and standard in comparative construction (examples (4)
and (5)). The second one is parameter (12), order of content verb and auxiliary verb
(examples (14)–(16)). The third one is parameter (18), the order of tense-aspect affix and
verb stem (examples (24)–(28)). The fourth one is parameter (23), order of object and
verb (examples (37)–(39)). On the basis of all these observations, I conclude that Modern
Persian has grammaticalized mixed word order typological parameters. This result
requires an explanation. Before I proceed, I sum up these conclusions in
Table 3.1.
Table 3.1 Modern Persian’s word order typology
OV-type parameters 4
VO-type parameters 9
OV-type and VO-type parameters (shared) 7
OV-type or VO-type parameters (indeterminate) 4
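The counts in Table 3.1 follow directly from the classification of the twenty-four parameters given in the discussion. As a minimal sketch (the dictionary layout and function names are illustrative assumptions, not part of the original study), the tally can be reproduced and checked for completeness as follows:

```python
# Sketch: tally of the twenty-four word order parameters for Modern Persian,
# using the classification stated in the discussion. Names are illustrative.

CLASSIFICATION = {
    "OV-type": [5, 6, 7, 22],
    "VO-type": [1, 2, 3, 8, 13, 14, 20, 21, 24],
    "shared": [9, 10, 11, 15, 16, 17, 19],
    "indeterminate": [4, 12, 18, 23],
}


def tally(classification):
    """Count how many parameters fall under each label."""
    return {label: len(params) for label, params in classification.items()}


def covers_all_parameters(classification, total=24):
    """Check that every parameter 1..total is assigned exactly once."""
    assigned = sorted(p for params in classification.values() for p in params)
    return assigned == list(range(1, total + 1))


if __name__ == "__main__":
    print(tally(CLASSIFICATION))
    print(covers_all_parameters(CLASSIFICATION))
```

Run as a script, this prints the four counts reported in Table 3.1 (4, 9, 7, and 4) and confirms that the four groups together account for each of the twenty-four parameters exactly once.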
To shed light on this result, I consulted sources on Old Persian (whose texts date
from the sixth to the fourth century BC) and Middle Persian, also called Pahlavi
(from about the fourth century BC to the ninth century AD), the predecessors of
Modern Persian. The three languages belong to the Southwestern Iranian group. In a
separate section (section 3.2.2.2.4), I have reported the results of a study on typological
features of Avestan, an Ancient Iranian language, which further enhance our
understanding of the word order peculiarities of Iranian languages. For more information
about Avestan, refer to Chapters 2 and 11.
3.2.2.2.2 Old Persian

My survey of Old Persian data and descriptions yielded the following results: (a) ‘The
word order in the sentence in OP [Old Persian] is quite free, but the normal order is
subject-object-verb: DB [Darius Behistan] 1.85’, which I quote with added glosses in (42).
(42)
(b) Old Persian texts contain 18 prepositions, 2 prepositions which were also used as
postpositions (ā ‘to’ and patiy ‘against, on’), and 2 postpositions (rādiy ‘on account of’
and parā ‘along’) (Kent 1953: 86, section 268). (c) The relative clause follows its head
noun, e.g. DB 1.21 which I quote and add glosses in (43) (Kent 1953: 86, section 267(I)).
(43)
(d) ‘A genitive used as a genitive (not in a dative use), and depending upon a noun or
adjective, precedes that noun or adjective, unless the genitive is attached to its noun by
the article, in which instance it follows: DB 1.4 … [and] DB 2.27’ respectively (Kent 1953:
95, section 309). The relevant examples are cited in (44) and (45).
(44)
(45)
(e) ‘Descriptive adjectives, if attributive, follow their nouns.’ (Kent 1953: 95, section
306(II)). Representative examples from Kent (1953: 116, AMH [Ariaramnes, Hamadan] 6–7)
and Kent (1953: 117, DB 1.46) are given in (46) and (47).
(46)
(47)
(f) ‘Attributive adjectives precede their nouns if they are demonstrative, numerical,
quantitative, or month-names.’ (Kent 1953: 95, section 306(I)). ima xšaçam ‘this
kingdom’ (Kent 1953: 117, DB 1.26) illustrates the occurrence of demonstrative before
the noun and XXVII raucabiš ‘XXVII days’ (Kent 1953: 121, DB 2.26) exemplifies the use of
a numeral before the noun. (g) ‘A predicate noun or adjective stands between the subject
and the verb’ (Kent 1953: 95, section 307). The sentence quoted in (48) from Kent 1953:
82, section 252c supports this point. In this example, the sequence xšāyaɵiya amiy (lit.
‘king am’) shows that the predicate has preceded the copula.
(48)
(h) Adverbial subordinators in Old Persian always precede their adverbial subordinate
clauses (Kent 1953: 94, section 304). A representative example is the expression in (49)
which begins with the temporal subordinator yadiy ‘whenever’ (Kent 1953: 139, DNb
[Darius, Naqš-i Rustam b.] .38–40).
(49)
(i) ‘The conjunction tya ‘that’ is used to introduce clauses of fact, of volition, of directly
and indirectly quoted statement and question, of result’ (Kent 1953: 93, section 299). The
example in (50) substantiates this observation (Kent 1953: 138, DNb.19–20).
(50)
It may be noted that the ‘order of complementizer and sentence’ is a word order
typological parameter (see section 3.2.2.1, parameter (21)).
Among these results, the tendency of Old Persian to place the verb in sentence-final
position (item (a)) and the occurrence of the predicate before the copula (item (g))
are the prominent OV-type characteristics of Old Persian. On the other hand, the large
number of prepositions in this language (item (b)), the fact that a relative clause always
follows its head noun (item (c)), the occurrence of the adverbial subordinator before its
clause (item (h)), the fixed position of the complementizer before the embedded sentence
(item (i)), and the appearance of a predicate with the meaning ‘want’ before the
subordinate verb (as exemplified in item (i)) are the prominent properties of the VO-type
languages. However, the possibility of the occurrence of the genitive (namely as a
dependent) both before and after the head noun (item (d)), the fact that demonstratives
always precede their nouns (item (f)), numerals are positioned before the nouns (item (f)),
and finally that attributive adjectives follow their nouns (item (e)) are the peculiarities
that are found both in the OV-type and VO-type languages. Therefore, it is clear that Old
Persian contains two OV-type parameters, five VO-type parameters, and four parameters
which are commonly found in both OV-type and VO-type languages. This means that Old
Persian was not uniform in terms of its word order typological parameters, and that the
data show a tendency towards VO-type languages. For more discussion on Old Persian,
refer to Chapters 2 and 11. A comparison of the typological peculiarities of Old Persian
and Modern Persian in the mentioned parameters shows that the two languages share a
number of typological parameters and that their main differences lie in the following properties:
further stabilization of the verb in the final position in simple sentences, stabilization of
the genitive after its head noun, the reduction of the two postpositions of Old Persian
to one postposition (namely -ra), and the loss of the two prepositions which were also
used as postpositions. I assume that the further stabilization of the verb in the final
position is the natural consequence of the loss of the inflectional system of Old Persian.
Table 3.2 summarizes the results of my survey of the Old Persian word order parameters.
Table 3.2 Old Persian’s word order typology
Hale’s (1988) article, entitled ‘Old Persian Word Order’, deals with the linear order of
constituents and their permutation in the Old Persian inscriptions. Therefore, we do not
obtain much information on the other word order typological parameters of Old Persian in
that article. With respect to the linear order of constituents, Hale states:
One of the most striking characteristics of Old Persian syntax … is the stable
structure of the verb phrase in the majority of sentences in the corpus. The
various subconstituents of the verb phrase are normally arranged in a contiguous
fashion, with the verb placed at the end of the VP string. The nominative subject
almost invariably stands before this verb phrase.
(Hale 1988: 27)
He has provided five examples (from Kent 1953) to substantiate the points made in the
above quotation. His first example (p. 27, ex. 1) which I quote and add glosses in (51) is
from (DB 1.12).
(51)
In this example, the finite verb frābara is in sentence-final position and the nominative
subject Auramazdā is in sentence-initial position. Hale also discusses a number of
examples in which a constituent which could be an accusative object, an oblique
argument, or a prepositional phrase is topicalized ‘to sentence-initial position for some
pragmatic function.’ (p. 28). His words are quoted below:
Thus, although we have noted above that the verb phrase in Old Persian is not
normally discontinuous, such discontinuities do arise from the application of the
rule of fronting or topicalization.
(Hale 1988: 28)
In example (52) which is from the text (AMH 6) the oblique argument manā is topicalized
(Hale, p. 28, ex. 10).
(52)
Hale also discusses another ‘rearrangement rule’ which moves ‘a single constituent … to
the right of the finite verb.’ (p. 29). This process which he calls ‘end topicalization’
postposes ‘the entire constituent’ or ‘partial constituents’ and ‘[t]he constituents so
topicalized [namely postposed] include the full range of NP functions, as well as PPs and
adverbials’ (pp. 29, 38, endnote 8). For instance, in example (53) from (DB 1.40) the
subject NP is moved to the postverbal position (Hale, p. 29).
(53)
Example (54) which is from the text (DNb 34) illustrates the fact that ‘[p]redicate
nominatives … may be topicalized into final position’ (p. 30).
(54)
Finally, I quote example (55) from Hale (p. 32) in which part of a constituent is moved
postverbally. This example is from (DB 2.11).
(55)
In Hale’s words ‘our general understanding of constituency tells us that ašnaiy … abiy
ūvjam ‘near the Elam’ must form a single constituent; the discontinuity is again to be
accounted for by the movement process’ (p. 32).
I have two points to mention about Hale (1988). First, as I said at the beginning of my
summary of his paper, Hale has only described the linear arrangement of the constituents
(namely linear word order) of Old Persian inscriptions. The other word order typological
parameters are not dealt with in that paper. Second, his rearrangement rules may all fall
under a general scrambling process. In that case, his not so familiar term ‘end
topicalization’ falls under that general process. Scrambling is further discussed in
Chapter 7.
3.2.2.2.3 Middle Persian

My survey of Middle Persian (more specifically Pahlavi) sources yielded the following
results: (a) In terms of the linear order of constituents it is the case that ‘the most
common order of initial subject/agent and final verb’ is often represented in Middle
Persian (Brunner 1977: 181). Example (56) is from the mentioned source.
(56)
(b) Middle Persian contains about 24 prepositions (Shadan 1968: 181–204 (a Persian
translation of Rastorgueva 1966); Brunner 1977: 116–55). In regard to postpositions,
Brunner has said:
MP [Middle Persian] contains three types of postpositional words. Type A

comprises most of the prepositions, which have the additional functions of
postposition and preverb. Type B consists of those terms which occur only in
combination with a preposition, ō … rōn [‘to … direction’] and az … hammis [lit.
‘from … together’; ‘together with’]. To type C belongs only one word, rāy [‘for’]; it
occurs only as a postposition and is usually independent of a preposition.
(Brunner 1977: 148)
(c) ‘Relative constructions with ī ((z)Y) tend to follow immediately after the noun which
they modify’ (Heston 1976: 305). (d) In the genitive construction ‘in Pahlavi, dependent
nouns may either precede the head noun or, more commonly, follow it with the linking
particle Y.’ (Heston 1976: 21). The constructions in (57), which I quote from Heston
(p. 22), and in (58), from Heston (p. 21), which appears with the linking particle,
substantiate the quoted observation.
(57)
(58)
(e) As for the order of noun and adjective, three possibilities are reported in Pahlavi.
First, ‘descriptive adjectives may immediately precede the nouns they modify’, as shown
in (59) from Heston (1976: 3).
(59)
Second, ‘[a]lternatively, the adjective may follow a noun which is itself followed by the
linking particle (z)Y’ (ibid.) as quoted in (60).
(60)
Third, ‘[m]uch less frequently, and usually in short phrases, an adjective will immediately
follow a noun without a linking particle’ (ibid.) as exemplified in (61).
(61)
(f) Demonstrative pronominal modifiers systematically ‘precede both the nouns and the
adjectives’ (ibid.) as illustrated in (62). Elsewhere, Heston describes ZNE as ‘the
proximate ēn [“this”]’ (p. 130).
(62)
(g) Numerals usually precede the noun which they modify, as shown in (63) from Brunner
(1977: 41) and in (64) from Brunner (1977: 45).
(63)
(64)
(h) The predicate (whether nominal or adjectival) precedes the copula as exemplified in
(65) which I quote from Brunner (1977: 26). Another relevant example is sentence (66)
which I quote from Shadan (1968: 216).
(65)
(66)
(i) ‘In Pahlavi, temporal clauses are typically introduced by the particle ka (AMT)
“when”.’ (Heston 1976: 326). (j) In Pahlavi the complementizer precedes the embedded
sentence. The main complementizer in this language is kū ‘that’ (Heston 1976: 273, 290;
Brunner 1977: 234–9; Shadan 1968: 208, 215).
These observations indicate that in regard to the final position of the verb (item (a)) and
the occurrence of the predicate (nominal or adjectival) before the copula (item (h)),
Middle Persian is an OV-type language. But based on the facts that the head noun
precedes the relative clause (item (c)), an adverbial subordinator precedes the adverbial
subordinate clause (item (i)), and the complementizer appears at the beginning of the
embedded clause (item (j)) Pahlavi is a VO-type language. On the other hand, the possible
occurrence of the genitive, more specifically dependent noun, before the head noun as
well as the opposite order which is more common (item (d)), the systematic appearance of
the demonstrative modifier before the noun (item (f)), the position of the numeral before
the noun (item (g)), the occurrence of adjective both before and after the noun (item (e)),
and the availability of many prepositions which are also used as postpositions (item (b))
are the characteristics which are shared by both OV- and VO-type languages. Therefore, it
is clear that Middle Persian was OV in two characteristics, it was VO in three other
characteristics, and contained five characteristics which are shared by both OV- and VO-
type languages. This means that Middle Persian, like Old Persian and Modern Persian,
was not, in terms of word order parameters, a uniform language. In other words, Middle
Persian was also a mixed-type language. Middle Persian is further discussed in Chapters
2 and 11. Table 3.3 provides a summary of the Middle Persian word order parameters.
Table 3.3 Middle Persian (Pahlavi) word order typology
Despite the fact that Old Persian, Middle Persian, and Modern Persian have
grammaticalized a number of OV-type parameters and a number of VO-type parameters

and hence they are mixed in this respect, there are typological differences between these
three languages as well. The major typological differences between Old Persian and
Middle Persian are as follows: In Middle Persian, one finds the disappearance of the
inflectional case system, the possible occurrence of the adjective before or after the noun,
an increase in the number of prepositions which were used as postpositions as well, and
the emergence of circumpositions. Similarly, the main typological differences between
Middle Persian and Modern Persian are as follows: In Modern Persian, we find the
stabilization of the order of noun before genitive (namely dependent noun), the
stabilization of the noun before the adjective, the absence of any preposition which can
also be used as a postposition, and the loss of all circumpositions (see Chapter 2 for more
information on circumposition). I assume that all of these developments have led to the
structural stabilization of Modern Persian and its further drift towards a language with
VO typological characteristics. Of course, a more detailed and exact comparison and
characterization of the word order typology of Persian in its three historical stages will
require the analysis of Old and Middle Persian texts with respect to the twenty-four parameters
which were surveyed for Modern Persian. This I leave for future research.
3.2.2.2.4 Avestan
Avestan texts are divided into two chronologically distinct groups: Old Avestan, also
called Gatha Avestan, and Young Avestan. The texts were first written down around 600
AD; before that, they were transmitted orally over several centuries by specially trained priests.
The Old Avestan poems can be dated to around 1500 BC and are linguistically extremely
close to the oldest attested Indo-Aryan texts, namely Vedic Sanskrit. Young Avestan, on
the other hand, is typologically closer to Old Persian and is dated to a period similar to
that of Old Persian (Haig 2008: 4, 23; Skjærvø 2009: 44). ‘Geographically, it is assumed that
Avestan reflects an Iranian language spoken in what is now Northeastern Iran’ (Haig
2008: 23).
Friedrich (1975) has compared the Gathic hymns, which are believed to have been composed
by Zoroaster himself around 900 BC in the east of Iran, with the Younger Avestan hymns,
which were probably composed by the Magi in the fifth and fourth centuries BC at the
Achaemenid court in the west of Iran (though its available documents belong to the third
century AD), and has reported the following observations (pp. 44, 45): (a) In Old Avestan,
the order of noun (N) before adjective (A) is practically as frequent as the order of
adjective before noun. But in Young Avestan, the order of noun before adjective is more
frequent than the opposite order. Examples (67) and (68) are from Gatha Avestan.
(67)
(68)
(b) In Old Avestan, the order of genitive (G) before noun (N) is more frequent than the order of
noun before genitive. But in Younger Avestan the order of noun before genitive is more
frequent than the order of genitive before noun. Examples (69) and (70) substantiate the
two orders.
(69)
(70)
(c) In Old Avestan, prepositions (Prep) occur more frequently than postpositions
(Post). This preference is much stronger in Younger Avestan. In addition to being
more frequent in use, prepositions also outnumber postpositions in the inventories
of both Old and Younger Avestan. Representative examples are provided in
(71) and (72).
(71)
(72)
(d) The dominant word order in both Old Avestan and Younger Avestan is SOV though
other orders are also found. In example (73) the non-dominant order SVO is used,
whereas in example (74) the dominant order is seen.8
(73)
(74)
Based on these observations on Old Avestan and Younger Avestan,
Friedrich concluded that, despite the occurrence of the verb in sentence-final position, the
language has prepositions and tends to move towards a language with typologically VO
behaviour (p. 46). Friedrich has related these peculiarities to the role of language
contact. More specifically, he has considered the south-west of Iran, where the Achaemenid
court was located, as a Sprachbund in which Akkadian, a Semitic language, had
the same unusual and inconsistent typological characteristics in the same period. He
points out that there was contact between scholars and aristocrats of these two
civilizations, one Indo-European and the other Semitic. Friedrich hypothesizes that
another reason for the higher frequency of VO characteristics in Younger Avestan would
have been the existence of bilingualism between this language and Greek, a language
which was used for two centuries before and after the Christian era in that area. He even
mentions Aramaic, another Semitic language, which was widely used at the time, was
known to the clergy in the Achaemenid court, and was also used to record certain
Avestan texts, as a more probable source of the more frequent occurrence of the VO
characteristics in Younger Avestan (p. 46). For more information on Aramaic, refer to
Chapter 2.
The analysis proposed by Friedrich to the effect that there was a drift between Old
Avestan and Younger Avestan and that the drift was probably triggered by language
contact with Akkadian, Aramaic, and Greek illustrates an instance of syntactic borrowing
and convergence. This analysis has been noted and cited by Harris and Campbell (1995:
138–40). I add to these hypotheses the important role that Elamite, a language isolate
spoken in Mesopotamia whose genetic affiliation is not settled,9 could have had.
Hawkins (1983: 324) has summarized the typological characteristics of Elamite as an SOV
language which had postpositions but had its adjective after the noun and the genitive
after its head noun as well.10 For more information on Elamite, refer to Chapter 2.
3.2.2.2.5 Word order summary

In sections (3.2.1) and (3.2.2), I addressed the previous scholarship on word order studies
in Persian. I then adopted a typological approach and described the word order
parameters of Modern Persian, Old Persian, and Middle Persian (Pahlavi, further
discussed in Chapters 2 and 11). The results were summarized in Tables 3.1, 3.2, and 3.3.
The immediate conclusion of the study is that Persian in all its three stages has been a
mixed-type language with respect to the word order parameters surveyed. This general
conclusion was shown to be valid for Avestan as reported in Friedrich (1975) as well.
In the next section, I describe another major typological characteristic of Modern Persian:
Agreement.
3.2.3 Agreement
Agreement of verb with the person and number features of the subject is a strong
morphosyntactic property of Modern Persian.11 In this language, the subject of
intransitive verbs (namely S) and the subject of transitive verbs (namely A) are
systematically encoded as verbal agreement suffixes. On the other hand, the object of
transitive verbs (namely O), if cross-referenced, is encoded as pronominal enclitics. This
system of agreement means that there is an alignment between S and A. But O, if
encoded, is encoded via a pronominal enclitic. Therefore, the language is of the
nominative-accusative type with respect to this parameter. Verb agreement is further discussed in
Chapters 7, 8, 9, 10, and 15 and pronominal enclitics are further discussed in Chapters 6,
8, and 9. My assumption that cross-referencing enclitics are actually ‘agreement’ markers
is compatible with the following proposal which I quote from Anderson (2005: 239–40):
I propose to regard enclitic pronominals as in fact a form of agreement, differing

from verbal agreement only in whether the functional content is realized as the
morphology of a phrase or a word. This is not a novel proposal … The overt
manifestation of agreement material by pronominal special clitics can appear in
various places, as we have already seen. Clitics may appear with reference to the
beginning of the clause—in second position …
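The alignment reasoning stated before the Anderson quotation (S and A are both encoded by verbal agreement suffixes, while O, if cross-referenced, is encoded by a pronominal enclitic) can be sketched as a simple comparison of how the three core arguments are marked. The function and the marking labels below are illustrative assumptions; the set of alignment names is general typological terminology rather than something taken from this chapter.

```python
# Sketch: derive an alignment type from how S, A, and O are cross-referenced.
# Function name and marking labels are hypothetical, for illustration only.


def alignment(s_marking: str, a_marking: str, o_marking: str) -> str:
    """Classify alignment by which core arguments share the same marking."""
    if s_marking == a_marking == o_marking:
        return "neutral"
    if s_marking == a_marking:
        # S patterns with A, against O
        return "nominative-accusative"
    if s_marking == o_marking:
        # S patterns with O, against A
        return "ergative-absolutive"
    if a_marking == o_marking:
        # A patterns with O, against S
        return "horizontal"
    return "tripartite"


if __name__ == "__main__":
    # Modern Persian as described in the text: S and A are verbal agreement
    # suffixes; O, if cross-referenced, is a pronominal enclitic.
    print(alignment("verbal suffix", "verbal suffix", "pronominal enclitic"))
```

With the Modern Persian markings described in the text, the function returns "nominative-accusative", matching the classification given above.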
In the context of the past tense Agent clitics (in short, A-past clitics) in Kurdish (a
north-western Iranian language, which is further discussed in Chapter 8), Haig (2008: 288–9)
says:
the A-past clitic in fact exhibits the features of an agreement marker, i.e., it
obligatorily cross-references a different constituent, and is prosodically dependent
rather than independent. As Corbett (2003) points out, the distinction between
agreement markers and pronouns is often a gradual one; the A-past clitics of
Suleimani [Kurdish] are a case in point. Below I will point out further features of
the A-past clitics which bring them closer to a canonical form of agreement.
It may be noted that, in those Iranian languages which have grammaticalized a split
agreement system, a pronominal clitic is used to cross-reference an O in the present
tense domain and it is similarly used to cross-reference the A in the past tense domain
(namely an A-past clitic). Now I return to agreement in Persian12 and as a first step I
concentrate on the data presented in (75)–(80) below. In this set, examples (79) and (80)
contain a pronominal enclitic which encodes the O.
(75)
(76)
(77)
(78)
(79) 13
(80) 14
Examples (75)–(80) have simple verbs. Persian also contains a large number of compound
verbs, equally called complex predicates (see Dabir-Moghaddam 1997). Compound verbs
are further discussed in Chapters 2, 6, 7, 8, 9, 10, 15, 17, and 19. In these verbs,
agreement suffixes appear on the light verb. Pronominal enclitics cross-referencing the O
take the non-verbal part of the compound verb as their host. The occurrence of the
enclitic on the light verb is not impossible but it certainly sounds very casual and sloppy
and is judged to be low in acceptability. Examples (81)–(88) illustrate the mentioned
points.
(81)
(82)
(83)
(84)
(85)
(86)
(87)
(88)
With present perfect forms of the verbs, the agreement suffixes attach to the participle
suffix -e which is added to the past stems. Examples (89)–(94) support this observation.
These examples correspond to examples (76), (78), (80), (82), (84), and (86) respectively.
(89)
(90)
(91)
(92)
(93)
(94)
The past perfect forms in Persian are constructed periphrastically (see also Chapter 2). In
this construction, the past tense form of the budæn ‘to be’ copula is added to the past
participle form of the main verb. The agreement suffixes are attached to the copula.
Examples (95)–(100) are the past perfect counterparts of examples (89)–(94).
(95)
(96)
(97)
(98)
(99)
(100)
The future tense in Persian is also formed periphrastically. In this construction, the future
tense auxiliary xah ‘will’, which is the grammaticalized form of the lexical verb xastæn ‘to
want’, takes agreement suffixes followed by the short infinitive form of the main verb
which is identical with the past stem. With compound verbs, the conjugated future
auxiliary appears between the non-verbal and verbal parts of the compound verb.
Examples (101)–(105), which correspond to our previous patterns, show the future tense
formation.
(101)
(102)
(103)
(104)
(105)
There are two other points which deserve mentioning about agreement in Persian. The
first point is that number indexation of the subject of all verbs is sensitive to animacy
though this sensitivity has weakened through history. Today there is a tendency to index
plurality of the subject in the verb, irrespective of the animacy of the subject. Examples
(106)–(109) show the two available choices.
(106)
(107)
(108)
(109)
Sedighi (2005, 2011) proposes that the verbal agreement restriction for plural inanimate
subjects in Persian does not reside in syntax. She suggests a postsyntactic morphological
treatment within Distributed Morphology for the restriction in which there is a feature
mismatch between number and animacy resulting in deletion or impoverishment of the
number. She points out that although Modern Persian exhibits an optionality with respect
to verbal agreement of plural inanimate subjects, in sentences with an animate direct
object the non-agreeing form is preferred.
(110)
The second point is that there is a relict construction in Modern Persian which deserves
attention. This construction which I analyse as a non-canonical subject construction
contains a two-place verb. The verbs which appear in this construction are ‘(dis)liking’
verbs. Examples (111) and (112) substantiate this observation. In example (111), the first
argument is encoded by a pronominal enclitic and its second argument is a subordinate
clause. I propose that in this non-canonical subject construction, the first argument is
Oblique and the second argument is Direct. The first argument is the syntactic (namely
logical) subject of the predicate and the second argument (namely the subordinate
clause) is the morphological subject of the predicate.
(111) 15
In example (112), the first argument is encoded via an enclitic, the second argument is
marked with the preposition æz ‘from’, and the verb has the third-person singular
conjugation.
(112) 16
There is a group of predicates in Persian, semantically related to the ‘(dis)liking’ verbs,
which I call predicates of ‘feeling’. These predicates are exemplified in (113) and (114). In
these examples the predicate consists of an adjective and the verb budæn ‘to be’ or
šodæn ‘to become’. The adjective is the host for the pronominal enclitic which encodes
the only argument of the clause.
(113)
(114) 17
Following Dabir-Moghaddam (1997), Sedighi (2009, 2011) considers the psychological
state to be the subject of the sentence and calls such sentences psychological constructions. Adopting a
Minimalist framework, she argues that the psychological state is the theme
argument which induces agreement on the verb and the experiencer is the mental
location. By nature, the psychological state is an entity in third-person singular form,
inducing third-person singular agreement on the verb. Thus the assumption that the verb
does not induce agreement on these constructions is unfounded (see Chapters 8 and 15
for more discussion on psychological constructions).
To conclude, Modern Persian has grammaticalized a nominative-accusative system in
terms of agreement typology. This is the general state of the language in its
present phase. Putting this state in its historical context, it is worth mentioning
that in regard to agreement, this language shifted from a uniformly nominative-accusative
type in Old Persian (Kent 1953: 83), to a tense-sensitive split ergative type in
Middle Persian (Skjærvø 2009: 227), and back again to a nominative-accusative type in
New Persian. Meanwhile, Modern Persian is one of the few Iranian languages of
Iran which has a uniform agreement type. The majority of the Modern Iranian languages
show various patterns of tense-sensitive split agreement systems (see Dabir-Moghaddam
2013).
3.2.4 Morphological typology
Frommer (1981) has described Modern colloquial Persian as being morphologically
‘agglutinative’ (p. 8). He has not provided any justification for this typological
classification. In this section, I will argue that Modern Persian is basically an analytic
language.
First, Aristar (1991: 7, 31) has stated that, as adpositions are the most productive part in
the creation of structures in languages, adpositions in general and new
adpositions in particular are the best possible test for determining the typological
character of a language. The existence and creation of a large number of prepositions in
Modern Persian not only indicate that the language is, from a syntactic typological
perspective, a prepositional language, but also show that in terms of morphological
typology it is analytic. A number of prepositions of Modern Persian are listed below. This
list contains primary, secondary, and compound prepositions. The primary prepositions
are: æz ‘from’, bæra-y-e ‘for’, dær ‘in’, ba ‘with’, bi ‘without’, be ‘to’, ta ‘until’, bær ‘on’,
bedun-e ‘without’. The secondary and compound prepositions are: zir-e ‘under’, bala-y18 -e
‘over’, kenar-e ‘near’, dærun-e ‘inside’, ǰelow-e ‘in front of’, pæhlu-y-e ‘beside’, daxel-e
‘inside, into’, beyn-e ‘between’, næzd-e ‘near; with’, næzdik-e ‘close; near’, ruy-e ‘on’, tebq-
e ‘according to’, ætraf-e ‘around’, xareǰ-e ‘outside’, birun-e ‘out(side)’, pošt-e ‘behind’,
miyan-e ‘between’, suy-e ‘at; towards’, bæhr-e ‘sake; for’, payin-e ‘down; below’, mesl-e
‘like’, dowr-e ‘round; around; about’, væsæt-e ‘in the middle of’, piš-e ‘before’, sær-e ‘head
of; top of; at; tip of’, tævæssot-e ‘by’, bæna be ‘according to’, be onvan-e ‘as’, be sæbæb-e
‘because of’, be su-y-e ‘towards’, be xater-e ‘due to’, be næzær-e ‘according to’, be
mænzur-e ‘because of’, be væsile-y-e ‘by’, be læhaz-e ‘with respect to’, æz ǰaneb-e ‘on
behalf of’, ba voǰud-e ‘despite’, dær bare-y-e ‘about’, dær xosus-e ‘with respect to’, be
ræqm-e ‘despite’, ru be ruy-e ‘in front of’, dær bærabær-e ‘against’, be komæk-e ‘with the
help of’, bær ru-y-e ‘on’, be ellæt-e ‘because of’, bær færaz-e ‘on top of’, æz ruy-e
‘over’. It should be mentioned that the -e morpheme is a head marker linker in Persian.
The occurrence of this linker with the majority of the Modern Persian prepositions is a
clear clue to their historical source. These prepositions were head nouns in
genitive constructions (NG) which have been grammaticalized as prepositions. A number of these
prepositions are still used as nouns as well (e.g. xareǰ ‘abroad’, pošt ‘back’, komæk ‘help;
aid; contribution’, mænzur ‘purpose; aim’, sær ‘head’).
Second, in Old Persian the suffix -ya was the passive morpheme which used to be affixed
to active verbs (Kent 1953: 73, section 220, and 88, section 275). In Middle Persian this
suffix is -īh or -yh (Heston 1976: 161). This suffix is absent in the early New Persian texts
(ibid.). For more information on Early New Persian see Chapters 2 and 4. In New Persian,
passive is formed via the combination of the past participle of the main verb or an
adjective with the passive auxiliary šodæn ‘to become’ (e.g. koš-t-e šod-Ø ‘s/he was killed’
(lit. kill-PST-PTCP become.PST-3SG) and agah šod-æm ‘I was informed’ (lit. informed
become.PST-1SG)). ‘In Pahlavi, the verb šudan appears … always as a verb of motion …
However, šudan does not occur with adjectives in compound constructions … nor does it
occur with participles of transitive verbs’ (Heston 1976: 183). Its use as a motion verb in
Middle Persian is seen in šawed ‘he goes’ (Brunner 1977: 212). These observations
suggest that the emergence of šodæn as a passive auxiliary verb was an instance of
grammaticalization and, furthermore, that this new passive construction is a periphrastic
passive, which shows the tendency of this language towards the analytic type.
For more discussion on passive construction, see Chapters 7 and 15.
Third, the productivity of the causative suffix -ēn in Middle Persian is reduced to its use in
about seventy verbs, in the form of -an, in Modern Persian (e.g. dæv-id-æn ‘to
run’ (consisting of ‘run-PST-infinitive’) vs. dæv-an-(i)d-æn ‘to cause to run’; xor-d-æn ‘to
eat’ vs. xor-an-(i)d-æn ‘to feed’). This process of causative formation is not productive in
Modern Standard Persian. Instead, the periphrastic causative verbs such as baɁes šodæn
‘to cause’, mæǰbur kærdæn ‘to force’, gozaštæn ‘to let’ are used. This new tendency is
also another indication of the analytic nature of Modern Persian (causative forms are also
discussed in Chapters 2 and 7).
Fourth, a known fact about Modern Persian is that it contains about 250 simple verbs
both in spoken and written language. The general tendency is the formation of compound
verbs mainly consisting of a substantive and a light verb (also discussed in Chapters 3, 7,
8, 9, 10, 17, and 19). This is also a clue to the analytic morphology of the language. This
tendency can be observed in Middle Persian as well. Rastorgueva (1966), translated
into Persian by Shadan (1968), has reported that verbs in Middle Persian
are few in number and has added that this scarcity is compensated by the noun plus verb
combination and that in the majority of the cases kærtæn ‘do; make’ constitutes the
verbal part in the combination, e.g. zēn kærtæn ‘to saddle’, rošnih kærtæn ‘to light’,
azmayišn kærtæn ‘to test’ (Shadan 1968: 129). Interestingly, only about ten simple verbs
are formed from Arabic loan words (e.g. fæhm-id-æn ‘to understand’,
bælɁ-id-æn ‘to swallow; to devour’, ræqs-id-æn ‘to dance’). However, there are thousands
of substantives which are borrowed from Arabic and a large number of them are used as
the non-verbal constituent of the compound verbs in Modern Persian (e.g. motaleɁe
kærdæn ‘to study’ (lit. study-do), hæds zædæn ‘to guess’ (lit. guess-hit), tæqdim daštæn
‘to present’ (lit. presentation-have), tul kešidæn ‘to take time’ (lit. length-draw), tælaq
dadæn ‘to divorce’ (lit. divorce-give), tæhvil gereftæn ‘to take delivery of’ (lit. delivery-
take), qute xordæn ‘to float’ (lit. floating-eat), mærdud šodæn ‘to fail’ (lit. failure-become),
tæslim nemudæn ‘to surrender’ (lit. submission-do), be etmam resandæn ‘to finish;
to bring to an end’ (lit. to-end-carry), be xater aværdæn ‘to recall’ (lit. to-mind-bring),
tæhlil ræftæn ‘to be exhausted’ (lit. exhaustion-go), tæxfif yaftæn ‘to decrease’ (lit.
abatement-find)).
Fifth, Modern Persian uses the future auxiliary verb xastæn ‘will’, which conjugates for
person and number, as in xah-æm ræft ‘I will go’ and in examples (101)–(105). As was
pointed out with respect to these examples, this auxiliary is the grammaticalized form of
the verb xastæn ‘to want’. Example (115) illustrates the lexical use of this verb.
(115)
This grammaticalization process is already observed in Early New Persian as exemplified
in extract (116) below which is from tarix-e beyhæqi (fifth century AH/ eleventh century
AD):
(116)
The grammaticalization of an auxiliary to mark future tense is another indication of the
tendency of this language towards analyticity.19
Sixth, there is a grammaticalization process in progress in Modern Persian which also
supports the analytic morphology of this language. The lexical verb daštæn ‘to have’ is
also grammaticalized as an auxiliary verb to mark progressive aspect. Examples (117)
and (118) show these two uses respectively. This auxiliary requires the incomplete aspect
prefix mi- on the main verb. The auxiliary and the main verb are both conjugated for
person and number agreement.
(117)
(118)
Seventh, Greenberg (1954/1990) has proposed a quantitative method which has been
well received by typologists. He proposed that if the number of morphemes making up a
sentence is divided by the number of words used in that sentence and a number between
1 and 1.99 is obtained, then the language is analytic. If the resulting number is
between 2 and 2.99, then the language is inflectional provided that the morphemes
making up the words are indivisible. If the number is between 2 and 2.99 and the
morphemes can be separated, then the language is agglutinative. If the number is 3 or
more, then the language is polysynthetic/incorporating (Katamba 1993:
60). I applied this method to a randomly selected set of sentences in Persian and the
mean which I obtained was 1.56. This mean clearly indicates that this language is
analytic.
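The arithmetic behind this index can be made concrete with a short Python sketch. It is offered only as an illustration under stated assumptions: the function names are hypothetical, morpheme boundaries are taken to be marked by ‘-’ (affix) and ‘=’ (enclitic) in a pre-segmented transcription, and the ranges quoted above are implemented as half-open intervals (below 2, below 3, 3 or more). The two sample strings are forms cited in this chapter, not the randomly selected sentences behind the reported mean of 1.56.

```python
import re
from statistics import mean


def morphemes_per_word(sentence: str) -> float:
    """Morpheme-to-word ratio of one pre-segmented sentence.

    Assumption: words are separated by spaces, and morphemes inside a
    word are separated by '-' (affix) or '=' (enclitic).
    """
    words = sentence.split()
    if not words:
        raise ValueError("empty sentence")
    morpheme_count = sum(len(re.split(r"[-=]", word)) for word in words)
    return morpheme_count / len(words)


def classify(index: float, separable: bool = True) -> str:
    """Map an index to the morphological type given in the text."""
    if index < 1:
        raise ValueError("a word contains at least one morpheme")
    if index < 2:
        return "analytic"
    if index < 3:
        # Same range, two types: it depends on whether the morphemes
        # within words can be separated from one another.
        return "agglutinative" if separable else "inflectional"
    return "polysynthetic/incorporating"


# Toy sample: two forms cited in this chapter, already segmented.
sample = ["xah-æm ræft", "agah šod-æm"]
indices = [morphemes_per_word(s) for s in sample]
print(mean(indices), classify(mean(indices)))  # 1.5 analytic
print(classify(1.56))                          # analytic
```

Each sample string has three morphemes spread over two words, giving 1.5; the reported mean of 1.56 falls in the same range and is therefore classified as analytic.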
Finally, Darmesteter’s (1883: 117) statement that, among Indo-European
languages, New Persian has become more analytic than the rest and has become
structurally simpler is highly relevant to my conclusion that Modern Persian is
predominantly an analytic language.
3.3 Dialects
Persian is spoken in Iran, Afghanistan, and Tajikistan as the official and national
language. In Iran, the language is spoken in various regions throughout the country.
Thus, there are many Persian dialects. The Persian of Iran, known as Farsi, the Persian of
Afghanistan, called Dari, the Persian of Tajikistan, called Tajiki, Isfahani Persian, Gha’eni
Persian, Kashani Persian, Mashhadi Persian, Shirazi Persian, and Kermani Persian are a
number of Persian dialects currently spoken (see also Chapters 2, 11, 13, 14, and 19).
Perry (2005), in his A Tajik Persian Reference Grammar, states that ‘Tajik Persian, or Tajik
for short (zaboni tojīkī, zaboni forsii tojik), is the variety of New Persian used in Tajikistan
and parts of Uzbekistan, including the cities of Bukhara and Samarkand’ (p. 1). He adds
‘Tajik dialects may be divided broadly into two groups: Northwestern and Southwestern,
corresponding in rough topographical terms to the lowlands and highlands respectively of
the Oxus basin’ (p. 3). Windfuhr and Perry (2009), in a chapter devoted to ‘Persian and
Tajik’, focus on ‘Modern Standard Persian and Modern Standard Tajik’ (p. 416). They
point out the fact that ‘[b]oth evolved from Early New Persian’ (p. 416) and add:
Western Persian [i.e. Modern Standard Persian] has typologically shifted
differently from Modern Tajik which has retained a considerable number of Early
Eastern Persian features, on the one hand, and has also assimilated a strong
typologically Turkic component, on the other hand.
(Windfuhr and Perry 2009: 416)
A selected number of dialectal differences (phonetic, phonological, lexical, and/or
grammatical) between Modern Standard Persian (Persian for short) and Tajik Standard
Persian (Tajik for short) are mentioned in the following paragraphs.
(a) In possessive constructions, possession is expressed by Ezafe constructions: ‘Persian
māl-e, lit. ‘possession of’, Tajik az on-i ‘from that of’ followed by an independent pronoun
[or a noun].’ (Windfuhr and Perry 2009: 435). In example (119), which I quote directly
from Windfuhr and Perry (p. 435), the first line presents the Persian construction
and the second line the Tajik construction.
(119)
(b) Interrogatives (Windfuhr and Perry 2009: 438):
(120)
(c) Denominal verbs are verbs which ‘may be formed by suffixing regular -id to the noun
or nominal stem: nām-, nāmid- [Persian]/ nom-, nom-id- ‘name’ (<nom ‘name’)
[Tajik]’ (Windfuhr and Perry 2009: 447). Windfuhr and Perry (p. 447) note that the
denominal verb favt- / favt-id ‘pass away’ which is derived from the Arabic action noun
favt- ‘death’ is used in Tajik. I should add that this noun has no denominal verb
counterpart in Persian. In Persian the compound verb fowt kærdæn ‘to pass away’ (lit.
death-do) is in use. Windfuhr and Perry make it clear that ‘[i]n both Persian and Tajik this
procedure is no longer very productive.’ (p. 447).
(d) Derived causative verbs are formed by suffixation of -on to verb stems in Tajik.
Derived/ morphological causative formation in Persian was mentioned in section 3.2.4
(under point three). ‘While in Persian derived causativation is only partially productive, in
Tajik it is fully so’ (Windfuhr and Perry 2009: 448). The causative verbs only found in Tajik
are as follows (p. 448): dūz-/ dūxt- ‘sew’, dūz-on-/ dūz-on-id- ‘have something sewn’;
transitive denominal and adjectival verbs such as mukofot-on-, mukofot-on-id
‘reward’ (<mukofot ‘reward’), elektr-on-, elektr-on-id- ‘electricity, power’ (<elektr[ika]
‘electric’), and ‘causativation of transitive compound verbs with kun-, kard- ‘do, make’:
remont kun-on-, kun-on-id- ‘have (something) repaired’.
(e) ‘The pluperfect, or distant past, is formed from the past participle and the simple past
of bud-an “be”’ (Windfuhr and Perry 2009: 455). Items (121) and (122) are among the
examples that the authors have provided (pp. 455, 456).
(121)
(122)
(f) In both Persian and Tajik the pronominal enclitics which cross-reference the direct
object may be attached to the end of a transitive verb, e.g. did-æm=etan (Persian
pronunciation) ‘I saw you’. But ‘[i]n spoken Tajik these forms may further add -a, a reflex
of the (redundant) object marker -ro … : megiran-ša (me-gir-and-aš-ro) ‘they’ll catch
him’.’ (Perry 2005: 114–15). This option is absent in Persian.
(g) In Tajik ‘a pronominal enclitic attached to the thing possessed may also express
temporary or alienable possession by a person: safar pul-aš bošad agar, metiyam-ta ‘if
Safar has (any) money, I’ll give it you’ (‘if Safar to-him is money’)’ (Perry 2005: 115). This
option is also non-existent in Persian.
(h) Another peculiarity of Tajik is its use of ‘conjectural mood’ (Perry 2005: 243). This
mood ‘expresses an unsubstantiated conjecture or assumption. It is constructed upon the
suppleted form of the past participle … plus the suffix -agī, in three tenses: past, present,
and present progressive’ (Perry 2005: 243). As an illustration, I quote an example from
the ‘past conjectural’ (p. 244). In Perry’s words:
This tense takes two forms: (1) personal endings resembling an elided form of the
Independent present of ‘to be’, hastam, [‘I am’], etc., giving karda-gi-st-am, [‘I
suppose he did’] etc., the ‘Standard Form’; (2) personal endings the same as the
personal enclitic present of ‘to be’, giving karda-gi-am etc. the ‘Short Form’. The
3rd person singular karda-gi-st [‘he might have done’] is common to both forms.20
(Perry 2005: 244)
Windfuhr and Perry (2009) have also discussed the ‘conjectural mood’ (pp. 466, 467).
Below I quote their description and examples of the ‘present-future conjectural’ (p. 467):
‘Constructed with the imperfective prefix me-, this form expresses a conjecture about a
potential or a current (habitual or iterated) action’ (Windfuhr and Perry 2009: 467). They
then provide the following two examples, which I quote in (123) and (124). The only
noteworthy adjustment in the glossing that I have made is that I have substituted my
abbreviation for their gloss of the verbal prefix me-.
(123)
(124)
(i) Also noteworthy is Windfuhr and Perry’s observation about the ‘TAJIK -ak’ (p. 451): ‘In
some Tajik dialects, such as Varzobi, occur forms with an apparent reflex of the nominal
diminutive affix -ak, with affective connotations’ (Windfuhr and Perry 2009: 451). The
example they have immediately provided for this observation is quoted in (125).
(125)
Ravaghi and Aslani (2013 [1392]), in their zabān-e fārsi-y-e afghānestān (dari) ‘Afghan Persian
(Dari)’, devote a short section (pp. 39–43) to the ‘linguistic features’ of Afghan
Persian. Some of these features are listed below. As the examples are not
transcribed, I have added the transcriptions, glosses, and translations. (a) Derived
infinitive causatives are formed through the suffixation of -andæn/ -anidæn to the past
stem, e.g. šekæst-andæn ‘to break’, suxt-andæn ‘to burn’, rixt-andæn ‘to pour’ (p. 39). (b)
The construction in which the past participle is combined with the modal verb tævanestæn ‘can’ (p. 40),
as exemplified in (126).
(126)
I should mention that the counterpart of example (126) in Modern Standard Persian of
Iran (Farsi) is sentence (127).
(127)
(c) The aspect prefix mi- is attached to the preverbal constituent of verbs as exemplified
in (128) below (p. 40). In Modern Standard Persian (Farsi), this prefix attaches to the verb
stem itself and never to the preverbal constituent.
(128)
From among the Persian dialects spoken in Iran, I will briefly mention a few features of
two of them. Isfahani Persian, which was briefly discussed in Chapter 2, shows
grammatical differences with Standard Persian. For example, the Ezafe morpheme in
Isfahani Persian is pronounced -i after a consonant and is dropped after a vowel, e.g. pul-i
mæn ‘my money’ (lit. money-EZ I), baba mæn ‘my father’ (Kalbasi 1991 [1370]: 87). Also,
the third-person plural enclitic =ešun serves as subject agreement marker. Thus, an
intransitive verb might be doubly marked for subject agreement, first with a verbal suffix
and second with an enclitic. The following examples are cited from Kalbasi (p. 91): niss-
ænd=ešun ‘they are not here’ (lit. NEG.exist-3PL=3PL), ræft-ænd=ešun ‘they went’ (lit.
went-3PL=3PL), miya-nd=ešun ‘they come’ (lit. INCOMPL.come-3PL=3PL).
Zomorrodiyan (1989 [1368]) is a description of the Persian dialect of Gha’en, spoken in
Khorasan in the east of Iran. Two interesting grammatical differences between Gha’eni
Persian and Standard Persian are presented here. For plural formation, the suffix -u (cf.
Persian -ha ~ -a) is attached to the singular noun. When this suffix is added, some
phonetic alterations in the noun stem might take place. Examples (129)–(132) support
this observation (Zomorrodiyan 1989 [1368]: 42).
(129)
(130)
(131)
(132)
Also in Gha’eni the verbal negative prefix nɛ- attaches to the verbal aspect prefix be-, as
shown in examples (133) and (134) (Zomorrodiyan 1989 [1368]: 94). This aspectual prefix is
absent in Modern Standard Persian, where the verbal prefix be- is
the subjunctive marker. The use of the verbal prefix be- with the preterite in Gha’eni is
reminiscent of its use in early New Persian. MacKinnon (1977) has carefully addressed
the conditions for the occurrence of the New Persian verbal prefix bi-. Windfuhr and Perry
(2009: 451) have described the uses of be-/bi- in Persian and Tajik.
(133)
(134)
Phonologically, there are two highly noticeable differences between Gha’eni and Modern
Standard Persian (Zomorrodiyan 1989 [1368]: 16 and 17). The first difference is that /ɛ/
and /e/ are different phonemes in Gha’eni (as can be seen in examples (129) and (131)–
(134)). This phonemic distinction is non-existent in Modern Standard Persian. The second
difference is that length is distinctive in Gha’eni as witnessed by examples (135)–(142).
These examples show that /e/, /ē/, /o/, and /ō/ are distinct phonemes.
(135)
(136)
(137)
(138)
(139)
(140)
(141)
(142)
3.4 Summary
In this chapter, I addressed two general topics: (a) typological features of Persian and (b)
dialects of this language. The typological features which were dealt with were word
order, agreement, and morphological traits. After a brief review of the previous studies
within different frameworks on the Persian word order, I adopted the typological
framework of Dryer (1992, 2007, 2011, 2013a) and applied the twenty-four
typological parameters which he has presented in his research. My findings suggest that
Modern Persian has grammaticalized word order parameters of both OV- and VO-type
languages. I concluded that Modern Persian should be considered a mixed-type
language in terms of word order parameters. I argued that the evidence from the
previous stages of this language, more specifically Old Persian and Middle Persian, shows
that these languages had mixed word order parameters as well. I, like a number of other
researchers, assume that this observation could be explained as the
consequence of a strong language contact situation, most probably with Elamite. The
mixed word order of Persian has theoretical implications for both functional-typological
approaches and formal approaches to language. For typology, it suggests that mixed type
is a type (not necessarily a transitional stage) which can last for several centuries. For
formal linguistics, it suggests that a binary parametric view of language, more specifically
the head-complement parameter, is empirically untenable. The reason why this mixed
type has emerged diachronically can also have implications for theories of language
change. As for the agreement system of Modern Persian, I showed that this language has
grammaticalized a uniformly nominative-accusative type. In this respect, this language is
among the few Iranian languages which reveal this type. The majority of the other Iranian
languages of Iran are split in terms of agreement. I described the morphological typology
of Modern Persian as being analytic. This conclusion was supported by a number of
arguments. Finally, some of the phonological and grammatical characteristics of four
Persian dialects were enumerated. These dialects are Tajik Persian, Afghan (Dari)
Persian, Isfahani Persian, and Gha’eni Persian.
Acknowledgements
I take this opportunity to sincerely thank Anousha Sedighi and Pouneh Shabani-Jadidi for
their initiative to publish this volume, the Oxford Handbook of Persian Linguistics, a well-
deserved topic. They very professionally approached the specialists in the field of Persian
linguistics and were highly proficient in bringing this enterprise to a successful end. I
also express my sincere respect to Oxford University Press for undertaking, funding, and
supporting this project. I appreciate the very careful review of my chapter by the first
anonymous reviewer. I received very useful and valuable comments from the reviewer. I
am also thankful to the second anonymous reviewer of this chapter for her/his noteworthy
suggestions. I also received very helpful feedback from Oxford University Press. I am also
very grateful to Thomas Jügel for sharing his expertise with me in adding interlinear
glosses to my Middle Persian and Avestan data. However, any remaining shortcomings
are my own responsibility.
Notes:
(1) Ke ‘that’ is the Persian complementizer.
(2) This is a Persian translation of Mahmoodov (1981), originally published in Russian.
(3) I am indebted to Matthew Dryer for discussing word order issues with me, beginning
with my first visit to the Max Planck Institute for Evolutionary Anthropology (Leipzig)
in July and August 2001. He also kindly provided me with additional statistics from his huge
and fascinating database during that visit and my other visits until summer 2014.
(4) Frommer (1981) reports that the number of subjects postposed to the postverbal position in
his corpus is very small and that they are found only in his data of spontaneous colloquial
speech and very rarely in radio broadcasts, whereas not even a single token of a postposed subject is
observed in the dialogue parts of the stories of Sādeq Čubak (p. 140). Roberts (2009)
also reports that in his entire corpus of 16 narratives only a single occurrence of a
subject postposed to the postverbal position was found (p. 138).
(5) Frommer (1981) presents interesting data on direct objects postposed to the
postverbal position. In his corpus of spontaneous colloquial speech, he found a total
of 646 occurrences of direct objects (with and without the postposition -ra),
of which 27 were postverbal (21 with -ra and 6 without -ra).
In the dialogue parts of the stories by Sādeq Čubak, Frommer observed that out of
a total of 500 occurrences of direct objects only 3 (1 with -ra and 2 without it)
were postposed to the postverbal position (p. 143). Roberts (2009) noticed two
instances of a direct object in the postverbal position in a ‘spoken text’
and reports three instances of postverbal direct objects in a ‘written
text’ (pp. 139 and 140).
(6) The right reading is [an] ‘I.NOM’ for the Middle Persian Pahlavi (Thomas Jügel p.c.).
(7) Treating nī as a preposition is dubious. An appropriate example would be with the
preposition hačā ‘from’ which can be a preposition or a postposition (Thomas Jügel p.c.).
(8) Thomas Jügel kindly provided example (74).
(9) Starostin (2002) has addressed the genetic affiliation of the Elamite language. After he
expresses his dissatisfaction with the existing theories on the Elamite relations, he
proposes a more remote affiliation. He states:
[a]ll the critique presented above seems to convince me that not only is there not
enough evidence to establish a direct Elamo-Dravidian or Elamo-Afroasiatic at the
present time, but that it is simply a near-impossible task to establish a close
relationship of Elamite with any of the currently known families or macrofamilies.
(Starostin 2002: 151)
The author then reports on his lexicostatistic comparison of Elamite, on one hand, and its
most important neighbouring macrofamilies, on the other, and announces the following
position:
At this point, I would probably describe Elamite as a ‘bridge’ between Nostratic
and Afroasiatic, perhaps a sole remnant of an old subbranch of the global
‘Eurasian’ or ‘Boreal’ family that also includes Nostratic and Afro-Asiatic.
(Starostin 2002: 169)
(10) From a more general perspective, I found Henkelman’s (2008) view on the
contribution of Elam to Persian culture quite important and relevant. Henkelman (2008:
8) makes the following remarks:
The Elamite state re-emerged [after the Assyrian raids] and continued to exist
and, apparently, prosper, until the rise of the Persian Empire. This means that
throughout the pre-history of that Empire, Elam was a tangible entity, not a name
from the past. The Teispids and Achaemenids did not emerge from a cultural void,
but must have been influenced by the continuing cultural and political radiation of
Elam.
(11) Elsewhere, I have made the following remarks about the importance of agreement in
the Iranian languages:
The Iranian languages spoken in Iran show a very intriguing peculiarity. They all
contain a rich agreement system. So, I propose that agreement should be viewed
as a strong typological parameter in our characterization of these languages.
(Dabir-Moghaddam 2012: 36)
(12) Sedighi (2005) and (2010) are detailed studies on subject-predicate agreement and
agreement restrictions in Persian.
(13) In natural and less formal speech, this stem is pronounced [xun] ‘read’. This
phonological alternation takes place when the low back vowel [a] precedes the nasal
consonants [n]/[m].
(14) See the previous note.
(15) Colloquially pronounced [miyad].
(16) See the previous note.
(17) Colloquially pronounced [miše].
(18) The palatal glide [y] in Persian is a hiatus filler.
(19) A detailed description of the expressions of future in Classical and Modern Persian is
presented in Jahani (2008).
(20) The difference in the translation provided for the first-person singular form and the
third-person singular form is pointed out by Perry:
Since the speaker does not usually conjecture about his own actions in past or
present, the 1st person is not often encountered in practice, and will not be the
one translated in the paradigms below.
(Perry 2005: 243)
Mohammad Dabir-Moghaddam received his PhD in linguistics from the University of
Illinois at Urbana-Champaign in 1982. He is Professor of Linguistics at Allameh
Tabataba’i University (Tehran). He is a permanent member of the Academy of Persian
Language and Literature. He is the author of Theoretical Linguistics: Emergence and
Development of Generative Grammar, Studies in Persian Linguistics, Typology of
Iranian Languages (2 volumes), and a number of articles.
Phonetics
Golnaz Modarresi Ghavami

Subject: Linguistics, Phonetics and Phonology, Languages by Region
This chapter discusses the articulatory and acoustic properties of the sound system of
Standard Modern Persian. It starts with a brief review of early work on the sound system
of New Persian and its development into Modern Persian. The second section examines
consonants and vowels in Standard Modern Persian. In this section, issues such as place
and manner of articulation of consonants, Voice Onset Time and its importance in
distinguishing voiced and voiceless obstruents, the acoustics of glottal consonants,
sibilant and non-sibilant fricatives, and rhotics are discussed. The section on vowels
addresses vowel space, vowel length, and the acoustics of diphthongs in Standard
Modern Persian. The phonetics of the suprasegmental features of stress and intonation
are the topic of a final section in this chapter.
Keywords: phonetics, acoustics, consonants, vowels, stress, intonation, Modern Standard Persian, New Persian
4.1 Introduction
PHONETICS is the scientific study of speech sounds in terms of the way they are
produced (articulatory phonetics) and received (auditory phonetics), as well as their
acoustic properties (acoustic phonetics). Since it is purely physiological and physical in
nature, phonetics is considered by many to be a field of study related to linguistics, but
not a part of it. Within linguistics it is closely related to phonology, which is the study of
the way speech sounds are organized into systems in particular languages.
This chapter investigates the phonetic aspects of the sound system (phonology) of
Standard Modern Persian. In doing this, the phonemes of Persian are introduced and
their articulatory as well as acoustic properties are discussed. The phonetic aspects of
suprasegmental features of stress and intonation are also the topic of a final section in
this chapter.
Modern Persian is the standard language of Iran, itself a continuation of New Persian.
New Persian dates back to 800 AD and is described in terms of two main historical
periods: Early New Persian (eighth to twelfth century AD) and Modern Persian (thirteenth
century AD to the present). For more information about Early New Persian, refer to
Chapters 2 and 3. Before discussing the phonetics of the sound system of Standard
Modern Persian, a brief overview of the sound system of Early New Persian is presented.1
4.2 The sound system of Early New Persian

The sound system of Early New Persian consisted of the consonants presented in Table
4.1 (Sadeghi 1978 [1357]). (p. 92)
Table 4.1 Consonants of Early New Persian
             Bilabial   Labio-dental   Dental    Alveolar   Post-alveolar/Palatal   Velar      Glottal
Plosives     p b (β)                   t d (ð)                                      k g
Fricatives              f                        s z        š ž                     x ɣ xʷ     h
Affricates                                                  č ǰ
Nasals       m                                   n
Liquids                                          r l
Glides       w                                              y                       (w)
(Sadeghi 1978 [1357]: 129)
The consonantal system of Early New Persian consisted of twenty-two simple
consonants observed in seven places and six manners of articulation and one labialized
consonant. Nasir al-Din Tusi (1201–1274 AD) has also mentioned two more complex
consonants in Early New Persian, namely /ɣʷ/ and /gʷ/ as in /darɣʷiʃ/
‘dervish’, and /gʷas/ ‘enough’ (Samā’i 2009 [1388]: 65–7). Many of these consonants
are still present in Standard Modern Persian (Table 4.2). However, the system has
undergone the following developments:
a) The labio-velar glide /w/ is no longer present and has been replaced by the voiced
labio-dental fricative /v/.
b) The velar stops /k/ and /g/ have undergone a change of place in Standard Modern
Persian and have become more front (i.e. palatal).
c) The phoneme /ɣ/ ([ʁ]) has lost its phonemic status and has changed into an
allophone/free variant of a new phoneme, i.e. /ɢ/ (voiced uvular stop).
d) The labialized velar fricative /xʷ/ has merged with /x/. The other complex
consonants have also disappeared.
e) The allophonic variants of /b/ and /d/, namely [β] and [ð] respectively, are no
longer observed in the language.
f) The phoneme /ʔ/ has been added to the system. Ibn Duraid (837–933 AD) comments that
this consonant is only observed in initial position in Persian, unlike Arabic in which it
appears in all positions (Sadeghi 1978 [1357]: 120). In other words, [ʔ] seems to be a
boundary marker in Early New Persian, but in Modern Persian it has gained
phonemic status through heavy borrowing from Arabic.
(p. 93)
The vowel system of Early New Persian consisted of three short (/a,i,u/) and five long
(/ā,ī,ū,ē,ō/) vowels (Sadeghi 1978 [1357]: 129). In the development of the language to
Modern Persian, the short vowels have changed into /a,e,o/ respectively and the first
three long vowels appear as /ɑ,i,u/. The two long vowels /ē/ and /ō/ (referred to as ‘yāy-e
majhul’ (the unknown yā) and ‘vāv-e majhul’ (the unknown vāv) respectively) have also
undergone change. The long vowel /ē/ has merged with /i/ as in /šēr/ > /ʃiɹ/ ‘lion’
and /ō/ has merged with /u/ as in /dārōg/ > /dɑɹu/ ‘medicine’. This latter vowel
seems to have remained unchanged in pronunciation in certain words such as /rōšn/ >
[ɹoːʃan] ‘bright’ and /gōhr/ > [ɡoːhaɹ] ‘jewel’.
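The regular vowel correspondences described in this section can be summarized in a short Python sketch. The table and function names are illustrative and not part of the source. The sketch maps vowels only, so consonantal changes (for example the loss of final /g/ in /dārōg/) and the exceptional retention of /ō/ in words such as /rōšn/ are deliberately left out.

```python
import unicodedata

# Regular reflexes of Early New Persian vowels in Modern Persian (vowels only).
# Names are hypothetical; the correspondences follow the description in the text.
_RAW_MAP = {
    "a": "a", "i": "e", "u": "o",   # short vowels
    "ā": "ɑ", "ī": "i", "ū": "u",   # first three long vowels
    "ē": "i", "ō": "u",             # the 'majhul' vowels merge with /i/ and /u/
}
# Normalize keys so that precomposed and decomposed macron vowels are treated alike.
ENP_TO_MODERN = {unicodedata.normalize("NFC", k): v for k, v in _RAW_MAP.items()}


def modernize_vowels(word: str) -> str:
    """Replace each Early New Persian vowel with its regular modern reflex."""
    word = unicodedata.normalize("NFC", word)
    return "".join(ENP_TO_MODERN.get(ch, ch) for ch in word)


print(modernize_vowels("šēr"))    # šir
print(modernize_vowels("dārōg"))  # dɑrug (final g and r > ɹ are not modelled)
```

Keeping the mapping in a plain dictionary makes the exceptions easy to see: any word whose attested modern form differs from the output is a case the regular correspondences do not cover.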
4.3 Consonants and vowels of Standard Modern Persian
In Standard Modern Persian, briefly referred to as Persian in this section, all speech
sounds involve an egressive pulmonic airstream, i.e. they use the body of air that goes out
of the lungs in their production. Speech sounds are divided into two major groups of
consonants and vowels, depending on the amount of obstruction involved in their
production and the position they occupy in a syllable.
4.3.1 Consonants
Phonetically speaking, consonants are those sounds that are produced with a relatively
obstructed vocal tract. They appear in the margin of syllables, when considered from a
phonological point of view.
The sound system of Persian includes twenty-three consonants (Table 4.2) in nine places
and six manners of articulation. The individual places of articulation can be summarized
into (p. 94) four main places, taking into consideration the active articulator involved in
the production of consonants. The labial place of articulation includes bilabials (/p,b,m/)
and labio-dentals (/f,v/), i.e. those sounds that use the lower lip as the active articulator.
Coronals include dentals (/t,d/), alveolars (/s,z,n,ɹ,l/), and post-alveolars (/ʃ,ʒ,ʧ,ʤ/) which
involve the tip/blade of the tongue (corona) as the main articulator. Palatals (/c,ɟ,j/), velar
(/x/) and uvular (/ɢ/) consonants come under the dorsal place of articulation, as the
tongue body (dorsum) is the active articulator involved in their production. For more
discussion on dorsal sounds, see section 4.3.1.1 below and Chapter 5. The laryngeal place
of articulation includes sounds that involve the vocal folds and the space between them
known as the glottis. The two consonants /h/ and /ʔ/ are the laryngeal (glottal) consonants
of Persian.
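The grouping of individual places under four main places can be checked with a small Python sketch. The data structure and function name are illustrative assumptions; the phoneme lists are taken from the paragraph above, and the counts confirm the stated totals of nine places and twenty-three consonants.

```python
# Main place -> individual place -> phonemes, as listed in the text.
INVENTORY = {
    "labial": {"bilabial": ["p", "b", "m"], "labio-dental": ["f", "v"]},
    "coronal": {
        "dental": ["t", "d"],
        "alveolar": ["s", "z", "n", "ɹ", "l"],
        "post-alveolar": ["ʃ", "ʒ", "ʧ", "ʤ"],
    },
    "dorsal": {"palatal": ["c", "ɟ", "j"], "velar": ["x"], "uvular": ["ɢ"]},
    "laryngeal": {"glottal": ["h", "ʔ"]},
}


def main_place(phoneme: str) -> str:
    """Return the main place of articulation (active articulator class) of a phoneme."""
    for main, places in INVENTORY.items():
        for members in places.values():
            if phoneme in members:
                return main
    raise KeyError(phoneme)


n_places = sum(len(places) for places in INVENTORY.values())
n_consonants = sum(len(m) for places in INVENTORY.values() for m in places.values())
print(n_places, n_consonants)  # 9 23
print(main_place("ɢ"))         # dorsal
```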
Table 4.2 Consonants of Standard Modern Persian
                                   Labial                    Coronal                            Dorsal                     Laryngeal
                                   Bilabial   Labio-dental   Dental   Alveolar   Post-alveolar  Palatal   Velar   Uvular   Glottal
Obstruents   Plosives              p b                       t d                                c ɟ               ɢ        ʔ
             Fricatives                       f v                     s z        ʃ ʒ                      x                h
             Affricates                                                          ʧ ʤ
Sonorants    Nasals                m                                  n
             Central approximants                                     ɹ                         j
             Lateral approximants                                     l
Manners of articulation can be summarized under two main categories of obstruents and
sonorants. Obstruents (plosives, fricatives, and affricates) are produced with a
momentarily complete obstruction, a narrowing of the vocal tract to the degree that
friction is made, or a combination of the two. Obstruents can be voiced or voiceless, i.e.
the vocal folds vibrate or not in their production. Sonorants are articulated with a
relatively open vocal tract that makes spontaneous voicing possible. So the members of
this class are all voiced. This class includes nasals and approximants as far as consonants
are concerned.
Table 4.2 represents all consonants of Standard Modern Persian. In each box, following
the IPA convention, the symbol to the left represents a voiceless consonant and the one to
the right represents its voiced counterpart.
The articulatory as well as acoustic properties of these consonants are discussed below.
4.3.1.1 Obstruents
As mentioned, obstruents are produced with a momentarily complete obstruction or
narrowing of the vocal tract to the degree that friction is made, or a combination of the
two processes. This class includes plosives, fricatives, and affricates.
4.3.1.1.1 Plosives
Plosives, also known as (oral) stops, are produced with a complete obstruction of the
vocal tract, followed by a sudden release or explosion of air. Persian has eight plosives,
/b,p,t,d,ɟ,c,ɢ,ʔ/, seen in all four main places of articulation: (i) labial (/b,p/); (ii) coronal
(/t,d/); (iii) dorsal (/c,ɟ,ɢ/); and (iv) laryngeal (/ʔ/). The labial plosives /b/ and /p/ are made
with the lower and upper lips as the active and passive articulators respectively.
The coronal stops /t,d/ are dental in Persian, produced with the blade of the tongue as the
active articulator and the back surface of the upper front teeth as the passive articulator.
Dorsal stops involve the tongue body as the active and the palate as the passive
articulator. The palate itself can be divided into three main sections: hard palate, soft
palate (velum), and the uvula. Consonants produced at the hard palate are called palatal.
Likewise, consonants produced at the velum have a velar place of articulation and those
produced at the uvula are called uvular.
The dorsal plosives /c/ and /ɟ/ are palatal in Standard Modern Persian. They assimilate in
place of articulation with the following vowel in syllable-initial position as in [caɹi]
‘deafness’, [ɟaɹi] ‘baldness’, [kɑɹi] ‘working, active’ and [ɡɑɹi] ‘carriage’. In other
words, they acquire velar allophones syllable-initially before back vowels, but are
pronounced as palatal syllable-finally irrespective of the preceding vowel, in words such
as [nic] ‘good’, [pɑc] ‘clean, pure’, [diɟ] ‘cooking pot’, and [suɟ]
‘mourning’. (p. 95)
Figure 4.1 Waveform and spectrogram of a glottal stop in initial position
Besides the two dorsal plosives discussed above, there is another dorsal consonant in
Persian produced with the back of the tongue as the active and the uvula as the passive
articulator. This voiced consonant, represented as [ɢ] in IPA, appears in words such as
[ɢalam] ‘pen’ and [ʔotɑɢ] ‘room’. This consonant has a voiced uvular fricative (free)
variant in intervocalic position in words such as /ɑɢɑ/ [ʔɑʁɑ] ‘Mr.’ and /bɑɢi/
[bɑʁi] ‘remaining’ and a voiceless velar fricative allophone in the context of voiceless
consonants in words such as /bɑɢʧe/ [bɑxʧe] ‘small garden’ and /vaɢt/
[vax(t)] ‘time’. This latter allophone is sometimes seen intervocalically in colloquial
Persian in words such as /jaɢe/ [jaxe] ‘collar’ (see Chapters 2, 3, 5, 6, 10, 11, and 15
for more on colloquial speech).
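The distribution of the variants of /ɢ/ can be written as a small rule function. This is a sketch under stated assumptions, not an analysis from the source: the function name is hypothetical, "the context of voiceless consonants" is narrowed to a directly following voiceless consonant (as in both examples), the free intervocalic variant [ʁ] is always chosen, and the colloquial intervocalic [x] is not modelled.

```python
VOWELS = set("ieauoɑ")
# Voiceless consonants of Table 4.2, plus [k] as the velar allophone of /c/ (assumed set).
VOICELESS = set("ptckfsʃxhʧʔ")


def realize_uvular(phonemic: str) -> str:
    """Map /ɢ/ to [x] before a voiceless consonant and to [ʁ] between vowels."""
    out = []
    for i, seg in enumerate(phonemic):
        prev = phonemic[i - 1] if i > 0 else ""
        nxt = phonemic[i + 1] if i + 1 < len(phonemic) else ""
        if seg == "ɢ" and nxt in VOICELESS:
            out.append("x")      # voiceless velar fricative allophone
        elif seg == "ɢ" and prev in VOWELS and nxt in VOWELS:
            out.append("ʁ")      # voiced uvular fricative (free) variant
        else:
            out.append(seg)      # stop realization elsewhere
    return "".join(out)


print(realize_uvular("bɑɢi"))   # bɑʁi
print(realize_uvular("bɑɢʧe"))  # bɑxʧe
print(realize_uvular("ɢalam"))  # ɢalam
```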
A glottal stop, as its name indicates, is produced by the obstruction of the glottis and its
abrupt opening. Thus, it ideally shows up as a silent gap (representing the obstruction
phase) followed by a release burst (representing the release of the obstruction) in
spectrograms. Spectrograms are frequency (y-axis) by time (x-axis) representations of
speech signals. Intensity is represented by shades; darker sections are more intense.
Figure 4.1 shows the waveform and spectrogram of the glottal stop [ʔ] in word-initial
position in [ʔɑn] ‘that’. The release phase is indicated by an arrow on the waveform
and the obstruction phase is indicated by the flat horizontal line before the release.
Phonetically speaking, this consonant can appear as a stop in all positions in careful
speech, but can be replaced with a glottal trill (creaky voice) in intervocalic position
(Yazarlou 2014). Creaky voice is characterized by irregular and slow vibration of the vocal
folds. Figure 4.2 shows the waveform and spectrogram of a glottal trill in intervocalic
position in [xɑneʔaʃ] ‘his/her home’. The sparse and irregular glottal pulses
observed in the boxed section represent a glottal trill as a variant of [ʔ].
The glottal stop (as well as the glottal fricative discussed below) can be deleted syllable-
finally and result in the lengthening of the preceding vowel in words such as /baʔd/
[baːd] ‘after’, /ʤamʔ/ [ʤaːm] ‘addition’, and /maʔni/ [maːni] ‘meaning’.
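The deletion of a syllable-final glottal consonant with lengthening of the preceding vowel can be sketched as a string rewrite. The function name is hypothetical, and "syllable-final" is approximated as "not followed by a vowel"; the process is optional in speech, whereas the sketch always applies it.

```python
VOWELS = set("ieauoɑ")
GLOTTALS = set("ʔh")


def lengthen_after_glottal_deletion(phonemic: str) -> str:
    """Delete syllable-final /ʔ/ or /h/ and lengthen the closest preceding vowel."""
    out = []
    for i, seg in enumerate(phonemic):
        nxt = phonemic[i + 1] if i + 1 < len(phonemic) else ""
        syllable_final = nxt not in VOWELS  # followed by a consonant or the word edge
        vowel_positions = [j for j, s in enumerate(out) if s in VOWELS]
        if seg in GLOTTALS and syllable_final and vowel_positions:
            # Drop the glottal and mark length right after the last vowel seen so far.
            out.insert(vowel_positions[-1] + 1, "ː")
        else:
            out.append(seg)
    return "".join(out)


for word in ["baʔd", "ʤamʔ", "maʔni"]:
    print(word, "->", lengthen_after_glottal_deletion(word))
# baʔd -> baːd
# ʤamʔ -> ʤaːm
# maʔni -> maːni
```

Onset glottals are left untouched because they are followed by a vowel, so a form such as [ʔɑn] passes through unchanged.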
In terms of voicing, Standard Modern Persian has four voiced plosives, /b,d,ɟ,ɢ/, as
in /bɑm/ ‘roof’, /dɑm/ ‘trap’, /ɟɑm/ ‘step’, and /ɢam/ ‘sorrow’, and three voiceless
ones, /p,t,c/, as in /pɑc/ ‘pure’, /tɑc/ ‘grape’, and /cɑc/ ‘type of cookie’.
The glottal stop is also (p. 96) voiceless, as its production involves the same articulators
involved in voicing (the vocal folds) and these articulators cannot perform both actions
simultaneously.

Figure 4.2 Waveform and spectrogram of a glottal trill (boxed section)
Table 4.3 VOT (ms) of voiceless plosives in initial and intervocalic positions
               [p]   [t]   [c]   [k]
Initial        +66   +74   +93   +88
Intervocalic   +42   +54   +57   +45
(Nourbakhsh 2009 [1388]: 177)
All voiceless stops /p, t, c [k]/ are aspirated word-initially. This means that voicing of the
following vowel is delayed after the release of plosives and the air that escapes the oral
cavity comes out as an audible puff of air. The time lapse between the release of plosives
and the beginning of voicing for the following vowel is called Voice Onset Time (VOT)
measured in milliseconds (ms). Voiceless aspirated stops have an average VOT of 80 ms in
initial position. This places the voiceless plosives of Persian in the category of heavily
aspirated stops. The degree of aspiration decreases intervocalically to an average of 50 ms
(Table 4.3). This observation is consistent with Samareh (1999 [1378]), who introduces
partially aspirated allophones for voiceless stops in Persian in intervocalic position. This
half-aspirated allophone is considered by Samareh to be limited to unstressed intervocalic
position. However, statistical analysis by Nourbakhsh (2009 [1388]: 142–3) indicates that
there is no significant difference in VOT as a function of stress.
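As a quick arithmetic check, the averages quoted above can be recomputed from the values in Table 4.3. The variable and function names are illustrative; whether the source averaged exactly these four values is an assumption, but the results are consistent with the stated figures of roughly 80 ms and 50 ms.

```python
# VOT values (ms) for voiceless plosives, copied from Table 4.3.
VOT_VOICELESS = {
    "initial": {"p": 66, "t": 74, "c": 93, "k": 88},
    "intervocalic": {"p": 42, "t": 54, "c": 57, "k": 45},
}


def mean_vot(position: str) -> float:
    """Arithmetic mean of the VOT values for one position."""
    values = VOT_VOICELESS[position].values()
    return sum(values) / len(values)


print(mean_vot("initial"))       # 80.25
print(mean_vot("intervocalic"))  # 49.5
```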
Voice Onset Time for voiced stops in initial and intervocalic position is reported in Table
4.4 below. Negative VOT values indicate that voicing is present during the closure phase
of the plosive, while positive values indicate lack of voicing during this phase. As the numbers
indicate, [b, d, ɟ] are slightly voiced word-initially, while [ɡ] and [ɢ] are totally voiceless in
this position. Voicing is present in the production of [b,d,ɟ,ɡ] in intervocalic position as
(p. 97) the negative VOT values indicate. The uvular stop is realized as the fricative [ʁ] in
intervocalic position, making VOT irrelevant.
Table 4.4 Average VOT (ms) of voiced plosives in initial and intervocalic positions
               [b]   [d]   [ɟ]   [ɡ]   [ɢ]
Initial        –18   –6    –6    +4    +5
Intervocalic   –66   –58   –51   –66   –
(Nourbakhsh 2009 [1388]: 177)
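The sign convention for VOT can be made explicit with a few lines of Python over the values of Table 4.4. Names are illustrative; the missing intervocalic value for [ɢ] is stored as None, since the stop is realized as a fricative in that position.

```python
# Average VOT (ms) of voiced plosives, copied from Table 4.4.
VOT_VOICED = {
    "initial": {"b": -18, "d": -6, "ɟ": -6, "ɡ": 4, "ɢ": 5},
    "intervocalic": {"b": -66, "d": -58, "ɟ": -51, "ɡ": -66, "ɢ": None},
}


def voiced_during_closure(position: str) -> list:
    """Plosives with negative VOT, i.e. voicing present during the closure phase."""
    return [c for c, vot in VOT_VOICED[position].items() if vot is not None and vot < 0]


print(voiced_during_closure("initial"))       # ['b', 'd', 'ɟ']
print(voiced_during_closure("intervocalic"))  # ['b', 'd', 'ɟ', 'ɡ']
```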
Voiced stops become partially or fully devoiced when adjacent to voiceless consonants as
well as word-finally, as in [xɑb̥] ‘sleep’, [zud̥] ‘early’, [saɟ̥] ‘dog’,
[hab̥s] ‘custody’, [ʔasb̥] ‘horse’, [had̥s] ‘guess’, [ɢasd̥]
‘intention’, [diɟ̥ʧe] ‘small pot’, and [mesɟ̥aɹ] ‘coppersmith’ (Samareh
1999 [1378]).
To summarize, Standard Modern Persian has two sets of oral plosives: aspirated /pʰ,tʰ,cʰ/
and /b,d,ɟ,ɢ/. The latter are only fully voiced in intervocalic position and are mainly
voiceless phonetically in other positions. This observation has led some investigators
(Lazard 1972; Modarresi Ghavami 2007 [1386]; Nourbakhsh 2009 [1388]; Bijankhan 2013
[1392]) to conclude that the main distinguishing characteristic of the two sets of stop
consonants is not voicing, but aspiration in Persian.
4.3.1.1.2 Fricatives
Fricatives are sounds that are made with the narrowing of the vocal tract to the degree
that the passage of air results in random noise. The sound system of Persian includes
three voiced /v,z,ʒ/ and five voiceless /f,s,ʃ,x,h/ fricatives as in [vɑm] ‘loan’; [zaɹ]
‘gold’; [ʒaɹf] ‘deep’; [fɑm] ‘colour’; [saɹ] ‘head’; [ʃiɹ] ‘lion’;
[xeɹs] ‘bear’, and [hes] ‘sense’. In terms of place of articulation, /f,v/ are labial, /
s,z,ʃ,ʒ/ are coronal, /x/ is dorsal, and /h/ is laryngeal. These consonants can be divided into
two main groups of sibilants /s,z,ʃ,ʒ/ and non-sibilants /f,v,x,h/.
Sibilants have a hissing sound produced by air passing through the narrow passage made
by the tongue blade and the alveolar ridge/post-alveolar region. Acoustically, this hissing
sound shows as compact noise or concentration of energy in certain frequencies. In
Persian, the alveolar sibilants /s,z/ have a compact noise at frequencies above 5 kHz,
while the post-alveolars /ʃ,ʒ/ have a compact noise between 3 and 5 kHz (Sepanta 1998
[1377]; Bijankhan 2013 [1392]).
The non-sibilant oral fricatives /f,v,x/, on the other hand, are characterized by diffuse
noise, i.e. noise is seen in their spectrogram in all frequencies. Figure 4.3 below shows
the spectrogram of the oral voiceless fricatives of Persian in [fe], [se], [ʃe], and [xe]
sequences. The noise part of each sequence is shown in boxes. As Figure 4.3 shows, [f]
and [x] have diffuse noise seen in all frequencies, while [s] and [ʃ] have compact noise in
certain frequencies characteristic of sibilant fricatives.
The special characteristic of [x] is that the formants of the following/preceding vowel are
observable during its noise. This observation indicates that the tongue is in the position
required for the production of an adjacent vowel during the production of [x]. As such,
the (p. 98) fricative /x/ has three positional allophones: a front allophone (velar) that
appears before and after front vowels, a back allophone (uvular) that appears before
and after back vowels, and a third, post-velar allophone that appears post-consonantally
in clusters in words such as [neɹx] ‘rate’, [tabx] ‘cooking’,
[talx] ‘bitter’, etc. The velar allophone has a spectral peak at an average of 1,646 Hz in
the context of front vowels. The post-velar allophone has a spectral peak at an average of
1,421 Hz word-finally in clusters and the uvular allophone has a spectral peak at an
average of 785 Hz in the context of back vowels (Asadi 2012 [1391]). These results
indicate that the phoneme /x/ is mainly a velar/post-velar consonant rather than a uvular.
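To illustrate how the reported average spectral peaks separate the three allophones of /x/, the following toy sketch assigns a measured peak to the nearest reported average. This nearest-mean rule is an illustrative assumption and is not the method of the cited study; only the three averages come from the text.

```python
# Average spectral peaks (Hz) reported for the allophones of /x/.
PEAKS_HZ = {"velar": 1646, "post-velar": 1421, "uvular": 785}


def nearest_allophone(peak_hz: float) -> str:
    """Return the allophone whose reported average peak is closest to peak_hz."""
    return min(PEAKS_HZ, key=lambda name: abs(PEAKS_HZ[name] - peak_hz))


print(nearest_allophone(1600))  # velar
print(nearest_allophone(1450))  # post-velar
print(nearest_allophone(800))   # uvular
```

The velar and post-velar averages lie only 225 Hz apart, while the uvular average is more than 600 Hz below both.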
The glottal fricative [h], as its name indicates, is produced at the glottis. The vocal folds are
relatively open in the production of this consonant and the air that passes through (p. 99)
them produces friction noise that resonates in the cavities above the larynx. Since the
supralaryngeal cavity is either in neutral position or in the position for adjacent vowels
during the production of this consonant, the formants of adjacent vowels are observable
throughout the noise representing [h] (Figure 4.4).
Figure 4.3 Waveform and spectrogram of voiceless fricatives (boxed sections)
Figure 4.4 Glottal fricative [h] in intervocalic [ɑ__a] position. Vowel formants are indicated by dotted horizontal lines
Table 4.5 VOT values for Persian affricates (ms)
                   [ʤ]    [ʧ]
Initial Position   +28    +113
Medial Position    –47    +79
(Nourbakhsh 2009 [1388])
The glottal fricative [h] is basically a voiceless consonant as it involves a relatively open
glottis in its production; nevertheless, it can become voiced in intervocalic position in
words such as [baɦɑɹ] ‘spring’ and [ʧɑɦɑɹ] ‘four’. In the production of the
voiced glottal fricative [ɦ], part of the vocal folds vibrates, while part of them is open and
the air that passes through them creates noise. This glottal fricative, like its stop
counterpart [ʔ], can be deleted word/syllable-finally resulting in the compensatory
lengthening of the preceding vowel in examples such as /tehɹɑn/ ‘Tehran’ and
/dah#tɑ/ ‘ten units’, which are pronounced as [tʰeːɹun] and [daː tʰɑ] in colloquial
Persian respectively.
4.3.1.1.3 Affricates
Affricates involve a complex production in which a complete obstruction of air in the oral
cavity is followed by a gradual escape of air through a narrow cavity. Persian has two
affricates: /ʧ/ as in /ʧanɟɑl/ ‘fork’ and /ʤ/ as in /ʤomʔe/ ‘Friday’
produced in the post-alveolar region.
As with plosives, VOT is important in distinguishing /ʤ/ and /ʧ/. The VOT values for these
two consonants in initial and medial positions are given in Table 4.5. The numbers indicate
that [ʧ] is heavily aspirated in initial position and aspirated medially. The other member
of this contrast, i.e. [ʤ], is voiceless in initial position and voiced medially. As with plosives,
this observation leads to the conclusion that the main distinguishing feature of the two
consonants is aspiration rather than voicing.
4.3.1.2 Sonorants
Sonorants are articulated with a relatively open vocal tract that makes spontaneous
voicing possible. This class includes nasals, approximants, and vowels. Acoustically,
sonorants are characterized by formants, which are resonating frequencies of the air in
the vocal tract observable as dark horizontal bands in spectrograms. As the vocal tract is
more constricted in the production of sonorant consonants relative to vowels, their
resonating frequencies are less intense, hence consonantal formants are seen lighter in
shade compared to the surrounding vowels. (p. 100)
4.3.1.2.1 Nasals
Persian has two nasal consonants: /m/ and /n/. The oral cavity is closed in the production
of nasal consonants and air escapes through the nasal cavity. Acoustically, these sonorant
consonants are characterized by weaker formants compared to vowels (Figure 4.5). The
first three formants of [m] are around 250, 1,000, and 2,700 Hz. The same formants are
reported to be around 250, 1,500–1,600, and 2,800–3,000 Hz for [n] in Persian (Sepanta
1998 [1377]: 89–90).
Figure 4.5 Spectrogram of nasals
The bilabial nasal /m/ assimilates in place of articulation only with the following labio-
dental fricatives as in /amvɑl/ [ʔaɱvɑl] ‘properties’ and /samfoni/
[saɱfoni] ‘symphony’. The alveolar nasal shows assimilation with all places of
articulation, as in /man baɹ miɟaɹdam/ [mam baɹ miɟaɹdam] ‘I will
return’; /anvaʔ/ [ʔaɱvɑ] ‘types’; /ɢand/ [ɢan̪d̪] ‘sugar cube’; /pance/
[paɲce] ‘fan’; /anɟuɹ/ [ʔaŋɡuɹ] ‘grape’; and /menɢɑɹ/ [meɴɢɑɹ] ‘beak’.
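The place assimilation of nasals described above can be sketched as a lookup on the following consonant. The names and the segment-to-place table are illustrative assumptions: [k] and [ɡ] are included as the surface velar allophones of /c/ and /ɟ/, word boundaries are written as spaces and skipped, and the dental diacritic is added to the nasal only, not to the following stop.

```python
# Place of the consonant that triggers assimilation (assumed, after Table 4.2).
PLACE_OF = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labio-dental", "v": "labio-dental",
    "t": "dental", "d": "dental",
    "c": "palatal", "ɟ": "palatal",
    "k": "velar", "ɡ": "velar",
    "ɢ": "uvular",
}
# Nasal produced at each place; the dental nasal is n + combining bridge below.
NASAL_AT = {
    "bilabial": "m", "labio-dental": "ɱ", "dental": "n\u032a",
    "palatal": "ɲ", "velar": "ŋ", "uvular": "ɴ",
}


def assimilate_nasals(phonemic: str) -> str:
    """/n/ takes the place of a following consonant; /m/ only becomes labio-dental."""
    out = []
    for i, seg in enumerate(phonemic):
        nxt = phonemic[i + 1:].lstrip()[:1]  # next segment, skipping word boundaries
        place = PLACE_OF.get(nxt)
        if seg == "n" and place is not None:
            out.append(NASAL_AT[place])
        elif seg == "m" and place == "labio-dental":
            out.append("ɱ")
        else:
            out.append(seg)
    return "".join(out)


print(assimilate_nasals("pance"))             # paɲce
print(assimilate_nasals("menɢɑɹ"))            # meɴɢɑɹ
print(assimilate_nasals("samfoni"))           # saɱfoni
print(assimilate_nasals("man baɹ miɟaɹdam"))  # mam baɹ miɟaɹdam
```

The input is taken to be a string in which any palatal-to-velar alternation has already applied, so a form like /anɟuɹ/ would need to be supplied as "anɡuɹ" to yield the velar nasal.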
4.3.1.2.2 Approximants
In the production of approximants, articulators are close together to the degree that no
friction is made by the air that passes through. Persian has /l,ɹ,j/ as approximants. /l/ is a
lateral approximant and /ɹ,j/ are central approximants.
The lateral approximant /l/ is produced by making a complete closure at the alveolar
ridge with the tip/blade of the tongue, keeping the sides of the oral cavity open. As with other
sonorants, this consonant is also characterized by weak formant structure (Figure 4.6).
The first three formants of this consonant are reported to be around 250, 1,500–1,600,
and 2,450 Hz by Sepanta (1998 [1377]: 92); 321, 1,532, and 2,588 Hz by Bijankhan (2013
[1392]: 199); and 296, 1,944, and 2,547 Hz by Alinezhad and Hosseinibalam (2013
[1392]: 173). /l/ is devoiced word-finally. When this consonant appears after a voiceless
consonant in word-final position, it is produced as a voiceless fricative.
The glide /j/ is produced by raising the front of the tongue towards the hard palate and
pushing the air through the narrowing with no friction. This consonant is characterized
acoustically (p. 101) by weak formants with frequencies around 275, 2,100, and 2,650 Hz
(Sepanta 1998 [1377]: 94). It has the same formant frequencies as the high front vowel [i]
(Figure 4.7).

Figure 4.6 Waveform and spectrogram of [le]
Figure 4.7 Waveform and spectrogram of [je]
Different types of rhotic (r-like) consonants are observed in the languages of the world.
IPA has eight symbols for different types of rhotics. These sounds are observed in three
places (alveolar, retroflex, and uvular) and four manners of articulation
(trill, tap/flap, approximant, and fricative). Trills involve an articulation in which one
articulator is held loosely near another so that the flow of air between them sets them in
motion, alternately sucking them together and blowing them apart. This kind of
production is seen in some forms of Scottish English represented as /r/ in IPA. A tap is a
sound made by a rapid movement of the tip of the tongue upward to the alveolar ridge, then
returning to the floor of the mouth along the same path. This consonant is observed as an
allophone of /t,d/ in American English in words such as ‘better’ and ‘butter’ pronounced
as [bɛɾɚ] and [bʌɾɚ] respectively, (p. 102) with a tap/flap as the intervocalic consonant.
An approximant is an articulation in which one articulator is close to another, but without
the tract being narrowed to such an extent that a turbulent airstream is produced
(Ladefoged and Johnson 2011). Many forms of English have an approximant rhotic
consonant represented as an alveolar /ɹ/ or retroflex /ɻ/ in IPA.
The only rhotic consonant of Persian is considered by the majority of linguists and
grammarians to be an alveolar trill with a tap allophone in intervocalic position and a
voiceless fricative allophone in word-final position. This latter allophone is also
observed before voiceless consonants in clusters (Nye 1954: 15; Jazayeri and Paper 1961:
29; Kord-Zafaranlu-Kambuziya 2006 [1385]). Others such as Samareh (1999 [1378]) and
Majidi and Ternes (1999: 124–5) have introduced an approximant allophone for this
consonant as well. Acoustic investigation of this consonant (Shekari and Nourbakhsh
2012 [1391]; Modarresi Ghavami 2016 [1394]) has shown that this consonant is basically
an approximant in Persian, although tap allophones and voiceless fricative allophones are
also observed. Trills are rarely observed and their occurrence is limited to positions of
emphasis. Figure 4.8 shows a spectrogram of [ɹ] in initial position. As the spectrogram
indicates, [ɹ] has the same formant structure as the following vowel except that it has less
intense formants (indicated by a lighter colour) in comparison. This formant structure
shows that this consonant is vowel-like in its production (an approximant) and not a trill
or tap. Another feature of this consonant is that its third formant (F3) is lower in
frequency compared to the adjacent vowel.
Figure 4.8 Waveform and spectrogram of [ɹe]
4.3.2 Vowels
Vowels are produced with no obstruction in the vocal tract. Standard Modern Persian has
six simple and six complex vowels.
4.3.2.1 Simple vowels (monophthongs)

Standard Modern Persian has six simple vowels [i, e, a, u, o, ɑ]. This vowel system
includes three front and three back vowels, [i, e, a] and [u, o, ɑ] respectively. In terms of
height, [i, u] are high, [e, o] are mid, and [a, ɑ] are low. (p. 103)
4.3.2.1.1 Vowel quality

Acoustically, vowels are characterized by strong formants seen as dark horizontal bars in
the spectrographic representation of vowels. The spectrogram in Figure 4.9 shows the
formants of the six simple vowels of Persian produced by a male speaker. The first four
formants are represented as horizontal bands for each vowel.
Figure 4.9 Waveform and spectrogram of Persian Simple Vowels
Each vowel has its own specific formant pattern which distinguishes it from other vowels.
The frequencies of the first three formants of the six vowels of Persian are given in
Figure 4.10 for female speakers and in Figure 4.11 for male speakers.
Figure 4.10 Vowel formant frequencies (Female speakers)
The first formant (F1) is sensitive to vowel height. The higher the vowel, the lower the
frequency of the first formant. As the values for the first formant in Figures 4.10 and 4.11
show, (p. 104) F1 frequency increases as the front vowels [i,e,a] and the back vowels
[u,o,ɑ] become more open. The second formant (F2) is sensitive to place of articulation.
Front vowels have high F2 frequencies, while back vowels are characterized by low F2
values.
The vowel space of female and male speakers is seen in Figure 4.12. Acoustically, the
vowel space of female speakers is larger than that of male speakers. This is due to the fact
that the female vocal tract is smaller and its resonant frequencies are hence higher.
Figure 4.11 Vowel formant frequencies (Male speakers)
4.3.2.1.2 Vowel quantity

Up to the sixteenth century AD the vowel system of Early New Persian was the same
system observed in Middle Persian which included five long (/ī, ū, ā, ē, ō/) and three short
(/i, u, a/) vowels. The short vowels /i/ and /u/ have changed into /e/ and /o/ respectively in
Standard Modern Persian. Moreover, the long vowels /ā/, /ē/ and /ō/ have been replaced
by /ɑ/, /i, ey/ and /u, ow/ respectively (Sadeghi 1978 [1357]: 129–33). Thus, in the
development of the Standard Modern Persian vowel system, the quantitative distinction
between the vowels has been replaced by a distinction in quality between /i/ and /e/, /u/
and /o/, and /a/ and /ɑ/.
Despite this development, vowels are still divided into two groups by many linguists:
long /i,u,ɑ/ and short /e,o,a/ vowels. This division is reflected in writing by the fact that
long vowels are generally represented by letters and short vowels by diacritics,2 which
are omitted in writing after children master the writing system in the first grade. This
distinction is also important in Persian metrics. There is a difference between the
behaviour of short and long vowels in the phonology of Persian: short unstressed vowels
can be deleted in rapid speech, while long vowels are almost never deleted (Lazard 1957:
12–13); in monosyllabic words with CVCC syllable structure, where a glottal consonant
appears in the cluster or the final C is a nasal or a liquid, only short vowels appear in V
position; short vowels undergo compensatory (p. 105) lengthening, while long vowels are
never lengthened (Kord Zafaranlu Kambuziya and Hadian 2009 [1388]).
Figure 4.12 Vowel space of Female (dots) and Male
(squares) Persian Speakers
Table 4.6 Duration (ms) of Persian vowels
         [i]   [e]   [u]   [o]   [ɑ]   [a]
Female   190   179   186   173   226   198
Male     169   159   163   145   203   177
(Modarresi Ghavami 2015 [1393])
Some investigators (Sokolova et al. 1952; Hodge 1957; Lazard 1957; Rastorgueva 1964)
have preferred to refer to long vowels as ‘stable’ and short vowels as ‘unstable’, due to
the observation that long vowels are never deleted or shortened, while short vowels
undergo both processes. Moreover, early acoustic investigations had shown that the
length distinction between the two sets has disappeared in Standard Modern Persian
except in open unstressed syllables (Sokolova et al. 1952; Hodge 1957; Mohammadova
1974). However, recent work on the duration of vowels in Persian has shown that the
duration distinction is also present in closed stressed syllables (Modarresi Ghavami 2015
[1393]). Table 4.6 shows the duration of vowels in closed stressed syllables. As the
numbers indicate, the long vowels [i,u,ɑ] are consistently longer than [e,o,a] respectively.
The low back vowel [ɑ] is the longest and the mid (p. 106) back vowel [o] is the shortest
vowel in Persian. Moreover, vowels are consistently longer in the speech of women
compared to men.
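The observations drawn from Table 4.6 can be checked with a few lines of code. The following is a minimal sketch, not part of the original study: the numbers are copied from Table 4.6, while the dictionary layout and the function names are illustrative assumptions.

```python
# Mean vowel durations (ms) in closed stressed syllables, copied from Table 4.6.
DURATION_MS = {
    "female": {"i": 190, "e": 179, "u": 186, "o": 173, "ɑ": 226, "a": 198},
    "male":   {"i": 169, "e": 159, "u": 163, "o": 145, "ɑ": 203, "a": 177},
}

# Long/short pairings as stated in the text: [i, u, ɑ] against [e, o, a] respectively.
LONG_SHORT_PAIRS = [("i", "e"), ("u", "o"), ("ɑ", "a")]


def long_short_differences(group):
    """Return the duration difference (ms) for each long/short pair in one speaker group."""
    table = DURATION_MS[group]
    return {f"{lng}-{sht}": table[lng] - table[sht] for lng, sht in LONG_SHORT_PAIRS}


def extremes(group):
    """Return (longest vowel, shortest vowel) for one speaker group."""
    table = DURATION_MS[group]
    return max(table, key=table.get), min(table, key=table.get)


for group in DURATION_MS:
    print(group, long_short_differences(group), extremes(group))
```

Under this pairing every difference is positive for both groups, and [ɑ] and [o] come out as the longest and shortest vowels respectively, in line with the description above.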
Phonetic context can affect the duration of vowels. For example, all vowels are longest in
CVCC syllables, stressed vowels are longer compared to their unstressed counterparts,
and all vowels are lengthened before voiced codas (Samareh 1999 [1378]).
The duration of short vowels increases when a following syllable-final glottal consonant,
i.e. /ʔ, h/ is deleted, in a process known as compensatory lengthening. Examples are
/daʔvɑ/ ‘fight’ and /tehɹɑn/ ‘Tehran’, which are pronounced as [daːvɑ] and
[teːɹun] respectively in colloquial Persian. This phonetic increase in duration results in
phonetically distinct pairs such as /baʔd/ [baːd] ‘after’ versus /bad/ [bad] ‘bad’
and /daʔvɑ/ [daːvɑ] ‘fight’ versus /davɑ/ [davɑ] ‘drug, medicine’. Length also plays a
distinctive role between European loanwords such as [mɑ̆dɑm] ‘madam’ and [kŭɹi]
‘(Marie) Curie’ on the one hand and native words or Arabic loans such as [mɑdɑm] ‘as long
as’ and [kuɹi] ‘blindness’ on the other. In such examples, the vowels in European loanwords
seem to be shorter compared to native words or words borrowed from Arabic.
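Compensatory lengthening as described above can be stated as a small string-rewriting rule. The sketch below is an illustration under simplifying assumptions: each segment is one character, a glottal counts as syllable-final when it is followed by a consonant or by the end of the word, and only the lengthening itself is modelled (the colloquial change of [ɑ] to [u] in [teːɹun] is left out). The function name is hypothetical.

```python
import re

SHORT_VOWELS = "eoa"
ALL_VOWELS = "ieaouɑ"

# A short vowel, then a glottal (/ʔ/ or /h/) that is followed by a consonant
# or by the end of the word, i.e. a glottal in syllable-final position.
_LENGTHENING = re.compile(rf"([{SHORT_VOWELS}])[ʔh](?=[^{ALL_VOWELS}]|$)")


def compensatory_lengthening(phonemic):
    """Delete a syllable-final glottal after a short vowel and lengthen that vowel."""
    return _LENGTHENING.sub(lambda m: m.group(1) + "ː", phonemic)


for form in ["daʔvɑ", "baʔd", "tehɹɑn", "bad", "davɑ"]:
    print(form, "->", compensatory_lengthening(form))
```

Forms without a syllable-final glottal, such as /bad/ and /davɑ/, pass through unchanged, which is what keeps them distinct from [baːd] and [daːvɑ].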
4.3.2.2 Complex vowels (diphthongs)

Diphthongs are sequences of vowels in a syllable. Persian has six complex vowels: [ei] as
in [cei] ‘when’, [ai] as in [hai] ‘alive’, [ui] as in [ɹui] ‘zinc’, [oi] as in
[xoi] ‘(the city of) Khoi’, [ɑi] as in [ʧɑi] ‘tea’, and [ou] as in [ʤou]
‘barley’. The spectrogram in Figure 4.13 shows these six complex vowels. As seen, the
formants for each diphthong change in frequency during the production of these vowels.
Since the second member of each diphthong is a high vowel, the frequency of the first
formant reduces from the beginning to the end of the diphthong, as the first formant is
sensitive to vowel height. For the complex vowels that end in [i], the second formant has
a rising pattern, as [i] is a front vowel characterized by high F2 frequencies. The only
diphthong that ends in [u], i.e. [ou] shows a reduction in F2 frequency, as [u] is a back
vowel characterized by low F2 frequencies.
(p. 107) Acoustic investigation of the diphthong [ou] has shown that this vowel is mainly
pronounced as a long vowel which can be represented as [oː] (Modarresi Ghavami 2010
[1389]). The diphthong variant of this vowel is seen in a minority of cases in careful
exaggerated speech. Figure 4.14 shows the diphthongal vowel space of female (dots) and
male (squares) speakers of Persian.
Figure 4.13 Waveform and spectrogram of Persian Complex Vowels
Figure 4.14 Vowel space of Persian diphthongs
4.4 Suprasegmentals
Suprasegmentals are those aspects of speech that involve more than one segment. These
include stress, tone, and intonation. The four acoustic properties of duration, intensity,
frequency, and spectral characteristics, which are respectively known as length, loudness,
pitch, and quality in the perceptual domain, are used to realize the suprasegmental
features of stress, tone, and quantity in the lexical domain and intonation in the
non-lexical domain (Hirst and Di Cristo 1998: 7). The two most relevant suprasegmental
features in Persian are stress and intonation, discussed below. (p. 108)
4.4.1 Stress
Stress is the perceived prominence of a syllable compared to other syllables in the string
of speech. Stressed syllables can be heard as louder, longer, and higher in pitch. In
certain languages such as English, vowels are fully realized in stressed syllables, but are
reduced to other vowels such as a schwa in unstressed ones. Earlier studies have
indicated that Persian vowel space gets smaller when vowels are unstressed compared to
when they are stressed (Ghara’ati 2010 [1389]; Alinezhad 2012 [1391]). In other words,
vowels tend to shift to a central position similar to the position for a schwa in unstressed
syllables. However, further acoustic investigation shows that keeping all variables
(including word length) constant except for stress, vowel space does not change as a
function of stress (Modarresi Ghavami 2014 [1392]), indicating that vowel quality does
not change in unstressed syllables compared to stressed syllables.
Investigations of the acoustic correlates of stress in Persian (Sepanta 1975 [1354]; Natel-
Khanlari 1988 [1367]: 150–1; Vahidiyan Kamyar 2000 [1379]: 23–4; Mousavi 2007 [1386])
have found frequency (pitch) to be the main acoustic correlate, while the involvement of
duration and intensity has been found to be minimal. Gender seems to play an important
role in this issue, as women have been found to use duration to make a syllable more
prominent, while frequency seems to be the main acoustic correlate of stress in the
speech of men (Modarresi Ghavami 2014 [1393]).
4.4.2 Intonation
Intonation is defined as patterns of pitch changes used by speakers to convey linguistic as
well as pragmatic meaning (see also Chapter 6). As such, the study of intonation from a
phonetic point of view involves the investigation of changes in fundamental frequency
(i.e. the frequency of vocal fold vibration) as the acoustic correlate of pitch. For example,
in Figure 4.15 the variations in fundamental frequency (F0) in the phrase [bɑjad zudtaɹ
baɹɟaɹdand] (p. 109) ‘they must return immediately’ are shown as a pitch contour
superimposed on the spectrogram of this phrase. Two prominent peaks are observable in
this figure: a less prominent one on the second syllable of [bɑjad] ‘must’ and a more
prominent one on the second syllable of [zudtaɹ] ‘immediately’. Speakers can use
pitch to make a word more prominent in a linguistic unit such as a phrase, clause, or
sentence. The speaker of the sentence represented in Figure 4.15 has made two words
more prominent in the phrase in order to highlight the importance of departing
immediately. Using intonation to highlight words important in conveying the intended
meaning is called tonicity. Tonic peaks fall on the stressed syllable of intended words.
Tonicity is dependent on the intended linguistic and pragmatic meaning and is not
predictable. The tonic peak could have fallen on the first syllable of
[baɹɟaɹdand] ‘return’ (3rd pl.) to highlight ‘returning’ rather than any other action.
At the same time, we can see that the pitch contour falls at the end of the phrase in
Figure 4.15. This is the typical intonation pattern of a statement in Persian. The use of
intonation to mark the beginning and end of phrases, clauses, and sentences is called
tonality.
Figure 4.15 Waveform, spectrogram, and the pitch contour of a statement
Intonation can also be used to convey definiteness, incompleteness, objection, irritation,
etc. without changing the lexical meaning. This use of intonation comes under the heading
of tone. Figure 4.16 shows the waveform and pitch contour of the word [bale] ‘yes’. The
first instance on the left represents the word in its neutral condition, i.e. expression of
consent and agreement marked by a falling intonational pattern. The second instance with a
rising pattern represents a question form of the word meaning ‘what?’ with a touch of
irritation on the part of the speaker. The third case, which has a relatively long and
prominent first syllable and shows a peak on the second syllable conveys definite
agreement and consent. The last case still means ‘yes’ but conveys a sense of reluctance
and frustration on the part of the speaker due to its high-falling pitch contour pattern.
Figure 4.16 Waveform, spectrogram, and pitch contour of [bale] ‘yes’
Intonation plays many important roles in language: grammatical, pragmatic, attitudinal,
sociological, psychological, etc. which are topics discussed in phonology.
(p. 110) 4.5 Summary

This chapter was an overview of the phonetic aspects of the sound system of Modern
Standard Persian. It started with a brief introduction to the sound system of Early New
Persian spoken between the eighth and twelfth centuries AD and the development of this
system into what we observe today in Modern Standard Persian. The articulatory as well
as acoustic properties of Modern Standard Persian consonants were introduced in two
main sections on obstruents (plosives, fricatives, and affricates) and sonorants (nasals
and approximants). In discussing consonants, the place and manner of articulation of
Persian consonants, as well as issues of voicing and VOT were introduced. This section
was followed by a review of the acoustics of simple and complex vowels of Persian.
Simple vowels differ not only in quality; a difference in quantity between short and long
vowels is also observable acoustically. Phonetically complex vowels were
also investigated acoustically, showing that at least five complex vowels are observable in
Modern Standard Persian. The last section of the present chapter included a brief
discussion on the acoustics of the suprasegmentals of stress and intonation in Persian.
(p. 111)
Notes:
(1) The system used in phonetic and phonemic transcription of sound segments and
suprasegmental features in this article is that of the International Phonetic Alphabet
(IPA). The sound system of Early Modern Persian, proper names of Iranian scholars, as
well as the reference section are transliterated.
(2) Short vowels are also represented by the letters و and ه word-finally as in [to],
[na], and [ɡoɹbe].
Golnaz Modarresi Ghavami
Golnaz Modarresi Ghavami is a faculty member of the Linguistics department at
A.T.U., Tehran, Iran. She teaches general phonetics, introductory and advanced
phonology, Persian and English phonetics and phonology, acoustic phonetics, and
historical linguistics. Her research is mainly focused on the phonetics and phonology
of Persian. She is the author of Phonetics: The Scientific Study of Speech (2011,
2015) and A Glossary of Phonetics and Phonology (2015), both in Persian.
Phonology
Mahmood Bijankhan

This chapter reviews the organization of sounds in the contemporary Persian language
and discusses the issues in phoneme inventory, syllable structure, distinctive features,
phonological rules, rule interaction, and prosodic structure according to the framework of
derivational phonology. Laryngeal states responsible for contrast in pairs of
homorganic stops and fricatives are different in Persian. Phonological status of
continuancy is controversial for the uvular obstruent. Glottal stop is distinctive at the
beginning of loan-words while not at the beginning of the original Persian words.
Phonotactic constraints within the codas of the syllables violate the sonority sequencing
principle. Glottals are moraic in the coda position. Feature geometry is posited on the
sound distinctions and patterns within phonological processes. Eleven phonological rules
are explained to suggest natural classes. Interaction of some rules is derived. Laryngeal
conspiracy, syllable structure, and intersegmental processes are analysed according to
interaction of ranked violable constraints of optimality theory.
Keywords: phoneme, syllable, aspiration, voicing, glottal stop, phonological rules
5.1 Introduction
THIS chapter investigates the phonology of contemporary Persian according to the formal
and colloquial speech data as spoken slowly by literates in the Tehrani dialect. First of all,
a phoneme inventory is posited using identical and complementary distributions. Then
Persian syllable structure is explained on the basis of the sonority hierarchy. Phonological
processes in Persian are tested against feature geometry in order to posit the hierarchical
structure of the phonetic features. Afterwards, Persian phonological rules are discussed
as evidence for natural classes. Prosodic features of length and stress are also discussed.
Finally, several appropriate processes are chosen to investigate the interaction of
violating markedness and faithfulness constraints which lead to a laryngeal conspiracy
and hierarchical Hasse diagram.
5.2 Phoneme inventory

When two words differ minimally in one sound, the two sounds can belong to two
separate phonemes. Examples of how this works in the consonantal system of Persian are
provided in (1). Noticeably, the pronunciations of these surface minimal n-tuplets follow
-aɹ or -ɑɹ frames, except for [ʒaɹf] and [ɹɑn] in (1b). For instance, from the minimal
5-tuplet in (1a), five consonantal phonemes, i.e. /p/, /b/, /f/, /v/, and /m/ can be derived.
Single quotations are used to represent the meaning of words. ‘̥’ stands for absence of
voice.
(1)
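The minimal-pair test that underlies this procedure can be sketched in code. The example below is only an illustration: the word list is an assumed sample built from transcriptions that occur elsewhere in this volume (/baɹ/, /daɹ/, /naɹ/, /bad/), not the example set in (1), and it assumes that every segment is written with a single character.

```python
from itertools import combinations


def minimal_pairs(words):
    """Return (word1, word2, position, segment1, segment2) for every pair of
    equal-length transcriptions that differ in exactly one segment."""
    pairs = []
    for w1, w2 in combinations(words, 2):
        if len(w1) != len(w2):
            continue
        diffs = [(i, a, b) for i, (a, b) in enumerate(zip(w1, w2)) if a != b]
        if len(diffs) == 1:
            i, a, b = diffs[0]
            pairs.append((w1, w2, i, a, b))
    return pairs


sample = ["baɹ", "daɹ", "naɹ", "bad"]
for w1, w2, pos, s1, s2 in minimal_pairs(sample):
    print(f"/{w1}/ vs. /{w2}/: /{s1}/ and /{s2}/ contrast in position {pos}")
```

Each pair returned is a candidate for two separate phonemes; pairs that differ in more than one segment, such as /daɹ/ and /bad/, are skipped.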
(p. 112)
In order to classify Persian consonants, the articulatory gestures described by Catford
(2003) and the articulator-based theory in phonology (Halle et al. 2000) were taken as
criteria for a sound classification. Accordingly, consonants are phonetically classified into
six groups: labials, dentalveolars, palatals, velars, uvulars, and glottals. For labials, for
instance, the lower lip articulates with either the upper lip, as in /p/, /b/, and /m/, or the
upper teeth, as in /f/ and /v/. In (1), subordinate examples of Persian consonant categories
are headed by the place of articulation of the bolded first segments of each word. A closer
examination of these examples encourages us to posit additional Persian consonants that
are phonemically distinct. Controversial distinctions will be reviewed and discussed in (i)
to (v) below.
(i) The laryngeal states responsible for contrast in pairs of homorganic stops and
those responsible for fricatives are different. While scholars agree that pairs of
homorganic fricatives are easily categorized as either voiceless or voiced phonemes,
i.e. /f/ vs. /v/, /s/ vs. /z/, and /ʃ/ vs. /ʒ/, the situation for stops is not straightforward.
Whether aspiration or voicing causes the contrast is significant. Experiments by
Zavjalova (1961, cited in Windfuhr 1979) showed that voiceless stops are generally
aspirated whereas voiced stops are never aspirated but may be (partially) devoiced
or (partially) voiced in specific environments. Qarib (1965, cited in Windfuhr 1979)
found that the voiceless stops are marked by various degrees of aspiration that are,
however, entirely lost after voiceless fricatives. Lazard (1992: 8–9) claimed that
voiced stops become voiceless in the final position but without being confused with
unvoiced counterparts, which, by comparison, are more energetic and strongly
articulated. Mahootian (1997) believes that voiceless stops are aspirated in the
syllable-initial position but unaspirated at the end of a syllable. Samareh (1999)
considered voiceless stops to be aspirated except when they precede stops and
fricatives. Based on impressionistic judgement, he believes that all voiced obstruents
lose their voice at the end of a word or when they occur before or after a voiceless
consonant, and that voiced stops in the word-initial position lose their voice almost
totally. According to VOT (Voice Onset Time) studies by Bijankhan and Noorbakhsh
(2009), Persian uses mainly {voiceless unaspirated} and {voiceless aspirated}
categories for voiced and unvoiced distinctions in the initial position and {voiced}
and {voiceless aspirated} categories (p. 113) in the intervocalic position. Windfuhr
(2009b) asserted that the distinctive feature of the pairs of stops and fricatives is still
being debated. It may be identified either as voice or as tenseness, and tense stops
are aspirated word-initially. UPSID (UCLA Phonological Segment Inventory
Database) considers Persian voiceless stops as phonologically aspirated segments.
It seems that the amount of space between vocal folds signals opposition for the
homorganic stops, and the presence vs. absence of the vocal fold vibration (or
tension) signals opposition for the homorganic fricatives. Some Persian scholars used
the tense/lax distinction for voiced–voiceless opposition (Samareh 1999; Windfuhr
2009b; see also Jessen 1998 for German). However, the conventional IPA
(International Phonetic Alphabet) transcription for voiced and voiceless stops is
maintained here.
(ii) No general consensus has been reached for the place of articulation of /t/ and /d/.
While some consider them to be dental (Windfuhr 1979; Pisowicz 1985; Lazard 1992;
Samareh 1999; UPSID), others describe them as either dentalveolar (see Haghshenas
1990) or alveolar (Majidi and Ternes 1999). Mahootian (1997) described them as either
apico-alveolar or apico-dental. Therefore, consistent with the opinion of most
scholars, /t/ and /d/, whose constrictions are made by the tip and blade of the tongue
(as active articulators) and the region including the upper teeth and alveolar ridge (as two
passive articulators), could be dentally articulated. Affricates along with /ʃ/ and /ʒ/
are considered to be post-alveolar and thus are classified with dentalveolars (Majidi
and Ternes 1999; UPSID), because their active articulator depends on the blade of
the tongue. Windfuhr (2009b) classifies them as palatal on account of the passive
articulator that forms the constriction. Like voiceless stops, voiceless affricates are
aspirated (Bijankhan and Noorbakhsh 2009). /s/, /z/, /l/, and /ɹ/ are considered to be
apico-alveolar.
(iii) There is no consistency in the recognition of the phonemic status of the palatal
and velar stops. While some scholars consider them as velars or prevelars (cf.
Mahootian 1997; Majidi and Ternes 1999; Windfuhr 2009b; UPSID), Pisowicz (1985)
considers the palatal articulation as the chief one. Velars occur only in the syllable-
onset position when the nucleus is a back vowel while palatals occur in all other
positions. Therefore, posterodorso-velar stops [k, ɡ] should be considered as
allophones of anterodorso-palatal stops /c, ɟ/.
(iv) There is also some debate as to whether the voiced uvular phoneme is a dorso-
uvular stop or dorso-velar fricative. Pisowicz (1985), Mahootian (1997), Samareh
(1999), and UPSID consider it to be a stop, while Windfuhr (1979, 2009b) and Majidi
and Ternes (1999) classify it as a fricative. According to Windfuhr (2009b), while it is
systemically a lax fricative, intervocalically it is a lax velar fricative, [ɣ]; in initial and
final positions it is a lax uvular stop [ɢ]. Majidi and Ternes (1999) state that /ɣ/ is [ɢ]
in the word-initial position after nasals, and when geminated; otherwise it is
postvelar. Sadeghi (2006) posited that the voiced dorso-uvular stop in Persian was
borrowed from Arabic in the first centuries of the modern Persian era. Bijankhan and
Noorbakhsh (2009) concluded that the voiced uvular should be treated as a stop,
although in some positions (e.g. between two vowels or in the word-final position) it
is converted into its fricative or sonorant allophone. For example, /ɢ/ in /ɑɢɑ/ ‘Mr.’
becomes [Ɂɑʁɑ]. Hayes (2009) considers the uvular stop as voiceless. (p. 114)
(v) There is also no consistency in the explanation of the phonemic status of glottal
stops (Windfuhr 1979). /Ɂ/ is an Arabic loan similar to /ɢ/. When used at the
beginning of words and before vowels, the main issue is whether [Ɂ] is predictable in
being automatically inserted by Persian speakers or is distinctive there. A dual
function can be suggested for the glottal stop (Haghshenas 1990): it is distinctive at
the beginning of Arabic (and English) loan words though not at the beginning of the
original Persian words, and it is a prosodic element (in Firthian terminology), like /j/,
when resolving hiatus in accordance with the syllable-onset obligation.
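The three laryngeal categories reported for stops in (i) are usually told apart by VOT. The following sketch shows how a measured VOT value could be assigned to one of them. The 30 ms boundary between short-lag and long-lag stops is an assumed rule of thumb and not a value taken from Bijankhan and Noorbakhsh (2009); the function and constant names are hypothetical. Only the position-by-category table reflects what is stated in (i).

```python
# Assumed boundary between short-lag and long-lag VOT; not a value from the chapter.
SHORT_LAG_MAX_MS = 30.0

# Categories that carry the stop contrast in each position, as summarized in (i).
CONTRAST_BY_POSITION = {
    "initial": ("voiceless unaspirated", "voiceless aspirated"),
    "intervocalic": ("voiced", "voiceless aspirated"),
}


def vot_category(vot_ms):
    """Assign a VOT value (ms) to one of three laryngeal categories."""
    if vot_ms < 0:
        return "voiced"  # voicing starts before the release (prevoicing)
    if vot_ms <= SHORT_LAG_MAX_MS:
        return "voiceless unaspirated"
    return "voiceless aspirated"


for value in (-60.0, 10.0, 70.0):
    print(value, "->", vot_category(value))
```

With real measurements the boundary would have to be estimated from the data for each place of articulation rather than fixed in advance.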
Based on the IPA consonant chart, the Persian consonant system is tabulated in (2).
(2)
Oral stops and affricates are symmetrical in terms of voicing and aspiration, except for
the uvular stop. In terms of voicing, labial and dentalveolar stops and fricatives also form
a complete symmetry.
From the minimal 6-tuplets in example (3), six vowels can be derived: /i/, /e/, /a/, /u/, /o/,
/ɑ/.
(3)
Based on Catford’s (2003) coding system for the cardinal vowels, Persian vowels can be
coded as CV1 (/i/), CV2 (/e/), CV4 (/a/), CV8 (/u/), CV7 (/o/), and CV5 (/ɑ/), where CV
stands for the cardinal vowels. The place of articulation is either front, i.e. anterodorso-
palatal, or back, i.e. posterodorso-velar. Front vowels are unrounded and back vowels are
rounded. The three degrees of vowel height are distinctive. Lip shape and nasality are not
distinctive.
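The coding just described can be written down as a small table. In the sketch below the cardinal vowel numbers and the front/back division come from the paragraph above; the height labels (high, mid, low) are the usual reading of the three degrees of height, and the dictionary layout is an illustrative assumption. The check at the end confirms that place and height alone keep the six vowels apart, which is why lip shape need not be stored as a distinctive property.

```python
# Persian vowels coded by cardinal vowel number, place, and height.
VOWELS = {
    "i": {"cardinal": 1, "place": "front", "height": "high"},
    "e": {"cardinal": 2, "place": "front", "height": "mid"},
    "a": {"cardinal": 4, "place": "front", "height": "low"},
    "u": {"cardinal": 8, "place": "back", "height": "high"},
    "o": {"cardinal": 7, "place": "back", "height": "mid"},
    "ɑ": {"cardinal": 5, "place": "back", "height": "low"},
}


def is_distinct_by(*features):
    """True if the given features assign a unique combination to every vowel."""
    combos = {tuple(spec[f] for f in features) for spec in VOWELS.values()}
    return len(combos) == len(VOWELS)


print(is_distinct_by("place", "height"))  # place and height together
print(is_distinct_by("height"))           # height alone
print(is_distinct_by("place"))            # place alone
```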
Vowels lengthen before voiced consonants and clusters (Samareh 1999), and vowels are
classified into two groups in terms of length: short vowels /e, o, a/, and long vowels /i, u,
ɑ/. Scholars interested in diachronic phonology use the term ‘stable’ for long vowels
because they retain their duration in all positions, and ‘unstable’ for short vowels because
their duration varies in accordance with their position (Windfuhr 1979, 2009b; Lazard
1992). Length distinction is also the basis of rhythm in classical Persian verse. The
Persian vowel system is displayed in (4).
(4)
(p. 115)
A partition of the vowels into the long vowels /i, u, ɑ/ surviving from the Middle Persian
/e:, o:, a:/, and the short vowels /e, o, a/ surviving from the Middle Persian /i, u, a/ (Sadeghi
1978), is needed to understand some phonological patterning.
Some minimal pairs like [ɢom] ‘name of a city’ vs. [ɢowm] ‘people’, [doɹ] ‘pearl’ vs. [dowɹ]
‘around’, and [hol] ‘push’ vs. [howl] ‘about’ may provide evidence of the diphthong /ow/ as
a phoneme. However, it is one of the least frequently occurring sounds in Persian, being
observed in few words and not participating in the nucleus position of CVCC syllables at
all. [w] would be in complementary distribution with [v] when it occupies the position of
the first consonant of the syllable coda, in which case /o/ is the only vowel that can
occupy the nucleus position. Thus [w] emerges as an allophone of /v/ (Haghshenas 1990;
Yarmohammadi 1995; Samareh 1999). Accordingly, the above-mentioned words can be
phonemically transcribed as /ɢovm/, /dovɹ/, and /hovl/.
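The allophonic analysis of [w] amounts to a single context-sensitive rule, sketched below. The rule is a simplified illustration restricted to word-final clusters: /v/ is realized as [w] when it follows /o/ and is the first of two coda consonants. The function name is hypothetical and one character per segment is assumed.

```python
import re

VOWELS = "ieaouɑ"

# /v/ -> [w] after /o/ when exactly one more consonant follows at the end of the word.
_V_TO_W = re.compile(rf"(?<=o)v(?=[^{VOWELS}]$)")


def surface_form(phonemic):
    """Derive the surface form of a phonemic transcription under the /v/ -> [w] rule."""
    return _V_TO_W.sub("w", phonemic)


for form in ["ɢovm", "dovɹ", "hovl", "ɢom"]:
    print(f"/{form}/ -> [{surface_form(form)}]")
```

Because [w] is fully predictable from this environment, it does not need to be listed in the phoneme inventory.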
In total, the Persian phoneme inventory contains 23 consonants and 6 vowels.
5.3 Syllable structure

A syllable is a formal and psychological entity to which phonological rules, phonotactic
constraints, alternations, stress, and intonational patterns refer. A phonological word is
parsed into syllables. In general, a syllable (σ) contains an obligatory nucleus (N)
preceded by an optional consonantal onset (O) and is followed by an optional consonantal
coda (CO). The onset and the coda form the margin of a syllable. Rhyme (R) is another
constituent of the syllable formed by a nucleus and a coda. Rhyme and onset are
dominated by the syllable node. The hierarchical structure of Persian syllables is
displayed in (5).
(5)
The template of the Persian syllable is CV(C)(C). The onset must contain only one
consonant. Thus Persian has three syllable structures: CV, CVC, and CVCC. A glottal stop
occurs at the beginning of a vowel-initial word. Clear evidence in Persian supports a
strong bond between the nucleus and the coda rather than the nucleus and the onset. For
example, no onset constraint exists in CV words except that /ʒ/ cannot be followed by /o/.
A syllable template makes syllabification more straightforward. Given that vowels alone
comprise the nucleus in Persian, syllabification of words into sequences of syllables starts
with the assignment of each vowel to a syllable root. Then each intervocalic consonant
before a vowel is assigned to the onset of the following syllable, and by joining the
remaining intervocalic consonants to the coda of the preceding syllable, the process of
syllabification ends. For example, syllabification of the word /ʃahɹvandi/ ‘citizenship’
proceeds as shown in (6). (p. 116) Nuclei are indicated by bolded underlined vowels and
the syllable boundary is indicated by a dot.
(6)
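The syllabification procedure described above can be sketched as follows. This is a minimal illustration under the assumptions that each segment is one character and that the input already respects the CV(C)(C) template; the function name is hypothetical.

```python
VOWELS = set("ieaouɑ")


def syllabify(word):
    """Split a transcription into syllables: every vowel is a nucleus, the
    consonant immediately before a vowel is its onset, and the remaining
    consonants join the coda of the preceding syllable."""
    nuclei = [i for i, seg in enumerate(word) if seg in VOWELS]
    if not nuclei:
        return [word]
    starts = []
    for i in nuclei:
        has_onset = i > 0 and word[i - 1] not in VOWELS
        starts.append(i - 1 if has_onset else i)
    starts[0] = 0  # the first syllable always begins at the start of the word
    bounds = starts + [len(word)]
    return [word[a:b] for a, b in zip(bounds, bounds[1:])]


print(".".join(syllabify("ʃahɹvandi")))
```

For /ʃahɹvandi/ the three vowels give three nuclei; /ʃ/, /v/, and /d/ become onsets, and /hɹ/ and /n/ are left to the codas of the first and second syllables.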
Persian phonotactic constraints within the codas of the syllables violate the sonority
sequencing principle (SSP) (Hayes 2009), which requires segments to progressively
decrease in sonority as one proceeds outwards from the nucleus towards the end of the
syllable. For example, the order of the segments in the coda of a monosyllabic word such
as /sadɹ/ ‘top’ disagrees with the sonority hierarchy according to which liquids are more
sonorant than stops. According to a statistical analysis of the Persian lexicon conducted
by the author, it was found that 21.5 percent of the clusters in the coda position violate
the SSP. Sequences of fricatives+nasals comprise the most frequent violations. Putting
the sequences made of affricates aside, sequences of nasal+glides and liquids+glides do
not exhibit any violations. This being the case, given a consonant cluster in disagreement
with the sonority hierarchy in Persian, one will also find a corresponding cluster in
agreement with it.
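A check for SSP violations in coda clusters can be sketched as below. The five-step sonority scale (stops and affricates < fricatives < nasals < liquids < glides) is a commonly used scale adopted here as an assumption, as is the placement of /h/ with the fricatives and of the glottal stop with the stops. A violation is counted only when sonority rises from the first to the second coda consonant; how plateaus are treated is also an assumption. The function name is hypothetical.

```python
# Assumed sonority ranks, from least to most sonorous consonant class.
_CLASSES = [
    "pbtdcɟkɡɢʔɁʧʤ",  # stops and affricates
    "fvszʃʒχh",        # fricatives
    "mn",              # nasals
    "lɹ",              # liquids
    "jw",              # glides
]
SONORITY = {seg: rank for rank, segs in enumerate(_CLASSES) for seg in segs}


def coda_violates_ssp(word):
    """True if sonority rises across the final two consonants of a CVCC word."""
    first, second = word[-2], word[-1]
    return SONORITY[second] > SONORITY[first]


for form in ["sadɹ", "maʒd", "howl"]:
    print(form, coda_violates_ssp(form))
```

In /sadɹ/ the liquid follows the stop, so sonority rises towards the edge of the syllable and the cluster is flagged; the codas of [maʒd] and [howl] fall in sonority and pass.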
5.4 Distinctive features

Sound distinctions within a minimal n-tuplet can be characterized by phonetic features as
basic units of phonological description. For example, the difference between /baɹ/ and
/mɑɹ/, or /daɹ/ and /naɹ/, which results in identification of the phonemes /b/ vs. /m/, and /d/
vs. /n/, can be characterized by the distinctive feature [nasal]. Since [nasal] is a function
of velic aperture and only two values, i.e. being closed or not, are used to differentiate the
words, the distinctive feature [nasal] is binary: [+nasal] for /m/ and /n/, and [–nasal] for
/b/ and /d/. Features may be single-valued or multi-valued depending upon how they
function in the sound system. Features are, in fact, formal devices for the partitioning of
phonological space into natural classes of sounds.
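The idea that features partition the sound system into natural classes can be illustrated with a very small feature table. The sketch below covers only the four segments mentioned above; the [nasal] values follow the text, the articulator labels follow the articulator-based geometry cited later in this section, and the function name is hypothetical.

```python
# A toy feature table for /b, m, d, n/.
FEATURES = {
    "b": {"nasal": False, "articulator": "lips"},
    "m": {"nasal": True, "articulator": "lips"},
    "d": {"nasal": False, "articulator": "tongue blade"},
    "n": {"nasal": True, "articulator": "tongue blade"},
}


def natural_class(**spec):
    """Return the set of segments whose features match every value in spec."""
    return {
        seg
        for seg, feats in FEATURES.items()
        if all(feats[name] == value for name, value in spec.items())
    }


print(natural_class(nasal=True))                       # the [+nasal] class
print(natural_class(nasal=False, articulator="lips"))  # a single segment
```

A class picked out by fewer feature values is larger; adding a value narrows it, down to a single segment when the specification is complete.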
Based on the methodology of McCarthy (1988), phonological processes like assimilation,
reduction, and dissimilation operate on consistent subsets of features, resulting in a
hierarchical organization of features, called feature geometry. In this section, some
Persian phonological processes are tested against the feature geometry of Halle et al.
(2000), as illustrated in (7).
(7)
(p. 117)
First of all, broad articulative categories, i.e. vowels, glides, liquids, nasals, and
obstruents, can be specified by two main articulator-free class features: [consonantal]
and [sonorant] (8), abbreviated as [cons] and [son].
(8)
While voiced and voiceless obstruents are phonemically distinct except for uvulars (see
(2)), no such distinction exists for sonorants because they are all voiced in Persian. This
justifies two natural classes: [+son] and [–son]. Vowels and glides, /j/ and [w], belong to
the [–cons] class because they do not have a radical constriction in the supralaryngeal
cavity. However, /j/ and [w] unlike vowels occupy the position of margins in Persian
syllables. The laryngeals /h, Ɂ/, like /j/ and vowels, belong to the [–cons] class, because
they do not have a radical constriction in the supralaryngeal cavity. In Persian, laryngeals
and glides can be characterized as a natural class because they occur between two
consecutive vowels in order to resolve hiatus.
Changes in the values of the features of [son] and [cons] affect the entire segment, so
they represent the root node. Deletion of /t/ and /d/ in the word-final position provides
evidence for reduction of the major class features of [-son] and [+cons] in conjunction
with the features of [-cont], [voice], and Place which constitute the whole segments of /t/
and /d/.
Distribution of laryngeal states in different contexts, as stated in section 5.2, suggests
that aspiration is more fundamental than voicing in the distinction between voiced and
voiceless stops. Therefore, the feature [spread glottis] differentiates between voiceless
stops ([+spread glottis]) and voiced stops ([–spread glottis]), as in German (Jessen and
Ringen 2002). (From now on, [spread glottis] is abbreviated as [sg].) Laryngeal features
form a class node independent of place and manner features. Aspirated (voiceless) stops
coming after voiceless fricatives deaspirate and reduce to unmarked [–sg] (see (9)), and
voiced obstruents assimilate in [voice] to their adjacent voiceless consonant (see (11)).
In Persian, an alveolar nasal is assimilated to the oral consonant following it, but it
remains unchanged before laryngeals. Taken from Halle et al. (2000), (7) shows the
division of place nodes into three active articulators: [lips], [tongue blade], and [tongue
body]. In Persian /n/ becomes a labial [m] before the bilabials /m/, /b/, and /p/ and it
becomes a labio-dental [ɱ] before the labio-dentals /f/ and /v/. Both bilabials and labio-
dentals employ the lower lip as an active articulator but differ in the passive articulator.
Also, affricates become fricative before consonants sharing the same active articulator,
i.e. the tip or blade of the tongue. The sequence /ʤd/ becomes [ʒd], as in [maʒd] ‘glory’,
simply because /ʤ/ loses its [-cont] portion. In colloquial speech, the place of articulation
of the palatal stops /c, ɟ/ in the morpheme/word-final position accommodates to the velar
region when occurring before morpheme/word-initial velar or uvular consonants, [k, ɡ, ɢ,
χ]. For example, the word /ɹoch-ɟu/ is pronounced as [ɹok-ɡu] ‘frank’ (see Chapters 2, 3, 4,
6, 10, 11, and 15 for more on colloquial form). Such accommodation could easily occur as
a result (p. 118) of the palatal, velar and uvular sharing the same active articulator (i.e.
the body of the tongue) in Persian.
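The labial part of the nasal place assimilation described above can be sketched as two rewriting steps. Only the two cases spelled out in the text are modelled (before bilabials and before labio-dentals); the test strings are schematic sequences rather than Persian words, and the function name is hypothetical.

```python
import re


def assimilate_n(phonemic):
    """Apply labial place assimilation to /n/ before a following labial consonant."""
    # /n/ -> [m] before the bilabials /m, b, p/
    surface = re.sub(r"n(?=[mbp])", "m", phonemic)
    # /n/ -> [ɱ] before the labio-dentals /f, v/
    surface = re.sub(r"n(?=[fv])", "ɱ", surface)
    return surface


for form in ["anba", "anfa", "anha", "ana"]:
    print(form, "->", assimilate_n(form))
```

Before the laryngeal /h/ and before a vowel the nasal is left unchanged, as stated in the text.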
Terminal binary features are responsible for finer distinctions within coronal (abbreviated
as [cor]) and dorsal sounds (see Chapter 4 for more information about dorsal sounds).
Place assimilation of the non-continuants /t/, /d/, and /n/ to the place of the coronals /s/,
/z/, /ʃ/, /ʒ/, /ɹ/, and /l/ suggests the binary features [anterior] and [distributed], abbreviated
as [ant] and [dist]. [+ant] coronals are articulated at the alveolar ridge or further
forward, while [−ant] coronals are articulated behind the alveolar ridge. [+dist] coronals
have a longer contact area than [–dist] coronals (Hayes 2009). /t, d, ʧ, ʤ, ʃ, ʒ/ are [+dist].
Additionally, place assimilation of the palatal stops /c, ɟ/ to the following back vowels /u, o,
ɑ/, together with the three degrees of height contrast within the vowels, leads one to
suggest the binary features [back], [high], and [low]. In Persian, the palatal, velar and
uvular consonants should be considered as [–back, +high], [+back, +high], and [+back, –
high], respectively.
However, there is no evidence for the unity of manner features under one class node in
Persian. [continuant], abbreviated as [cont], distinguishes stops, affricates, and nasals, all
involving airflow interruption, i.e. [–cont], from other segments, i.e. [+cont]. Depending
upon their place of articulation, Persian stops have a tendency to convert to fricatives in
colloquial speech. For example, the words /vaɢt/ ‘time’ and /χaste/ ‘tired’ are pronounced
as [vaχ] ‘time’ and [χasse] ‘tired’, respectively. Affricates, as will be seen, should be
regarded as a sequence of [–cont] and [+cont]. The only Persian [lateral] is alveolar. The
frequent allophone of the liquid /ɹ/ is approximant- or spirant-like (Bijankhan 2014).
However, it becomes a tap [ɾ] in the intervocalic position.
Some scholars consider the glides and liquids as [+approximant] according to their
distribution in the syllable structure (Kenstowicz 1994; Gussenhoven and Jacobs 2011),
though no evidence for this conclusion exists in Persian. In some languages, like English,
sonorants occur only before obstruents in the coda clusters, whereas Persian sonorants,
like other segments, occur before and after obstruents in the coda clusters.
Tables 5.1 and 5.2 contain the feature specifications of twenty-three consonants and six
vowels in Persian, respectively. In the tables, phonemes are specified for features that are
responsible for phonological distinctiveness and natural classes. If a feature specification
does not help to differentiate one phoneme from another, no value for that feature is
specified. For example, vowels are not marked for [lab] and [cor], simply because vowels
are dorsal. Since the vowel distinctive features [back], [high], and [low] and their
combinations cannot specify vowels in terms of length, the feature [long] can be
specified to classify vowels into two natural classes, i.e. /i, u, ɑ/ as [+long]
and /e, o, a/ as [–long]. The value of [voice] is specified for all stops and affricates because
they form a natural class with their fricative counterparts. /ʧ/ and /ʤ/ are represented by
a sequence of the values minus and plus for the feature [cont]. Membership of
laryngeals in the [+son] or [–son] class is controversial in various languages
(Gussenhoven and Jacobs 2011). Chomsky and Halle (1968) classify laryngeals as [+son].
Since laryngeals and /j/ form a natural class in Persian, specified as [–cons] (see section
5.8), they are unspecified for [son]. (p. 119)
Table 5.1 Feature chart of the 23 Persian consonants. Unary features are marked by ✓.
Cells left blank in the chart are not shown; each row lists its values in segment order
for the specified segments only.
Segments: pʰ b tʰ d cʰ ɟ ɢ ʧʰ ʤ f v s z ʃ ʒ χ m n l ɹ j h Ɂ
son    – – – – – – – – – – – – – – – – + + + + +
cons   + + + + + + + + + + + + + + + + + + + + – – –
cont   – – – – – – – –+ –+ + + + + + + + – – + + +
nas    + + – –
lat    + –
voice  – + – + – + + – + – + – + – + – + + + + +
sg     + – + – + – – + – – + –
cg     +
lab    ✓ ✓ ✓ ✓ ✓
cor    ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
ant    + + – – + + – – + + +
dist   + + – – + +
dors   ✓ ✓ ✓ ✓ ✓
back   – – + –
high   + + +
low    + +
Table 5.2 Feature chart of the six Persian vowels
        i   e   a   u   o   ɑ
high    +   –   –   +   –   –
low     –   –   +   –   –   +
back    –   –   –   +   +   +
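As a minimal sketch of how such feature specifications pick out natural classes, the values of Table 5.2 can be encoded and queried as below. The [long] values follow the split into /i, u, ɑ/ and /e, o, a/ stated in the text; the names VOWELS, natural_class, and is_distinct are illustrative assumptions and not part of the analysis itself. ASCII "-" stands in for the minus sign.

```python
# Feature values copied from Table 5.2; [long] follows the text's split
# into /i, u, ɑ/ as [+long] and /e, o, a/ as [-long].
VOWELS = {
    "i": {"high": "+", "low": "-", "back": "-", "long": "+"},
    "e": {"high": "-", "low": "-", "back": "-", "long": "-"},
    "a": {"high": "-", "low": "+", "back": "-", "long": "-"},
    "u": {"high": "+", "low": "-", "back": "+", "long": "+"},
    "o": {"high": "-", "low": "-", "back": "+", "long": "-"},
    "ɑ": {"high": "-", "low": "+", "back": "+", "long": "+"},
}


def natural_class(**spec):
    """Return the set of vowels that match every feature value in spec."""
    return {
        vowel
        for vowel, feats in VOWELS.items()
        if all(feats[name] == value for name, value in spec.items())
    }


def is_distinct(inventory):
    """True if no two segments share an identical feature bundle."""
    bundles = [tuple(sorted(feats.items())) for feats in inventory.values()]
    return len(bundles) == len(set(bundles))
```

For instance, natural_class(long="+") returns the long vowels, and natural_class(high="-", low="-") returns the mid vowels /e, o/.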
5.5 Phonological rules

Phonological rules map the underlying form (UF) of morphemes onto either the
intermediate representation or the surface form (SF) of the morphemes. One of their
functions is to provide evidence for distinctive features and natural classes. In this
section, first a (p. 120) descriptive generalization characterizing regularities for the data
under review is provided for each phonological pattern. Then, to account for
generalizations, phonological rules are formulated. Steps of derivation are not, however,
shown for the data.
5.5.1 Deaspiration
Voiceless stops lose their aspiration after voiceless fricatives or before obstruents.
(9)
Voiceless stops appear aspirated in all positions except after voiceless fricatives and
before obstruents. They are optionally unreleased before nasals. Unaspirated stops
appear as voiced only in the voiced environment while they are either unvoiced or semi-
voiced unaspirated in all other positions. Thus, aspiration is more distinctive than voicing
in contrasting homorganic stops. The relevant feature in differentiating homorganic stops
should be [sg]. The phonological rule in (10) can be postulated when [sg] is distinctive.
(10)
Rule (10) implies that phonological distinction between aspirated and unaspirated stops
is neutralized in contexts in which deaspiration occurs.
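The descriptive generalization behind rule (10) can be sketched procedurally. This is only an illustration under stated assumptions: segments are given as a list of strings, the obstruent set is taken from the [–son] segments of Table 5.1, and the function name deaspirate is hypothetical. The context is checked against the input string, so all targets are evaluated simultaneously.

```python
# Segment classes assumed from Table 5.1 (the [-son] consonants).
VOICELESS_FRICATIVES = {"f", "s", "ʃ", "χ"}
OBSTRUENTS = {
    "pʰ", "b", "tʰ", "d", "cʰ", "ɟ", "ɢ", "ʧʰ", "ʤ",
    "f", "v", "s", "z", "ʃ", "ʒ", "χ",
}
# Aspirated stops and their [-sg] counterparts.
PLAIN = {"pʰ": "p", "tʰ": "t", "cʰ": "c"}


def deaspirate(segments):
    """Remove aspiration after a voiceless fricative or before an obstruent."""
    output = []
    for i, seg in enumerate(segments):
        after_fricative = i > 0 and segments[i - 1] in VOICELESS_FRICATIVES
        before_obstruent = i + 1 < len(segments) and segments[i + 1] in OBSTRUENTS
        if seg in PLAIN and (after_fricative or before_obstruent):
            output.append(PLAIN[seg])
        else:
            output.append(seg)
    return output
```

Applied to the segments of /Ɂascʰaɹ/, the form discussed in section 5.8.1, the sketch yields the deaspirated [Ɂascaɹ], while word-initial /cʰ/ in /cʰaɹ/ is left untouched.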
5.5.2 Devoicing
Two kinds of devoicing are accounted for:
(i) Obstruents become voiceless in the word-final position and before or after a voiceless
consonant.
(11)
The rule in (12) formalizes the devoicing process for Persian obstruents. Stops and
affricates are also voiceless or devoiced in the word-initial position. (p. 121)
(12)
(ii) Sonorants lose their voice after aspirated stops and become spirant-like.
(13)
Rule (14) formalizes the devoicing process for Persian sonorants.
(14)
5.5.3 Degemination
Word-final geminates are reduced to singletons when in isolated form and before
consonants.
(15)
-i marks the adjectival or nominal suffix. Alternation between a final single consonant and
consonant doubling should be considered to postulate UF. Allomorphs in the root and
before consonant columns in (15) include a final single consonant but allomorphs before a
vowel-initial suffix include consonant doubling. Let us assume that the root represents
the UF. This might be a reasonable postulation, because a single consonant rather than
consonant doubling occurs in more phonological contexts. If so, then a rule that doubles
final consonants before a vowel generates forms included in the second column. However,
such final consonant doubling does not exist for a large number of Persian words.
Therefore, the root with a final geminate is the better candidate to represent UF.
Subsequently, a rule that reduces gemination would generate the surface root and the
third column forms in (15). Moreover, conversion of a consonantal doubling into a
singleton suggests that the structural position of a phoneme, represented non-linearly by
the X tier in (16), could be independent of the phoneme itself. In other words, X is a
phonological position that represents the quantity of the consonant, which is completely
separated from the featural content of the consonant (Kenstowicz 1994: 424). Delinking
of the association line represents degemination. The final consonant of the geminate is
not moraic and its omission does not trigger lengthening.
(16)
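The choice of a geminate-final UF plus a reduction rule can be illustrated with a short sketch. It assumes, for simplicity, that each character is one segment and that gemination is written as a doubled consonant; the function name degeminate is hypothetical. The root /sadd/ ‘dam’ is taken from section 5.7, and its combinations with suffixes are used here purely as illustrative inputs.

```python
VOWELS = set("ieauoɑ")


def degeminate(root, suffix=""):
    """Reduce a root-final geminate unless a vowel-initial suffix follows."""
    final_geminate = (
        len(root) >= 2 and root[-1] == root[-2] and root[-1] not in VOWELS
    )
    # An empty suffix (isolation) or a consonant-initial suffix triggers reduction.
    vowel_follows = suffix[:1] in VOWELS
    if final_geminate and not vowel_follows:
        root = root[:-1]  # delink the second half of the geminate
    return root + suffix
```

Under these assumptions, the geminate surfaces only before a vowel-initial suffix such as -i, and is reduced in isolation and before a consonant-initial suffix.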
(p. 122)
5.5.4 Nasal place assimilation (NPA)
An anterior nasal assimilates in place to the following consonant.
NPA suggests that [place], unlike manner features, is a group feature. In Persian, an
anterior nasal assimilates to the following consonant in the place of articulation, no
matter what place it bears, but remains unchanged before laryngeals (17). This may
further suggest that laryngeals need not be specified for place or manner features.
(17)
NPA posits that the three oral active articulators, i.e. the lips, the tip or blade of the
tongue, and the dorsum, act as if they are independent of each other. Considering feature
geometry, one unary feature for each active articulator is suggested: [lab] for bilabials
and labio-dentals, [cor] for dentalveolars, and [dors] for palatals, velars, and uvulars. All
three are dependent on the group feature [place] (18). Since the coronality of a nasal
consonant is affected by whatever place the following oral consonant bears, [place] would
be an autosegment. Thus the spreading of [place] leftward to a preceding anterior nasal and
the delinking of the nasal’s own [place] explain the Persian NPA.
(18)
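A linear approximation of NPA is sketched below. It is an assumption-laden illustration rather than the autosegmental analysis itself: place is looked up per segment instead of spreading as a node, only the nasal outputs attested in this chapter ([m], [ɱ], [ɲ], [ŋ]) are included, uvular contexts are left out because no output symbol is given for them here, and the names PLACE_OF, NASAL_AT, and assimilate_nasal are hypothetical.

```python
# Place of the triggering consonant (unaspirated symbols only, for brevity).
PLACE_OF = {
    "m": "bilabial", "b": "bilabial", "p": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "c": "palatal", "ɟ": "palatal",
    "k": "velar", "ɡ": "velar",
}
# Surface nasal for each place.
NASAL_AT = {"bilabial": "m", "labiodental": "ɱ", "palatal": "ɲ", "velar": "ŋ"}


def assimilate_nasal(segments):
    """Replace /n/ by the nasal homorganic with the following consonant."""
    output = list(segments)
    for i in range(len(output) - 1):
        if output[i] == "n":
            place = PLACE_OF.get(output[i + 1])
            # Laryngeals (and any segment not listed) leave /n/ unchanged.
            if place is not None:
                output[i] = NASAL_AT[place]
    return output
```

Because laryngeals have no entry in PLACE_OF, /n/ before /h/ or /Ɂ/ is returned unchanged, which mirrors the claim that laryngeals need not be specified for place.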
(p. 123)
5.5.5 Dorsal place assimilation (DPA)
The palatal stops assimilate in place to the following velars.
(19)
(20)
The velar phoneme in rule (20) can be either a back vowel or a velar stop.
5.5.6 Coronal assimilation

Apico-dentalveolar stops assimilate in place to the following coronal.
Place assimilation of the non-continuants /t/, /d/, and /n/ to the place of the coronals /s/,
/z/, /ʃ/, /ʒ/, /ɹ/, and /l/ suggests the binary features [ant] and [dist].
(21)
The elsewhere allophones of /t/ and /d/ in Persian are dental. [t̪] and [d̪] are [+dist] just like
interdentals in English, because they have extended constriction in comparison to /s, z, l,
ɹ/. /n/ assimilates in place to /t, d/, and becomes apico-dental, i.e. [+dist, +ant]. /t, d, n/
assimilate in place to /s, z, l, ɹ/, and become apico-alveolar, i.e. [–dist, +ant]. However,
rule application for /n/ is vacuous (Hayes 2009). /t, d, n/ also assimilate in place to /ʃ, ʒ/,
and become lamino-post-alveolar, i.e. [+dist, –ant]. The diacritic ‘ ̠ ’ stands for the place of
lamino-post-alveolars. Rules (22) and (23) formalize the process linearly and non-linearly,
respectively.
(22) (p. 124)
(23)
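The feature values assigned by coronal assimilation can be summarized in a small lookup sketch. The function name coronal_place is hypothetical, and the default value for /n/ is an inference from the statement that the rule applies vacuously to /n/ before /s, z, l, ɹ/, not something stated directly in the text.

```python
def coronal_place(segment, following=None):
    """Return ([dist], [ant]) for /t, d, n/ given the following segment."""
    if segment not in {"t", "d", "n"}:
        raise ValueError("only /t, d, n/ are covered by this sketch")
    if following in {"s", "z", "l", "ɹ"}:
        return ("-", "+")  # apico-alveolar
    if following in {"ʃ", "ʒ"}:
        return ("+", "-")  # lamino-post-alveolar
    if segment == "n":
        if following in {"t", "d"}:
            return ("+", "+")  # apico-dental, assimilated to /t, d/
        return ("-", "+")  # assumed default: /n/ is already alveolar
    return ("+", "+")  # elsewhere allophones of /t, d/ are dental
```

The three assimilated outcomes correspond to the apico-dental, apico-alveolar, and lamino-post-alveolar places described above.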
5.5.7 Compensatory lengthening (CL)
Two kinds of CLs are accounted for:
(i) Deletion or shortening of glottal consonants in the coda is compensated for by
lengthening of the preceding vowel.
(24)
The colloquial data in (24) show that segmental duration could be independent of the
segments themselves. As the glottal consonant in the coda is deleted, the overall length
remains constant through the lengthening of the preceding vowel. This process will not
occur if a glottal consonant in the onset is deleted. Since the CV tier treats the onset and
coda as the same, it is not suitable for CL justification. Hayes (1989) proposed the mora
(μ) tier as an intermediate level between segments and syllables to distinguish the
asymmetrical relation between the onset and the coda in CL. Therefore, onset consonants
are attached directly to the syllable node, whereas according to the rule of weight-by-
position (Hayes 1989), a mora is assigned to each consonant in the coda position. Hayes
assigned moraic status to consonants, whose deletion triggers vowel lengthening. Vowels
are always moraic. Darzi (1991) explained Persian CL according to the moraic phonology
of Hayes (1989) and concluded that glottals in the coda are moraic. Shademan (2005)
shows experimentally that the deletion of the glottal consonant does not always result in
the lengthening of the vowel.
(ii) Lenition and then deletion of /v/ in the first consonantal coda position is compensated
for by lengthening the preceding vowel /o/ (25). (p. 125)
(25)
A question raised here is ‘How can variation between [ow] and [oː] be interpreted in
order to achieve a reasonable UF?’ If /oː/ underlies the variation, a rule is required to
shorten the vowel and insert [w]. But in the case of /ow/ as a diphthong, another rule is
required to replace the glide portion with [o]. However, lack of a CVCC in Persian cannot
be justified if we accept /oː/ or /ow/ as a nucleus (Haghshenas 1990). If [ow] is taken as a
sequence of a vowel and consonant, then [w] could be regarded as either a phoneme or
an allophone of some phoneme. [w] is not distinctive in Persian, because it has a very
restricted distribution. Since [w] does not appear initially and is conditioned by the
nucleus /o/, and since [v] does not occupy the position of the first consonant after /o/, [v]
and [w] are in complementary distribution. In that case, /v/ is underlying and [w] is an
allophone (Yarmohammadi 1995; Samareh 1999). Another convincing piece of evidence
for accepting /v/ as underlying is the pronunciation of [v] in the derivative or inflected
forms of the data stated in (26), while the formal pronunciations of their roots end in [w]
(Samareh 2001; Kord-Zafaranlu-Kambuziya 2007).
(26)
To account for the data in (25), a lenition rule converts /v/ to [w] (27a), and then the
delinking of [w] is accompanied by the spreading of /o/ to the stranded mora (27b).
(27) a.
(27) b.
(p. 126)
Data in (24) and (25) show that the subset {[w], /Ɂ/, /h/} forms a moraic natural class,
because their deletion in the coda triggers vowel lengthening.
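The onset/coda asymmetry that motivates the mora tier can be made concrete with a short sketch. It assumes a syllable is represented as a dictionary with onset, nucleus, and coda, that length is written with ‘ː’, and that the moraic codas are exactly the class {[w], /Ɂ/, /h/} named above; the function name delete_segment and the schematic syllables in the tests are hypothetical. The sketch models the categorical rule only, not the variability reported by Shademan (2005).

```python
MORAIC_CODAS = {"Ɂ", "h", "w"}


def delete_segment(syllable, segment):
    """Delete `segment`; lengthen the vowel only if a coda mora is stranded."""
    onset = list(syllable["onset"])
    coda = list(syllable["coda"])
    nucleus = syllable["nucleus"]
    if segment in coda:
        coda.remove(segment)
        if segment in MORAIC_CODAS and not nucleus.endswith("ː"):
            nucleus += "ː"  # the stranded mora is re-linked to the vowel
    elif segment in onset:
        onset.remove(segment)  # onsets carry no mora, so nothing is compensated
    return {"onset": onset, "nucleus": nucleus, "coda": coda}
```

Deleting a coda glottal therefore returns a long nucleus, whereas deleting the same consonant from the onset leaves vowel length unchanged.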
5.5.8 Epenthesis
Two kinds of epenthesis are accounted for: glide and short vowel epenthesis.
(i) Glides are inserted to resolve hiatus at morpheme boundaries.
Phonetically, glides /j/ and [w] belong to the [–cons] class, while phonologically they
function as consonants because they occur between two vowels to avoid hiatus. From the
following colloquial speech data in (28), which consist of nouns suffixed with the plural
marker -hɑ, it is clear that when a noun ends in a consonant, /h/ is deleted and the final
consonant of the noun syllabifies with /ɑ/. However, [j] or [w] is inserted when a noun
ends in /i/ or /u/, respectively. [h] survives when a noun ends in other vowels.
(28)
In the framework of the CV phonology of Clements and Keyser (1983), glide insertion is
interpreted as the spreading of the features of /i/ or /u/ to the C-position, which leads to
epenthetic [j] or [w] (29). The phonetic content of glides is similar to the corresponding
vowels (Catford 2003).
(29) (p. 127)
(30)
From the formal data in (30), consisting of verbal roots prefixed with the progressive
morpheme mi- and suffixed with the agreement endings (agr), it is clear that [Ɂ] is inserted
between the prefix and the verbal root, and [j] or [Ɂ] is inserted between the verbal root and
the agreement endings to resolve hiatus. ‘=’ stands for enclitics.
(31)
(32)
Therefore, the data in (28) and (30) suggest that laryngeals and glides form a natural
class because they have a similar function in occupying the floating C-position to prevent
the occurrence of adjacent vowels.
(ii) Short vowels are inserted to break clusters between the stem and suffix.
(33)
There are lexical exceptions that are not subject to the above rule. /e/ is the most
frequent short vowel in epenthesis. No rule is formalized because the inserted short
vowel cannot be predicted phonologically.
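Unlike short vowel epenthesis, the colloquial treatment of the plural marker -hɑ described under (i) is fully predictable from the final segment of the noun, so it can be sketched as a simple case analysis. The function name colloquial_plural is hypothetical, and the stems in the tests are made-up strings used only to exercise each branch; they are not taken from (28).

```python
VOWELS = set("ieauoɑ")


def colloquial_plural(noun):
    """Attach colloquial -hɑ according to the noun's final segment."""
    final = noun[-1]
    if final not in VOWELS:
        return noun + "ɑ"   # /h/ deleted; final consonant syllabifies with /ɑ/
    if final == "i":
        return noun + "jɑ"  # features of /i/ spread to the C-position
    if final == "u":
        return noun + "wɑ"  # features of /u/ spread to the C-position
    return noun + "hɑ"      # [h] survives after the other vowels
```

Each branch corresponds to one of the four outcomes stated for the data in (28).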
5.5.9 Deletion
Post-fricative and post-nasal anterior stops are deleted at the end of a word in colloquial
speech. (p. 128)
(34)
Words that are not frequent in colloquial speech do not undergo such deletion. Rule (35)
formalizes the process.
(35)
5.5.10 Spirantization
Oral stops spirantize as a function of their place of articulation.
(36)
In colloquial speech, Persian oral stops have a tendency to convert to fricatives while
retaining their place of articulation. Dentalveolar stops tend to spirantize after alveolar
fricatives (37), affricates spirantize before dentalveolar stops (38), and uvular stops tend
to spirantize intervocalically and before dentalveolars (39).
(37) (p. 129)
(38)
(39)
Additionally, some words show free variation between /p/ and /f/, for example /pɑɹsi/ vs.
/fɑɹsi/ ‘Persian’ and /sepʰid/ vs. /sefid/ ‘white’, among others. Therefore, a binary
distinction within obstruents between stops and fricatives results in the feature of
continuancy.
There are disagreements among scholars about how to treat Persian uvulars in terms of
the stop-fricative dimension, i.e. whether /ɢ/ is distinctive, or /ɣ/ or /ʁ/. There is no doubt
that word-initial uvulars are realized as stops. Given the data in (36), if either /ɣ/ or /ʁ/
were distinctive, then an implausible rule would be needed to convert either one to the
stop [ɢ] in the word-initial position. By accepting /ɢ/ as distinctive, spirantization in
postvocalic and intervocalic contexts would be cross-linguistically reasonable (Kenstowicz
1994).
A convincing argument as to why affricates should be regarded as a sequence of [–cont]
and [+cont] can be provided by the lenition of affricates in the postvocalic position before
coronals, resulting in the loss of the [–cont] portion (see Kenstowicz 1994: 32 for English),
which means that the contrast between homorganic affricates and fricatives is
neutralized.
It is reasonable to take the formal forms of the data in (36) as the UF for colloquial forms.
5.5.11 Vowel harmony (VH)
Two kinds of VH are accounted for: within-morpheme and between-morpheme (Modarresi
Ghavami 2011). This dichotomy is explained here in terms of vowel length if necessary
(see Chapters 13 and 15 for more on vowel harmony).
5.5.11.1 Within-morpheme VH
Vowels in open syllables harmonize with following vowels under specified conditions.
(i) Short vowels in open syllables harmonize with following low back vowels when
the intervening consonant is glottal.
(p. 130)
(40)
The data in (40) provide evidence for back and height harmony in a large number of
Persian words.
The mora tier is used to represent short vowels as monomoraic, i.e. linked to one µ slot,
and to represent /ɑ/ as bimoraic, i.e. linked to two µ slots. Then VH can be characterized
by the spreading of the place node leftward to the harmonic vowel and the delinking of
the original place of the vowel. Since glottals are unspecified for the place of articulation,
they are transparent to the spreading (41).
(41)
(ii) Mid vowels in open syllables are raised to the following corresponding high
vowels.
(42)
The data in (42) provide evidence for height harmony in a large number of Persian words.
(43)
5.5.11.2 Between-morpheme VH
Front vowels in the open syllable of a functional morpheme either convert to the
following high vowels or are raised in height by one degree. (p. 131)
(44)
The data in (44) include the UF of imperative/subjunctive verbs with their surface
representation of colloquial pronunciation, and the UF of negative imperfective verbs
with their surface representation of formal pronunciation.
The rules in (45) and (46) formalize the agreement of /e/ in imperative/subjunctive
morphemes in height and backness with the vowels /i/, /o/ or /u/. The rule in (47)
formalizes the partial agreement of /a/ in negative morphemes, which is raised by one
degree before the vowel /i/ of the imperfective.
(45)
(46)
(47)
5.6 Rule interaction

Rules apply in order to derive the allophones from UF in their specified context. Rule
interaction results from the relationship that exists between the structural descriptions of
any two rules and the phonotactics of the data under analysis. There are many within-
and intermorphemic consonant sequences that involve ordered rules of derivation for the
desired SF. Typical examples are given in (48). (p. 132)
(48)
Regardless of the cycles, the ordering relation among rules is given in (49).
(49)
NPA both precedes and follows word-final anterior stop deletion. To solve this problem,
the ordering relation should be defined in two separate cycles: in the first cycle on the word
level and in the second cycle on the compound (or phrasal) level. NPA and deletion are in
a counterbleeding relation. DPA feeds NPA.
(50)
The ordering relation among rules for the data in (50) is given in (51).
(51)
The ordering of spirantization and deaspiration can be optional if affricates are taken as a
sequence of [–cont] and [+cont], because in such a case, the [+cont] portion of the
affricate, i.e. the unvoiced fricative [ʃ], appears before the aspirated stop. Therefore, the
sequence satisfies the structural description of the deaspiration.
5.7 Prosodic structure

In this section, prosodic features of length and stress are discussed. There are word pairs
in Persian whose opposition comes from a geminate consonant (Tashdid, in Arabic and
Persian) in one word and a singleton in the other. Such cases are realized phonetically with
a quantity difference between a long and a short consonant, as illustrated below.
(52) (p. 133)
As an example, in the opposition between ban.nɑ and ba.nɑ, /n/ of /ban/ contrasts with zero
of /baØ/. Thus a phonetic contrast between [nː] and [n] emerges. There are also some
Arabic loanwords containing a final consonantal geminate, such as /sadd/ ‘dam’ vs. /sad/
‘hundred’ and /cʰamm/ ‘quantity’ vs. /cʰam/ ‘little’.
A number of phonological patterns provide evidence for long and short vowels as natural
classes. For example, the first consonant of clusters in CVCC syllables is strictly
constrained by the length of the nucleus. While any consonant is allowed to be the first
member of any cluster when the nucleus is a short vowel, only a small subset of clusters,
mostly consisting of oral unvoiced fricatives and /t/, can occur after long vowels. In
this respect, words like /dastʰ/ ‘hand’, /cʰeɹm/ ‘worm’, /sobh/ ‘morning’ are comparable
with /mɑstʰ/ ‘yogurt’, /ɟuʃtʰ/ ‘meat’, and /ɹiχtʰ/ ‘poured’ (Samareh 1999). Consonantal
clusters within the words and in the loanwords are mostly broken by short rather than
long vowels (Lazard 1992). As was seen, short vowels agree in place and height with long
vowels, and not vice versa. Vowel length is also a major determinant of the syllable
weight. Light and heavy syllables are monomoraic and bimoraic, respectively. Based on
Persian quantitative metrics, Hayes (1989) classifies Persian syllables into four types:
light (CV), heavy (CVV, CVC), superheavy (CVVC, CVCC) and ultraheavy (CVVCC) (here, V
stands for short vowel and VV for long vowel). Based on compensatory lengthening
process in colloquial speech, Darzi (1991) reduces Hayes’ classification into light (CV),
heavy (CVVC, CVCC, CVVCC), and superheavy (CVVC).
There is a consensus among scholars that pitch is the main phonetic correlate of stress in
Persian (Abolhasanizadeh et al. 2012). Traditional studies distinguish between verbal and
non-verbal stress placement rules: while verbal stress changes as a function of tense, the
last syllable of all non-verbal content words bears stress (Ferguson 1957; Lazard 1992).
This is then followed by a unified account of Persian stress, independent of lexical
categories (Eslami 2005). Kahnemuyipoor (2003) theorized the unified account by arguing
that the same stress rule applies to different syntactic categories at a certain level of the
prosodic hierarchy. According to his analysis, Persian stress is assigned rightmost at the
phonological word level, leftmost at the phonological phrase level, rightmost at the
intonational phrase level and leftmost at the utterance level.
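The alternating edge orientation in this account can be traced with a small sketch. It assumes, hypothetically, that a prosodically parsed utterance is available as nested lists (utterance > intonational phrases > phonological phrases > phonological words > syllables); the function name main_stress and the placeholder syllable labels are illustrative only.

```python
def main_stress(utterance):
    """Return the syllable bearing the strongest stress in a parsed utterance."""
    intonational_phrase = utterance[0]              # leftmost at the utterance level
    phonological_phrase = intonational_phrase[-1]   # rightmost at the intonational phrase level
    word = phonological_phrase[0]                   # leftmost at the phonological phrase level
    return word[-1]                                 # rightmost at the phonological word level
```

For a single-word utterance the function simply returns the final syllable, in line with the traditional observation about non-verbal content words.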
5.8 Optimality theoretic analysis

While rule-based phonology takes UF as an input and applies some operations, such as
phonological rules, to change it into a SF as an output, optimality theory (OT), as a
constraint-based phonology, usually takes UF as an input to generate an output candidate
set from which one candidate will be selected as the optimal SF. Selection of an output
candidate is done by considering members of the candidate set against a hierarchy of
violable constraints. The candidate that is most favoured by ranked violable constraints,
hence the most harmonic among candidates, would be considered as an optimal SF. The
most harmonic (p. 134) candidate is the one that performs best on the highest-ranking
constraint (Prince and Smolensky 1993).
OT distinguishes two types of constraints: markedness constraints, which prohibit an
output from having some property; and faithfulness constraints, which prohibit
differences between the input and output (McCarthy 2002, 2008).
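Selection of the optimal candidate under a strict ranking amounts to a lexicographic comparison of violation profiles, which the following sketch illustrates. The function name evaluate, the dictionary layout, and the schematic input /ab/ with its three candidates are assumptions made for illustration; the ranking ONSET >> MAX >> DEP anticipates the argument of section 5.8.2.

```python
def evaluate(candidates, ranking):
    """Return the most harmonic candidate.

    candidates: {form: {constraint_name: number_of_violations}}
    ranking: constraint names, highest-ranked first
    """
    def profile(form):
        # Violations ordered by rank; tuples compare left to right, so a
        # single violation of a higher constraint outweighs any lower ones.
        return tuple(candidates[form].get(name, 0) for name in ranking)

    return min(candidates, key=profile)


# Schematic vowel-initial input /ab/ with three candidate outputs.
CANDIDATES = {
    "ab": {"ONSET": 1},   # faithful, but vowel-initial
    "b": {"MAX": 1},      # initial vowel deleted
    "Ɂab": {"DEP": 1},    # glottal stop inserted
}
```

With ONSET >> MAX >> DEP the epenthetic candidate wins; reversing MAX and DEP would instead select the deletion candidate.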
5.8.1 Glottal reduction: a conspiracy
To start an OT analysis, some data on deaspiration and devoicing, i.e. dataset (9), are
repeated and new data are added.
(53)
For each UF, two output candidates are given in the SF column. The starred candidate is
the nearest competitor for SF. Returning to the rules in (10), (12), and (14), laryngeal
distinctions are neutralized to plain voiceless in two ways: aspiration distinction is
neutralized to plain voiceless after a voiceless fricative or before an obstruent, and
voicing in obstruents is neutralized to plain voiceless after or before obstruents. The
other way to describe the rules is to state those constraints that prohibit aspirated and
voiced stops in the SF when they occur in the related contexts above. In OT terminology
(McCarthy 2008), aspirated stops are not allowed after the voiceless fricatives or before
the obstruents. This requirement is enforced by reduction in glottal spreading. In
addition, voiced obstruents are not allowed after or before obstruents. This requirement
is enforced by reduction in vocal fold vibration.
Six violable constraints are responsible for laryngeal contrast and neutralization:
Ident(lar), *Lar, *[–sg], *Lar/Lar, *[+voice]#, *#[+voice]
Ident(lar): No laryngeal autosegment in the input changes in the
corresponding output segment (Lombardi 2001; McCarthy 2008).
Ident(lar) is a featural faithfulness constraint that prohibits changing the values of the
laryngeal features in the output. *Lar is a context-free markedness constraint that
prohibits a consonant bearing marked laryngeal features, i.e. [+sg] and [+voice].
Lombardi (2001) uses *Lar just for the marked [+voice] feature. Unlike *Lar, *[–sg]
prohibits an (p. 135) unmarked laryngeal feature. *Lar/Lar is an intersegmental, context-
sensitive markedness constraint that prohibits a stop bearing marked laryngeal features
in the obstruent sequence. This constraint is justified by Browman and Goldstein (1986)
in two organizational principles governing glottal opening-and-closing gestures occurring
in the word-initial onsets of Germanic languages: (1) that glottal peak opening is
synchronized to the midpoint of any fricative gestures and otherwise to the release of any
closure gestures and (2) there is at most a single glottal gesture word-initially. The same
reasoning can hold for sequences in word-final and medial positions in Persian.
*[+voice]# and *#[+voice] are intrasegmental markedness constraints that prohibit a
word-final and a word-initial obstruent, respectively, from being voiced.
To construct an OT analysis, two kinds of inputs are taken into account: /cʰaɹ/ and /ɟaɹ/.
Since Persian preserves aspiration in the word-initial position and in sonorant
environments, a faithful mapping results. Ident(lar) dominates *Lar and *[–sg]. A priority
relationship among constraints is explained by means of constraint conflict, i.e. one
constraint acts counter to another one in favouring the two competing output candidates.
Ident(lar) dominates *Lar because it favours the winner [cʰaɹ], while *Lar favours the
loser [caɹ] (see Tableau 5.1). Ident(lar) also dominates *[–sg] because it favours the
winner [caɹ], while *[–sg] favours the loser [cʰaɹ] (see Tableau 5.2).
Tableau 5.1
Tableau 5.2
Tableau 5.3 illustrates an unfaithful mapping in which the winner [caɹ] violates Ident(lar)
while the loser [ɟaɹ] disobeys the markedness constraint *#[+voice] having higher
priority.
Tableau 5.3
Since the rule of deaspiration is triggered by an output constraint that forbids stops in
the obstruent sequences to have the features [+sg] or [+voice], Ident(lar) is violated and
dominated by *Lar/Lar (see Tableau 5.4). *Lar/Lar favours [Ɂascaɹ] while Ident(lar)
favours [Ɂascʰaɹ]. (p. 136)
Tableau 5.4
By the same reasoning, the output constraints *[+voice]# and *#[+voice] have higher priority
than Ident(lar), and together with *Lar/Lar are in a partially ordered relation with each other,
as shown in Tableaux 5.5 and 5.6. (54) represents the laryngeal constraints hierarchy for
Persian obstruents.
Tableau 5.5
Tableau 5.6
(54)
One could posit the generalization that the deaspiration and devoicing rules join a
conspiracy according to which they support reduction in the laryngeal magnitude of the
SF, whether in glottal spreading or vocal cord vibration.
5.8.2 Syllable structure
According to Persian phonological structure, a syllable onset must contain one consonant.
Thus ONSET, which forbids vowel-initial syllables, is undominated. Accordingly, when a
phoneme string starts with a vowel, one of two strategies may be chosen in syllabification to
satisfy ONSET: either the initial vowel is deleted or a consonant is inserted at the
beginning of the syllable. Persian chooses the second and inserts an initial glottal stop.
Thus DEP, as a faithfulness constraint, interacts with ONSET and results in a preference
for the universal unmarked CV, as illustrated in Tableau 5.7.
ONSET: No syllable is allowed to start with a vowel.
DEP: Every segment of the output has a correspondent in the input. (p. 137)
Tableau 5.7
Since Persian prefers initial consonant insertion to initial-vowel deletion, the faithfulness
constraint MAX, which penalizes deletion, should dominate DEP (see Tableau 5.8).
MAX: Every segment or autosegment of the input has a correspondent in the output.
Tableau 5.8
Persian resolves hiatus, i.e. a sequence of two vowels, by either an epenthetic consonant
between two vowels or the deletion of one of the vowels. Therefore, ONSET should
dominate both MAX and DEP. For example, formal and colloquial pronunciations of the
verb /be+ɡu+am/ ‘say’ are [beɡujam] and [beɟam], respectively. Tableau 5.9 illustrates
interaction of ONSET, MAX, and DEP.
Tableau 5.9
Since Persian assigns intervocalic consonants to the coda rather than the onset,
*Complex-Onset has higher priority than *Complex-Coda. Tableau 5.10 illustrates the
syllabification of the VCCCV string.
*COMPLEX-ONSET: A consonant cluster is not allowed in the onset.
*COMPLEX-CODA: No coda is allowed to have more than one consonant.
Tableau 5.10
(p. 138)
Since the whole CVCC string is parsed into one syllable, i.e. .CVCC., NO-CODA is
dominated by ONSET, MAX, and DEP, as shown in Tableau 5.11. The number of violations
of NO-CODA is equal to the number of consonants in the coda position.
NO-CODA: No syllable-final consonant is allowed.
Tableau 5.11
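The pairwise dominance statements made so far in this section can be collected mechanically into ranking strata. The sketch below is one way to do so under stated assumptions: the pairs listed in DOMINATIONS are read off the prose above, constraints that are not ordered with respect to each other are placed as high as the statements allow, and the function name strata is hypothetical. It is not a reproduction of the chapter's own hierarchy.

```python
CONSTRAINTS = ["ONSET", "MAX", "DEP", "NO-CODA", "*COMPLEX-ONSET", "*COMPLEX-CODA"]

# (higher, lower) pairs taken from the ranking arguments in the text.
DOMINATIONS = [
    ("ONSET", "MAX"), ("ONSET", "DEP"), ("MAX", "DEP"),
    ("ONSET", "NO-CODA"), ("MAX", "NO-CODA"), ("DEP", "NO-CODA"),
    ("*COMPLEX-ONSET", "*COMPLEX-CODA"),
]


def strata(constraints, dominations):
    """Group constraints into strata, highest first, from pairwise statements."""
    remaining = set(constraints)
    layers = []
    while remaining:
        # A constraint is in the top layer if nothing still unplaced dominates it.
        top = {
            c for c in remaining
            if not any(hi in remaining and lo == c for hi, lo in dominations)
        }
        if not top:
            raise ValueError("dominance statements are cyclic")
        layers.append(sorted(top))
        remaining -= top
    return layers
```

Two constraints that share a stratum in the output are merely unordered by the available arguments; the sketch makes no claim that they are equally ranked.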
Tableaux 5.12 and 5.13 illustrate that loanwords with CCVC and CVCCC patterns can be
syllabified as CV.CVC and CVC.CVC, respectively. For example, Persian speakers syllabify
the English words /class/ and /lustɹe/ as /ce.lɑs/ and /lus.teɹ/.
Tableau 5.12
Tableau 5.13
Finally, the constraint hierarchies for Persian syllable structure are as follows:
5.8.3 Intersegmental constraint interactions
We saw in section 5.6 that segmental features of voicing, aspiration, place, and manner
interact with each other. In this section, rule interaction is interpreted in line with the OT
approach. The UF and SF of two typical words are reiterated in (55). (p. 139)
(55)
In colloquial speech, an anterior stop, i.e. /t/, /d/, is deleted when preceded by a fricative
or an anterior nasal, and followed by a consonant. In OT terminology, the triconsonantal
sequence [Ct/dC] is not allowed. This requirement is enforced by [t/d] deletion. Since [t/d]
deletion is triggered by a need to simplify the triconsonantal sequence Ct/dC, MAX should
be dominated by a context-sensitive markedness constraint, or *Ct/dC, as illustrated in
Tableau 5.14. Coetzee (2004) proposed similar constraints for [t/d] deletion in English
dialects.
*Ct/dC: A word-final anterior stop is not allowed if it is followed by a consonant and
preceded by a fricative or anterior nasal.
Tableau 5.14
Unfaithful mapping of /ʧʰand-cʰɑɹe/ → [ʧʰaŋkʰɑɾe], *[ʧʰaɲcʰɑɾe] in (55) denotes that a high
dorsal stop is not allowed to be different from the following vowel in the value of [back].
This requirement is enforced by [back] spreading from the vowel to the stop. Thus
AGREE (Back), which requires dorsal agreement, dominates faithfulness to the featural
autosegment, as illustrated in Tableau 5.15.
AGREE (Back): High dorsal stops must have the same value of [back] as the
following vowel.
IDENT: No feature in the input changes in the corresponding output segment.
Tableau 5.15
Additionally, the same unfaithful mapping of /ʧʰand-cʰɑɹe/ → [ʧʰaŋkʰɑɾe], *[ʧʰankʰɑɾe] in
(55) denotes that the nasal is not allowed to be different in place from a subsequent
consonant. This requirement is enforced by [Place] spreading from the consonant to the
nasal. Thus AGREE (Place) requiring place agreement dominates faithfulness to the
featural autosegment, as illustrated in Tableau 5.16.
AGREE (Place): Non-labial nasals must have the same place of articulation as a
subsequent consonant.
Tableau 5.16
(p. 140)
Given *[ʧʰakʰɑɾe] as the candidate having two deletions in relation to the input, Tableau
5.17 illustrates the priority of MAX over IDENT.
Tableau 5.17
A summary ranking of constraints is shown in (56), as a hierarchical Hasse diagram.
(56)
Another unfaithful mapping in (55), ɁeʤtʰemɑɁ → ɁeʃtemɑɁ, *ɁeʧtʰemɑɁ, indicates that
Persian reacts to the triautosegmental sequence of [–cont][+cont][–cont], since the first
two autosegments belong to an affricate and all three are attached to the same place
node. Thus a sequence of a homorganic affricate and a stop is not allowed. This
requirement is enforced by the spirantization of the affricate, i.e. deletion of the first [–
cont]. Tableau 5.18 illustrates a ranking argument for the priority of *Affricate/Stop over
MAX.
*Affricate/Stop: A sequence of a homorganic affricate and a stop is not allowed.
Tableau 5.18
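The pairwise ranking arguments stated in the prose of this section (*Ct/dC over MAX, AGREE (Back) and AGREE (Place) over IDENT, MAX over IDENT, and *Affricate/Stop over MAX) can be checked for mutual consistency with a topological sort. The sketch below is an illustration only: it encodes just those five dominations, does not claim to reproduce the diagram in (56), and uses a hypothetical helper name.

```python
from graphlib import TopologicalSorter, CycleError

# Pairwise dominations as stated in the prose: (higher-ranked, lower-ranked).
DOMINATIONS = [
    ("*Ct/dC", "MAX"),
    ("AGREE(Back)", "IDENT"),
    ("AGREE(Place)", "IDENT"),
    ("MAX", "IDENT"),
    ("*Affricate/Stop", "MAX"),
]

def total_order(dominations):
    """Return one total ranking compatible with the dominations, or None if they conflict."""
    sorter = TopologicalSorter()
    for higher, lower in dominations:
        # The lower-ranked constraint must come after the higher-ranked one.
        sorter.add(lower, higher)
    try:
        return list(sorter.static_order())
    except CycleError:
        return None  # contradictory ranking arguments

ranking = total_order(DOMINATIONS)
```

Any order returned is only one linearization of a partial order; constraints that the text does not rank against each other (for example the two AGREE constraints) may appear in either order.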
5.9 Concluding remarks

The Persian phoneme inventory includes 23 consonants and 6 vowels. The typical
template of the Persian syllable is CV(C)(C). A glottal stop is inserted at the beginning of
a vowel-initial word. Given a consonant cluster in disagreement with SSP in Persian, one
will find a corresponding cluster in agreement with SSP. Testing the phonological
processes against the feature geometry of Halle et al. (2000) led to the positing of 14
binary and 3 unary features. The laryngeal features responsible for the contrast in the
stops and fricatives are different: [sg] (p. 141) for stops and [voice] for fricatives. Voicing
is more robust in fricatives than in stops. [Voice] is the laryngeal feature that partitions
the phonological space into voiced and voiceless obstruents. The sound resulting from the
deaspiration of the voiceless stops is perceived as a corresponding voiced stop. [Tense]
could also play the same role as [voice]. No consistent agreement is reported on the
phonemic status of the dorsal obstruents. Production and perception experiments
could resolve this inconsistency. Segmental, autosegmental, and moraic phonologies
were studied by means of 11 phonological rules accompanied by many typical examples.
Rule interaction posited a hierarchy for a subset of cross-linguistic rules that should be
completed as part of the Persian phonology. Furthermore, phonological analysis of the
laryngeal features for stops in the framework of Optimality Theory showed that the rules
join a conspiracy that supports reduction in the laryngeal magnitude of the SF, whether
in glottal spreading or vocal fold vibration. Finally, an OT approach to derivational
interactive rules led to a hierarchical Hasse diagram of some other posited violable
constraints. Interpreting the constraint interaction in the framework of Harmonic
Serialism could lead to new findings.
Acknowledgements
I thank Mr Justin Cancelliere and Dr Parvaneh Shayestehfar for editing this chapter. I
also wish to sincerely thank two anonymous reviewers for helpful comments.
Mahmood Bijankhan
Mahmood Bijankhan is a Professor of Linguistics in the department of General
Linguistics at the University of Tehran. He received his BS degree in Mathematics
from the University of Texas at Arlington and his MA and PhD in Linguistics from the
University of Tehran. His research interests lie in the area of phonetics, phonology,
and corpus linguistics. In recent years, he has focused on Persian proficiency test for
non-Persian speakers.
Prosody
Arsalan Kahnemuyipour

This chapter provides an overview of the prosody of the Persian language. The chapter
starts with a discussion of word stress, which is known to be word-final in all categories
other than the verb. It is shown that stress in the verbal domain is no exception to the
word-final stress but a reflex of stress at the phrasal level in this domain. In the process,
the chapter also explores a handful of words with exceptional non-final stress and
provides a brief overview of the interaction between stress and information structure in
Persian. The chapter ends with a discussion of the phonetics of Persian prosody, focusing
on the phonetic correlates of prosodic prominence in the language and a brief overview of
intonation in Persian.
Keywords: stress, Persian, prominence, intonation, focus
6.1 Introduction
THIS chapter provides an overview of the prosody of the Persian language.1 The chapter
starts with a discussion of word stress and builds on that to cover stress at the phrasal
and clausal levels. In the process, we will briefly consider some accounts of Persian
prosody at the various levels and its interaction with information structure. In the end, we
will briefly consider the phonetic realization of prosodic prominence and intonation in
Persian. The chapter is organized as follows. Section 6.2 deals with final stress at the
word level and the divergence from this pattern. Section 6.3 provides an account for the
non-final stress in the verbal domain. In section 6.4, we look at the phonetic correlates of
prosodic prominence in Persian and provide a brief overview of Persian intonation.
Section 6.5 concludes the chapter.2
6.2 Word stress

This section looks at stress at the level of the word in Persian with the aim of establishing
a general rule which can account for the various stress patterns. While a general
tendency for word-final stress had been noted by many linguists starting with Chodzko
(1852), the superficial diversity found in Persian stress patterns had led many scholars to
suggest various splits, in particular based on lexical categories. Looking at the examples
in (1), it is easy to detect the (p. 143) divergence from word-final stress. We hope to be
able to make more sense of this divergence by the end of this chapter.3
(1)
The first thorough discussion of Persian stress can be found in Chodzko (1852). He
identifies word-final stress as the basic stress rule in Persian and attributes this pattern
to simple, derived, and compound nouns and adjectives, as well as nominal verbs (a type of
infinitive).4 For verbal stress, he suggests different rules for different tenses. Ferguson
(1957) makes a distinction between verbal stress and the other categories. ‘It is certainly
safe to say that in modern Persian the verb has recessive stress. This is in sharp contrast
with the noun, where the stress tends to be near the end of the word’ (Ferguson 1957:
26–7). In a similar fashion, Lazard (1992) draws a line between non-verbal words and
verbs, taking the former to have word-final stress and the latter ‘recessive stress’.
Mahootian (1997) states that stress is word-final in simple nouns, derived nouns,
compound nouns, simple adjectives, derived adjectives, infinitives, and the comparative
and superlative forms of adjectives as well as in nouns with plural suffixes, and
underlines verbal stress as one of the exceptions to this rule. Finally, the clearest divide
between verbal and non-verbal stress in Persian comes in the work of Amini (1997),
where an End Rule Right is proposed for all categories including non-prefixed verbs and
an End Rule Left for prefixed verbs. While these accounts rely on a split between verbs
and other categories to account for the stress patterns found in Persian, they also reveal
that even such a split fails to capture the discrepancies observed in Persian, exemplified
in (1). Just to illustrate the point, take the examples in (1f) and (1h). While both are verbs,
stress is final in (1f) but initial in (1h). This issue is irrespective of whether one considers
using categorial splits to account for the diversity in stress patterns to be a desirable
move from a theoretical perspective.
In order to decipher the rule governing the stress system of Persian, we need to look at
the stress patterns exemplified in (1) more closely. Let us start with simple non-affixed
words of varying lengths and categories, exemplified in (2).
(2)
(p. 144)
The generalization is very clear here: stress falls on the last syllable of the word. This
generalization also covers (1a, b, f, i). The example in (1c) also shows word-final stress
but it is different from the ones in (2) in that it contains a derivational affix. When we look
at more examples of derived words, we note that all derivational suffixes take stress (see
also Ferguson 1957). As a result, word stress falls on the last syllable in derived words in
Persian. With derivational prefixes too, the stress falls on the last syllable of the whole
word. Some examples involving derivational affixes are given in (3). The examples in (3e)
and (3f) show the pattern very clearly. In (3e), with a derivational prefix bi ‘without’,
stress is on the last syllable of the whole word, hence on the root. Once the derivational
nominalizing suffix -i is added to the same form, stress falls on this suffix as the last
syllable in the whole word.
(3)
It is worth noting here that the plural marker and the comparative/superlative markers in
Persian are also part of the stressed word with the stress falling on these suffixes, as
shown in (4). While these affixes are typically treated as inflectional across languages,
Kahnemuyipour (2000b, 2004) argues that they behave like derivational affixes in
Persian, making their stress behaviour unsurprising.
(4)
We can therefore maintain the generalization so far that stress is word-final in Persian, as
long as we take the word to include derivational affixes. Kahnemuyipour (2003)
formulates this generalization in the framework of Phrasal (or Prosodic) Phonology
(Selkirk 1980a,b, 1981, 1984, 1986; Nespor and Vogel 1982, 1986; among others). In this
framework, various prosodic domains such as the phonological word, the phonological
phrase and the intonational phrase are derived from morphosyntactic constituents. The
domain relevant to our discussion here is the phonological word, which is typically
defined as the domain for word stress, phonotactics, and segmental word-level rules. For
Persian, stress falls on the (p. 145) last syllable in the phonological word, which contains
the root and all derivational affixes (including the plural and comparative/superlative
markers).5
Unlike derivational affixes, inflectional ones are not part of the domain of word stress in
Persian (see also Ferguson 1957). In (1d), for example, the first person singular
possessive marker -am does not carry stress, with stress falling on the root sag ‘dog’.6
Similarly, stress falls on the last syllable of the preterite verb in (1f) and when the
inflectional suffix -i marking second person singular agreement is added as in (1g), stress
remains on the stem. More examples of words involving inflectional suffixes are given in
(5). In (5a–c), we see several examples of nominal inflectional endings, in addition to the
possessive marker we have seen so far. In (5d), we see another example of the verbal
agreement markers not receiving stress.7
(5) 8
The different behaviour of derivational and inflectional affixes in Persian is not surprising.
In many languages, affixes behave differently with respect to whether they are part of the
phonological word or not (see, for example, Hall and Kleinhenz 1999). Dixon (1977a,b)
(p. 146) refers to this distinction using the terms ‘cohering’ and ‘non-cohering’. Using this
terminology, derivational affixes in Persian are ‘cohering’, while inflectional ones are
‘non-cohering’. The division in Persian seems to be particularly well behaved, given the
plausibility of taking suffixes involved in derivation (i.e. a lexical process) to be part of the
phonological word and inflectional suffixes that are often considered to have syntactic
status to be outside the phonological word. It is worth noting that cohering suffixes are
ordered before non-cohering ones, leading to the schema in (6), where ω marks the
phonological word boundary, the domain of word stress assignment. In (7), we see
examples of words involving both a cohering and a non-cohering suffix.
(6)
(7)
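The word-stress generalization developed so far (stress falls on the last syllable of the phonological word, which contains the root and cohering affixes but not non-cohering ones) can be sketched as a small procedure. The sketch makes several assumptions that are not part of the chapter: forms are given in a simplified romanization, stress is marked with a combining acute accent on the last vowel letter inside the phonological word, the morpheme labels are supplied by hand, and the function name is hypothetical.

```python
# Illustrative sketch; the romanization and all names are assumptions.
VOWELS = set("aeiou\u0101")   # six vowel letters, \u0101 = a with macron
ACUTE = "\u0301"              # combining acute accent used to mark stress

def stress_word(morphemes):
    """Mark stress on a word given as (form, kind) pairs in linear order.

    kind is 'root', 'cohering' (inside the phonological word) or
    'noncohering' (outside it, so never stressed).
    """
    forms = [form for form, _ in morphemes]
    # Scan leftwards from the right edge, skipping non-cohering material.
    for i in range(len(morphemes) - 1, -1, -1):
        form, kind = morphemes[i]
        if kind == "noncohering":
            continue
        for j in range(len(form) - 1, -1, -1):
            if form[j] in VOWELS:
                forms[i] = form[:j + 1] + ACUTE + form[j + 1:]
                return "-".join(forms)
    raise ValueError("no vowel inside the phonological word")

dog_my = stress_word([("sag", "root"), ("am", "noncohering")])
```

Here the possessive marker -am is labelled non-cohering, so stress lands on the root sag, in line with the discussion of (1d) above; a cohering suffix in the same position would instead attract stress to itself.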
Let us take stock of what we have covered so far. We started with the examples in (1),
repeated below as an illustration of the range of stress patterns we can find in the
Persian word. We then noted that the word-final stress rule can capture word stress in
non-affixed words, exemplified in (1a), (1b), (1f), and (1i). With the additional distinction
between cohering and non-cohering affixes and the proposal that the word-final stress
rule applies to the phonological word which includes cohering suffixes, but excludes non-
cohering ones, we managed to capture the stress pattern exemplified in (1c), (1d), and
(1g). We are now left with three words in our list, namely (1e), (1h), and (1j). The
following section is devoted to examples like (1h), where stress is on a prefix in a verb. It
is precisely this type of example which had led most Persian linguists to suggest a
categorial division with respect to word stress, as discussed above. Meanwhile, before
turning to that case, I would like to take on examples (1e) and (1j) which exhibit stress on
the first syllable in clear contradiction to the more general word-final stress in Persian.9
(1)
(p. 147)
The examples in (1e) and (1j) belong to a list of perhaps a handful of words with
exceptional non-final stress. To be more precise, these are all two-syllable words with
stress on the first syllable. While these exceptional cases have been noted in the
literature (see, for example, Amini 1997; Hosseini 2014; Lazard 1992; Mahootian 1997),
often a selected list is provided with no major discussion of possible generalizations,
making further exploration of these cases worthwhile here. Ferguson (1957) has the most
detailed discussion, where he has also attempted to provide an exhaustive list, but he is
still missing a few words. Here, I will try to add those missing words and provide a
complete list. Meanwhile, I am leaving out those words on Ferguson’s list which appear
to be archaic or outdated. I am hoping, therefore, to provide an exhaustive list of words
with non-final stress in contemporary Persian and discuss their distribution. In (8), we
find the full list of these words with stress on the first syllable. The cases that are from
Ferguson are marked with F in brackets.10,11
(8)
1213
(p. 148) Many of the examples in (8) defy any generalization and may simply need to be
accepted as lexical idiosyncrasies. Meanwhile, one might be able to provide some partial
explanations or extract some general tendencies from the above facts. To begin with, a
few of the words in (8) may consist of a root plus a non-cohering affix, which makes their
initial stress unsurprising (see also Hosseini 2014). (8u) is a clear example of this kind
with vaqt being a free root meaning ‘time’ and -i the indefinite marker, a non-cohering
affix. One may be able to place (8j) and (8n) in the same category, while noting that xeil
meaning ‘group’ has a marginal use in contemporary Persian and ba’z or barx are not
used as free roots in Persian and their status as bound roots in this context is at best
questionable. (For more information on the indefinite marker, refer to Chapters 2, 3, 7, 8,
and 9.) Another example which can be easily broken down into a root plus a non-cohering
suffix is given in (8o), where the Persian word cherā ‘why’ can be split into che ‘what’ and
the non-cohering accusative marker -rā. While the synchronic status of this division may
be questionable, this is certainly the historical reason for the stress pattern of this word.
Similarly, one might classify zirā ‘because’ in (8p) with cherā even though zi does not have
a root status in contemporary Persian (see Hosseini 2014 for a possible historical
explanation).14
A number of other examples in (8) appear to consist of two (phonological) words, in which
case the stress should not really be seen as non-final stress in a single word but main
stress on the first word in a two-word phrase. This is particularly plausible in the context
of a leftmost phrasal stress rule (see Hosseini 2014; Kahnemuyipour 2003). The words in
(8f), for example, can be broken down into ham ‘also’ and in/ān ‘this/that’. Similarly,
chonke ‘because’ in (8r) can be split into chon ‘because’ (see footnote 13) and ke ‘that’.
The examples in (8h) may also be broken down to kāsh ‘I wish’ and the complementizer ke
‘that’, with ki perhaps seen as a variant of ke (Hosseini 2014). Finally, harchand ‘even
though’ in (8t) may debatably be taken to consist of har ‘every’ and chand ‘few’.
Several of the examples in (8) are words that can be used as single-word utterances
raising the question of whether their initial stress may be related to their status as an
utterance.15 The following fall under this category: (8a), (8b), (8d), (8i), (8o), (8q), (8s).
The example in (8d) is particularly interesting, because of the two modals shāiad ‘may,
perhaps’ and bāiad ‘must’, only the former can be used as a single-word utterance and
only that one has initial stress. While this pattern is interesting especially because it
applies to quite a few forms, it can by no means be generalized to all one-word
utterances. The absence of generality in this regard can best be seen in the various forms
for ‘thank you’ shown in (8q). Of those forms, only one, albeit the most common one, mérsi,
has initial stress. Similarly, not all the forms for ‘yes’ in Persian show initial stress. In fact,
as can be seen in (8a), the most common form in colloquial Persian āré exhibits word-final
stress. (p. 149)
Finally, there is perhaps one category which shows initial stress without exception,
namely clausal conjunctions.16 This category has the most examples in (8) and presents
the strongest generalization in the sense that Persian clausal conjunctions (almost) never
show final stress.17 The following examples from (8) fall into this category: all the forms
for ‘but’ in (8c) as well as (8g), (8k), (8l), (8p), (8r), (8t), and (8u).18
We started this section by looking at a number of examples of Persian words in (1) which
at first sight did not appear to follow a systematic rule. With closer inspection, we noted
that a general word-final stress rule can account for the stress pattern of most of those
examples, as long as the word is seen as the Phonological Word which consists of the root
and all cohering affixes that are attached to the root. Crucially, non-cohering affixes (i.e.
inflectional affixes) fall outside the domain of the Phonological Word and as a result
words containing them will exhibit non-final stress. We then looked at a number of
exceptional cases in Persian which do not seem to follow the general word-final stress
rule. We noted that these 20–30 words seem to defy any comprehensive generalization to
account for their unusual stress pattern. In the meantime, we underlined some general
tendencies in the data which may pave the way for a better understanding of these
examples in future. Of the words we started with in (1), we are left with only one example
to account for, namely mí-xor-e ‘s/he eats’. In section 6.3, we will see that this example is
part of a much more general pattern of non-final stress in the verbal domain. I will argue
that this stress pattern can be accounted for if we consider stress at the level of the verb
phrase.
6.3 Non-final stress in the verbal domain

We ended the previous section with the only remaining seemingly exceptional example
from (1), namely mí-xor-e ‘s/he eats’. If we take this example to constitute a single
Phonological Word, one should expect to have main stress on the second syllable, *mi-xór-e,
with the stress rule skipping the agreement marker -e, a non-cohering suffix, and
falling on the stem. This pattern is not observed. In fact, with all verbal forms involving
the durative marker mi-, stress falls on the durative marker. This is also true of the
subjunctive marker be- and the negative marker na-/ne-, as shown in (9).
(9)
(p. 150)
At first glance, the stress pattern in (9) seems surprising, as stress seems to have shifted
to the left for no obvious reason. Meanwhile, things start looking more systematic once
we add more material within the verb phrase to the left of the verb. In (10a), we have
added a non-specific object, and main stress falls on the object. In (10b), a measure
adverb is added, and main stress falls on the measure adverb.
(10)
Kahnemuyipour (2009) argues that the correct generalization capturing the facts in (10)
is that main stress falls on the leftmost element (or Phonological Word) in the verb phrase
(see also Same’i 1996).19 Manner and measure adverbs are argued to mark the left edge
of the verb phrase (see Holmberg 1986; Webelhuth 1992; among others), thus explaining
the main stress on the measure adverb in (10b).20
It is worth noting that the same pattern obtains with the very productive Persian
construction known as complex verbs (see, for example, Dabir-Moghaddam 1997; Karimi
1997; Megerdoomian 2001; Vahedi-Langarudi 1996). Some examples are shown in (11).
(11c) shows that when a manner adverb is added, main stress falls on the manner adverb.
For more discussion on complex verbs, see Chapters 2, 7, 8, 9, 10, 15, 17, and 19. (p. 151)
(11)
We are now ready to revisit the facts in (9). Given the general leftmost rule at the level of
the verb phrase, the stress on the prefixes in (9) can be seen as the result of the same
rule. If we take the verbal prefixes to constitute separate phonological words, then the
stress on these prefixes can be seen as the regular leftmost stress in the domain of the
verb phrase.21
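The leftmost rule at the level of the verb phrase can be sketched as follows. The sketch assumes that a sentence has already been divided into phonological words, each tagged by hand for whether it lies inside the verb phrase, and that verbal prefixes count as separate phonological words, as proposed above. The function name and the tagging scheme are hypothetical, and the labels in capitals are placeholders rather than Persian forms.

```python
# Illustrative sketch; the (form, in_vp) tagging is supplied by hand.

def main_stress(words):
    """Return the leftmost phonological word inside the verb phrase.

    words: list of (form, in_vp) pairs in linear order.
    """
    for form, in_vp in words:
        if in_vp:
            return form
    return None  # no verb-phrase material in the input

# The durative prefix mi- is treated as its own phonological word.
prefixed_verb = main_stress([("mi", True), ("xor-e", True)])

# Placeholder labels for the configurations discussed around (10).
with_object = main_stress([("OBJ-nonspecific", True), ("V", True)])
with_adverb = main_stress([("ADV-measure", True), ("V", True)])
```

On this view the stressed prefix in mí-xor-e needs no special verbal rule: it is simply the leftmost phonological word in its verb phrase.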
The fact that Persian verbal prefixes behave as independent phonological words is not
surprising. Similar proposals have been made for some affixes in other languages in the
Phrasal Phonology framework (e.g. Cohn 1989 on Indonesian; Kang 1992a,b on Korean;
Nespor and Vogel 1986 on Italian; Selkirk and Shen 1990 on Shanghai Chinese). In
particular, Rice (1993) has argued that the verb in Slave (a Northern Athabaskan
language) is parsed as a phonological phrase. The verbal prefixes in Persian just appear
to be another example of the same type of behaviour. Under this view, the seemingly
‘recessive’ stress pattern of Persian verbs should not be seen as an exception to the
general word-final stress rule in Persian but rather as a consequence of the general
leftmost rule at the level of the verb phrase.22
The generalization that main stress falls on the leftmost element in the verb phrase finds
support from a contrast in the stress behaviour of non-specific and specific objects. In
Persian, while non-specific objects receive main stress (12a), specific ones do not (12b).23
This contrast receives a straightforward account in the context of a widely accepted view
that specific objects are in a higher syntactic position compared to their non-specific
counterparts cross-linguistically (see de Hoop 1996 and Koopman and Sportiche (p. 152)
1991 for Dutch; Diesing 1992 and Enç 1991 for Turkish; Mahajan 1990 for Hindi; among
others). This syntactic difference between specific and non-specific objects was proposed
for Persian by Browning and Karimi (1994) (see also Ghomeshi 1996; Karimi 1996;
Megerdoomian 2002). If we take the specific object to have moved out of the domain of
the verb phrase, its failure to receive main stress is expected under the view presented
here which takes main stress to fall on the leftmost element in the verb phrase. The idea
that the specific object is outside the verb phrase is supported by the fact that it appears
to the left of manner/measure adverbs (12d), which mark the left edge of the verb phrase.
The non-specific object, however, appears on the right side of manner/measure adverbs
(12c). In all the examples in (12), the main stress falls on the leftmost element in the verb
phrase, indicated by the acute accent in the data.24
(12)
This correlation between structural height and stress behaviour manifests itself in the
domain of adverbs as well. As discussed above, manner adverbs are often argued to mark
the left edge of the verb phrase, leading to their receiving the main stress in Persian.
Meanwhile, other adverbs such as speaker-oriented or subject-oriented adverbs are
typically taken to have higher structural positions (see, for example, Cinque 1999;
Jackendoff 1972). Putting these two ideas together, the expectation is that these higher
adverbs should not receive main stress in Persian. This prediction is borne out. This is
best illustrated by lexical items which are ambiguous between a manner reading and a
subject-oriented reading. In Persian, this difference is realized as a difference in stress,
as shown in the examples in (13). When sexāvatmandāne ‘generously’ is used as a manner
adverb, it is inside the vP and receives (p. 153) main stress (13a) but when it is used as a
subject-oriented adverb, it is outside the vP and the main stress falls on the leftmost
element within vP, namely the non-verbal element in the complex verb in (13b).
(13)
Before we end this section, it is important to note that the stress pattern illustrated here
and the corresponding generalization which places the main stress on the leftmost
element within the verb phrase are formulated in the context of a focus-neutral sentence,
i.e. a sentence which contains all-new information (Context question: What happened?). If
an element is focused in a particular sentence in Persian, that element will receive the
highest prominence (for more details, see Kahnemuyipour 2009). Let us take (12d) as an
example. If this sentence were to be uttered with subject focus (Context question: Who
eats his ice-cream well?) or with object focus (Context question: What does Ali eat
well?), then the highest prominence would be on the subject and object, respectively, as
shown in (14). In this example, underlining marks focus and the highest prominence in
the sentence has been marked with an acute accent.25 Note that this highest prominence
is realized on the vowel which receives word stress: i in Ali in (14a) and i in bastani-sh-o in
(14b). Recall that the accusative marker -o is a non-cohering suffix and does not receive
stress.
(14)
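The interaction between focus and the leftmost rule can be sketched as a simple override: a focused element takes the highest prominence, and otherwise the focus-neutral rule applies. The sketch below is an illustration under stated assumptions: words are represented by English glosses rather than Persian forms, the verb-phrase membership of each word is tagged by hand following the discussion of (12d), and the function name is hypothetical.

```python
# Illustrative sketch; glosses stand in for the Persian words of (12d).

def highest_prominence(words, focus=None):
    """Return the gloss of the word carrying the highest prominence.

    words: list of (gloss, in_vp) pairs in linear order.
    focus: gloss of the focused word, or None for a focus-neutral sentence.
    """
    glosses = [gloss for gloss, _ in words]
    if focus is not None:
        if focus not in glosses:
            raise ValueError("focused word is not in the sentence")
        return focus                 # focus overrides the default rule
    for gloss, in_vp in words:
        if in_vp:
            return gloss             # leftmost element in the verb phrase
    return None

# Subject and specific object are outside the verb phrase;
# the manner adverb marks its left edge.
sentence = [("Ali", False), ("his-ice-cream", False), ("well", True), ("eats", True)]

neutral = highest_prominence(sentence)
subject_focus = highest_prominence(sentence, focus="Ali")
object_focus = highest_prominence(sentence, focus="his-ice-cream")
```

The sketch only identifies the word that carries the highest prominence; within that word, prominence is realized on the vowel that receives word stress, as noted above.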
We started this chapter with a question about stress at the level of the word in Persian. In
trying to understand the seemingly exceptional behaviour of verbs in this regard, we
explored prosody at the level of the verb phrase and learned that with a correct
understanding of prosody at this level, the stress behaviour of verbs is unsurprising. We
cannot complete this chapter, however, without a discussion of the phonetic realization of
prosody and intonation in Persian, the topic of the next section. (p. 154)
6.4 The phonetics of prosody and intonation in Persian
In the previous sections, we looked at the distributional properties of Persian prosody
without any reference to the phonetic realization of prosody in Persian. This section turns
to the phonetics of prosody in Persian and provides a brief overview of the phonetic
correlates of stress in Persian as well as the intonational structure of the language (see
Chapter 4 for more discussion on intonation).
Stress has been known to have several possible phonetic correlates cross-linguistically.
The phonetic correlates of stress in English, for example, are typically known to be pitch,
intensity, and duration (see, for example, Ladefoged and Johnson 2015; Reetz and
Jongman 2009; Rogers 2000). As for Persian, until recently, only some intuitive
statements could be found with respect to the phonetic realization of stress. For example,
Ferguson (1957) suggests that the phonetic property involved seems to be relative
loudness or intensity. Lazard (1992) takes intensity to be a relevant phonetic correlate but
adds pitch as an equally important property (see also Mahootian 1997). Other scholars
(for example, Haghshenas 2001; Sepanta 1977; both cited in Hosseini 2014) identified
pitch (rise in f0) as the main correlate of acoustic prominence in Persian. Meanwhile,
no experimental work was done to corroborate these impressionistic claims until the
study by Abolhasanizadeh, Bijankhan, and Gussenhoven (2012), which I turn to below
(see also Sadeghi 2012, cited in Hosseini 2014).
Abolhasanizadeh, Bijankhan, and Gussenhoven (2012) is the first study of its kind to
provide an experimental basis for the phonetic correlates of prosodic prominence in
Persian. For their experiment, Abolhasanizadeh et al. came up with minimal pairs of
segmentally identical words, where one member consisted of a single root and the other a
root plus a non-cohering suffix (or clitic). Recall that non-cohering suffixes are not part of
the phonological word and do not receive stress. This provided them with words that
were minimally distinct only in the position of stress. An illustrative pair of this kind is:
tābésh ‘light’ vs. tā́b-esh swing-her/his ‘her/his swing’. These target words were then
placed in carrier sentences with differing syntactico-semantic conditions. In (15), I show
only the two conditions discussed here, namely focus-neutral (15a) and post-focal (15b),
where underlining marks focus.26
(15)
The sentences were read by twelve speakers, six male and six female, and the (p. 155)
results were analysed with Praat (Boersma 2002). The authors compared a stressed syllable
with its segmentally identical unstressed counterpart, e.g. esh in the two words in (15a), for
pitch accent (f0), duration, intensity, and spectral measures. They found that the only
significant difference between the stressed and unstressed syllables was in pitch
accent (f0); the differences in the other factors were insignificant and could be
attributed to side effects of pitch accent placement. Their conclusion is that prosodic
prominence in Persian is only marked by pitch (f0).27
Abolhasanizadeh et al. also consider a second question related to the condition illustrated
in (15b). Their question is whether post-focal words undergo complete deaccentuation. In
other words, in the examples in (15b), is the distinction between the stressed and
unstressed syllables esh lost when they are used after the focused un? They conduct both
production and perception experiments to test this and find that while there is post-focal
compression, i.e. the pitch range is reduced, no neutralization occurs. Put differently, the
stressed syllables maintain some prosodic prominence (marked by pitch) even in the
context of a prosodically more prominent focused element (see footnote 25). It is worth
noting, however, that Abolhasanizadeh et al.’s second claim with respect to lack of
complete deaccentuation in post-focal contexts has been questioned in more recent work
by Rahmani et al. (2016). They question the design of Abolhasanizadeh et al. with respect
to how focus was implemented in their experimentation, with focused words simply being
printed in bold letters. Rahmani et al. incorporated the notion of focus in their
experimentation much more carefully to ensure that the subjects truly treated the
focused words accordingly. They concluded that indeed, as suggested by some previous
scholars (e.g. Eslami 2000; Sadat-Tehrani 2007), Persian word accents are deleted in
post-focal contexts.
In short, while previous phonetic descriptions of prosodic prominence in Persian seemed
to be inconclusive with respect to the relevant phonetic correlates, the experimental work
by Abolhasanizadeh et al. seems to clearly show that prosodic prominence is marked by
pitch and any other differences that may be found are statistically insignificant.28
With this brief discussion of the phonetic correlates of prosodic prominence in Persian,
we can now turn to an overview of intonation in Persian. The most comprehensive work
on Persian intonation to date is that of Sadat-Tehrani (2007), which the discussion that
follows (p. 156) is largely based on.29 Sadat-Tehrani (2007) used 528 utterances involving
different types of simplex and complex sentences read by eight native speakers of Persian
and examined the pitch track of all these utterances using Praat. He then analysed these
pitch tracks in the autosegmental-metrical framework (Bruce 1977; Ladd 1996; Liberman
1975; Pierrehumbert 1980; among others). According to this framework, the tonal
structure consists of phonologically significant tonal events such as pitch accents and
edge tones. In this model, there are two primitive tonal levels, H(igh) and L(ow). Sadat-
Tehrani takes the smallest unit of Persian prosody to be the Accentual Phrase (AP), which
typically consists of a content word and all its clitics. Each AP is associated with the pitch
accent pattern L+H*, where the asterisk shows association with the stressed syllable of
the word (see also Mahjani 2003). According to Sadat-Tehrani, the L+H* representation
has two variants: L+H*, the default, used in polysyllabic words with final stress, and
H*, the variant used for initially-stressed and monosyllabic words.
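The choice between the two variants can be stated as a small decision rule. The Python sketch below is only an illustration of the description above: the function name and the encoding of a word as a syllable count plus a 0-based stressed-syllable index are assumptions made here, not part of Sadat-Tehrani's analysis. The treatment of polysyllabic words with medial stress (assigned the default L+H*) is likewise an assumption, since the description does not mention them.

```python
def ap_pitch_accent(num_syllables: int, stressed_index: int) -> str:
    """Return the pitch accent variant for the word heading an Accentual Phrase.

    num_syllables  -- number of syllables in the word
    stressed_index -- 0-based index of the stressed syllable (hypothetical encoding)
    """
    if num_syllables < 1 or not 0 <= stressed_index < num_syllables:
        raise ValueError("stressed syllable must lie inside the word")
    # Monosyllabic and initially-stressed words take the H* variant.
    if num_syllables == 1 or stressed_index == 0:
        return "H*"
    # Default variant; stated in the text for polysyllabic words with final stress.
    # Assumption: medially-stressed words also fall under the default.
    return "L+H*"
```

For instance, a one-syllable word such as zang, which is later described as realized with H*, would be encoded as `ap_pitch_accent(1, 0)`.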
Sadat-Tehrani takes the next level of Persian prosody to be the Intonational Phrase (IP),
which dominates one or more APs. The right edge of an IP is marked by a low or high
boundary tone (L% or H%) depending on the type of sentence involved, to be elaborated
below.30 Sadat-Tehrani uses these basic notions to map out the intonational structure of
Persian sentences in various types of simplex and complex clauses. Below, we review
some of the basic patterns he discusses (see the original work for more details).
The intonational structure of a simple declarative sentence in Persian consists of one IP
and one or more APs, with everything after the Nuclear Pitch Accent (NPA) (referred to in
previous sections as main stress) being deaccented. Each AP is associated with L+H* and
there is an L% boundary tone at the end. The declarative pattern is exemplified in (16)
(adapted from Sadat-Tehrani 2007: 8, fig. 6).31 In this example, āli receives the NPA.
(16)
The tonal pattern of yes/no questions is very similar to declaratives, according to Sadat-
Tehrani, with the only difference that they have a H% boundary tone in contrast to the
L% in declarative clauses. The location of the NPA is still similar to a declarative sentence
with following material being deaccented up to the H% boundary tone. An example is
given in (17) (adapted from Sadat-Tehrani 2007: 9, fig. 7). Recall from the previous
section that specific (p. 157) objects do not receive the main stress in the sentence, and
as such the verb receives the NPA in this example.
(17)
The tonal pattern of wh-questions is different from yes/no questions and similar to
declaratives in that it has a L% boundary tone. Meanwhile, the wh-word attracts the NPA
of the whole IP and causes deaccentuation up to the end of the clause. This is not
surprising as the wh-word behaves like a focused phrase and receives the main stress of
the clause (see section 6.3, example (14)). The similarity between the tonal patterns of
focused phrases and wh-phrases is also confirmed by Sadat-Tehrani who illustrates the
pitch tracks for both of these constructions. An example of a wh-question is given in (18)
(adapted from Sadat-Tehrani 2007: 10, fig. 8).
(18)
Sadat-Tehrani also discusses the tonal pattern of several types of complex clauses in
Persian. The first case he considers is coordinated sentences. He shows that in
coordinated clauses, each clausal conjunct behaves like a regular IP in Persian.
Meanwhile, only the last IP has a L% boundary tone, expected in Persian declarative
clauses. All the other IPs are realized with an ‘incomplete’ intonation pattern ending with
a H% boundary tone. An example of a coordinated clause is given in (19) (adapted from
Sadat-Tehrani 2007: 11, (6)). In this example, the first clausal conjunct ends in a H%
boundary tone, while the second one ends in L%. Note that the NPA of the second clausal
conjunct is on zang and as a one-syllable word, the pitch accent realized on it is H*.
(19)
Subordinate clauses show a pattern very similar to the coordinated clauses discussed
above in that there is a H% boundary tone before the embedded clause, which itself ends
in a L% boundary tone (see Chapters 3, 7, and 8 for more information on subordinate
clauses). An example of a clause subordinated under the main verb say is given in (20)
(adapted from Sadat-Tehrani 2007: 13, fig. 10).
(20)
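The boundary-tone patterns reviewed in this section can be summarized in a few lines of Python. This is a minimal sketch under stated assumptions, not an implementation of Sadat-Tehrani's model: the names `FINAL_TONE` and `ip_boundary_tones` and the sentence-type labels are hypothetical, and the case of questions made up of several IPs, which is not discussed above, is simply treated like the declarative case with a different final tone.

```python
# Right-edge tone of the last IP, by sentence type (labels are hypothetical).
FINAL_TONE = {
    "declarative": "L%",
    "wh-question": "L%",
    "yes/no-question": "H%",
}


def ip_boundary_tones(num_ips: int, sentence_type: str = "declarative") -> list:
    """Return the boundary tone of each Intonational Phrase, left to right.

    Non-final IPs (e.g. the first conjunct of a coordination, or the material
    before an embedded clause) end in the 'incomplete' H% tone; the final IP
    ends in the tone associated with the sentence type.
    """
    if num_ips < 1:
        raise ValueError("an utterance needs at least one IP")
    return ["H%"] * (num_ips - 1) + [FINAL_TONE[sentence_type]]
```

Under these assumptions, a simple declarative is `ip_boundary_tones(1)`, while a coordination of two declarative clauses, as described for (19), is `ip_boundary_tones(2)`.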
(p. 158) In this section, after a brief discussion of the phonetic correlates of stress in
Persian, I provided an overview of some basic tonal structures of the language. A more
detailed exploration of the phonetics of prosody and intonation in the Persian language is
beyond the scope of this chapter and the interested reader is referred to the original
works cited above.
6.5 Conclusion
In this chapter, I have provided a brief overview of the prosody of Persian. We took our
starting point to be word stress and the seemingly unsystematic behaviour of the
distribution of prosody at the word level. To begin with, it looked like several words in
various categories defied the otherwise general word-final stress rule. We showed that
these cases can be accounted for with a correct understanding of the morphosyntax of
Persian and a distinction between cohering and non-cohering suffixes. Under this view,
the word-final stress rule applies to the phonological word which includes cohering
suffixes, but excludes non-cohering ones. We further explored the apparently non-final
stress in prefixed Persian verbs and showed that their behaviour is by no means
exceptional and falls within a more general pattern of leftmost stress in the verb phrase.
This view of stress in the verb phrase coupled with a better understanding of the syntax
of Persian enabled us to explain the stress pattern differences between specific and non-
specific objects and different types of adverbials. This still left us with just a handful of
words with exceptional non-final stress. Several possible generalizations were explored in
this regard, while none enabled us to explain away the idiosyncratic nature of the stress
in all of these forms. We ended the discussion of Persian prosody with a brief overview of
the phonetic correlates of prosodic prominence and intonational patterns in Persian.
Notes:
(1) While many of the generalizations made in this chapter may apply to various dialects
of Persian spoken in Iran and other neighbouring countries, the discussion in this paper is
based on the dialect spoken in Tehran, the capital of Iran. The data in this paper are
largely based on the author’s native judgment. In addition, when needed, several other
sources have been consulted for confirmation, e.g. Ferguson (1957); Lazard (1992);
Mahootian (1997); Sadat-Tehrani (2007); Same’i (1996); Thackston (1993); Windfuhr
(1979).
(2) Much of what is discussed in sections 6.2 and 6.3 is based on my older work,
particularly Kahnemuyipour (2003, 2009). I have put technical details aside and have
focused mostly on descriptive generalizations here.
(3) I use the acute accent (´) to mark primary stress.
(4) Persian long infinitives (what Chodzko 1852 referred to as nominal verbs) align
themselves with nouns not only with respect to stress, but also when you consider their
morphological behaviour: they take the nominal plural marker (e.g. xābidan-ā, sleeping-
pl, ‘the acts of sleeping’) or take the suffix -i which is typically added to nouns to form
adjectives (e.g. compare qānun-i, law-i, ‘legal’ with xundan-i, reading-i, ‘readable,
reading-worthy’).
(5) Compounds are also treated as single words in Persian with stress falling on the final
syllable of the whole compound, e.g. ketāb-xuné book-house ‘library’, bozorg-manésh
great-attitude ‘magnanimous’. Morphologically, too, compounds behave like single words
with no affix interrupting the two parts of the compound.
(6) Most grammars of Persian treat the possessive marker (and a few other elements
discussed in this paper) as enclitics rather than suffixes. (For more information about
enclitics, refer to Chapters 3, 8, 9, and 10.) I am abstracting away from this distinction as
it is not relevant for the discussion in this paper. The important point here is the
inflectional status of these elements. I will refer to them as suffixes below.
(7) It is worth noting here that agreement markers receive stress in present perfect forms
in colloquial Persian, e.g. did-ím ‘we have seen’. The present perfect consists of the past
participle plus the agreement marker. In its full form, the past participle ends in the
vowel -e and stress falls on this vowel as expected: didé-im. In colloquial Persian, the
vowel -e is dropped, giving its stress to the adjacent vowel in the suffix -im. Also, see
Kahnemuyipour (2003) for a discussion of the difference in stress behaviour between
verbal agreement markers in the past and present tenses. I am abstracting away from
such details here.
(8) In Persian, the noun is connected to its post-nominal modifiers via a vowel known as
the Ezafe (marked in the example as Ez). For a more detailed discussion of this marker
and different accounts of its syntax, see Ghomeshi (1997); Kahnemuyipour (2014); Larson
and Yamakido (2008); Samiian (1994). Crucially, Ezafe behaves like an inflectional suffix
and does not carry stress. The Ezafe construction is further discussed in Chapters 3, 6, 7,
8, 9, 10, and 19.
(9) I am leaving aside a productive stress shift that occurs in Persian in the formation of
the vocative, the only process of this kind in the language. While the stress on (proper)
nouns is word-final, in the vocative, the stress shifts to the first syllable, e.g. rézā ‘Reza!’,
dóktor ‘Doctor!’, xā́num ‘Ma’am!’. In Modern Persian, there are also some remnants of an
older process of vocative and optative formation from Classical Persian using the
non-cohering suffix -ā. Due to the non-cohering status of this suffix, the stress fell on
the stem. As a result,
these remnants show non-final stress: e.g. daríqā ‘alas’, xóshā ‘how pleasant is … ’,
xodā́yā ‘Oh God’, bádā ‘how bad is … ’, bā́dā ‘may it be’ (see also Ferguson 1957).
(10) For the words adapted from Ferguson (1957), transcriptions have been modified for
consistency. Some translations have also been modified.
(11) Ferguson (1957) also has a list of Arabic formulaic expressions which have been
borrowed into Persian, often with their original stress that does not match that of Persian.
He concedes that most of these expressions are no longer in use (even in 1957). Of those,
there are only two that are still commonly used: alhamdo lellāh ‘Thank God!’ and
inshāllāh ‘God willing’. The first expression has two possible stress patterns, one with
main stress on the second syllable (alhámdo lellāh) and one with stress on the final syllable
(alhamdo lellā́h). The second example also has two possible stress patterns, one word-
final (inshāllā́h) and one word-initial (ínshāllāh), when uttered in isolation in response to a
statement. Also in this category, one might consider Arabic ordinal adverbials which are
still commonly used, e.g. ávvalan ‘firstly’, sā́niyan ‘secondly’, sā́lesan ‘thirdly’.
(12) This is a French borrowing but by far the most common word for expressing
gratitude.
(13) A variant of this form is chón with a single stressed syllable.
(14) The word guyā in (8m) may also fall in the same category of a root plus a non-
cohering suffix, with -ā possibly a remnant of the Classical Persian optative marker, see
footnote 9.
(15) Relating the initial stress to utterance-level stress may be particularly plausible in the
context of Kahnemuyipour’s (2003) proposal that stress at the utterance level is leftmost
in Persian. That still leaves open the question of why utterance level stress which should
apply to an utterance consisting of two clauses applies at the word level in these cases,
choosing one syllable over another for the application of stress.
(16) Some clausal conjunctions such as chon ‘because’ or pas ‘so, therefore’ are
monosyllabic and as such cannot distinguish between word-initial or word-final stress.
(17) An anonymous reviewer has brought two examples to my attention which seem to
allow final stress: agarche and garche, both meaning ‘even though’. It should be noted
that these bi-morphemic words, consisting of (a)gar and che, allow for both initial and
final stress. Meanwhile, in light of the existence of these forms which allow final stress, I
qualified the generalization with ‘almost’.
(18) Some of the words with non-final stress in (8) contrast with other segmentally
homophonous words based only on the stress pattern, e.g. váli ‘but’ vs. valí ‘guardian’, ā́ri
‘yes’ vs. ārí ‘devoid’ (see also Ferguson 1957). Note, however, that this contrastive
pattern cannot be taken as the motivation behind the non-final stress as it is limited only
to very few examples from the above list. Also, there are many homophonous words in
Persian with no stress difference.
(19) Kahnemuyipour (2009) is written in a minimalist framework using the notion of
phases and multiple spell-out (Chomsky 1995, 2001, and subsequent authors). In this
context the relevant verbal domain is the phasal vP (where v is the functional head
introducing the external argument) and manner/measure adverbs are taken to mark the
left edge of this domain. I am abstracting away from these technical details in this
overview.
(20) An anonymous reviewer introduces a measure adverb, which does not receive the
main stress of the sentence, as a possible counterexample to this generalization: hesābi ‘a
whole lot (colloq.)’. While a thorough examination of this adverb is beyond the scope of
this chapter, I should point out that even regular measure adverbs such as xub ‘well’ can
sometimes appear without the main stress of the sentence (see Kahnemuyipour 2009).
Meanwhile, there is a subtle semantic difference in such instances (something like: What
he did well was … ), which may justify a higher syntactic position outside the verb phrase.
I have the intuition that in a similar manner, hesābi may be in a higher syntactic position
outside vP. If this intuition is on the right track, the mapping between the stress domain
and the vP can be maintained.
(21) Kahnemuyipour (2003) treats the negative marker as inherently focused and its stress
as focus stress (see below). In other words, the stress on the negative marker is obtained
differently from the other two prefixes. Note that in the context of the negative marker,
when more material is added to its left, unlike with the other two prefixes, stress does not
shift to the left and stays on the negative marker. I argue that this contrast follows from
the focus status of the negative marker.
(22) The idea that the stress on the verbal prefixes (or any other element within the verb
phrase) is the result of leftmost stress may pave the way for a different analysis of non-
cohering suffixes, where non-cohering suffixes are taken to be independent phonological
words with the absence of prosodic prominence on them attributed to a general leftmost
rule at the phrase level. This is the path I took in Kahnemuyipour (2003), but I am no
longer committed to it for several reasons. For one, there are reasons to believe the
leftmost prominence rule in the verb phrase does not extend to the noun phrase (see
Kahnemuyipour 2009), while the non-cohering suffixes are found in all categorial
domains. Also, taking functional and prosodically weak elements to constitute
independent phonological words is problematic (see Hosseini 2014). The crucial point
here is that non-cohering suffixes are not part of the phonological word containing the
root and are thus outside the domain of word stress.
(23) It is worth noting that when the non-specific NP consists of more than a single word,
then the main stress of the sentence falls on the word within the NP with the highest
prominence (for more details on how the notion of leftmost element is implemented, see
Kahnemuyipour 2009). For example, if a non-specific object such as se tā bastani-ye
bozorg three classif. ice-cream big ‘three big ice-creams’ is used, main stress falls on the
adjective bozorg within this leftmost phrase in the vP.
(24) It is worth pointing out that the sentences in (12) can be reordered in Persian via a
process known as vP-preposing, placing the whole vP at the beginning of the clause. The
main stress still remains on the leftmost element in the verb phrase in accordance with
the system laid out above. Also, various elements in the verb phrase can be topicalized
outside of the vP, thus escaping the main stress of the clause (see Kahnemuyipour 2009).
(25) Kahnemuyipour (2009) argues that in a sentence with a focused constituent, while the
focused element receives the highest prominence, the element which receives main stress
by the default stress rule receives secondary stress.
(26) The examples in (15) have been modified in line with the transcription and notational
conventions used in this chapter. Also, in addition to the conditions in (15),
Abolhasanizadeh et al. considered a focal condition, where the target words were focused
and moved clause-initially. They also considered the three conditions in interrogative
sentences. I am abstracting away from these details for convenience.
(27) It is worth noting that under some approaches, once it is established for a particular
language that the only phonetic correlate of prosodic prominence in a word is pitch
features, the use of the term ‘stress’ is considered inappropriate. Beckman (1986), for
example, uses the term ‘non-stress accent’ for these instances and retains the term
‘stress accent’ for those cases where other phonetic cues such as duration or intensity
are also involved. From a different perspective, ‘stress’ is the term used for the prosodic
prominence which is obligatory on all content words regardless of how it is phonetically
realized, pitch accent or otherwise (Hyman 2006). This paper is more in line with the
latter approach. The term ‘stress’ is used here simply to refer to a high level of prosodic
prominence.
(28) In a recent study of the phonology and phonetics of prosody in Persian, Hosseini
(2014) compares nuclear (final) and pre-nuclear (non-final) accents in Persian and finds
that they are phonetically distinct. Hosseini finds that the two types of accents differ in the
shapes of the pitch curves and where they fall and this seems to be the most significant
factor based on his production and perception experiments. In the production
experiment, he found that the syllable with the nuclear accent has a longer duration than
one with pre-nuclear accent but this difference did not seem to play as significant a role
in perception.
(29) For additional discussions of Persian intonation, see Eslami (2000); Hayati (1998);
Lambton (1957); Mahjani (2003); Mahootian (1997); Sadat-Tehrani (2009); Towhidi
(1974).
(30) Sadat-Tehrani (2007) argues for another intermediate level of boundary tones, l (low)
and h (high), which mark the right edges of APs. He suggests that the l boundary tone
marks the edge of the AP with nuclear pitch accent (what was referred to as main stress
of the sentence in previous sections), while all other APs are marked by the h boundary
tone. He discusses some exceptions to this generalization. In this discussion, I am
abstracting away from the intermediate boundary tones and will not show them in any of
the examples.
(31) The examples in this section are all from Sadat-Tehrani (2007), but they have been
modified in accordance with the transcription conventions used in this paper. I am also
not showing the pitch tracks here.
Arsalan Kahnemuyipour
Arsalan Kahnemuyipour received his PhD in Linguistics from the University of
Toronto in 2004. He is currently an Associate Professor of Linguistics at the
University of Toronto Mississauga. His areas of expertise are syntax, morphology,
and the interface between syntax and prosody. He has worked on a number of
languages including his native Persian, as well as English, Armenian, Turkish,
Niuean, among others. He has published a book with Oxford University Press and
several articles in journals such as Lingua, Linguistic Inquiry, Natural Language and
Linguistic Theory, and Syntax.

Generative Approaches to Syntax

Simin Karimi

Subject: Linguistics, Morphology and Syntax, Languages by Region
This chapter offers an overview of some of the major syntactic and morphosyntactic
properties of Persian. Of the topics introduced in this chapter, three have extensively
been examined by various researchers over several decades: complex predicates, Ezafe
constructions, and differential object marking. Issues related to scrambling, wh-
constructions, and raising and control have also been discussed. Some of the issues
introduced in this chapter have not been thoroughly examined in the literature. For
example, problems related to complex DPs, specifically with respect to extraposition of
the CP out of the complex DP, require close attention. Furthermore, the nature of
resultative constructions, and whether Persian allows secondary predicate constructions
need to be examined. Finally, this chapter touches on some topics that are under-studied:
modality, negation, aspect, ellipsis, and sluicing. Due to the descriptive nature of this
chapter, theoretical considerations are not thoroughly discussed, although briefly
mentioned in some cases.
Keywords: syntax, scrambling, differential object marking, complex predicates, passives, resultatives, causatives,
raising and control, ellipsis, modality
7.1 Introduction
PERSIAN is a member of the Southwestern Iranian language family spoken in Iran
(Farsi), Afghanistan (Dari), and Tajikistan (Tajiki). Various aspects of this language,
specifically its syntactic properties, have attracted the attention and interest of
traditional grammarians and modern linguists. The goal of this chapter is to offer a
descriptive overview of some of the major syntactic and morphosyntactic properties of
Persian, and to introduce the reader to the major literature provided by grammarians and
linguists inside and outside Iran.
Persian grammars in the twentieth century were primarily written for the purpose of
language learning, and consisted of descriptions of various aspects of this language. One
of the first such grammars was authored by Mirza Habib Esfahani, published in 1910 in
Istanbul (Natel-Khanlari 1986). The first grammar that was actually used in schools was a
booklet written by Mirza Qarib in 1911 (Windfuhr 1979). One of the most influential
grammars was a book known as dastur-e panj ostâd ‘grammar of five masters’, written by
Qarib and four of his colleagues, published in 1950.1 This book offers descriptive
discussions of various properties of the language, including the syllable structure, nouns,
verbs, numbers, and the Ezafe morpheme.2 It also includes descriptions and examples of
what they call goftar ‘complete sentence’ and soxan ‘incomplete sentence’. Almost all the
data presented in this grammar are borrowed from the work by famous Iranian poets.
Another influential work within the grammarian tradition is Dastur-e Zabân-e Farsi
‘Grammar of Persian language’ by Natel-Khanlari (1984). This book is specifically devoted
to a descriptive analysis of Persian morphology and syntax, including sentence types
(interrogative, imperative, complex), as well as verbal and nominal morphology, and has a
section on specific lexical variations such as bâyad, bâyast, and bâyesti, special phrasal
modifiers, and (p. 162) ‘incorrect’ application of the morpheme -râ. The majority of data
discussed in this work are also borrowed from famous classical texts and poetry.3
There are a number of Persian grammar books written by non-Iranian scholars, with the
same pedagogical goal in mind. These works include Lambton (1953), Lazard (1992), and
Thackston (1993). They are primarily based on Persian formal written language, although
Thackston has examined aspects of the spoken language as well.
The first description of Persian syntax based on modern linguistics criteria is Bateni
(1969). This work rests on a description and analysis of 11,000 Modern Persian sentences
taken from newspapers, journals, various writings, as well as the colloquial language. The
discussion includes analyses of the structure of main and subordinate sentences, verb
phrases, noun phrases, and adverbial phrases.
Another grammar based on modern linguistics is Meshkoatoddini (1987). This work
addresses various aspects of Persian syntax within the framework of Transformational
Grammar (Chomsky 1965). Mahootian’s (1997) widely cited work is a linguistics-based
comprehensive analysis of contemporary conversational Persian, and consists of
discussions of major grammatical aspects of this language, including its syntactic and
morphological properties.
During the last half-century, specific aspects of Modern Persian have been analysed
within various theoretical frameworks. Based on the number of dissertations and
published work in the last several decades, it is evident that the syntactic and
morphosyntactic properties of Persian have attracted more attention than other aspects
of this language.
The organization of this chapter is as follows. Section 7.2 discusses the clausal and
phrasal architecture of Persian, including word order variations, scrambling, and wh-
constructions. One of the most discussed properties of Persian syntax is the so-called
complex predicates (CPr). This topic is addressed in section 7.3. Persian noun phrases,
including their simple and complex versions, as well as the Ezafe construction and
classifiers are the topics of section 7.4. Another property of Persian, shared by some other
languages such as Turkish, is the differential object marking. The nature of this property,
and the role of the morpheme -râ, are reviewed in section 7.5. Passive, causative, and
resultative constructions are subjects of section 7.6, followed by a discussion of raising
and control in section 7.7. Section 7.8 is devoted to a number of topics that have not been
explored widely in the literature: modality, aspect, and negation, as well as ellipsis and
sluicing. Section 7.9 concludes this chapter. As for the data taken from other sources, I
have retranscribed, reglossed, and retranslated them in some cases for the purpose of
uniformity.
7.2 Clausal architecture, scrambling, and wh-constructions
A description of the basic clausal architecture of Persian is offered in 7.2.1, followed by a
discussion of scrambling, one of the major syntactic properties of this language in 7.2.2.
(p. 163) Wh-constructions are discussed in 7.2.3. For further information on scrambling,
see Chapter 3, and for wh-constructions, see Chapter 5.
7.2.1 Clausal architecture
Persian is a head-initial language with the exception of the verb phrase.4 The word order
in a discourse neutral sentence is the one exemplified by (1a) with a specific direct
object5 and (1b) with a non-specific direct object.6
(1)
As these data suggest, the verb appears in the final position. However, the complement
clause to the verb follows the verb, as in (2).
(2)
Note that the complementizer ke ‘that’ is optional in these examples. Also, the verbal
concept in the embedded clause is realized as a complex unit.
The same word order holds for subordinate clauses indicating indirect questions.
(3)
Persian word order changes drastically when the information conveyed by the sentence
interacts with discourse phenomena. This is the subject of the next section. (For more
discussion on word order, see also Chapters 3, 8, 10, and 15.) (p. 164)
7.2.2 Scrambling
Scrambling is a syntactic property observed in many languages (Persian, Turkish,
Japanese, Korean, Hindi, to name a few). In these languages, phrasal categories may
appear in different positions in the clause. Karimi (1999, 2005) and Rasekh-Mahand
(2003) offer elaborated discussions of this phenomenon in Persian. The following clausal
architecture is based on Karimi’s (2005) proposal.
(4)
Karimi suggests that both TopP and TP host the topic in this language, thus a focal
element may precede or follow the topic (in Spec of TP), or appear between two topics.
Consider the following data in which the scrambled element reveals either a focus or a
topic interpretation, depending on the intonation. Unstressed scrambled elements reveal
a topic interpretation, while their stressed versions receive a focus reading.7 The bold ‘e’
in these examples represents the original position of the moved element.
(5)
(6)
Note that the direct object and the indirect object may both scramble to precede the
subject. In this case, one of them may receive a topic interpretation and the other a focus
reading. These elements may appear in different orders depending on the position of the
topic, as in (7a,b).
(7)
Non-specific objects may also scramble. They usually receive a contrastive reading when
scrambled, as in (8a), although they may also reveal a topic interpretation, as in (8b).
(p. 165)
(8)
Persian also exhibits long-distance scrambling. That is, elements of the subordinate
clause, with the exception of the verb, may move into the main clause, as the following
data reveal.
(9)
(10)
(11)
Similar to simple sentences, long-distance scrambling of multiple arguments is also
possible, as exemplified by the following sentences. Again, the interpretation of the
dislocated elements is based on their stress pattern.
(12)
Adjuncts undergo scrambling as well.
(13)
Next, the displacement of wh-arguments and adjuncts will be discussed. (p. 166)
7.2.3 Wh-constructions
Persian does not exhibit structural wh-movement. That is, wh-phrases may stay in situ,
and yet receive a wh-interpretation.8
(14)
Wh-arguments and adjuncts, however, may scramble. This movement has been suggested
to place the wh-phrase in the Specifier of a focus phrase (Karimi 1999, 2005;
Megerdoomian and Ganjavi 2000; Karimi and Taleghani 2007; Toosarvandani 2008).9
(15)
Note that scrambling of wh-phrases is subject to the superiority condition (Karimi 1999,
2005; Kahnemuyipour 2001; Lotfi 2003; Karimi and Taleghani 2007). That is, a wh-phrase
may not cross another one in a higher position. (p. 167)
(16)
(16a) has a pair-reading interpretation. The answer is something like ‘Kimea bought a
book, Parviz a shirt, Arezou a pair of shoes’ (Kahnemuyipour 2001; Lotfi 2003; Karimi
2005).
Long-distance scrambling is also possible, and is subject to superiority condition.10
(17)
Note that the sentence in (17b) is grammatical if the wh-phrase in situ is not stressed. In
that case, it is interpreted as an indefinite DP with no quantificational force, similar to
‘someone’ in English (Karimi 1999).
Furthermore, the surface order of two scrambled wh-phrases is subject to a certain
restriction: they must appear in the same order as those in situ.
(18)
The indirect object precedes the subject in (18b), rendering the sentence ungrammatical.
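The ordering restriction on scrambled wh-phrases can be pictured as a simple order-preservation check. The following is a minimal sketch only: the function name and the labels (wh_subject, wh_indirect_object, verb) are hypothetical placeholders rather than Persian data, and the check ignores stress, which the text notes can rescue an otherwise excluded order by turning the in-situ wh-phrase into an indefinite.

```python
def preserves_wh_order(in_situ_order, surface_order, wh_labels):
    """Return True if the wh-phrases keep their in-situ relative order.

    All three arguments use hypothetical labels, not Persian data.
    """
    in_situ_wh = [item for item in in_situ_order if item in wh_labels]
    surface_wh = [item for item in surface_order if item in wh_labels]
    return in_situ_wh == surface_wh


# In-situ order: the wh-subject is higher than the wh-indirect object.
in_situ = ["wh_subject", "wh_indirect_object", "verb"]
wh = {"wh_subject", "wh_indirect_object"}

# Surface order that matches the in-situ order: accepted by the check.
same_order = ["wh_subject", "wh_indirect_object", "verb"]
# Indirect object before subject, the pattern described for (18b): rejected.
crossed_order = ["wh_indirect_object", "wh_subject", "verb"]

print(preserves_wh_order(in_situ, same_order, wh))
print(preserves_wh_order(in_situ, crossed_order, wh))
```

The sketch compares only the relative order of the wh-phrases, so non-wh material may be reordered freely without affecting the result.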
The next section is devoted to a descriptive discussion of Persian complex predicates, one
of the most discussed properties of Persian syntax.
7.3 Complex predicate constructions

Complex predicates (CPr) are verbal complex constructions consisting of more than one
word which convey information that is normally expressed by a single verb in a language
like English.11 In (19b), for example, the Persian complex predicate shekast dâd ‘defeat
gave’ corresponds to the English verb defeated in (19a). These constructions typically
consist of a light verb (LV) and a non-verbal element (NVE) (dâd and shekast, respectively,
in the Persian example). Complex predicates are further discussed in Chapters 2, 3, 8, 9,
10, 15, 17, and 19. (p. 168)
(19)
Complex predicates are particularly common in South Asia (among Turkic, Indic, and
Iranian languages) as well as in Northern Australia and some parts of Papua New Guinea.
Due to the syntactic and morphological peculiarities of these elements, this topic has
received extensive attention in the last few decades. In this regard, the role of different
components of the complex predicate, and their contribution to the meaning and
argument structure of the whole, have been the focus of many research projects. Thus
alternative views have been proposed regarding the relation between the NVE and the LV
in these constructions. Some have interpreted a nominal NVE as the internal argument of
the light verb (Lieber 1980). Others have considered the formation of complex predicates
as syntactic incorporation, by which one semantically independent word comes to be
inside the other (Baker 1988, 1996). Grimshaw and Mester (1988) suggest that light
verbs are semantically deficient, and serve as a host for agreement and tense
morphology. They argue that the nominal NVE of the complex predicate lends its
arguments to the light verb, turning it into a theta marker. Finally, Mohanan (1997)
suggests an argument-sharing theory based on Hindi.
In Persian, complex predicates have gradually replaced simple verbs since the thirteenth
century. The tendency to form complex verbs has resulted in the existence of two sets of
verbs, simple and complex, for a number of verbal concepts. In many cases, the
application of the simple verb is restricted to the written and elevated language (Dabir-
Moghaddam 1995; Karimi 1997). The productivity of CPr formation is such that it has
completely replaced the former morphological rule of simple verb formation in this
language (Bateni 1989). Thus it is not surprising that this topic has received enormous
attention from linguists interested in Persian syntax.12
In this section, I discuss the properties of the light verb and the non-verbal element of
Persian complex predicates in 7.3.1 and 7.3.2, respectively. I conclude this section with a
brief introduction to an interesting construction that involves a special case of complex
predicates.
7.3.1 Light verbs
The light verb of a Persian complex predicate ranges over a number of simple verbs (Karimi
1997; Megerdoomian 2012a). These verbs include, among others, kardan ‘doing, making’,
(p. 169) shodan ‘becoming’, xordan ‘colliding’, gereftan ‘catching, taking’, zadan ‘hitting’,
keshidan ‘pulling’, dâdan ‘giving’, dâshtan ‘having’, âmadan ‘coming’, raftan ‘going’, and
bordan ‘carrying’.
Following Grimshaw and Mester (1988), some authors have suggested that the Persian LV
is semantically bleached (Mohammad and Karimi 1992; Karimi-Doostan 2005), and serves
as a Case assigner (Karimi-Doostan 2005). Others have proposed that the choice of LV
determines whether or not the CPr selects for an agent (Karimi 1997; Megerdoomian
2002a; Folli, Harley, and Karimi 2005).13 This is shown in the following contrasts: the
light verb dâdan ‘to give’ is agentive, while xordan ‘to collide’ is unaccusative.
(20)
Similarly, the causativity of CPr is suggested to be determined by the LV (Megerdoomian
2002b; Folli, Harley, and Karimi 2005).
(21)
Finally, it has been suggested that the LV is responsible for the eventiveness (Folli,
Harley, and Karimi 2005) and the duration (Megerdoomian 2002a; Folli, Harley, and
Karimi 2005) of the CPr. Compare the following data.
(22)
(23)
(p. 170)
I continue the discussion of the Persian CPr with an overview of some of the properties of
its non-verbal elements.
7.3.2 Non-verbal elements
The Persian NVE ranges over a number of different elements.
(24)
The Persian NVE seems to be syntactically independent of the LV, since it can be
modified, scrambled, and elided.14 The example in (25) shows that the NVE kotak
‘beating’ is syntactically modified by the adjective bad ‘bad’ in an Ezafe construction.
Note, however, that this element receives an adverbial interpretation, modifying the CPr
kotak xord ‘got beaten’ (Karimi 1997; Megerdoomian 2012a; Karimi, Key, and Tat 2014).
Thus the actual meaning of the sentence in (25) is ‘he was beaten badly’.
(25)
Furthermore, the NVE may scramble away from the LV, as in (26). Again, even though che
‘what’ and saxti ‘hard’ syntactically modify the NVE zamin ‘earth’, semantically they
modify the whole CPr, as the English translation indicates.
(26)
The NVE is also subject to ellipsis, as in (27).
(27)
(p. 171)
Furthermore, the NVE is considered to determine the telicity of the complex predicate.
That is, eventive nominals, adjectives, particles, and prepositional phrases are
responsible for the telic (accomplishment and achievement) interpretation of the CPr,
while other nominal elements provide an atelic (activity or semelfactive) interpretation for it
(Folli, Harley, and Karimi 2005).
Finally, one of the issues discussed in the literature is whether the nominal NVE and the
non-specific object receive a uniform treatment, or whether they should be considered as
belonging to two distinct categories. Ghomeshi and Massam (1994) argue in favour of the
former, based on evidence suggesting that they both give rise to unbounded predicates,
while the specific object reveals bounded properties. Compare the following data:
(28)
However, as shown by Karimi-Doostan (1997), Folli, Harley, and Karimi (2005), and
Megerdoomian (2012a), the nominal NVE may also appear in bounded predicates.
(29)
Furthermore, the non-specific object and the verb can appear in an Ezafe construction,
while this is not possible in the case of an NVE and LV, as the contrast between (30b) and (31b)
reveals (Karimi 1997).
(30)
(31)
(p. 172)
Finally, the non-specific object of a verb can undergo scrambling to the sentence initial
position without the need to have a quantificational quality. This derivation is blocked
with respect to the NVE of a complex predicate (Karimi 1997).
(32)
The data presented above indicate that the non-specific object of a heavy verb is
syntactically distinct from the NVE of a complex predicate.
7.3.3 Impersonal complex predicates
There is a class of Persian complex predicates, generally known as impersonal
constructions, that exhibit interesting properties. One of the peculiarities of these
constructions is that the optional DP in the clause initial position is co-indexed with a
clitic that is attached to the NVE of the CPr. Furthermore the verb is always in third-
person singular, regardless of the person and number of the optional DP.
Karimi (2005) discusses two types of these constructions, calling them inalienable
possessor constructions and inalienable pseudo-possessor constructions, exemplified by
(33) and (34), respectively. The difference between the two is that the former has BE as
its LV, while the latter has an unaccusative LV other than BE.
(33)
(34)
(p. 173)
Considering constructions involving psych verbs such as fear, Harley (1999) expresses
the intuition that HAVE is in fact a prepositional element incorporated into a verbal be.
Based on such analysis, Karimi (2005) suggests the structure in (35) for the inalienable
possessor construction in (33): the adjective HUNGER moves into HAVE, providing the
interpretation ‘possessing hunger’. Furthermore, the copy of the possessor DP in the
Specifier of the PredP appears as phi-features on the root HUNGER, and is realized as a
clitic pronoun attached to the root.
(35)
As for the inalienable pseudo-possessor constructions in (34), Karimi proposes the
structure in (36), similar to that in (35). The main difference between the two
constructions is the choice of LV, as mentioned before.
(36)
Yadgar Karimi (2013) defines the second type of impersonals, exemplified by (p. 174)
(34), as a CPr consisting ‘of a psychological state nominal and an unaccusative verb
which is predicated of an experiencer argument’.15
Note that some linguists have argued that impersonal constructions do not belong to the
category known as complex predicates (Dabir-Moghaddam 1997; Sedighi 2005, 2009)
(see also Chapters 3 and 15). As Karimi (2013) states, the structure one assumes for
impersonal constructions, as well as the theoretical assumptions one holds to define the
class of complex predicates, dictate whether or not impersonals form a subclass of the
general CPr category.
7.4 Persian noun phrases

This section is devoted to an overview of scholarship on the structure of Persian NOUN
PHRASES. The first subsection provides a discussion of the Ezafe construction and the
literature on this phenomenon. Subsection 7.4.2 offers an examination of Persian complex
noun phrases, followed by a brief discussion of classifier(s) in this language in 7.4.3. For
more discussion on noun phrases, see Chapters 3, 8, and 9.
7.4.1 Ezafe construction
One of the most intriguing properties of the Persian noun phrase is the Ezafe
construction, a morphosyntactic phenomenon that ranges over several constructions
inside the DP. Thus it is not surprising that this subject has received enormous attention
from grammarians and linguists over several decades. The study of the Ezafe construction
goes back to Phillott (1919), Qarib et al. (1950), Lazard (1957), Palmer (1971), Natel-
Khanlari (1972), and continues to the present day.16 I start this section with a descriptive
introduction to the Ezafe construction in 7.4.1.1, and continue with a review of some
theoretical analyses of this construction in 7.4.1.2. The Ezafe construction is further
discussed in Chapters 2, 3, 6, 7, 8, 9, 10, and 19.
7.4.1.1 What is the Ezafe construction?

Literally, Ezafe means ‘addition’, and is derived from the Arabic idafa(t) (Karimi and
Brame 2012). It has been suggested that its origin in Modern Persian can be traced back
to the Old Persian relative/demonstrative hya/tya (Samvelian 2007, based on Darmesteter
1883). This element gives rise to an extremely common construction in Persian. The Ezafe
affix, which surfaces as -e or -ye (following vowels) can attach to a wide range of category
types, including (p. 175) count nouns, mass nouns, pronouns, adjectives, some
prepositions, verbal nouns, past participles, and quantifiers. It cannot, however, be
affixed to verbs, adverbs, certain prepositions, or conjunctions. The Ezafe construction is
employed to express modification, possession, origin, material, specification, and more.
Here are some examples.
(37)
Several modifiers may appear in the same noun phrase, each one linked to the previous
element by the Ezafe affix. The possessor, if present, is always the final nominal in these
constructions. Since the Ezafe affix links two elements, it cannot attach to the last
element within the noun phrase (38c).
(38)
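The linking pattern described above (every element except the last carries the Ezafe affix, which surfaces as -ye after a vowel and as -e otherwise) can be sketched as a small procedure. This is an illustrative sketch under stated assumptions, not an analysis from the literature: the function names are hypothetical, the vowel inventory is simply the set of vowel letters used in this chapter's transliteration, and the sample words (ketâb ‘book’, bozorg ‘big’, xâne ‘house’) are chosen here for illustration. Category restrictions on which elements may host the affix are not modelled.

```python
import unicodedata

# Vowel letters of the transliteration used here (an assumption of this sketch).
VOWELS = set("aâeiou")


def ezafe_form(word):
    """Attach the Ezafe affix: -ye after a vowel-final word, -e otherwise."""
    normalized = unicodedata.normalize("NFC", word)
    suffix = "ye" if normalized[-1].lower() in VOWELS else "e"
    return f"{normalized}-{suffix}"


def ezafe_chain(elements):
    """Link the elements of a noun phrase; the last element stays bare."""
    if not elements:
        return ""
    linked = [ezafe_form(word) for word in elements[:-1]]
    last = unicodedata.normalize("NFC", elements[-1])
    return " ".join(linked + [last])


# Head noun, modifier, possessor: the possessor comes last and is unmarked.
print(ezafe_chain(["ketâb", "bozorg", "Kimea"]))
# Vowel-final head noun takes the -ye variant.
print(ezafe_chain(["xâne", "bozorg"]))
```

Because the affix is added only to non-final elements, the sketch never produces a phrase-final Ezafe, in line with the restriction noted for (38c).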
As mentioned before, the Ezafe affix may not attach to verbs, adverbs, or conjunctions.
(39)
(p. 176)
Although the Ezafe affix may be attached to some prepositions, as in (37c) and (40b), it is
not allowed with some others (40a).
(40)
I will come back to this issue in the next subsection where I provide an overview of some
of the specific analyses of certain properties of the Ezafe construction and the function of
the Ezafe affix itself.
7.4.1.2 Distinct treatments of the Ezafe construction: an overview

Ghaniabadi (2010) suggests the following order within the Ezafe domain:
(41)
The Ezafe affix appears between adjacent constituents in the post-nominal domain.
(42)
Samiian (1983, 1994) considers the Ezafe affix as a Case marker. She argues that clauses
(CPs) and real prepositions do not allow the Ezafe to precede them since they do not need
a Case assigner. Following Samiian, Larson and Yamakido (2006, 2008) argue along the
same lines.
Samiian (1983, 1994) divides Persian prepositions into two groups: one group (P1) does not
allow the Ezafe affix (be ‘to’, az ‘from’, bâ ‘with’, dar ‘in’, bi ‘without’, tâ ‘until’), while the
other one (P2) does (zir ‘under’, ru ‘on’, bâlâ ‘up’, kenâr ‘next to’, jelo ‘front of’, barâ ‘for’).
She suggests that these two groups differ in three ways: (a) P1 is a real function word,
while P2 has some semantic content; (b) P1, but not P2, is strictly subcategorized for a
DP complement; and (c) P2, but not P1, displays some nominal properties (can take a
demonstrative, be pluralized, and be marked by -râ). Samiian rejects, however, the idea
that members of the P2 group are true nominals, on the grounds that they cannot be
modified by an adjective or a relative clause.
Karimi and Brame (2012), on the other hand, suggest that prepositions that allow the
Ezafe affix are in fact nouns. Their argumentation is based on pieces of evidence in
addition to those discussed by Samiian. First, they show that these elements can in fact
be modified by an adjective:
(43)
(p. 177)
Second, these elements can be reduplicated for emphasis.
(44)
Reduplication of nouns is fairly common as a grammatical or morphological device in a
variety of languages. It is used, for example, as a means of pluralization in some
languages, and for the purpose of diminutivization in others. Reduplication can also be
found in conjunction with pronouns, verbs, and even adjectives. These authors state,
however, that to their knowledge, reduplication is not found with prepositions.
As for the internal structure of the Ezafe construction, Ghomeshi (1996, 1997) provides a
novel analysis of Persian common nouns by suggesting that these elements do not
project. That is, they are of the category X0, and do not appear with a complement or a
Specifier. She suggests that the fact that there is only one possessor inside the Persian
noun phrase follows from this analysis: since nouns cannot project complement and
Specifier positions, the only position available for the possessor phrase is the Specifier of
the DP, which she suggests to be on the right of the phrase. The structure she provides
for the Ezafe construction is the following.
(45)
Unlike Samiian and Larson and Yamakido, Ghomeshi argues that the Ezafe element is a
linker attached to an X0. Ghaniabadi (2010), working within the framework of Distributed
Morphology, adopts the same analysis with respect to the nature of the Ezafe, and
suggests that the Ezafe insertion is a phonological rule that applies at the Late-
Linearization stage at PF.
Based on empirical evidence, Samvelian (2007, 2008) argues against Ghomeshi’s
proposal that Persian nouns do not project. She further suggests that the Ezafe affix is
neither a Case assigner, nor a linker inserted at PF.17 Samvelian (2007) states that the
fact that relative clauses in Kurdish dialects and Zazaki can be introduced by the Ezafe
affix further supports the claim that this element is not a Case assigner, since CPs are not
assigned Case. She argues that this element has undergone a process of
grammaticalization. That is, the Ezafe is best regarded as an affix which has the function
of indicating dependency relations between the head noun, its modifiers, and the
possessor NP. In other words, the Ezafe is a phrasal (inflectional) affix that occurs at the
right edge of the nominal projections, similar to the determiner -i and personal enclitics.
Kahnemuyipour (2000a, 2006) bases his analysis of the Ezafe construction primarily on Cinque’s (1994)
assumption that the cross-linguistic asymmetry concerning the relative order of nouns
with respect to their modifiers is the result of the syntactic head raising of the noun to a
functional head within the noun phrase. Kahnemuyipour (2014) offers a treatment of the
Ezafe construction based on the structure of DP proposed by Cinque (2010). Cinque
argues that any word order variation is the result of a roll-up movement of phrasal
(p. 178) elements. Extending this roll-up analysis to Persian DP, Kahnemuyipour argues
that the Persian noun phrase is head-final, and that the surface word order is derived by
the phrasal movement of the NP containing the head noun through the Specifiers of the
intermediate functional heads.18 The structure in (46) represents this roll-up movement:
XP and YP stand for the modifier phrases, and AgrP for the functional phrases intervening
between the modifiers.
(46)
In (46), the NP (carrying the head noun and the plural marker, if present) moves
cyclically into the Specifiers of the intervening AgrPs. The Ezafe functions as a linker, the
realization of the inversion process. Kahnemuyipour’s analysis explicitly suggests that the
Ezafe realization is directly correlated with the phrasal status of the elements it links with
each other. This analysis thus explains why demonstratives and numerals, being heads,
are never followed by the Ezafe element. This is in sharp
contrast with Ghomeshi’s view, which treats the elements inside the Ezafe domain as X0.
The next subsection offers an overview of complex DPs in Persian.
7.4.2 Complex DPs
Persian sentential arguments of nouns and relative clauses exhibit an interesting
property. That is, the morpheme -râ, known as the specific direct object marker, may
intervene between the head noun and the complement/relative clause, illustrated by ‘a’
and ‘b’ in (47).
(47)
The structures of the data in (47a) and (47b) would be something like those in (48a) and
(48b), respectively.
(48)
Karimi (1996) suggests the following configuration for the Persian DP followed by -râ: KsP
is a head initial phrase, with -râ in the head position. The DP moves into the Specifier of
this phrase for Case purposes, providing the surface word order. (p. 179)
(49)
Adopting a revised version of Kayne’s (1994) theory of relative clauses, Karimi suggests
that the demonstrative and the head noun reside in the Specifier of the DP containing the
relative clause. The movement of these two elements as one constituent into the Specifier
of KsP gives us the word order in (50).
(50)
The clausal complement of N receives a similar treatment.
As reported in Karimi (2001), constructions such as those in (51) and (52), where -râ
follows the whole complex DP, are employed in mass media and by younger speakers.
However, these forms are considered to be incorrect by prescriptive grammarians (for
example, Najafi 1991).
(51)
(52)
‘I gave the book that Sepide bought yesterday to Kimea.’
The treatment of complex DPs proposed in Karimi (2001) provides a simple solution for
the generation of these forms: In both cases, the entire complex DP containing the CP
moves into [Spec, KsP], as in (53):
(53)
(p. 180)
As for relative clauses within the subject DP, there are some interesting data that need
careful examination. Here are a few examples:
(54)
There is a scope difference between these two sentences. The sentence in (54a) means
that the set of students who are smart is a subset of the set of students who study well. In other
words, among the students who study well, there are some who are not so smart. The
sentence in (54b) means that the set of students who study well is the same as the set of
smart students. That is, all students who study well are smart. Here is another set.
(55)
Again, (55a) means that those Iranians who live in Lake Oswego are rich, but there are
other rich Iranians as well who live elsewhere. In other words, the set of Iranians who
live in Lake Oswego is a subset of the set of rich Iranians. (55b) means that the set of Iranians
who live in Lake Oswego is the same as the set of rich Iranians.
Are the ‘b’ sentences in (54) and (55) the result of a syntactic extraposition rule? If so,
why should the extraposition of the relative clause in these and similar data provide a
different interpretation? This is an interesting topic that I leave for future research.
7.4.3 Classifiers
Classifiers are functional morphemes which, in some languages including Persian, appear
between a numeral and a noun (Gebhardt 2008, 2009). Following Samiian (1983),
Ghaniabadi (2010) proposes the existence of three types of classifiers: true classifiers,
measure nouns, and group nouns. True classifiers are used with count nouns, while
measure nouns and group nouns are used with mass nouns. (p. 181)
(56)
The classifier tâ is the most common one and may replace the other ones. Ghaniabadi
suggests that this classifier may not appear with the other ones.
(57)
Furthermore, Persian classifiers are suggested to be optional (Qarib et al. 1950; Bateni
1969; Mahootian 1997; Megerdoomian 2002; Gebhardt 2009).19
Gebhardt (2009) provides a feature-driven theory of classifiers in several languages,
including Persian, organized along the feature-geometric analysis of Harley and Ritter
(2002). He suggests that the classifier tâ has the features [group, abs].20 This means that
the appearance of tâ indicates plurality. The plural nature of tâ thus explains why it
cannot appear with ye(k) ‘one’.
(58)
Gebhardt goes on to state that there are some elements in Persian that semantically agree
with the noun with regard to shape, material, animacy, etc., such as nafar (for people),
ghabze (for swords, rifles), adad (for smallish inanimate things like pencils), etc. He
(p. 182) does not consider these elements as classifiers. For him, they function as
modifiers of the classifier tâ, in case they appear together. Thus for him the string se tâ
jeld ketâb (cf. 57b) is grammatical; jeld ‘volume’ restricts the semantic domain of the
classifier tâ. Comparing jeld with tâ, he shows that the former may appear with ye(k),
while the latter may not (cf. 58).
(59)
The contrast between (58) and (59) might be due to the specific semantics of tâ and jeld:
while the former is inherently plural, the latter is not. However, the order of these two
elements is the same as the order of nouns and their modifiers. A reverse order of tâ and
jeld makes the string ungrammatical.
(60)
Although the phrase in (60a) sounds a little odd to me, it is far better than the one in
(60b). Furthermore, combinations of tâ with other elements that are classified as group
noun classifiers by Samiian (1983) and Ghaniabadi (2010) sound perfectly well formed.
(61)
Finally, Gebhardt shows that the optional classifier tâ is in fact obligatory in partitive
constructions, as in (62).
(62)
Gebhardt argues that the obligatory presence of tâ in partitive constructions, such as the
one in (62), provides a serious problem for previous accounts of classifiers (Chierchia
1998, Borer 2005). This is so because Chierchia’s theory states that in some languages all
nouns are mass, (p. 183) and therefore, they need a classifier to convert them into
predicates, which can then be used with numerals. However, the prepositional phrase az
pesar-hâ is not a noun in the first place. Thus his theory cannot explain the obligatory
presence of the classifier in these cases. Borer, on the other hand, argues that if a noun is
the complement of a ‘divider’ (either a classifier or plural morphology), then it becomes
a count noun. The problem for her is that the PP az pesar-hâ is not a noun, and therefore,
it cannot be placed in an appropriate position within the DP to be divided by the
classifier.
Gebhardt then goes on to suggest that the classifier tâ is optional in Persian, since the
divider feature can float on other elements such as numerals in this language. Borer’s
theory can account for this optionality. However, the numeral subcategorizes for a
Number Phrase. Since PPs are not Number Phrases in the sense of Ritter (1991), the
classifier is the only option to save the derivation, and thus becomes obligatory in this
case.
7.5 Marked and unmarked objects and the morpheme -râ

Differential object marking (DOM) is the morphological marking of direct objects in some
languages based on one or more nominal hierarchies, such as definiteness or animacy. In
some languages, marking is obligatory on definite objects and prohibited on non-
referential objects, regardless of animacy, while referential indefinite objects are
sometimes marked and sometimes not, based on specificity (Enç 1991; Karimi 2003; Key
2008). I discuss the unmarked object in 7.5.1, followed by an overview of the morpheme
-râ, and the DPs marked by this element in 7.5.2.
7.5.1 Unmarked objects
The semantics of the unmarked object combined with the verb gives the impression that
the nominal element is not an argument of the verb, but rather part of the predicate, as in
(63).
(63)
In fact, it has been suggested that this element incorporates into the predicate (Dabir-
Moghaddam 1997). Along the same lines, Ghomeshi and Massam (1994: 183–4) suggest
that non-referential objects undergo Type I Noun Incorporation (NI) in the sense of
Mithun (1984). Given Baker’s (1988, 1996) Noun Incorporation, this would imply that the
unmarked object, incorporated into the verb, does not receive Case. As noted by Ganjavi
(2007), however, bare objects may be modified by adjectives, and thus receive a phrasal
status. In those cases, they cannot incorporate into the verb since this operation involves
only X0 (p. 184) categories.21 A number of questions arise: is there any evidence that the
unmarked object is a DP, saturating the internal argument of the verb, or an NP, lacking a
clear argument status?22 Does it require Case?
There are pieces of evidence suggesting that the unmarked object is syntactically
independent of the verb, saturates its internal argument, and therefore, is a DP that
requires Case. Evidence for this claim is provided by discourse constructions where the
unmarked object is contrastively focused or topicalized as in (8) in section 7.2 of this
chapter. Furthermore, the unmarked object may serve as the subject of a passive
construction, thus confirming that it is in fact the internal argument of the verb.
(64)
(65)
These data suggest that the unmarked object is in fact the argument of the verb, and thus
has DP status and must be checked for Case.
The next subsection is devoted to an overview of the morpheme -râ and the DP it marks.
7.5.2 Marked objects and the nature of -râ
No other morpheme in Modern Persian has attracted as much interest on the part of
grammarians, linguists and language teachers as the postpositional morpheme -râ. In Old
Persian, -râ appears as râdi marking a cause with the meaning ‘for the sake of’. The same
interpretation holds for rây, the reflex of râdi in Middle Persian. According to Brunner
(1977), Middle Persian rây served other functions as well. It appeared as an illustration of
purpose, reference, beneficiary, or indirect object (Karimi 1990).
Traditional grammarians have assumed -râ to be a direct object marker (Natel-Khanlari
1986; among others). Some linguists have argued that this element has a secondary
function as marking the object for definiteness (Phillott 1919; Sadeghi 1970; Vazinpour
1976; among others). Lazard (1957) distinguishes between polarized objects (NP+râ with
transitive verbs) and quasi-polarized objects (NP+râ with intransitive verbs). Examining
the diachronic development of -râ, Key (2008) suggests that this element marked
animate, rather than (p. 185) definite, objects in early texts (e.g. Qabusnâme, eleventh
century). Browne (1970) argued that -râ appears with definite as well as specific
indefinite objects. The following examples represent both definite and specific indefinite
objects.
(66)
While -râ is obligatory with the definite object in (66a), it is optional in (66b). Its presence
with the indefinite object provides a specific interpretation.
Some linguists have considered -râ as a topic marker in Modern Persian (Peterson 1974;
Windfuhr 1979; Ghomeshi 1997). In fact, the DP followed by -râ may receive a topic
interpretation, as we will see below. However, it may also mark a contrastively focused
DP.
In terms of the position of the object, Karimi (2005) argued that the non-specific object as
well as the specific (marked) object are both merged in the object position of the verb.
While the former remains in situ adjacent to the verb in a discourse neutral sentence, the
latter moves out of the VP (or Pred(icate)P) into the lower Specifier of vP. This movement
can be considered an instance of object shift, an operation observed in some of the
Germanic languages such as Icelandic and German (Holmberg 1986, 1999; Diesing 1996).
The assumption is that the VP is the domain of novel/existential interpretation in the
sense of Heim (1982), Kratzer (1995), Diesing (1992) and others, and thus the specific
object, representing old information, has to move out of that domain.23
(67)
(p. 186)
DP+râ might move out of the vP into a higher position, presumably a topic position or a
contrastive focus position, preceding the subject.24
Karimi and Smith (2015) argue that -râ is not a direct object marker, but rather a default
case marker in the sense of Marantz (1991). Marantz argues that case assignment only
indirectly reflects the syntactic structure, and that there is a special Morphological
Structure (MS) of the grammar in which m(orphological)-case is assigned. He further
suggests that the morphological realization of case obeys the disjunctive hierarchy in
(68).
(68)
Each type of case in this hierarchy is more specific than the case below it, and thus takes
priority in case realization. Therefore, lexically governed case is always assigned over any
other case in the hierarchy. This is motivated by the facts of Icelandic quirky case, in
which the case assigned by the verb is retained on the quirky case marked DP regardless
of its syntactic position.
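The logic of a disjunctive hierarchy, in which the most specific applicable case wins and the default applies only when nothing else does, can be sketched as follows. This is a minimal illustration under stated assumptions: the ordering of the four case types (lexically governed, dependent, unmarked, default) is taken from the way this section refers to them, and the function and variable names are hypothetical rather than part of Marantz's or Karimi and Smith's proposals.

```python
# Case types from most to least specific, as named in this section
# (the list name and string labels are assumptions of this sketch).
CASE_HIERARCHY = ["lexically governed", "dependent", "unmarked", "default"]


def realize_case(licensed_cases):
    """Return the most specific case among those licensed for a DP.

    `licensed_cases` is a set of case types whose conditions are met.
    Default case needs no licensing, so it is the fallback.
    """
    for case_type in CASE_HIERARCHY:
        if case_type in licensed_cases:
            return case_type
    return "default"


# A lexically governed case takes priority over any other licensed case.
print(realize_case({"lexically governed", "dependent"}))
# With no lexical case, dependent case outranks unmarked case.
print(realize_case({"dependent", "unmarked"}))
# When no other case is available, the default is realized.
print(realize_case(set()))
```

The last call corresponds to the situation in which no other case is available to the DP, so that only the default remains.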
Karimi and Smith’s analysis is based on the well-known fact that the appearance of -râ is
not specific to direct objects (cf. Lazard 1957, 1992), as the following data indicate. The
data in (69) represent Classical Modern Persian (CMP), although they are still
employed in the formal written language.
(69)
(70)
(p. 187)
The morpheme -râ also appears in a different possessive construction represented by the
example in (71): bud ‘was’ is a copula, yet -râ appears following the DP pâdshâh ‘king’.
(71)
Karimi and Smith argue that the presence of -râ is clearly not lexically governed, since it
can appear with various types of verbs. It does not represent Marantz’s dependent case
either, since its occurrence does not depend on the presence of a higher case (cf. 69–71).
Nor can it be an unmarked case, since it does not appear with subjects. The only
remaining case is the default case. These authors argue that -râ marks a specific DP when
there is no other case available in a derived position, proposing the following
generalization.
(72)
The claim that -râ is a postsyntactic default case-marker even in Modern Persian is
supported by the fact that this element may appear with intransitive verbs in non-possessive
constructions when the DP is in a derived position revealing a discourse interpretation.
The fact that subjects and objects of prepositions are not marked by -râ follows from this
analysis, since those elements receive unmarked case (cf. iii in 68).
(73)
7.6 Passives, causatives, and resultatives

In this section, I discuss three different constructions in Persian: passives, causatives, and
resultatives. Although the first two have been discussed by some grammarians and
linguists, the last has not received much attention. I start with a discussion of passives in
7.6.1, followed by an examination of causatives and resultatives in 7.6.2 and 7.6.3,
respectively. For more discussion of causative forms, see Chapters 2 and 3.
7.6.1 Passives
Passives may appear in various forms in Persian. The data in (74–6) provide three types of
constructions that can be considered passives, at least semantically. For more discussion
of the passive construction, see Chapter 3. (p. 188)
(74)
(75)
(76)
In (74b), the participle form xorde ‘eaten’ is combined with the verb shod ‘became’ to
provide a passive reading. In (75b), the unaccusative light verb xord ‘collided’ replaces
the agentive light verb dâd ‘gave’ in (75a), providing a passive reading. In (76), there is a
semantically understood, but syntactically unspecified and lexically empty, subject.
The discussion of passive constructions goes back to Phillott (1919), who suggests that
this construction is not used productively in Persian. Some linguists postulate a
transformational passive rule for Persian (Marashi 1970; Palmer 1971; Soheili-Isfahani
1976; Dabir-Moghaddam 1982b, 1985). Moyne (1974), on the other hand, argues that
Persian lacks a passive construction, either morphological or syntactic, by suggesting
that there is no underlying agent in so-called passive constructions in this language. He
acknowledges, however, that there are examples such as those in (77) with an overt
agent.
(77)
Moyne suggests, however, that these agentive phrases are new and awkward in Persian.
He concludes that there are no active–passive pairs in Persian, and those constructions
with shodan ‘become’ are in fact inchoatives, and the agentive phrases are instrumental.
Dabir-Moghaddam (1982b, 1985) argues that, in addition to inchoative constructions,
there are also structural passives in this language. This author suggests that the verb
shodan ‘become’, although a motion verb in Middle Persian, has taken on a special
function as a passive auxiliary in Modern Persian. He further shows that, in addition to its
new function, shodan continued to appear in its earlier function in Classical Modern
Persian (e.g. be Kerman shod ‘S/he went to Kerman’), indicating a functional transition at
that period. In Modern Persian, he suggests, this element represents inchoative as well as
passive constructions. As for the latter, Dabir-Moghaddam shows that the direct object of
an active sentence (p. 189) becomes the structural subject of the corresponding passive,
and the agent-phrase (used with an instrumental preposition) is optionally employed.
(78)
Dabir-Moghaddam suggests that these constructions differ from inchoatives based on the
following contrast: while (79a) represents a passive construction that requires an
underlying agent, (79b) is inchoative, lacking such an element.
(79)
Thus Dabir-Moghaddam suggests a passive rule that relates the underlying active
sentence to the corresponding passive version. He further argues that this rule applies to
verbs that express volitional force.
Folli, Harley, and Karimi (FHK) (2005) argue that the so-called passive constructions are
instances of complex predicate constructions in which the past participle of the verb
serves as an NV element.25 Based on their analysis of complex predicates, they suggest
the structure in (81) for the sentence in (80). In this sentence, the past participle dâde
has adjectival properties. The complement of the adjective moves into the subject position
in this construction.
(80)
(81)
(p. 190)
FHK suggest that this structure is similar to the regular unaccusative CPr consisting of
an adjective as the NV element and an LV. Thus the sentence in (82) has the phrasal
structure in (83).
(82)
(83)
These authors further suggest that their analysis predicts that there is no ‘passive’ of
CPrs with a nominal NV element. That is, the light verb shodan ‘become’ selects for a
predicative small clause complement where there is no room for both a nominal NV
element and a deverbal adjective. The following data show that this prediction is in fact
borne out.26
(84)
Two of these authors (Harley and Karimi) have come across some (unproductive) data
contradicting the generalized analysis presented in their 2005 work. That is, there seem
to be data that allow both the nominal NV element and the deverbal adjective, in
addition to the light verb.
(85)
(86)
There are also cases in which the NV element is a prepositional phrase, allowing the
presence of the PP and the deverbal adjective, in addition to the light verb, as in (87–8).
(87)
(88)
(p. 191)
As for (85, 86) and similar cases, it seems that this kind of construction is only possible if
there is no alternative LV construction available in the language. Compare the sets in (89–
90) with those in (91–2).
(89)
(90)
(91)
(92)
As for (91) and (92), it seems that the passive form of these constructions is almost
exclusively restricted to data that are used in elevated formal language. However, a
careful analysis of these constructions is required.27
7.6.2 Causatives
Persian distinguishes several types of causatives. One group, called labile causatives,
retain their surface form, but take on an additional argument in order to be interpreted
as causative, as in (93b). These forms are not very common in Persian.28
(93)
(p. 192)
The light verb alternating causative is another type, formed by changing the light
verb of the CPr.
(94)
The third type is the periphrastic causative, which is also formed as a biclausal complex
predicate.
(95)
There are, however, periphrastic causative CPrs that are used with the LV shodan, and
have no counterpart with kardan. These causative predicates are also biclausal.
(96)
Another type of causative is expressed semantically by verbs such as koshtan ‘killing’.
These are direct causatives, in which the verb expresses the idea that the act is
completed on a patient by an agent, yet there is no overt morpheme expressing causation
(Nabors 2014). The sentence in (97) means that the hunter caused the bear to die.
(97)
Finally, the morphological causative is formed by adding the affix -ân to a number of
transitive and intransitive verbs.29
(98)
(p. 193)
7.6.3 Resultatives
In Persian, change-of-state CPrs, revealing a resultative reading, are made up of a light
verb plus a resultative NV element, as in (99). In this example, the change of state results in
yax ‘ice’ becoming âb ‘water’.
(99)
English allows a secondary resultative predicate, as in the ice melted away, in which away
is a secondary resultative predicate. Additional examples are provided in (100).
(100)
Persian does not allow a secondary resultative predicate in complex predicate
constructions.
(101)
As argued in FHK (2005), sâf ‘straight’ cannot be a secondary resultative predicate in this
sentence. To obtain a resultative reading, Persian adds a second clause.
(102)
Persian resultative constructions have not been thoroughly examined. This is one of the
interesting syntactic areas that need further attention.
7.7 Raising and control

The discussion of raising and control constructions within the area of generative
linguistics goes back to early stages of this theoretical framework. English examples of
these two constructions are provided in (103a,b). (p. 194)
(103)
The empty category (e) in (103a) is considered to be the trace/copy of the noun phrase
children that has moved into the subject position of the main clause. The main verb does
not subcategorize for an external argument (subject), and thus the subject position is
empty. This movement is considered to be Case-driven. That is, the infinitival verb in the
embedded clause cannot assign Case to its subject, and therefore the subject moves into the main clause to
receive Nominative Case from the finite matrix verb. The empty category in (103b), on
the other hand, is taken to be PRO by Chomsky and Lasnik (1977) and subsequent
work by others. That is, the main verb subcategorizes for an external argument which is
co-indexed with PRO, the phonologically empty subject of the embedded clause.
As for Persian, the following two sentences exemplify these two constructions.
(104)
I start with an overview of the literature on raising constructions in 7.7.1, and continue
with an examination of various properties of control constructions in 7.7.2.
7.7.1 Raising
Several authors have argued that Persian lacks raising constructions (Hashemipour 1989;
Karimi 1999, 2005; Ghomeshi 2001). This assumption is based on the following facts: (i)
the embedded subject may stay in situ (105a); (ii) the embedded verb agrees with the
embedded subject even when the latter appears in the main clause (105b); and crucially,
(iii) the main verb is inflected for the third-person singular, regardless of the number and
person of the raised subject (105b).
(105)
Furthermore, any other phrasal element may move out of the embedded clause into the
matrix clause in these constructions: (p. 195)
(106)
In (106), the object has moved into the matrix clause while the embedded subject is in
situ. Similar to (105b), there is no agreement between the verb and the raised object DP.
Karimi (2005) suggests that the derived DP in these constructions is in a topic position.
This explains why the verb does not agree with it.30
7.7.2 Control
There are several types of control constructions in Persian, represented by the data in
(107–10).
(107)
(108)
(109)
(110) 31
The discussion of control constructions in Persian goes back to Hashemipour (1989).
Several other authors have discussed these constructions since then, including Ghomeshi
(2001), Darzi (2008a), Pirooz (2008), Karimi (2008), Darzi and Motavallian (2010),
(p. 196) Asudeh and Mortazavian (2011) and Ilkhanipour (2014). Two issues have
specifically been the centre of attention with respect to these constructions. First, the
size of the complement of the control predicate in obligatory and non-obligatory
constructions: is this constituent a finite clause (CP), or a verb phrase (vP)? If the latter,
what is the nature of the complementizer ke in those constructions? The second question
has to do with the nature of the empty subject in the embedded clause in these
constructions: is it PRO, as suggested for corresponding English constructions?
Ghomeshi (2001) proposes that the syntactic category of the Persian control complement
is smaller than CP. Her proposal is based on several arguments, including the following:
first, there is no independent tense in the embedded clause in these constructions, and
therefore, there is no Tense Phrase (TP). Second, there is no indirect question in Persian
control construction, and therefore, there is no Complementizer Phrase (CP). Thus she
suggests the following structure for Persian control constructions.
(111)
As for the complementizer ke, she suggests that this element is a clitic in these
constructions, hosted by the matrix control predicate.
Darzi (2008a) and Karimi (2008) provide counterarguments to Ghomeshi’s analysis, and
argue that the embedded constituent in Persian control constructions is CP. Asudeh and
Mortazavian (2011) arrive at the same conclusion. As for ke, Darzi convincingly shows
that this element is in fact a complementizer. For example, the following sentence, where
the adverbial follows ke, would be ambiguous if this element were a clitic attached to the
matrix verb. This prediction, however, is not borne out, since the adverbial hamishe can
only be interpreted as modifying the embedded predicate.
(112)
Furthermore, ke may lose its vowel in certain contexts. Darzi shows that, in those
situations, this element is phonetically attached to the element following it, not the one
preceding it.
(113)
Ilkhanipour (2014) suggests that the complement of the control predicate is not a full CP,
but a defective one. Employing Cinque’s (1999, 2004) universal hierarchy of functional
phrases (FPs), and the idea that adverbials are base-generated in the Specifiers of
relevant phrases, she argues that the complement constituent of the control predicate in
Persian is a defective CP that lacks value(s) on mood and modal projections. This is
shown by an example such as the one in (114), in which the complement of the control
predicate is incompatible with an evaluative adverb. (p. 197)
(114)
As for the nature of the empty category in the subject position of the embedded clause, it
is not clear if it can be considered PRO. This is because these constructions do not
uniformly exhibit lack of Tense. The presence of tense indicates the presence of
Nominative Case, an issue that conflicts with the nature of PRO, since this element is
considered to receive Null Case (Martin 2001) in the subject position of an infinitive
clause.32
7.8 Other topics in Persian syntax

There are several interesting topics in Persian syntax that have not been studied as
vigorously as some others in the past. I briefly discuss some of them in this section, and
refer the reader to the existing literature on these topics. Section 7.8.1 is devoted to
modality, and the interaction of modals with subjunctive mood. Section 7.8.2 discusses
negation and its interaction with modals, followed by an overview of Persian aspect in
7.8.3. Two other topics, namely ellipsis and sluicing, are briefly introduced in section
7.8.4. For more information, refer to Chapter 9.
7.8.1 Modality
Taleghani (2006) provides an extensive analysis of the syntax and semantics of Persian
modals. She divides these elements into two groups: verbal and adverbial. The first group
is divided into two subgroups: auxiliary modals and complex modals. This is summarized
in (115).
(115)
(p. 198)
Semantically, Taleghani divides these modals into two major groups: root or event modals
that express ability, permission or obligation, and epistemic modals that involve possibility
and probability. Root modals consist of two subgroups: those related to obligation and
permission from an external source are called deontic, while the ones related to internal
ability and willingness are called dynamic. The data in (116) represent the two types of
root modals in Persian.
(116)
The following data represent epistemic modals in Persian.
(117)
One of the interesting aspects of modals is their interaction with the subjunctive mood,
expressed by the prefix be- (or bo-, bi-, depending on the phonological context). Taleghani
shows that verbal root modals are compatible only with present subjunctives, while
verbal epistemic modals are compatible with both present and perfective subjunctives. The data
in (116) exemplify the interaction of the present subjunctive with root modals. Those in
(118) represent epistemic modals with both present and perfective forms of the
subjunctive mood.33
(118)
Taleghani shows that adverbial modals are not compatible with the subjunctive mood, and
can only appear with the prefix mi- (see section 7.8.3 for a discussion of this prefix).
(p. 199)
(119)
Taleghani suggests that verbal modals take a clausal complement. Thus the subjunctive
form be- appears on the embedded verb in those constructions. Adverbial modals,
however, do not take a complement clause. Therefore the verb, as the matrix predicate, is
incompatible with the subjunctive prefix in a non-imperative context.34, 35
7.8.2 Negation
Pollock (1989) proposes that negation is the head of its own functional phrase, similar to
tense and agreement. Laka (1994) suggests that the position of the negation phrase
within the clause is parameterized cross-linguistically. As for Persian, Taleghani (2006)
proposes that the negation prefix na- appears in the head position of the negation phrase
which is located above the Tense Phrase (TP), and is in an Agree relation with a negative
feature on the verb, which results in the surface realization of this prefix on the verb.
Furthermore, the negation morpheme attaches to the prefix mi-, and is incompatible with
the subjunctive morpheme be-.
(120)
According to Taleghani, the underlying structure for (120b) is the one in (121), where
AspP stands for Aspect Phrase:
(121)
The arrows represent an Agree relation between the verb, on the one hand, and negation,
tense (past tense in this case) and aspect (thus the presence of mi-), on the other.
As for the incompatibility of the subjunctive prefix with negation, Darzi (2008b) offers a
morphosyntactic solution. He proposes the existence of a phrase (PolP) between TP and
AspP in Persian. According to him, the head of this phrase is the locus of either the
subjunctive or the post-modal negative feature, thus capturing the complementary
distribution of the two.
The interaction of modals with negation is another interesting topic. (p. 200)
(122)
In (122a), the negation prefix is attached to the modal, and receives a wide scope. In
(122b), it is attached to the main verb, and receives a narrow scope. Taleghani suggests
that Persian negation may take wide or narrow scope over root modals, depending on its
position, with the exception of the dynamic root modal ehtiyâj dâshtan ‘to need’, in which
case negation takes only a wide scope. She reports inconsistencies with respect to the
interaction of epistemic modals with negation.36
7.8.3 Aspect
In this section I concentrate on two elements with respect to aspect in Persian: the prefix
mi- and the auxiliary dâshtan ‘to have’.
As for mi-, Ghomeshi (2001) suggests that this element represents the ongoing nature of
the event. Windfuhr (1979) considers it to refer to a habitual event, while Mahootian
(1997) categorizes it as representing both habitual and imperfect aspects. Taleghani
(2006) suggests that mi- is an aspect marker that refers to continuity and habituality of an
action. Syntactically, she puts this element in the head position of the aspect phrase
(AspP) which is realized on the verb by an Agree relation (cf. 121).
The auxiliary dâshtan ‘to have’ represents progressive aspect in Persian colloquial
language, and is fully inflected, along with the main verb, as in (123). This element
requires the presence of mi- on the main verb, regardless of tense.
(123)
Nematollahi (2015) offers the following example as one of the instances of this
paradigm.37 (p. 201)
(124)
Taleghani (2006) suggests that the colloquial progressive construction with dâshtan is an
instance of the Serial Verb Construction (SVC). This observation is based on Butt’s (1995)
description of SVCs, some of whose properties are repeated below in (125).
(125)
The data in (123–4) have all the properties in (125): each sentence involves only one
event, there is only one external argument shared by both verbs, and neither verb is
embedded within the complement of the other.
An intriguing property of Persian progressive constructions with dâshtan is that they may
not co-occur with negation.
(126)
This is an issue worth close examination in future.
7.8.4 Ellipsis and sluicing
Although ellipsis and sluicing have been the subject of much attention with respect to
English, the Persian counterparts of these constructions were, to my knowledge,
untouched until recently. These two topics are briefly introduced in this section.
7.8.4.1 Ellipsis
Ellipsis refers to the omission of one or more word(s) from a clause that are nevertheless
recoverable in the context of the remaining elements in the sentence. In this section I
introduce two articles and one work in progress on Persian ellipsis.
Toosarvandani (2009) discusses ellipsis in Persian complex predicates. In this
construction, the light verb survives, while the NVE and the object are elided, as in (127).
(127)
(p. 202)
In this construction, the NV element, along with the object, is elided, leaving the light
verb stranded. Toosarvandani suggests that this construction is similar to VP ellipsis in
English.
Eliding the subject and object is yet another type of ellipsis. Sato and Karimi (2015)
discuss this kind of ellipsis and show, among other things, that Persian exhibits a subject–
object asymmetry with respect to the sloppy interpretation of null arguments in these
elliptical constructions.
(128)
The missing argument here allows both strict and sloppy interpretations. In other words,
the sentence in (128b) means either that Parviz also loves Kimea’s teacher (the strict
interpretation) or that Parviz also loves Parviz’s own teacher (the sloppy interpretation).
Turning now to ellipsis of subjects, the example in (129b) illustrates a null subject
construction in which the embedded empty subject is anaphoric to the overt subject in
the full-fledged antecedent clause in (129a). Unlike null objects, however, null subjects
disallow the sloppy interpretation; thus (129b) can mean that Parviz said that Kimea’s
friend knows French, but it cannot mean that Parviz said that Parviz’s own friend knows
French.
(129)
Based on various syntactic constructions in Persian, Sato and Karimi argue against
analyses built on Verb-Stranding VP-ellipsis (VVPE) proposed by Goldberg (2005; among
others) for this type of ellipsis in Persian. According to VVPE, the verb is stranded in
these constructions, followed by VP deletion (similar to Toosarvandani’s analysis of the
complex predicate construction discussed in this section). Sato and Karimi provide
arguments showing that the only elements that are missing in these constructions are in
fact the arguments themselves, and not a larger constituent.
Smith et al. (2016) discuss NVE ellipsis in Persian. They argue that this type of ellipsis,
contrary to Toosarvandani (forthcoming), is allowed in this language, although it is
restricted by the specificity property of the direct object. The data in (130) exemplify this
contrast.
(130)
(p. 203)
Smith et al. argue that this restriction boils down to the fact that copies of specific objects
(of type <e>) can convert into bound variables (Fox 2002; Sauerland 2004), thus
permitting parallelism when specific objects scramble. Copies of non-specific objects may
not convert, leading to a violation of parallelism that is required for ellipsis.
7.8.4.2 Sluicing
The sluice construction was introduced by Ross (1969). (131) represents a typical
example.
(131)
It has been argued that the wh-phrase in a sluice construction moves into the Specifier of
CP, followed by a process of TP deletion (Ross 1969; Merchant 2001; among others). Thus
(132) would be the underlying structure of (131).
(132)
Persian exhibits sluicing as well, as attested by the following data.
(133)
Persian is a wh-in-situ language, as discussed in section 7.2 of this article. Thus the
underlying structure of the sluice construction seems to be a mystery at first glance,
since the wh-phrase does not move into the Specifier of CP.
Building on the focus construction proposed by Karimi (2005), Toosarvandani suggests
the following underlying construction for a sentence like (133a).
(134)
In (134), the wh-phrase has moved into the Specifier of the FP, followed by the deletion of
the TP which contains the rest of the sentence.
(p. 204) 7.9 Conclusion

This chapter offered an overview of some of the major syntactic and morphosyntactic
properties of Persian. I also referred to the existing literature on each topic without
getting into the detailed theoretical discussions. Of the topics introduced in this chapter,
three have been examined extensively by various grammarians and linguists over several
decades: complex predicates (section 7.3), Ezafe constructions (section 7.4.1) and marked
objects and the nature of -râ (section 7.5.2).
Detailed discussions of these constructions are offered in Chapter 9. Due to the
descriptive nature of this chapter, theoretical considerations were not thoroughly
discussed, although they were briefly mentioned in some cases. I refer the reader to
Chapter 8 for an examination of theories employed in the literature for Persian syntax.
Some of the issues introduced in this chapter have not been thoroughly examined in the
literature. For example, problems related to complex DPs, specifically with respect to
extraposition of the CP out of the complex DP, require close attention. Furthermore, the
nature of resultative constructions, and the reason why Persian does not allow secondary
predicate constructions such as I hammered the metal flat need to be examined. Finally,
topics briefly introduced in section 7.8 (modality, negation, aspect, ellipsis and sluicing)
have been especially under-studied. Due to space limitations, many other interesting
topics in Persian syntax and morphosyntax were not even touched on in this chapter.
There is no doubt that Persian syntax and morphosyntax offer interesting and exciting
topics that cry out for further exploration and examination in future work.
Notes:
(1) Qarib, Bahar, Foruzanfar, Homayi, and Rashidi are the authors of dastur-e panj ostâd
‘grammar of five masters’ mentioned above.
(2) See section 7.4.1 in this article for a discussion of this element.
(3) Other grammar books within the grammarian tradition devoted to Persian morphology
and syntax include Vazinpour (1976) and Mogharrabi (1993). Again, these works base
their descriptions on formal written language and Persian poetry.
(4) See Karimi (1989); Darzi (1996); Mahootian (2007); among others, for discussions of
Persian word order.
(5) See section 7.5 for a descriptive discussion of the properties of specific and non-
specific direct objects in Persian.
(6) Abbreviations: Ez: Ezafe affix; -râ: Accusative marker for specific objects; sg: singular;
pl: plural; asp: aspect, neg: negation; subj: subjunctive; Cl: classifier; rel: relative, ind:
indefinite, emph: emphatic.
(7) See also Ganjavi (2003), who claims that Persian scrambling is not driven by focus.
(8) Dari, a variant of Persian spoken in Afghanistan, has an interesting property. While the
wh-phrase is in situ, another wh-phrase appears in sentence initial position.
(i)
See Karimi and Taleghani (2007) for an analysis.
(9) Raghibdust (1994) suggests, however, that the dislocated wh-phrases represent topic.
(10) Persian is a Null-Subject language, hence the presence of pro in the subject position.
(11) These elements are also called compound verbs (e.g. Dabir-Moghaddam 1995) and
phrasal verbs (e.g. Bateni 2007) in the literature.
(12) Various aspects of complex verb constructions have been discussed by Moyne (1970);
Tabaian (1979); Bashiri (1981); Barjesteh (1983); Karimi (1997, 2005); Heny and Samiian
(1991); Mohammad and Karimi (1992); Sadeghi (1993); Ghomeshi and Massam (1994);
Dabir-Moghaddam (1995); Vahedi-Langrudi (1996); Karimi-Doostan (1997, 2005, 2008,
2011); Megerdoomian (2002a, 2002b, 2012a); Goldberg (2003); Folli, Harley, and Karimi
(2005); Family (2006); Samvelian (2006, 2012); Toosarvandani (2009); Sedighi (2009);
Müller (2010); Pantcheva (2010); Shabani-Jadidi (2014); Samvelian and Faghiri (2013);
among others.
(13) See Samvelian (2006a) and Samvelian and Faghiri (2013) for a different view
regarding this issue.
(14) See Karimi-Doostan (2011) for a discussion of separability of the Persian nominal
NVE. This author suggests that some nominal NVEs may function as the direct object of
the verb, in which case they may be modified by an adjective, be relativized, scrambled,
and focused.
(15) See Yadgar Karimi (2013) for a different proposal representing the structure of the
impersonal construction in (34).
(16) The Ezafe construction has also been examined in other Iranian languages. See, for
example, Holmberg and Odden (2005, 2008) for Hawrami; Larson and Yamakido (2006)
for Zazaki; Samvelian (2008) for Kurmanji and Sorani; among others. In some of these
languages, the form of the Ezafe varies depending on specific properties of the noun,
including its phi features.
(17) Furthermore, Kahnemuyipour (2015) provides data indicating that Ezafe may appear
preceding pure prepositions in Persian, thus further undermining the Case assigning
nature of the Ezafe affix.
(18) Holmberg and Odden (2005), too, propose a ‘roll-up’ derivation of the Ezafe
construction in Hawrami.
(19) See Ganjavi (2007), however, who argues that optionality boils down to formal versus
informal style: while classifiers are not employed in the former style, they are in fact
obligatory in the latter. See also Lazard (1992) for a similar account of Persian classifiers.
I tend to agree with this assessment.
(20) Gebhardt uses ‘abs’ for ‘absolute’, a more specific quantity feature for numerals.
(21) Ganjavi (2007) suggests that the bare object is a complement of the verb, and
undergoes a Pseudo Noun Incorporation (PNI) in the sense of Massam (2001), and thus
does not require Case. Discussing Niuean, Massam suggests that the object in a VSO
order in this language is a full DP, while the less common VOS order involves an NP
object that undergoes PNI. Ganjavi (2007, 2011) further suggests that the non-râ marked
object is either an NP or a NumP.
(22) Mahootian (1997: 203 fn.) suggests that Persian bare noun phrases are either
indefinite or generic. Her analysis includes subjects as well. Modarresi and Simonenko
(2007) and Modarresi (2014) suggest that the indefinite reading of the bare object is
likely to be the result of a semantic incorporation. See also Ghomeshi (2008) whose
analysis of bare noun phrases includes objects of locative prepositions, in addition to
subjects and objects.
(23) If we assume that the novel domain is not VP, but vP in a more modern sense, then it
is not surprising that DP+râ may stay in the Specifier of vP if it reveals novel information
such as emphasis or comparison. This is in fact borne out empirically.
(i)
In this example, DP+râ follows the adverbial hanuz ‘yet’, an adverb that marks the vP
edge.
(24) Ganjavi (2007) suggests a similar situation for the marked direct object, by moving it
out of the VP into the Specifier of a functional head.
(25) See also Vahedi-Langrudi (1996) for a similar idea.
(26) See also Vahedi-Langrudi (1999) who argues that shodan is not an auxiliary, but a
main light verb. He derives the complex predicate formation of the light verb and the NV
element by a postsyntactic operation.
(27) Examining these types of constructions is part of an elaborated NSF grant on
complex predicates in a number of Iranian languages awarded to Karimi (PI), Harley and
Carnie (Co-PIs) for the period of July 2015–December 2018.
(28) For an elaborated discussion of all types of causative constructions, see Dabir-
Moghaddam (1982a, 1987). Lotfi (2008) examines various causative constructions as well.
Soheili-Isfahani (1987) discusses morphological causatives. Nabors (2014) offers a
theoretical analysis of morphological causatives within the framework of Distributed
Morphology. She examines, among others, causatives with verbal roots (xor-ân-d-an ‘to
make eat’) and those with nominal roots (tars-ân-d-an ‘to make scare’).
(29) In colloquial Persian, â is pronounced as u preceding nasal consonants.
(30) Darzi (1996) rejects the idea that there is no subject-to-subject raising construction in
Persian. His analysis is partially based on the observation that the subject position of the
matrix clause can be filled with the demonstrative in ‘this’ which he considers to be an
expletive. He also provides evidence indicating that the moved subject has some matrix-
subject-like properties with respect to binding and quantifier floating. However, lack of
agreement poses a problem for his analysis.
(31) For a discussion of Persian arbitrary control see Karimi (2008).
(32) Hornstein (1999) and work thereafter suggest that control and raising constructions
have the same syntax. That is, the surface subject in both cases is merged in the
embedded clause, and moves into the subject position of the matrix clause. This proposal
has several problems, including the fact that the scope interpretation of the subject in
these two constructions is quite different: while the subject in a raising construction may
receive a wide- or a narrow-scope reading, the one in a control construction may only
receive a wide-scope interpretation, indicating that it could not have moved from a lower
position to its surface position.
(33) For a discussion of the interaction of subjunctive mood with past tense, see Tavangar
and Amuzadeh (2009). These authors examine the function of the simple past tense as a
grammaticalized exponent of epistemic and deontic modality within a future-oriented
temporal framework.
(34) Rahimian, Najari, and Hesarpuladi (2015) provide a report of the historical
development of modal auxiliaries in Persian. They state that Old Persian lacked modals,
while Middle Persian employed four modals which had their roots in Old Persian main
verbs. They further report that two of those modals are still employed in Modern Persian:
bâyestan ‘must’ and tavânestan ‘can’.
(35) For a critical review of previous literature on modals, including Taleghani’s work, see
Ilkhanipour (2012).
(36) See chapter 5 in Taleghani (2006) for a syntactic analysis of the interaction of root
and epistemic modals with negation.
(37) Nematollahi (2015) reports that Zhukovskij (1888) was the first to mention this type
of progressive form in Persian colloquial language. Nematollahi’s own study is based on
data she collected from literary works published between 1907 and 2010. She shows
that this progressive form is not specific to the colloquial language, and has been
increasingly employed in the literary style as well. See also Dehghan (1972) for a
comprehensive discussion of this phenomenon.
Simin Karimi
Simin Karimi is a Professor in the Department of Linguistics at the University of
Arizona. She has worked on various syntactic topics in Persian, including word order
and scrambling, syntax–discourse interaction, complex predicates, and complex DPs.
Her current research focuses on control constructions, ellipsis, and the syntax and
semantics of complex predicates in various Iranian languages. She has published
journal articles, book chapters, and one book-length monograph. She has also
edited/co-edited five books and a special issue for the journal Lingua.

Other Approaches to Syntax

Jila Ghomeshi

This chapter surveys theoretical approaches to Persian syntax, with an emphasis on more
recent work. It begins with a brief discussion of what constitutes a theoretical, as
opposed to descriptive, approach, and proceeds with a look at Linguistic Typology,
Construction Grammar, and Cognitive, Functionalist, and Corpus approaches. At the end
of the chapter, formal approaches such as RRG, HPSG, and LFG are touched upon as well
as formal generative work prior to Minimalism, which is covered in the previous chapter.
The discussion in each case is intended to showcase the way in which aspects of Persian
syntax have been addressed. The advantages of considering Persian both in a historical
context and within the family of other Iranian languages are highlighted, and a number of
distinctive constructions within the language are discussed in light of how they have been
treated within the literature.
Keywords: theoretical approaches, linguistic typology, construction grammar, cognitive linguistics, corpus
linguistics, generative grammar
8.1 Introduction
IN Chapter 7, a description of Persian syntax was presented, primarily from the
perspective of the Minimalist framework. This chapter presents an overview of some of
the other theoretical approaches that have been taken to Persian syntax. It is not
intended to be a comparison of the merits and flaws of the theories themselves, but
rather to showcase the way in which aspects of Persian syntax have been addressed
within a number of different approaches. The coverage is restricted to the scholarly
literature in linguistics that has been written in English. While the focus is on Persian,
some of the discussion extends to languages in the same family as well as contact
languages. Data and interlinear glosses have been modified, where necessary, to maintain
one standard format throughout the article.
8.2 Descriptive approaches to Persian
In the contemporary linguistics literature, theoretical approaches are often contrasted
with descriptive approaches, which purport to present the facts about a language in a
theory-neutral way. As Dryer (2006) notes, however, even descriptive works are written
within a theoretical framework, meaning that there is no such thing as a purely
atheoretical description. Dryer argues that the relevant distinction is not between
theoretical and theory-neutral work, but between descriptive and explanatory theories.
Descriptive theories tell us what languages are like, while explanatory theories aim to tell
us why languages are the way they are (Dryer 2006: 207). Moreover, Dryer asserts that a
single theory cannot serve both goals simultaneously.
One outstanding example of a truly descriptive, comprehensive, and linguistically
informed grammar of colloquial contemporary Persian is Gilbert Lazard’s (1957)
Grammaire du Persan contemporain, translated into English by Shirley Lyons as A
Grammar of (p. 206) Contemporary Persian (1992). Lazard’s work is a rich resource for
anyone interested in Persian, particularly Persian morphosyntax. It combines
grammatical description with notes on register and on variation of use. Examples are
drawn from both the spoken language and from written texts and are given in Persian
orthography with accompanying translations.
Lazard’s grammar has been followed by a number of others that have similarly been
intended to serve primarily as linguistic description rather than as teaching tools (e.g.
Mahootian 1997; Windfuhr 2009a; and Perry 2007 specifically on Persian morphology). A
slightly different kind of contribution but still in the descriptive vein is Windfuhr’s (1979)
book on Persian grammar with the subtitle ‘History and State of its Study’. Windfuhr
presents the linguistic research literature up until the late 1970s, but situates this
literature within a historical context and provides references for many of the ideas about
Persian morphology and syntax that pre-date the modern linguistic period. Reflecting the
growing interest in lesser-studied languages and language documentation, Windfuhr’s
(2009b) volume on Iranian languages provides a descriptive chapter on Persian and Tajik
(with John R. Perry), alongside grammatical sketches of Western Iranian languages such
as Kurdish and Zazaki and Eastern Iranian languages such as the Pamir group.
It is worth mentioning at this point that the reference to ‘contemporary linguistics
literature’ at the start of this section does not include research into Iranian languages
within comparative-historical philology, a longstanding tradition of scholarship that
continues to the present day, primarily in Europe. It has only been within the last decade
or so that linguists working on Iranian languages within the North American ‘theoretical’
school and those working within the European comparative-historical tradition have
begun to interact in a meaningful way. This is due in large part to the inauguration of an
International Conference on Iranian Linguistics that has taken place biennially since 2005
(see Borjian 2015 on this point). The diversity of scholarly research is reflected in the
publications these meetings have produced (see Karimi et al. 2008; Korn et al. 2011; Haig
and Jahani 2013). In this chapter, I will try to blur the line between the two traditions,
although merely setting up an opposition between ‘descriptive’ and ‘theoretical’ work
frames the ensuing discussion within the modern linguistic paradigm rather than the
comparative-historical one.
Turning now to the focus of this chapter, which is on work in the more ‘theoretical’ (pace
Dryer) area of syntax, we start with linguistic typology. Typological approaches bear some
resemblance to traditional grammar in the sense that they do not purport to be a theory
of the mind, but rather aim to capture generalizations about languages. Linguistic
typology can reveal both what languages have in common and the dimensions along
which they vary. Persian, as it turns out, provides rich territory for work of this kind.
8.3 Linguistic typology

The field of linguistic typology can be traced back to Greenberg (1963), who introduced
the idea of word order correlations and their potential to reveal universal tendencies
across languages (see also Chapters 3, 7, 8, and 15 for more on word order). One well-
known set of correlations is based on whether the direct object precedes (OV) or follows
(VO) the verb in a (p. 207) given language. In OV languages complements tend to precede
heads, so postpositions are expected, while in VO languages complements tend to follow
heads and prepositions are expected. A number of other correlations are thought to be
associated with OV vs. VO order, although complications arise depending on whether the
relevant contrast is between heads vs. complements or between heads vs. phrasal
modifiers (see Dryer 1992). Thus, within noun phrases the only robust correlations
concern genitives and relative clauses: in VO languages these follow the noun and in OV
languages they precede it.
As it turns out, these three correlation pairs (adpositions with respect to their nominal
complements, and genitives and relative clauses with respect to the nouns they modify)
are sufficient to show that Persian is exceptional in its patterning. It is an OV language as
shown in (1a), but it exhibits VO properties in that it has prepositions (1b), and genitives
and relative clauses that follow the head noun (1c, 1d):
Modern Standard Persian
(1)
Persian adpositions are further discussed in Chapter 3.
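The mismatch just described can be laid out as a small check. The following is only an illustrative sketch: the table of expected values restates the three correlation pairs discussed above, while the feature labels, the dictionary layout, and the function name `mismatches` are assumptions introduced here for exposition and are not drawn from the typological literature.

```python
# Expected values for the three correlation pairs discussed in the text.
# The labels are illustrative assumptions, not standard terminology.
EXPECTED = {
    "OV": {
        "adposition": "postposition",
        "genitive": "before noun",
        "relative clause": "before noun",
    },
    "VO": {
        "adposition": "preposition",
        "genitive": "after noun",
        "relative clause": "after noun",
    },
}

# Modern Standard Persian as characterized in the text: OV, but with
# prepositions and with genitives and relative clauses after the noun.
PERSIAN = {
    "basic order": "OV",
    "adposition": "preposition",
    "genitive": "after noun",
    "relative clause": "after noun",
}


def mismatches(language):
    """Return the features whose value differs from what the language's
    basic order (OV or VO) would lead one to expect."""
    expected = EXPECTED[language["basic order"]]
    return sorted(f for f, v in expected.items() if language[f] != v)


print(mismatches(PERSIAN))
```

On this toy encoding, Persian departs from the OV expectation on all three pairs, which is the sense in which its patterning is exceptional.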
To put Modern Standard Persian in a wider context, it is one of only fourteen OV
languages listed in The World Atlas of Language Structures (WALS) that have prepositions,
in contrast to the 472 OV languages that have postpositions (Dryer 2013b). Thus it is
highly atypical and this opens up a number of lines of enquiry: How long has Persian been
atypical? Are its closest neighbouring languages also atypical in the same way? What
other features of the language are implicated? The answers to these sorts of questions
can be found in the typological work pursued by Stilo (2005, 2006) and Dabir-
Moghaddam (2001, 2006, 2012), among others.
Stilo (2005) considers geographic proximity as well as genetic affiliation to explain the
‘mixed’ properties of Iranian languages. More specifically, he proposes that languages of
mixed typology are often found ‘sandwiched’ (geographically speaking) between
languages of opposite syntactic types and thus demonstrate a kind of hybridization. In the
case of Iranian languages, they are found in a buffer zone between Arabic and
Mediterranean languages which are typically VO, and Turkic, North Caucasian, and Indic
languages, which are typically OV, resulting in a mixed typology (Stilo 2005: 38). In Stilo
(2006), he narrows his focus to adpositions, which vary across languages in the buffer
(p. 208) zone. He shows that some languages in the Iranian area have prepositions, some
have postpositions or circumpositions, and some have more than one type (for more
information on circumpositions in Persian, see Chapter 2). Again he demonstrates that
areal factors are relevant given that languages to the north such as those of the Turkic
family are consistently postpositional while languages to the south, including those of the
Semitic language family are consistently prepositional. For further discussion on areal
typology, see Chapter 3.
Geographic proximity and contact is one explanation for how typologically inconsistent
languages arise; however, historical processes are also always at play. Dabir-Moghaddam
(2001) looks back to Old and Middle Persian to consider the changes that have occurred
over the passage of time. He notes that Persian has changed from being an inflectional
language to an analytic one, and has gone from exhibiting relatively free word order to
showing more configurational properties. He proposes that along with these changes,
Modern Persian is in the process of changing from an OV to a VO language.
In the same work, Dabir-Moghaddam briefly considers the expected correlations
corresponding to OV and VO word order in three other Iranian languages: Gilaki and
Mazandarani spoken in the North of Iran and Kurdish spoken in the Western province of
Kurdistan.1 This comparison set is expanded in a later work (Dabir-Moghaddam 2006) to
an additional eight languages: Howrāmi, Vafsi, Laki, Lori, Delijan, Delvāri, Lāri, and
Nāini. He takes twenty-four correlation pairs and determines how each language
patterns with respect to each pair. From this he identifies the parameters along which all
the languages pattern together, which he calls ‘pan-Iranian parameters’ and the
parameters along which they vary, which he calls ‘parameters of variation’. To take an
example, we find that within the nominal domain all twelve languages have the same
order of intensifier and adjective, but the order of the noun and the adjective can vary.
Within the verbal domain, the order of the verb and the auxiliary verb corresponding to
be able is fixed across all twelve languages but the order of a content verb and other
auxiliaries can vary. (See also Mahmudweyssi and Haig 2009, who find a similar pattern
in four West Iranian languages, where the modal can, like be able, shows a more fixed
order with respect to the verb than other modals.) For more information about Kurdish,
see Chapter 3.
Linguistic typology not only affords us a way of determining the parameters along which
languages can be compared and categorized, it has also led to the identification of new
substantive concepts that can be used to describe languages. Concepts such as ergativity
(further discussed in Chapter 2) or internally headed relative clauses would not be part of
our descriptive apparatus if we were restricted to English and other European languages.
One such concept, evidentiality, is the subject of a recent volume edited by Johanson and
Utas (2000). The papers in this volume address the question of whether Iranian
languages (along with Turkic and other contact languages) have such a category in their
verbal systems.
The category of evidentiality is coded by grammatical elements that serve, loosely
speaking, to indicate the evidence a speaker has for an assertion. These markers can
code direct evidentiality where the speaker has some sensory evidence (e.g. visual,
auditory) for a statement made. Languages of this type cluster primarily among the
indigenous languages (p. 209) of North and South America (de Haan 2013). There are
also languages in which markers can code indirect evidentiality where the speaker learns
of an event after the fact. In this case the category is linked to resultativity and/or
perfectivity as the situation described has to have reached an endpoint. There has been
growing interest in whether Modern Persian has markers of ‘indirectivity’ and those who
claim that it does link indirect evidentiality to the rich tense-aspect system in the
language.
Consider examples (2) and (3), which show eight ways of indicating ‘past’ tense in
Modern Persian (adapted from Jahani 2000: 189–91):
(2)
(3)
As these examples show, Modern Persian has two sets of forms with past time reference.
Those in (3a–d) contain a participial form of the main or auxiliary verb and serve as the
Perfect counterparts to the forms in (2a–d), respectively. Several of the papers in the
volume by Johanson and Utas (2000) suggest that the forms in (3) are connected in some
way or another to indirectivity; that is, they are more likely to be used when the speaker
is reporting on inferred information rather than directly experienced information. As
Comrie (2000) notes in his introduction to the volume, Lazard (2000) makes the most
definitive claim in this regard. Perry (2000) discusses what he calls the ‘epistemic’
function of the perfect tenses, not only in Persian but in Dari and Tajik as well. Jahani
(2000a) reports on an empirical study, which indicates that there may be some variation
depending on register, and Utas (2000) provides a historical view suggesting that
indirectivity is an innovation in Modern Persian.
Another example of an in-depth look at a single phenomenon for the purposes of
typological comparison is found in Haspelmath’s (2004) volume on coordinating
constructions, which includes a comprehensive paper by Stilo on coordination in three
Western Iranian languages: Vafsi, Persian, and Gilaki. This is a rich work containing
information not only on types of coordination, but details regarding the historical origins
of the relevant conjunctions, the stress and intonation patterns of the coordinating
constructions, and careful attention to issues of style and register in the three languages
considered. For example, from the list of the fifteen or so coordinators that Stilo covers,
we learn that, despite being written identically in Persian, the unstressed enclitic =ò and
the word va are of different origins: the enclitic (p. 210) =ò is derived from Old and then
Middle Persian while va is a loanword from Arabic (Stilo 2004: 273). In terms of their
phonological properties, Stilo (p. 280) notes that =ò is encliticized to the word that
precedes it and as such is part of the intonational contour of that word. Va is not
encliticized and can therefore appear clause-initially. Moreover, =ò is never stressed
while va may be.2
While =ò is clearly phonologically enclitic, it acts syntactically as if it is attached to the
following word. Compare (4a) in which ye barādar ‘a brother’ and ye xāhar ‘a sister’ are
coordinated, with (4b) in which ye xāhar ‘a sister’ is extraposed. We see that the enclitic
is also extraposed and attaches to the preceding constituent, which is not part of the
conjunctive construction:
(4)
This phenomenon of extraposition, or shifting, is common among coordinating
constructions. For example, with the conjunctive bisyndetic3 coordinators ham … (=ò)
ham … it is more common to extrapose the second coordinand, as shown in (5b) rather
than to leave it beside the first coordinand as in (5a). The example in (5b) also involves
ellipsis of the main verb:
(5)
Similarly, with the disjunctive bisyndetic coordinators, ya … ya … ‘either … or … ’, there
is a preference for NP shifting and verb ellipsis as in (6b):
(6)
(p. 211) Thus in Stilo’s documenting of one deceptively simple type of construction,
namely coordination, we see two of the characteristics of Persian that have made it so
interesting to study from a mainstream generative (Minimalist) perspective: the shifting
of constituents (aka scrambling; see, for example, Karimi 2005) and the omission of
constituents (aka ellipsis; see, for example, Toosarvandani 2009).
8.4 Construction Grammar

Construction Grammar differs from other formal theories of syntax in that it treats
syntactic constructions as having the same status as lexical items, namely as elements of
the grammar. Thus, Construction Grammar blurs the distinction between the grammar
and the lexicon (for further elaboration, see Goldberg and Jackendoff 2004, for example).
Fried (2015) identifies at least three strands of Construction Grammar: the one originated
by Fillmore and his colleagues and students in the 1980s (e.g. Fillmore 1988), the one
that focuses on argument structure and language acquisition (see Goldberg 1995, 2006),
and the one concerned with issues of typology (see Croft’s (2001) Radical Construction
Grammar). In this section, I will discuss the way in which the notion of ‘construction’ has
been used to describe aspects of the syntax of Persian, without delving too much into the
issues that distinguish these three strands.
One phenomenon that is highly amenable to being viewed from a Construction Grammar
perspective is the issue of ‘alignment’ in Iranian languages. Alignment systems reflect the
way in which the two arguments of a transitive predicate pattern with the single
argument of an intransitive predicate. Following Comrie (1978) I will use the
abbreviations S for the argument of an intransitive predicate, A for the agentive
argument of a transitive predicate and P for the patient argument, though other scholars
adopt different conventions for the same concepts; see Dixon (1994), or Haig (2008) for a
discussion of alignment in Iranian languages.
Alignment systems, i.e. the patterning of core arguments, are expressed in three ways.
Their syntactic realization is through word order, but alignment is also expressed
morphosyntactically via Case (often termed ‘flagging’) and/or via agreement and other
types of person-agreement markers (termed ‘indexing’). These three dimensions may
themselves diverge so that a language may lose ergative Case marking but still exhibit
reflexes of ergativity in its agreement marking. It is for this reason that Haig (2008)
advocates for a Construction Grammar approach over others. In seeking to explain how
languages, specifically Iranian languages, change over time, he notes that alignment
changes can be gradual, taking centuries to take full effect. He argues for an
accumulation of small changes, over generations, that can amount to an overall drift,
rather than one large-scale change that takes place during language transmission from
one generation to another. (The latter view is the one he attributes to Mainstream
Generative Grammar.)
The history of Modern Persian involves a rise and then fall of an ergative alignment
pattern. The ergative alignment pattern arose during the Old Persian period and
continued well into Middle Persian and Parthian but was restricted to transitive clauses
formed with the past stem form of the main verb. This system remains evident in some
(p. 212) Iranian languages of the present day. For example, in Zazaki, a Western Iranian
language, the subject of an intransitive predicate (S) and the object of a transitive
predicate (P) both appear in the Direct case and determine agreement on the verb, while
the subject of a transitive predicate (A) in the past tense appears in the Oblique case and
does not trigger agreement:
Zazaki
(7)
Of special interest to typologists is the fact that some Iranian languages exhibit an
extremely rare form of alignment whereby the two arguments of a transitive predicate (A,
P) share the same form in contrast to the single argument (S) of an intransitive predicate
(see Comrie 2013; Dabir-Moghaddam 2012). Comrie (2016) gives the following examples
from Payne (1980) to illustrate:
Roshani
(8)
Comrie (2013) notes that such examples challenge the functionalist perspective that Case
marking serves to distinguish the core arguments that occur within a single clause.
Moreover, he points out that the Iranian languages exhibiting this highly unusual system
of ‘flagging’ or Case marking are neither genealogically related nor found within the
same areal continuum. He concludes from this that there is a predisposition for this type
of alignment in Iranian as a whole but that it cannot have arisen from a single historical
event. His conclusion is in line with the way syntactic change is viewed within
Construction Grammar whereby small-scale construction-specific changes can occur in
parallel across languages.
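The alignment types mentioned so far can be summarized by asking which of S, A, and P share a form. The sketch below is offered purely as an expository aid: the function name `classify_alignment` and the labels 'neutral' and 'tripartite' are assumptions introduced here, and 'double-oblique' is used only as a convenient tag for the rare pattern in which A and P pattern together against S. The first call encodes the Zazaki past-tense pattern as described above (S and P Direct, A Oblique); the 'marked'/'unmarked' values in the second call are placeholders, not actual forms from any language.

```python
def classify_alignment(s, a, p):
    """Classify an alignment pattern by which core arguments share a form.

    s: marking of the single argument of an intransitive predicate (S)
    a: marking of the agentive argument of a transitive predicate (A)
    p: marking of the patient argument of a transitive predicate (P)
    """
    if s == a == p:
        return "neutral"         # no distinctions at all
    if s == a:
        return "accusative"      # S and A pattern together against P
    if s == p:
        return "ergative"        # S and P pattern together against A
    if a == p:
        return "double-oblique"  # A and P pattern together against S (rare)
    return "tripartite"          # all three distinct


# Zazaki past-tense clauses as described in the text.
print(classify_alignment(s="direct", a="oblique", p="direct"))

# Schematic stand-in for the rare pattern: A and P alike, S different.
print(classify_alignment(s="unmarked", a="marked", p="marked"))
```

Because the function takes the markings of a single construction as input, it can be applied separately to different clause types of the same language, in keeping with the construction-specific view of alignment.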
Returning to the history of ergativity in the Iranian languages and its residual effects in
their contemporary counterparts, Haig (2008) presents diachronic evidence supporting
the hypothesis that ergative alignment arose out of a reanalysis of the External Possessor
Construction rather than out of a Passive construction. This analysis shifts focus onto the
way in which non-core arguments of predicates, or what Haig terms ‘indirect
participants’, have been encoded and interpreted in Iranian languages. Moreover, his
work highlights the development of several aspects of Persian syntax that intrigue
present-day linguists. For example, he connects the loss of a rich system of nominal case,
one that is reduced to a single (p. 213) opposition between an unmarked Direct case and
an Oblique case by Middle Iranian, and then disappears altogether in some languages
such as Persian, to the subsequent rise of innovated case markers in those languages,
including the accusative/object marker -rā from the Old Iranian postposition rādiy (Haig 2008:
95–6; see also Bossong 1985 cited therein). For those who work on -rā in Modern Persian
the diachronic view contributes a relevant piece of information in that innovated object
markers inevitably display differential object marking, meaning that these markers will
target objects that rank highly on an animacy hierarchy first (Haig 2008: 159.134). See
Chapter 9 for more discussion on -rā and differential object marking.
Another area that provides rich territory for theoreticians of various persuasions is the
occurrence and distribution of clitics and agreement. Haig (2008) suggests that the
simplification of the case system from Old to Middle Iranian might have been
‘compensated for by the massive increase in the use of clitics’ (p. 105, see also pp. 334–8
for a brief overview of the way in which the syntax of clitics has changed throughout the
history of Western Iranian, along with references cited therein).
The distribution of pronominal enclitics differs widely from language to language within
the Iranian group, yet there are persistent ‘family resemblances’. Pronominal enclitics are
further discussed in Chapters 3, 6, 9, and 10. Clitics and agreement are both ways of
indexing arguments, and indexing is one of the three ways (along with Case and word
order) of determining the patterning of core arguments. Unsurprisingly then, Old Iranian
not only had ergative case marking but also ergative agreement. Haig (2008) states that
this agreement system was lost by Middle Iranian but has been retained somewhat in
certain Kurdish dialects. In Central Kurdish, the system of cross-referencing the A
through a clitic pronoun is still robustly attested, while the verbal agreement with the P
only surfaces in the absence of an overt P noun phrase in the clause. The example in (9)
from the Suleimani dialect illustrates this (see Öpengin 2013; Öpengin 2016; and Haig
2017 for details):
Suleimani
(9)
In contrast, in Modern Persian agreement on the verb is with A and it is P that is cross-
referenced via a clitic pronoun on the constituent preceding the verb. Thus in the
examples, (10b) is closer to (9) than (10a) in terms of the order of elements, but the
arguments indexed by the clitics and agreement are the opposite of those in (9).
Modern Persian
(10)
(p. 214)
These sorts of data are mystifying if they are viewed synchronically and in isolation from
the rest of the language family. Moreover, the examples themselves are not necessarily
representative of their respective languages as a whole. In many Iranian languages there
are at least two different patterns of agreement as the history of ergativity in Iranian is
also the history of various kinds of splits.
Case and agreement patterns have historically been split in Iranian, such that one kind of
alignment is found when the verb is in its past form and another is found when the verb is
in its present form. Haig (2008: 9–10) takes care to point out that it is the form of the
main verb that matters, not past or present time reference. This is evident from
languages in which there can be past-time reference with constructions formed on the
present stem, such as the Imperfect in the Awroman dialect of Gorani, a west Iranian
language. In this case, despite the past-time reference, the alignment is the one
associated with present stem forms, i.e. accusative.
Given the existence of split systems, it is possible for a single language to have
constructions similar to both the one exemplified in (9) and the one in (10). Dabir-
Moghaddam (2012) in his survey of alignment systems in several different Iranian
languages, as manifested through clitics and agreement, presents data from Talyshi, a
North-Western Iranian language, showing precisely this. In (11) below we see that the
verbal agreement on a present stem verb is with the single argument of an intransitive
verb and the agentive argument of a transitive verb:
Talyshi
(11)
When the verb appears in its past stem form, however, there is no agreement on the verb
and the agentive argument of a transitive is cross-referenced with a clitic whose host can
be the direct object or the verb:
(12)
We can note that the examples in (11) resemble Modern Persian, while the examples in
(12a) and (12c) bear a closer resemblance to Suleimani in that the A-marking clitic
appears on the (p. 215) preverbal constituent. However, the possibility of the A-marking
clitic appearing on the verb as in (12b), along with the fact that it can co-occur with an
overt pronoun, brings the past stem constructions back in line with constructions like
(13) from Modern Persian:
(13)
The facts in (11)–(13) suggest that rather than looking for language-specific patterns, it
may be preferable to look at construction-specific patterns. This point can be reinforced
by considering a non-canonical construction in Modern Persian that is not based on the
tense form of the verb but on the predicate type. The construction in question goes by an
unusually large number of different names: the Experiencer Subject Construction, the
Impersonal Construction (Ghomeshi 1996), the Psychological Predicate Construction
(Sedighi 2005), the Inalienable Possessor Construction (Karimi 2005), Impersonal
Complex Predicates (Karimi 2013), and Pronominal Complex Predicates (Kazaminejad
2014). In this construction, the ‘subject’ is expressed as a pronominal enclitic on the
constituent preceding the main verb, and the verb itself takes third-person agreement
(see also Chapters 3, 7, and 15):
(14)
The superficial resemblance between this construction and the one in (12c) revolves
around the lack of agreement on the verb and the fact that the pronominal enclitic must
appear on the preverbal constituent. However, the Experiencer Subject Construction is
not limited to the past stem (as (14b) shows) but rather to predicates of psychological or
physical states (being warm or cold, liking something, etc.). And the clitics in these
constructions mark ‘non-canonical subjects’ (in Haig’s terms) in that they index
experiencers rather than agents.
The type of analysis this construction has received within the formal/Minimalist literature
is found in Sedighi (2011), to name but one example. Sedighi accounts for the properties
of this construction by proposing that the experiencer is merged via an Applicative
Phrase (ApplP), i.e. a projection that is used for merging applied (rather than core)
arguments. While such an analysis links the Experiencer Subject Construction in Modern
Persian to similar phenomena in other languages (e.g. oblique marked subjects that are
non-agentive, see Cuervo 2003; Rivero 2004), it has less to say about the resemblance
between this construction and the patterns (or remnants) of ergativity found in related
languages.
Notable exceptions to the charge that formal work looks neither far enough back in time,
nor broadly enough across language varieties, are emerging, however. Karimi’s (2013)
analysis of the experiencer argument as undergoing Possessor Raising, for example, links
the clitic-licensing of the experiencer to the licensing of the external argument (i.e. the
subject) in past transitive clauses in Kurdish (see Karimi 2013: 121, fn. 6 and
references therein; see also Kazaminejad 2014, who adopts a blend of formal, functional,
and corpus approaches to propose that the Experiencer Subject Construction is an
instance of Haig’s (2008) External Possessor Construction). Thus the true issue may be
that many of the relevant insights from diachronic and typological syntax are simply too
recent to have been incorporated into formal work yet.
So far we have seen that Construction Grammar has been used to explain the types of
changes that have led to the variation in case marking and indexing across Iranian
languages. Haig (2008) argues that the separation of case, agreement and cliticization
into distinct components of the grammar, such that each can change independently of the
others (see note 4), leads to far greater variation across languages than if a single parameter is held
responsible for a change from ergative to accusative alignment, say. An additional point
being made here is that this framework is also a useful way to compare and contrast
types of constructions in the languages as they are spoken today. Mapping the network of
possible patterns enriches the information associated with each particular one.
We now turn to another phenomenon that has been fruitfully discussed within a
Construction Grammar approach: Persian complex predicates. One of the earliest works
in this area is by Goldberg (1996, 2003), who argues that complex predicates can be
lexically listed but syntactically non-atomic, hence their status as a ‘construction’. Her
work in Construction Grammar in general and on Persian complex predicates in
particular has resulted in a number of other publications within Head Driven Phrase
Structure Grammar (HPSG) and Construction Grammar (see, for example, Müller 2010).
Complex predicates are further discussed in Chapters 2, 3, 7, 9, 10, 15, 17, and 19. The
focus of the next section, however, is on the intersection of Construction Grammar with
Cognitive and Functional Linguistics.
8.5 Cognitive and Functionalist approaches to syntax
It should be noted at the outset that even though I have presented Construction Grammar
in a separate section of this chapter from Cognitive and Functional approaches, these
frameworks are not incompatible with each other and linguists often see themselves
working within all three (see Tomasello 2014: vii–xiii, on this point).
Cognitive Linguistics is situated firmly at the explanatory rather than descriptive end of
the spectrum set out at the start of this chapter. It differs, however, from other
explanatory theories in that it doesn’t view language as a separate or autonomous
cognitive faculty (see, for example, Croft and Cruse 2004; Langacker 2008). The tools
used to explain linguistic phenomena within Cognitive Grammar include the concepts of
metaphor, metonymy, and polysemy, which hold not only of words but of morphemes,
grammatical constructions, and of cognition in general. Key also to Cognitive Grammar is
the idea of linguistic categorization that draws on the notion of prototypes and fuzzy
boundaries for category membership (Taylor 2003).
Let us return to complex predicates, or light verb constructions, which have gradually
been replacing simple verbs since the beginning of Modern Persian (Natel-Khanlari 1986,
as cited in Family 2014: 15). Under a Minimalist approach these constructions raise
questions regarding whether they are formed ‘in the lexicon’ or ‘in the syntax’, and
among their perplexing properties is the fact that they exhibit a degree of syntactic
transparency, albeit limited, regardless of whether they are semantically transparent or
opaque. As noted at the beginning of section 8.4, Construction Grammar does not draw a
sharp distinction between lexicon and grammar. Rather than attributing the building of
words and phrases to different modules (e.g. lexicon vs. syntax) both are seen as
generated by rules, but those rules can range from very general to completely
idiosyncratic (Family 2014: 21). This cline is based in large part on semantic
transparency. Thus, drawing on work in Cognitive Linguistics, the types of questions that
might be asked about non-compositional light verb constructions under this approach
revolve around productivity and predictability: How is it that some light verbs appear to
be more productive than others? How do Persian speakers know which light verb to use
when producing novel verbal notions? More generally, a cognitive approach also concerns
itself with how the semantic space occupied by light verb constructions is organized.
These and other issues are addressed clearly in Family (2014, see also Family 2011,
2006). She takes what she calls a bottom-up approach to studying light verb
constructions in that she considers the hundreds of collocations possible with each light
verb. Her lists are extensive and in themselves provide a valuable resource for those
interested in the topic. In considering the range of meanings that can be associated with
constructions based on a single light verb, Family shows that the issue of compositionality
is not black and white. Constructions can be more or less compositional with the meaning
coming as much from the construction itself as from its individual components. Consider
the following data in which zadan ‘to hit’ in (a) and keshidan ‘to pull’ in (b) are used to
create a variety of meanings:
(15)
The data in (15) show that there is considerable variation in the way in which a light verb
construction can be non-compositional. The meaning of the verb zadan ‘to hit’ is more or
less evident in the various combinations given in (15a) and the resulting verbal notions
can be transitive, as in rang zadan ‘to paint’, or intransitive as in dād zadan ‘to yell’.
Moreover, Family notes that these constructions cluster into groups based not
only on the properties of their constituent parts but also on real-world knowledge. Thus
zadan ‘to hit’ can combine with any substance that can be transferred into an object via a
nozzle (see 16a), but when it combines with roghan ‘oil’, the construction has a different
meaning, as oil is not added to a car through a nozzle:
(16)
Family (2014) refers to groups of collocations that express very similar meanings as
clusters of productivity (p. 22, see also pp. 46–55). Clusters of productivity are groups
that are based on a single light verb, the selectional restrictions on the preverbal element
it occurs with, and the resulting constructional meaning. Gradience, a key notion of
Cognitive Linguistics that can be applied to the semantic compositionality of clusters, is
also evident when assessing their productivity. Consider the following two sets of
clusters, again based on zadan ‘to hit’:
(17)
In terms of token frequency, i.e. the number of times each of these constructions might be
used in a corpus, harf zadan ‘speak’ is probably at the top of the list. However, the cluster
it is a part of, one that encodes emission of speech (the set in (17a)), is of low productivity
since new ways of referencing speech acts are not that common (Family 2011: 24). In
contrast, the cluster represented in (17b) is of high productivity as any new noun, X,
referring to a musical instrument can combine with zadan to form the corresponding
verb ‘play X’ (see also Family 2014: 214).
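The difference between token frequency and cluster productivity can be made concrete with a small sketch. The observations below are invented purely for illustration and come from no corpus; the cluster labels, the function names, and the use of distinct type counts as a rough stand-in for productivity are all assumptions rather than Family's own procedure.

```python
from collections import Counter, defaultdict

# Hypothetical token observations of "X zadan" collocations, given as
# (preverbal element, cluster label). The counts are invented for
# illustration and are not drawn from any corpus.
OBSERVATIONS = (
    [("harf", "speech")] * 50        # harf zadan 'speak'
    + [("dād", "speech")] * 5        # dād zadan 'yell'
    + [("gitār", "instrument")] * 4  # 'play the guitar'
    + [("piyāno", "instrument")] * 3
    + [("tār", "instrument")] * 2
    + [("santur", "instrument")] * 1
)


def token_frequency(observations):
    """Count how often each preverbal element occurs with the light verb."""
    return Counter(noun for noun, _ in observations)


def cluster_profile(observations):
    """Return token and type counts per cluster.

    The number of distinct types is used as a crude proxy for
    productivity; this is an assumption, not Family's own measure.
    """
    tokens = Counter()
    types = defaultdict(set)
    for noun, cluster in observations:
        tokens[cluster] += 1
        types[cluster].add(noun)
    return {c: {"tokens": tokens[c], "types": len(types[c])} for c in tokens}


if __name__ == "__main__":
    print(token_frequency(OBSERVATIONS).most_common(1))
    print(cluster_profile(OBSERVATIONS))
```

In this toy data set the single most frequent collocation belongs to the speech cluster, yet that cluster contains fewer distinct members than the instrument cluster. A type count of this kind only approximates productivity, which in the sense used above concerns how readily a cluster accepts new members.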
As an additional point regarding the way principles of Cognitive Linguistics can shed light
on light verb constructions, let us consider the meaning of zadan itself in (17a) vs. (17b).
We could propose that the verb means ‘play’ in (17b) and could arrive at some
comparable but different meaning of zadan for the examples in (17a) along with the
examples in (15a). This leads to a list of lexical entries bearing a somewhat random and
arbitrary connection to one another. Cognitive Linguistics starts with the notion that
polysemy is the norm rather than the exception and focuses on the way in which
the meanings of lexical items can be extended in a motivated way to participate in novel
patterns and structures. Family (2014: 71) gives a map of the semantic space of zadan
such that its ‘emitting’ sense can be subdivided into the visual (barq zadan ‘shine’, lit.
shine HIT) and the aural (jez zadan ‘sizzle’, lit. sizzle HIT) and its ‘piercing and
transferring’ sense can be subdivided into a fuelling sense (benzin zadan ‘fill with gas’, lit.
gasoline/petrol HIT) and an injecting sense (āmpul zadan ‘get a shot’, lit. shot HIT). Each
of these divisions can be motivated on semantic grounds. The principle of polysemy
alongside constructional meaning permits us to treat zadan as the same linguistic unit in
all cases—one that participates in a variety of constructions, or clusters, contributing a
more or less semantically articulated meaning.
We have seen how Cognitive Linguistics can handle gradient phenomena, but another
area that presents a challenge to formal syntactic approaches is optionality, that is, where
there are two seemingly equivalent grammatical expressions available to express a single
meaning. Here, approaches that consider utterances in terms of their communicative
function and their connection with other cognitive faculties have the potential to shed
light on why speakers make the choices they do in particular contexts.
Consider the following examples taken from Sharifian and Lotfi (2003) and (2007),
respectively:
(18)
(19)
In (18) we see that the mass noun shekar ‘sugar’ can sometimes appear with plural
marking but without the kind of coerced plural reading that sugars in English would get
(e.g. types of sugar, or defined quantities of sugar). In (19a) we see that the plural
inanimate subject ketābā ‘books’ may or may not trigger plural agreement on the verb,
while in (19b) we see that singular agreement is not possible if the subject is animate
(human) and plural. Sharifian and Lotfi (2003, 2007) do not consider these data to
exemplify true ‘optionality,’ but rather how speakers conceptualize events. Drawing on
Langacker’s (1990) discussion of ‘schematicity’, or degrees of resolution, Sharifian and
Lotfi argue that speakers may choose (18a) if conceptualizing sugar at a high level of
resolution, i.e. in terms of its individual granules, perhaps because the situation involves
a scattering of sugar. Similarly, a plural subject like ketābā ‘books’ may be
conceptualized at a lower degree of construal resolution, i.e. as a whole rather than as
individual parts, perhaps because books are typically sent in a parcel or a box.
This type of ‘conceptual-functional’ approach (as Sharifian and Lotfi call it) can be
contrasted with formal approaches to the same phenomena. For example, Ghaniabadi
(2012) considers plural marking and definiteness to be features that bundle together in
Persian such that the appearance of plural on mass nouns may be marking definite
quantities. Sedighi (2005), also taking a featural approach, proposes that a number
feature may fail to be spelled out in the context of the feature marking inanimates. In
both of these cases, the research looks at the way in which features interact in Persian
using Minimalism and Distributed Morphology to formalize the principles at play.
However, the use of features themselves is not ultimately incompatible with a conceptual-
functional approach in that features may represent grammaticalized elements that in turn
represent conceptual notions.
Returning to Sharifian and Lotfi (2003, 2007), they support their claim that the choice of
plural marking on mass nouns and singular agreement with plural inanimate subjects is
driven by the communicative intention and conceptualization of Persian speakers, by
setting up a number of ‘tasks’ for groups of native Persian speakers to complete. Such
tasks involve describing a picture or being presented with a scenario and being required
to complete it with one final sentence. Their results suggest that for each construction
there are contexts in which it is preferred and contexts in which it is dispreferred. In
other words, the constructions are not in free variation with each other—the expected
result if we are dealing with true optionality.
In a broader sense, the type of research Sharifian and Lotfi are pursuing opens up the
door for considering the other types of communicative functions (e.g. formality, respect)
that might be at play when speakers opt for one formulation over another—something
that is best determined by looking at linguistic behaviour over groups of speakers, as they
do.
One more example of optionality in Persian involves the pronominal enclitics in their
function of indexing the direct object when they appear on the main verb. Typically they
appear when there is no overt direct object; however, they can co-occur with a direct
object as well and in this case their appearance seems to be related to information
structure. Bahrami and Rezai (2014) explore the factors affecting the indexing of overt
direct objects using Role and Reference Grammar, primarily because it is a theory that
incorporates the way in which a speaker takes into account the addressee’s knowledge at
the time of utterance. This in turn affects the way in which a noun phrase is coded in an
utterance. Bahrami and Rezai use data gathered from a corpus of standard spoken
Persian to look at the topicality of indexed direct objects and their order with respect to
other constituents in the clause. They find that indexed direct objects are both highly
topical and quite mobile within the clause. In the next section we will look at other work
that similarly uses data drawn from corpora to look at properties of direct objects in
Persian.
8.6 Corpus approaches

Corpus Linguistics is not so much an explanatory theory in Dryer’s sense (see
introduction to this chapter) as it is a methodology and a commitment to working with
data beyond what one native speaker-linguist alone can generate. Corpus
linguistics involves using databases and corpora of all kinds along with questionnaires
and experimental results. In some cases, a corpus of spoken language can be a source for
documenting construction types. An early work of this type, from before corpus linguistics
was so named, was carried out by Frommer (1981), in which spoken data was analysed in
order to determine how robustly Persian conforms to a verb-final pattern. The answer,
based on statistical analysis, was that it is far ‘less verb-final’ than many grammars and
textbooks would have us believe. More recently, Stilo’s (2010) article on ditransitive
constructions in Vafsi draws on a corpus of spoken Vafsi generated by his own linguistic
fieldwork in Iran. The corpus provides a resource for identifying the three ways in which
ditransitives can be coded in Vafsi: via a double object construction, an indirect object
construction, and via the indexing of the recipient with oblique person-agreement
marking on the verb. Stilo discusses each construction in detail using only naturally
occurring utterances.
Working with corpora is useful not only for documenting or describing constructions but also
for testing hypotheses and predictions. In recent work, Faghiri and Samvelian (2014; F&S
hereafter) consider the relative order of direct and indirect objects in Persian using the
Bijankhan corpus, a corpus collected from daily news and common texts (F&S 2014: 225
and references therein). Their findings provide a richer picture of the relationship
between the verb and its objects (direct and indirect) than has been previously
understood. For instance, it has been noted that Differential Object Marking (DOM) in
Persian correlates with word order such that whether or not a direct object is marked
with -rā affects its order with respect to an indirect object (see Chapter 9 for more
discussion on DOM). Specifically, it has been claimed that -rā-marked objects precede the
indirect object and those not so marked follow it (see Faghiri and Samvelian 2014: 224
citing Ghomeshi 1997; Karimi 2003; Ganjavi 2007 in this regard). F&S show that the facts
are more nuanced in that it is not only marking by -rā, but the degree of ‘determination’
of the direct object which affects its order with respect to the indirect object. Thus
indefinite objects, which in Persian may involve numerals or the indefinite suffix -i for
example, are more likely to pattern with -rā-marked objects than with bare nominals. This
scalar or continuum-like result is not easily accommodated within theories that
categorically associate one type of nominal with one position (see also Ganjavi 2011, who
shows that there is at least a three-way split in the syntactic patterning of definite,
indefinite, and bare objects).
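The kind of tabulation that underlies a gradient result of this sort can be sketched as follows. The records are invented for illustration: they reflect neither the Bijankhan corpus nor F&S's actual figures, and the category labels and the function name are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical annotated clauses: (type of direct object, relative order).
# "DO-IO" means the direct object precedes the indirect object.
# The counts are arbitrary and are not taken from any corpus.
RECORDS = [
    ("ra_marked", "DO-IO"), ("ra_marked", "DO-IO"),
    ("ra_marked", "DO-IO"), ("ra_marked", "IO-DO"),
    ("indefinite", "DO-IO"), ("indefinite", "DO-IO"),
    ("indefinite", "IO-DO"), ("indefinite", "IO-DO"),
    ("bare", "DO-IO"), ("bare", "IO-DO"),
    ("bare", "IO-DO"), ("bare", "IO-DO"),
]


def do_first_rate(records):
    """Return, per object type, the share of clauses with DO before IO."""
    totals = defaultdict(int)
    do_first = defaultdict(int)
    for object_type, order in records:
        totals[object_type] += 1
        if order == "DO-IO":
            do_first[object_type] += 1
    return {t: do_first[t] / totals[t] for t in totals}


if __name__ == "__main__":
    for object_type, rate in do_first_rate(RECORDS).items():
        print(f"{object_type}: {rate:.2f}")
```

Reporting a proportion for each object type, rather than a single categorical position, is what allows a continuum-like pattern to show up in the first place.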
F&S (2014) also use their corpus research to show that there are length effects, primarily
with indefinite objects. Contra the view, from sentence processing and production, that
shorter constituents will precede longer ones, they find a tendency for ‘long before short’
order, which has similarly been posited for other head-final languages (see Faghiri,
Samvelian, and Hemforth 2014: 208 and references therein). F&S argue that this
preference shows the significance of conceptual factors in determining word order rather
than categorial ones (i.e. factors related to the form of the object). In subsequent work
they support their claims with an experimental study (see Faghiri, Samvelian, and
Hemforth 2014).
Corpus approaches have become increasingly viable with advances in technology and
easy access to data via the internet. They in turn have revealed far more variation in word
order and sentence structure than had been previously assumed and/or predicted within
purely theoretical models. This trend is bound to continue for the foreseeable future.
8.7 Formal approaches

The formal approaches not yet covered in this chapter are of two types: theories that do
not have many adherents who work on Persian; and Minimalism, which has comparatively
many. The description of Persian syntax given in Chapter 7 provides ample reference to
the Minimalist literature, so I will conclude this chapter with a brief survey of non-
Minimalist approaches. Before doing so, however, it should be noted that Minimalism
itself dates only from the early 1990s with the publication of Chomsky’s (1993) essay on
the Minimalist Program for linguistic theory. The Minimalist Program evolved from
Government and Binding (GB) Theory, in the sense that key concepts and ideas were
reformulated. GB in turn replaced Transformational Grammar (TG), the theory originated
by Chomsky (see Chomsky 1957, 1965 for the beginnings of Transformational Grammar;
Chomsky 1981a&b for a key presentation of Government and Binding Theory; see also note 5). The one
common denominator that has distinguished each iteration of what is now called the
Minimalist Program from almost all other formal theories is that it relies on derivation,
i.e. on the notion that structures can be transformed or elements can be moved in order
to obtain a grammatical construction.
During the TG and GB periods, there were relatively few journal articles published on
Persian, even though scholarly articles are perhaps the dominant way to disseminate
research results in linguistics. This reflects the fact that the number of scholars who
had been trained within the TG and, later, GB paradigms and who were working on
Persian, was small. Nevertheless, in the handful of articles that appeared in the 1970s,
themes of future research were identified. Browne (1970) argued for -rā as a marker of
specificity, rather than definiteness, an issue that became a focus of debate and enquiry
for several decades. Moyne (1974a) argued that there is no passive in Persian and that
constructions formed with shodan ‘to become’ are instead inchoative, and Moyne and
Carden (1974) presented a transformational account of the doubling of subjects with an
emphatic reflexive element—a phenomenon that has yet to receive its corresponding
Minimalist account.
The history of generative formal work on Persian within TG/GB is to be found not in the
leading journals of the time, but instead in the doctoral dissertations published in the
1970s and 1980s (see note 6). Moyne (1970) wrote his dissertation on verbal constructions in Persian
at Harvard University while the University of Illinois hosted a number of scholars who
completed PhDs in Persian syntax. Soheili-Isfahani (1976) wrote a dissertation on noun
phrase complementation, Hajati (1977) on ke-constructions, and Dabir-Moghaddam
(1982) wrote his dissertation on causative constructions, in which he argued, among
other things, that Persian does have a passive construction, contra Moyne (1974a).
One of the key developments between TG and GB theory was the introduction of X-bar
theory (see Chomsky 1970; Jackendoff 1977). Samiian’s (1983) dissertation was one of the
first to implement a strict X-bar theoretic approach for Persian and to explore the
consequences, particularly for the Ezafe construction. Meanwhile the government and
binding principles that gave GB theory its name were put to good effect by Karimi (1989)
and Hashemipour (1989) respectively. Karimi (1989) explored, among other things, the
limits that the principles of government imposed on Case and movement operations while
Hashemipour (1989) considered the licensing of empty pronominal categories.
Dissertations that were underway at the time early versions of the Minimalist Program
were in circulation in the early 1990s continued to use a GB approach. Thus Darzi (1996)
on raising and control constructions, Ghomeshi (1996) on Case, agreement, and the Ezafe
construction, and both Vahedi-Langrudi (1996) and Karimi-Doostan (1997) on complex
predicate constructions drew more on principles of GB than of Minimalism in the
analyses they undertook.
We turn now to theoretical approaches that have developed and co-existed alongside
Minimalism and its precursors. Role and Reference Grammar (RRG; see van Valin 1993,
2005), is a theory that incorporates information structure (Lambrecht 1994) and related
pragmatic notions into the core of the grammar. This approach is particularly useful for
research that seeks to go beyond sentences in isolation in order to consider the
properties of connected discourse. In one such work, Roberts, Barjasteh, and Jahani
(2009) analyse Persian narrative text looking at, among other things, the activation of
referents in a discourse, the coding of participant reference, and the discourse-pragmatic
structuring of sentences. Their analysis reveals the way in which syntactic structure is
informed by discourse structure. In a similar vein, the paper by Bahrami and Rezai
(2014), which was briefly discussed in section 8.5, shows that factors such as
identifiability of the referent and topicality are at play when an overt object is ‘doubled’
or indexed by a pronominal clitic on the verb in Persian. (For other representative RRG
work, see Rezai 2003.)
Head Driven Phrase Structure Grammar (HPSG; see Pollard and Sag 1994) is a formal
theory in which lexical entries are formalized as highly articulated feature matrices.
These features, along with a system of constraints on them, are employed to explain
syntactic phenomena. Taghvaipour (2004, 2005a&b) uses HPSG formalism to account for
gaps vs. resumptive pronouns in restrictive and free relative clauses, respectively.
Because of its focus on the features that comprise inflectional morphology, HPSG is well
suited to handle questions regarding the status of Persian morphemes. This is reflected in
a number of published works. For instance, Samvelian (2007) argues that the Ezafe vowel
is best analysed as a phrasal affix and Samvelian and Tseng (2010) explore the question of
whether object clitics in Persian are truly clitics or inflectional suffixes, both within the
HPSG framework. Bonami and Samvelian (2015) use both HPSG and Paradigm
Function Morphology (Stump 2001) to provide a lexicalist account of periphrastic
verbal constructions in Persian. These works are highly formal in that they employ
detailed formalism to model pieces of inflection and use the resulting analyses to answer
theory-internal questions regarding the division of labour between syntax and
morphology, periphrasis vs. valence-reducing constructions and clitics vs. affixes.
Lexical Functional Grammar (LFG; see Bresnan 1982, 2001; Dalrymple et al. 1995) is
similar to HPSG in that it is a monostratal (i.e. non-transformational) generative theory
with a richly structured lexicon. It observes the strong Lexicalist Hypothesis whereby
there is a one-to-one mapping between words and nodes in a syntactic representation.
That is, there cannot be two words under one node nor can there be empty nodes in the
syntax. Within this framework, the research originating with Butt (1995) on Urdu complex
predicates has inspired some similar investigation on Persian complex predicates
(see, for example, Nemati 2010).
Optimality Theory (OT; see Prince and Smolensky 1993) began as a constraint-based
theory of competition in which there was one winner, but has been adapted to permit the
modelling of gradience in grammar. Adli (2010) uses gradient grammaticality judgements
obtained via an experimental approach and statistical methods to formulate a set of
preference constraints on Persian wh-questions—constructions that permit a wide range
of possible word orders. OT is also insightfully used in Aissen (2003), a work that is not on
Persian specifically but on Differential Object Marking (DOM) and which is therefore of
great relevance to those interested in -rā (see Chapter 9). Aissen is able to capture the
variation in DOM across languages by using ranked constraints that express two
opposing principles: one of iconicity and the other of economy.
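The logic of evaluation under ranked constraints can be illustrated with a minimal sketch. The three constraints below are simplified stand-ins invented for this example and are not Aissen's actual formulations; the candidate representation and all names are likewise assumptions. The sketch shows only how reranking one economy constraint against two marking constraints yields different patterns of object marking.

```python
from typing import Callable, List, Set, Tuple

# A candidate pairs an object type with the choice to mark it or not.
Candidate = Tuple[str, bool]
Constraint = Callable[[Candidate], int]

OBJECT_TYPES = ("definite", "indefinite")


# Simplified, hypothetical constraints (not Aissen's formulations).
def mark_def(c: Candidate) -> int:
    """Iconicity-style constraint: penalize an unmarked definite object."""
    return 1 if c[0] == "definite" and not c[1] else 0


def mark_indef(c: Candidate) -> int:
    """Iconicity-style constraint: penalize an unmarked indefinite object."""
    return 1 if c[0] == "indefinite" and not c[1] else 0


def economy(c: Candidate) -> int:
    """Economy constraint: penalize any overt marker."""
    return 1 if c[1] else 0


def evaluate(candidates: List[Candidate], ranking: List[Constraint]) -> Candidate:
    """Pick the winner under strict domination.

    Comparing violation tuples lexicographically means a violation of a
    higher-ranked constraint outweighs any number of lower-ranked ones.
    """
    return min(candidates, key=lambda c: tuple(con(c) for con in ranking))


def marked_types(ranking: List[Constraint]) -> Set[str]:
    """Return the object types whose winning candidate carries a marker."""
    result = set()
    for object_type in OBJECT_TYPES:
        winner = evaluate([(object_type, False), (object_type, True)], ranking)
        if winner[1]:
            result.add(object_type)
    return result


if __name__ == "__main__":
    print(marked_types([mark_def, economy, mark_indef]))
    print(marked_types([economy, mark_def, mark_indef]))
    print(marked_types([mark_def, mark_indef, economy]))
```

Placing the economy constraint between the two marking constraints yields marking on definite objects only; ranking it highest yields no marking at all, and ranking it lowest yields marking on both types. The sketch makes no claim about where any actual language, Persian included, falls among these rankings.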
The Minimalist literature on Persian syntax is well-surveyed in Chapter 7 of this volume.

Those who characterize the Minimalist approach as formal-mathematical (e.g. Tomasello
2014: xx) see, perhaps, a daunting formalism that is impenetrable to those who are not
practitioners of the theory. Those within the theory working on clausal architecture,
discontinuous dependencies, and/or features and their geometries, are keenly interested
in questions of design and how best to formulate cross-linguistic principles that apply to
all languages, not just a few. In some sense, then, Minimalist work isn’t about Persian but
about a theory of language to which the work on Persian contributes. Syntactically
speaking, there is enough about Persian to make a significant and exciting contribution.
8.8 Conclusion
In this chapter, I have surveyed theoretical approaches to Persian syntax while noting
that the division between descriptive and theoretical work on a given language is not all
that clear-cut. The survey is intended to show the kind of work on Persian syntax that has
been undertaken within a number of different approaches. However, there are two other
observations that emerge from looking at scholarly work across time and from outside the
borders of any particular theory. First, this sort of wide view provides us the opportunity
to understand the rise and fall of certain theories within a wider context. For example, we
can see that the technological advancements of the last two decades have stimulated an
increased interest on the part of researchers in corpus linguistics and other approaches
that depend on quantitative analysis. On the other hand, the rise of Minimalism has led to
a sharp decrease in interest in working within Government and Binding Theory. The
second observation we can make is that what is interesting about a language is in large
part theory-dependent. A cognitive linguist is more likely to be interested in the
semantics of complex predicates and less interested in their scrambling possibilities than
a formalist might be. A corpus linguist is more likely to investigate the statistical
probability that one word order will occur over another than a typologist, who in turn
might be more interested in how a given word order correlates with other properties of
the language. What this reveals is that a healthy diversity in our theoretical approaches
to a language is bound to yield a richer picture of the language itself.
Acknowledgements

I would like to thank Neiloufar Family and Geoffrey Haig for agreeing to read an earlier
draft of this chapter and for their very helpful comments. I would also like to thank the
editors of the volume and an anonymous reviewer for their feedback and direction. All
errors and omissions are my own.
Notes:
(1) He does not specify the dialect or variety of Kurdish that he considers.
(2) Stilo does not explain why he chooses to give the citation form of this enclitic with
secondary stress on the vowel.
(3) The term ‘bisyndetic’ means there are two coordinators, while ‘monosyndetic’ means
there is only one.
(4) The order of change is not free. Haig (2008: 303.347) suggests that case changes
before agreement. This means that in a shift from ergative to accusative alignment, it is
not possible for a language to retain ergative alignment in its agreement marking while
having accusative alignment in its case marking.
(5) While nomenclature is not the main point here, the discussion would not be complete
without noting that Government and Binding Theory is also known as the Principles and
Parameters approach and that Transformational Grammar comprises three distinct
periods: standard theory; extended standard theory; and revised extended standard
theory.
(6) In this and the paragraph that follows, I do not mean the list of dissertations that I
provide to be exhaustive. There are many excellent PhD dissertations written between
1970 and the mid-1990s that are not mentioned here for reasons of space but that are
easily found by searching a good university library.
Jila Ghomeshi
Jila Ghomeshi is Professor of Linguistics at the University of Manitoba. She has
carried out research and published articles on many aspects of Persian syntax and
morphology. In addition to her scholarly research, she has sought to bring linguistics
to a more general audience with short radio columns and a book on prescriptivism
entitled Grammar Matters. These efforts earned her a National Achievement Award
from the Canadian Linguistic Association in 2014.

Specific Features of Persian Syntax: The Ezafe construction, differential
object marking, and complex predicates

Pollet Samvelian

This chapter is devoted to three specific features of Persian syntax, namely, the Ezafe
construction, differential object marking with the enclitic rā, and complex predicates,
which have received a great deal of attention for more than thirty years. Each of these
phenomena involves language-specific challenging facts which need to be accurately
described and accounted for. At the same time, each constitutes a topic of cross-linguistic
investigation for which the Persian data can be of crucial interest. The chapter is divided
into three sections. Each section provides an overview of empirical facts and the way
various theoretical studies have tried to account for them. While it was impossible to do
justice to all influential studies because of the impressive amount of work on each topic,
the article is nevertheless intended to be as exhaustive as possible and to maintain the
balance between different theoretical approaches.
Keywords: Ezafe, differential object marking, complex predicates, head-marking, case-marking, noun phrase,
topic, high transitivity, light verb, idiom
9.1 Introduction
THREE main aspects of Persian syntax have received a great deal of attention for more
than thirty years: the Ezafe construction, differential object marking with the enclitic =rā,
and complex predicates. Why such enduring interest? Each of these phenomena involves
language-specific challenging facts which need to be accurately described and accounted
for. At the same time, each constitutes a topic of cross-linguistic investigation for which
the Persian data can be of crucial interest.
The Ezafe construction, a specific feature of the noun phrase in many Western Iranian
languages, sheds new light on the way dependency relationships, that is,
complementation vs. modification, are realized within the noun phrase and the
morphological correlates of these relationships with respect to head vs. dependent
marking patterns. It also contributes to the debate on the nature of linkers in a variety of
languages.
Differential object marking (DOM) with the enclitic =rā displays a complex interaction
between various semantic and discourse parameters such as referentiality, topicality, and
high transitivity. Modelling the interaction between these parameters in order to account
for the occurrence of =rā has been, and still is, an interesting challenge for formal and
theoretical studies on Persian. Cross-linguistically, rā-marking is of great interest for
typological studies on DOM in the languages of the world, because of the way =rā has
been grammaticalized to realize not only DOM but also topicalization, the range of
grammatical functions that can be rā-marked, and the role of discourse parameters.
Finally, complex predicate formation, which is the main device for enriching the verbal
lexicon in Persian, provides another theoretical and typological domain of investigation in
order to highlight differences and resemblances between syntactic and morphological
processes of lexeme formation and the way different syntactic components contribute to
the makeup of a complex lexical unit. Persian complex predicates constitute an
interesting case study for theories of predicate decomposition which postulate the same
underlying structure for simplex and complex predicates.
This article is devoted to these three phenomena and is divided into three sections. Each
section provides an overview of empirical facts and the way various studies have tried to
account for them. While it was impossible to do justice to all influential studies because of
the impressive amount of work on each topic, the article is nevertheless intended to be as
exhaustive as possible and to maintain the balance between different theoretical
approaches.
9.2 The Ezafe construction
9.2.1 Overview and historical facts
Ezafe, from Arabic iḍāfa ‘addition, adjunction’, designates an enclitic realized as =(y)e,
which occurs within the noun phrase and links the head noun to its modifiers and to the
possessor NP. The surface word order pattern is strongly head-initial within the Persian
NP, as illustrated in (1) and exemplified in (2). A restricted class of determiners,
quantifiers, classifiers, and adjectives precede the head noun, while all modifiers and
arguments follow it. The possessor NP comes last after attributive nouns, and adjectival
and prepositional modifiers. All elements occurring between the head noun and the
possessor NP are linked to the head noun and to one another by the Ezafe. The relative
clause, on the other hand, is not introduced by the Ezafe and is placed after other
modifiers and the possessor NP, outside the Ezafe domain. Argument prepositional
phrases (PPs) are merely juxtaposed to the head noun and also occur outside the Ezafe
domain if the head noun is followed by either modifiers or the possessor NP, as in (3). As
shown by (4), multiple modifiers may occur within the Ezafe domain. In this case, the
Ezafe enclitic is reiterated on each modifier except the last. The possessor NP, on the
other hand, is unique. In other words, if a noun has two arguments—which may be the
case with eventive nouns—only one of them, generally the second one or the Patient, can
be introduced by the Ezafe, as in (5). The first argument (Agent) either is not realized or
has a prepositional realization, (5c).1
(1)
(2)
(3)
(4)
(5)
The Ezafe is not restricted to the NP and may also occur within adjective phrases (APs)
and some PPs to link the head to its unique complement:
(6)
The Ezafe construction is not specific to Persian and is found in a significant number of
Western Iranian languages (Windfuhr 1989), like Kurdish dialects (MacKenzie 1961),
Hawrami (MacKenzie 1966), Zazaki (Paul 1998), and Kermanian dialects (Lecoq 2002).
Although neither the shape nor the properties of the Ezafe are identical from one
language to another, all these languages display head-initial word order in the NP.2 The
correlation between the head-initial word order pattern and the availability of the Ezafe
can be accounted for on historical grounds. The enclitic Ezafe has been generally
assumed to have its origins in a demonstrative morpheme in Old Iranian. In Modern
Persian, it can be traced back to the Old Persian relative haya, hayā, taya (Darmesteter
1883; Kent 1944, 1953; Meillet 1931).
Kent (1944) thoroughly argues in favour of a relative analysis of haya, hayā, taya, which
(i) in most cases introduces a subordinate clause headed by a finite verb, (7a); (ii) takes
its case from the relativized function in the subordinate clause rather than from its
antecedent, accusative in (7a), since the relativized function is the direct object in the
subordinate clause. Of the 388 available instances of haya, hayā, taya in Old Persian (OP), Kent (1944)
classifies 276 of them as relatives. In most of these occurrences, haya, hayā, taya indeed
introduces a subordinate containing a finite verb (219 instances). However, Kent (1944)
also groups with relatives instances such as the one in (7b), where haya, hayā, taya is
followed only by a predicative noun phrase and the copula is lacking. The reason
for this is the fact that haya is in the nominative case, as required by its function in the
reduced relative clause (i.e. the subject of the copula), rather than in the accusative case
of its antecedent NP. These copula-less constructions where haya, hayā, taya introduces a
predicative noun phrase, as in (7b), (7c), (7d), or an adjective phrase, (7e), pave the way
to its uses in Middle Persian and the emergence of the Ezafe construction.3
(7)
Haya, hayā, taya becomes =ī in Middle Persian (haya > hyǝ > yǝ > =ī) and progressively
loses its demonstrative/relative value to end up as a simple linker (cf. Jügel 2015: 290ff.).
The possessor, as well as adjective modifiers, are introduced by the Ezafe particle ī in
Middle Persian:
(8)
As noted by Bubenik (2009), with the consolidation of the sequence Possessee-Possessor
and Modified-Modifier we reach the New Persian state of affairs. This explains why the
Ezafe construction is correlated with the head-initial order within the NP.
These changes in the function of the relative/linker go hand in hand with a
prosodic change. Like haya in OP, ī is an independent word in Middle Persian.4 However,
given its constrained position, ī is generally the only element that intervenes between the
head noun and the adjective or the genitive modifier (Estaji 2009). This regular adjacency
prepares the ground for the change of ī to the enclitic =e in New Persian.
The Ezafe construction raises several issues in syntax and morphology and has thus been
a particular focus of interest in numerous studies on Iranian languages (Ghomeshi 1997a;
Haider and Zwanziger 1984; Haig 2011; Hincha 1961; Holmberg and Odden 2008;
Kahnemuyipour 2014; Karimi and Brame 2012; Karimi 2007; Larson and Yamakido 2008;
Palmer 1971; Samiian 1983, 1994; Samvelian 2006b, 2007, 2008; Schroeder 1999; among
others). The most debated issue is the status of the enclitic Ezafe itself and its functions.
As noted by Haig (2011), the Ezafe is not straightforwardly accountable in terms of
available functional categories and conventional X-bar phrase structure. Closely related
to this first issue is the internal structure of the Persian NP and the nature of dependency
relationships within this syntactic domain.
The Ezafe enclitic has received various and sometimes diametrically opposed analyses. It
has been considered as a:
• case marker (Hashemipour 1989; Karimi and Brame 2012; Larson and Yamakido
2008; Samiian 1994);
• phonological linker, that is, an element inserted in Phonological Form, with no proper
value or meaning (Ghaniabadi 2010; Ghomeshi 1997a; Samiian 1983);
• marker associated with the syntactic movement of the noun and realizing a strong
feature (Kahnemuyipour 2014);
• linker indicating subject-predicate inversion (Den Dikken 2006);
• head-marking affix adjoined to the head noun and its intermediate projections and
marking them as waiting for a dependent (Samvelian 2007, 2008).
9.2.2 Ezafe as a case marker
The fact that the Ezafe occurs as many times as there are dependents within the NP has
favoured its analysis as a case marker. Several studies within the generative framework
have developed different variants of this analysis. Hashemipour (1989) considers the
Ezafe as a structural case marker on nouns, adjectives, and some PPs. For Karimi and
Brame (2012), the Ezafe structurally relates a head to phrases governed by the latter, by
transferring the case of the head noun to its complements.
Samiian (1994) provides one of the most detailed analyses of the Ezafe as a case marker.
She considers the Ezafe as a dummy case assigner, comparable to of in English, which
occurs within phrases with non-case-assigning heads, that is, NPs, APs, and some PPs,
and thus enables the head to case-mark its complements. Note that in this view, the Ezafe
structurally belongs to the modifier it precedes, while it is prosodically attached to the
item it follows. The fact that the Ezafe occurs in both NPs and APs is expected, since
nouns and adjectives share the feature [+N], and thus do not assign case.5 Its
presence in PPs, on the other hand, is not expected, since prepositions are assumed to be
case assigning. In order to overcome this problem, Samiian (1994) considers that those
prepositions that occur with the Ezafe, P2s in her classification, constitute ‘a kind of in-
between category, sharing some properties with ‘true’ prepositions (P1s) and some with
nouns’. This assumption is supported by a set of empirical facts, namely, the semantic
content of P2s, their subcategorization frame, the internal structure of the PP they
project—specifically their ability to allow for a specifier—and their distribution. Adopting
the Neutralization Hypothesis (Riemsdijk and Williams 1981), Samiian (1994) assumes
that P2s are neutralized in their [-N] feature, which leaves them with the only feature
specification [-V]. Therefore, P2s cannot assign case, since only [-N] categories directly
assign structural case. Sharing their only feature with the category N, they behave like
the latter with respect to their case-assigning properties and need the Ezafe to assign
case.
While nouns, adjectives, and P2s cannot assign case, Case Theory (Chomsky
1981a, b) requires all NPs to receive case. The Ezafe thus assumes the role of a dummy
case-assigner, like of in English, in order to compensate for the inability of these
categories to assign case. In order to account for the occurrence of the Ezafe before APs,
Samiian (1994) extends the case-receiving categories to include the AP. To support this
idea, she refers to the case marking of attributive adjectives in Latin and Sanskrit.
Larson and Yamakido (2008) agree with Samiian’s (1994) analysis of the Ezafe as a case
marker, but are not satisfied with the way it deals with modifiers. Case marking (as
opposed to agreement) is typically associated with argument status; however, at least
some of the Ezafe-marked constituents are modifiers. So the question arises of why
modifiers should need case and what their case-assigner is. To answer these questions,
Larson and Yamakido (2008) extend the shell theory of the VP (Larson 1988) to the DP.
The DP is projected from the thematic structure of determiners, which assign scope and
restriction as thematic roles to their arguments. For instance, the NP that combines with
D saturates the quantifier restriction of the D. Under this account, (most) nominal
modifiers originate as arguments of D. Therefore, they are not modifiers or adjuncts, but
‘oblique complements’ which combine with the head prior to other arguments. All
modifiers are base-generated in a post-head position. Those that bear case features, APs
for instance, are required to move to a site where they can check case, that is, the
prenominal position. PPs and relative clauses, on the other hand, remain in situ, since
they do not bear case features. The fact that in Persian, and some other Iranian
languages, APs also remain in situ is explained by the availability of the Ezafe, which acts
as a ‘generalized genitive preposition’, inserted to check Case on [+N] complements of D
inside the DP. Following this account, the Ezafe heads its own X-bar phrase, with the
modifier as complement. However, for apparently purely prosodic reasons, phonologically
it attaches to the preceding item. The analysis for (9) is given in (10). The determiner in
‘this’ checks its one Case feature on its restriction. The Ezafe is inserted and licenses the
remaining modifiers in their base positions.
(9)
(10)
Several studies claim that the analysis of the Ezafe as a case marker faces serious
problems (Ghomeshi 1997a; Samvelian 2007; Haig 2011). In particular, PPs headed by
P1s as well as adverbial phrases do occur within the Ezafe domain. However, all the
studies mentioned in section 9.2.2 acknowledge that the latter do not need to be case-
marked. Moreover, the conceptual problem remains of why APs and PPs would need case.
Note that in many languages, adjectives follow the noun and are neither case-marked nor
introduced by any specific device.
9.2.3 Ezafe as a phonological linker
Building on Samiian’s (1983) data, Ghomeshi (1997a) argues against the view of the Ezafe
as a morpheme heading any sort of syntactic projection. Ezafe is semantically vacuous
and although it iterates throughout the NP, it is not the expression of a concord between
the head noun and its dependents. Ghomeshi (1997a) therefore suggests viewing the
Ezafe not as a morpheme at all, but rather as an element inserted in Phonological Form,
in a certain syntactic configuration. The need for the Ezafe results from the fact that,
nouns being non-projecting in Persian, a ‘phonological linker’ must be present in order to
indicate phrasing within the nominal constituent. This task is carried out by the Ezafe.
The hypothesis of Persian nouns being non-projecting, which is the keystone of
Ghomeshi’s analysis, is based on a set of restrictions on the Ezafe construction
highlighted by Samiian (1983), see (i)–(iii), to which Ghomeshi (1997a) adds the
restriction in (iv):
(i) Attributive noun phrases surface only as bare nouns in the Ezafe domain (11).
(ii) Adjectival modifiers cannot take either nominal (12a), prepositional (12b), or
sentential (12c) complements when occurring within the Ezafe construction.
(iii) Prepositions may appear with a nominal complement within the Ezafe domain
(13a), but sentential complements are excluded (13b).
(iv) NPs including a possessor are obligatorily construed as definite or presupposed,
and possessors are in complementary distribution with the indefinite enclitic
determiner =i (14).
(11)
(12)
(13)
(14)
In order to account for these facts, Ghomeshi (1997a) assumes that:
(i) Persian nouns are inherently non-projecting. They never appear with filled
specifier and complement positions and the NP node cannot dominate any phrasal
material.
(ii) In spite of the fact that they are non-projecting, Persian nouns may still appear as
NPs, provided they are selected by a projecting head, e.g. D°.
(iii) The Ezafe never attaches to a phrase, which implies that the Ezafe domain is the
domain of X°s or bare heads.
On the basis of these assumptions, Ghomeshi (1997a) suggests the following structure for
the Persian NP:
(15)
Since Persian nouns cannot dominate phrasal material, the possessor DP, which is fully
phrasal, is base-generated as sister to D′, in [Spec, DP] position. An empty D-head
bearing the feature [+ def] is stipulated, whose validity is further supported by the
constraint stated in (iv).
The Ezafe insertion rule, operating in PF, inserts the Ezafe vowel on a lexical X° head that
bears the feature [+N], when it is followed by phonetically realized, non-affixal material
within the same extended projection.
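The insertion rule just stated can be made concrete with a minimal sketch. The representation is an assumption made for illustration only and is not Ghomeshi's formalism: one extended projection is modelled as a flat list of items, "followed by" is read as "immediately followed by the next overt item", and the names Item, insert_ezafe, and the abstract marker "=EZ" are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One element of a single extended projection (hypothetical encoding)."""
    label: str
    plus_n: bool = False    # bears the feature [+N]
    is_head: bool = True    # a bare X° head rather than a phrase
    affixal: bool = False   # affixal material, e.g. the indefinite enclitic
    overt: bool = True      # phonetically realized


def insert_ezafe(items):
    """Return the overt labels, adding '=EZ' wherever the PF rule applies.

    Assumed reading of the rule: a [+N] X° head receives the Ezafe when the
    next overt item in the same projection is non-affixal.
    """
    out = []
    for i, item in enumerate(items):
        if not item.overt:
            continue  # silent material is never pronounced or marked
        nxt = next((x for x in items[i + 1:] if x.overt), None)
        licensed = (
            item.is_head
            and item.plus_n
            and not item.affixal
            and nxt is not None
            and not nxt.affixal
        )
        out.append(item.label + ("=EZ" if licensed else ""))
    return out


# Noun, bare adjective, phrasal possessor: the Ezafe appears on every
# [+N] head that is followed by overt non-affixal material.
np_items = [Item("N", plus_n=True), Item("A", plus_n=True),
            Item("Poss", is_head=False)]
print(insert_ezafe(np_items))  # ['N=EZ', 'A=EZ', 'Poss']
```

Under these assumptions, a noun followed only by affixal or silent material receives no Ezafe, which is the behaviour the rule is meant to capture.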
In the light of this analysis, the restrictions pointed out by Samiian (1983) are
straightforwardly accounted for. The only cases that would seem to resist Ghomeshi’s
analysis are PPs, which can occur within the Ezafe domain with a complement (13a).
Ghomeshi (1997a) claims, however, that (13a) is not a counterexample to her analysis
since the noun within the modifying PP is in fact a N° and not an NP (or DP). The
combination of the lexical head P° with N° provides another P° and not a P″. To support
this claim, Ghomeshi (1997a) contrasts (16a) with (16b), imputing the ungrammaticality
of the latter to the fact that the complement of the preposition zir ‘under’ is a DP
containing a possessor and not an N°.
(16)
This uniform analysis of modifiers as X°s in the Ezafe domain has been challenged in
subsequent studies by Samvelian (2007, 2008) (cf. 9.2.4), Ghaniabadi (2010), and
Kahnemuyipour (2014) (cf. 9.2.5).
Ghaniabadi (2010), who adopts a similar analysis with respect to the nature of the Ezafe
and suggests that the Ezafe is inserted by a phonological rule in the Late-Linearization
stage at PF, assumes that the modifiers occurring within the Ezafe domain may either be
bare heads, A°s, or phrasal, APs and PPs, and suggests the following ordering of post-
nominal modifiers within the Persian NP:
(17)
The main argument of Ghaniabadi (2010) for this bipartition comes from an elliptic
construction he refers to as the Empty Noun Construction, where the head noun is elided
leaving behind one or more modifiers. He claims that this type of ellipsis is only possible
with bare adjectives, (18a), and not with AP or PP modifiers, (18b) and (18c) respectively.
In other words, a head noun can be elided (along with other head-adjoined elements) only
if the remnant is a bare adjective (A) and not an AP or a PP.
(18)
Samvelian (2007) and Kahnemuyipour (2014), on the other hand, argue that all post-
nominal modifiers within the Ezafe domain are XPs.
9.2.4 Ezafe as a head-marking affix
Samvelian (2007, 2008) considers the Ezafe as a suffix attaching to the head and to its
intermediate projections in NPs, APs, and some PPs, and marking them as awaiting a
modifier or a complement. Comparable to some extent to the case-marking analyses, in
that the Ezafe is considered a ‘morpheme’ marking a dependency relationship between a
head and its dependents, Samvelian’s analysis nevertheless adopts the exact opposite
standpoint in considering the Ezafe as marking the head and not its dependents and
forming a constituent with the head both prosodically and functionally.
Viewed as such, the Ezafe construction is an illustration of the head-marked pattern of
morphological marking of grammatical relations (Nichols 1986) and reminiscent, all
things being relative, of the Semitic construct state construction. This analysis entails
that the Ezafe, which once grouped with the constituent it introduced, has undergone a
process of reanalysis-grammaticalization, being thus reinterpreted as a part of the
nominal head inflection.
Samvelian (2007, 2008) builds on two sets of evidence:
(i) The restrictions on the Ezafe construction highlighted by Samiian (1983) and
Ghomeshi (1997a) are either not well grounded or are not related to the Ezafe per se
but to its co-occurrence with other enclitics such as the indefinite determiner =i or
pronominal clitics.
(ii) The Ezafe’s morphological behaviour, especially its complementary distribution
with the indefinite determiner =i and pronominal clitics, is typical of (phrasal)
affixes, rather than of a post-lexical clitic (Miller 1992; Zwicky and Pullum 1983).
Example (19a) shows that an AP can be introduced by the Ezafe within an NP. Note that
the AP is headed by the adjective negarān ‘worried’, which shows that the
ungrammaticality of (12a) does not result from the phrasal status of the modifier headed
by the adjective, but from the co-occurrence of the pronominal enclitic =ash and the
indefinite enclitic determiner =i. Removing the latter makes (12a) perfectly grammatical.
The same situation holds for PPs: that is, P1s, as well as P2s and P3s, can occur within
the Ezafe domain even when they head phrasal projections, (19b) and (19c) respectively.
(19)
Samvelian (2007, 2008) then suggests accounting for the restrictions on the Ezafe in
morphological terms. The Ezafe is viewed as a phrasal affix attaching to the head and its
intermediate projections within the NP and indicating that the marked head or the
intermediate projection is awaiting a dependent. Samvelian (2007) argues that the
restrictions highlighted by Samiian (1983) and Ghomeshi (1997a) can be accounted for in
terms of slot competition between the members of the class of phrasal affixes, to which
belong the Ezafe affix itself but also the indefinite determiner =i and pronominal clitics.
The latter may combine with word-level inflectional affixes, that is, the plural suffix -hā
and the definite suffix -(h)e, but are in complementary distribution with the members of
their own class and thus mutually exclude each other. The major argument in favour of
the affixal view of these enclitics is provided by restrictions on their co-occurrence: any
sequence containing two or more of these enclitics is excluded, even when their scope is
not the same constituent.
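The slot-competition idea can be sketched as a simple well-formedness check. This is an illustrative simplification, not Samvelian's formal analysis: the glosses EZ, INDEF, PRON, PL, and DEF and the function is_licit are hypothetical names, and the check encodes only the generalization stated above, namely that phrasal affixes combine with word-level affixes but exclude one another.

```python
# Phrasal affixes compete for a single slot (assumed glosses):
# EZ = Ezafe, INDEF = indefinite determiner =i, PRON = pronominal clitic.
PHRASAL = {"EZ", "INDEF", "PRON"}
# Word-level inflectional affixes: PL = plural -hā, DEF = definite -(h)e.
WORD_LEVEL = {"PL", "DEF"}


def is_licit(affixes):
    """Return True if a host's affix sequence has at most one phrasal affix."""
    unknown = [a for a in affixes if a not in PHRASAL | WORD_LEVEL]
    if unknown:
        raise ValueError(f"unknown affix gloss(es): {unknown}")
    return sum(a in PHRASAL for a in affixes) <= 1


print(is_licit(["PL", "EZ"]))     # True: word-level affix plus one phrasal affix
print(is_licit(["INDEF", "EZ"]))  # False: two phrasal affixes compete
print(is_licit(["PRON", "EZ"]))   # False: excluded even with different scopes
```

The check deliberately ignores scope: on this view the exclusion is stated over adjacent affixes on one host, not over the constituents they modify.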
The incompatibility between the indefinite determiner =i and clitics is illustrated by
(12a). Examples in (20), where a head noun is followed by an adjective, illustrate the
same constraint on the co-occurrence between the Ezafe and =i. Note that the indefinite
determiner =i can either occur at the edge of the NP, that is on the adjective, as in (20b),
in which case the modifier is introduced by the Ezafe, or on the head noun, between the
head noun and the adjective, (20c), and in this case the Ezafe is excluded, (20a). These
facts have led some linguists to consider that the determiner =i in examples such as (20c)
cumulates the function of both the indefinite determiner and the Ezafe. Perry (2005), for
instance, uses the term ‘split Ezafe’ (p. 74) for the enclitic =i in these cases. Lazard
(1966: 257) expresses a similar opinion, noting that in addition to its role as a determiner,
=i acts in such contexts as a linker, being thus comparable to the Ezafe.
(20)
Examples in (21), namely the ungrammaticality of (21b), illustrate furthermore the fact
that any combination of the three enclitics under discussion is excluded even when they
have different scopes. In both (21a) and (21b) a reduced relative clause (RRC),
introduced by the Ezafe,6 is embedded within the NP headed by ghahremān ‘hero’. The
two NPs differ solely with respect to the constituent ordering within the reduced relative
clause. In (21a), the PP az mihan-ash ‘from his homeland’ precedes the participial head of
the modifier, while in (21b) it follows the head. Though both constituent
orderings within the RRC are grammatical,7 the addition of a possessor NP after the
reduced relative is possible only in (21a) but not in (21b).
(21)
Samvelian (2007) claims that this contrast can be attributed to the fact that in (21b) the
Ezafe is attached to the personal enclitic =ash, but not in (21a). Contrary to (20a), the
Ezafe and the personal enclitic have two different scopes in (21b): the personal enclitic is
attached to the NP mihan ‘homeland’, while the scope of the Ezafe is the whole N′
ghahremān=e rānde shode az mihan=ash.
These facts are reminiscent of (haplology) phenomena discussed by Zwicky (1987) and
Miller (1992), which involve the English possessive (genitive) ’s and French weak
functional words. Along the same lines of argumentation, Samvelian (2007) concludes
that the Ezafe, the determiner =i, and personal enclitics are best regarded as phrasal
affixes and outlines a morphological treatment of these items in terms of edge inflection
(Klavans 1985; Lapointe 1990, 1992; Tseng 2003) dealt with by word-level morphology.
The Ezafe is thus considered an inflectional affix that adjoins to any nominal non-maximal
projection and registers the presence of a syntactic dependent, a modifier or a single NP
complement, within phrases headed by a nominal category: that is, nouns, adjectives, and
nominal prepositions.8
(22)
9.2.5 Ezafe as the result of a roll-up movement
Kahnemuyipour (2014) develops a phrasal movement analysis of the Ezafe construction
using what is known in the literature as roll-up movement (Cinque 2005, 2010). Contra
Larson and Yamakido (2008), who take the basic word order of the Persian NP to be head-
initial, Kahnemuyipour (2014) assumes a head-final ordering for Persian NPs. While in
Larson and Yamakido’s account, the presence of the Ezafe is the result of a non-
movement, in Kahnemuyipour’s system, modifiers involved in the Ezafe construction are
uniformly merged in the specifiers of functional projections above the NP, regardless of
whether they are bare or phrasal. Under this view, movement and overt morphology go
hand in hand. When there is no movement, there is also no overt morphology. This implies
that the pre-nominal order within the NP, i.e. the one observed in English, is the basic
one. The movement that derives the postnominal order is accompanied with overt
morphology, hence the existence of the Ezafe, which is seen as a reflex of the roll-up
movement.
The backbone of Kahnemuyipour’s analysis is the near-perfect correlation between the
order of the head noun and other constituents within the NP and the presence of the
Ezafe, with the noun clearly demarcating the distribution of the latter: Ezafe cannot occur
on elements surfacing before the noun and is mandatory for every element following it.
Pre-nominal elements are considered as heads, i.e. X°s, and are not involved in the roll-up
derivation. Therefore, they do not need the Ezafe. Post-nominal elements, on the other
hand, are phrases, whose surface position is the result of the roll-up derivation, leading to
the appearance of the Ezafe marker. A crucial aspect of this analysis is that modifiers,
whether bare or phrasal, are (part of) XPs located in the specifiers of functional
projections above the noun. In accordance with Bare Phrase Structure (Chomsky 1995), a
bare adjective is treated as A/AP and can occupy a structural position similar to that of an
AP with a complement.
Kahnemuyipour (2014) argues against Ghaniabadi (2010), who treats bare adjectives and
phrasal modifiers in radically different ways. For Ghaniabadi, bare adjectives are heads
which are head-adjoined to the noun, whereas AP and PP modifiers are phrasal elements
in the specifiers of functional projections above the NP. As mentioned in 9.2.3,
Ghaniabadi’s main argument for this bipartition comes from an elliptic construction he
refers to as the ‘empty noun construction’, where the head noun is elided, leaving behind
one or more modifiers. He claims that this type of ellipsis is only possible with bare
adjectives, cf. (18a), and not with AP or PP modifiers, cf. (18b) and (18c) respectively. In
other words, a head noun can be elided (along with other head-adjoined elements) only if
the remnant is a bare adjective and not an AP or a PP. Ghaniabadi claims that the ellipsis
of the noun along with another bare adjective is possible, because these adjectives are
recursively head-adjoined to the noun. This makes it possible to elide the noun with one
or more bare adjectives as long as what is left behind is another bare adjective and not an
AP. Kahnemuyipour first notes that pragmatic and lexical restrictions on these elliptical
constructions undermine any strong conclusion about the head vs. phrasal status of the
modifiers based on the ungrammaticality of a few examples involving one or the other
type of remnant. He furthermore claims that there are grammatical examples of noun
ellipsis with a modifying PP and AP as remnants and provides a few examples, like the
one in (23).
(23)
Based on these facts, a uniform treatment of bare adjectives and phrasal modifiers as XPs
is adopted. The Persian DP is taken to be head-final, with the NP merged at the bottom of
the tree structure and the APs residing in the specifiers of projections above it.
The demonstrative (Dem) and the Numeral (Num) are heads higher up in the tree
structure in accordance with Cinque (2010). In addition, there are intermediate
projections enabling the roll-up derivation. The relevant structures and roll-up
movements are shown schematically in (24), where the projections hosting the APs are
marked as XP, YP, etc. and the intermediate projections are marked as AgrPs. Under this
view, the Ezafe can be seen as the surface realization of the suggested inversion process,
i.e. a linker in the sense of Den Dikken (2006). The height of the movement corresponds
to the realization of the Ezafe marker. The ‘overt’ movement stops below elements that
are high in the universal schema such as numerals and demonstratives. Consequently, the
Ezafe does not occur on the latter.
(24)
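The logic of the roll-up derivation can be approximated by a small simulation over linear order. This is a deliberately simplified sketch and not Kahnemuyipour's structure: trees and AgrP projections are flattened to lists, the labels Dem, Num, AP1, AP2, N and the marker "=EZ" are hypothetical, and AP1 is assumed to be the modifier merged lowest, immediately above the NP.

```python
def roll_up(high_heads, modifiers_top_down, noun):
    """Linearize a head-final nominal after roll-up movement (simplified).

    high_heads: elements such as Dem and Num, above the landing sites.
    modifiers_top_down: modifiers in base order, highest first.
    Each roll-up step moves the constituent built so far across the next
    modifier up; the step is spelled out as '=EZ' on the moved constituent's
    right edge.
    """
    rolled = [noun]
    for mod in reversed(modifiers_top_down):  # start with the lowest modifier
        rolled[-1] += "=EZ"   # overt reflex of this movement step
        rolled.append(mod)    # the crossed modifier now follows
    # Movement stops below the high heads, so they stay pre-nominal and unmarked.
    return list(high_heads) + rolled


# Base order: Dem Num AP2 AP1 N
print(roll_up(["Dem", "Num"], ["AP2", "AP1"], "N"))
# ['Dem', 'Num', 'N=EZ', 'AP1=EZ', 'AP2']
```

With no modifiers the function returns the base order unchanged and without any Ezafe, mirroring the claim that the absence of movement goes together with the absence of overt morphology.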
9.2.6 Concluding remarks on the Ezafe construction
As mentioned in the introductory remarks, the Ezafe construction is a common feature of
those Iranian languages which display a head-initial word order within their NP. While
this construction is assumed to have the same origin in all these languages, it has taken a
different path from one language to another, resulting in a contrasting picture in modern
Iranian languages. Thus, phonological, morphological, and syntactic properties of the
Ezafe construction vary considerably across languages. In some languages, the Ezafe
particle is inflected and displays agreement features, e.g. Kurmanji and Zazaki, while in
others, such as Persian, it is invariable. Likewise, while it behaves rather like a
post-lexical clitic in some cases, showing thus a certain degree of autonomy with respect
to its host, in some other cases, it is more or less amalgamated with other nominal
inflectional affixes. The same degree of variation is observed in the syntax of the Ezafe
construction. While some languages treat relative finite clauses on a par with APs and
PPs with respect to the Ezafe (e.g. Kurmanji), other languages exclude finite relatives
from the Ezafe construction (e.g. Persian). Likewise, the order of the Ezafe-marked
constituents may vary: while in some languages (e.g. Persian) the possessor NP closes the
Ezafe domain, in others modifiers occurring after the possessor NP can be introduced by
the Ezafe (e.g. Kurmanji and Sorani). Finally, the lexical head licensing the Ezafe
construction may also be subject to variation. While the NP is the preferred domain of the
realization of the Ezafe construction, adjectival and prepositional heads can also host the
Ezafe construction in some languages (e.g. Persian), but not in all of them (e.g. Sorani).
Studies on the Ezafe construction have up to now focused on the description and
modelling of the phenomenon in a single language, with a clear preponderance of
Persian. Investigating the Ezafe construction in less-studied Iranian languages using
cross-linguistic approaches can shed new light on the construction itself and on the
nature of dependency relations within the NP. Another promising topic of investigation is
the contact-induced Ezafe construction in non-Iranian languages spoken in the area, such
as Aramaic languages. Finally, broader typological studies investigating the resemblances
and differences between the Ezafe and linkers (loosely speaking) within the NPs in the
languages of the world can constitute another fruitful vein of research.
9.3 Differential object marking

Differential object marking (DOM)9 is a rather common feature of Iranian languages and,
according to Windfuhr (2009: 33), ‘a response to the loss of inflectional case marking’. It
has been one of the main topics of interest—if not the main topic—in various descriptive
and formal studies of Persian, and continues to generate interest despite the significant
number of publications dedicated to the topic.
In Modern Persian, DOM is realized by the enclitic =rā, in the formal register, and its
colloquial variants =ro and =o (after a consonant). Rā is obligatory with all definite
objects. Historically, =rā is the phonological reduction of rāy in Middle Persian, which in
turn comes from the Old Persian postposition rādi(y), ‘for (the sake of)’, ‘on account of’,
‘concerning’. The suffix =rā developed as an indirect object marker in late Middle Persian
and Early New Persian and progressively developed into a direct object marker only in
the course of several centuries.10 According to Paul (2003: 182), at this earlier stage,
unlike in later classical and Modern Persian, it is not predominantly definiteness but
animacy that determines whether =rā occurs. For more information on Ezafe, see Chapters 2, 3, 7, 8,
10, 12, 15, and 19 in this volume. (p. 243)
The enduring interest in =rā is due to the fact that a cluster of heterogeneous parameters
seems to be at work in rā-marking, since =rā can also occur with indefinite direct objects
and even other grammatical functions. Spotting the relevant or prevailing parameter(s)
that determine(s) the presence of =rā has thus been the major issue in studies on DOM in
Persian. Another issue that has been extensively investigated in generative studies is the
syntactic consequence of rā-marking and whether rā-marked objects occupy a different
syntactic position with respect to their non-marked counterparts.
9.3.1 Rā as a mark of specificity
Cross-linguistic studies on DOM (Aissen 2003; Bossong 1985; Comrie 1979, 1989;
Dalrymple and Nikolaeva 2011; Hopper and Thompson 1980; Lazard 1982, 1984b;
Malchukov 2008; Næss 2004, 2007; de Swart 2007; among others) have shown that
animacy and definiteness (or specificity) are generally involved in DOM: animate and/or
definite objects are more likely to be marked than inanimate and/or indefinite objects.
Among these two semantic properties acting upon DOM cross-linguistically, the degree of
determination (i.e. definiteness or specificity) prevails in Persian. Grammars and
linguistic studies generally qualify =rā as the mark of the definite direct object (Gharib et
al. 1994; Natel-Khanlari 1984; Lazard 1957; Mahootian 1997; Sadeghi 1970; among many
others). All definite direct objects must be rā-marked. This implies that personal
pronouns, proper nouns, and NPs introduced by a definite determiner (e.g.
demonstrative, interrogative) must be marked, as in (25a). This also implies that all
definite descriptions and NPs whose reference is unique, (25b), anaphoric NPs, and all
those NPs whose reference is given by the context must also be followed by =rā,
(25c). The omission of =rā in all these cases yields strict ungrammaticality.
(25)
However, although =rā is absolutely required with definite NPs, definiteness cannot
account for the whole range of the distribution of =rā. In other words, while definiteness
constitutes a sufficient condition for rā-marking of DOs, it is not a necessary condition,
since indefinite objects may as well be rā-marked, (26).
(26)
Note that in this latter case, the omission of =rā does not render the sentence
ungrammatical. (p. 244)
Some authors have therefore suggested that specificity, rather than definiteness, is
responsible for rā-marking (Browne 1970; Browning and Karimi 1994; Karimi 1990,
1996). In this view, the occurrence of =rā with indefinite NPs is not optional, but depends
on the reading of the NP: all specific objects must be rā-marked, be they definite or
indefinite. Note that there is no consensual definition of the notion of specificity.
Informally speaking, the referent of a specific indefinite expression is identifiable to the
speaker (but not to the addressee). A prototypical specific indefinite is generally assumed
to have wide scope, a referential reading, and an existential presupposition.11 Karimi
(1996) suggests that a specific NP must be rā-marked if it occurs in the syntactic
configuration, in (27):12
(27)
In contrast with this categoric view, other studies insist on the fact that a cluster of
features or properties, and not a single binary feature (be it definiteness or specificity), is
involved in DOM. Lazard (1982, 1994) claims that, apart from definiteness, the presence
of =rā can be triggered by factors such as animacy (or humanness), the semantic
‘contentfullness’ of the verb, the semantic ‘distance’ between the verb and the object, the
relative weight of the syntactic constituents, and finally the information structure.
Lazard’s approach thus combines a cluster of non-homogeneous parameters, involving
not only the inherent semantic properties of the object itself but also its relationship with
the verb and particularly the way speakers organize their utterance in order to ‘polarize’
the object. Lazard coins the term ‘polarized object’ to designate rā-marked objects, as
opposed to ‘depolarized’, i.e. non-rā-marked, objects. Because of the complex interaction
between these factors, he concludes that it is impossible to formulate a categoric rule for
rā-marking in Persian. Note that Bossong (1991) makes the same remark on DOM in
a variety of languages,13 for example, Hindi, Kannada, and Ostyak, and claims that the
rules of DOM in these languages must allow for a certain degree of variability across
speakers and situations.
The fact that specificity is not sufficient by itself to account for the whole range of the
uses of =rā has been noted by several other authors (Dabir-Moghaddam 1992; Ghomeshi
1997b; Meunier and Samvelian 1997; among others). The most obvious counterexample
to such a generalization is provided by the use of =rā with generic objects:
(28)
(p. 245)
Phillott (1919) notes that the omission of =rā in (28c) would not change the interpretation
of the sentence. A very convincing example in this sense is provided by Hincha (1961),
quoted by Lazard (1982):
(29)
As Lazard (1982) notes, (29) uncontroversially proves that referentiality is not the only
trigger of DOM in Persian and in some cases does not play any role at all.
9.3.2 Rā as a mark of high transitivity
Another set of data showing that factors other than referentiality intervene in rā-marking
is provided by the following examples:
(30)
(p. 246)
In these examples, the rā-marked constituent is not the DO properly speaking, but a
locative or a ‘dative’ argument of the verb, which can also have a prepositional
realization. In example (30b), for instance, zamin=e posht=e bāgh ‘the ground behind the
garden’ is by preference introduced by a locative preposition such as dar ‘in’, as
illustrated in example (31a). In example (30e), hezār nafar ‘1,000 people’, which is the
goal or beneficiary argument of the verb dādan, is canonically introduced by the
preposition be ‘to’:
(31)
Lazard (1982) claims that in (30a), the whole surface or space designated by the Ground
argument, i.e. ru=ye yax ‘on the ice’, is occupied by the result of the action. In other
words, the rā-marked variant of the locative argument implies a holistic reading, while
the prepositional variant gives only a locative indication. Therefore, (30b) and (31a) do
not display the same truth conditions. Interestingly, the same remarks have been made
for the spray-load alternation in English (Levin 1993), to spray paint on the wall vs. spray
the wall with paint. It has been claimed that in this latter case, the locative argument
receives a holistic reading (Anderson 1971).
The alternation illustrated by (30e) and (31b) on the other hand is comparable to the so-
called ‘dative alternation’ in English (to give something to somebody vs. to give somebody
something). The interaction between several parameters, such as definiteness, animacy,
and discourse accessibility (or givenness), has been shown to favour the double object
variant (Bresnan et al. 2007). It seems at first sight that some of these parameters also
play a role in the preference for the double object construction in Persian, although the
phenomenon needs to be thoroughly investigated.
Lazard (1982) coins the term ‘Polarized Quasi-Objects’ for the rā-marked constituents in
(30). The role of =rā is thus to turn some oblique arguments, i.e. those that share some
typical properties of rā-marked DOs, into objects. The use of =rā in this function is not
limited to (p. 247) locative (or Ground) and ‘dative’ (Goal) arguments but extends to a
wide variety of cases, illustrated by the following examples:
(32)
In (32a), the rā-marked constituent is a modifier denoting the distance. In (32b), =rā
occurs with an adjunct expressing the purpose. In (32c) and (32d), =rā is adjoined to
temporal modifiers.
Concerning the presence of =rā with temporal modifiers, Lazard (1982) claims that a
comparable semantic effect to the one observed with locative arguments is at play here
as well. Unlike their non-rā-marked counterparts, rā-marked temporal modifiers serve not
only to anchor the activity denoted by the predicate in time, but also to delimit it
temporally: that is, the activity occupies the entire time interval. This contrast is
illustrated by the difference of interpretation between examples (32d) and (33): (32d), but
not (33), implies that the speaker has spent his life suffering.
(33)
In accordance with Lazard (1982), Ghomeshi (1997b) assumes that these modifiers
behave as prototypical direct objects in that they ‘measure out’ or delimit the event
described by the verb (Ghomeshi and Massam 1994) and concludes that =rā is in fact a
marker of high (p. 248) transitivity, since the cluster of the properties triggering its
presence all correlate with high transitivity in the sense of Hopper and Thompson (1980).
9.3.3 Rā as a mark of topicality
The idea that DOM in Persian depends not only on the inherent referential features of the
object but also on the information structure has been defended in several studies (Dabir-
Moghaddam 1992; Dalrymple and Nikolaeva 2011; Ghomeshi 1997b; Karimi 1990; Lazard
1982; Meunier and Samvelian 1997; Peterson 1974; Shokouhi and Kipka 2003; Windfuhr
1979; among others). It has been observed that rā-marked objects tend to be topics, while
non-rā-marked objects display focus properties. The set of data in (34) unambiguously
highlights the link between =rā and topicality. In these examples, =rā occurs with floating
topics located at the left periphery of the sentence and cross-referenced by a clitic.
(34)
The floating topic can correspond to a number of different grammatical functions. In
(34a), the topicalized constituent is the DO doubled by a clitic in the sentence, in (34b)
the prepositional argument of the verb, and in (34c) the complement of the head noun of
the direct object (i.e. it does not bear a function with respect to the verb).
Since the referential properties of an NP are generally correlated to its discourse status—
definite/specific NPs are more likely to be topics while non-specific NPs tend to be
focuses—some studies have claimed that the main function of =rā is to mark topicality.
Peterson (1974) suggests that specific DOs are rā-marked because topics are specific in
nature. Dabir-Moghaddam (1990, 1992, 2009), following Windfuhr (1979), claims that =rā
is the mark of secondary topics: all rā-marked objects are topics while all non-marked
objects are part of the comment. Dalrymple and Nikolaeva (2011) adopt a less ‘strong’
version of this analysis. While they agree with Dabir-Moghaddam (1992) that the main
function of =rā is secondary topic marking, they nevertheless disagree with the latter on
two points:
(i) Rā can mark the primary topic as well.

(ii) While the distribution of =rā on non-objects exclusively depends on topicality, the
picture is more complex for direct objects and topicality is not the only relevant
factor in determining rā-marking.
(p. 249)
Following Reinhart (1982), Gundel (1988), and Lambrecht (1994), Dalrymple and
Nikolaeva (2011) define topicality as a matter of ‘aboutness’: the topic is the entity that
the proposition is about. Consequently, topicality has to do with the construal of the
referent as pragmatically salient (or prominent: de Swart 2007) so that the assertion is
made about this referent. A potential diagnostic for topic-hood is the ‘what-about’ or ‘as-
for’ test (Gundel 1988; Lambrecht 1994; Reinhart 1982). Although topicality correlates
with the role played by the referent in the preceding discourse, the correlation is
imperfect. Topicality is mainly a question of saliency, and although definite or specific
NPs are generally given more saliency because of their referential properties, non-
specific NPs may also become salient if the speaker decides so in a given communicative
context.
The topic role is not necessarily unique and several studies have acknowledged the
existence of at least a secondary topic along with the primary topic (Givón 1984;
Nikolaeva 2001; Polinsky 1995). Nikolaeva (2001) defines secondary topic as ‘an entity
such that the utterance is construed to be ABOUT the relationship between it and the
primary topic’. Topics are ordered with respect to saliency: The primary topic is more
pragmatically salient than the secondary topic.
Dalrymple and Nikolaeva (2011) note that =rā can mark the primary topic as well as the
secondary topic. In (35a), the rā-marked DO is a secondary topic, the subject being the
primary topic, while in (35b) it is the primary topic given that the subject is the focus and
therefore cannot be the primary topic.
(35)
The second disagreement is more important. Dalrymple and Nikolaeva (2011) claim that
topicality is not the only relevant factor in determining rā-marking on objects. It is a
factor for some objects, i.e. indefinite objects. On definite objects, however, rā-marking is
essentially motivated by definiteness, having to do with features of topic-worthiness: all
definite objects must be marked, independent of their information status and even if they
are in focus. In fact, Dalrymple and Nikolaeva (2011: 113) postulate two =rā’s or two
functions for =rā in examples such as (36): the first marks the topicality of the temporal
adjunct, while the second is licensed by definiteness.
(36)
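The division of labour that Dalrymple and Nikolaeva (2011) propose, as summarized above, can be restated as a small decision procedure. The following Python sketch is only an illustration of that summary and not the authors' own formalization: the class `Constituent`, its three boolean fields, and the function `ra_trigger` are hypothetical names introduced here, and the treatment of non-topical indefinite objects as unmarked is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Constituent:
    """Hypothetical, simplified description of a rā-marking candidate."""
    is_object: bool  # True for a direct object, False for an adjunct or floating topic
    definite: bool
    topical: bool


def ra_trigger(c: Constituent) -> Optional[str]:
    """Return the factor assumed to license =rā on c, or None if none applies."""
    # Definite objects are marked whatever their information status.
    if c.is_object and c.definite:
        return "definiteness"
    # Indefinite objects and non-objects are marked on the basis of topicality.
    if c.topical:
        return "topicality"
    # Simplifying assumption: nothing else licenses =rā in this sketch.
    return None
```

On this reading, a configuration of the type in (36) involves both branches: the temporal adjunct falls under the topicality branch and the definite object under the definiteness branch.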
Samvelian (2002) also defends the assumption that there are two =rā’s in Persian, noting
a crucial difference between rā-marked arguments on the one hand and rā-marked
floating topics and adjuncts on the other hand. While there can only be one rā-marked
argument (i.e. object) in a simplex sentence, the number of rā-marked floating topics and
adjuncts is not grammatically limited. The examples in (37) show that the nominal
element of a complex predicate can be rā-marked if referential. The examples in (38)
illustrate a ‘transitive’ (p. 250) use of the same predicate, so that the verb is preceded by
two direct nominal dependents. The relevant point here is that although the nominal
element of the predicate can still be modified and even determined, as in (38b), it cannot
be rā-marked, (38c). The situation is identical in ‘double object’ constructions in (39). The
complex predicate pādāsh dādan ‘to reward’ (lit. ‘reward give’) may take two direct
objects (Theme and Goal), but only one of them can be rā-marked. When both arguments
are definite, the dative (Goal) argument must have a prepositional realization.
(37)
(38)
(39)
By contrast, rā-marked floating topics and adjuncts can be multiple: (40) contains two rā-
marked temporal modifiers and one rā-marked object.
(40)
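The asymmetry noted by Samvelian (2002) amounts to a simple well-formedness condition: a simplex sentence may contain at most one rā-marked argument, whereas rā-marked adjuncts and floating topics are not limited in number. A minimal Python sketch of this condition is given below; the role labels and the function name `violates_single_ra_argument` are hypothetical and serve only to make the constraint explicit.

```python
from collections import Counter

# Hypothetical role labels for rā-marked constituents in a simplex sentence.
ARGUMENT = "argument"
ADJUNCT = "adjunct"
FLOATING_TOPIC = "floating_topic"


def violates_single_ra_argument(ra_marked_roles):
    """Return True if the sentence contains more than one rā-marked argument.

    ra_marked_roles lists the role of each rā-marked constituent.
    Adjuncts and floating topics are deliberately left uncounted,
    since their number is not grammatically limited.
    """
    return Counter(ra_marked_roles)[ARGUMENT] > 1
```

A sentence with the profile of (40), i.e. two rā-marked temporal modifiers and one rā-marked object, corresponds to `[ADJUNCT, ADJUNCT, ARGUMENT]` and satisfies the condition, while a profile with two rā-marked arguments, as in the excluded variants in (38c) and (39), does not.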
(p. 251)
To sum up, topicality cannot be considered to be the unique trigger of =rā, as claimed by
Dabir-Moghaddam (1990, 1992, 2009). In fact, it would be nothing less than surprising if
an item whose presence is mandatory in some cases was exclusively triggered by
information structure. Information structure is a matter of choice: the speaker decides
how to ‘structure’ the utterance in order to convey information. The presence of =rā with
definite objects is obligatory in Persian. So, whatever the choice of the speaker about the
information or discourse status of a definite object, =rā must be there as a grammatical
constraint. The price for the analysis of =rā as an exclusive topic-marker would be to
admit that Persian speakers have no choice as to the discourse status of the definite
direct object, which must be necessarily construed as a topic. If definite subjects can be
non-topical in Persian, why would definite objects be denied this status? As has been
noted by Karimi (1989, 1990), a definite DO can be a focus. In (41), the rā-marked object
can be the answer to a who-question and hence is the focus of the utterance.
Consequently, topicality is not a necessary condition for rā-marking.
(41)
There remains one last problem with the analysis of =rā as a topic-marker. Karimi (1989)
rightly claims that not all non-subject topics are rā-marked, as illustrated by (42), where
the bare object gusht ‘meat’ is topicalized without being rā-marked.
(42)
More investigation is needed on the topicalization of ‘bare’ objects in Persian, but one can
already affirm that topicality is neither a necessary nor a sufficient condition for
rā-marking.
9.3.4 DOM, object positions, and word order
Several studies have noted that rā-marking has a syntactic correlate: rā-marked objects
tend to precede prepositional arguments and even subjects and thus do not occur in the
canonical position of DOs in Persian, i.e. adjacent to the verb. Based on this fact and some
others that will be discussed later, many studies have suggested a dual-position account
of the direct object depending on its markedness (Browning and Karimi 1994; Ganjavi
2007, 2011; Ghomeshi 1996, 1997b; Karimi 1990, 2003).14 It is assumed that rā-marked
objects do not occupy the same syntactic position as their non-rā-marked counterparts
and appear in a higher position than the latter. According to Karimi (2003), for instance,
rā-marked DOs (p. 252) occupy the position of the specifier of VP, while non-rā-marked
DOs occupy a lower position, that is, the position sister to the verb (under the V′):15
(43)
The Two Object Position Hypothesis (TOPH) is built on the claim that rā-marked and non-
rā-marked objects display consistent asymmetries with respect to word order, licensing
parasitic gaps, and binding anaphors, and that they cannot be coordinated.
The unmarked word order asymmetry is the backbone argument of the TOPH. It is
generally assumed that in unmarked, canonical or neutral word order in ditransitive
constructions in Persian, rā-marked DOs precede the indirect object (IO) while non-rā-
marked DOs follow the IO (Browning and Karimi 1994; Ganjavi 2007; Givi Ahmadi and
Hassan 1995; Ghomeshi 1997b; Karimi 2003; Mahootian 1997; Rasekh-Mahand 2004;
Roberts et al. 2009).
(44)
(45)
Samvelian (2001) questions this hypothesis and postulates a flat structure for the Persian
VP:16 rā-marked and non-rā-marked objects occupy the same syntactic position. In a
series of recent corpus-based and experimental studies, Faghiri et al. (2014), and Faghiri
and Samvelian (2014) show that, with respect to word order, indefinite non-marked DOs
group with marked DOs rather than with bare objects. Faghiri and Samvelian (2015b) and
Faghiri (2016) further argue that the asymmetries between marked and non-marked
objects do not need to be accounted for in terms of syntactic positions and are best
accounted for by semantic and discourse considerations. (p. 253)
Drawing on a corpus-based study17 and experimental follow-up studies, Faghiri and
Samvelian (2014), Faghiri et al. (2014), and Faghiri (2016) investigate ordering
preferences between the DO and the IO in the preverbal domain in Persian. Observing
that non-rā-marked objects show more versatility with respect to word order, the authors
resort to a more fine-tuned classification of unmarked DOs, splitting them into bare DOs,
ketāb ‘book’, bare modified DOs, ketāb=e kohne ‘old book’, and indefinite (unmarked)
DOs yek ketāb(=e kohne) or ketāb(=e kohne)=i ‘an (old) book’. In addition to the
realization of the DO, these studies take into account other potentially influential factors
such as relative length, givenness, collocationality, and lexical bias, via mixed-effect
regression modelling, in line with key empirical studies on word order variations (Wasow
2002; among others).
The data reveal that while rā-marked DOs show a strong preference for appearing before
the IO, among various non-rā-marked DOs, only bare nouns show a strong preference for
adjacency to the verb. Interestingly, indefinite (non-rā-marked) DOs show a clear
preference for the opposite order, i.e. for preceding the IO, and thus group with
rā-marked DOs. Moreover, extra-syntactic
factors such as relative length also play a significant role in these ordering preferences.
Accordingly, Faghiri and Samvelian (2015b,a) argue that the ordering preferences
observed for different types of DOs are best represented as a continuum based on the
degree of conceptual and/or discourse accessibility. The authors conclude that any
structural account of word order preferences between DOs would lead to wrong
predictions. Moreover, even if the TOPH were to be maintained, word order preferences
speak in favour of an identical position for rā-marked DOs and indefinite non-rā-marked
DOs. To sum up, word order does not seem to constitute a conclusive criterion in favour
of a configurational account of rā-marking.
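The descriptive step behind such ordering preferences can be pictured as a tabulation of how often each type of DO precedes the IO. The Python sketch below illustrates only this first step and is not the procedure of the studies cited, which rely on mixed-effect regression modelling and control for factors such as relative length and givenness. The function name `do_before_io_rate`, the type labels, and the rows in `toy_rows` are hypothetical: the rows are invented to show the input format and are not corpus data.

```python
from collections import defaultdict


def do_before_io_rate(observations):
    """Return, for each DO type, the proportion of clauses where the DO precedes the IO.

    observations is an iterable of (do_type, do_precedes_io) pairs,
    where do_precedes_io is a boolean.
    """
    counts = defaultdict(lambda: [0, 0])  # do_type -> [DO-before-IO count, total]
    for do_type, do_first in observations:
        counts[do_type][0] += int(do_first)
        counts[do_type][1] += 1
    return {do_type: first / total for do_type, (first, total) in counts.items()}


# Invented rows, for illustration of the input format only (not corpus data).
toy_rows = [
    ("ra_marked", True), ("ra_marked", True),
    ("indefinite", True), ("indefinite", True),
    ("indefinite", True), ("indefinite", False),
    ("bare", False), ("bare", False),
]
rates = do_before_io_rate(toy_rows)
```

A continuum of the kind argued for by Faghiri and Samvelian would then correspond to graded rather than categorical differences between DO types once the other factors are taken into account.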
The behaviour of DOs with respect to licensing parasitic gaps is another argument in
favour of the TOPH. According to Karimi (1999: 704), only rā-marked DOs can license
parasitic gaps, see (46).
(46)
Faghiri and Samvelian (2015a) note, however, that the examples in (47) are grammatical
despite the fact that the non-rā-marked object licenses a parasitic gap. The oddness of
(46b) may be (p. 254) due to the fact that the verb is in the past tense and the sentence
denotes a specific accomplished event where it is expected for the DO to be known to the
speaker and hence a bare DO is not felicitous.
(47)
Unlike non-rā-marked DOs, rā-marked DOs have been claimed to be able to bind an
anaphor (Karimi 2003: 102):
(48)
Here again, Faghiri and Samvelian (2015a) and Faghiri (2016) claim that in a proper
context, a non-specific DO can bind the IO, as shown by the attested examples in (49),
found on the web.
(49)
(p. 255)
The fact that non-rā-marked DOs and rā-marked DOs cannot be coordinated has also been
evoked in favour of the TOPH (Karimi 2003: 103):
(50)
Samvelian (2001) claims that coordination cannot be used as a test in favour of the TOPH,
and Faghiri and Samvelian (2015a) give the following grammatical example in which a
non-rā-marked and a rā-marked DO are coordinated:
(51)
To sum up, although there is a consensus on the TOPH in the studies within the
generative framework, at least some of the empirical facts supporting this hypothesis
seem to be fragile. In particular, word order preferences do not allow for a clear-cut
distinction between rā-marked DOs and their non-rā-marked counterparts as far as
syntactic positions are concerned.
9.3.5 Concluding remarks on DOM
The facts addressed in this section show that despite the abundant literature on the
semantic and pragmatic parameters triggering =rā, there is still a lot to investigate in
order to draw a clear picture of the situation. Therefore, it is a safe bet that =rā will
remain a popular issue in forthcoming studies on Persian. However, it seems reasonable
to conclude, in line with Lazard (1982) and Ghomeshi (1997b), that rather than there
being a single binary feature that can characterize rā-marking, be it specificity, topicality,
or any other feature, the presence of =rā is determined by the interaction between several
parameters that have been highlighted in various studies.
Complex as it may seem, this situation is specific neither to Persian nor to DOM. Bossong
(1991) concludes that in many languages the rules of DOM cannot be formulated
precisely, but must allow for a certain degree of variability across speakers and
situations. Variability is also observed for phenomena other than DOM, like word order,
optional realization of some appositions, dative alternation, etc. A growing body of studies
since Wasow (2002), Bresnan (2006), Bresnan et al. (2007), and Bresnan and Hay (2008),
among others, accounts for grammatical phenomena that involve variation by resorting to
a new approach which assumes that variation is part of grammar and can be statistically
modelled. These methods can be very useful for the study of rā-marking in Persian, which
involves various parameters whose complex interaction requires more reliable methods of
investigation than traditional grammaticality judgements. This point has been clearly
demonstrated by Faghiri and Samvelian (2014, 2015b,a), Faghiri et al. (2014), and Faghiri
(2016) for the linear position of (p. 256) the rā-marked and non-rā-marked objects:
generalizations based on grammaticality judgements turn out to be (partially) wrong
when rigorous empirical methods are used. The same vein of research can be applied to
modelling the semantic and discourse dimensions of rā-marking. For more information on
=rā, see Chapters 2, 3, 6, 7, and 8 in this volume.
9.4 Complex predicates

Persian has only around 250 simplex verbs, half of which are currently used by the
speech community.18 The morphological lexeme formation process outputting verbs from
nouns (khāb ‘sleep’ > khāb-idan ‘to sleep’, raghs ‘dance’ > raghs-idan ‘to dance’), though
available, is not productive. When they need to refer to a new event type, speakers resort
to complex predicates (CPrs), formed by a verb and a preverbal element, which can be a
noun, harf zadan ‘to talk’ (lit. ‘talk hit’), an adjective, bāz kardan ‘to open’ (lit. ‘open do’),
a particle, bar dāshtan ‘to take’ (lit. ‘PARTICLE have’), or a prepositional phrase, be kār
bordan ‘to use’ (lit. ‘to work take’). These combinations are generally referred to as
complex predicates, compound verbs, or light verb constructions.
According to Telegdi (1951), the gradual elimination of simplex verbs and their
substitution by ‘periphrastic expressions’ or ‘compound verbs’ is at least as old as Middle
Persian. Korn (2013) argues that the rise of CPrs in Persian is linked to the development
of the verb pair ‘do’ and ‘become’, which encode the features called Instigation [+INST]
and Affectedness [+AFF], respectively.
A first issue when dealing with Persian CPrs is the delimitation of the category itself. As
discussed extensively in Samvelian (2001, 2012), two facts drastically blur the boundary
line between lexical and light verbs, and hence, between CPrs and ordinary object–verb
combinations:
(i) The (expected) consequence of the limited number of simplex verbs in Persian is
that most of them have a vague semantic content, which becomes specified only in
the context of their combination with their arguments (Samsam Bakhtiari 2000;
Samvelian 2012). In other words, most Persian verbs are de facto light verbs, so that,
from a semantic point of view, deciding whether a noun–verb combination qualifies
for being a CPr involves some degree of arbitrariness. For instance, in combinations
like rang zadan ‘to paint’ (lit. ‘colour hit’), vāks zadan ‘to polish’ (lit. ‘polish hit’), or
kare zadan ‘to butter’ (lit. ‘butter hit’), the verb zadan may be considered either a
‘bleached’ (light) verb or a lexical verb meaning ‘to apply, to put’, and accordingly
the sequence can be considered as a CPr or an ordinary object–verb combination.
The most striking piece of evidence illustrating this situation is the fact that the
combinations listed in Persian dictionaries vary considerably from one dictionary to
another.
(ii) From a strictly syntactic point of view, [bare object–lexical verb] combinations
and [adjective–copula] combinations are in many respects comparable to N-V and A-V
(p. 257) CPrs. For instance, a few of the criteria used to identify N-V CPrs, like single
word stress and limited syntactic autonomy for the noun, also apply to [bare object–
lexical verb] combinations, leading to a situation where sequences like māshin
rāndan ‘to drive a car’ are considered ‘compound verbs’ in some dictionaries and in
the literature on CPrs. Dabir-Moghaddam (1995), for instance, suggests that
sequences like ruznāme xāndan ‘to read a newspaper’ are also compound verbs
comparable to sequences like ersāl kardan ‘to send’ (lit. ‘sending do’).
Identifying the type of constructions that can be considered as CPrs has thus been one of
the issues discussed in some studies (Dabir-Moghaddam 1995; Samvelian 2012; Sedighi
2009; among others). However, this issue is probably far from being resolved, perhaps
because whether a sequence is a CPr or not—except for uncontroversially clear
combinations formed with a ‘real’ light verb such as kardan and a ‘predicative’ (or
eventive) noun, such as ersāl ‘sending’—is often a matter of usage and lexicalization
rather than the inherent properties of the sequence itself.
The main body of work on CPrs has focused on the dual nature of these sequences, which
exhibit both lexical and phrasal properties (Barjasteh 1983; Dabir-Moghaddam 1995,
1997; Family 2006, 2009; Folli et al. 2005; Goldberg 1996, 2003; Karimi 1997; Karimi-
Doostan 1997, 2005; Lazard 2013; Megerdoomian 2001, 2002, 2012; Müller 2010;
Pantcheva 2010; Samsam Bakhtiari 2000; Samvelian 2012; Samvelian and Faghiri 2013b,
2014; Shabani-Jadidi 2014; Tabaian 1979; Vahedi-Langrudi 1996; among others). On the
one hand, Persian CPrs display all the properties of syntactic combinations, including some
degree of semantic compositionality, while, on the other hand, they also have word-like
properties, since CPr formation has all the hallmarks of a lexeme formation process, such
as lexicalization, idiomaticity, and the fact that the sequence can undergo morphological
operations.
9.4.1 Words or phrases?
Whether Persian CPrs are ‘words’ or syntactic construals has been one of the most
debated issues in the literature. The following arguments are generally put forward in
favour of a lexical view of these sequences:
• The whole sequence bears a single lexical stress, like a word (52).
• CPrs can serve as input to derivational rules (53).
• The components of a CPr must be adjacent and can only be separated by a restricted
set of elements, i.e. verbal inflectional prefixes, clitic pronouns and the future auxiliary
(54).19
(52)
(53)
(54)
(55)
(p. 258)
Arguments in favour of a phrasal (i.e. syntactically construed) analysis of CPrs are the
following:
(i) Not only inflectional material, but also syntactic material such as prepositional
phrases, can intervene between the verb and the non-verbal element in a CPr (56).
(ii) The non-verbal element, if a noun, can be modified, quantified, determined, and
rā-marked (57).
(56)
(57)
These apparently contradictory properties have given rise to debates on the (p. 259)
appropriate ‘place’ for CPr formation. ‘Lexicalist’ approaches20 claim that Persian CPrs
are formed in the lexicon (Barjasteh 1983; Dabir-Moghaddam 1995, 1997; Karimi-Doostan
1997). Karimi-Doostan (1997: 193) suggests that Persian CPrs are lexically formed
complex lexical entries which consist of two zero-level elements separable in syntax, and
which thus fail to display lexical integrity. Goldberg (1996) treats the Persian CPr as a
construction represented in the lexicon, whose categorial status is V° by default. This
guarantees that the verb and the preverbal element are unseparated and thus can
undergo derivational processes. However, the V° status is a default status and can be
overridden if there is a competing higher-ranked constraint.
‘Syntactic’ approaches, by contrast, consider CPr formation a syntactic process (Folli et
al. 2005; Ghomeshi and Massam 1994; Megerdoomian 2002, 2012; Mohammad and
Karimi 1992; Tabaian 1979; Vahedi-Langrudi 1996; among others). Mohammad and
Karimi (1992) suggest that despite their syntactic construal, Persian CPrs display lexical
properties for two reasons: (i) the impossibility for the light verb to assign thematic roles,
and (ii) the existence in Persian of two distinct positions for objects.
More recently, neo-constructionist studies on Persian CPrs (Folli et al. 2005;
Megerdoomian 2002, 2012; Pantcheva 2010) adopt the predicate decomposition approach
developed by Hale and Keyser (1992, 1993, 1997, 2002), Ritter and Rosen (1996) and
Borer (1994), which blurs the classic distinction between simplex and complex predicates
and analyses pairs like give a kick and kick in English as sharing the same syntactic
representation: the ‘simplex’ verb kick is syntactically construed via the incorporation of
the noun kick into an abstract light verb. Megerdoomian (2002) and Folli et al. (2005)
claim that Persian CPrs constitute a conclusive argument in favour of neo-constructionist
theories of argument structure, since they are unincorporated counterparts of English
simplex verbs and thus reveal the universally complex underlying structure of predicates,
be they morphologically simplex or not.
On this view, whether CPrs are formed in the lexicon (i.e. morphologically) or in syntax is
not relevant, since morphology is handled in syntax.
Samvelian (2012) also considers the debate on the dual nature of Persian CPrs a false
issue, not because of the lack of a boundary between syntax and morphology, but because
of the numerous confusions surrounding the use of terms such as ‘formed in the lexicon,’
‘word’, and ‘morphologically formed’. Saying that a sequence containing more than one
word is ‘formed in the lexicon’ or is a ‘lexical unit’ can mean various things:
(i) The sequence is the output of a morphological operation and prototypically
behaves like an atom in syntax.
(ii) The sequence is lexicalized (Aronoff 1993) for various reasons (token frequency,
naming force, etc.) and must be stored. It is a listeme in the sense of Di Sciullo and
Williams (1987).
These are two independent dimensions and must be carefully distinguished. Gaeta and
Ricca (2009: 38) suggest a quadripartite typology, which allows for treating the (p. 260)
properties of being a lexical/stored unit or the output of a morphological operation as
independent degrees of freedom. Each dimension is represented by a binary feature, [±
morphological] and [± lexical], giving rise to four logically possible combinations. This
typology shows that the set of morphological words and the set of lexicalized items need
not be coextensive.
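The independence of the two dimensions can be made concrete with a small sketch. The Python below is illustrative only: the names `UnitType` and `describe` are hypothetical, and the comments attached to each feature paraphrase senses (i) and (ii) above rather than reproducing Gaeta and Ricca's own labels.

```python
from itertools import product
from typing import NamedTuple


class UnitType(NamedTuple):
    """One cell of the typology: two independent binary features."""
    morphological: bool  # sense (i): output of a morphological operation
    lexical: bool        # sense (ii): lexicalized, i.e. stored as a listeme


def describe(unit: UnitType) -> str:
    """Render a cell in the bracketed feature notation used in the text."""
    m = "+" if unit.morphological else "–"
    l = "+" if unit.lexical else "–"
    return f"[{m} morphological], [{l} lexical]"


# Two binary features yield four logically possible combinations.
ALL_TYPES = [UnitType(m, l) for m, l in product((True, False), repeat=2)]

for unit in ALL_TYPES:
    print(describe(unit))
```

Because the two features are stored separately, a sequence can be lexicalized without being the output of a morphological operation, which is the cell `UnitType(morphological=False, lexical=True)`.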
Turning now to Persian CPrs, the latter are certainly not words in sense (i), since even
the most idiomatic Persian CPrs do not behave as atoms and are separable by lexical
material. Most Persian CPrs, on the other hand, are lexicalized sequences and display
lexeme-like (i.e. not word-like) properties (Bonami and Samvelian 2010; Samvelian and
Faghiri 2013a). They are [– morphological], [+ lexical] in the sense of Gaeta and Ricca
(2009). Samvelian (2012) furthermore argues that none of the arguments supporting the
‘wordhood’ of Persian CPrs are conclusive:
• Lexical accent. Bearing a single lexical stress is not specific to CPrs. Sequences
formed by a bare object and a lexical verb bear a single stress and yet have not
been claimed to be ‘words’.21
• Input to morphological operations. It has been argued that since Persian CPrs
can undergo morphological operations, such as nominalization, ‘they must be treated
as lexical or X° units’ (Megerdoomian 2002: 59). For Karimi-Doostan (1997), given that
the agentive noun °konande ‘doer’ is not attested, the agentive noun pazirā’i konande
‘entertainer’ must be derived from pazirā’i kon, which requires that the latter be a
word. According to Vahedi-Langrudi (1996) the suffix -i adjoins to the whole sequence
eslāh kardan ‘to reform’, and not to kardan in order to form eslāh kardani ‘likely to be
reformed’.
Samvelian (2006a: 162) argues, however, that this line of argumentation is flawed
since it leads to the conclusion that mār zadan ‘snake beat’ is a lexical unit on the basis
of the existence of the adjectival participle mār zade ‘snake beaten’. But mār zadan
never occurs as a sequence in discourse.
Bracketing paradoxes, i.e. cases where the semantic scope of an affix and its
morphological attachment do not coincide (Pesetsky 1985; Spencer 1988; Sproat 1984;
Williams 1981), are rather common in various languages. In the French agentive noun
metteur en scène ‘director, producer’, the derivational affix -eur is attached to the
verbal stem, while its scope is the whole sequence. Note that the agentive noun
°metteur ‘putter’ (from the verb mettre ‘to put’) is not attested, like °konande ‘doer’ in
Persian. Yet, it has not been suggested to derive metteur en scène from mettre en
scène. Such an analysis would imply that the affix -eur behaves like an infix and
interrupts a ‘word’. Therefore, the only option here is to derive metteur
morphologically first. A similar analysis has been outlined for producing sequences
such as pazirā’i kon + -ande ‘entertainer’ by Müller (2010) within the HPSG
framework. A lexical rule applies to the stem of kardan ‘to do’ first and produces
konande ‘doer’. Since the lexical entry for kardan specifies that it must combine with a
preverbal element to form a CPr, konande inherits this information and combines in
turn with pazirā’i. (p. 261)
• Inseparability. Several studies affirm that the components of a CPr can only be
separated by a restricted set of items, which are either morphological material
(affixes) or grammaticalized lexical material, i.e. the future auxiliary, comparable to
inflectional affixes (Dabir-Moghaddam 1995; Goldberg 1996, 2003; Karimi-Doostan
1997). The insertion of ‘real’ syntactic items, these studies claim, is excluded and gives
rise to ungrammatical or odd examples, as in (58b).
(58)
Other studies, however, question this claim and affirm that the members of a CPr can be
separated by syntactic or lexical items (Samiian 1983; Ghomeshi and Massam 1994;
Ghomeshi 1996; Samvelian 2001, 2012). The following examples are from Ghomeshi
(1996):
(59)
Samvelian (2012) also provides numerous attested examples where the prepositional
argument of the CPr interrupts the latter (Samvelian 2012: 58–60):
(60)
To conclude, none of the properties of Persian CPrs provide a conclusive argument in
favour of their analysis as ‘words’ or as being ‘morphologically’ formed. Persian CPrs
display all typical properties of syntactic combinations and parallel object–verb
combinations in Persian. However, although not words, Persian CPrs are clearly
multiword expressions and CPr formation has all the trappings of a lexeme formation
process. The lexical properties of CPrs result from their being lexemes or ‘phrasal
lexemes’ in the sense of Masini (2009). (p. 262)
9.4.2 The syntactic status of the nominal element
On the basis of a thorough comparison between the nominal element of the CPr and the
bare object of a lexical verb, Samvelian (2012) shows that the former is syntactically
comparable to bare objects in all respects. Like bare objects, the nominal element of a
CPr:
(i) is generally adjacent to the verb and tends to follow adverbials and prepositional
arguments;
(ii) rarely (if ever) appears in the postverbal position, (61);
(iii) can be fronted (extracted) and receive a topical reading, (62);
(iv) can be promoted and become the subject of the passive construction, (63);
(v) can be coordinated with the nominal element of another CPr, as in (64).
(61)
(62)
(63)
(64)
Samvelian (2012) concludes that the nominal element of the CPr has exactly the same
syntactic status as a bare direct object.22 The differences between the latter and the
former are a matter of semantics and not of syntactic construal. While the noun in a CPr is
more cohesive with the verb than a bare direct object (in terms of word order, differential
object marking, and pronominal affix placement), it is impossible to draw a categorical
syntactic distinction between the two types of combinations. (p. 263)
9.4.3 Compositionality, productivity, and idiomaticity
The compositionality of Persian CPrs has received a great deal of attention in recent
literature. Although Persian CPrs are idiomatic, they are also highly productive. Several
studies have suggested that compositionality is the key to this productivity and proposed
hypotheses on how the contribution of the verb and the preverbal element must be
combined to derive the meaning, or at least some of the semantic properties, of the CPr.
Two main arguments have been invoked in favour of a compositional analysis of Persian
CPrs:
(i) The predictability of their argument and event structure.
(ii) The predictability of their lexical (referential) meaning.
In examples (65) and (66):
(i) The referential meaning of the CPr and the roles assigned to the arguments are
determined by the nominal element, since the semantic participants of the CPr sili
zadan ‘to slap’ (lit. ‘slap hit’) in (65b) are identical to those realized within the NP
headed by sili ‘slap’ in (65a).
(ii) The verb, on the other hand, determines the argument mapping, since replacing
zadan ‘to hit’ in (65b) with khordan ‘to collide’ in (65c) entails a change
in the mapping of the participants to grammatical functions.
(iii) The verb also seems to determine some of the aspectual properties of the CPr,
since the verb alternation in (66), dāshtan ‘to have’ vs. āvardan ‘to bring’, gives rise
to an aspectual contrast.
(65)
(66)
Various approaches have been developed to account for these facts. Most studies (p. 264)
within the generative framework adopt a fully compositional view, in the sense that they
build on the assumption that the respective contributions of the components of a CPr are
consistent through all their combinations and can be defined a priori. Projectionist
approaches, for example Karimi-Doostan (1997), assume that the information stored in
the lexical entries of the light verb and the non-verbal element combine to build a CPr,
while constructionist approaches, for example, Megerdoomian (2001, 2002, 2012), Folli et
al. (2005), and Pantcheva (2010), consider the syntactic and the semantic properties of a
CPr to be derived from the syntactic construction in which the verb and the preverbal
element are inserted.
Alternative analyses have been developed in studies adopting a construction-based
approach in the sense of Fillmore et al. (1988), Goldberg (1995), and Kay and Fillmore
(1999). These studies account for the productivity of Persian CPrs either by adopting a
non-compositional view (Family 2006, 2009, 2014) or by developing a different view of
compositionality (Samvelian 2012; Samvelian and Faghiri 2013a,b, 2014).
Karimi-Doostan (1997) provides one of the first serious attempts to model the respective
contributions of the verb and the non-verbal element in CPr formation. Based on Butt’s
(1995) work on argument structure, Karimi-Doostan proposes an account in terms of
argument ‘fusion’ or ‘reformation’. Following Grimshaw and Mester (1988), he assumes
that the light verb (LV) does not assign theta-roles and therefore does not have an
argument structure. However, it displays aspectual properties and assigns an aspectual
role. Being thematically defective, the LV must combine with another element, namely the
preverbal element of the predicate, to develop into a syntactically and semantically
complete verb. This combination gives rise to two kinds of CPrs, either compositional or
non-compositional. The first kind results from the combination of the LV with a
predicative noun, that is, a noun displaying an argument structure, such as sili ‘slap’ in
(65). Non-compositional CPrs are formed when the LV combines with a ‘thematically
opaque’ noun, i.e. a noun that does not display an argument structure. Yax zadan ‘to
freeze’ (lit. ‘ice hit’), ghofl kardan ‘to lock’ (lit. ‘lock do’) and āb dādan ‘to water’ (lit.
‘water give’) are examples of non-compositional CPrs. CPr formation involves the fusion
of the information encoded in the respective lexical entries of the verb and the noun. For
more information about opaque versus transparent CPrs and their mental representations,
refer to Chapter 17.
LVs are divided into three categories with respect to their aspectual properties: Initiatory,
dādan ‘to give’; Transition, khordan ‘to collide’; and Stative, dāshtan ‘to have’. Some verbs
may belong to more than one category and thus have two lexical entries: kardan ‘to do’,
for example, is either Initiatory or Transition. The aspectual category of the
LV determines the aspectual type of the CPr. Initiatory verbs form CPrs with at least one
external argument, i.e. either unergative or transitive CPrs, and are compatible with
nouns having at least one external argument that refers to the initiator of the action
denoted by the CPr.
Transition verbs form CPrs with a single internal argument, i.e. unaccusative predicates,
and are compatible with nouns having at least one (internal) argument. The latter is
mapped into the subject function and receives the Patient role. A mapping rule ensures
the correct association between an LV and a preverbal element. For instance, a noun like
shekast ‘defeat’, which assigns Agent and Patient thematic roles, can combine with either
an Initiatory verb or a Transition verb. In the first case, its external argument (i.e. the
Agent) is mapped into the subject function, (67a), while in the second case, it is the
internal argument that becomes the subject, (67b): (p. 265)
(67)
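The mapping rule described above can be sketched in a few lines of Python. This is a simplified illustration and not Karimi-Doostan's formalization: the names `Aspect`, `LIGHT_VERBS`, `NOUN_ROLES`, and `subject_role` are hypothetical, the external argument is equated with Agent and the internal argument with Patient (an assumption that fits shekast ‘defeat’ but need not hold in general), and Stative verbs are left unhandled because the passage does not spell out their mapping.

```python
from enum import Enum
from typing import Optional, Set


class Aspect(Enum):
    INITIATORY = "Initiatory"
    TRANSITION = "Transition"
    STATIVE = "Stative"


# Aspectual categories of the light verbs named in the text.
# kardan has two lexical entries, hence two categories.
LIGHT_VERBS = {
    "dādan": {Aspect.INITIATORY},
    "khordan": {Aspect.TRANSITION},
    "dāshtan": {Aspect.STATIVE},
    "kardan": {Aspect.INITIATORY, Aspect.TRANSITION},
}

# Thematic roles assigned by the noun (toy lexicon with a single entry).
NOUN_ROLES = {"shekast": {"Agent", "Patient"}}


def subject_role(noun_roles: Set[str], aspect: Aspect) -> Optional[str]:
    """Return the role mapped onto the subject, or None if no mapping is defined."""
    if aspect is Aspect.INITIATORY:
        # Initiatory LVs require a noun with an external argument (the initiator).
        return "Agent" if "Agent" in noun_roles else None
    if aspect is Aspect.TRANSITION:
        # Transition LVs map the internal argument onto the subject function.
        return "Patient" if "Patient" in noun_roles else None
    # Stative LVs: mapping not specified in the passage (assumption: undefined).
    return None


for verb in ("dādan", "khordan"):
    for aspect in LIGHT_VERBS[verb]:
        print(f"shekast {verb}:", subject_role(NOUN_ROLES["shekast"], aspect))
```

Under these assumptions, shekast dādan ‘to defeat’ takes the Agent as subject, while shekast khordan ‘to be defeated’ takes the Patient.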
Since Karimi-Doostan (1997) postulates a categorical distinction between compositional
and non-compositional CPrs, the latter are not accounted for by his treatment, even
though he acknowledges that some of the regularities also hold for non-compositional
CPrs.
Megerdoomian (2001, 2012) and Folli et al. (2005) are representative examples of
constructionist approaches to Persian CPrs. Based on work by Hale and Keyser (1993,
2002) and Borer (1994), these studies claim that the syntactic and the semantic
properties of a CPr are derived from the syntactic construction in which the verb and the
preverbal element are inserted, and not from their respective lexical entries. A fully
compositional approach is thus maintained, but the burden shifts from the lexicon to the
syntax. According to Folli et al. (2005), the verb in the CPr realizes the v head in Hale and
Keyser’s approach, as illustrated in (68)–(70).
Persian CPrs are thus the non-incorporated counterpart of verbal constructions
suggested by Hale and Keyser.
(68)
(69) (p. 266)
(70)
In this approach, the thematic role of Agent/Cause is assigned by v to its external
argument (Kratzer 1996; Marantz 1997): kardan in (68) and (70) forms agentive
predicates, but shodan in (69) does not. In other words, the LV, being the lexical
realization of v, is responsible for the agentive properties of the CPr, while the non-verbal
element plays no role here. Megerdoomian (2001: 69) argues along the same lines: ‘the
choice of the light verb determines whether an external argument is projected’. This
claim is supported by the fact that changing the verb in a CPr entails a change in the
mapping of the arguments to grammatical functions, as illustrated in (67).
Following Bashiri (1981), the authors claim that the verb also determines some aspectual
properties of the CPr, namely its dynamic vs. stative and durative vs. punctual aspect.
This explains the aspectual contrast between (67a) and (67b). Like Agent selection,
aspectual/eventive properties of a given LV are assumed to be consistent through all its
combinations to form a CPr. For instance, dāshtan ‘to have’ is always stative.
The non-verbal element, on the other hand, is claimed to determine the Aktionsart
properties, i.e. telicity, and the referential meaning of the CPr. If the non-verbal element
of the CPr is a PP, a particle, an adjective, or an eventive noun, the CPr is telic; otherwise
—that is, if the non-verbal element is a non-eventive noun—the CPr is atelic.
Table 9.1, adapted from Folli et al. (2005), summarizes and exemplifies the contribution of
each component in the makeup of the CPr.
Other constructionist analyses have been developed by Megerdoomian (2001, 2002,
2012) and Pantcheva (2010). Notwithstanding their differences, these approaches all
build on the assumption that the respective contribution of the components participating
in CPr formation is consistent through all their combinations and can be defined a priori.
Table 9.1 Telicity in Persian Complex Predicates

Telic Complex Predicates
    PP + LV:                  be donyā āmadan ‘to be born’ (lit. ‘world come’)
                              be ātash keshidan ‘to set on fire’ (lit. ‘fire pull’)
    Particle + LV:            kenār āmadan ‘to get along’ (lit. ‘side come’)
    Adjective + LV:           derāz keshidan ‘to lie down’ (lit. ‘long pull’)
    Eventive noun + LV:       shekast khordan ‘to be defeated’ (lit. ‘defeat collide’)
                              shekast dādan ‘to defeat’ (lit. ‘defeat give’)

Atelic Complex Predicates
    Non-eventive noun + LV:   dast khordan ‘to get touched’ (lit. ‘hand collide’)
                              kotak khordan ‘to get beaten’ (lit. ‘beating collide’)
                              dād zadan ‘to yell’ (lit. ‘scream hit’)
                              dast andākhtan ‘to mock’ (lit. ‘hand throw’)

In a series of studies, Samvelian (2012) and Samvelian and Faghiri (2013a,b, 2014)
develop an alternative view of compositionality, which they qualify as a posteriori in the
sense of Nunberg et al. (1994) for idiomatically combining expressions, and outline a
construction-based analysis of Persian CPrs, building on the following observations:
(p. 267)
(i) Although there are consistent regularities in the makeup of the syntactic and
semantic properties of CPrs, several examples show that the contribution of each
component cannot be determined a priori, but is determined in combination with the
other component of the CPr and the meaning of the construction as a whole.
(ii) While the idiomatic properties of Persian CPrs have been generally
acknowledged, they have nevertheless been overlooked or minimized by studies that
adopt a fully compositional approach.
Samvelian (2012), on the basis of extensive data, shows that the same verb can give rise
to CPrs with different agentive and eventive properties. Likewise, the non-verbal
element’s contribution can vary through its combinations with different verbs.
For instance, the verb zadan ‘to hit’, generally considered agentive and eventive, can
nevertheless participate in the formation of ‘unaccusative’ (or passive-like) CPrs such as
yax zadan ‘to freeze’ (lit. ‘ice hit’) or zang zadan ‘to go rusty’ (lit. ‘rust hit’). The same
holds for gereftan ‘to take’ and kardan ‘to do’, which, apart from agentive CPrs, also form
‘unaccusative’ CPrs, such as ātash gereftan ‘to catch fire’ (lit. ‘fire take’), ādat kardan ‘to get
used to’ (lit. ‘habit do’), and dard kardan ‘to ache’ (lit. ‘pain do’). The verbal contribution is
not consistent either with respect to the eventive properties of the CPr. Again, the same
verb can give rise to both stative and eventive (dynamic) CPrs. For instance, the verb
dāshtan ‘to have’ is not invariably stative and can produce eventive predicates, such as
ersāl dāshtan ‘to send’ (lit. ‘sending have’), taghdim dāshtan ‘to offer’ (lit. ‘offering have’),
and e’lām dāshtan ‘to announce’ (lit. ‘announcing have’).23
The contribution of the non-verbal element also turns out to be inconsistent. For instance,
adjectives and PPs can also form atelic CPrs, such as lāzem dāshtan ‘to need’ (lit. ‘necessary
have’), penhān dāshtan ‘to keep hidden’ (lit. ‘hidden have’), and be maskhare gereftan ‘to
make fun (p. 268) of’ (lit. ‘to mockery take’). Conversely, non-eventive nouns can give rise
to telic CPrs, such as pust andākhtan ‘to slough off’ (lit. ‘skin throw’).
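The gap between the a priori telicity rule and the data can be checked mechanically. The sketch below is only an illustration: the names `TELIC_CATEGORIES`, `predicted_telic`, and `OBSERVED` are hypothetical, the rule is the one attributed to Folli et al. (2005) above, and the observed telicity values are those reported in this section and in Table 9.1, not independent judgements.

```python
# Categories of the non-verbal element that the a priori rule treats as telic.
TELIC_CATEGORIES = {"PP", "particle", "adjective", "eventive noun"}


def predicted_telic(category: str) -> bool:
    """A priori rule: telicity depends only on the category of the non-verbal element."""
    return category in TELIC_CATEGORIES


# (CPr, category of the non-verbal element, telicity as reported in the text)
OBSERVED = [
    ("be donyā āmadan", "PP", True),
    ("shekast khordan", "eventive noun", True),
    ("dād zadan", "non-eventive noun", False),
    ("lāzem dāshtan", "adjective", False),
    ("penhān dāshtan", "adjective", False),
    ("be maskhare gereftan", "PP", False),
    ("pust andākhtan", "non-eventive noun", True),
]

# CPrs for which the rule's prediction and the reported telicity disagree.
mismatches = [cpr for cpr, cat, telic in OBSERVED if predicted_telic(cat) != telic]
print(mismatches)
```

The first three items follow the rule; the last four are the counterexamples cited above, so a classification based on category alone misassigns them.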
The non-predictability of the meaning of the CPr is another significant impediment to
fully compositional approaches. In order for the latter to work, the meaning of the CPr
must be derivable on the basis of the meaning of its components. However, as mentioned
in several studies (Bonami and Samvelian 2010; Family 2006, 2009, 2014; Goldberg 1996;
Karimi-Doostan 1997; Samvelian 2012; Samvelian and Faghiri 2013a; among others),
numerous Persian CPrs are semantically opaque. Moreover, as shown by Samvelian
(2012) and Bonami and Samvelian (2010), even in cases where a CPr is semantically
transparent, it is hardly ever the case that its meaning is fully predictable from the
meaning of its component parts. In other words, the meaning of Persian CPrs, even the
transparent ones, is conventional in many cases and therefore has to be learned, in the
same way as one has to learn the meaning of the simplex verbs in English, for instance.
For a discussion on the processing of transparent and opaque CPrs in second language
speakers of Persian, refer to Shabani-Jadidi (2016).
Relying on these observations, Samvelian (2012) and Samvelian and Faghiri (2013a,b,
2014) claim that Persian CPrs, at least the lexicalized ones, must be stored, exactly as
lexemes are. They nevertheless argue that the need for an inventory is not incompatible
with a compositional approach, provided compositionality is defined a posteriori. With
respect to their compositionality, Persian CPrs are comparable to Idiomatically Combining
Expressions, that is, ‘idioms whose parts carry identifiable parts of their idiomatic
meanings’ (Nunberg et al. 1994: 496). This means that the verb and the non-verbal
element of a CPr can be assigned a meaning in the context of their combination. Thus, the
CPr is compositional, in the sense that the meaning of the CPr can be distributed to its
components, and yet it is idiomatic, in the sense that the contribution of each member
cannot be determined out of the context of its combination with the other one. For
instance, zadan ‘to hit’ can receive various interpretations according to the noun with
which it combines: ‘to apply’ in rang zadan ‘to paint’; ‘to add, to incorporate’ in namak
zadan ‘to salt’; ‘to wear’ in māsk zadan ‘to wear a mask’; or ‘to emit’ in dād zadan ‘to
shout’. Given the meaning assigned to zadan and the meaning of the CPr as a whole, new
combinations can be produced and interpreted. For instance, tag zadan ‘to tag’ (lit. ‘tag
hit’), formed with the loanword tag, is created on the basis of barchasb zadan ‘to
label’ (lit. ‘label hit’), tambr zadan ‘to stamp’ (lit. ‘stamp hit’), etc.
This view of Persian CPrs is then developed into a Construction-based approach:
• Each CPr corresponds to a Construction.
• CPrs can be grouped into classes according to their semantic and syntactic
properties and each class can be represented by a partially fixed Construction.
• Constructions can be structured in networks, thus accounting for different semantic
and syntactic relations between CPrs, such as synonymy, hyperonymy/hyponymy, and
valency alternation.24
In this approach, the productivity of the Persian CPrs is accounted for via the analogical
extension of the existing classes. It can be compositionality-based or not. In the first case,
(p. 269) new combinations are created on the basis of the meaning assigned to the
Construction as a whole and to its components. However, productivity is not always
compositionality-based, and non-compositional Constructions (or classes) can also be
productive. The productivity of Persian CPrs is also related to other parameters such as
the coherence of the classes and their size.
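A toy data structure may help to picture how classes of CPrs and their analogical extension fit together. The sketch below is an assumption-laden illustration rather than the formalization used in the studies cited: the class `Construction` and its `extend` method are hypothetical names, the senses of zadan in the first four classes are those given above, and the sense label ‘to affix’ for the class containing barchasb zadan and tambr zadan is supplied here for illustration only, since the text does not name it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Construction:
    """A partially fixed Construction: a fixed verb and an open noun slot."""
    verb: str
    verb_sense: str  # meaning assigned to the verb within this class
    members: Dict[str, str] = field(default_factory=dict)  # noun -> gloss of the CPr

    def extend(self, noun: str, gloss: str) -> str:
        """Add a new CPr to the class by analogy and return its citation form."""
        self.members[noun] = gloss
        return f"{noun} {self.verb}"


zadan_classes: List[Construction] = [
    Construction("zadan", "to apply", {"rang": "to paint"}),
    Construction("zadan", "to add, to incorporate", {"namak": "to salt"}),
    Construction("zadan", "to wear", {"māsk": "to wear a mask"}),
    Construction("zadan", "to emit", {"dād": "to shout"}),
    # 'to affix' is an assumed label; the text gives no sense for this class.
    Construction("zadan", "to affix", {"barchasb": "to label", "tambr": "to stamp"}),
]

# A new combination is formed on the model of an existing class.
labelling = zadan_classes[-1]
new_cpr = labelling.extend("tag", "to tag")
print(new_cpr, "joins", sorted(labelling.members))
```

In this picture the verb has no single context-free meaning; each class records the sense it takes in combination, and a new noun is interpreted through the class it joins.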
9.4.4 Concluding remarks on complex predicates
Like the Ezafe construction and DOM, Persian CPrs still have a lot to reveal about the many
faces of predicate formation in the languages of the world and about key issues such as
idiomaticity vs. compositionality or storage vs. online processing. For the discussion on
the online processing of CPrs, see Chapter 17 in this volume. Here again, there is a lot to
gain from resorting to empirical methods, which can provide new insights into
some crucial theoretical issues discussed since the late 1990s concerning CPr formation
in Persian. The issue of the productivity of Persian CPrs, for instance, cannot be
adequately investigated without taking into account data from usage and without
resorting to quantitative methods comparable to those used in morphology (Baayen
1992). Likewise, the issue of whether Persian CPrs must be stored in the (mental) lexicon
cannot receive a valid answer without psycholinguistic investigation (Baayen 2007).
Recent studies such as Shabani-Jadidi (2014) and Sadat Safavi et al. (2016) have opened
the way for other studies to come. For more information on complex predicates, see
Chapters 7, 8, 17, and 19 in this volume.
9.5 Conclusion
The main purpose of this article was to offer an overview of the issues raised by three
specific features of Persian syntax, namely the Ezafe construction, differential object
marking, and complex predicate formation, and the way various studies have tried to
account for these issues. Because of space limitations and the impressive number of
studies, it was impossible to get into the details and subtleties of all studies presented
throughout the article. Hopefully, nothing has been ‘lost in translation’ and the quoted
authors’ positions have been rendered faithfully.
Noting the enduring interest in these three phenomena in the literature since the early
1980s, one may (wrongly) assume that they have disclosed all their secrets and that
almost everything worth saying has already been said. Along with the presentation of the
huge amount of work already done, another aim of this article was to show that each of
the three phenomena at stake still offers a challenging and promising area for empirical and
theoretical investigation, as illustrated by a number of recent studies adopting new
methodological approaches. (p. 270)
Notes:
(1) Glosses follow the Leipzig Glossing Rules (www.eva.mpg.de/lingua/resources/glossing-rules).
(2) Note that the term ‘inverse’ or ‘reverse’ Ezafe is sometimes used to refer to the ending
—generally a vowel—occurring on pre-nominal adjectives in some Iranian languages with
Adjective–Noun order, such as Gilaki, Mazandarani, and Balochi (Windfuhr 1979: 27–8).
(3) Haig (2011) argues contra Kent (1944) that the primary function of haya, hayā, taya
was in fact to introduce appositive phrases within the NP. He draws attention to a
significant fact gone unnoticed in Kent (1944), namely the definiteness of the antecedent
noun (i.e. the head noun) in most haya constructions. This entails that the supposed
relative clause is not a restrictive relative clause (i.e. it does not contribute to the
identification of the referent), but an appositive relative. Consequently, what appears to
be a relative clause syntactically is functionally more of a loose appositive construction,
which can in fact be extended to all uses of haya, hayā, taya. Determining whether haya,
hayā, taya is an appositive ‘linker’ rather than a relative pronoun is beyond the scope of
this paper. Therefore, I will not take a stance on this issue here.
(4) Estaji (2009) provides a set of arguments in favour of the analysis of ī as an
independent word in MP.
(5) Samiian (1994) adopts the syntactic feature system suggested by Chomsky (1970) and
subsequently developed by Jackendoff (1977), which classifies projecting lexical
categories according to primitive syntactic features such as [+/- V] or [+/- N]. Adjectives
and nouns are both [+N] while verbs and prepositions are [-N].
(6) Note that contrary to finite relative clauses, which exclude the Ezafe, reduced relative
clauses are always introduced by the Ezafe.
(7) More generally, within APs and RRCs, the prepositional complement can either
precede or follow the head, nazdik be khāne/be khāne nazdik ‘close to home’. The
difference between the two orderings is a matter of register. The second ordering, where
the complement precedes the head, is rather formal or literary.
(8) For the details of the formalization within Head-driven Phrase Structure Grammar
(HPSG) (Pollard and Sag 1994) and the way the feature [+Ez] (which indicates the
presence of the Ezafe) and the feature [+Dep] (which indicates that the head or the
intermediate projection are awaiting a dependent) are introduced and percolated through
the syntactic structure, see Samvelian (2007: 634–9).
(9) The term was coined by Bossong (1985), who provides a detailed account of the
phenomenon in a variety of languages.
(10) For a detailed discussion on =rāy in Middle Persian see Jügel (2015: 192–218, 340–2).
(11) For a detailed discussion of the notion and controversies on specificity, see Fodor and
Sag (1982); Hintikka (1986); Enç (1991); Farkas (1995, 2002).
(12) Karimi suggests a different view of rā-marking in a recent work (Karimi and Smith
2015). See also Karimi in this volume (Chapter 7).
(13) For a detailed discussion on this point, see also Dalrymple and Nikolaeva (2011).
(14) Studies of phrasal syntax within the generative paradigm generally assume two
distinct positions for objects, VP-internal and VP-external, and establish a correlation
between object position and object marking (Diesing 1992; van Geenhoven 1998; Ritter
and Rosen 2001; among many others): indefinite/non-specific objects are generally
assumed to be VP-internal, while marked objects are VP-external.
(15) In a more recent work, Karimi (2005) proposes a revised version of her Two Object
Position Hypothesis (TOPH), in which both objects are base-generated in the same
position, under the v′. The specific object shifts into the specifier of vP position in order to
receive its interpretation.
(16) Bonami and Samvelian (2015) also adopt a flat structure for Persian sentences.
(17) The study is based on the Bijankhan corpus, a corpus collected from daily news and
common texts, in particular the newspaper Hamshahri, of about 2.6 million tokens,
manually tagged for part-of-speech information. The corpus was created in 2005 by the
DataBase Research Group at the University of Tehran and can be freely downloaded from
their website.
(18) Sadeghi (1993) gives the estimate of 252 verbs, 115 of which are commonly used.
Natel-Khanlari (1986) provides a list of 279 simplex verbs. The Bijankhan corpus contains
228 lemmas.
(19) This claim is questioned in several studies and will be discussed later.
(20) The term ‘lexicalist’ is ambiguous in the literature. Here, it must be understood as
‘morphologically formed’.
(21) Ghomeshi and Massam (1994) also mention this fact in favour of a syntactic analysis
of Persian CPrs.
(22) For a detailed description of bare nouns in Persian, see also Modarresi (2014).
(23) Note that the examples discussed in this section are by no means isolated. For
thorough examples illustrating the non-consistency of the verbal contribution to the
agentive and eventive properties of Persian CPrs, see Samvelian (2012: 114–30).
(24) See Samvelian (2012) for an application of this analysis to the CPrs formed with
zadan ‘to hit’. See also Müller (2010) for a partially comparable approach within the
HPSG framework.
Pollet Samvelian
Pollet Samvelian is Professor of Linguistics at Sorbonne Nouvelle University. She has
published several books and articles on the syntax and morphology of Western
Iranian languages, especially on complex predicates, bare objects, differential object
marking, word order, verbal periphrases, and clitics. Her recent publications include
La Grammaire des prédicats complexes. Les constructions nom-verbe (2012, Hermès-
Lavoisier) and Approaches to Complex Predicates (2015, co-ed. with L. Nash, Brill).
Morphology

Behrooz Mahmoodi-Bakhtiari

This chapter describes Persian morphology and provides a general sketch of the
morphological features and processes found in Persian. After a general study of the
Persian morphemes, the nominal and verbal morphologies of Persian are introduced,
together with a description of the compounding process in both, as well as other
methods of word formation in Persian. The general sketch of the Persian morphemes
presents lexical and functional morphemes, and the study of functional morphemes covers
both free and bound ones. Within Persian nominal morphology, the study of pronominal
morphology precedes the study of the nouns; within verbal morphology, the verb is
studied in terms of its structure and functions. A separate part presents compounding,
a very common word-formation process in Persian, and the minor word-formation types of
Persian morphology are also considered.
Keywords: morphology, Persian, word formation, compounding, nominal and verbal structures
10.1 Introduction
IN this chapter, Persian morphology is described. Here, by ‘Persian’, I mean the written
form of the Persian of Iran, which is considered to be the most widespread variety of this
language, and is regarded as the standard variety of Persian. Colloquial Persian has some
specific morphological features which are not covered here, and our occasional references to
those features are only for the sake of comparison (see Chapters 2, 3, 4, 5, 6, 11,
and 15 for more on colloquial Persian).
(New) Persian is perhaps the best-known and most widely studied Iranian language.
However, it is not representative of the Iranian language family in terms of
morphology: its morphological system is atypical among the Iranian languages, in
the sense that it has almost completely lost the synthetic nominal and verbal inflection
(together with the inflectional classes), as well as the inflectional distinctions of case,
number, and gender and of aspect, mood, tense, and voice, which it would otherwise have
inherited from its ancestors.
Historically, Persian is a lineal descendant of Old Persian and Middle Persian. Old Persian
(like its contemporaries Sanskrit and Classical Greek) was a typical inflected
language. This inflectional nature underwent a radical reduction by the late Middle
Persian period, to the extent that the language had almost the analytical structure of
present-day New Persian. As a matter of fact, among all the inflectional categories of the
previous stages of Persian, only person and number, in the form of three
persons in singular and plural, still survive in pronouns and personal endings.
In this chapter, after a general study of the Persian morphemes, we will deal with nominal
and verbal morphology of Persian, together with describing the compounding process in
both of these, and the other methods of word formation in Persian.
10.2 Persian morphemes: a general sketch

Persian makes use of both its lexical and functional morphemes in its word formation
processes.
(p. 274) 10.2.1 Lexical morphemes
Lexical morphemes in Persian may be ‘free’ (like all the nominal roots, and many of the
simple adjectives and adverbs), with a meaning of their own, such as divār ‘wall’,
farzand ‘child’, and mādar ‘mother’. However, Persian also has bound lexical morphemes,
namely the present stems of verbs, such as rav in mi-rav-am (incomplete
aspect prefix/indicative mood marker-‘go’ (root)-1sg. ‘I go’), which do not appear in
Persian discourse unless they are fully conjugated.
10.2.2 Functional morphemes
Persian has a diverse set of functional morphemes. These morphemes may be classified
as ‘free’ and ‘bound’ ones as follows.
10.2.2.1 Free functional morphemes
Among the free functional morphemes of Persian, two major groups can be identified:
adpositions and conjunctions.
1) Adpositions: prepositions
Persian principally uses its prepositions to express case relations. New Persian, in its
early stages, used to have circumpositions (be-daryā dar ‘in the sea’, be-khāne andar
‘inside the house’), which are no longer used.
New Persian now has seven primary prepositions:
(i) be ‘to’ (dative and directional, and formerly locative and instrumental as well; see
Mashkur 1969: 162), which characteristically introduces the indirect object, as in
ketāb rā be u dādam ‘I gave the book to him/her’. As a directional, it is used in both
material and figurative contexts: u be kānādā mohājerat kard ‘(S)he migrated to
Canada’, zahmat-hā-yam be hadar raft ‘my efforts were in vain’, pedar-ash be saratān
mobtalā shode ast ‘His/her father is afflicted with cancer’.
(ii) dar ‘in(to)’ as dar otāq ‘in the room’, dar chenin sharāyeti ‘in such
circumstances’;
(iii) az ‘from’, denoting the source of something, as in khāne-ye mā az injā kheili dur ast
‘our house is very far from here’, or in beit az hāfez ast ‘This verse is by Hāfez’;
(iv) bā ‘with’ (in comitative, as bā dustam dars khāndam ‘I studied together with my
friend’, instrumental, as bā chāqu be u hamle kard ‘he attacked him with a knife’,
and concessive; as bā kamāl-e meil in rā mi-pazir-am ‘I accept this with all pleasure’);
(v) tā ‘to, until’ as bāyad tā shab montazer bemānim ‘We have to wait until night’, az
tehrān tā Tabriz cheqadr rāh ast? ‘How far is it from Tehran to Tabriz?’;
(vi) (be)joz ‘except’ as hame madrak gereftand, bejoz man ‘Everybody got their
certificate except me’;
(vii) barā(-ye) ‘for’ as in gol rā barāye to kharide-am ‘I have bought this flower for
you’.
There are also two other prepositions, chun ‘like’ and bar ‘(up)on, over’, which are not
commonly used, and appear chiefly in the literary and formal language.
(p. 275) 1a) The postposition rā
The only postposition found in New Persian is rā, whose major function is marking specific
direct objects for accusative case (see Chapter 9 for more discussion on rā). Since
proper nouns, nouns modified by demonstratives or personal pronominals, and
nouns marked as definite by the enclitic -e or by the context are all definite NPs, rā may follow
them when they act as direct objects: dust-i rā didam ‘I saw a friend’, dust-e mehraban-i rā
didam ‘I saw a kind friend’, dust-e khub-am rā didam ‘I saw my good friend’, dust-e khub-e
dorān-e dabirestān-am rā didam ‘I saw my good high school friend’.
Although rā is basically known for its direct object marking, diachronic studies reveal
that it did not have this function from the beginning. It used to denote ‘reason’, and the
word cherā ‘why’ (lit. ‘what for’) in Modern Persian is reminiscent of that usage. In
more recent literature, the primary function of rā has been suggested to be marking
specificity rather than accusative case.
It also serves as a marker for topicalization in spoken Persian (māshin ro dar-esh ro be-band
‘the car, close its door’ from dar-e māshin ro be-band ‘close the door of the car’). It
also serves to convey the meaning of some prepositions, such as ‘for’ in nāhār ro
esfāhānim ‘We will be in Esfahan for lunch’, or ‘to’ in gol-hā ro āb bede ‘(Give) water (to)
the flowers’ (see Dabir-Moghaddam 1992).
2) Conjunctions
Persian conjunctions may be classified morphologically into simple or complex ones.

Among the simple conjunctions, perhaps the most common is va ‘and’, which is used
between sentences as well as phrases: mā va shomā kār mikonim va pul dar miyāvarim
‘We and you work, and earn money’. The adversative conjunction ammā (or its commonly
used Arabic equivalent, vali) means ‘but, however’, as in hāl-am khosh nist, ammā be sar-e kār
mi-rav-am ‘I do not feel well, but I will go to work’. The two causal conjunctions zirā(-ke)
and chon(-ke) ‘because’ appear between two sentences or clauses as well. Another
conjunction with a similar function is yā ‘or’, which acts as a simple conjunction when
connecting two interrogative sentences (otherwise it may well be seen as a complex
conjunction, when repeated): shomā dāneshju hastid yā ostād? ‘Are you a student, or a
professor?’
The conjunction agar ‘if’ is a marker of the embedded conditional clause, which is usually
placed before the two sentences, and acts almost identically to its English counterpart:
agar pul-e raftan dāshtam, hargez dar injā nemimāndam ‘If I had money to go away, I
would never stay here’. The two-morpheme conjunctions agar-che ‘even if, although’, har-
chand (ke) ‘however’, and chonān-che ‘in case’, are the other major ‘initial’ conjunctions
in Persian.
The NP-following conjunction ham ‘also, too, even’ connects two sentences as well,
although it is placed after the subject of the second sentence, rather than on the border of the
two sentences: shomā harekat konid, man ham be shomā molhaq mishavam ‘You set out,
and I will join you too’.
Among the simple conjunctions, tā and ke serve several functions and denote different
meanings.
The polyvalent subordinator tā appears in the initial clause, and denotes time: tā dars-am
tamām na-shav-ad, az injā ne-mi-rav-am ‘I will not leave here, unless I finish (as long as I
have not finished) my education’, tā ma-rā did, pā be farār gozāsht ‘he ran away once/as
soon as he saw me’. It may also be placed between two clauses, in which case it
introduces a subsequent clause, either to denote a time, as sabr kardim tā qatār-e ba’di
āmad ‘we waited (p. 276) until the next train came’; or to denote a purpose, as āmad-am
tā shakhsan bā shomā sohbat konam ‘I came to talk to you in person’.
The other multifunctional conjunction, ke ‘that’, is generally known to be a
complementizer, as in che khub shod ke shomā rā emruz didam ‘how nice that I met you
today’, goft ke nemitavānad biyāyad ‘he said that he couldn’t (lit. ‘can’t’) come’. However,
it also acts as a relative marker, as well as introducing the purpose clauses, as talāsh
kard-am ke movaffaq shav-am ‘I tried to be successful’; causal clauses: emruz az khāne
birun na-rav-id, ke havā be-sheddat ālude ast ‘Don’t go out of your houses today, for the
air is extremely polluted’; temporal clauses, to denote the interruption of an action: hanuz
harf-am tamām na-shod-e bud ke sili-ye mohkam-i be surat-am zad ‘My sentence was not
finished yet, when he slapped me hard on my face’. Ke also combines
with other lexical items, as in pas az in-ke ‘after’, hamin-ke ‘as soon as’, az bas ke ‘so
much that’, vaqti ke ‘when’ (with the further combinations az vaqti ke ‘since’
and tā vaqti ke ‘until’), be mahz-e inke ‘as soon as’, and others.
On the other hand, the complex (or reciprocating) conjunctions ham … ham … ‘both … and’,
yā … yā … ‘either … or’, na … na … ‘neither … nor … ’, and che … che … ‘whether … or’
link NPs as well as sentences: ham behruz, ham bahrām ‘both Behrooz and Bahram’, ham
kār mi-kard-am, ham dars mi-khānd-am ‘I both worked and studied’, yā u rā birun kon,
yā man rā talāq be-de ‘Either expel him, or divorce me’, na pul-at rā mi-khāh-am, na
hozur-at rā tahammol mi-kon-am ‘I neither want your money, nor can I stand your presence’,
che beravi, che bemāni, ozā’ taqyir na-khāh-ad kard ‘Whether you go or stay, the situation
will not change’. To the list of such symmetrical structures, we may add the negative
adverbial na tanhā ‘not only’, together with the rhetorical adversative balke ‘but also’: na
tanhā zibāst, balke puldār ham hast ‘She is not only beautiful, but also rich’.
10.2.2.2 Bound functional morphemes

The major bound functional morphemes in Persian may be classified as affixes and clitics.
1) Affixes
Among the different types of affixes, Persian makes use of prefixes, suffixes, and to a much
lesser extent, interfixes. More than one suffix is possible in Persian words and each
suffixal form usually has just one meaning.
Infixation and circumfixation are not among the morphological processes of Persian. In
many of the classical Persian grammars, the interfixes have been wrongly introduced
as infixes (for example, see Vahidiān and Emrāni 2000; Kalbasi 2001).
Since Persian is quite rich in derivational affixes, in what follows we will
concentrate mainly on this type of affix, and will deal with the inflectional affixes within
our discussion of the nominal and verbal morphology.
1a) Prefixes
Persian prefixes act both as inflectional and derivational affixes. The derivational ones,
which, according to definition, change the parts of speech and generate new words,
mainly generate adjectives and related nouns. Some of the major prefixes of Persian are
as follows:
nā- ‘un’, which is added both to the nominal and verbal stems and generates
adjectives, such as nā-sepās ‘ungrateful’ (sepās ‘thanks’), and nā-tavān
‘disabled’ (tavān-est-an, ‘to be able’). This (p. 277) prefix may, of course, also
preserve the part of speech, as in nā-zibā ‘unbeautiful, ugly’ (zibā ‘beautiful’)
and nā-be-khrad ‘unwise’.
ham- ‘same, co-’: ham-peimān ‘ally’ (peimān ‘treaty’), ham-kelās ‘classmate’.
bā- ‘with’: bā-adab ‘polite’ (adab ‘politeness’), bā-savād ‘literate’ (savād ‘literacy’).
Alongside it, we may note bi- ‘without’: bi-hush ‘unconscious’ (hush ‘wisdom’),
bi-kas ‘alone’ (kas ‘person’).
be- ‘holding’: be-hanjār ‘standard’ (hanjār ‘discipline, order’), be-andām ‘well-figured’
(andām ‘body’), be-sāmān ‘well-organized’ (sāmān ‘organization’), be-rāh
‘obedient’ (rāh ‘way’), be-hush ‘alert’ (hush ‘wisdom’). Although old, this prefix is
not much used, and is mainly seen in formal or literary discourse.
The two prefixes abar- ‘super, above’ and bish- ‘a lot/more’ are fairly new and not very
productive; they have been introduced by the Academy of Persian Language and
Literature for use in loan translations: abar-rasāna ‘superconductive’ (rasāna
‘conductive’), bish-fa’’āl ‘hyperactive’ (fa’’āl ‘active’).
1b) Suffixes
Although New Persian does not have many productive suffixes, suffixation is a
principal method of nominal derivation in Persian.
(i) The major noun-making suffixes include -i, which makes abstract nouns from the
adjectives or type nouns, such as khub-i ‘goodness’ (khub ‘good’), and dust-i
‘friendship’ (dust ‘friend’). It also makes place names such as baqqāl-i ‘grocery
shop’ (baqqāl ‘grocer’). It may refer to some actions such as vazne-bardār-i ‘weight
lifting’ and asid-pāshi ‘acid splashing (on somebody’s face)’. Some of the nouns made
by -i do not represent the stems without this suffix, such as parde-bardār-i
‘unveiling’ (curtain-remove-ing), and pārti-bāz-i (party-play-ing) ‘pulling strings’,
while *parde-bardār and *pārti-bāz as the probable agents of those actions do not
exist.
Another major noun-making suffix is -e, which attaches to numerals, adjectives, generic nouns,
and abstract nouns, and usually yields nouns with a metaphorical or opaque semantic
relationship to the stem, such as gush-e ‘corner’ (gush ‘ear’), cheshm-e ‘spring’ (cheshm
‘eye’), pust-e ‘shell’ (pust ‘skin’), dast-e ‘handle’ (dast ‘hand’), panj-e ‘claw’ (panj ‘five’),
haft-e ‘week’ (haft ‘seven’). This suffix also attaches to verbal stems. Added to the
past stem, it produces the past participle, such as shost-e ‘washed’ (shost-an ‘to wash’),
bor-id-e ‘cut’ (bor-id-an ‘to cut’), and sukht-e ‘burnt, consumed’ (sukht-an ‘to burn’). It may
also be added to the present stems to yield nouns, such as khand-e ‘laughter’ (khand-id-an
‘to laugh’), āmuz-e ‘instruction’ (āmukhtan ‘to teach’), sanj-e ‘criterion’ (sanj-id-an ‘to
measure’), and angiz-e ‘motivation’ (angikht-an ‘to encourage’).
The suffix -esh is always attached to the present roots of the verbs, and nominalizes them,
as in bakhsh-esh ‘forgiveness’ (bakhsh-id-an ‘to forgive’), bin-esh ‘(mental) view’, (from
the irregular verb did-an ‘to see’). On the other hand, the suffix -ār attaches to the past
verbal stems to make nouns: nevesht-ār ‘writing’ (nevesht-an ‘to write’), goft-ār
‘speech’ (goft-an ‘to say’).
Finally, the Arabic loan suffix -iy(y)at, makes abstract nouns either from concrete or
abstract nouns, such as jam’-iyyat ‘population’ (jam’ ‘group’), zedd-iyat ‘opposition’ (zed
‘opposed’).
(ii) Diminutive suffixes are -ak and -che. Although -ak mainly functions as a
diminutive suffix (as in pesar-ak ‘little boy’), it sometimes produces words not
necessarily of this (p. 278) type, such as sag-ak ‘catch’ (sag ‘dog’), tefl-ak ‘poor
fellow’ (tefl ‘baby’), and cheshm-ak ‘wink’ (cheshm ‘eye’). However, -che seems to
have only the diminutive function, with a much lower degree of productivity than -ak, in
words such as ketāb-che ‘booklet’ (ketāb ‘book’) and bāq-che ‘house garden’ (bāq
‘garden’).
(iii) Agentive suffixes are -gar, -kār, -mand, -var and -ande. -gar and -kār attach to
nouns and verb stems, such as kār-gar ‘worker’ (kār ‘work’) and jush-kār
‘welder’ (jush ‘weld’), while -mand and -var only attach to the nouns, and -ande is
only attached to the (present) verb stems, like honar-mand ‘artist’ (honar ‘art’),
kherad-mand ‘wise’ (kherad ‘wisdom’), dān-esh-var ‘knowledgeable’ (dān-esh
‘knowledge’), dav-ande ‘runner’ (dav-id-an ‘to run’), and bāz-ande ‘loser’ (bākht-an ‘to
lose’). The suffix -bān also refers to agents or occupations, such as mehr-bān
‘kind’ (mehr ‘kindness’) and pās-bān ‘police agent’ (pās ‘watch, care’), along with
bāq-bān ‘gardener’ (bāq ‘garden’) and dar-bān ‘janitor’ (dar ‘door’). To this list we may
add two loan suffixes from Turkish: -chi and -bāshi (the former very productive),
which form occupation and profession nouns, such as shekār-chi ‘hunter’ and āsh-
paz-bāshi ‘chef’.
(iv) Suffixes denoting places are -estān, -gāh, -kade, and the less productive -zār, as
farhang-estān ‘academy’ (farhang ‘culture’), dānesh-gāh ‘university’ (dān-esh
‘knowledge’), pazhuh-esh-kade ‘research centre’ (pazhuh-esh ‘research’), and gol-zār
‘flower garden’ (gol ‘flower’). It should be noted that the suffix -estān also denotes
the names of territories and countries on the basis of the inhabitants: arman-estān
‘Armenia’ (arman ‘Armenian’), torkaman-estān ‘Turkmenistan’ (torkaman ‘Turkman’).
Finally, the suffix -dān denotes the container of something, as in gol-dān ‘vase’ (gol
‘flower’), and qan(d)-dān ‘sugar bowl’ (qand ‘sugar cube’). For a detailed study of the
suffix -estān, see Paraskiewicz (2008).
(v) Adjective-forming suffixes are various in Persian. Perhaps one of the most
commonly used ones is -i, which marks attribution, such as ālmān-i ‘German’ (ālmān
‘Germany’), taryāk-i ‘opium smoker’ (taryāk ‘opium’), khārej-i ‘foreign,
foreigner’ (khārej ‘outside’), qahve-’i ‘brown’ (qahve ‘coffee’), and pārche-’i ‘made of
cloth’ (pārche ‘cloth’). For the last function (denoting the material of which
something is made), there is also the suffix -in, as in āhan-in ‘made of iron’, and
pulād-in ‘made of steel’, which is not very common, and is mostly used in formal discourse.
The other one, -e, is added to numerical noun phrases, such as se-pāy-e ‘tripod’ (pā ‘leg,
foot’), chahār kār-e ‘for four functions’ (kār ‘work’), chand-manzur-e
‘multifunctional’ (manzur ‘purpose’), se-sāl-e ‘three-year-long, three years old’. It also
denotes holding, as in diplom-e ‘diploma holder’.
-āne is another attributive suffix, which yields adjectives such as bachche-g-āne ‘childish’.
When attached to the adjectives for animates, it produces adjectives with the same
meaning for the inanimates, e.g. deed, speech, and the like, such as āqel-āne ‘wise (deed,
work)’ and ahmaq-āne ‘silly’. This suffix also makes adverbs such as forutan-āne ‘humbly’,
mo’addab-āne ‘politely’. The two other adjective making affixes with limited productivity
are -nāk and -gin, as in nam-nāk ‘wet’ (nam ‘moisture’), and sharm-gin ‘sorry’ (sharm
‘sorrow, regret’). More examples and functions of all these suffixes may be read in
Sādeghi (1991b) and Kashāni (1992).
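The suffix inventory described above can be summarized as a small lookup table. The following Python sketch is an illustration only and is not part of the analyses cited here: the names SUFFIX_FUNCTIONS and analyse are hypothetical, the table lists only suffixes and functions mentioned in this section, and the code assumes forms that are already hyphen-segmented as in the examples.

```python
# Hypothetical lookup table: suffix -> functions named in this section.
# Some suffixes (-i, -e) are described under more than one heading in the
# text, so each suffix maps to a list of possible functions.
SUFFIX_FUNCTIONS = {
    "i": ["noun-making", "adjective-forming"],
    "e": ["noun-making", "adjective-forming"],
    "esh": ["noun-making"],
    "ār": ["noun-making"],
    "ak": ["diminutive"],
    "che": ["diminutive"],
    "gar": ["agentive"],
    "kār": ["agentive"],
    "mand": ["agentive"],
    "var": ["agentive"],
    "ande": ["agentive"],
    "estān": ["place"],
    "gāh": ["place"],
    "kade": ["place"],
    "zār": ["place"],
    "dān": ["container"],
    "in": ["adjective-forming"],
    "āne": ["adjective-forming"],
    "nāk": ["adjective-forming"],
    "gin": ["adjective-forming"],
}


def analyse(form: str):
    """Split a hyphen-segmented form at its last hyphen and look up the suffix.

    Returns (stem, suffix, possible_functions). Unsegmented forms and unknown
    suffixes yield an empty list of functions.
    """
    stem, sep, suffix = form.rpartition("-")
    if not sep:
        return form, "", []
    return stem, suffix, SUFFIX_FUNCTIONS.get(suffix, [])


for word in ["ketāb-che", "dānesh-gāh", "khub-i", "gol-dān"]:
    print(word, analyse(word))
```

The lookup is purely formal: it cannot tell which of the listed functions a given -i or -e has, and any final segment that happens to match a key is treated as a suffix, so the output is a list of candidates rather than an analysis.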
1c) Interfixes
Interfixing in Persian is rare and not very productive. Long mistaken for infixes,
the major Persian interfixes -ā- and -vā- are bound morphemes joining two identical
(p. 279) lexical items, in order to semantically express the importance or volume of the
base word. These morphemes are believed to be allomorphs of tā ‘to’ or bā ‘with’, which
provide both transparent structures such as rang-ā-rang ‘colourful’ (rang ‘colour’), sar-ā-
sar ‘all over’ (sar ‘head’), jur-vā-jur ‘various’ (jur ‘kind’), and lab-ā-lab ‘totally full’ (lab
‘lip’), and sometimes less transparent ones such as kesh-ā-kesh and garm-ā-garm
‘midst’ (kesh ‘pull’, garm ‘warm’), dush-ā-dush ‘together’ (dush ‘shoulder’), tang-ā-tang
‘close, intimate’ (tang ‘tight’).
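The interfixed pattern is formally simple: the same base appears on both sides of the interfix. A minimal Python sketch follows; the function name interfix_reduplicate is hypothetical, and the interfix is supplied by hand, since the text gives no rule for choosing between -ā- and -vā-.

```python
def interfix_reduplicate(base: str, interfix: str = "ā") -> str:
    """Join two identical copies of `base` with an interfix, hyphen-segmented."""
    return f"{base}-{interfix}-{base}"


# Bases and interfixes taken from the examples in the text.
examples = [("rang", "ā"), ("sar", "ā"), ("jur", "vā"), ("lab", "ā")]
for base, interfix in examples:
    print(interfix_reduplicate(base, interfix))
```

The sketch captures only the form. The meaning of the result (transparent in rang-ā-rang, less so in dush-ā-dush) has to be listed word by word.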
2) Clitics
The study of Persian clitics is rather new in the Iranian grammar tradition. Until recently,
many of the endings now known to be clitics were treated as affixes. But now, in the light
of the new linguistic findings, we know that not only Persian, but also most of the
contemporary Western Iranian languages make use of enclitic pronouns, to mark objects
for their verbs, or possessors for their nouns. This will be dealt with in detail in section
10.3, under our study of the nominal morphology of Persian.
Persian does not use proclitics. The major clitics of Persian comprise the nominal endings
denoting possession: ketāb=am (book=1sg. poss.), and objects: zad-am=ash ‘I hit
him’ (past of HIT-1sg.=3sg.).
The other widely used clitic is the indefinite marker -i, but perhaps the most important of
all is the Ezafe marker -e. The Ezafe construction is one of the specific syntactic properties of
Persian, in which the head of a noun phrase (an element such as a noun or
adjective) is connected to its modifier(s) by -e, such as ketāb-e simin ‘Simin’s book’ and
ketāb-e mofid-e simin ‘Simin’s useful book’ (for more information and examples, see Perry
and Sadeghi 1999, as well as Chapters 6, 7, 8, 9, and 19 in this volume).
The other clitics used in Persian mainly appear as contracted forms of the
free functional morphemes in colloquial Persian, and thus fall outside our main concern.
Just to introduce them, they are -ā (the contracted form of the plural suffix -hā), -am for
ham ‘also’, as well as -o, which is an allomorph both of rā (the direct and specific object
marker) and of the conjunction va ‘and’ (see Shaghāghi 1995).
10.3 Persian nominal morphology
10.3.1 Pronominal morphology
As said earlier, New Persian has almost completely lost the synthetic nominal and verbal
inflection of its ancestors, therefore, the inflectional distinction of case, number, and
gender (together with aspect, mood, tense, and voice) do not exist in it any longer.
However, the three persons and the two numbers (in singular and plural) are still
distinguished in pronouns and personal endings. The personal pronouns in Persian are
the independent and enclitic ones, as shown in Table 10.1.
As can be seen, there is no systematic gender distinction in the third-person
pronouns, and u (with the plural ān-hā) may denote both the masculine and the feminine.
Also, the objective and possessive enclitics -ash and -eshān refer to humans (and
non-humans) of both sexes. Independent pronouns are not pluralized, but colloquial
Persian allows the addition of the plural suffix to the first- and second-person plurals for
emphatic purposes, such as mā-hā ‘we all’ and shomā-hā ‘you all’. On the other hand, in
the polite form (p. 280) of Persian, second and third singular persons are addressed with
the plural pronouns shomā and ishān respectively. The latter, of course, is now restricted
to the polite form of discourse, and is no longer used to refer to the third-person plural.
It is also possible to use the first-person plural for the singular, to denote modesty
on the part of the speaker, as in mā shāgerd-e shomā hast-im, which may mean either ‘we
are all your students’ or, humbly, ‘I am your student’.
Table 10.1 Persian pronouns
Pronouns      1sg.   2sg.   3sg.   1pl.    2pl.    3pl.
Independent   man    to     u      mā      shomā   ānhā
Enclitic      =am    =at    =ash   =emān   =etān   =eshān
Syntactically, the behaviour of independent pronouns is identical to that of nouns: they
can be the object of transitive verbs, they may appear as the second constituent of an
Ezafe construction, and they may act as the modifier of a prepositional phrase, as in man u
rā mi-shenās-am ‘I know him’, in ketāb-e shomā ast ‘this is your book’, and u be man
negāh kard ‘he looked at me’. They can also be placed as NP heads, as in mā khāhar
va barādar-hā ‘we, (as) sisters and brothers’, shomā zabānshenās-hā ‘you linguists’. They
(except the third persons) also accept adjectives, and the first and second singulars
receive them with an Ezafe marker, as man-e bad-bakht ‘miserable me’, and to-ye bi-hayā ‘you
shameless’. For the first and second plurals, the adjective is also pluralized: mā bi-gonāh-ān
‘we, the innocent (people)’, shomā vahshi-hā ‘you savages’.
The universal pronoun ‘one’ in Persian is ādam ‘human’, as in ādam haz mikonad ‘one is
overwhelmed’, ādam shākh dar-mi-āvar-ad ‘one grows horns (out of surprise)’. The
indefinite form of such a pronoun would be yeki ‘one’, as in yeki dar rā bāz kon-ad
‘somebody open the door!’.
In order to denote possession with the independent pronouns, the word māl ‘belonging’
behaves as the head of the Ezafe construction, with the pronoun as the modifier: in melk
māl-e man ast ‘This property is mine’, māl-e shomā ān yeki ast ‘yours is that one’.
10.3.2 Pronominal enclitics
Pronominal enclitics (further discussed in Chapters 3 and 8) have four major functions in
Persian. First, they act as a possessive marker when attached to an NP: ketāb=etān
besyār jāleb bud (~ ketāb-e shomā besyār jāleb bud) ‘your book was so interesting’ (see
Table 10.2). Second, they may be attached to prepositions, and act as the object of the
preposition: ānhā barāy=etān pul ferest-ād-and (~barāye shomā) ‘they sent money for
you’ (although this mostly happens in the colloquial form): az=at khāhesh-i dār-am (~az
to) ‘I have a request from you’, be-h=emun mahal na-zāsht (~be mā) ‘he took no notice of
us’.
As the third function, they attach to the transitive verbs (and in the case of the compound
verbs, they are attached to the preverbal element of the compound) and assume the
function (p. 281) of the direct object, equivalent to an independent pronoun together with
the direct object marker rā: mi-shenās-am=ash (~ u rā mi-shenās-am) ‘I know him’, be-
zan-id=eshān (~ānhā rā be-zan-id) ‘hit them!’.
Table 10.2 Basic possessive paradigms
      Independent      Suffixed (non-topical)
1s    pedar-e man      pedar=am      ‘my father’
2s    pedar-e to       pedar=at      ‘your (sg.) father’
3s    pedar-e u        pedar=ash     ‘his/her father’
1p    pedar-e mā       pedar=emān    ‘our father’
2p    pedar-e shomā    pedar=etān    ‘your (pl.) father’
3p    pedar-e ānhā     pedar=eshān   ‘their father’
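The two paradigms in Table 10.2 can be generated mechanically from the pronoun forms in Table 10.1. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes a consonant-final noun such as pedar. Vowel-final nouns require a linking glide (compare khāne-y-e), which is not modelled here.

```python
# Pronoun forms as given in Table 10.1.
INDEPENDENT = {"1sg": "man", "2sg": "to", "3sg": "u",
               "1pl": "mā", "2pl": "shomā", "3pl": "ānhā"}
ENCLITIC = {"1sg": "am", "2sg": "at", "3sg": "ash",
            "1pl": "emān", "2pl": "etān", "3pl": "eshān"}


def possessive_ezafe(noun: str, person: str) -> str:
    """Ezafe possessive: noun-e + independent pronoun (consonant-final nouns only)."""
    return f"{noun}-e {INDEPENDENT[person]}"


def possessive_enclitic(noun: str, person: str) -> str:
    """Enclitic possessive: noun=enclitic (consonant-final nouns only)."""
    return f"{noun}={ENCLITIC[person]}"


for person in INDEPENDENT:
    print(person, possessive_ezafe("pedar", person), possessive_enclitic("pedar", person))
```

Keeping the two pronoun series in separate tables mirrors the layout of Table 10.1 and makes it straightforward to substitute the colloquial enclitic forms if needed.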
Finally, there is a small group of compound verbs in which the object marker is an enclitic;
since the overt subject does not induce agreement on these verbs, the enclitics act as the
subject of the verb, as in sard-am ast ‘I feel cold’ (it is cold for me), hers-am gereft ‘I
got furious’ (fury got me), and si sāl-am shod ‘I turned thirty’ (lit. ‘thirty year of me
became’). These constructions are relatively few in number, and mostly express a
physical or mental experience (see also Chapters 3 and 8). For a detailed study of these
constructions, see Sedighi (2009 and 2011), where they are introduced as Psychological
Verbs.
10.3.3 Other types of pronoun
10.3.3.1 Reflexive pronouns

In terms of reflexive pronouns, classical Persian used to have three types, the last two
highly literary. The pronouns khod, khish, and khishtan, all meaning ‘(one)self’, are
restricted to formal and written discourse, and apply to all persons. They get their
meaning from the context, and mainly from the personal endings of the verbs, as in
khāne-y-e khod rā forukht-am ‘I sold my house’ in comparison with khāne-y-e khod rā
forukht-and ‘They sold their house’.
In today’s Persian, only khod is being used, together with the personal enclitics in the
form of khod=am ‘myself’, khod=at ‘yourself’, khod=ash ‘him/herself’ and the like.
This pronoun has the classical function of the reflexives, i.e. denoting the identity of
the subject and the direct object, as in u khod-ash rā kosht ‘he killed himself’, and mā
hargez khod-emān rā ne-mi-bakhsh-im ‘we never forgive ourselves’. It also acts as an
emphatic adjunct before a pronoun or a noun, in an Ezafe construction: khod-e shomā
‘you yourselves’, khod-e rezā ‘Reza himself, Reza personally’. These two cases may also be
said without an Ezafe construction in the form of shomā khod-etān and rezā khod-ash:
shomā khod-etān goft-id ke rezā khod-ash mikhāh-ad bā man sohbat konad ‘You yourself
told me that Reza wanted to personally talk to me’. After the prepositions, these pronouns
again have an emphatic role: cherā az man mi-pors-id? az khod-ash be-pors-id. ‘Why do
you ask me? Ask him (p. 282) himself’, in mored be khod-at marbut ast ‘this case (solely)
concerns you’, or in moshkel-e khod-at ast ‘this is your problem’.
10.3.3.2 Demonstrative pronouns

Demonstrative pronouns of Persian are two: in ‘this’, and ān ‘that’, which may both be
pluralized as in-hā and ān-hā, as in-hā dāneshju-hāye man hast-and ‘These are my
students’. They are also used as adjectives, such as in lebās ‘this dress’ and ān afrād
‘those people’. In colloquial Persian, they are also used with the word yeki ‘one’, to
denote a specific item, as in yeki ‘this one’ and un yeki ‘that one’. As adjectives, the
demonstratives may not be pluralized, even if they refer to a number of objects: in ketāb-
hā ‘these books’, ān zan-ān ‘those women’.
Demonstratives may appear in several emphatic phrases, such as hamin and hamān, like
hamān ketāb ‘(exactly) that book’, and hamin al’ān ‘right now’. They also attach to words
such as chon ‘like’, as chon-in and chon-ān, with their emphatic forms in-chonin and ān-
chon-ān. The words hamchonin ‘also’ and hamchonān ‘still, as before’ are the other
productions of such combinations: hamchonin az shomā tashakkor mi-kon-im ‘we also
thank you’, bārān mi-āmad, vali mardom hamchonān dar khiyābān bud-and ‘It was
raining, but people were still on the street’.
10.3.3.3 Interrogative pronouns

Interrogative pronouns are ke ‘who’ and che ‘what’ (ki and chi in colloquial form). They
are not pluralized (although they are, in colloquial speech). Like the demonstrative pronouns,
the interrogative pronoun che may also be used adjectivally, in phrases such as che kas-i ‘who, what
person’, che chiz-i ‘what, what object’, or che mard-i ‘what man’. The pronoun ke does not
have this characteristic.
10.3.3.4 Indefinite pronouns

Indefinite pronouns include hame ‘all’, ba’zi and barkhi ‘some’, and the extinct hich ‘none,
nothing’. The traditional Persian grammar refers to these as zamāyer-e mobham ‘the
vague pronouns’, as the exact referents of them are not clear (for example, see Natel-
Khanlari 1984: 201).
The word hame may either be used alone, or as the head of an Ezafe construction, as in
hame raftand ‘everybody went’, and hame-ye dāneshjuyān hāzer-and ‘all the students are
present’, hame-ye mā shomā rā dust dār-im ‘we all like you’. The combinations hame-kār
‘every work’, hame-jā ‘everywhere’, hame-jur ‘any kind’, hame-raqam ‘any type’ and the
like, are among the many compounds made with this pronoun. Although it basically refers
to something plural, it may undergo a double pluralization in the form of hame-gān ‘all,
everyone’. The word hame-gān-i ‘public, general’ is the famous derivation based on it.
The pronouns ba’zi and barkhi ‘some’ have a relatively similar function to hame. They may
be used alone, as barkhi mo’taqed-and ke … ‘Some believe that … ’; or may be used as the
head of an NP, without the Ezafe marker (or sometimes with the preposition az ‘of, from’),
as in ba’zi (az) afrād mi-guyand … ‘some (of the) people say … ’. The pronoun ba’zi can be
pluralized in the form of ba’zi-hā, as in ba’zi-hā masā’el rā kheili sāde mi-engār-and ‘some
people take the issues very lightly’.
The pronoun hich is no longer used on its own, but its compounds are widely
used, such as hich-chiz ‘nothing’, hich-kas ‘no one’, hich-jā/hich-kojā ‘nowhere’.
The word folān ‘so-and-so’ (now used independently as folāni) refers to a person
known to the hearer, though not necessarily to everyone: bo-ro be-gu folān-i salām res-ānd ‘Go
and say so-and-so said hi’. This word may also form compounds such as folān-kas
‘such-and-such a person’, folān-jā ‘such-and-such a place’ and folān-chiz ‘such-and-such a
thing’ (see Mahmoodi-Bakhtiari 2006).
10.3.3.5 Reciprocal pronouns

Reciprocal pronouns are hamdigar and yekdigar ‘each other’, used as direct objects or the
objects of preposition: mā hamdigar rā mi-shenās-im ‘we know each other’, mā be
hamdigar komak mi-konim ‘we help each other’. In the prepositional phrases of colloquial
Persian, hamdigar can be reduced to ham, as in unā az ham motenaffer-an ‘They hate
each other’.
10.3.3.6 Relative pronouns

The complementizer ke ‘that’ is the actual relative marker in Persian, and Persian has
no dedicated relative pronoun. Relative clauses are formed with this marker,
which follows the head of the clause; the head is marked by the enclitic =i, as in
pesar=i ke man bozorg=ash kardam ‘The boy whom I raised’.
10.3.4 The noun
Persian nouns are not marked in terms of case and gender. Different sexes may either be
distinguished lexically (quch ‘male sheep’, mish ‘ewe’), or by using a qualifier: khuk-e nar,
khuk-e māde (male/ female pig).
Arabic influences Persian pluralization in two ways: first, through its
two plural markers used in addition to the Persian ones, -in and -āt, as in motarjem-in
‘translators’ and il-āt ‘tribes’; and second, through the presence of its ‘broken plurals’
such as asbāb ‘tools’ (sg. sabab ‘reason’), and amāken ‘places’ (sg. makān ‘place’). This
method of pluralization has even been applied to some native Persian words by analogy,
such as khavānin ‘local landlords’ (sg. khān), and asātid ‘professors, masters’ (sg. ostād).
Persian does not have specific definite articles, and expresses definiteness mainly by
syntactic means. Non-specific and definite nouns are the stem words carrying the stress
on their final syllable, such as ketầb (a book, books (as generic noun), or the book, the
book in question): ketầb behtarin dust ast ‘books are the best friends’, and ketầb rā barāy-
at khar-id-am ‘I bought the book (in question) for you’.
On the other hand, both the indefinite and specific nouns are marked with =i: ketầb=i
khar-id-am/ ketāb-hầ=i khar-id-am ‘I bought a book/ I bought some books’, in comparison
with ketầb=i rā ke goft-i kharidam/ ketāb-hầ=i rā ke goft-i khar-id-am ‘I bought the
(certain) book you mentioned/ I bought the (certain) books you mentioned’.
10.3.5 Numbers
Cardinal numbers
Persian numerals are historically native, except for sefr ‘zero’, milyun
‘million’ and milyārd ‘billion’. Large numbers are counted from the highest unit to the lowest,
with the parts joined by the conjunction -o-. Cardinal numbers in Persian are
idiosyncratic from one to twenty, but are formed very regularly thereafter. Table 10.3 provides a list of
the Persian cardinal numbers.
Enumerated items come after the numbers, and are not pluralized: haft qalam ‘seven
pens’. Numeratives or classifiers may optionally be used when discrete items are counted.
The most commonly used is tā ‘fold’, as in do tā bachche ‘two babies’. For
humans, however, in the formal and literary language, the words nafar or tan ‘body’ are
used: se tan shahid ‘three martyrs’. Distributives are expressed by repeating
the cardinal together with tā: se tā se tā ‘three by three’; the only exception is
‘one’: yeki yeki ‘one after another, one by one’.
Numbers such as ten, one hundred, 1,000 and the like may be pluralized when they are
supposed to amplify the number: sad-hā nafar az dāneshjuyān ‘hundreds of students’,
milyun-hā tumān hazine ‘millions of tomans of expenses’ (‘toman’ = Iranian currency).
The adjectival suffix -gāne, added to a numeral, produces words referring to the
number of items included in the noun: dāstān-hāye se-gāne ‘the three-phase stories,
the trilogy’. Numerals are also seen in many compounds such as do-charkh-e
‘bicycle’, and hezār-pā ‘centipede’.
10.3.5.1 Ordinal and fractional numbers

Ordinal numbers are formed by suffixing -om or -omin to the cardinal numbers. The
exceptions are the Arabic words avval ‘first’ (used on its own, and even more frequently than the Persian
equivalent yek-om) and ākhar ‘last’, and the numbers do-v(v)om ‘second’ and se-v(v)om
‘third’, in which a -v- is inserted to break the hiatus. Ordinals follow the noun in an
Ezafe construction when they are formed with -om, and precede it without Ezafe
when they are formed with -omin: ruz-e panj-om-e hafte vs. panj-omin ruz-e hafte ‘the fifth
day of the week’. Ordinals containing -omin are mostly used in formal language. An
ordinal may be specified by suffixing -i to it, as avval-i ‘the first one’, panj-om-i ‘the fifth
one’, ākhar-i ‘the last one’.
Table 10.3 Cardinal numbers
1   yek      11  yāzdah      21   bist-o-yek
2   do       12  davāzdah    22   bist-o-do
3   se       13  sizdah      30   si
4   chahār   14  chahārdah   40   chehel
5   panj     15  pānzdah     50   panjāh
6   shesh    16  shānzdah    60   shast
7   haft     17  hefdah      70   haftād
8   hasht    18  hejdah      80   hashtād
9   noh      19  nuzdah      90   navad
10  dah      20  bist        100  sad
101 sad-o yek; 200 devist; 300 si-sad; 400 chahār-sad; 500 pānsad; 600 shesh-sad, etc.; 1,000 hezār; 1,957 hezār-o nohsad-o panjāh-o haft.
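The way larger cardinals are composed from the entries of Table 10.3 (highest unit first, parts joined by -o-) can be illustrated with a short Python sketch. It is not a full numeral grammar: the function name `cardinal` is hypothetical, the range is limited to 1 to 1,999 because the table gives no forms beyond hezār, and the entries for 700 and 800 are assumed by analogy with shesh-sad. The output always uses -o- as the joiner, whereas the table itself alternates between "-o-" and "-o ".

```python
UNITS = ["", "yek", "do", "se", "chahār", "panj", "shesh", "haft", "hasht",
         "noh", "dah", "yāzdah", "davāzdah", "sizdah", "chahārdah", "pānzdah",
         "shānzdah", "hefdah", "hejdah", "nuzdah"]
TENS = {20: "bist", 30: "si", 40: "chehel", 50: "panjāh", 60: "shast",
        70: "haftād", 80: "hashtād", 90: "navad"}
HUNDREDS = {100: "sad", 200: "devist", 300: "si-sad", 400: "chahār-sad",
            500: "pānsad", 600: "shesh-sad",
            700: "haft-sad", 800: "hasht-sad",  # assumed by analogy
            900: "nohsad"}                      # spelled as in the 1,957 example

def cardinal(n: int) -> str:
    """Spell out 1..1999, highest unit first, parts joined by -o-."""
    if not 1 <= n <= 1999:
        raise ValueError("this sketch only covers 1 to 1999")
    parts = []
    if n >= 1000:
        parts.append("hezār")
        n -= 1000
    if n >= 100:
        parts.append(HUNDREDS[n // 100 * 100])
        n %= 100
    if n >= 20:
        parts.append(TENS[n // 10 * 10])
        n %= 10
    if n:
        parts.append(UNITS[n])  # 1..19 are listed individually
    return "-o-".join(parts)

print(cardinal(1957))  # hezār-o-nohsad-o-panjāh-o-haft
```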
An ordinal pronoun may be formed by suffixing stressed -i to the ordinal in -om: avvali ‘the
first one’, dovvomi, ākhari, etc.
Fractional numbers are expressed by the cardinal numbers as the numerators, and the
ordinals as denominators; such as yek panj-om ‘one fifth’. The Arabic word nesf is used for
‘half’ in many circumstances, except for telling the time, in which the Persian word nim is
always used: sā’at se-o-nim shod ‘It became 3.30’. The Arabic word rob’ ‘one quarter’ is
usually used in telling the time, as in panj-o rob’ ‘a quarter past five’, or yek rob’ be shesh
‘a quarter to six’.
10.3.6 Adjectives
Adjectives are used as attributes and predicates, and are not pluralized within a noun
phrase: dokhtar-e zibā ‘lovely girl’, dokhtar-ān-e zibā ‘lovely girls’. However, when
substantivized, they may be pluralized, such as az ān gerd-hā mi-khāh-am ‘I want (some
of) those round ones’.
Comparison of adjectives is expressed by suffixing -tar and -tarin, which yield
comparative and superlative adjectives: zibā > zibā-tar > zibā-tarin ‘beautiful, more
beautiful, the most beautiful’. The only exceptions are beh-tar ‘better’ for khub ‘good’, and
bish-tar ‘more’ for ziyād ‘much’; although khub-tar and ziyād-tar are not ‘wrong’.
Diachronically, beh and bish meant ‘better’ and ‘more’ by themselves, and have received a
comparative suffix during the course of time.
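As a small illustration of the comparison pattern, the following Python sketch applies -tar and -tarin and substitutes the two suppletive bases mentioned above. The function names and the `regular` flag are hypothetical; the flag merely reflects the remark that khub-tar and ziyād-tar are not ‘wrong’.

```python
# Suppletive bases named in the text: khub -> beh-, ziyād -> bish-.
SUPPLETIVE = {"khub": "beh", "ziyād": "bish"}

def comparative(adjective: str, regular: bool = False) -> str:
    """Adjective + -tar, using the suppletive base unless regular is True."""
    base = adjective if regular else SUPPLETIVE.get(adjective, adjective)
    return base + "-tar"

def superlative(adjective: str, regular: bool = False) -> str:
    """Adjective + -tarin, built on the same base as the comparative."""
    return comparative(adjective, regular) + "in"

print(comparative("zibā"), superlative("zibā"))  # zibā-tar zibā-tarin
print(comparative("khub"))                       # beh-tar
```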
Attributive adjectives normally follow their heads, in an Ezafe construction. However,
the order is sometimes reversed to adjective–noun, as in the case of numerals used as
adjectives, like haft ketāb ‘seven
books’, ordinal numbers with the suffix -omin, such as panj-omin ketāb (vs. ketāb-e panj-
om) ‘the fifth book’, and the superlative adjectives, such as zibā-tarin dokhtar ‘the most
beautiful girl’. To this list we may add the phrases denoting the sympathy on the part of
the speaker, such as mazlum hosein-am ‘my oppressed Hussain’, teflak dokhtar-ash ‘his poor
daughter’, and colloquial sentences denoting the speaker’s emphasis on the adjective,
with an indefinite noun such as bad zan-i gereft-e ‘(such a) bad woman he has married’,
khub jā-yi estekhdām shod-i ‘(what a) nice place you have been employed in’. Quantifiers
such as kheili and besyār ‘very’ precede the adjectives: pesar-e khub, pesar-e kheili khub
‘nice boy, very nice boy’. However, in order to emphasize the adjective, it is possible to
displace the position of the quantifier, such as u pesar-e kheili khub-i-st/ u kheili pesar-e
khub-i-st ‘he is a very nice boy, (such a) very nice boy he is’.
10.3.7 Adverbs
Persian adverbs are either morphologically identical with nouns and adjectives, or
derived from them. Adverbs of time and place do not have a specific morphology of their
own, and are expressed with the original nouns: fardā mi-āyad ‘He will come
tomorrow’, raft bālā ‘he went upstairs’, aslahe=at rā pāyin biyandāz ‘drop (down) your
gun’, bo-ro ‘aqab ‘go backwards’. Some adverbs of time are formed with the
frozen prefixes di- ‘last (day)’, pari- ‘the day before last’, pār- ‘last’, pirār- ‘one before last’
and pas- ‘one after’, as in the words diruz ‘yesterday’, dishab ‘last night’, pariruz ‘two days
ago’, parishab ‘two nights ago’, pārsāl ‘last year’, pirārsāl ‘two years ago’ and pasfardā
‘the day after tomorrow’.
Adverbs derived or originated from adjectives are mostly of manner. For example, the
adjectives khub ‘good’ and tanhā ‘alone’ may also act as adverbs in u khub mi-dav-ad ‘he runs
well’ and man tanhā kār mi-kon-am ‘I work alone’. The suffix -āne, which is basically used
in making adjectives, also acts as adverb marker, as in ma-rā āsheq-āne dar āqush gereft
‘he hugged me very tenderly’ (lit. ‘lover-like’). The other suffixes -ān (after verbal stems)
and -aki (after adjectives) are also used in formal and colloquial Persian respectively, as in
ānjā rā khand-ān tark kard ‘he left there smiling’, and yavāsh-aki farār kard ‘he fled
quietly’. Adding the Arabic tanvin to some nouns and adjectives may also yield adverbs,
such as rasm-an ‘officially’, shakhs-an ‘personally’, jam’-an ‘all together’, and movaqqat-an
‘temporarily’. Prepositional phrases also act as adverbs, such as be sor’at az ānjā raft ‘he
left there quickly’ (lit. ‘with quickness’), or be khub-i kār rā be pāyān res-ānd ‘he finished
the work well’ (lit. ‘with goodness’).
10.4 Persian verbal morphology
10.4.1 The verb
Persian has a very regular verbal morphology. Verbs have two stems each, which are
conjugated in terms of three persons and two numbers. In terms of their structures,
Persian verbs are either simple, prepositional, or compound. The stems are either present
or past, and there is no stem for the future, as it basically has a periphrastic structure.
The past stem is the infinitive without the final -an; most past stems therefore end
in either -t or -d, as in bord-an ‘to take’, neshast-an ‘to sit’, and āmad-an ‘to come’.
Deriving the present stem from the infinitive is not as regular as deriving the
past stem. Still, some regularities may be seen: a past stem ending in -id
generally yields the present stem once -id is deleted, as in bor-id-an ‘to cut’,
fahm-id-an ‘to understand’ and bakhsh-id-an ‘to forgive’, in which borid-, fahmid- and bakhshid-
are the past stems, and bor-, fahm- and bakhsh- are the present stems. The suffix -id is the most
productive and innovative of its kind: added to nouns, it has yielded
many denominal (or ja’li ‘forged’) verbs such as jang-idan ‘to fight’ (jang ‘war’),
raqs-idan ‘to dance’ (raqs ‘dance’), and bus-idan ‘to kiss’ (bus ‘kiss’).
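The two derivations described above (past stem by dropping -an, present stem by dropping -id) can be sketched as follows. This is a hedged illustration, not a complete stemmer: the function names are hypothetical, infinitives are passed without hyphens, and a small lookup table of irregular verbs drawn from this section is consulted first, because verbs such as didan or āfaridan end in -id-an without following the deletion rule.

```python
# Present stems that must be memorized, taken from the list in this section.
IRREGULAR_PRESENT = {
    "didan": "bin", "āfaridan": "āfarin", "shenidan": "shenav",
    "kardan": "kon", "bordan": "bar", "raftan": "rav",
}

def past_stem(infinitive: str) -> str:
    """Past stem = infinitive minus the final -an."""
    if not infinitive.endswith("an"):
        raise ValueError(f"not an infinitive in -an: {infinitive}")
    return infinitive[:-2]

def present_stem(infinitive: str) -> str:
    """Irregular lookup first, then the regular -id deletion."""
    if infinitive in IRREGULAR_PRESENT:
        return IRREGULAR_PRESENT[infinitive]
    stem = past_stem(infinitive)
    if stem.endswith("id"):
        return stem[:-2]
    # Other verbs are not predictable in this sketch.
    raise LookupError(f"present stem of {infinitive} must be listed")

print(past_stem("fahmidan"), present_stem("fahmidan"))  # fahmid fahm
print(present_stem("jangidan"))                         # jang
```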
Apart from -id, the other ‘past stem morphemes’ attached to the present stem are -ād and
-d, as in oft-ād-an ‘to fall’ (pr. stem oft-, ps. stem oftād-) and nah-ād-an ‘to put’ (pr. stem
nah-, ps. stem nahād-). Examples for -d are setān-d-an/ setān- ‘to get’, kan-d-an/ kan- ‘to
dig’, khor-d-an/ khor- ‘to eat’, parvar-d-an/ parvar- ‘to raise’, āvar-d-an/ āvar- ‘to bring’ and
rān-d-an/ rān- ‘to drive’.
The ‘irregular’ verbs, in which the past stem is not derived from the
present stem, are not many. Efforts have been made to regularize them according to their
final consonant clusters, but the rules have almost as many exceptions
as regular cases. Therefore, it may be more appropriate to treat them as
irregular items to be memorized.
The only Persian verb with two entirely distinct lexical items for its past and present roots
is didan ‘to see’, with bin- and did- as its two roots. The other verbs show some similarities
between their roots. Some of the most frequently used ones are as follows.
zad-an/ zan- ‘to hit’, āfarid-an/ āfarin- ‘to create’, shenid-an/ shenav- ‘to hear’, dād-an/ dah-
‘to give’, kard-an/ kon- ‘to do’, bord-an/ bar- ‘to take’, mord-an/ mir- ‘to die’, dukht-an/ duz-
‘to sew’, rikht-an/ riz- ‘to pour’, forukht-an/ forush- ‘to sell’, shenākht-an/ shenās- ‘to know’,
khāst-an/ khāh- ‘to want’, bast-an/ band- ‘to close’, shekast-an/ shekan- ‘to break’,
neshast-an/ neshin- ‘to sit’, dāsht-an/ dār- ‘to have’, kāsht-an/ kār- ‘to plant’, kosht-an/ kosh-
‘to kill’, gasht-an/ gard- ‘to search’, bāft-an/ bāf- ‘to knit’, shemord-an/ shomār- ‘to count’,
gereft-an/ gir- ‘to get’, raft-an/ rav- ‘to go’, goft-an/ gu- ‘to say’, āmad-an/ ā(y)- ‘to come’, and
shod-an/ shav- ‘to become’.
10.4.2 Conjugation elements of the verbs
Verbs are formed on the basis of the stems, and different affixes yield
different tenses. The infinitive without the ending -an provides the past stem,
which is also known as the ‘truncated infinitive’: a non-finite form that may
be used in impersonal constructions, as well as in the periphrastic future construction
khāh-am raft ‘I will go’ (raft-an ‘to go’).
10.4.2.1 Verbal Stems

The simple past tense or the preterite, is shaped with the addition of the personal endings
to the past stems: did-am ‘I saw’ (did-an ‘to see’). Participles (either past or passive), are
formed as a result of adding -e to the past stem: nevesht-e ‘written, having been
written’ (nevesht-an ‘to write’). These participles also serve as verbal adjectives.
The present stem forms the present verbs, in their indicative, imperative and subjunctive
forms: mi-bar-i ‘you take’; be-bar-i ‘(that) you take’; be-bar ‘take!’ Agentive nouns and
present participles are also formed on the present stem, with the suffixes -ande and -ān:
khān-ande ‘reader, singer’ (khān-d-an ‘to read, to sing’), khand-ān ‘laughing’ (khand-id-an ‘to
laugh’).
10.4.2.2 Prefixes
Verb stems receive three types of prefixes: the indicative mi-, the subjunctive be-, and the
negative na-/ ne-. These prefixes form the first syllable of the verb and receive
the primary stress, in contrast to nouns, in which the stress falls
basically on the final syllable.
1) mi-
The prefix mi- is attached to the past stem to produce the imperfect past: mardom
esterāhat mi-kard-and ke nāgahān zelzele shod ‘people were taking a rest when suddenly
an earthquake took place’; habitual actions in the past: man be dāneshgāh-e tehrān mi-raft-am
‘I used to go to the University of Tehran’; and counterfactual
conditionals: agar mi-tavānest-am, hatman az injā mi-raft-am ‘If I could, I would surely
leave here’.
With the present stem, mi- denotes the general present: āb dar sad daraje mi-jush-ad ‘water
boils at 100 degrees’. This may encompass habitual actions such as har ruz sar-e kār
mi-rav-am ‘I go to work every day’. It also denotes an action in the future: fardā bārān
mi-bār-ad ‘It will rain tomorrow’.
2) be-
The prefix be- (with its allomorphs bi- and bo-) is now the marker of imperative and
subjunctive verbs. However, in classical Persian, it used to attach to the past stems too, in
order to make preterite, as in be-raft-am ‘I went’. This usage is no longer practised.
Imperative verbs require be- before the present stems, as in be-neshin ‘sit!’, and be-khān
‘read!/ sing!’ For the subjunctive mood, it is placed before the stem of the conjugated
verb, i.e. it replaces mi- in the present verb: be-rav-am ‘(that) I go’, be-band-and ‘(that)
they close’. In the compound verbs, be- is added to the verbal part of the verb, or the light
verb: jiq be-kesh-im ‘(that) we scream’ (jiq keshidan ‘to scream’, lit. ‘scream-pull’). Also, in
the prefixing verbs, it gets attached to the stem, as in dar-bi-āvar ‘take out!’ The prefixing
verbs having bar- as their prefixes, however, do not get be- either for the imperative
forms, or the subjunctive forms: bar-dār ‘pick up!’ (bar-dāsht-an ‘to pick up’, and not
*bar-be-dār). The prefix be- appears as bi- before the vowels a and ā, as in bi-āvar ‘bring!’
(āvar-d-an ‘to bring’), bi-andāz ‘drop!’ (andākht-an ‘to drop’). It also appears as bo- before the
back vowels o and u, as in bo-kon ‘do!’, bo-khor ‘eat!’, and bo-ro ‘go!’, although not
always: exceptions such as be-bor ‘cut!’ are also seen.
3) na-
The negative prefix na-/ne- is the sole method of making a verb negative. Na- is prefixed
to the past stem, and ne- to the present verb, i.e. it supplements mi- in the present, and
precedes the past stem, as in ne-mi-khor-am ‘I do not eat’ vs. na-khord-am ‘I did not eat’.
With respect to the compound tenses such as the pluperfect and the periphrastic future,
it is again placed at the beginning of the verb, such as na-raft-e bud-am ‘I had not gone’,
and na-khāh-am raft ‘I will not go’. But with respect to the compound verbs (where there
is a light verb together with a nominal part), it is attached to the verbal part, as in komak
ne-mi-kon-am/ komak na-kard-am ‘I do not/ did not help’.
As to the subjunctives and the imperatives, na- is attached to the present stem, as in na-
bar ‘don’t take away’ (bord-an/ bar ‘to take away’), and agar na-bar-i ‘if you do not take …
’. In colloquial Persian, this prefix may also be attached to the second person present
subjunctive, in order to denote emphasis on the order: na-zan ‘don’t hit!’, na-zan-i ‘don’t
you hit!’
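The placement and shape of the negative prefix described in this subsection can be summarized in a minimal Python sketch. The function name `negate` and its `nominal` parameter are hypothetical. The sketch assumes hyphen-segmented input as used in this chapter, treats the nominal part of a compound verb as a separate argument, and only strips the plain be- form of the subjunctive/imperative prefix (not its allomorphs bi- and bo-).

```python
def negate(verb: str, nominal: str = "") -> str:
    """Negate a hyphen-segmented verb form.

    ne- is used before mi-, na- elsewhere. In compound tenses the prefix
    goes at the very beginning of the verb; in compound verbs it attaches
    to the verbal part, so the nominal part is passed separately.
    """
    if verb.startswith("be-"):
        # Negative subjunctives/imperatives attach na- to the bare stem.
        verb = verb[len("be-"):]
    prefix = "ne-" if verb.startswith("mi-") else "na-"
    negated = prefix + verb
    return f"{nominal} {negated}" if nominal else negated

print(negate("mi-khor-am"))                # ne-mi-khor-am
print(negate("raft-e bud-am"))             # na-raft-e bud-am
print(negate("kard-am", nominal="komak"))  # komak na-kard-am
```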
10.4.2.3 Endings
The personal endings attached to the stems of the verbs are six, and are almost identical
for the past and present stems (except in the third-person singular). These endings
suffice to indicate the person, so the independent pronouns may be dropped, or
used with an emphatic function: man mi-rav-am, to be-mān. ‘I go, you (may) stay’. The
verbal endings and their different types are summarized in Table 10.4.
Table 10.4 Verbal endings in Persian
Endings               1sg.     2sg.    3sg.    1pl.     2pl.     3pl.
Present stem          -am      -i      -ad     -im      -id      -and
Imperative                     Ø                        -id
Past stem             -am      -i      Ø       -im      -id      -and
Perfect stem/copula   =am      =i      =ast    =im      =id      =and
Existential verb      hast-am  hast-i  hast-Ø  hast-im  hast-id  hast-and
10.4.3 Absolute tenses: past and present
The simple past tense basically refers to an action accomplished in the past, as in diruz
ketāb-i khar-id-am ‘I bought a book yesterday’. However, it may also refer to something
almost done, or in the process of completion: bachche-hā ostād āmad ‘Hey kids, the
professor is coming’ (āmad ‘came’), komak konid, khāne-am sukht ‘help me, my house is
burning’ (sukht ‘burnt’). Conversely, the past may also be denoted by present verbs (see
Mahmoodi-Bakhtiari 2002). In colloquial Persian, some uses of the present for
past events are recognized, such as the ‘historical present’ in tā ān jā barāyetān goftam
ke dozd vāred-e manzel mi-sh-e o yek nafar ro mi-kosh-e … ‘I told you, up to the point
when the thief enters the house and kills a person … ’, and in some complaints about an
action in the past: bachche-ye man=o mi-zan-i? neshun=et mi-d-am! ‘You hit my child? I
will show you (the consequences)!’
10.4.4 Relative tenses (compound tenses)
There are two tenses which may be regarded as compound tenses: The perfect, and the
double compound past (pluperfect) tense. These tenses indicate the epistemic events, i.e.
the viewpoint of the speaker is at work in the truth value of the action denoted. The
indicative (present) perfect tense is formed by the past participle followed by the enclitics
denoting the verb ‘to be’, such as gofte=am, gofte=i, gofte=ast, gofte=im, gofte=id,
gofte=and ‘I (have) said, you (have) said … ’. In standard Persian, the stress falls on the
final syllable of the past participle, but when negated (na-gofte=am), the negative prefix
receives the stress. This tense denotes actions which have taken place in the past, with
implications in the present: polis khiyābān rā baste=ast ‘the police have blocked the
street’; an action whose effects or traces remain: dārolfonun rā amirkabir
sākht-e ast ‘Amir Kabir has built Dār ol-Fonun’; the presumption of an action that happened in the past:
ehtemālan mājarā rā be u gofte=and ‘Perhaps they have told him about the event’; or
even the presumption of an action that the speaker is sure will happen in the future: vaqti to
beresi, man az gorosnegi morde=am ‘When you arrive, I will have already starved to
death’.
Semantically and syntactically parallel to the present subjunctive is the perfect
subjunctive, which denotes an unrealized or desired state of an action. It is formed by the
past participle together with the subjunctive form of budan
(bāsh-): enshallāh resid-e bāsh-and ‘God willing they have arrived’, bāyad tā hālā
tamām shode bāshad ‘It should have been finished by now’.
The past perfect, on the other hand, is formed with the past participle plus the past tense of
budan: rafte bud-am ‘I had gone’, koshte bud-and ‘They had killed’. Denoting an action completed
before another action mentioned, it deals with real, unassumed facts: u rā qabl az
ezdevāj=emān na-did-e budam ‘I had not met her before our marriage’. The stress
pattern of the past perfect is the same as the present perfect.
There has also been a ‘durative perfect’ tense in Persian, which is not very commonly
used in current Persian. Indicating an action considered in its duration in the past, with
implications in the present, this tense is shaped by mi- attached to the present perfect
tenses: ānhā injā zendegi mi-karde=and ‘they used to live here’. The other compound
tense is the currently extinct ‘double compound past tense’, referring to the actions
completed in the past, and being reported in the present time. It is shaped with the past
participle plus the perfect tense of budan: raft-e bude=am ‘I had already gone’.
The last compound tense is the so-called future tense, which basically has a periphrastic
structure. It should be noted that this structure is used for emphasis or in
formal speech; in other cases, speakers of Persian tend to use the
present tense (with future adverbials) to refer to the future. This
tense consists of the present of the verb khāstan ‘to want’ (without the prefix mi-), together with
the truncated infinitive of the main verb, as in khāh-am raft ‘I will go’ (raftan ‘to go’),
khāh-i did ‘you will see’ (didan ‘to see’). For the compound verbs, khāstan appears before
the light verb: kār khāh-im kard ‘we will work’ (kār kardan ‘to work’); and in the prefixing
verbs, it is attached to the main verb, after the prefix: bar khāh-am gasht ‘I will
return’ (bar-gashtan ‘to return’). Negative prefix na- is attached to khāstan in all the
cases.
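The periphrastic future described above lends itself to a compact illustration. The Python sketch below assembles the form from the present of khāstan (without mi-) and the truncated infinitive; the function name `future`, the person labels, and the `preverb` parameter (covering both the nominal part of a compound verb and the prefix of a prefixing verb) are assumptions made for this example.

```python
# Present-stem personal endings, as in Table 10.4.
ENDINGS = {"1sg": "am", "2sg": "i", "3sg": "ad",
           "1pl": "im", "2pl": "id", "3pl": "and"}

def future(truncated_infinitive: str, person: str,
           preverb: str = "", negative: bool = False) -> str:
    """preverb + (na-)khāh-ENDING + truncated infinitive."""
    aux = ("na-" if negative else "") + "khāh-" + ENDINGS[person]
    # khāstan follows the nominal part or prefix and precedes the main verb.
    return " ".join(p for p in (preverb, aux, truncated_infinitive) if p)

print(future("raft", "1sg"))                 # khāh-am raft
print(future("kard", "1pl", preverb="kār"))  # kār khāh-im kard
print(future("raft", "1sg", negative=True))  # na-khāh-am raft
```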
Table 10.5 provides the basic paradigms for the verb bordan ‘to take’.
And Table 10.6 provides the different aspectual, modal, and temporal forms of the verb
bordan ‘to take’.
10.4.5 Changing the valence of the verbs: passive voice and causatives
The passive form of a verb is formed from the past participle of the verb, together with
the appropriate tense of the verb shodan ‘to become’. For example, the
present passive is formed with the present tense of shodan: ān melk kharid-e
mi-shav-ad ‘That property is (being) bought’; past passive: u dar jang kosht-e shod-Ø ‘he was
killed in the war’; future passive: ānjā forukht-e khāh-ad shod ‘That place will be sold’;
present subjunctive passive: agar kosht-e shav-ad heif ast ‘It is a pity if he gets killed’,
and the like.
Table 10.5 Basic paradigms for the verb bordan ‘to take’
Present Preterite Imperfect Present perfect Past perfect
1s mi-bar-am bord-am mi-bord-am bord-e=am bord-e bud-am
2s mi-bar-i bord-i mi-bord-i bord-e=i bord-e bud-i
3s mi-bar-ad bord-Ø mi-bord-Ø bord-e=ast bord-e bud-Ø
1p mi-bar-im bord-im mi-bord-im bord-e=im bord-e bud-im
2p mi-bar-id bord-id mi-bord-id bord-e=id bord-e bud-id
3p mi-bar-and bord-and mi-bord-and bord-e=and bord-e bud-and
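The regularity of Table 10.5 can be made explicit by generating it from the two stems and the endings of Table 10.4. The following Python sketch is illustrative only: the function name `paradigm` and the dictionary layout are assumptions, and verbs with special behaviour (such as dāshtan, which takes no mi- in the present) are not handled.

```python
PERSONS = ["1s", "2s", "3s", "1p", "2p", "3p"]
PRESENT_ENDINGS = ["-am", "-i", "-ad", "-im", "-id", "-and"]
PAST_ENDINGS = ["-am", "-i", "-Ø", "-im", "-id", "-and"]
COPULA = ["=am", "=i", "=ast", "=im", "=id", "=and"]

def paradigm(present_stem: str, past_stem: str) -> dict:
    """Build the five basic tenses for every person from the two stems."""
    table = {}
    for person, pres, past, cop in zip(PERSONS, PRESENT_ENDINGS,
                                       PAST_ENDINGS, COPULA):
        table[person] = {
            "present": f"mi-{present_stem}{pres}",
            "preterite": f"{past_stem}{past}",
            "imperfect": f"mi-{past_stem}{past}",
            "present perfect": f"{past_stem}-e{cop}",
            "past perfect": f"{past_stem}-e bud{past}",
        }
    return table

for person, forms in paradigm("bar", "bord").items():
    print(person, *forms.values())
```

Run on bar-/bord-, the loop prints the same six rows as Table 10.5.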
Table 10.6 Aspects, moods, and tenses: bordan ‘to take’
                  Indicative         Non-indicative
Imperfective:
                                     be-bar           Imperative (2sg.)
  Present         mi-bar-ad          be-bar-ad        Subjunctive
  Imperfect       mi-bord            mi-bord1         Conditional
  Evidential      mi-bord-e ast
Aorist:
  Preterite       bord-Ø
  Evidential      bord-e ast
Present perfect   bord-e ast         bord-e bāsh-ad   Subjunctive
Past perfect      bord-e bud-Ø       bord-e bud       Conditional
  Evidential      bord-e bud-e ast
Affirmative future: khāh-ad bord
On the other hand, causatives are formed by suffixing -ān to the present stem of the verb,
as in fahm-id-an ‘to understand’, fahm-ān-(i)d-an ‘to convey, to teach’; ras-id-an ‘to arrive’,
ras-ān-(i)d-an ‘to bring along, to give a lift’. For more discussion on passive voice and
causative, see Chapter 7.
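Causative formation is simple enough to state as a one-line rule, sketched here in Python. The function name and the `long_form` flag (choosing between -ān-d-an and -ān-id-an, i.e. the optional (i) in the examples above) are hypothetical.

```python
def causative_infinitive(present_stem: str, long_form: bool = False) -> str:
    """Present stem + -ān + -d/-id + -an, e.g. fahm- -> fahm-ān-d-an."""
    past_marker = "id" if long_form else "d"
    return f"{present_stem}-ān-{past_marker}-an"

print(causative_infinitive("fahm"))                 # fahm-ān-d-an
print(causative_infinitive("ras", long_form=True))  # ras-ān-id-an
```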
10.4.6 Specific verbs and their morphological properties
There are some specific verbs in Persian which do not necessarily follow the normal
patterns of verbal morphology in Persian, and deserve more attention. In the parts to
come, we will study them in more detail.
10.4.6.1 ‘To be’, ‘to have’, and the auxiliary verbs
1) budan ‘to be’
The verb budan ‘to be’ has several forms, many of which do not resemble its infinitive.
Normally, the verb ‘to be’ is expressed by the copula enclitics =am, =i, =ast, =im, =id,
and =and, attached to the predicate noun or adjective; these make the independent pronoun optional, so that it is kept only for
emphasis: (man) mariz=am ‘(I) am sick’, (to) hasan=i ‘(you) are
Hassan’, (u) dānā=ast ‘he is wise’, (mā) āzād=im ‘(we) are free’, (shomā) mahkum=id
‘(you) are condemned’, (ānhā) qātel=and ‘(they) are murderers’. The existential and the
subjunctive forms of budan are conjugated on the basis of the stems hast- and bāsh-, for
which no separate infinitive is assumed: man hamishe āsheq-e to hast-am ‘I always love
you’, lit. ‘I am always your lover’, chand nafar dar mehmān-i hast-and? ‘How many people
are there in the party?’, agar ānjā bāsh-ad, hatman hamān-jā mi-mān-ad ‘If he is there, he
will definitely stay right there’. The negative is also idiosyncratic: it takes the form ni-st-
together with the personal endings, i.e. ni-st-am, ni-st-i, ni-st-Ø, ni-st-im,
ni-st-id, ni-st-and. The cluster -st- is surely the shortened form of hast or =ast, which also
appears when the copula is attached to a word ending in a long vowel: in rezā=st ‘this is
Reza’, ketāb ān bālā=st ‘the book is up there’.
The imperative is formed via bāsh-, as in sāket-bāsh ‘be quiet’ (sg.), and hamin-jā bāsh-id
‘stay right here’, lit. ‘be right here’ (pl.). Although bāsh- is also the basic stem for the
subjunctives and conditionals, the simple past of budan may be used to denote the
counterfactual conditional states as well, such as kāsh man shohar=at bud-am ‘I wish I
were your husband’, agar āqel bud-i, in-jā ne-mi-mānd-i ‘If you were wise, you would not
stay here’.
The verb budan forms a participle like the other verbs, bud-e, and the
perfect forms are conjugated on that basis: bud-e=am, bud-e=i, bud-e=ast,
etc. However, there is no pluperfect for this verb, either indicative or subjunctive.
2) dāshtan ‘to have, hold, keep’
The verb dāshtan/ dār- is a rather complicated and exceptional verb in Persian. It is one of
those verbs with distinct past and present stems (dāsht- and dār- respectively). It does not
get the present prefix mi- in its present conjugation (dār-am, dār-i, dār-ad, etc.), and as a
transitive verb, it does not undergo the process of passivization. Its imperative form is not
(or rather, is no longer) lexically made with the addition of be- to the present stem as dār
or be-dār; instead it has the periphrastic form using the imperative form of the verb
budan, together with its past participle: dāsht-e bāsh ‘have!’ The periphrastic structure is
also used for the subjunctive mood, receiving the personal endings, as in dāsht-e bāsh-id
‘(that) you have’: barāye in sākht-emān, bāyad kheili pul dāsht-e bāsh-id ‘You need so
much money for this building’, agar pul-e kāfi dāsht-e bāsh-am, hatman be to komak mi-
kon-am ‘If I have enough money, I will surely help you’.
The negative marker for both the past and present stems is na-, which attaches to the
stems: na-dār-am ‘I do not have’, na-dāsht-i ‘you did not have’, na-dāsht-e bāsh ‘Do not
have!’
However, when the verb dāsht-an forms a part of compound or prefixing verbs without its
base meaning, the present prefix mi- is attached to the stem, as in ketāb rā bar-mi-dār-am
‘I pick up the book’ (bar-dāshtan ‘to pick up’, bar ‘up, over’), ān-hā rā hamin-jā negah-mi-
dār-im ‘We keep them right here’ (negah-dāshtan ‘to stop’, negah ‘look’). The imperative
will also be made out of the present stem, as in bar-dār ‘take!’, and negah-dār ‘stop!’. The
negative marker appears as na- for the past, present, and the imperative of the
compounds, as in dust=ash na-dār-am/ dust=ash na-dāsht-am ‘I do not/ did not love
him’, ān rā bar na-dār ‘do not pick it up!’, and u rā dar in sharāyet negah na-dār-id ‘do not
keep him in such conditions’. The other allomorph of na- (ne-) is seen before the prefix mi-
in bar-ne-mi-dār-am/ bar-ne-mi-dāsht-am ‘I do not/ would not pick (it) up’.
The verb dāsht-an is also used in a progressive construction in standard colloquial
Persian, which is confined to positive statements and is not found in literary or
formal language. The construction consists of the verb dāsht-an preposed to the indicative
imperfective forms, no matter if it is a simple, prepositional, or compound verb; such as
dār-ad mi-rav-ad ‘(S)he is going now/ is about to go’, dār-am bar-mi-gard-am ‘I am
returning, about to return’, and dārand shenā mi-konand ‘They are swimming’. The past
progressive is formed, as expected, with the past tense of dāshtan, so that the above
sentences become dāsht-Ø mi-raft-Ø ‘(S)he was going/ was about to go’, dāsht-am
bar-mi-gasht-am ‘I was returning, was about to return’, and dāsht-and shenā mi-kard-and
‘They were swimming’.
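The progressive construction can be pictured as a simple concatenation: the auxiliary dāshtan is conjugated for the same person and tense as the main verb and placed before the mi- form. A minimal Python sketch follows, assuming only the three personal endings attested in the examples above; the function name and the person labels are hypothetical.

```python
# Endings as they appear in the chapter's examples (1sg, 3sg, 3pl only).
PRESENT_ENDINGS = {"1sg": "am", "3sg": "ad", "3pl": "and"}
PAST_ENDINGS = {"1sg": "am", "3sg": "Ø", "3pl": "and"}


def progressive(main_verb, person, past=False):
    """Prepose conjugated dāshtan to an already conjugated mi- form.

    `main_verb` is the imperfective form, e.g. "mi-rav-ad".
    """
    stem = "dāsht" if past else "dār"
    endings = PAST_ENDINGS if past else PRESENT_ENDINGS
    return f"{stem}-{endings[person]} {main_verb}"


print(progressive("mi-rav-ad", "3sg"))                   # dār-ad mi-rav-ad
print(progressive("bar-mi-gasht-am", "1sg", past=True))  # dāsht-am bar-mi-gasht-am
```

The sketch does not model the restriction to positive statements; a fuller version would have to reject negated input.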
3) Modal auxiliaries bāyad ‘must’ and shāyad ‘may’
The two auxiliaries bāyad and shāyad usually take verbs in the subjunctive mood:
bāyad be-rav-id ‘you must go’, shāyad fardā bārān be-bār-ad ‘It may rain tomorrow’.
Although bāyad and shāyad look similar in some respects, they are basically different in
terms of their origins. The auxiliary bāyad comes from the Middle Persian word abāyēd
with almost the same meaning, while shāyad derives from the verb shāyestan ‘to be
fitting, to be qualified’, which used to be conjugated as well. Bāyad can take a negative
prefix, but shāyad cannot, although it could in Classical Persian: na-bāyad be ānjā
be-rav-i ‘you mustn’t go there’. Bāyad also precedes past progressives, to denote an
unfulfilled obligation: bāyad govāhināme mi-gereft-am ‘I ought to have got a driver’s
licence’.
The other function of bāyad may be seen in impersonal constructions. Together with
the verbs tavānestan ‘to be able to’ and shodan ‘to become’, bāyad takes verbs in
their truncated forms and serves for all persons: bāyad raft ‘one should go’, mi-tavān
goft ‘one can say’, mi-shav-ad gorikht ‘one may escape’. For a detailed study of the
functions of bāyad and shāyad, see Windfuhr (1979: 99–113), and for a study of their
internal morphology, see Mahmoodi-Bakhtiari (2008).
10.4.7 Verbal derivations
Present stems receive several suffixes and produce different deverbal nouns (or
adjectives). The major derivational suffixes attached to the present stems are the
nominalizers -i in bāz-i ‘play’ (bākht-an ‘to lose’), -esh in rav-esh ‘method’ (raftan ‘to go’),
kush-esh ‘effort’ (kush-id-an ‘to try’), and bin-esh ‘view’ (did-an ‘to see’); -(e)mān in zāy-
emān ‘delivering the baby’ (zā-d-an ‘to give birth’), sāz-mān ‘organization’ (sākht-an ‘to
make’), and the nominalizer -āk, which is found in a few instances such as khor-āk
‘food’ (khor-d-an ‘to eat’), push-āk ‘clothes’ (push-id-an ‘to dress’), and suz-āk
‘gonorrhea’ (sukht-an ‘to burn’); the agentive marker -ande in dav-ande ‘runner’ (dav-id-
an ‘to run’) and khān-ande ‘singer, reader’ (khān-d-an ‘to sing, to read’); and the adjective
markers -ā in tavān-ā ‘mighty’ (tavān-est-an ‘to be able’), dān-ā ‘knowing’ (dān-est-an ‘to
know’); and -ān in khand-ān ‘laughing’ (khand-id-an ‘to laugh’). There are also several
derivational suffixes which attach to the past stems. The major ones include -e, which
produces verbal adjectives and passive participles, like nevesht-e ‘written
(text)’ (nevesht-an ‘to write’), and sukht-e ‘burnt’ (sukht-an ‘to burn’); and the nominal marker
-ār in khar-id-ār ‘buyer’ (khar-id-an ‘to buy’) or nevesht-ār ‘(written) text’ (nevesht-an ‘to
write’), which acts as an agentive marker in the former example and forms a concrete noun in
the latter.
The infinitive also takes the adjective-forming suffix -i, which denotes the potential of the
content of the verb, for example khord-an-i ‘edible’ (khord-an ‘to eat’) and did-an-i ‘sight to
see’ (did-an ‘to see’). Sometimes it denotes the subject’s willingness to do something, as in
mehmān-hā-ye mā raftan-i nistand, mānd-an-i hastand ‘Our guests do not seem to be
willing to go, they would stay’ (lit. ‘they are not going-like, they are staying-like’).
10.5 Compounding
Persian makes extensive use of compounding in its word formation. In the following
sections, we deal with compounding in two parts: nominal compounding and verbal
compounding.
10.5.1 Compounding in nominal morphology
Compounding in Persian nominal morphology takes place when (usually) two lexical
morphemes are placed adjacent to each other and form an entirely new word. These
compounds differ from phrases in that they act just like a single
word: they do not admit any inflectional or derivational affix between their components, and
the plural marker -hā has to attach to the whole word, not just to a part of it. Persian
compounds fall into two major classes: non-syntactic and syntactic. The non-syntactic
compounds are those between whose components there is no syntactic relationship,
such as arre-māhi ‘swordfish’ (lit. ‘sword-fish’), kot-shalvār ‘formal suit’ (lit. ‘jacket-
pants’), gāv-mish ‘buffalo’ (lit. ‘cow-ewe’), sine-pahlu ‘pneumonia’ (lit. ‘chest-side’), and
mādar-bozorg ‘grandmother’ (lit. ‘mother-big’). These compounds also encompass
reduplications such as rāh-rāh ‘striped’, tond-tond ‘quickly’, hezār-hezār ‘in thousands’,
kam-kam ‘gradually’, and tekke-tekke ‘in pieces’.
Besides these, irreversible binomials, or as Perry (2007) states, ‘copulative compounds’,
are those compounds in which the components are of almost equal weight and are
joined via -o-, such as gard-o-khāk ‘dust’ (lit. ‘dust-and-soil’), kār-o-kāsebi ‘business’ (lit.
‘work-and-business’), kafn-o-dafn ‘burying ceremony’ (lit. ‘shroud-and-bury’). These words
may sometimes contain meaningless items, such as bar-o-bachche-hā ‘guys’ (lit. ‘?-and-
kids’), āt-o-āshqāl ‘junk’ (lit. ‘?-and-rubbish’). The meanings of such compounds, however,
are not always as clear as in these examples. Some opaque compounds of this
kind are: gorg-o-mish ‘dawn’ (lit. ‘wolf-and-ewe’), cheshm-o-cherāq ‘dear’ (lit. ‘eye-and-
lamp’), shākh-o-shāne ‘threat’ (lit. ‘horn-and-shoulder’).
Another type of non-syntactic compound is what Shaki (1964: 18) refers to as ‘Semi-syntactical
bound phrases’, that is, word-groups intermediate between the syntactical
and non-syntactical word-groups, such as dast-be-jib ‘wealthy’ (lit. ‘hand-in-pocket’),
halqe-be-gush ‘obedient’ (lit. ‘ring-in-ear’), dast-be-asā ‘cautious’ (lit. ‘hand-to-
stick’), and khāk-bar-sar ‘miserable’ (lit. ‘dust-on-head’).
On the other hand, there are compounds in which a syntactic relationship is traced
between the components, such as lebās-khāb ‘nightgown’, shāgerd-avval ‘top student’ and
pesar-amu ‘uncle’s son’ (which are all lexicalized Ezafe constructions). Sometimes the
compound is ‘possessive’, that is to say, the result of the compound is an attribute of, or
possessed by, a third party, such as gardan-koloft ‘powerful’ (lit. ‘thick-neck’), āstin-kutāh
‘short-sleeved’ (lit. ‘sleeve-short’), and pāshne-boland ‘high heels’ (lit. ‘heel-tall’). When a
part of the compound is a verbal stem, there might be several syntactic relationships
between the noun and the verbal stem adjacent to it. These relations may be of the
following types: nominative, as in naft-khiz ‘oil centre’ (lit. ‘oil-rising’); accusative, as in
bomb-afkan ‘bomber’ (lit. ‘bomb-drop’); dative, as in āyande-negar ‘provident’ (lit. ‘future-
look’); adverbial, as in gerān-forush ‘expensive-seller’. Some common verbal stems used in
compounds are: -ālud or -ālu ‘polluted, stained (with)’, from āludan/ālā- ‘to pollute’: khun-
ālud ‘bloody’, gusht-ālu ‘plump’; -āmiz ‘mingling (with)’, from āmikhtan ‘to mix’: mehr-
āmiz ‘kind’; -angiz ‘arousing’, from angikhtan ‘to stimulate’: del-angiz ‘pleasant’; -āvar
‘bringing’, from āvardan ‘to bring’: khāb-āvar ‘boring’ (lit. ‘sleep-bringer’); -bakhsh ‘giver’,
from bakhshidan ‘to bestow’: ārām-bakhsh ‘tranquilizer’; -pazir ‘accepting’, from
paziroftan ‘to accept’: bāvar-pazir ‘believable’; -kosh ‘killing’, from koshtan: hashare-kosh
‘insecticide’; -khār ‘eater’, from khordan: giyāh-khār ‘vegetarian’.
We may also consider the adjectival compounds, in which adjectival phrases that would
normally appear in Ezafe constructions are lexicalized, such as medād rang-i ‘coloured pencil’,
gowje sabz ‘greengage’ (lit. ‘tomato-green’), gis borid-e ‘shameless (girl)’ (lit. ‘wigs-cut’),
cheshm sefid ‘ungrateful’ (lit. ‘eye-white’), and siyāh bakht ‘miserable’ (lit. ‘black-
fortune’), of which the first two are nouns and the rest are adjectives, again as a result of
compounding.
10.5.2 Compounding in verbal morphology
As said before, Persian makes great use of compounding in both nominal and verbal
morphology. Compound verbal constructions may be classified into two
groups: those resulting from the process of compounding (where we have a noun together
with a light verb), and those made as a result of incorporation (in which the nominal part
is syntactically related to the verb, usually as a direct object). Incorporative verbs are
many, common and growing, such as qazā khordan ‘to eat food’ (lit. ‘food eating’), and
lebās pushidan ‘to get dressed’ (lit. ‘clothes dressing’) (see Dabir-Moghaddam 1997).
Compound verbs are further discussed in Chapters 2, 3, 7, 8, 9, 15, 17, and 19.
Compounding in Persian is historically new. Middle Persian did not have many complex
verbs, and the only light verb used in verbal construction was kardan ‘to do’. In the
course of time, the number of compound verbs grew significantly, to the extent that
simple verbs in Persian now number almost a hundred, many of which are not
very actively used, and many of which are used in their nominalized form with a light
verb. For example, it is very uncommon to hear the verbs geristan ‘to cry’, āvikhtan ‘to
hang’ or andudan ‘to coat’ any longer in casual speech; rather, the compounds gerye
kardan (lit. ‘cry-doing’), āvizān kardan (lit. ‘hanged-doing’) and andud kardan (lit. ‘coat-
doing’) are used.
Although the most commonly used verb in the verbal part of the compound verbs
is kardan, there are a number of other ‘light verbs’ as well, such as dādan ‘to give’ in pās
dādan ‘to pass (in sports)’; zadan ‘to hit’ in dād zadan ‘to cry’, raftan ‘to go’ in dar raftan
‘to escape, to be dislocated’, āvardan ‘to bring’, in kam āvardan ‘to get tired, to give up’,
dāshtan ‘to have’, in negah dāshtan ‘to stop, to keep (from moving)’, and khordan ‘to eat’
in yekke khordan ‘to be surprised’.
This method of verb formation is extremely productive, and is not restricted to the
Persian nouns in the compound. Loan words may also play a role in such verbs, such as
clik kardan ‘to click’, telefon kardan ‘to call’, āsfālt kardan ‘to asphalt’, shāns āvardan ‘to
be lucky’, and rofuze shodan ‘to fail (an exam)’. Many Arabic loan words which are
basically infinitives take the light verb kardan in order to be treated as Persian infinitives,
such as qat’ kardan ‘to cut’ (lit. ‘cut doing’), moqāvemat kardan ‘to resist’ (lit. ‘resistance
doing’), e’temād kardan ‘to trust’ (lit. ‘trust making’), emtahān kardan ‘to test’ (lit. ‘test
doing’), and eshtiyāq dāshtan ‘to be interested’ (lit. ‘interest having’).
10.6 Minor word-formation types in Persian

As we have noted in this chapter, inflection, derivation, and compounding are the major
word-formation methods in Persian. However, some minor word-formation processes,
mostly recent and for specific purposes, have been adopted in this language. Here is a
brief review of them.
10.6.1 Back-formation
Although back-formation (together with conversion) may be seen as closely related to
derivational affixation, we prefer to introduce them both here because of their
relatively low productivity.
Back-formation refers to a type of word formation in which a single word is considered to
be a derived one, thus a new root may be extracted out of it for further word-formation
purposes. Persian examples are bāz-bin ‘reviewer’ from bāz-bini ‘review’, baste-band
‘packer’ from baste-bandi ‘packaging’, morq-dār ‘aviarist’ from morq-dāri ‘aviculture’.
10.6.2 Conversion
Conversion (also known as zero-derivation or functional shift) is a common word
formation process in Persian. As for verb-oriented conversion, one cannot, of
course, expect verbs to be produced from other parts of speech as freely as
in a language such as English, since Persian verbs are conjugated and so always undergo
some change. However, verbal stems may be transformed into nouns, as in
the past stems bākht ‘loss’ (bākht-an/ bāz), dar-yāft ‘receipt’ (dar-yāft-an/ dar-yāb ‘to
understand, to realize, to receive’), and sākht ‘structure’ (sākht-an/ sāz ‘to make, to build’).
Newly used words such as borun-raft ‘exit’ (borun-raft-an ‘to go out’) and dir-kard
‘delay’ (dir-kard-an ‘to be late’) are based on compound verbs. Present stems may also act
as nouns, especially in bi-nominal forms, as in jonb-o-jush ‘activity’ (jonb-id-an ‘to move’,
jush-id-an ‘to boil’), pors-o-ju ‘query’ (pors-id-an ‘to ask’, jost-an ‘to search, to look for’).
Adjectives easily convert to adverbs, and there are plenty of such conversions, as khub
‘good, well’, sari’ ‘fast, fast’, zibā ‘beautiful, beautifully’, qalat ‘wrong, wrongly’, ārām
‘slow, slowly’.
Some nouns turn to adjectives in specific contexts, such as chaman ‘grass’, which is used
as an adjective in zamin-e chaman ‘grass field’; or an infinitive such as khord-an ‘to eat’ as
an adjective in āb-e khord-an ‘drinking water’. The same thing holds true for jādu ‘spell’ in
cherāq-e jādu ‘magic lamp’, and khun ‘blood’ in del-e khun ‘bloody heart, grieving heart’.
Sometimes, the names of the materials are used as adjectives. For example shalvār-e
katān ‘cotton pants’, medāl-e talā ‘golden medal’, jeld-e pelāstik ‘plastic cover’. Naturally,
all these items should receive an adjective marker -i.
10.6.3 Clipping
New Persian has accepted, to some extent, the process of clipping, that is, shortening
words while retaining their original meaning. As in some other languages (such as
English), Persian clipped words do not create lexemes with new meanings; rather, they
produce lexemes with a new stylistic value. Examples are motor (< motorsiklet
‘motorcycle’), super (< supermārket ‘grocery’), pās (< pāsport ‘passport’), sānt
(< sāntimetr ‘centimetre’), and gig (< gigābāyt ‘gigabyte’) among the loan words (see note 2), and dasti
(< tormoz-dasti ‘hand brake’), bar o bach (< bar o bachche-hā ‘guys, pals’), botr (< botri
‘bottle’; see note 3), lul (< lule ‘pipe’; see note 4), and qāt (< qāti ‘mixed, chaotic’) among the original
Persian words. Such clippings are not numerous and mostly belong to colloquial, informal
language; the material removed usually comes from the end of the word. Since the Persian nouns
take their stresses on their final syllables, naturally the place of the stress shifts after
clipping. This process is a favourite and growing one among the younger speakers, who
look for new methods of creating words for their argots. However, it seems unlikely that
this method will become a creative and lasting model for standard language in the long
run.
Clipping is even more noticeable in hypocoristics, or
pet names. The personal name khashāyār is clipped to khashā, while the name esfandiyār
may be clipped either to esfand or, further, to esi (as also happens to names such
as esmā’il); the first is the product of clipping, while the second may be
regarded as an embellished clipping. Embellished clippings are far more practised than
simple clippings in terms of Persian names, and may clip both the beginning and the end
of the word: kat-i (< katāyun) and dokh-i (<malek-dokht) (see Mahmoodi-Bakhtiari and
Shāhhoseini 2014).
10.6.4 Acronyms
Among the letter-based word-formation patterns, Persian has a strong tendency towards
producing acronyms. The major reason is that, since short vowels are not
written in Persian orthography, a string of letters can be read as
a word, usually by inserting the front vowels /a/ and /e/. That accounts for the
very limited use of initialism as a type of alphabetism in Persian, although some famous
words have been produced this way, such as alef-bā (alphabets) and sin-jim (interrogation,
based on the initial letters of so’āl-va-javāb ‘question and answer’).
The use of acronyms is fairly recent in Persian morphology, and some established words
of this kind are now current in the language, such as homā (transliterated HMA) for
Havāpeimāyi-ye Melli-ye Irān ‘Iranian national aviation’, and saman (transliterated SMN)
for sāzmān-e mardom-nahād ‘NGO’. However, acronyms are mostly used in the military
terminology of Persian, such as pahpād ‘UAV’ (transliterated PHPAD), formed from the
initial letters of the string parande-ye hedāyat pazir az dur ‘flying object directed from
far’, or nedājā (transliterated NDAJA), which refers to niru-ye daryā-yi-ye artesh-e jomhuri-ye
eslām-i-ye irān ‘The navy of the Islamic Republic of Iran’. Sometimes the acronym
process uses the one-letter conjunctions of the string as well, as in dāfus (transliterated
DAFVS) for dāneshkade-ye farmandeh-i va setād ‘College of commanding and logistics’;
and sometimes it retains a syllable together with the initial letters, as in marāposh for
markaz-e āmuzesh-e poshtibāni ‘Centre for provision instruction’.
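The plain initial-letter pattern behind homā, saman, and pahpād can be sketched in a few lines of Python. The Persian-script spellings below are supplied for illustration and are not given in the chapter; the function name is hypothetical. Because Ezafe and the other short vowels are not written, taking the first written letter of each word yields a pronounceable consonant skeleton. The sketch does not cover cases such as dāfus or marāposh, where more than one letter of a word is retained.

```python
def acronym(words):
    """Join the first written letter of each word in the phrase."""
    return "".join(word[0] for word in words)


# Persian-script spellings are assumptions added for this example.
homa = acronym(["هواپیمایی", "ملی", "ایران"])            # Havāpeimāyi-ye Melli-ye Irān
saman = acronym(["سازمان", "مردم", "نهاد"])              # sāzmān-e mardom-nahād
pahpad = acronym(["پرنده", "هدایت", "پذیر", "از", "دور"])  # parande-ye hedāyat pazir az dur

print(homa, saman, pahpad)
```

The three results correspond to the letter strings the chapter transliterates as HMA, SMN, and PHPAD.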
10.6.5 Blending
Persian blends, lexemes made out of phonological parts of two other words, are new
but growing. Examples are razmāyesh ‘manoeuvre’ (< razm-āzmāyesh ‘fight-test’), ābfā
‘Iranian water organization’ (< āb-fāzel-āb ‘water-sewage’), and tavānir ‘Iranian power
organization’ (< tolid va enteqāl-e niru ‘production and transfer of power’). Again, the
terms created by blending either represent themselves as colloquial words, or as
technical ones not widely used in the core of the language.
10.6.6 Analogy
Like so many other languages, Persian also uses analogy to form new
words. For example, the Turkish word doqlu is used to refer to
‘twins’ in Persian (with the pronunciation doqolu), and the terms se-qolu ‘triplets’ (se
‘three’) or chahār-qolu ‘quadruplets’ (chahār ‘four’) are formed by analogy with it.
This also holds true for the word kaffāsh ‘shoesmith’, which is simply analogous with the
Arabic occupation words najjār ‘carpenter’ or khayyāt ‘tailor’, while the word kafsh is
basically Persian, and its Arabic counterpart, na’l, is used solely for ‘horse shoe’.

Although minor types of word formation may not be linguistically very important, the fact
that they are showing the potential of productivity and creativity in the modern world
means that they have an increasing importance in the lexicon of modern Persian in terms
of the new, innovative forms they create. This chapter aims to provide the reader with a
general sketch of Persian morphology. However, many more systematic and corpus-based
studies are still required to provide a complete panorama of word-formation
processes in this language.
Notes:
(1) It is noteworthy that the structure mi-past stem-ending is used both to refer to a
continuous action in the past (such as mi-bord-am ‘I used to take’), and to an action in the
conditional clause ([agar] mi-bord-am ‘[if] I would take’). That is why this construction is
placed both as indicative and non-indicative in the table.
(2) For an up-to-date study of the English loan words in Persian, see Paraskiewicz (2015).
(3) The word botr refers to ‘bottle’ in the colloquial Persian, when referring to a
measuring scale, such as do botr viski ‘two bottles of whisky’.
(4) Lul also replaces lule when it refers to scales, such as do lul taryāk ‘two sticks of
opium’, or tofang-e do-lul ‘double-barrelled shotgun’.
Behrooz Mahmoodi-Bakhtiari
Behrooz Mahmoodi-Bakhtiari received his MA (1999) and PhD (2004) in Linguistics
from Allameh Tabatabaee University, Tehran, and is now an Associate Professor of
Linguistics and Persian at the Faculty of Fine Arts, University of Tehran. His major
publications include Tense in Persian (2002), Fārsi Biyāmuzim [Let’s Learn Persian]
(2004), and Persian for Dummies (forthcoming). He is also the author of numerous
articles on Persian linguistics, grammar, and its several dialects.

Lexicography

Lexicography
Seyed Mostafa Assi
Print Publication Date: Aug 2018 Subject: Linguistics, Lexicography, Languages by Region
The history of lexicography in Iran dates back to more than 2,000 years ago, to the time
of the compilation of bilingual and monolingual lexicons for the Middle Persian language.
After a review of the long and rich tradition of Persian lexicography, the chapter gives an
account of the state of the art in the modern era by describing recent advances and
developments in this field. During the last three or four decades, in line with the
advancements in western countries, Iranian lexicography evolved from its traditional
state into a modern professional and academic activity trying to improve the form and
content of dictionaries by implementing the following factors: the latest achievements in
theoretical and applied linguistics related to lexicography; and the computer techniques
and information technology and corpus-based approach to lexicography.
Keywords: Iranian lexicography, Persian language, dictionaries, corpus linguistics, computational lexicography
11.1 Introduction
LEXICOGRAPHY in its traditional sense and synonymous with ‘dictionary making’ has
usually been considered a craft, an art, a profession or even a pastime. It was only in the
late 1960s and early 1970s that linguists started to pay more attention to it as a scientific
activity and specifically as an applied linguistics branch. As Quemada rightly noticed at
that time:
Lexicography, which until recently, had been limited to the art of making
dictionaries, is on the verge of becoming not only a technique in its own right,
surer of itself, but for certain authors, if not a science, then at least an applied
science.
(1972: 427)
Hartmann and James define lexicography as ‘the professional activity and academic field
concerned with dictionaries and other reference works’ (1998). Indeed, at the present
time, no major lexicographical project can be imagined without taking into account
the various linguistic theories that bear directly on most of its practical aspects and
phases, just as no such task can be accomplished without modern tools and, furthermore,
the techniques of corpus linguistics and information technology (see also Chapters 3 and 19
for more discussion on corpus studies).
In this chapter, lexicography is considered in a broad sense as an interdisciplinary subject
concerned with both the practice of making all kinds of reference works such as
dictionaries, encyclopedias, thesauri, concordances and other wordlists, and the theory
and research related to these activities. Perry, in his ‘outline of the characteristics,
geocultural affinities, and historical development of Persian lexicography’ (2011), reviews
the works under categories like alphabetical, bilingual, topical and defining dictionaries
and glossaries. Sādeghi (1995) has classified the dictionaries into five classes:
Persian, Arabic–Persian, bi/multilingual, specialized, and slang dictionaries. The aim of
this chapter is to give a concise chronological overview of Persian lexicographic tradition
and then to discuss the recent advances and developments in the lexicographic
publications.
11.2 Traditional perspectives
11.2.1 Farhangs: dictionaries for Middle Persian
Lexicographical tradition in Iran is very old; some evidence indicates that it dates back
2,000 years. At that time, the official language of the Achaemenian Empire, Old Persian,
was in a state of change into the Middle Persian language known as Pahlavi (see also
Chapters 2 and 3). It was the same for the other language of that time, Avestan, which
served as the religious language of the country (see more about Avestan in Chapter 3). It
was becoming difficult for people to understand their religious texts as well as their holy
book, the Avestā. So, there were attempts to translate, explain, and comment upon the
Avestā. To this end, a series of glossaries or lexicons, called ‘Farhangs’ or
‘Frahangs’ (literally: Schools, and later, dictionaries) were compiled, some of which have
been discovered in various places, even outside present Iranian territory, such as India
and Turfan in China. The aim of these Farhangs was to explain the difficult words and the
Semitic heterograms found in Zoroastrian texts, as they were ‘Probably intended both as
guides for the students in the schools for scribes and as handbooks for writers and
readers of documents in general’ (Klima 1968: 48). Two such wordbooks have survived,
though the exact date of their compilation is not known. For more information on
heterograms, refer to Chapter 2.
One of these lexicons is Farhang i Oim or An Old Zand Pahlavi Glossary, edited by Destur
Hoshengji Jāmāspji and revised by Martin Haug in 1867 (reprinted 1973), also edited in
German by Hans Reichelt in 1900 (q.v.). As Haug and Hosheng have noted: ‘it was
originally prepared from several works of the same nature for the use of students of the
Zand (Avestan) language to be learnt by heart’ (Haug and Hosheng 1973: 1). This is the
introductory sentence of the glossary:
In the name of God and to his praise! May this explanation for understanding the
words and phrases of the Avestā, that is, the meaning in which, and how (they
should be taken), be good (for the reader)!
(ibid.: 45)
Although the main editor, Hosheng, has claimed that the date of its compilation is about
700 BC (ibid.: ii), Martin Haug, who revised this edition, does not agree with him
completely, and believes that the older part of the book, which is arranged topically, must
have been composed between the seventh and fourth centuries BC, while the other part,
arranged in alphabetical order, may be of a later date (ibid.: xlviii).
Another Farhang which has been preserved almost perfectly is Farhang i Pahlavik, edited
by H. F. J. Junker in 1972. It is compiled according to word families, and the entries are
classified by different subjects. In general, these Farhangs are bilingual Avestan–Pahlavi
glossaries which are of great value in historical linguistics. They also provide evidence of
the lexicographical skills and linguistic knowledge of the compilers regarding the
grammatical classification of the lexical items, the alphabetical arrangement of the
entries, and the utilization of a precise and virtually phonetic alphabet (cf. Behruz 1963).
11.2.2 Lexicography for the Arabic language
From the beginning of the Islamic period, Iranian philologists, again inspired by religious
motives, concerned themselves with the description and analysis of the Arabic language
(see also Chapter 2). Tauer has noted:
It is indeed remarkable that although it was the Persian Sibavayhi (died between
782–809 AD) who created the Arabic system of grammar and that later on
important works on this subject flowed from the pens of learned men of Persian
origin, the Iranians left the field of the grammatical treatment of their own
language almost untouched. It cannot of course be denied that the extremely
simple grammatical structure of Persian contributed to this, and that an
impressive system could not be developed on the basis of Aristotelian dialectics,
as was possible in Arabic, and that it was these facts that forced Persian
philologists to turn their attention to other branches of their subject, namely to
lexicography, the theory of style, poetics and epistolography, which they studied
far more zealously than did the Arabs.
(1968: 429)
Apart from al-Khalil (715–786), the father of Arabic lexicography, who is also considered
an Iranian by many scholars (cf. Nafisi 1959; Shahidi 1959; Soltāni 1959), a large number
of Iranians contributed to this field during the Islamic period. Ismā’il al-Jawhari (died
1002 AD) is the compiler of the most famous Arabic dictionary, ‘as-Sihāh fil-lugha’, and
Mohammad Firuz-ābādi (died 1414) composed his Qāmus (full title: al-Qāmus al-Muhit wa
al-Qābus al-Wasit), which brought Arabic lexicography to its climax with 60,000
entries and remained the most frequently consulted Arabic dictionary for several
centuries. Many commentaries, critical studies, abridged versions and translations of it
were made, and it is still regarded as a reliable source (Soltāni 1959).
In the eleventh century a tendency to compile bilingual Arabic–Persian dictionaries
appeared, mostly in Khorāsān, and continued for several centuries. Monzavi (1959: 265–372),
in a comprehensive and detailed study, gives an historical account of many of them,
which are roughly categorized as topical, alphabetical, and versified dictionaries.
11.2.3 Lexicography for the Persian language: classical period
After exhaustive work on every aspect of the Arabic language, as a result of the new
political and linguistic situation, i.e. the rise of nationalistic movements, the
establishment of local governments and the popular use of the Modern Persian language,
Iranians turned their attention to their own language.
By the tenth century AD, the eastern version of Modern Persian, called Dari, started to
spread over the central and western parts of the country, where the Middle Persian
language, Pahlavi proper, was still in use. (For more information on Dari, refer to
Chapters 2, 3, 13, and 19.) People adopting this new language, with its rich and rapidly
growing literature, needed books and dictionaries to help them learn and use Modern
Persian (cf. Nafisi 1959: 178; Windfuhr 1975: 158).
Thus, a trend in compiling dictionaries for the Persian language started in Iran and
continued for about five centuries, and was later followed in India and the Ottoman
Empire (present-day Turkey) for several centuries. Sādeghi (1995) has arranged most of these
dictionaries in separate lists, while some of the most important ones are treated in
detail by various scholars in Dehkhodā’s Introduction to Loghatnāme (1959).
There are only reports on two old monolingual dictionaries for (early) Modern Persian,
one compiled by Ghatrān Tabrizi (an eleventh-century poet) and the other by Abu-Hafs
Soghdi (not to be confused with his namesake, the ninth-century poet); both must have
been compiled in the eleventh century (cf. Nafisi 1959: 179; Sādeghi 1995).
The oldest extant Persian to Persian dictionary is Loghat-e Fors (Dictionary of the
Persians, or Persian words) by the poet Asadi of Tus, who must have compiled it between
1066 and 1073 AD (cf. Nafisi 1959: 186). This dictionary contains Persian (mostly literary)
words and their definitions or synonyms, supported by examples from earlier poetry. ‘It is
most probable that by Persian words, Asadi meant the words of Dari, the Persian
language used at that time in Transoxiānā and Khorāsān’ (Eghbāl 1940: 9/t). Considering
the great differences among the available manuscripts, there is no consensus about the
exact number of the entries in the original version and those added by different scribes in
the course of time; however, Morādi makes an estimate of around 3,000 words altogether
(2013: 5). The headwords are arranged according to their last letter, as if the dictionary
was intended for the use of the poets in search of rhyming words. This point is attested by
Asadi’s own words in the introduction to the dictionary: ‘Then, my son … Ardeshir ebn-e
Deylamsapār al-Najmi, the poet … asked for a wordbook from me … in such a way that
there would be an example cited from a Persian poet for every word’ (Asadi 1940: 2).
Loghat-e Fors has been the most influential source for later works during the next five
centuries.
Sehāh al-Fors, compiled by Mohammad Nakhjavāni in 1328 in Tabriz, is the oldest
existing Persian dictionary after Loghat-e Fors. Though very similar, it shows some
improvements over Asadi’s work in the number of entries (2,300) and in
their arrangement: entries are first sorted by the last letter of the headwords into twenty-five
chapters (‘bāb’) and then by their first letter into 430 sections called ‘fasl’ (Tā’ati 1958: 187).
Mention should be made of two other notable dictionaries in this trend, both influenced by
Asadi’s work: the first is Majmu’at al-Fors by Abu’l-’Alā’ ‘Abd-al-Mo’men Jāruti, known as
Safi Kahhāl, probably compiled in the late thirteenth or early fourteenth century and
containing 1,542 entries (Sādeghi 1995), and the second is Me’yār-e Jamāli, a book
compiled by the poet Shams Fakhri Esfahāni in 1343–4. Kiā (1959) has reviewed this book,
which contains four chapters dealing with four literary techniques: ‘aruz (prosody),
ghavāfi (rhyme), badāye’ al-sanāye’ (rhetoric), and loghat (philology). The last chapter
is in fact a wordbook with 1,580 entries arranged according to their last letter but,
unlike Asadi, Fakhri used his own poems (except in a few cases) as examples
to support the meanings and definitions of the words. Kiā found many mistakes and
corrupt words in this book, which have found their way into many subsequent dictionaries
as ghost words.
During the following three or four centuries, the compilation of Persian dictionaries in
Iran declined and only a few works are worth mentioning: Farhang-e Vafā’i (1526), Tohfat
al-Ahbāb (1529), and Majma’ al-Fors (1599). Hekmat (1959: 198) points out that at the
time when Mohammad Ghāsem Kāshāni (Soruri) was compiling his Majma’ al-Fors by order
of Shāh Abbās I in Iran, Mir Jamāl al-Din Hosein Enju Shirāzi was compiling his Farhang-e
Jahāngiri in the court of Akbar Shāh of India, and finally finished it during the reign of his
son, Jahāngir.
Contemporaneously, there were rising trends in Persian lexicography in India and Turkey
as a result of ‘a literary Persianization process’, which brought about a series of large
(p. 304) mono- and bilingual dictionaries. It is noteworthy that almost all of these works
were based on previous dictionaries, and they themselves were used as the principal
sources for most of the later European bilingual dictionaries which can be regarded as
another trend. Windführ, quoting Lagarde, points out: ‘a dictionary should not be compiled
from other dictionaries, which perpetuates ghost words and innumerable mistakes’. He
draws up a list of twenty-three Western bi-/multilingual dictionaries for Persian, and
notes that Steingass, the most famous and most widely used European Persian–English
dictionary, suffers, like many others, from this weakness (1975: 158–60).
Blochmann (1868) carried out an exhaustive review of the dictionaries compiled during
this period especially in India; and in a brief survey, Nafisi (1959) listed some 188
important ones. Morādi (2013) estimates that, of about 250 monolingual and bilingual
Persian dictionaries, only about forty were compiled by Iranians; the rest
were the work of lexicographers in India and Anatolia (Ottoman
Empire).
Farhang-e Ghavvās is considered to be one of the first Persian dictionaries compiled in
India by Fakhr al-Din Mobārakshāh Ghavvās (Kamāngar) around the end of the thirteenth
or the beginning of the fourteenth century. It is also credited as the first dictionary using
the word ‘farhang’ as an equivalent for dictionary (Morādi 2013), and ‘the second oldest
monolingual Persian defining dictionary’ (Perry 2011). Its 1,050 entries are arranged by
topics in five sections and their meanings are supported by verses mostly taken from
Asadi’s work.
Bahr al-fazā’el by Mohammad ebn-e Ghavām Balkhi Kera’i (comp. 1433) is another
remarkable work in which the headwords are arranged in two ways: in the first part by
the initial letters and in the second part by topics (Sādeghi 1995).
The next important dictionary compiled in India is Sharaf-nāme-ye Monyari/Ebrāhimi
(comp. 1473) by Ebrāhim Ghavām Fārughi, which is, like previous works, organized first
by the initial letter of the headwords and then by their last letter. There are about 11,000
entries in this dictionary with illustrative examples from Asadi’s work and other poets up
to the time of Hāfez. A century later, an abridged version of Sharaf-nāme was made by
Mirzā Ebrāhim ebn-e Shāh-Hosein-e Esfahāni, who named it Farhang-e Mirzā Ebrāhim
(ibid.).
In the sixteenth century, at least three dictionaries are worth mentioning, all of which
followed this trend: Mo’ayyed al-fozalā’ (1519) by Mohammad Lād Dehlavi, Madār al-
afāzel (1592) by Allāhdād Feyzi Serhendi with about 12,000 entries and the famous
Farhang-e Jahāngiri by Mir Jamāl al-Din Hosein ebn-e Fakhr al-Din Hasan-e Enju-ye
Shirāzi. The compilation of this third work began during the reign of Akbar Shāh (1596)
and was completed during the reign of his son Jahāngir (1608). A new version of this
dictionary was prepared and offered to the king in 1622 (Hekmat 1959: 196).
Enju collected only words of Persian origin in his dictionary and used more than forty-four
dictionaries and wordbooks as sources for his examples. Hekmat believes that Farhang-e
Jahāngiri is ‘one of the most comprehensive and precise Persian dictionaries with the
largest inventory of lexicological works of that time’ (ibid.). With an informative
introduction, a rich and comprehensive lexicon, clear definitions supported by many
sources and several appendices for compound words, idiomatic and metaphorical
expressions, Farhang-e Jahāngiri was highly influential on many subsequent dictionaries
such as Farhang-e Rashidi and Borhān-e Ghāte’. (For further information on idiomatic
expressions, see Chapters 9 and 17.)
Almost simultaneously in Iran, Mohammad Ghāsem ebn-e Hāj Mohammad-e Kāshāni
(pen-name: Soruri) was compiling his dictionary called Majma’ al-Fors in the court of King
(p. 305) Abbās I. In 1622, some twenty-three years after completing his work, he made a
trip to India and saw the Farhang-e Jahāngiri. He then revised and expanded his work, and
this may be the reason why each of these two farhangs mentions the other as one of
its sources (ibid.: 198).
Farhang-e Rashidi, compiled by ‘Abd al-Rashid Tatavi in 1653, is an abridgement of the
two above-mentioned dictionaries, with some 9,000 entries (cf. Sādeghi 1995).
One of the most famous and controversial Persian dictionaries compiled in India is
Borhān-e Ghāte’ by Mohammad Hosein Khalaf-e Tabrizi (pen-name: Borhān) in 1652. The
innovations and outstanding features of this dictionary made it a source and model for
future dictionaries and the starting point for a new wave of lexicographical
criticism. It was regarded as one of the largest repositories of Persian words (more than
20,000) with precise, clear, and simple definitions with reference to four major
dictionaries of that time, including Farhang-e Jahāngiri with all its sources. It provides
ample synonyms and antonyms for even better clarification of the meaning of the words,
showing their pronunciation either by mentioning the missing vowels in the Persian script
(the job done nowadays by the diacritics), or by presenting familiar words with similar
pronunciation. Its new and almost perfect alphabetical sorting method and other features
made this work so prominent that it remained at the centre of attention for many decades
(cf. Hekmat 1959: 199–217 and Mo’in 1963). However, Borhān’s eagerness to gather as
many words of Persian origin as possible led him to include many fake words from an
unreliable source, i.e. Dasātir. This weakness and some other mistakes prompted
many works and reviews, both criticizing and defending Borhān, which ‘culminated two
centuries later in a series of defences and counter-attacks triggered by the broadside
Ghāte’-e Borhān written by the poet Ghāleb (1797–1869)’ (Perry 2011). A century later,
Mo’in (1963), in his extensive critical edition of this dictionary, set out all of its merits
and demerits and identified almost all of the fake Dasātir words in it.
During the eighteenth and nineteenth centuries several dictionaries were produced,
among which Bahār-e ‘ajam by Tik Chand Bahār (1739) and Haft gholzom (seven volumes)
by Abu’l-Mozaffar Ghāzi al-Din Heydar, arranged and edited by Mowlavi
Ghabūl Mohammad (1813–14), are the more important ones (Sādeghi 1995).
In the nineteenth century, with the rise of the Urdu language and the decline of Persian
lexicography in India, fewer notable works were produced, but the appearance of
Farhang-e Anandrāj can be considered an important event. It was compiled by
Mohammad Pādeshāh ebn-e Gholām Mohy al-Din (pen-name: Shād) in 1888–9. The
dictionary was ‘named in honour of the mahārāja Ānand Gajāpāti Rāj, the ruler of
Vijāyānāgār in South India’ (Baevskii 1999). The extensive number of sources, the strict
alphabetical order of the headwords, precise and clear definitions, and ample
grammatical notes make this farhang one of the most comprehensive Persian dictionaries
of that time; it ‘thus epitomizes the achievements of the centuries-old Persian tradition
in lexicon compilation and marks a transition to the use of European methods of
lexicography’ (Baevskii 1999). Mohammad Pādeshāh had earlier published, in 1874, a
dictionary of synonyms and idioms.
From the fifteenth century, a trend of Persian lexicography started in the Ottoman Empire
which concentrated mostly on compiling bilingual Persian–Turkish dictionaries and
followed the lexicographical tradition in Iran and India. Dabirsiāghi (1989) reviews some
of the published and unpublished dictionaries produced in this region over the following
three to four centuries. Some of the most important of them are as follows.
Oghnūm-e ‘ajam (a manuscript dated 1492), with about 5,000 entries, whose (p. 306)
author is not known, is referred to by Ne’mat-Allāh Rowshanizāde in his dictionary,
Loghat-e Ne’mat-Allāh. Another dictionary, Bahr al-gharā’eb, was compiled by Lotf-Allāh
ebn-e Yusof Halimi, who later expanded it into Sharh-e Bahr al-gharā’eb or Ghā’eme, with
about 5,500 entries. In 1467, he compiled a third version of it on the basis of many earlier
Persian works containing around 5,540 headwords (Sādeghi 1995). Lesān al-’ajam or
Nawāl al-fozalā was compiled by Hasan Sho’uri Halabi before 1693 with about 18,000
entries. It is based on many Persian dictionaries and some Persian–Turkish dictionaries
(Dabirsiāghi 1989: 284–90).
At almost the same time, European lexicographers began to produce dictionaries
(mostly bilingual) for the Persian language. Gool (1643) compiled the first Persian
dictionary to be printed in Europe, which later appeared as part of Castell’s Lexicon
Heptaglotton (1669), a work that also includes Hebrew, Aramaic, Syriac,
Samaritan, Amharic, and Arabic. Then come the works of Meninski (1680) (Turkish–
Arabic–Persian) and Angelus (1684) (Persian–Latin–Italian–French), followed by several
others up to the twentieth century. The most famous and most frequently consulted
Persian–English dictionary in this series is the Comprehensive Persian–English Dictionary
by Francis Joseph Steingass (1892) published in London. Windführ (1975) has listed some
twenty-three important western dictionaries which were mostly compiled earlier than the
twentieth century, when gradually the Iranians began to compile bilingual dictionaries of
Persian with many European languages. Afshār (1959), in a comprehensive survey, has
drawn up a list of sixty such dictionaries, twenty-eight of which are English–Persian.
By the twentieth century, the number of bilingual dictionaries with Persian increased.
There are several studies of such works providing detailed information about them and
showing long lists of European and Asian languages as their source or target language.
Sāme’i (1995) lists some of them: English (more than 125 titles), French (sixty-
three titles), German (thirty-five titles), Russian (thirty-four titles), Latin and Italian (eight
titles each), Spanish (three titles), Greek, Esperanto, and Swedish (two titles each). Asian
languages other than Arabic include Turkish (more than forty titles), Urdu (twelve titles),
Armenian (eight titles), Pashto (five titles), Hindi (four titles), Chinese (three titles),
Japanese (two titles), and Syriac, Hebrew, Gujarāti, and Bengāli (one title each).
11.3 Contemporary works: continuity of the old tradition
From the early decades of the twentieth century, Iranian scholars, having become acquainted
with western science and technology, including the new lexicographical methodology,
began to produce works with similar features. One of them is Farhang-e Nezām, by
Mohammad ‘Ali Dā’i al-Eslām, printed in Heydarābād (1926–39). In his introduction, the
compiler describes his methodology and states that the pronunciation of the words is
given as they are pronounced in Tehran. He also asserts that three types of
words are included in the dictionary: spoken words, words of prose and words of poetry
(Nafisi 1959: 225). Another dictionary of this period is Farhang-e Nafisi or Farnudsār
by Ali-Akbar Nafisi Nāzem al-Atebbā (1938–55), which was published by his son some
twenty years after the author’s death. (p. 307) It contains 158,431 entries including
Persian and many loan words from other languages (ibid.: 224).
Generally speaking (in this author’s opinion), Iranian language studies in the twentieth
century have been under the influence of two fundamental social and political tendencies:
a spirit of nationalism; and a wave of modernization. The outcome of these tendencies in
the field of lexicology and lexicography may be classified in terms of three basic trends.
The first trend aimed at lexical innovation for the Persian language (cf. Yārshāter 1970;
Windführ 1975); the second one consisted of several attempts at compiling
comprehensive monolingual Persian dictionaries on a national scale; and the third, which
developed in response to increasing demands for new terms and equivalents, led to the
production of a large number of general and specialized bilingual dictionaries, glossaries,
lexicons, and technical terminologies.
11.3.1 Lexical innovations by the Academies
The first trend in lexical innovations was the result of the endeavours of the Iranian
Language Academy over three discontinuous periods of activity. In the first period (1314–
1319 SH/1935–1940 AD), under the name of Farhangestān-e Irān (Iranian Academy), its members
selected, combined and even coined Persian equivalents for around 2,000 foreign terms
(Farhangestān-e Zabān-e Irān 1975). This period, although very short, was very fruitful
and effective, and many of the proposed words were integrated into the Modern Persian
lexicon.
From 1971 Farhangestān-e Zabān-e Irān (The Iranian Academy of Language) began its
activity as the Second Academy with more ambitious goals and a larger organization with
nine research departments to study all aspects of Iranian languages and dialects, which
could also support the process of term selection as the primary mission of the academy.
During its seven years of activity, apart from many lexicological research projects on
Modern Persian and other Iranian languages and dialects, some 57,000 Persian
equivalents for about 30,000 foreign words were proposed by twenty-six groups of
experts in terminology for different scientific disciplines. Only 1,100 of the words were
approved by the high council and, of those, only a small fraction was published
before this period of activity came to an end in 1978 (Āssi 2005: 281–92).
The third period started in 1990, with the third academy named Farhangestān-e Zabān va
Adab-e Fārsi (Academy of Persian Language and Literature), again with the adoption of
new words and their equivalents as its major task. Over the last twenty-five years of
activity, about ninety word-selection groups have been formed and have proposed a remarkable
number of words in different fields of science and technology. Up to now, more than
46,000 of them have been approved by the Word Selection Council, posted on the website of
the academy, and printed in eleven volumes (Academy of Persian Language and Literature
2004–14).
11.3.2 General comprehensive monolingual dictionaries
An outstanding example of the second trend is Loghatnāme (Wordbook), the greatest
lexicographical project for the Persian language, undertaken by Ali Akbar Dehkhodā, a
prominent figure in literature, journalism, and politics. With the aim of collecting the
complete (p. 308) and extensive vocabulary of the Persian language, in 1916 he set out, alone, to
read and extract words from many classical and literary sources. After forty years
of research, towards the end of his life, he had compiled more than 2 million quotation
slips and notes. In 1945 the Parliament passed an act for the establishment of an
organization for the continuation and publication of the dictionary. In 1946 the first
fascicle was published in folio format. After Dehkhodā’s death in 1956, the work went on
with the cooperation of more than 130 lexicographers until the last fascicle was printed
in 1980 (cf. Loghatnāme: Introduction and Supplement to the Introduction).
The complete Loghatnāme is divided into 222 fascicles, totalling 26,475 pages. It is based
on more than 2 million dictionary slips and the definition and meaning of each headword
is supported by several quotations illustrating its history. In some respects, the work is
encyclopedic in nature, especially with regard to the inclusion of biographical and
geographical entries.
‘It is not only a review of the entire lexicographical tradition… but is also based on
profound research in primary sources to which Dehkhodā devoted the greater part
of his life’
(Windführ 1975: 161)
This monumental work, however, suffers from several weak points (cf. Yārshāter 1970)
which are mostly the result of the lack of a systematic and methodical approach.
Another important Persian dictionary in this group is Farhang-e Mo’in (Mo’in’s
Dictionary); though smaller than the Loghatnāme, it is far more systematic and useful. It
is based on more than 300,000 dictionary slips (prepared by Mohammad Mo’in himself
and about eighty other collaborators) which were extracted from 500 sources. It was first
published in 1963 and has been reprinted several times, in six volumes and 7,900 pages. Four of the
volumes are devoted to the body of the dictionary, and the rest to proper names and
foreign expressions.
There were some other minor works in this group which mostly followed this trend on a
smaller scale. Farhang-e ‘Amid by Hasan ‘Amid is a medium-sized Persian monolingual
dictionary first published in 1963, and Farhang-e zabān-e fārsi-ye emruz by Gholāmhosein
Sadri Afshār, published in 1990 (later edited and published as Farhang-e mo’āser-e fārsi),
aims to cover the contemporary lexicon of Persian.
Mention must be made, however, of a recent important work by Hasan Anvari, Farhang-e
Sokhan (Sokhan Dictionary), a comprehensive monolingual Persian dictionary first
published in 2002 in eight volumes. It contains about 75,000 headwords (entries) and
some 45,000 sub-entries which are supported by about 160,000 examples taken from 450
Persian sources (Anvari 2002). In some respects, this work follows the trend of traditional
dictionaries, while its large number of well-organized collaborators,
consultants, and editors marks progress in team-based lexicography. It is claimed that
the dictionary contains ‘all’ the words of contemporary as well as the older varieties of
modern Persian based on more than 450 literary texts (ibid.).
11.3.3 General and specialized bilingual dictionaries
The third group of lexicographical activities in recent years aims at producing general
bilingual dictionaries and specialized terminological glossaries, lexicons, and dictionaries
in (p. 309) different fields of knowledge, especially those relating to modern science and
technology. At the starting point of this trend stands Soleyman Haim who is often
regarded as the forerunner of modern bilingual lexicography for the Persian language.
His Comprehensive English–Persian Dictionary (two volumes) was first published in 1929,
and his Persian–English Dictionary, also in two volumes, was finished in 1933. In a revised
and improved edition of the first one in 1951, in addition to many corrections, he used a
new and simplified transcription system to provide the pronunciation of (some of) the Persian
words, which can be useful for English-speaking users. With relatively exact definitions,
providing most of the equivalents coined by the Iranian Academy and many other
innovations in form and content, Haim’s dictionaries remained unsurpassed for several
decades. It is noteworthy that he also compiled French–Persian and Hebrew–Persian
dictionaries, in 1937 and 1961 respectively.
In the early 1970s, Abbās and Manuchehr Āryānpur published their English–Persian (five
volumes) and Persian–English (one volume) dictionaries, and later in 1998 Manuchehr
Āryānpur compiled a six-volume English–Persian dictionary based on and mostly
translated from Webster’s New World Dictionary.
One of the latest works in this category is Farhang-dāneshnāme-ye Kārā (Kārā
Encyclopedic Dictionary) by Khorramshāhi, published in 2015. It is mainly based on the
Oxford Advanced Learner’s Encyclopedic Dictionary and Britannica Concise
Encyclopedia. It contains about 150,000 entry words in four volumes, while the fifth
volume is a separate wordlist of specialized terms with most of the equivalents approved
by the Persian Academy.
Regarding specialized dictionaries, there are more than 200 titles in the central
library of Tehran University alone and many more in the Iranian National Library, the
majority of which are bilingual dictionaries and glossaries published from the 1960s to the
1980s. Their special subjects cover almost all fields of knowledge such as accounting,
agriculture, anatomy, arts and crafts, astronomy, banking, biology, botany, building
construction, chemistry, computer science, economics, electronics, engineering,
geography, hydrology and irrigation, language and linguistics, law, mathematics,
medicine, metallurgy, military terms, navigation, nuclear science, petroleum industry,
pharmacology, philosophy, physics, politics, psychology, religion, sociology, technology,
theology, etc. A mere glance at these dictionaries reveals that they are very diverse. Some
of them are original compilations based on several sources with full definitions; some are
mere translations, while others (called vāzhegān or Lexicon) are simple wordlists or
frequency wordlists of foreign terms with their Persian equivalents. Many of them have a
limited scope and specialize in one particular subject, but a good number of them claim to
be comprehensive in covering a wide range of subjects from mathematics and physics to
astrology, geology, sociology, etc. (cf. The Scientific Dictionary by Shahriāri et al. 1970). A
few of these works are even encyclopedic in nature. The compilers are from different
backgrounds and have different interests and include professional (and amateur)
lexicographers, translators, university professors, school teachers, doctors, engineers,
army officers, poets, etc., who, except in a few cases, have done the compilation single-
handedly. No organized team of compilers with expert editors and linguistic and
lexicographic advisors can be found among them. A redundancy of subjects and titles can
also be observed among these works; for example, there are eight medical glossaries and
five terminologies for law, geography, and economics, and even more for political or
technical terms. The sources of the works are not always known to the user, and they are
seldom orderly and well organized. Finally, they can hardly serve any special purpose;
they (p. 310) may be considered specialized word lists, but they are by no means special-
purpose dictionaries. Windführ has put it thus:
While specialized dictionaries were rare appearance earlier … today there is an
ever-increasing list of specialized dictionaries. They are either specialized with
regard to a particular author, or they are topical. It reflects the virtual explosion of
activities and their diversification since the fifties in Iran.
(1975: 161–2)
The reason behind this diversification is the lack of any systematic research on the needs
and priorities in this field of lexicography, and the absence of any methodology, general
planning and coordination of the activities.
Apart from the above-mentioned works, a variety of special-purpose dictionaries and
reference books have been produced during the past four or five decades, as follows.
In 1993 Kashāni published his reverse dictionary, Farhang-e Zānsu. Though reminiscent of the
old tradition of reverse arrangement of entries going back to Asadi’s Loghat-e Fors, it
served a different purpose, i.e. linguistic research in its modern sense.
Several thesauri and dictionaries of synonyms were published in this period, such as
Farhang-e bayān-e andishe-hā (A Dictionary for the Expression of Ideas) by Mohsen Sabā
in 1987, with headwords and their synonyms, derivatives, and related expressions
arranged in clusters; Farhang-e Jāme’-e Vāzhegān-e Motarādef va Motazād-e Zabān-e
Fārsi (A Comprehensive Dictionary of Persian Synonyms and Antonyms) by Farajollāh
Khodāparasti in 1997, with about 15,000 entries and their synonyms and antonyms
extracted from a corpus of Persian texts; and Farhang-e Teyfi (A Persian Thesaurus) by
Jamshid Farāruy in 1998, which is a translation or adaptation of Roget’s Thesaurus, and
has a similar structure.
Informal variants of the Persian language, except for a few minor works dating back to
the Ghājār dynasty period, were not considered worthy of recording. It was only in recent
decades that several dictionaries were produced mostly under the title āmiyāne (slang),
dealing with different varieties of Persian such as spoken (formal and informal),
colloquial, idiomatic, slang, and argot. (For more information about the colloquial form,
refer to Chapters 2, 3, 4, 5, 6, 10, and 15.)
Farhang-e ‘āmiāne (Slang Dictionary) by Yusef Rahmati (1951) contains about 3,000
popular words, expressions and proverbs collected by field work and arranged
alphabetically. Amirgholi Amini compiled his Farhang-e ‘avām (Dictionary of the Language
of Common People) (1960) by collecting words and expressions mostly used by Tehrani
and Isfahani people.
Mohammad Ali Jamālzāde, one of the forerunners of the modern Persian novel and short
story writing, published his influential Farhang-e loghāt-e ‘āmiyāne (Dictionary of Slang
Words) with the collaboration of Mohammad Ja’far Mahjub in 1962. It contained 10,000
headwords with definitions and examples.
Farhang-e estelāhāt-e jāvānān (Dictionary of the Jargon of the Young) (2002) by Mahshid
Moshiri, and Farhang-e loghāt-e zabān-e makhfi (Argot Dictionary) (2003) by Mehdi
Samā’i, are collections of contemporary argot mostly invented and used by Tehrani
youngsters.
The first volume of Ketāb-e Kuche (The Book of the Alley), a monumental project by Ahmad
Shāmlu, the famous Iranian poet, appeared in 1979, and by the time of his death in 2000 only
fourteen volumes (less than a quarter of it) had been published. His wife, Āydā Sarkisiān, and
his son (p. 311) Siyāvash are cooperating to continue the publication of the remaining
sections. It is a collection of folk idioms, expressions, and proverbs arranged
alphabetically and it is more-or-less encyclopedic in nature.
While Abolhasan Najafi’s Farhang-e fārsi-ye ‘āmiyāne (Dictionary of Persian Slang),
published in 2008, was welcomed by many literary figures, it also received some critical
remarks on the grounds that, although the author claims that all the slang words and
expressions of the Persian language used in the last century are collected in the
dictionary (Najafi 2008: 9), many of its entries actually belong to the literary and formal
variety of Persian and only a small portion of them are real slang words (cf. A’lam 2001).
Compiling concordances to Persian texts does not have a long history and goes back to
Fritz Wolff’s glossary of Ferdowsi’s Shāhnāmeh (Glossar zu Ferdosis Schāhnāme) (1934).
It was only in 1971 that the Iranian Academy of Language decided to compile a series of
concordances to the important classical Persian texts such as works of Sohravardi,
Avicenna, Biruni etc. In 1972 a project for the automatic compilation of concordances by
computer was launched (Cultural Studies and Research Institute 1987: 145–9). This issue
will be discussed later; also cf. Farhangestān-e Zabān-e Irān (Iranian Academy of
Language) (1974) and Windführ (1975).
11.3.4 Encyclopedias: traditional and modern
Encyclopedias as major reference works dealing with general knowledge or special fields
of science have a long history in Iranian lexicography. The earliest work that is
usually referred to as a ‘Mazdean encyclopedia’, is Denkard ‘(lit. ‘Acts of the religion’),
written in Pahlavi, as a summary of tenth century knowledge of the Mazdean
religion’ (Gignoux 1994). For more information on Pahlavi, refer to Chapters 2 and 3.
During the next ten centuries, many works of an encyclopedic nature were produced.
Vesel and A’lam (1998) in a brief review of more than thirty encyclopedias under the
category ‘Pre-modern’ introduce only some of them. ‘The first Persian author who
distinguished himself in the encyclopedic field was Abu Nasr Fārābi (d. 950) in his Ehsā’
al-’olum, composed in Arabic (ed. Amin, Cairo, 1931–48)’. Among later works are:
Dāneshnāme-ye Alā’i (Alā’i Encyclopedia) compiled by Avicenna in the eleventh century;
al-Tafhim, an astrology encyclopedia by Biruni c. 1028; Zakhire-ye khārazmshāhi, a great
medical encyclopedia, by Jorjāni in 1110; and ‘Ajāyeb al-makhlughāt va gharā’eb al-
mowjudāt (Marvels of Creatures and Strange Things Existing) by Zakariyā al-Ghazvini in
1280, just to mention a few examples of such works which should be regarded as
traditional.
Although Dehkhodā’s Loghatnāme is considered by some writers to be the first Persian
encyclopedia in its modern sense (cf. Vesel and A’lam 1998), it is in fact a comprehensive
dictionary of the Persian language with some entries with encyclopedic information.
Therefore, the new era of compiling modern encyclopedias really begins with the
publication of Dāyerat al-ma’āref-e Fārsi (The Persian Encyclopedia) in three volumes
(1966–95). It is a systematic and carefully designed work by a large team of experts
under the editorship of Gholāmhosein Mosāheb. It was first intended to be a translation
of The Columbia-Viking Desk Encyclopedia, but the editors later decided to reorganize it by
adding many original articles about Iranian culture and literature. Dānesh-nāme-ye Irān o
eslām (Encyclopedia of Iran and (p. 312) Islam) was a Persian translation of the second
edition of The Encyclopedia of Islam, which was not completed, and only eight fascicles of
it were published between 1975 and 1978.
Apart from several minor encyclopedic works dealing with special subjects or intended
for special groups of users such as children, young people and women,
the only encyclopedic work in Persian exclusively devoted to all aspects of Persia,
both in historical and contemporary perspectives, is the large two-volume (2142
pages) Irānshahr, published by the UNESCO National Commission in Iran (Tehran,
1963–64).
(Vesel and A’lam 1998)
Mention should also be made of The Encyclopedia of Persian Language and Literature,
which was recently finished under the supervision of Esmail Sa’adat. It is published by
the Academy of Persian Language and Literature in six volumes (2005–2016) and
contains more than 2,000 entries on Persian literature, grammar, mythology, and proper
names.
At the present time, several important encyclopedias are in the process of
compilation. Of Dānesh-nāme-ye jahān-e eslām [Encyclopedia of the Islamic World],
twenty volumes have appeared so far (2016). Dāyerat al-ma’āref-e bozorg-e eslāmi [The
Larger Islamic Encyclopedia], which takes a Twelver Shi’ite approach, has so far (2016)
reached the entry word ‘khandagh’ with twenty-two volumes published. The
Encyclopedia Iranica is dedicated to the study of Iranian civilization in the Middle East,
the Caucasus, Central Asia and the Indian Subcontinent. Its general editor is Ehsan
Yarshater, and it is published in English by Columbia University, New York. It is
produced both in printed form (vol. 15 was published in August 2011) and in online digital
format.
11.4 Recent developments

All of the works mentioned above (except those in the last section, which deals with modern
encyclopedias) can be classified as ‘traditional’, in contrast with the most recent
approaches to lexicography in Iran, which can be called ‘modern’. A new movement to improve the form
and content of dictionaries can be traced back three decades, to when three major
factors were ‘gradually’ introduced into this field. These were in line with and followed
the advancements that had taken place in western lexicography some decades earlier:
a) the implementation of new achievements in theoretical and applied linguistics
related to lexicography;
b) the use of computer techniques and information technology; and
c) a corpus-based approach to lexicography.
11.4.1 Persian lexicography and linguistics
In a survey of lexicography in the western countries, al-Kasimi shows that around the
middle of the twentieth century, a new tendency towards the application of linguistic
theories in this field had started (1970: 5–9). Even at that time, however, there was a
need for ‘the study dealing with the linguistic treatment of methodological issues in
bilingual lexicography’ (p. 313) (ibid.). A similar movement in Iran started in the 1980s
when some Iranian linguists turned their attention and careers to lexicography. A new
course in lexicography was introduced in the MA programme in linguistics, many
research projects on different aspects of this field were conducted, papers and articles
were presented regularly and frequently to linguistics conferences and journals, and
seminars and workshops on lexicography were held to discuss all of the theoretical issues
in the light of new theories and findings in linguistics. The evidence of this tendency is
seen in the compilation of new monolingual, bilingual, and specialized dictionaries by
known Iranian linguists (e.g. Bāteni 1993; Deyhim 2000; Haghshenās 2001; Āryānpour
and Āssi 2003; Sādeghi 2006).
The compilation of the English–Persian dictionary Farhang-e Mo’āser took Bāteni seven
years and was finished in 1993. One of its main features is that the meaning of every
English headword is conveyed not by definitions but by as many Persian
equivalents as possible. This feature makes it very helpful to translators of English
texts. Another feature is the use of a simplified but controversial transcription system
which is unique to this dictionary.
At around the same time, another linguist, A. M. Haghshenās, embarked on a project for
the compilation of a bilingual English–Persian dictionary based mainly on three English
learners’ dictionaries (i.e. Oxford Advanced Learner’s Dictionary; Longman Dictionary of
Contemporary English; and Collins Cobuild Advanced Learner’s Dictionary). Most of the
definitions of its more than 55,000 entry words are translations from these sources,
supported by Persian equivalents. After about fifteen years, with the collaboration of a
group of assistants, Haghshenās finished the dictionary and published it under the title
Farhang-e Hezāre (Millennium Dictionary) in 2000. In comparison with Farhang-e Mo’āser,
this dictionary seems more suitable for learners of English.
Deyhim, in her Farhang-e Āvā’i-ye Fārsi (Persian Pronunciation Dictionary) (2000),
recorded and carefully transcribed the pronunciation of about 31,000 common standard
Persian words by fifteen Tehrani informants. In cases of words with several
pronunciations, the variants are sorted by their frequencies.
In 2003, Āryānpur and Āssi published a comprehensive Persian–English dictionary in four
volumes which was, for the first time, based on a parallel Persian–English corpus. The
definitions were supported by many equivalents and examples extracted from the corpus.
Farhang-e Emlāi-ye Khatt-e Fārsi (A Dictionary of Persian Orthography and Spelling)
(2006), compiled by Sādeghi and Zandi Moghaddam, is a list of about 33,000 words with
their standard spellings according to the rules proposed by the Academy of Persian
Language and Literature.
11.4.2 Persian lexicography and computing
In recent years, there have been great advances in Iranian lexicography with regard to
the application of computer methods, changing it from an old tradition into a modern
scholarly practice. Early attempts to use computer techniques in compiling dictionaries
were made in the Iranian Academy of Language (the second Farhangestān) from 1973 to
1977, in a project that produced computerized concordances
to eight Persian texts (Farhangestān-e Zabān-e Irān = Iranian Academy of Language
1974; Cultural Studies and Research Institute 1987: 145–9). At the same time, Susan
Hockey produced her concordance (p. 314) to the poems of Hafez (Hockey 1973) and four
years later, Bo Utas published his concordance to Tarigh-ot-Tahghigh of Sanā’i Ghaznavi
(Utas 1977).
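The idea of a computerized concordance can be illustrated with a minimal keyword-in-context sketch in Python. It is not a reconstruction of the procedures used by the Academy, Hockey, or Utas, which are not described here; the function name build_concordance, the width parameter, and the invented English sample line are all hypothetical and serve only as illustration.

```python
import re
from collections import defaultdict


def build_concordance(text, width=3):
    """Map every word form to its occurrences as (left context, keyword, right context).

    Hypothetical helper for illustration only; `width` is the number of
    context words kept on each side of the keyword.
    """
    # Naive tokenization: lowercase the text and keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    concordance = defaultdict(list)
    for i, token in enumerate(tokens):
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        concordance[token].append((left, token, right))
    return dict(concordance)


# Invented sample text (an assumption, not a quotation from any source).
sample = "the rose has bloomed and the nightingale is drunk with the rose"
concordance = build_concordance(sample, width=2)

# Print every occurrence of one keyword with its context aligned.
for left, keyword, right in concordance["rose"]:
    print(f"{left:>20} [{keyword}] {right}")
```

Real Persian text would need more careful tokenization than this regular expression offers, for instance in its treatment of the zero-width non-joiner inside words.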
It took several years until Iranian lexicographers and publishers began to use computers
first as simple word processors and later as advanced and sophisticated devices for data
collection, data processing, and presentation of the final output in a variety of formats.
Vazhegān-e Gozide-ye Zabānshenāsi (A Selective Lexicon of Linguistics), published in
1996, ‘is the first dictionary in Persian that has utilized the achievements of
computational linguistics’ (Tāheriān 1996: 124). This specialized bilingual dictionary was
compiled on the basis of a parallel corpus of more than 130 linguistic texts and was issued
in two formats: traditional paper (book) form and electronic (software) form. The software
offered more facilities for searching a rich database with information about the terms and
all their equivalents, their frequency, and the author and date of their first appearance
(see Chapter 17 for more discussion on frequency). Nowadays, producing electronic and
online dictionaries is becoming more and more common. Even the electronic version of
Loghatnāme has been released on CD and is now also available online.
11.4.3 Persian lexicography and corpus linguistics
Since the earliest Persian dictionaries, such as Loghat-e Fors and its successors, it has been
common practice to include quotations and examples from actual usage and literary sources.
Furthermore, during the last century, many scholars of Iranian languages have
undertaken the investigation and description of Iranian dialects, mostly on the basis of
data collected by field work. Although these activities and other similar studies up to the
present time may be considered ‘corpus-based’ in the old sense of the term, in most cases
they suffered from limitations and drawbacks. Among these are the insufficiency of the data
despite the need for more extensive information and, where huge amounts of data were
gathered, the impossibility of handling them manually. Moreover, the single-purpose,
single-user, and usually single-compiler nature of such corpora made this activity very
time consuming, expensive, and even frustrating.
Although data extracted from written literary texts or collected manually through field work
have traditionally been called a ‘corpus’, and the work done with such data ‘corpus-based’,
the modern notions of corpus, corpus-based, and corpus linguistics belong
to the last three or four decades. The use of computers for compiling, organizing, and
processing huge corpora led to a new definition of corpus. Nelson Francis wittily
named those traditional types ‘language corpora BC’1 (Francis, 1992: 17); thus, we
may refer to the new generation as corpora AD.2
The use of corpus in its modern sense can be traced back to the progressive projects
initiated by the Iranian Academy of Language (the second Farhangestān), the only
Language Planning Organization in Iran during the years 1970–8. It was realized that for
most literary, linguistic, and terminological studies there was an urgent need to
re-examine the huge body of past and present Persian literature by providing
concordances to the most important texts (Rubin 1979: 48–50).
Early attempts to prepare modern electronic corpora began in 1973 in the (p. 315)
Academy with a project for compiling computerized concordances to eight selected
classical literary texts. The output was intended to be used in a corpus-based
term-selection process. In 1978 a project for compiling computerized bilingual dictionaries
began on the basis of the first drafts of Hartmann’s theory of Contrastive Textology, later
published as Hartmann 1980. In this PhD project, seven English linguistic texts with their
Persian translations were used in a database as parallel texts, which could be considered
the first parallel corpus for the Persian language (Āssi 1986). That was the starting
point for a series of activities to pave the way for establishing a large, multipurpose,
multifunction database containing a series of corpora for different periods and varieties
of Persian (Āssi 1994). The activities consisted of introducing the flourishing
interdisciplinary field of corpus linguistics and its impact on most of the theoretical and
applied branches of linguistics, especially lexicography (Āssi 2003). It was only in 1993
that the project for developing a large database for all periods of the Persian language
(PLDB3) was started and after ten years its contemporary section was prepared and
presented on the internet (Āssi 1997). Later, a new trend of corpus development in Iran
led to the appearance of a series of corpora of different sizes and types. A dedicated site
for collecting and distributing information about most of the Persian corpora lists
thirty-eight items (Database Reference for the Persian Language 2015).
11.5 Current projects

A search in the reference section of the National Library of Iran returned a list of 941
titles for monolingual and bilingual dictionaries with Persian as their source language.
Another search for bilingual dictionaries with Persian as their target language revealed a
list of 1,146 works with different Western, Asian, and African languages as their source
language (July 2015). Quantity aside, a survey of these dictionaries reveals
that only a few of them have utilized one or two of the three factors of modern
lexicography mentioned above, and those few have all been published during the last two
decades. It is also during these years that some publishers have concentrated on
dictionary making by undertaking research projects leading to the compilation of new
specialized and special-purpose reference works. Some of them have established large
dedicated organizations with planning, editing, and research departments, in which many
lexicographers, linguists, writers, editors, and computer experts cooperate to produce
dictionaries of high quality both in form and content.
An example of such an enterprise is the project for compiling An Advanced Learner’s
Dictionary of Persian, which took a team of more than twenty-two linguists, editors, and
lexicographers, in addition to technical and support staff, some ten years to
finish (Āssi forthcoming). Some of the main features which are mentioned in its
introduction and observed in the body of the dictionary are:
• By defining the users of the dictionary and thus recognizing their needs, and by
taking the lexicographical priorities of the society into account, a special-purpose
dictionary (p. 316) with the aim of serving the needs of the advanced learners of
contemporary Persian was designed.
• By utilizing a 60-million-word corpus of the Persian language, the dictionary can be
considered a corpus-based work in its modern sense.
• As a fully computerized project, computer techniques were used in every stage of the
compilation, i.e. data collection, data processing, and final presentation.
• The process of entry selection was based on several linguistic, pragmatic, and
statistical factors to provide an optimized list of headwords.
• Relative frequency of every headword was indicated by a simple graphic symbol.
• Formal and colloquial forms of pronunciation of headwords were presented where
applicable.
• Meaning discrimination and part of speech (POS) tagging were done with regard to
the word usage indicated by the context and corpus.
• Definitions of the entry words were given in the framework of simple sentence
patterns designed for this purpose.
• A limited list of defining words was drawn up by implementing several linguistic and
statistical criteria.
• Real examples were taken from the corpus to support the meaning and show the real
usage.
• Many clarifying illustrations were inserted wherever necessary.
• Different kinds of visual aids such as graphic symbols, colours, and fonts were used
wherever required.
• Additional information on grammar and usage by referring to explanatory and
descriptive sections was given either in the introduction or in the body of the
dictionary.
• Synonyms, antonyms, collocations, and idiomatic expressions for many headwords
were included.
• It can be presented online and linked to the corpus through hyperlinks.
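One of the features listed above, the indication of each headword's relative frequency by a simple graphic symbol, can be sketched as follows. The sketch makes no claim about how the dictionary itself computes or displays frequency: the function name frequency_bands, the per-million normalization, the threshold values, and the star symbols are all assumptions made for illustration, and the toy corpus is invented.

```python
from collections import Counter


def frequency_bands(tokens, headwords, thresholds=(1000.0, 100.0)):
    """Return {headword: (occurrences per million tokens, band symbol)}.

    Hypothetical scheme: `thresholds` holds the per-million cut-offs for
    the high and middle bands; anything below the second is the low band.
    """
    counts = Counter(tokens)
    total = len(tokens)
    bands = {}
    for word in headwords:
        # Normalize the raw count to occurrences per million tokens.
        per_million = counts[word] * 1_000_000 / total if total else 0.0
        if per_million >= thresholds[0]:
            symbol = "***"
        elif per_million >= thresholds[1]:
            symbol = "**"
        else:
            symbol = "*"
        bands[word] = (per_million, symbol)
    return bands


# Invented ten-token corpus; the thresholds are scaled up to suit its tiny size.
toy_corpus = ["ketab"] * 5 + ["daftar"] * 2 + ["ghalam"] + ["va"] * 2
bands = frequency_bands(
    toy_corpus,
    ["ketab", "daftar", "ghalam"],
    thresholds=(300_000.0, 150_000.0),
)
for word, (per_million, symbol) in bands.items():
    print(f"{word:<8} {symbol:<3} {per_million:.0f} per million")
```

With a corpus of realistic size, the cut-offs would be chosen so that each band contains a useful share of the headword list.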
Nowadays, lexicographers and publishers are becoming more familiar with the new
developments in this field, new dictionaries with advanced forms and contents are
being produced, and dictionary users are becoming better informed about selecting and
using dictionaries.
11.6 Summary
A chronological survey of Persian lexicography shows that the earliest surviving dictionaries,
dating from the seventh and fourth centuries BC, were actually bilingual wordbooks. After
working on the Arabic language for several centuries, Iranians turned to the Persian
language around the eleventh century by compiling a series of monolingual dictionaries,
a trend which continued to the twentieth century, while from the seventh century
the compilation of bilingual and multilingual dictionaries started and has continued
alongside it to the present time.
Typologically, a full range of monolingual dictionaries for the Persian language, from
concise word lists to general comprehensive dictionaries, and from specialized lexicons and
(p. 317) concordances to huge encyclopedic reference works, has been compiled. A similar
scene can be observed in the field of bilingual lexicography: all types of general and
specialized dictionaries with a variety of languages and in different fields of knowledge
have been produced.
With regard to recent works and numerous ongoing projects, and the general tendency
towards the application of new methods and techniques, a rapid improvement in the
quality and quantity of Persian lexicography is quite foreseeable.
Notes:
(1) Before the use of computers.
(2) After the digital age.
(3) Persian Linguistic Database.
Seyed Mostafa Assi
Seyed Mostafa Assi has a PhD in Linguistics (lexicography and computational
linguistics) from the University of Exeter, 1989, and is Professor and Dean of the
Faculty of Linguistics at the Institute for Humanities and Cultural Studies, Tehran,
Iran. His areas of research are English and Persian linguistics and lexicography. He
is the author and co-author of more than 70 papers and 18 books, such as A Selective
Lexicon of Linguistics (1996), A Comprehensive Management Dictionary (1998),
Persian Equivalents for Computer Terms (2002), and A Comprehensive Persian–
English Dictionary (4 volumes) (2003). He is also the founder and director of the
Persian Linguistic Database, available at http://pldb.ihcs.ac.ir.

Academy of Persian Language and Literature


In this chapter, the Academy of Persian Language and Literature is introduced in the
context of the eighty-year history of the establishment of academies in Iran. The
chapter intends to describe the atmosphere which motivated the need for the emergence
of this institution in Iran. It seems fair to claim that word selection, and more
technically terminology, has been the central concern of the three Iranian academies of
the Persian language. It also seems just to evaluate the contributions and activities
of the first and the third academies in Iran as more fruitful, both quantitatively and
qualitatively, than the endeavours of the second Iranian academy. The experiences which
Iran has gained in the last eight decades could be relied on to move forward from a stage
of language reform activities towards a more comprehensive phase of developing a
language policy for the country in future.
Keywords: Iranian Academy, Iranian Academy of Language, Academy of Persian Language and Literature,
language reform, word selection and terminology
12.1 Introduction
THIS chapter, in addition to the present introductory section, contains three major
sections and a final summary section. In section 12.2, an overview of the developments
which led to the establishment of the first Iranian academy for word selection eighty
years ago and the activities and contributions of that academy will be presented. In
section 12.3, the endeavours of the second Iranian academy will be highlighted. In section
12.4, the concerns and achievements of the third Iranian academy will be introduced.
Sections 12.2 and 12.3 establish the historical background within which the third
academy has evolved and developed. Section 12.5 is the summary section.
12.2 Iranian Academy

It is now eighty years since the first Iranian Academy was officially founded. This
Academy, called in Persian færhængestan-e iran, literally ‘Academy of Iran’, was based on
the model of the French Academy. The Persian word færhængestan1 was adopted for the
French word Académie.2 In the preface to the first issue of name-ye færhængestan (lit.
‘letter of (p. 319) Farhangestan’), which was the journal of the Academy and was
published from 1943 (1322 h.š.) until 1947 (1326 h.š.), it is clearly stated that the need
to form a society of scientists and scholars had been felt many years
before the Academy of Iran was founded, and especially once Iranian people had become
familiar with western civilization. In 1934 (1313 h.š.), about five months before the
Academy of Iran was founded, the Ministry of Education decided to organize societies
composed of scientists and scholars from different disciplines. As a first step, the
ministry started the preparations for the formation of a ‘Medical Academy’. In the
preliminary sessions of this would-be Academy, akademi-ye tebbi (lit. ‘academy of
medical’), whose members, a number of famous physicians and some other scholars,
convened at the Faculty of Medicine, the name færhængestan was chosen as the
equivalent for ækædemi ‘Academy’ (Nāmeye
Farhangestān 1943 [1322 h.š.]: 6). A constitution was also written for færhængestan-e
tebbi ‘Medical Academy’ in which ten tasks were foreseen. Writing and translating
medical books, compiling a medical dictionary, and coining medical terms were among
those tasks (p. 7). But before this Academy became a reality, ‘in the winter of 1313 h.š
[1934] suddenly extremist ideas about language reform and Persian orthography were
proposed’, which worried a number of influential scholars, literary men, and moderate
statesmen (p. 8). These extremist ideas, which had penetrated newspapers, government
offices, and official administrative letters, were purist in nature. Their proponents
planned to expel even normalized familiar Arabic words and to substitute for them obsolete
words and words coined on the basis of ancient Iranian languages, which were much harder
for the majority of Iranians to learn (pp. 8–9). In this chaotic and unscientific
word-coining atmosphere, the formation of færhængestan-e iran was proposed on the
initiative of the Ministry of Education.
Sedigh (1943 [1322 h.š.]) is an article in the fourth issue of name-ye færhængestan. He
wrote this article as a supplement to the preface of the first issue of the mentioned
journal in order to provide additional information on the formation of the Academy of Iran
(p. 1). He names and briefly introduces the activities of the two institutions which ‘before
the formation of færhængestan had fundamental roles in creating such an institute’ (p. 1).
He intends to make it clear that ‘need rather than mere desire and wish motivated the
formation of it [namely the Academy]’ (p. 1). Sedigh mentions the formation in 1924
(1303 h.š.) of a committee consisting of representatives of the Ministry of Education and
the Ministry of War (p. 2). This committee, of which Sedigh was a member, had
weekly meetings and in about four months selected about 300 Persian words, mostly
related to the military and aviation, for French terms (e.g. hævapeyma ‘avion’, hævanæværd
‘aéronaute’, forudgah ‘aérodrome’, xælæban ‘pilote’, vabæste-ye nezami ‘attaché
militaire’, bomb ‘bombe’, yureš ‘charge’). The reader may note that a number of the
Persian words are formed from Persian stems and suffixes (e.g. hævapeyma, forudgah),
one is a coinage (xælæban), one is a French borrowing (bomb), one has an Arabic stem (nezam),
and one has a syntactic pattern (vabæste-ye nezami) in which -ye (a variant of -e) is the
Ezafe particle, a head-marking linker in Modern Persian. This suggests that the
committee at that stage had a moderate and realistic approach to word selection in
Persian. The second institution that Sedigh mentions is darolmoællemin-e ‘ali ‘Teachers’
Training School’ (p. 3). In 1932 (1311 h.š.), when Sedigh was the head of the Teachers’
Training School, he organized a number of societies including ‘A Society for Coining
Scientific Words (p. 320) and Terms’ (p. 3). This particular society was a student society
which had weekly meetings under the supervision of one of the professors of the school
and was active until mid-1941 (1319 h.š.). The society had different branches including
natural sciences, mathematics, physics and chemistry, literature, and philosophy (p. 4).
Sedigh reports that the Society for Coining Scientific Words and Terms within eight years
of its activity selected 3,000 words out of which 400 words are used by the professors and
teachers in the books. As an illustration, he has quoted a number of these words (e.g.
gærmasænǰ ‘calorimètre’, hæmrixt ‘isomorphe’, peyvæste ‘continu’, napeyvæste
‘discontinu’, tæpeš ‘pulsation’) (p. 5).
færhængestan-e iran, generally known as færhængestan-e ævvæl ‘the First Academy’,
started its activities in 1935 (1314 h.š.) with the prime minister and prominent scholar
Mohammad-Ali Forughi as its key initiator and president. This Academy was active until
1942 (1321 h.š.). Its constitution contained twelve articles, summarized and translated here
by me from the first issue of name-ye færhængestan (1943 [1322 h.š.]: 1, 2):
(1) To compile a Persian dictionary;
(2) To select terms for all aspects of life with preference given to Persian words and
terms;
(3) To purify Persian through discarding its inappropriate foreign words;
(4) To write a grammar and develop procedures and rules for coining Persian words
as well as accepting or rejecting foreign words;
(5) To collect technical terms used by craftsmen and artisans;
(6) To collect words and terms from classical books;
(7) To collect provincial words, terms, poems, proverbs, tales, anecdotes, and songs;
(8) To search for and identify classical books and to encourage their publications;
(9) To guide the general public to the essence of the literature and the quality of the
prose and poetry and to suggest guidelines and yardsticks;
(10) To encourage poets and writers to create literary masterpieces;
(11) To encourage scholars to compose and translate useful books in eloquent and
familiar Persian;
(12) To study reforms in Persian orthography.
The First Academy’s approved terms were published in a series of books entitled
važehaye now3 ‘new words’. The total number of the approved terms published in the
final issue of važehaye now (1940 [1319 h.š.]) which is considered as the last official
report of the Academy amounts to about 1,700 terms (Rusta’i 2001 [1380 h.š.]: 171).
Rusta’i quotes Badre’i (1976 [1355]) who has reported that the total number of the
approved terms by the First Academy until mid-1941 (1320 h.š.) amounts to about 2,000
words (Rusta’i 2006 [1385 h.š.]: 235). Rusta’i says that a large number of the unpublished
words were approved before mid-1941 (1320 h.š.), but the sociopolitical circumstances of
that period, including the occupation of Iran by the Allied forces, did not allow the
publication of a new volume of važehaye now. However, these approved terms were
dispatched to the relevant institutions through official communiqués. These communiqués
as well as other documents were thoroughly studied and reported on (p. 321) in Rusta’i
(2006 [1385]: 236–3). As a consequence of the mentioned major political developments in
Iran in 1941 (1320 h.š.) and the ensuing instabilities, the word-selection activities of the
First Academy effectively came to an end.
The First Academy’s language reform activity and objectives were in practice limited to
word selection and a series of important presentations by the permanent council
members of the Academy which were published in the Academy’s journal name-ye
færhængestan (lit. ‘the Letter of the Academy’). It is noteworthy that the first issue of the
journal was published in 1943 (1322 h.š.) and its last issue appeared in 1947 (1326 h.š.).
In its word-selection endeavour, the needs of various fields and disciplines were taken
into consideration. These fields and disciplines are listed below, based on the degree of
the success of the approved terms, namely their acceptance and frequent use, as
generally evaluated in Rusta’i (2006 [1385 h.š.]: 63–7):
(1) military and police terms
(2) administrative, banking, political, and municipal terms
(3) zoology and botany (and generally natural sciences) terms
(4) geographical names
(5) medical terms
(6) geology terms
(7) physics and engineering terms
(8) judicial terms
(9) meteorological terms
(10) mathematical terms
(11) psychological terms
(12) sport terms (which were limited to horse-racing and football)
færhængestan-e ævvæl adopted a moderate position on word selection and consciously
avoided purism. On the issue of the success of færhængestan-e ævvæl, Paul (2010: 102)
makes the following remarks:
The Persian language has always been receptive to lexical influences from various
languages—Arabic in the early and Mongolian and Turkic in the late medieval
periods and French and English in modern times … seen in this light, the
Farhangestān would be only one out of many actors, in the thousand-odd years of
the history of the Persian language, that effectively enriched the vocabulary of
Persian. This time it was not done with words from outside, but with a selection of
words which seem to be indigenous …
12.3 Iranian Academy of Language

Although official attempts were made to found a new Academy in 1961 (1340 h.š.), they
did not materialize (Rusta’i 2006 [1385 h.š.]). The establishment of færhængestan-e
zæban-e iran, The Iranian Academy of Language (lit. ‘academy-of language-of Iran’),
generally known as færhængestan-e dovvom, The Second Academy, was officially
approved in 1971 (1349 h.š.) but its activities started a year later. After the victory of the
Islamic revolution in 1979 (1357 h.š.), the Second Academy was reorganized and was
named ‘Institute for Cultural Studies’. Now, this institute has become a major research
and educational centre in (p. 322) humanities in Iran containing many departments with
only graduate students and is called pæžuhešgah-e4 ʼolum-e ensani5 væ motaleʼat-e
færhæng-i6 (lit. ‘research centre-of sciences-of humanities and studies-of cultural’)
‘Institute for Humanities and Cultural Studies’.
During the seven years of its activities, the Iranian Academy of Language functioned with
nine research departments under the presidency of Dr Sadegh Kia, a scholar particularly
interested in dialectology and ancient and middle Iranian languages who advocated
purism (see Chapter 13 for more information about dialectology). The first
and most important of these was the terminology department, which had gradually managed
to form twenty-five word-selection committees in various disciplines, as listed below:
(1) education and psychology;
(2) economics and commerce;
(3) nuclear energy;
(4) informatics and computer sciences;
(5) medicine;
(6) geography;
(7) law and administration;
(8) mathematics;
(9) chemistry;
(10) political science and international relations;
(11) natural sciences;
(12) physics;
(13) music;
(14) cartography;
(15) engineering and industry;
(16) toponymy;
(17) agriculture and husbandry;
(18) philosophy and social sciences;
(19) accounting and finance;
(20) linguistics (language and literature);
(21) culture and Waqf;7
(22) library science;
(23) scientific and educational texts;
(24) metallurgy;
(25) art and archeology (Asi 2005 [1384 h.š.]: 282–3)
Asi (2005 [1384 h.š.]: 288) has reported that until late 1978 (mid-1357 h.š.) about (p. 323)
30,000 words and terms were studied and more than 57,000 Persian equivalents had
been suggested for them, but the Academy’s high council had approved only 1,100 of
them, and out of that number just 151 words and terms were published in the Academy’s
booklets. He regards the slow process of approval of the terms by the high
council of the Academy and undue hesitation and indecision in the publication of the
approved terms as two of the major weaknesses of the Second Academy’s activities (p.
290). A number of the well-accepted and currently commonly used terms which the
Second Academy approved are listed in Asi (2005 [1384 h.š.]: 285–7). A selection
of them is as follows: čekide ‘abstract’, pæzireš ‘admission’, negarxane ‘art
gallery’, pædidavær ‘author’, fehrestnevisi ‘cataloguing’, nemudar ‘chart’, rayane
‘computer’, hæmayeš ‘congress’, virastar ‘editor’, ǰæšnvare ‘festival’, nemaye ‘index’,
dærundad ‘input’, resaneha-ye hæmegani ‘mass media’ (lit. ‘transmitters-of general
public’), behine ‘optimum’, šomargan ‘tirage’ (French), gæšt ‘tour’, and tærabæri
‘transport’.
As mentioned earlier, the Academy of Language had nine research departments,
one of which was the terminology department. The other departments were as follows:
department of Persian words; department of Persian grammar and orthography;
department of ancient and middle Iranian languages; department of dialectology;
department of the relation between Iranian languages and other languages of the world;
department of onomastics; department of spoken varieties of the Persian language; and
department of terms for crafts and professions (Asi 2005 [1384 h.š.]: 282). The
researchers who were affiliated with the department of ancient and middle Iranian
languages and the department of dialectology later produced important works which
were published by the Institute for Cultural Studies/Institute for Humanities and Cultural
Studies.
12.4 Academy of Persian Language and Literature
færhængestan-e zæban væ ædæb-e farsi (lit. ‘academy-of language and literature-of Farsi’)
‘Academy of Persian Language and Literature’, also known as færhængestan-e sevvom
(lit. ‘academy-of third’) ‘The Third Academy’, was ratified in 1989 (1368 h.š.). After its
constitution was approved by the High Council of the Cultural Revolution, it officially
commenced its activities a year later (in 1990 [mid-1369 h.š.]). Dr Hasan Habibi, a
scholar who had received his postgraduate education in sociology and law in
France and was at that time the first vice-president of the Islamic Republic of Iran, was
elected as the president of the Academy of Persian Language and Literature. The second
president of the Academy of Persian Language and Literature has been Dr Gholam-Ali
Hadad Adel, a professor of philosophy, poet, and statesman. The two presidents of
færhængestan-e sevvom have also served as the heads of the department of word
selection and terminology of this Academy and have advocated and practised a moderate
and balanced perspective to the tasks of the department. It is fair to say that the
permanent members of the High Council of the Academy have taken and supported a
realistic approach to the issue of word selection and terminology and have consciously
avoided extremist and puristic tendencies in this (p. 324) enterprise. This department was
formed in 1991 (1370 h.š.). There are now exactly forty-three specialized and scientific
committees under the supervision of the word-selection and terminology department
which are seriously engaged in this aspect of language reform activity. The members of
each committee are university professors or other scholars who are particularly
interested in promoting the Persian language as the language of science. There were also
six other similar committees which have already completed their word-selection and
terminology activities. These numbers (altogether forty-nine) indicate the wide range of
disciplines and fields which have participated in this endeavour. The total number of the
terms so far approved by the word-selection councils of the Academy whose members are
a number of permanent or affiliate members of the Academy as well as a number of
experienced specialists in terminology amounts to more than 45,000 terms. These terms
are published in twelve volumes entitled A Dictionary of the Approved Terms by the
Academy and are also available on the Academy’s website. When the total number of the
approved terms of a particular field has passed 1,000, a special volume named The One-
thousand Words has been published by the Academy. In a few fields, more than one
volume of this kind has been published. Up to now fifteen volumes of The One-thousand
Words have been printed. These volumes cover the terms of medicine (two volumes);
transportation (two volumes); biology; chemistry; military sciences; physics; geosciences
(two volumes); agriculture and natural resources; art; humanities and social sciences;
and engineering sciences (two volumes). So far the word-selection and terminology
department has organized two conferences on the issues of word selection and
terminology. The first conference was held in 1999 (1378 h.š.) and the second one in 2003
(1382 h.š.). The proceedings of the two conferences were published by the Academy in
2001 (1380 h.š.) and 2005 (1384 h.š.) respectively. The establishment of a research
institute in the field of terminology and word selection is the most recent activity of
færhængestan-e sevvom. This institute accepted M.A. students in terminology in
September 2015.
In addition to the word-selection department, færhængestan-e sevvom established the
following research departments in 1991 (1370 h.š.): department of lexicology, department
of Persian grammar and orthography, department of (Ancient and Middle) Iranian
languages, and department of dialectology. Since then, the following research
departments have been added: department of encyclopedia of Persian language and
literature (1992 [1371 h.š.]), department of Persian language and literature in the
subcontinent (1993 [1372 h.š.]), department of comparative literature (2000 [1379 h.š.]),
department of contemporary literature (2004 [1383 h.š.]), department of classical Persian
texts’ editing (2010 [1389 h.š.]), department of the Islamic revolution’s literary works
(2011 [1390 h.š.]), department of teaching Persian language and literature (2011 [1390
h.š.]), and department of language and computer (2011 [1390 h.š.]).
The Academy of Persian Language and Literature has published more than 150 titles of
books and encyclopedias. A large number of them are the products of the above-
mentioned research departments or are the studies completed by the permanent
members of the Academy. The Academy also publishes the following journals: name-ye
færhængestan ‘The Letter of the Academy’, which is its main journal; dæstur ‘Grammar’,
which deals with morphology and syntax of Modern (and classical) Persian;
færhængnevisi ‘Lexicography’; ædæbiyat-e tætbiqi ‘Comparative Literature’ (lit.
‘literature-of comparative’); šebh-e qarre ‘Subcontinent’ (lit. ‘sub-of continent’);
ædæbiyat-e enqelab-e eslami ‘Islamic Revolution’s (p. 325) Literature’ (lit. ‘literature-of
Revolution-of Islamic’); and zæbanha væ guyešha-ye irani ‘Iranian Languages and Dialects’ (lit.
‘languages and dialects-of Iranian’). færhængestan-e sevvom held two international
conferences on the occasion of ‘the Millennium of Shāhnāma of Ferdowsi’ in 2011 (1390
h.š.) and on the occasion of ‘the Announcement of Ghazni as the Cultural Capital of the
Islamic World’ in 2012 (1391 h.š.). It seems reasonable to suggest that the Academy of
Persian Language and Literature has now gained the necessary requirements and
capacity to move from the stage of language reform activities to the more general and
comprehensive phase of language policy. For more information on language policy, refer
to Chapters 13, 14, and 15.

12.5 Summary
In this chapter, I have described the activities and achievements of the Academy of
Persian Language and Literature by reviewing the activities and contributions of its two
predecessors. I also provided an overview of the state of the art before the first Iranian
Academy was founded in 1935 (1314 h.š.). It was mentioned that there were official and
unofficial word-selection and word-coining activities sometimes coupled with extremist
modernization and puristic tendencies which concerned a number of moderate influential
scholars and statesmen. This atmosphere was managed through the establishment of the
First Academy in Iran. The First Academy approved and published 1,700 terms for
various fields and disciplines. As a consequence of the sociopolitical instabilities which
resulted from the occupation of Iran by the allies’ forces in 1941 (1320 h.š.), the word-
selection and word-coining activities of the First Academy terminated. But the members
of the academy began the publication of the academy’s journal name-ye færhængestan in
1943 (1322 h.š.). The journal was published for four years. In 1953 (1332 h.š.), when the
last president of the academy passed away, the academy was officially and practically
terminated. The establishment of the Second Academy, the Iranian Academy of Language,
was ratified in 1971 (1349 h.š.) but its activities began the following year. The Second
Academy was active until the victory of the Islamic Revolution in Iran in 1979 (1357 h.š.).
Then it was reorganized and was named Institute for Cultural Studies and was later
renamed Institute for Humanities and Cultural Studies. Although the Second Academy is
reported to have proposed more than 57,000 Persian equivalents for about 30,000 foreign
words and terms, the academy’s high council had only approved 1,100 of them and out of
that number just 151 words and terms were officially published by the Second Academy.
The Third Academy, the Academy of Persian Language and Literature, commenced its
activities in 1990 (1369 h.š.), one year after its constitution had been approved. Since
then the academy has established twelve research departments, it has published seven
journals and more than 150 books and encyclopedias, and it has recently established the
terminology and word-selection research institute, which has admitted graduate students in this
field since September 2015 (1394 h.š.). The word-selection research department of the
Third Academy has approved and officially published more than 45,000 terms in various fields and
disciplines. It seems fair to say that the Third Academy has benefited from both the
positive and the negative experiences of the two previous academies and therefore has
adopted a moderate, realistic, and balanced approach to word-selection and
terminological issues. Of (p. 326) course, the fact that the Third Academy, like its
predecessors, has limited itself to the issue of word-selection and has not incorporated
the task of research on the syntax and discourse of Persian as a language of science is
a shortcoming of this academy. Undoubtedly, there are a number of other major language
and literary issues that the academy can and should address. Thus, it seems reasonable to
suggest that it is time to move beyond language reform activities of the past eight
decades and engage in the more general and comprehensive task of language policy.

Notes:
(1) On the etymology of the word færhængestan, which is also used as such in Middle
Persian (Pahlavi), one can refer to Purdavud (1943 [1322 h.š.]).
(2) Paul (2010) has expressed a different view on the model which the Iranian Academy
was based on. He mentions Reza Shah’s visit to the Republic of Turkey in 1934 and that
he was deeply impressed by ambitious political, social, and cultural reforms which
Mustafa Kemal Atatürk, the Turkish leader, had initiated some years earlier. Paul (2010:
81) writes ‘[a]fter returning to Iran, Rezā Shah ordered a Persian language academy to
be founded to modernize and purify the Persian language, following the Turkic model.’
Perry (1985) in his meticulous review of ‘Language Reform in Turkey and Iran’ refers to
some of the modernization attempts in modern times beginning in the nineteenth century
in the Ottoman Empire and under the Qajar dynasty. In regard to Iran, he mentions that ‘[i]t was only
from the time of the Constitutional Revolution of 1906 up to World War 1 … that literary
societies arose in Tehran and the provinces with the purpose of promoting modern ideas
and coining Persian words to express them’ (Perry 1985: 296–7).
(3) The morphemic segmentation and glosses of this title are as follows:
(4) pæžuh-eš-gah-e
research-nominalizer-place-ezafe
(5) ensan-i
human-relative adjective
(6) færhæng-i
culture-relative adjective
(7) A religious endowment in Islamic law.
Mohammad Dabir-Moghaddam received his PhD in linguistics from the University of
Illinois at Urbana-Champaign in 1982. He is Professor of Linguistics at Allameh
Tabataba’i University (Tehran). He is a permanent member of the Academy of Persian
Language and Literature. He is the author of Theoretical Linguistics: Emergence and
Development of Generative Grammar, Studies in Persian Linguistics, Typology of
Iranian Languages (2 volumes), and a number of articles.

Sociolinguistics

Yahya Modarresi
Print Publication Date: Aug 2018 Subject: Linguistics, Sociolinguistics, Languages by Region
The present chapter is an overview of sociolinguistic studies in the Persian-speaking
territory with an emphasis on Iran and is divided into two general parts. In the first part,
different aspects of Persian sociolinguistics are discussed at the macro level, and issues
such as dialect studies and language contact in the Persian-speaking area are briefly
reviewed. In the second part, issues such as social variations in Persian, change in
progress, and standard varieties of Persian are briefly analysed at the micro level. The
main focus of this chapter, therefore, is on language diversity in a multilingual area in
general, and social and regional variations in the Persian-speaking area in particular.
Keywords: sociolinguistics, variation, Persian, multilingualism, language change, standard Persian
13.1 Introduction
THE idea that language is a social possession was first suggested by the American
linguist W. Whitney (1867) in the second half of the nineteenth century and was then passed
on, in turn, to Saussure, Meillet, Martinet, Weinreich, and Labov (Shuy 2003: 4) in the twentieth
century. During this relatively long period of time, the concept has gained more and more
attention and has become an important part of the discipline of linguistics in Europe and
the United States. According to Paulston and Tucker (2003: 1), the term
‘sociolinguistics’ was coined and used in the late 1930s and 1940s by scholars such as T.
Hodson and E. Nida. In the 1950s and particularly the 1960s, linguists such as J.
Fishman, C. Ferguson, D. Hymes, and W. Labov tried to establish sociolinguistics as a new
field of study by developing and expanding its theoretical and methodological aspects.
The beginning of sociolinguistics, as an academic field of study, however, is commonly
marked by the pioneering works of W. Labov (1963) in Martha’s Vineyard and New York
City (1966), which played an important role in establishing the field by
showing systematic correlations between linguistic and social variables. Since then, the
field has expanded its territories and covered different areas of language study in social
contexts. Today, according to Paulston and Tucker (2003: 1), sociolinguistics has turned
out to be a very lively and popular field of study, and many of its subfields can claim to be
separate fields in their own right.
13.2 The Persian-speaking territory

The vast territory that covers the three presently independent Persian-speaking countries of
Iran, Afghanistan, and Tajikistan used to lie within the borders of a single ancient
empire; the territory geographically covered an area of approximately 2.5–2.7 million
km², within which many ethnic groups, with different languages and dialects, have lived
together for centuries. The present population of Iran, Afghanistan, and Tajikistan is
estimated at 79.77 million, 32.76 million, and 8.53 million respectively (Worldometers and
CountryMeters, (p. 330) Wikipedia 2015). Thus, the total population of the three main
Persian-speaking countries is approximately 120 million. Although no official statistics are
available for the number of speakers of Persian and other languages spoken in this area,
the total number of Persian native speakers in the three countries and adjacent areas, as
estimated by Windfuhr and Perry (2009: 418) is around 60 million.
In times past, the ancient empire was much wider than the present territory and was
bounded by the Sind River, Pamir and Soleiman mountains to the east, Mesopotamia and
Asia Minor to the west, the Aral sea and Caucasian mountains to the north, and the
Persian Gulf and Oman sea to the south (Mosahab 1966; Ghirshman 1976; Modarresi
2006). Payne (1987: 515) also notes that from archaeological and textual evidence, it can
be deduced that Iranian languages at the time of the Achaemenid Empire had a wider
geographical distribution than at present.
Scholars have divided the history of the Iranian languages into three periods: the old
Iranian period (from the earliest time to the collapse of the Achaemenid empire in the
third century BC), the middle Iranian period (from the Arsacid dynasty in the third/fourth
centuries BC to the collapse of the Sassanid empire in the seventh century AD), and the
modern Iranian period (from the Arab conquest in the seventh century up to the present
time). In the early modern period, in addition to Arabic, which was the language of
the conquerors and, therefore, the language of power and administration, other indigenous
languages and dialects such as Pahlavi and Dari were also spoken in this vast area (see
also Chapters 2, 3, 11, and 19). As Payne (1987: 518) notes, by the time of the Arab
conquest, different varieties of Modern and Middle Persian were in use in a wide
territory, which today mainly covers present-day Iran, Afghanistan, and Tajikistan.
The most significant sociolinguistic feature of the present Persianate territory, like some
other areas around the globe, is ethnic and linguistic diversity. The remarkable diversity
within the wide Persian-speaking lands is due to some important historical and social
events such as wars and migrations that have occurred throughout the region’s long
history. The intense ethnic diversity has led to multilingualism and multidialectalism in
the whole territory. Thus, the coexistence of several ethnic, regional, and social groups
that have lived together for centuries has brought about various linguistic consequences.
13.3 Major sociolinguistic issues

Current sociolinguistic issues in Persian-speaking countries are numerous and can be
classified under different rubrics; some of the most important topics in Persian
sociolinguistics include language contact and borrowing; language planning; politeness
and power; multilingualism and endangered languages; regional and social variations.
During the past few decades, a considerable number of studies have been conducted by
Iranian and non-Iranian scholars on each subfield of Persian sociolinguistics, some of
which are briefly introduced in this section and some others in subsequent sections.
During the long history of the territory, Persian has been in close contact with other
languages, particularly with Arabic and Turkish, and as a result, heavy borrowings have
taken place among these languages. Within the last one or two centuries, due to
economic, cultural, and political exchanges between the Persian-speaking countries and
European countries, (p. 331) linguistic borrowing from languages such as French,
Russian, German, and English has increased significantly. In the aftermath of the Second
World War, borrowing from English and Russian increased radically in Persian and Tajiki
respectively (see also Chapters 2, 3, 7, 11, and 19). Pioneering works such as Jazayeri
(1958) and Bateni (1970) are good examples of studies in the area of language borrowing;
while Jazayeri dealt with English loanwords in Persian, Bateni mostly analysed French
and English words. Later on, Modarresi (1989) discussed lexical borrowings in Persian
and also provided examples for structural borrowings. Although the total number and
frequency of use of Russian loanwords in Persian is relatively low, the number of Russian
loanwords in Tajiki seems to be much higher. Bashiri (1994: 110) studied Russian
loanwords in Persian and Tajiki, claiming that there are about 100 Russian loanwords in
Modern Standard Persian as a legacy of the Anglo-Russian rivalry for sociopolitical and
economic dominance in Iran since the nineteenth century.
Language planning, another important issue, has been pursued over the past
several decades in Persianate countries, particularly in Iran; the main focus of language
planners in Iran has traditionally been on neologism and word coinage (see Chapter 17
for more information on neologism). During the last several decades, numerous new
words have been coined by Persian language academies, many of which were accepted
and used by Persian speakers in Iran. New Words (1975) for instance, is a glossary of
approximately 2,000 Persian words, coined by the Iranian Academy several decades ago,
most of which are now in use in Persian. Within the past two decades, more than 15,000
new Persian equivalents have been coined and published in a number of volumes; the
most recent volume, entitled A Collection of Terms, is volume 12, and includes 4,000 new
words. Studies on the relative acceptance of the new words coined by the academy have
also been carried out by Ahmadipour (2006), the results of which indicate that the degree of
acceptance of the newly coined forms in the Persian-speaking community depends on
parameters such as age, education, and awareness.
Studies on other areas such as the mutual relation between language, power, politeness,
and violence have also been done in the Persianate territory. One of the most important
works in the area of language and culture is Beeman (1986), in which he tried to show
that in the Persian-speaking community, meaning is a creative negotiated social process
rather than a property of words and other cultural phenomena. Keshavarz (2001), in his
study of forms of address in Persian, concluded that the choice of forms of address was
related to social context, social distance, and also to personal characteristics of the
interlocutors. Koutlaki (2002) has tried to describe the concept of tā’ārof in Persian and
its relationship to the concept of face. Modarresi (2009) discussed markers of power and
politeness in Persian, and argued for a process of gradual change in different
generations. Atraki (2013), in her study on street remarks in Tehran, showed linguistic
abuse and violence against women.
13.4 Multilingualism
The Persian-speaking territory is a multilingual land, within which a considerable number
of ethnolinguistic groups speak their own native languages. On the basis of their genetic
origin, the languages and dialects of the territory can be divided into two main groups:
the Indo-Iranian branch (of the Indo-European family) and non-Indo-Iranian languages and
dialects. The Iranian group has been classified in different subgroups by different
(p. 332) scholars. According to Payne’s classification (1987: 514), for instance, there are four main
subgroups of the Iranian languages: the South-West subgroup (includes Persian, Dari,
Tajiki, Luri and Bakhtiari); the North-West subgroup (includes Kurdish, Talishi, Baluchi,
Mazandarani, Zaza and Gurani); the South-East subgroup (includes Pashtu, Yazgulami,
Roshani, Bartangi and Sarikoli) and the North-East subgroup (includes Ossetic and
Yaghnobi). This language group has been traditionally divided into western Iranian, which
includes Persian, Kurdish, Gilaki, etc., mainly spoken in Iran, and eastern Iranian,
including Yaghnobi, Wakhi, Pashtu, etc., which are in use in Afghanistan and Tajikistan
(Payne 1987: 514).
The non-Iranian languages and dialects spoken in these three countries can be classified
into different language families; Arabic, Mandaic, and Assyrian from the Semitic family;
Azerbaijani, Turkish, Turkmen, and Uzbek from the Altaic family; Armenian and Romano
languages from the Indo-European family; Barahui from the Dravidian family; and Georgian
from the Caucasian family (see also Windfuhr 2009c: 9–18; Reza’i-Bagbidi 2001;
Modarresi 2006).
According to Asher (1994), Ethnologue (2001), and Modarresi (2006), the languages and
dialects of the main Persian-speaking countries can be roughly classified into nine main
genetic groups; five of these groups (i.e. Iranian, Indo-Aryan, Armenian, Germanic, and
Balto-Slavic) belong to the Indo-European family. The other four groups mainly belong to
the Semitic, Altaic, Caucasian, and Dravidian families. As far as the number of speakers of
each language is concerned, it is hard to find definite statistics; however, unofficial
estimates can be found in sources like Ethnologue. It is only possible to claim that the
majority of the speakers in this territory belong to the Iranian group.
In this multilingual territory, there are some Iranian and non-Iranian languages which can
be considered as endangered. Languages such as Mandaic, Tati, Parachi, Wakhi, Munji,
Ormuri, Yaghnobi, and Parya, are classified as definitely, severely, or critically
endangered (Wikipedia). Grenoble and Whaley (1998: 34) note that the majority of
endangered languages come from oral cultures. This seems to be the case in the
Persian-speaking area, where not only minority oral languages but also languages with
considerable numbers of native speakers can be classified as endangered. Several studies
have been done on multilingualism and endangerment in non-Persian minority
communities, in most of which there seems to be a positive attitude towards the use of
Persian in daily interactions in various social domains. Works such as Zolfaghari
(1997) on Bakhtiyari, Mashayekh (2002) on Gilaki, Safa’i (2004) on Turkish, Bani-shoraka
(2005) on an Azerbayjani community in Tehran, Sheikhi (2006) on Turkmeni, and
Bashirnezhad (2007) on Mazandarani are just a few examples of the studies done during
the last two decades. The main point in these studies is the fact that the younger
generations tend to choose Persian over minority languages and thus endanger their
native languages. This process seems to be much slower in Azerbayjani, Kurdish, and
Turkmen communities. For more discussion on multilingualism and bilingualism, see
Chapters 14 and 15.
13.5 Dialect studies

As noted by dialectologists and sociolinguists such as R. McDavid (1948) and W. Labov
(1964), the early stages of sociolinguistics can be traced in dialect studies. Shuy (2003:
10) (p. 333) believes that there were many early features of modern sociolinguistics in
traditional dialectology (see Chapters 3, 8, and 12 for more discussion on dialectology). It
can be claimed that while dialectologists generally study linguistic variations in the
horizontal (or geographical) dimension, particularly in rural areas, sociolinguistic studies
focus on the vertical (or social) dimension of linguistic variations, particularly in urban
speech communities. From the methodological point of view, dialectologists were more
interested in gathering data from a few (or even only one) old and uneducated
informants; sociolinguists, on the other hand, are more interested in collecting data from
different social groups of a community and, therefore, tend to use statistical and
sociological sampling techniques. This is what Labov called ‘Quantitative
measurement’ (1972: 123).
Within the present Persian-speaking community, as was mentioned earlier, many regional
as well as social varieties are spoken, which, despite their considerable differences, are
mutually intelligible. As Windfuhr (1987: 523) indicates, Persian, with its various dialects,
covers a vast geographical territory from the west (Iran) to the east (Afghanistan) and to
the north-east (Tajikistan). The three major representatives of these varieties of Persian
are now called Persian, Dari, and Tajiki respectively. These standard varieties of Persian
offer good examples for showing how standard varieties expand and grow in multilingual
territories.
During the past several decades, a considerable number of studies have been done on the
various languages spoken in the Persian-speaking area in general and on varieties of
Modern Persian in particular. The long tradition of dialect studies in Iran, Afghanistan,
and Tajikistan can be classified into two separate categories: professional studies done by
Iranian or non-Iranian dialectologists and linguists, and non-professional works done by
native speakers of different dialects. Examples of the most significant professional studies
are reviewed here: Lorimer (1915) on the syntactic description of Pashtu; Phillott (1919)
on the grammatical description of Modern Persian and a comparative study of Persian in
Iran and Afghanistan; Lorimer (1922) on the phonological system of Bakhtiari and other
dialects almost a hundred years ago; Moghaddam (1938) on the structure of Vafs,
Ashtian, and Tafresh dialects; Yarshater (1945) on the classification of the Iranian
languages and dialects; Kia (1956) on the description of Ashtian dialect; Lazard (1957) on
the grammatical description of Modern Persian; Peisikov (2001) on description of Tehran
dialect; Abolghasemi (1969) on the phonology and morphology of Ossetic spoken in
Caucasia; Fekrat (1976) on Herati dialect vocabulary; Oranskiy (1978) on the
classification of the Iranian languages; and Windfuhr (1979) on description of Persian.
More recent works on the structural description of the Iranian languages and
dialects include Kalbasi (1995), a structural comparison between Persian and Tajiki;
Hasan-doust (2010), a comparative dictionary of more than 700 Persian entries and their
equivalents in various dialects; and Kia (2011), a comparative dictionary for sixty-seven
dialects of Persian. Other recent works include Stilo (1981), Shokri (1995), Jahani (2003),
Taghipour (2008), Windfuhr and Perry (2009), and Anonby and Yousefian (2011), to
mention just a few. An important common feature of these studies is that they all pay
attention to sociolinguistic aspects of dialect studies. For example, Windfuhr and Perry
(2009) tried to compare Tajiki and Persian not only from lexical and structural points of
view, but also from certain sociolinguistic aspects such as different registers, styles,
address forms, and tā’ārof. Some of the studies done by Jahani and her colleagues in
Baluchistan also have a social side. Anonby and (p. 334) Yousefian’s study (2011) is a
good example in which, by adopting a sociolinguistic approach, the authors tried to explain
some social aspects such as language use, language attitude, and language vitality in
Kumzari and Laraki language communities on the multilingual island of Larak, located in
the Persian Gulf.
13.6 Social variations in Persian

One of the most significant and central issues in sociolinguistics is variation, an issue
which Labov and other sociolinguists have emphasized for decades, claiming that
variations are linguistically and socially structured and, therefore, predictable. Linguistic
variation was also an important issue for dialectologists such as McDavid and Fischer,
who claimed that in order to explain linguistic variations in a single area, one needs to
take into account variables other than geographical factors. McDavid (1948), for
instance, showed that there is a connection between a linguistic variable (postvocalic /r/)
and some social variables (such as age and education) in South Carolina. Fischer (1958)
also argued that the choice between /n/ and /ŋ/ in New England is related to some social
parameters such as gender and formality of the speech situation. These observations
showed that some dialectologists, who mainly aimed at the study of areal variations,
gradually started to turn their attention to variations observable in the same area and
tried to explain them through social factors.
Persian as a macro-language is the official language of Iran, Afghanistan, and Tajikistan,
and therefore has a standard variety in each of these countries, plus numerous local
varieties spoken in different geographical areas. According to Beeman (2010) the range of
variation in Persian, Dari, and Tajiki communities is quite extensive, embodying
regionalisms and borrowings from other language families.
The study of the social dimension of variation in the Persian-speaking community in general
and in Iran in particular began in the 1970s, during which sociolinguistics as a new field
of study was introduced to Persian linguists. During 1975 and 1976, a number of articles
on social aspects of language were published in Persian, in one of which, Bateni (1976)
coined /jameʔe shenasi-ye zaban/ and /zaban shenasi-ye ejtemaʔi/ as the Persian
equivalents for ‘sociology of language’ and ‘sociolinguistics’ respectively.
During the same decade, two new fieldwork-based studies were carried out by Modarresi
(1978) and Jahangiri (1980), both dealing with social variations in Tehran as an urban
area. In the following three decades, many books, articles, and MA and PhD dissertations,
such as Beeman (1986), Modarresi (1989), Jahangiri (1999), Taghipour (2008), Adli
(2011), Habibi (2014), and Mozaffari (2015), were completed and/or published in the field
of Persian sociolinguistics in general and linguistic variations in particular. Also,
significant works such as Weinreich (1953), Trudgill (1974a), and Beeman (1986) were
translated into Persian. Examples of some important findings in Persian sociolinguistics
during the last four decades or so are briefly introduced and discussed in this section. As
in other languages such as English, most of the sociolinguistic studies in Persian have been
done on phonological variables, and studies such as Adli (2011) seem to be rare examples
of studies done on syntactic variations in Persian. (p. 335)
13.6.1 Stylistic variation
An important non-linguistic variable that explains linguistic variation is the social
context in which appropriate linguistic behaviours occur; variations of this kind, sensitive
to the social context of speech, are generally called style. According to Trudgill (2003: 129),
style is a variety of a language which is associated with social context and which differs
from other styles in terms of formality. Styles can thus be plotted on a continuum from
very formal to highly informal or casual levels. According to Labov (1966) and Meyerhoff
(2006), the different distribution of forms in different styles is motivated by the amount
of attention the speaker pays to the act of speaking. Thus, the more formal the
social context, the more attention is given to speech, and hence the more formal the
linguistic style will be.
In his New York study, Labov (1966) suggested five separate styles (casual, careful,
reading, word lists, and minimal pairs, labelled A, B, C, D, and D’ respectively), on the
basis of the formality of the social contexts in which the act of speaking occurs. The
Labovian framework has been adopted by a number of scholars in different speech
communities, including the Persian-speaking area, since the 1970s.
The first significant study on Persian styles was carried out about six decades ago by
Hodge (1957), who recognized four separate styles (i.e. Colloquial, Deliberate, Formal,
and Quotative), a classification also adopted by Henderson (1975). In a sociolinguistic
study of Tehran Persian, Modarresi (1978) compared the styles suggested by Hodge and
Henderson with the Labovian stylistic continuum and equated them in a general way:
according to Modarresi, Hodge’s Colloquial, Deliberate, and Formal styles seem to be
close to Labov’s Casual, Careful, and Reading styles respectively, and the Quotative style
seems somewhat close to two styles (word list and minimal pair) in Labov’s model.
In the analysis of Persian variables in Tehran, Modarresi (1978: 116) shows that the
pattern of stylistic variation of phonological variables in Persian is similar to the patterns
found in previous studies of English-speaking communities, such as Wolfram (1969: 75) and
Trudgill (1974: 96). For instance, deletion of the final alveo-dental stops (/t/ and /d/) in
Tehran Persian behaves much as the same process has been shown to behave in English in
cities such as Detroit and Norwich. Table 13.1 shows the percentage of final stop deletion
in different styles in Persian.
Table 13.1 Percentage of final stop deletion in Persian by style
Style        Casual   Careful   Reading   Word list   Minimal pair
Percentage   59.80    44.80     22.70     7.80        1.90
According to the above figures, stop deletion in Persian correlates with the amount of
attention paid to speech, as called for by the formality of the social context. Thus, the
percentage of final stop deletion in Persian decreases steadily along the stylistic
continuum, from casual speech (59.80) to the most formal level, minimal pairs (1.90).
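The claim that deletion falls as formality rises can be checked directly against the figures in Table 13.1. The following is only a minimal sketch: the variable and function names are illustrative assumptions, and the values are simply those reported in the table.

```python
# Final stop deletion in Tehran Persian by style (figures from Table 13.1).
# Styles are listed from least formal to most formal.
deletion_by_style = [
    ("casual", 59.80),
    ("careful", 44.80),
    ("reading", 22.70),
    ("word list", 7.80),
    ("minimal pair", 1.90),
]


def is_strictly_decreasing(values):
    """Return True if every value is lower than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))


percentages = [pct for _, pct in deletion_by_style]
decreases_with_formality = is_strictly_decreasing(percentages)
print(decreases_with_formality)
```

The check only confirms the ordering of the five percentages; it says nothing about statistical significance, which would require the underlying token counts.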
(p. 336) 13.6.2 Diglossia
The term diglossia was first used in linguistics literature by Ferguson (1959: 325) for
describing certain linguistic situations in which two varieties of a language, High and
Low, exist side by side in a speech community, each having a definite role to play (H in
highly formal and L in casual or less formal contexts). In his seminal paper, Ferguson
makes it clear that diglossia is a language situation which can be found in Arabic or
Greek speech communities (where a superposed variety exists together with other
varieties) and differs from the situation in the Persian and Italian speech communities (in
which standard and local varieties can be observed). Fishman (1971) extended and
broadened the original meaning of the term to include other similar situations such as
Persian. He suggests that the term can also be applied to all situations where two
different varieties of a language (dialects, accents, etc.) or even two completely different
languages, play different social roles in a speech community. See also Chapters 2 and 19
for more information on diglossia.
On the basis of this interpretation of diglossia, Henderson (1975) suggests in his paper on
Kabul Persian (Dari) that the different styles (Colloquial, Deliberate, Formal) of Kabul
Persian have different social roles and can therefore be used in different social contexts.
Although Ferguson excluded Persian from the examples of diglossia by classifying it
among the languages that follow the standard-local model, Henderson claims that Kabul
Persian, with its three separate stylistic levels, can perhaps be considered an example of
diglossia or even ‘triglossia’. Based on the fieldwork he carried out in Tajikistan, Beeman
(2010: 138) also suggests that standard Persian as spoken in Iran has become a special
register of Tajiki marked for formal occasions such as political speech making, wedding
orations, news broadcasts, and elevated scientific discourse. In this way, the opposition
between all varieties of colloquial Tajiki and standard Persian in Tajikistan resembles the
diglossic opposition between dhimotiki and katharevousa in modern Greek (another classic
example of a diglossic situation). Here again, Beeman tends to believe that the Persian
speech community is diglossic.
The answer to the question of whether or not the Persian speech community is diglossic
depends on the different definitions, criteria, and interpretations linguists have had for the
term since Ferguson’s paper. On the basis of the extended meaning of diglossia, the Persian
speech community, like some other language communities around the globe, can be
considered diglossic, because its different standard and non-standard varieties or
styles have different social functions; in the framework of Ferguson’s definition, however,
the Persian community is not a good example of diglossia.
In her study of Armenian and Azerbaijani ethnic minority groups in Tehran, Nercissians
(2001) considered Iran as an example of a diglossic situation, where separate languages
like Armenian and Turkish are used with different social roles and in different social
contexts. She demonstrates that Armenian and Turkish speakers use their mother
tongues in more informal and face-to-face communications, while Persian is mostly used
in more formal and official contexts, a situation which can be called diglossic in an
extended sense.
13.6.3 Speaker’s variables
Whereas the degree of formality of social contexts accounts for some linguistic variations,
other social variables such as gender, age, education and social class can explain other
kinds (p. 337) of variations in language. Thus, speakers who talk to each other within the
same social context and with the same style, may also have variations in their speech,
depending on their gender, age, social class, and the like. According to Meyerhoff (2006:
185), while it is obviously true that interactions take place between individuals, the
linguistic behaviour of each person nevertheless patterns with the group he/she socially
belongs to. Examples of the correlations between linguistic and social variables in Tehran
as a Persian-speaking community are discussed in the following sections.
13.6.3.1 Gender
Gender is one of the most interesting social factors investigated by sociolinguists in
different speech communities. In fact, a major concern in sociolinguistics is to explore the
relationship between the linguistic behaviour of speakers and their gender. Sociolinguists
such as Labov (1966) and Trudgill (1974) have been studying this issue during the past
few decades, and have been trying to find answers to various questions such as: Do men
and women who are members of the same speech community and share the same
language speak differently? Are these linguistic differences the result of the different
roles that the two genders play in a single society? Do the social roles of men and women
in society have any impact on language change?
The results of studies done in Persian-speaking communities in the last few decades
confirm some of the findings of previous studies in different language communities. On
the basis of the data collected in Tehran, Modarresi (1978: 130–1), for instance, observed
some differences in the final alveo-dental stop deletion in the linguistic behaviour of male
and female Persian-speakers. The data in Table 13.2 shows the percentage of the final /t/
and /d/ deletion in different styles of the speech of male and female Persian speakers in
two age groups in Tehran.
Table 13.2 Percentage of final stop deletion in Tehran Persian by gender, age, and style
Age     Gender   Casual   Careful   Reading   Word list   Minimal pair
10–19   M        72.00    55.00     28.30     14.00       9.20
10–19   F        52.20    55.20     25.10     8.10        0.00
20–29   M        67.30    51.40     21.90     4.50        0.00
20–29   F        64.30    49.00     23.30     0.00        0.00
The data in Table 13.2 indicates that the percentage of final /t/ and /d/ deletion is
generally higher in the speech of male speakers than in that of female speakers in Tehran;
although the differences here are not significant, they illustrate a systematic pattern. The
linguistic behaviour of Persian-speaking men and women follows more or less the same
pattern observed in the linguistic behaviour of men and women in some English-speaking
communities such as Detroit (Wolfram 1969). On the basis of the data collected on Tehran
Persian, women are more conscious of, and sensitive to, the social prestige of their linguistic
behaviour. The low prestige of alveo-dental stop deletion in Persian explains why female
Persian speakers tend (p. 338) to use more /t/ and /d/ in their speech (i.e. lower percentages
of final stop deletion) than male speakers. A more recent study on final /t/ deletion in
Tehran Persian by Habibi (2014: 83) shows that adult female Persian speakers, in their
formal speech (reading style), delete /t/ slightly more (by 2.5 per cent) than male speakers,
which is similar to the reading-style figures for the 20–29 age group in Table 13.2. In other
words, women in the age group 20–29 delete more /t/ in reading style than men in both studies.
Sociolinguists have tried to explain women’s linguistic behaviour in their gender analysis.
Trudgill (1974: 94) for instance, notes that women are generally more status-conscious
than men and therefore, more aware of the social significance of linguistic variables. He
presents two main reasons for this feminine attribute among the English speakers in
Norwich. The first reason is that the social position of women was less secure than
that of men in the community he studied, and it is therefore more necessary for
women to secure and signal their social status linguistically and in other ways. His
second reason for this behaviour is the phenomenon of masculinity, which is associated
with roughness and toughness, supposedly characteristics of working-class life which are,
to a certain extent, considered to be desirable masculine attributes. On the other hand,
refinement and sophistication seem to be desirable feminine characteristics.
13.6.3.2 Social class
Studies have also shown that social class has a significant influence on the verbal
behaviour of speakers in different speech communities. In the western tradition, class is
measured and defined on the basis of several indices such as occupation, income, housing,
and education (Swann et al. 2004: 282). Almost all studies done in different speech
communities all over the world have adopted the same definition.
Jahangiri (1980: 66–70) gives an example of the correlation between a linguistic variable
(vowel assimilation) and social class in Tehran Persian; this variable involves the alternation
of the mid vowel /e/ (in the imperative morpheme /-be/) with the vowels /i/ and /o/ through
the application of a vowel harmony rule, particularly in more casual or informal styles. Thus,
Persian words such as /begir/ (take!) and /bero/ (go!) can alternatively be pronounced
as /bigir/ and /boro/, depending on the occurrence of the vowels /i/ and /o/ in the next
syllable, with which /e/ is harmonized. Figures 13.1 and 13.2 from Jahangiri show the
percentage of the application of the vowel harmony (assimilation) rule in four social classes
in the speech of adult (24 years old and over) male and female Persian speakers
(see Chapters 5 and 15 for more on vowel harmony).
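The harmony rule just described can be illustrated with a toy function. This is only a sketch under stated assumptions: words are given in a simplified Latin transcription, the input is assumed to be an imperative form beginning with the prefix spelled "be", and the function name and vowel inventory are illustrative rather than taken from Jahangiri. Only the two triggers mentioned in the text (/i/ and /o/) are modelled, and the sketch returns the harmonized variant without modelling how often speakers actually use it.

```python
# Toy sketch of the vowel harmony (assimilation) rule for imperative forms.
# Assumption: simplified transcription in which each vowel is one character.
VOWELS = set("aeiouā")
HARMONY_TRIGGERS = {"i", "o"}  # the only triggering vowels named in the text


def apply_vowel_harmony(word: str) -> str:
    """Return the harmonized (casual) variant of an imperative form, if any."""
    if not word.startswith("be"):
        return word
    stem = word[2:]
    # Look for the first vowel after the prefix, i.e. the vowel of the next syllable.
    for ch in stem:
        if ch in VOWELS:
            if ch in HARMONY_TRIGGERS:
                # The prefix vowel /e/ takes on the quality of the stem vowel.
                return "b" + ch + stem
            break
    return word


for form in ["begir", "bero"]:
    print(form, "->", apply_vowel_harmony(form))
# begir -> bigir
# bero -> boro
```

Since the rule is a sociolinguistic variable rather than a categorical process, a fuller model would attach a probability of application that depends on style, class, and gender, as the figures discussed here suggest.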
The data in Figures 13.1 and 13.2 indicates a systematic increase in the application of the
vowel harmony (assimilation) rule (i.e. the percentage of /i/ and /o/ occurrence) in the
speech of male and female Persian speakers as we move from the highest social class (G1)
to the lowest social class (G4); that is, the lower the social class of speakers, the higher the
realization of the variants /i/ and /o/ in their speech. On the other hand, by comparing
Figures 13.1 and 13.2, it is possible to conclude that the percentages of vowel harmony are
relatively lower in the speech of female than of male speakers. Thus, the /i/ and /o/ variants
of this variable in Tehran Persian appear to be related to masculinity and informality, and
also to the speech of the lower classes, and are not considered prestige variants. In fact,
Figures 13.1 and 13.2 provide another example of the correlation between gender and
linguistic variables in Persian. (p. 339) (p. 340)

Figure 13.1 Percentage of vowel assimilation by
class and style, female adults (Jahangiri 1980)
Figure 13.2 Percentage of vowel assimilation by
class and style, male adults (Jahangiri 1980)
13.6.3.3 Education
Education is another variable that has significant impacts on linguistic variations.
According to Al-wer (2013) education has been widely used as a major sampling criterion in
Arabic studies, almost analogously to the way socioeconomic class has been used in studies
in North America and the United Kingdom. The data in Table 13.3 from Modarresi (1978: 128)
for final alveo-dental stop deletion, shows the impacts of education on linguistic variables
in Persian.
Table 13.3 Percentage of final stop deletion by education and style in Tehran Persian
Education        Casual   Careful   Reading   Word list   Minimal pair
College          51.20    28.30     15.80     2.40        0.00
High school      54.80    37.80     25.10     5.30        1.20
Junior high      58.00    44.80     13.90     11.30       1.30
Primary school   75.20    68.30     36.10     12.10       5.00
On the basis of the above figures, the more educated the speakers, the less final stop
deletion can generally be observed in their different styles of speech; the only exception
is the reading-style figure for the junior high group (13.90). The same trend was observed
by Jahangiri (1980) and Wardhaugh (2002) in the speech of male and female speakers of
Persian. The correlation between vowel harmony and the degree of education and gender
is shown in Table 13.4.
Table 13.4 Percentage of vowel harmony in casual speech of Persian speakers by education and gender
Gender   University   Secondary   Primary   None
Male     13.00        32.00       52.00     78.00
Female   6.00         24.00       40.00     65.00
Here again, the percentage of vowel harmony in the speech of Persian speakers increases
as their educational level decreases. Thus, the correlation between education and linguistic
variables in Persian can be formulated as follows: the more educated a Persian speaker is,
the lower the percentage of vowel harmony observed in his/her speech. Also, we can
see that in each separate educational group, the percentage of vowel harmony is higher
in the speech of male speakers than of female speakers. On the basis of the data presented
and analysed here, it is possible to say that the final stop deletion and vowel harmony
rules are both non-prestigious features in Persian, and that is why they are realized
more often in the speech of less-educated male speakers in Tehran Persian. On the
other hand, the prestigious variants can be observed at higher percentages in the
speech of female, educated, and upper-class speakers. (p. 341)
13.6.3.4 Age
As another important non-linguistic variable, age seems to play a significant role in
language variation and also language change. In fact, it is on the basis of data collected
from different age groups and generations that we can have a better understanding of
both language variations and language change. The correlation between linguistic
variation and age has been studied in two different ways: the measurement of language
change in apparent time (comparing speakers of different ages at a single time), as
opposed to real time (tracking speakers over a long period of time). Labov’s seminal
study in Martha’s Vineyard (1963) on centralization of two diphthongs (/ay/ and /aw/), can
be considered as one of the most important experimental works that demonstrated the
correlation of age and linguistic variables. On the basis of data collected by apparent time
approach, Labov concludes that the younger age groups in Martha’s Vineyard
tend to centralize the initial element of the two diphthongs more than the older age
groups, indicating that a change is in progress. In the study of Tehran Persian, almost the
same patterns of correlation between age and linguistic variables found in previous
studies can be observed. The pre-nasal vowel raising (with two variants /u/ and /a/ before
a nasal) is another variable in Persian that was studied by Modarresi (1978: 99). Here, in
words like /khāne/ ‘house’ or /nān/ ‘bread’, the variant /u/ is more frequently realized in
spoken informal contexts (/khune/ and /nun/ respectively), while the other variant, /a/ is
more related to formal situations or written styles. The data in Table 13.5 shows the
percentage of /u/ realization in the speech of Persian speakers with the same educational
level but in different age groups and linguistic styles.
Table 13.5 Percentage of /u/ realization in the speech of Persian speakers in Tehran by age and style
Age       Casual   Careful   Reading   Word list   Minimal pair
10–19     82.90    60.50     6.70      0.00        0.00
20–29     77.80    60.00     0.00      0.00        0.00
Over 50   72.20    58.90     0.00      0.00        0.00
The data in Table 13.5 shows a systematic pattern in the percentage of the /u/ variant in the
speech of speakers in three age groups: the older speakers systematically use less /u/ in
the different styles, and the percentages increase as we move to the younger age groups. This
means that /u/ occurrence in the casual and careful speech of the younger speakers is
higher than in that of the older ones; in more formal styles, however, the differences in /u/
occurrence in the speech of the three age groups are trivial. Here, we can observe a high
percentage of /u/ occurrence in all three age groups in casual and careful styles and a
sharp decrease in more formal styles. The considerable occurrence of /u/ in more
informal styles may lead us to conclude that although this variant is an important feature
of Tehran Persian, speakers consciously avoid using it in more formal styles. (p. 342)
13.6.4 Changes in progress
The major strategy of historical linguistics, according to Labov (1972: 161–3), has been to
study the linguistic changes completed in the past. There are, however, other kinds of
linguistic changes: incomplete changes, or changes in progress. To answer the
question of what the mechanism of linguistic change is and how it
works, Labov indicates that the simplest way to establish the existence of a linguistic
change in progress is a set of observations of two successive generations of speakers who
have comparable social characteristics. He argues that a sound change usually originates
with a restricted subgroup in a speech community, then the change begins as a
generalization of the linguistic form to all members of that subgroup, and finally the
sound change with its associated value of group membership, spreads to other groups in
successive stages.
As in some other communities, considerable variation can be observed in the
Persian-speaking community in general and the Tehran speech community in particular.
Linguistic variations, which can be explained by regional and/or social variables, are usually
examples of changes in progress. Phonological processes such as final alveo-dental
stop deletion, pre-nasal vowel raising, and vowel harmony (assimilation) in Tehran Persian,
discussed in this chapter, are just a few instances of the sound changes in progress in the
Persian-speaking community as a whole. According to Jahangiri (1980) and Modarresi
(1978), each of the sound changes discussed earlier seems to be at a different stage
of progress in various regions and social subgroups of the Persian-speaking territory. The
pre-nasal vowel-raising rule, for instance, has been going on for a long period of time in
Persian, with different degrees of progress in different parts of the Persian-speaking area,
depending on the social norms and values associated with it. The figures in Table 13.6 from
Modarresi (1978: 166), for instance, show the percentage of /u/ realization in the speech
of informants in Tehran and Ghazvin (a city approximately a hundred kilometres
from Tehran). The figures point to the fact that this process is at different stages of
progress even within a limited geographical distance.
Table 13.6 Percentage of /u/ realization of Persian speakers in Tehran and Ghazvin by style
City      Casual   Careful   Reading   Word list   Minimal pair
Tehran    73.50    56.20     3.10      0.40        1.20
Ghazvin   45.90    30.50     3.20      0.00        0.60
The figures in Table 13.6 show that the difference in the percentage of the /u/ variant in the
more informal speech (casual and careful styles) of Persian speakers in Tehran and
Ghazvin is meaningful, while the percentages are almost the same in more formal styles
(reading, word list, and minimal pairs). The sharp decrease in the percentage of /u/
realization between careful and reading styles indicates that the vowel-raising
rule in Tehran Persian applies more frequently in informal styles. A more important point
here is, of course, the significant difference in the frequency of /u/ realization in informal
styles in Tehran and (p. 343) Ghazvin, which indicates that the pre-nasal vowel-raising
rule is at different stages of historical development in the two nearby cities.
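The comparison between the two cities amounts to simple arithmetic on the rows of Table 13.6. The sketch below is illustrative only (the names `STYLES`, `u_realization`, and `style_gaps` are assumptions introduced here); it computes the Tehran-minus-Ghazvin gap in percentage points for each style, using the values reported in the table.

```python
# /u/ realization by city and style (figures from Table 13.6).
STYLES = ["casual", "careful", "reading", "word list", "minimal pair"]
u_realization = {
    "Tehran": [73.50, 56.20, 3.10, 0.40, 1.20],
    "Ghazvin": [45.90, 30.50, 3.20, 0.00, 0.60],
}


def style_gaps(city_a, city_b):
    """Percentage-point gap (city_a minus city_b) for each style."""
    return {
        style: round(a - b, 2)
        for style, a, b in zip(STYLES, u_realization[city_a], u_realization[city_b])
    }


gaps = style_gaps("Tehran", "Ghazvin")
for style, gap in gaps.items():
    print(f"{style:>12}: {gap:+.2f}")
```

On these figures the gap exceeds 25 percentage points in the casual and careful styles and stays below one point in the three more formal styles, which is the contrast the paragraph above describes.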
The community reacts to a linguistic change according to its social value; as Labov
(1972: 179–80) notes, if the group in which the change originates is not the highest-status
group in the speech community, the members of the highest-status group eventually stigmatize
it through their control of various institutions of the communication network. On the other
hand, if the change originates in the highest-status group, it may become the prestige model
for all members of the speech community. Examples such as final stop deletion and
vowel harmony from Tehran Persian seem to be associated with lower-status or
less-educated groups in the Tehran speech community. In both studies in Tehran (i.e. Modarresi
1978; Jahangiri 1980), the same pattern is followed by the most-educated Persian
speakers, as they tend to avoid using non-prestige variants in their speech,
particularly in more formal styles.
The figures in Table 13.4 (Jahangiri 1980) also indicate that the percentage of vowel
assimilation in casual speech of male speakers in Tehran is significantly higher than
among female speakers. Here again, we can see that women’s sensitivity to social
behaviour is higher than men’s and this attitude in turn, causes a lower percentage in the
use of non-prestige forms in their speech. This behaviour of women in Tehran
corresponds to Trudgill’s claim (1974b: 94–5) that women are more status-conscious than
men in general, and are, therefore, more aware of the social significance of linguistic
variables in British society. According to Trudgill, from the point of view of linguistic
theory, as far as linguistic change from below is concerned, we can expect men to be in
the vanguard. Changes from above, on the other hand, are more likely to be led by
women.
13.6.5 Standard Persian
Despite the great variation observed in various language communities, there is usually a
variety in each language which is considered prestigious by speakers of that
community as a whole, and is called the standard variety or standard language. According to
Swann et al. (2004: 295), the standard variety is usually a relatively uniform variety of a
language which does not show regional variation, which is used in a wide range of
functions such as official language, medium of instruction, literary language, scientific
language, etc., and which is codified in grammars and dictionaries. Trudgill (1974a: 17,
2003: 128) also emphasizes that the standard variety is a variety which is usually used in
print, and which is normally spoken by educated people and taught in schools and to
non-native speakers.
Wardhaugh (2002: 35) notes that there may be more than one standard variety in a
language like English, with almost the same grammar and vocabulary everywhere in the
world, and variation among local standards is really quite minor, being differences of
flavour rather than of substance. Persian, with three standard varieties, is similar to
languages such as English and French.
Modern Persian developed from Middle Persian or Pahlavi, used in the Sassanid period,
but it should be noted that there is disagreement or uncertainty about the direct
historical relation between Middle Persian and New Persian (see Sadeghi 1978; Payne
1987). The Arab conquest, as a turning point in the history of the country, had a great
impact on Iranian languages, since Arabic became the official language of administration
for two to three centuries. However, as Sarli (2008: 270) notes, after the Saffarian and
Samanid dynasties came (p. 344) to power, Dari Persian gradually resumed some of its
functions and became dominant in certain administrative domains. According to Windfuhr
and Perry (2009: 416), Persian has been the dominant language of Iranian lands and
adjacent regions for over a millennium. From the tenth century onwards, it was the
language of literature and culture, as well as a lingua franca in large parts of south, west,
and central Asia until the mid-nineteenth century. Table 13.7, from Windfuhr and Perry
(2009: 420), shows the sociolinguistic set-up from the late Sassanid period onwards.
Table 13.7 Social status of Persian during the last several centuries
Period                  High             Low
Late Sassanid           Middle Persian   Dari
Early Islamic           Arabic, Dari     Dari
Mongols, 13th century   Persian          Persian
Safavid, 16th century   Persian          Persian, Turkic
Qajar, 19th century     Persian          Persian
As early New Persian (Dari) became the vehicle for New Persian literature from around
the eleventh century, it gained greater importance, prestige, and social status
among Iranian poets and writers, even though some works of Persian literature were also
written in Arabic. The high status of Persian was a necessary condition for its promotion
to the standard level. Windfuhr (1987: 525) believes that by the thirteenth century, when
classical Persian began, the regionally marked features had largely disappeared in both
poetry and prose. During the past few centuries, standard Persian has gradually developed
from among all the local, social, oral, and stylistic varieties of Persian, first in written
literature and then in spoken forms.
At present, Modern Persian has three standard varieties (Persian, Tajiki, and Dari), which
are in use in the three independent countries of Iran, Tajikistan, and Afghanistan. Beeman
(2010) also indicates that the Iranian language sphere consists of three currently
recognized ‘core’ varieties (Persian, Dari, and Tajiki standards) which are mutually
intelligible, and a number of peripheral varieties that diverge significantly enough to be
thought of as separate languages.
These three main standard varieties are mutually intelligible, despite the extensive
differences they have, due to different social, cultural, and political contexts in which
they have developed within the last hundred years or so. During this period, two
revolutions with certain linguistic and cultural consequences have occurred in Iran;
moreover, in a multilingual social context, the lexical and structural impacts of languages
such as Arabic, Turkish, English, and French on Persian have been considerable. Despite
all social and political changes, the status of Persian, as the official and national language
of Iran, has remained stable and its standard variety has been promoted and has gained
more and more prestige.
Within almost the same period, two historical events occurred in Tajikistan; according to
Windfuhr and Perry (2009c: 420), Tajiki became the national language of the Soviet
(p. 345) Socialist Republic of Tajikistan in 1920, even though it was under the heavy
influence of Russian loanwords and had been evolving independently from the standard
Persian of Iran. After the independence of Tajikistan in 1991, Tajiki was promoted to the
official and national language of an independent nation-state and the Tajiki standard
began to gain more prestige. Afghanistan has been involved in serious political problems,
successive regime changes, and civil war in the past few decades. The status of Dari and
its standard variety has been significantly different from that of Persian in Iran, since Dari
and Pashtu have both been official languages in Afghanistan. During the last decade, as the
political and social situation of the country has somewhat stabilized, standard Dari
appears to be gaining a higher social status and prestige.
Studies conducted by scholars such as Henderson (1975), Windfuhr (1987), MacKenzie
(1987), Kalbasi (1995), Windfuhr and Perry (2009), and Beeman (2010), among others,
during the past decades, in which the structural and lexical features of each standard
variety have been separately or comparatively analysed, suggest a regional continuum on
which the differences of Persian, Dari, and Tajiki can be observed.
From an historical point of view, a comparison between Persian, Dari, and Tajiki reveals
that the three standard varieties of Persian appear to be in different stages of change in
progress along a linguistic continuum. On the basis of certain lexical and structural
features, Persian is generally in a more advanced stage and Tajiki seems to preserve more
archaic elements. This suggests that in comparison to Dari and Tajiki standard varieties,
standard Persian is seemingly more modernized. Older Arabic forms such as /askar/
(soldier), /etfāʔiya/ (fire brigade), /maktab/ (school) and also European loanwords like
/aǰendā/ (agenda) and /dusiya/ (dossier), for instance, are still in use in Dari or Tajiki but
are no longer used or understood by younger Iranian generations. Such forms have been
respectively replaced by /sarbāz/, /ātašnešāni/, /dabestān/, /dastur-e kār/, and /parvande/
in standard Persian during the last several years.
Word-final /a/, as a phonological variable, is another example which shows
that a change in progress is at different stages in the three standard varieties of Persian.
A phonological change from /a/ to /e/ in word-final position has been going on for
centuries in Persian, and while the change ended decades ago in standard Persian, it is
still in progress in the Dari and Tajiki standards. Words such as /khāna/ (house), /khasta/
(tired), /dasta/ (group), and the like are still in use with high frequency in standard Dari
and Tajiki, while they are no longer used in standard Persian. Such forms are
pronounced in standard Persian in Iran as /khāne/, /khaste/, /daste/, respectively; in fact,
the final /a/ is evaluated by educated Iranians as a rural or local pronunciation with low
prestige.
On the basis of these observations, it is possible to claim that standard Persian has a
relatively higher prestige than Dari and Tajiki. This claim corresponds to the concept of
directionality suggested by Beeman (2010). According to his observations, Persian is seen
by all speech communities as a prestige standard, and Tajiki and Dari as colloquial forms.
Dari, as spoken in Afghanistan, is seen as a stigmatized variety for many of its speakers,
when they find themselves in a primarily Persian-speaking setting. Afghan residents in
Iran will often resort to using a foreign language such as English rather than speak Dari.
To reinforce this notion of hierarchy, it is worth noting that speakers of Persian varieties
rarely learn Tajiki or Dari, whereas educated Tajiki and Dari speakers all acquire some
command of Persian forms.

The Persian-speaking area, which presently includes the three separate countries of Iran,
Tajikistan, and Afghanistan, in addition to certain territories in central Asia, can be
considered a great laboratory for sociolinguists. The ethnic and linguistic diversity of the
territory has made it an interesting place for sociolinguistic studies. Regional variation
studies were begun by traditional dialectologists around a hundred years ago. Sociolinguistics
as a new field of linguistic studies, however, was mainly introduced in the 1970s to
linguists and field linguists in the Persian-speaking community. During the last four decades,
a considerable number of studies on almost all aspects of sociolinguistics, and
particularly on social variations, contacts, borrowing, bilingualism, code-switching, and
language planning have been done by Iranian and non-Iranian researchers, some of the
most important of which were referred to and introduced in this chapter.
Yahya Modarresi
Yahya Modarresi received his PhD from the University of Kansas, USA and is
currently a Professor of Linguistics at the Institute for Humanities and Cultural
Studies (IHCS). He has taught sociolinguistics and anthropological linguistics in
the Department of Linguistics, IHCS, and the Department of Anthropology, University
of Tehran. His books include An Introduction to Sociolinguistics (1989) and Language
and Migration (2015). He has written many articles in academic journals. At present,
he is the Editor in Chief of the journal of the Linguistics Society of Iran. He has also
been a member of the editorial board of different academic journals, including the
International Journal of the Sociology of Language for many years.

Language contact and multilingualism in Iran

Shahrzad Mahootian

Subject: Linguistics, Language Contact, Languages by Region
Throughout its history, Iran has been a richly multilingual nation, with documented
evidence reaching back nearly three millennia. Today, estimates of the number of
languages spoken in modern Iran vary, with numbers ranging from fifty-four to seventy-
six living languages. This chapter presents a general description of societal bilingualism,
how bilingual communities come about, the relationship between language and identity in
multilingual contexts, and how best to describe the kind(s) of bilingualism found in Iran,
including the use of English. The chapter then turns to bilingualism in Iran from a
historical perspective, with the goal of understanding why there are so many languages
in present-day Iran. Finally, it addresses the status of English in pre- and post-
revolutionary Iran and issues of language maintenance.
Keywords: bilingualism in Iran, multilingualism in Iran, languages in Iran, language contact in Iran, linguistic
diversity in Iran, language and identity in Iran, language documentation
14.1 Introduction
I have long appreciated the motto of L’Observatoire Linguistique, ‘Dans la galaxie des
langues, la voix de chaque personne est une étoile [In the galaxy of languages, each
person’s voice is a star]’. Established in Quebec in 1983, L’Observatoire has set its goal as
‘an exploration into the totality of human languages, observing them as the interactive
working parts of a planetary system of communication … [which is] constantly
evolving’ (L’Observatoire Linguistique n.d.). And thus, I begin this chapter on the
evolution of multilingualism in Iran and the current status of Iran’s many languages.
Throughout its history, Iran has been a richly multilingual nation, with documented
evidence reaching back nearly three millennia. Today, estimates of the number of
languages spoken in modern Iran vary, with numbers ranging from fifty-four to seventy-
six living languages. This range is no doubt partly due to the fuzzy boundary between
‘dialect’ and ‘language’. While mutual intelligibility is a key element in distinguishing
between a dialect and a language, an equally important factor in the decision making of
the language-dialect distinction is the community identity and identification that (often)
hinges on the language attached to the community. For the purposes of this chapter, I
follow Ethnologue’s count of seventy-eight ‘individual’ languages for Iran: ‘76 are living
and 2 are extinct. Of the living languages, 5 are institutional, 8 are developing, 27 are
vigorous, 31 are in trouble, and 5 are dying’ (Ethnologue n.d.). The designation of
‘institutional’ means that the language is used beyond the home and community and, to
a greater or lesser degree, has support from institutions. The five languages designated as
institutional are Persian (also known as Farsi); Azari (also referred to as Azeri, Azarbayejani
Turkish, or Azari Turkish); Kurdish; Gilaki-Mazandarani;1 and Armenian. Together they dominate the
linguistic landscape, with a total of nearly 90 per cent of the (p. 348) population speaking
at least one of these languages natively. The greatest support is for the official statutory
language, Persian, which is used and fully supported by constitutional mandate in
government, education, media and all other public institutions. Article 15 of the
constitution specifies that:
The official language and script of Iran, the lingua franca of its people, is Persian.
Official documents, correspondence, and texts, as well as textbooks, must be in
this language and script. However, the use of regional and ethnic languages in the
press and mass media, as well as for teaching of their literature in schools, is
allowed in addition to Persian.
(1979, Iranian Constitution, Article 15, in Riazi 2005:107).
In 2014, 45 million (about 53 per cent) of the 81 million population in Iran were Persian
speakers (The World Factbook). Armenian is set apart from Azari, Kurdish, and Gilaki-
Mazandarani by the fact that it is also used to a limited extent as a language of education
in the Armenian national K-12 schools, restricted to religion classes (Armenians are
Christian) and Armenian language courses. Persian is used for all other courses in the
curriculum (Nercissians 2001). The World Factbook provides the following percentages of
first-language speakers for the top five languages, and also includes percentages for Luri
(also spelled Lori), Balochi (or Baluchi), Arabic, and ‘other’: Persian 53 per cent; Azeri
Turkic and Turkic dialects 18 per cent; Kurdish 10 per cent; Gilaki and Mazandarani 7
per cent; Luri 6 per cent; Balochi 2 per cent; Arabic 2 per cent; other 2 per cent
(www.cia.gov/library/publications/the-world-factbook/geos/ir.html). The 2 per cent figure
for ‘other’ includes Armenian, Assyrian, Georgian, and Persian sign language, among
others. A detailed map of the language families and languages spoken in Iran can be
found at Everytongue (n.d.).
‘Developing’ languages, according to the Ethnologue scale of endangerment, are just
below institutional languages. They enjoy a certain amount of stability and recognition
through some degree of standardization, use in literature and some media. ‘Vigorous’
languages have three crucial features: the language is used for in-person communication,
by all generations, and is highly sustainable, though its use is not standardized.
In the following sections, I present a general description of societal bilingualism, how
bilingual communities come about, and how best to describe the kind(s) of bilingualism found
in Iran, including the use of English. I then turn to bilingualism in Iran from a historical
perspective, with the goal of understanding why there are so many languages in present-
day Iran and their language-family sources. Finally, I address the role and status of
English in pre- and post-revolutionary Iran, and issues of language maintenance.
14.2 Language contact and bi-/multilingualism

What is bilingualism/multilingualism? Is it when individuals speak two or more languages
fluently, flawlessly as if they are two monolinguals in one brain? Is it the ability to
communicate at some level in two or more languages, with one language dominant and
the other(s) (p. 349) functionally useful to, say, order food, find a restroom, get directions,
etc. (Edwards 2002)? And, where does literacy figure into the definition? Must a bilingual
be able to read and write in all their languages? Does the ability to speak one language
and read and write in another, make you a bilingual? The answer to these and other
similar questions aimed at defining bilingualism is a resounding ‘Yes!’ and ‘No!’ As it
turns out, defining bilingualism is complicated. Many variables must be considered,
including the ‘purpose’ of the bilingualism. For example, to be able to read and write but
not speak a language may be perfectly fine for some purposes of bilingualism (for
instance, to translate documents), but not for other purposes (such as serving as a
simultaneous translator). Literacy skills in both languages may be a necessary
requirement for defining bilingualism in some cases but not in others (for example, young
pre-school children growing up in bilingual households). Hence, age of acquisition,
method of acquisition and domains of use are other important variables to consider.
Furthermore, can we apply the same definitions we use to describe bilingual individuals
to describe bilingual speech communities? Common sense leads us to a ‘no’ on this point
for at least two reasons. First, let’s keep in mind that a community is a grouping of
individuals who share cultural and linguistic norms, not individuals with identical cultural
and linguistic norms. Second, as noted, there are many ways to cut the bilingual pie—
there are many types of bilinguals, individuals whose two (or more) languages are in a
somewhat unique relationship with each other regarding domains of use and language
dominance, literacy, and age and method of acquisition of each language.
For the purpose of this chapter, bilingualism will be defined as the alternation between
two or more languages on a regular basis (Weinreich 1953; Mahootian 2006, 2012). This
definition is particularly useful because, in its generality, it allows for application of the
description both to language use by individuals and within communities.
14.2.1 Types of bilingualism
Bilingualism may come about as a result of exposure to two languages simultaneously or
sequentially. Simultaneous bilingualism typically begins at birth within households where
more than one language is used on a regular basis. Sequential bilingualism can occur in
childhood or in adulthood and, as the term implies, arises when individuals are exposed first to
one language (e.g. in the home), and later to another language, typically outside of the
home, usually through school. Sequential bilingualism can be further distinguished as
early or late. Early sequential bilingualism occurs pre-puberty; late bilingualism refers to
the acquisition of a second (or third, fourth, etc.) language post-puberty, at any point in
adulthood. In the latter case, the second language may be acquired formally through
classroom instruction as a foreign language or more holistically, for example as a result of
immigration and full immersion into a new culture and language (De Houwer 1990;
Mahootian 2005; Romaine 1995). Sequential bilingualism can be the result of de facto
bilingualism (the existence of more than one language in a nation) or de jure bilingualism
(the existence of more than one language in a nation by official decree, as part of a
nation’s constitution). This distinction is important in the big scheme of things because it
can mean life or death for a language.
(p. 350) 14.2.2 The bi-/multilingual community: creation, shift, loss
Bilingualism and multilingualism are natural by-products of language contact. Whenever
there has been cause for two cultures to meet on a national scale, whether through war,
colonization, commerce, migration, or immigration, one of the many outcomes of the
contact has often been the creation of bi-/multilingual speech communities, at least initially. As
time passes, one possibility is that the two languages in contact survive and coexist, with each language
associated with a set of domains in which it is the unmarked choice,
while the other language would be considered a marked choice (Myers-Scotton 1993),
hence creating a diglossic speech community, where the two languages are an
established and stable part of the community, each with its own sphere of use (Ferguson
1959; Fishman 1967). For example, in cases where there is an official national language,
the official language is used in government, education and other official contexts, while
the regional/local language is used in the home, among friends, in shops, etc. Use of the
official language with family members, with neighbours, or in local shops would be considered a
marked choice.
Since not all bilingual communities are created in the same way, nor do they have the
same ‘lives’ and life span, they can’t be uniformly described. Each community has its own
identity based on a number of variables, including number of speakers, status and
function of each language within and outside of the community, and the stability of each
language. These variables are somewhat fluid and form and reform the community’s
linguistic and ethnic identity. Furthermore, depending on the relative isolation from or
integration into the larger public domain, a bilingual community may be a temporary
community which survives a couple of generations before the home language and the
public language become one. Language shift and eventual loss have various causes. The
Expanded Graded Intergenerational Disruption Scale (EGIDS), developed by Lewis and
Simons (2010) and based on Fishman’s Graded Intergenerational Disruption Scale
(1991), defines shift as a language situation where ‘the child-bearing generation can use
the language among themselves, but it is not being transmitted to children’ (see
Ethnologue.com for the complete Expanded Graded Intergenerational Disruption
Scale). The reasons a language does not get transmitted to the next generation constitute
a key element in understanding language death. Simply put, the speakers no longer find the
local home language useful. From their perspective, it no longer serves an important
function.
Among ethnolinguistic communities, a variety of opinions on the future prospects
of their languages can be observed. Some speakers of endangered languages
come to consider their own language backward and impractical. Such negative
views are often directly related to the socio-economic pressure of a dominant
speech community.
(UNESCO Ad Hoc Expert Group on Endangered Languages 2003: 1)
Shift can be slow, or it can happen over just a few generations, such as in cases of
immigration, especially when the immigrant minority is less than welcome in the host
country. Rapid shift is often seen within immigrant communities. Gradual language shift
is often observed where several languages have coexisted within one nation-state border,
generally because the border has shifted through wars and invasions. Over time, as
power has shifted from one region to another, the speech communities and their
languages have become minorities, and the communities themselves are non-urban, not
central to commerce (p. 351) or government. These local languages have no currency in
the big national picture. Since nothing vital outside of the community happens in these
languages, they are not considered a resource to be protected and cultivated (Blommaert
2005; Mahootian 2012). Consequently, parents are reluctant to pass the language on to
their children, children are reluctant to use it, and instead opt for the majority language.
A continuation of this cycle eventually results in language death. It is, therefore,
important to catch this sequence before it reaches its end point, and while there is an
opportunity to reverse it.
The EGIDS can be applied to bilingual communities born out of immigration or to
those that have evolved over a long, usually bloody, period of contact through invasions
and conquests (such as the case of Iran). In the context of immigration, the home
language is replaced by the host country’s language over two or at best three generations
due to marriages outside of the home-language community, monolingual school systems
in the host language, and the pressure to assimilate and become part of the larger
community (Bourhis and Marshall 1999; Romaine 1995; Mahootian 2005, 2012).
Similar pressures can be found among minority speech communities in historically
multilingual nations such as Iran, especially where only one language is recognized as
official and statutory, producing the same results—language shift takes place where the
minority language is replaced or displaced by the majority language. Such scenarios of
language endangerment and loss are far too common among the world’s languages
(Fishman 1991; Hale 1998; UNESCO 2003, 2010), and languages in Iran are no
exception, as we shall see. Language endangerment is mostly unidirectional, leading to
extinction of the endangered language. What constitutes endangerment? Nine factors and
conditions are identified by the UNESCO Ad Hoc Expert Group on Endangered
Languages to help assess the level of vitality and/or endangerment of a language. Most
significantly, the factors take into account the number of communicative domains of use
of a language, the number of speakers, the availability of educational materials, language
policies, and speakers’ attitudes towards the language: as the domains become fewer and
fewer, cultural transmission of the language from one generation to the next ceases, and
eventually the number of speakers dwindles until no one speaks the language (UNESCO
2003; Ethnologue n.d.; Endangered Languages Project n.d.; Grenoble and Whaley
2006).
14.3 Iran as a multilingual nation
Although we can refer to Iran as a multilingual nation, the various languages
are each in a bilingual relationship with the official national language, Persian. We
can therefore more accurately describe Iran as a nation made up of many different
bilingual communities, where in each community the official language, Persian, is one of
the languages understood and used by speakers for various functions in several domains.
It is also important to mention that not all speakers in Iran are bilingual. In fact, in areas
where Persian is the native/home language, speakers are typically monolingual. About 53
per cent of the population falls under this category.
Put another way, bilingualism in Iran means that in some communities Persian is the
dominant language of the bilingual community, while in other communities, the (p. 352)
regional language is the primary language, though in all cases Persian, as the official
national language, is the language used in schools. In essence these bilingual communities
may be described as diglossic. It is often the case that the regional home language and
Persian are acquired sequentially, with Persian acquired later than the home language. In
fact, in some cases, children are not exposed to Persian until they begin kindergarten
(which is not compulsory) or the first grade. In a study of 7,703 fourth-graders across the
twenty-seven provinces in Iran, Hameedy (2004) found that nearly 11 per cent of the
respondents had not learned Persian ‘during childhood’. Since Persian is the official
language of education, it means that these children would have been exposed to and
acquired Persian once they entered kindergarten or first grade. More than one-third
(nearly 35 per cent) ‘indicated that they never, or only at times, spoke Persian at
home’ (p. 5), which suggests that these children must have picked up Persian from other
sources in the community, including television programmes, before beginning school.
Once the child begins school, Persian dominates, with all books, teaching materials, and
lessons provided in the official language.
As is the case in many diglossic speech communities, the home language is used in
informal, intimate, and ‘unofficial’ contexts, while the official language, Persian, is used
for education, government, and official media including most entertainment, and other
national contexts. Nercissians’s (2001) study of Armenian bilinguals and Azari–Turkish bilinguals
in Tehran confirms this split in domains of use of Persian and home language. Aliakbari
and Khosravian’s study (2014) of a total of 220 Lur–Persian, Kurdish–Persian and Azari–
Persian bilinguals showed similar results:
The three ethnic groups valued the knowledge of Persian, the common language
used in day-to-day communication as well as the official language used in the
educational and other formal establishments. At the same time, they expressed
strong desire for the retention and the use of their mother tongue. It can be said
that mother tongue was mostly used in intergroup and informal face-to-face
communication, while Persian was frequently employed in formal situations.
(p. 199)
With regard to the language of media, as is to be expected, though constitutionally
permitted to produce and disseminate media in their own languages, not all minority
groups have the same resources and support. First, all Iranian broadcasting is controlled
by the state, and television is the most popular medium, reaching more than 80 per
cent of Iranians (www.bbc.com/news/world-middle-east-14542234, 2013). Of the eight
state-run news agencies, two are in the English language. The state-run Islamic Republic
of Iran Broadcasting (IRIB) runs national and provincial services. Additionally, 90 per cent
of websites are in Persian, with another 10 per cent in other languages; Arabic, Azari, and
Armenian are attested (Rogers et al. 2013, in Hartley et al. 2013). In the provinces, we
can also find news broadcasts in Arabic and Kurdish, and entertainment in Balochi, Azeri,
Kurdish, and Mazandarani. The programmes in the latter are mostly comedies and:
in some cases American comedies dubbed into Mazandarani, as well as programs
that revolve around Mazandarani culture. A noteworthy example of the latter is
Kakaroon, a relatively popular game show (in Mazandarani) in which the
contestants have to answer questions about Mazandarani traditions and old
proverbs [and] each episode of the show is set in a different village in
Mazandaran.
(Personal communication, Mohsen Mahdavi Mazdeh, June 2015)
(p. 353) Overall, these facts underscore the diglossic nature of multilingualism in Iran,
with Persian as the statutory official language, and provide support for the designation of
‘institutional’ for Azeri, Kurdish, and Mazandarani.
It should also be noted that historically ‘bilingualism’ has not been valued in Iran (Afshar
1989). Seen as a possible threat to national unity, before, and less so after, the 1979
revolution, very little has been done to promote the variety of languages spoken in Iran.
However, some studies seem to indicate that in the last couple of decades bilingualism
has been recognized as a resource (Nercissians 2001; Aliakbari and Khosravian 2014)
rather than indicative of membership in a disenfranchised rural community, or as a
national threat. In the section on language maintenance, I discuss the consequences of
the split between home and school language.
14.3.1 Language diversity, status, and endangerment in Iran
Although the end result is the same, i.e. the language is not passed along to the children,
language shift and loss in historically multilingual nations such as Iran are typically
associated with economic factors; shift and loss of a local language results from economic
depression, lack of jobs, and isolation of a community, which results in younger
generations moving away from their region and families to larger more economically
robust urban centres which do not support their home languages. In a study of
bilingualism in Tehran, Nercissians (2001) found a rapid growth in the
Armenian- and Azari–Turkish-speaking
populations. She attributes the growth to ‘widespread immigration from provincial cities
and rural areas’ (p. 59).
The National Population and Housing Census shows about a 40 per cent increase in
‘urbanization’2 and a corresponding 40 per cent decrease in ‘ruralization’ (IRAN/
Iran-2011-Census-Results.pdf). For the years 2000–2011, ‘urbanization’ increased about 6
per cent, while ‘ruralization’ declined about 6 per cent (in Selected Findings of National
Population and Housing Census 2011: 23). The urbanization and correlative de-
ruralization of Iran included the population movement from the countryside to cities,
which, as indicated, is one of the factors contributing to language loss. In fact, as Iran
continues urbanizing, more rural and isolated languages are dying. The World Factbook
(n.d.) reports an urban population of 72.9 per cent of the total population of over 80
million in 2014. Additionally, it reports an urbanization rate of 2.07 per cent (annual rate of
change) between 2010 and 2015.
Moreover, according to the Endangered Languages Project (n.d.), forty-six of seventy-six
active languages in Iran are in the range of endangerment, from ‘vulnerable’ to ‘critically
endangered’.3 Of these forty-six languages, twenty-four4 are on Ethnologue’s and
UNESCO’s (p. 354) 2010 endangered, critically endangered, or extinct list, mainly as a
result of language shift (Ethnologue n.d.; UNESCO 2010). Not surprisingly, all the
languages are rural languages with speaker populations as low as 160, as in the case
of Koroshi, which is spoken by only fifty or sixty families in the Fars province and is
considered critically endangered (Endangered Languages Project n.d.). A profile of the
development and endangerment status of the languages in Iran can be found on
Ethnologue (n.d.).
As mentioned earlier, there is also a social-psychological component underlying shift that
needs to be understood and addressed. Many consider language to be the most important
dimension of individual identity (Clément 1980), one of the most important symbols of
ethnicity (Giles, Bourhis, and Taylor 1977) and a stronger link to identity than residence,
religion or ancestry (Pool 1979). Bucholtz and Hall (2005) define identity as a ‘relational
and sociocultural phenomenon that emerges and circulates in local discourse contexts of
interaction rather than as a stable structure located primarily in the individual psyche or
in fixed social categories’ (p. 585). The style, register, and language(s) we choose to
express ourselves all contribute to who (we think) we are, how we want others to see us
and how others actually perceive us. In short, language constructs, indexes, and reveals
identity (Mahootian 2012). In the case of the minority languages in Iran, many are
denigrated and marginalized by the speaker(s) themselves and by speakers of other
speech communities, notably, the majority language speech communities (i.e., Persian,
Azari, Kurdish speakers) as not ‘modern’, as ‘dahati’, a derogatory term which literally
means ‘of the village’ and ‘village dweller’, but is used to mean an ignorant, unrefined,
unskilled individual. This psychological ‘attack’ on the local language not only speeds up
the demise of the language and the cultural norms, values and history that are transmitted
through the language, but also destroys a part of the identity of the individual. As Padilla
and Borsato note:
The language that a person speaks often takes on extralinguistic characteristics
that go far beyond the need to communicate. For members of many ethnic groups
… the language itself becomes a symbol of the group’s vitality.
(2010: 12)
Consequently, language shift and loss are always at the cost of individuals and cultures,
no matter what the reasons. In section 14.5, I address some steps that can be taken to
revitalize and preserve endangered languages in Iran.
14.4 History of language contact in Iran

As mentioned, Iran is a multilingual nation, made up of many bilingual communities. The
languages spoken in Iran today descend from three major language families: Indo-Iranian
(a branch of the Indo-European language family), Semitic, and Turkic. Speakers of Indo-
Iranian languages constitute the largest language family group. Nearly 78 per cent of the
population speaks an Indo-Iranian language, including Persian, Kurmanji (spoken by
Kurds of Western Iran), Luri (spoken by the Bakhtiaris and Lurs), Balochi (spoken by
tribes in southeastern Iran), and Caspian languages (such as Gilaki, Mazandarani, and Talish). The
other 22 per cent speak Turkic languages (18 per cent, such as Azari),
Semitic languages (2 per cent, such as Gulf Arabic and Assyrian), and Armenian and Persian
sign language (World Factbook n.d.). A complete list of all the languages can be found on
various sites including Ethnologue.
(p. 355) 14.4.1 The early years: Elamite, Akkadian, Old Persian
Most of the history of language contact in Iran can be summed up as expansion and
invasion-induced contact. Our starting point for a history of the source of the variety of
languages in Iran is 550 BC, when the Persians, a subgroup of Iranians under Cyrus the
Great, established the Persian Empire (the Achaemenids) (see also Chapter 2). Although
the original Iranians are said to have been in the region since 1000 BC, they were
subsumed by the Assyrians. Three official languages, Elamite, Akkadian, and Old Persian,
are attested during the reigns of Cyrus and his later successor Darius, during which time the city of
Susa (now Shush) in present-day south-western Iran was the seat of government
(Dandamayev 2002). As the empire pushed into Mesopotamia, Aramaic was added to the
list of languages, and became the official court language, used to communicate with all
the territories in the region (Shaked 1987). Further conquests by the Persians brought
Greek into the mix, though it was used mainly for administrative purposes; there is little
evidence of a lasting linguistic influence, such as borrowings from Greek (Tucker 2001).
By the early 400s BC, the Persian Empire extended from India in the east to Libya in the
west. At its greatest extent, the Persian Empire included Libya and Egypt, northern
Greece and the Balkans, modern Turkey and Armenia north to the Caucasus, Palestine,
Jordan, Syria, Iraq, Persia, and east to the Indus River, thus including Afghanistan and
Pakistan, as well as territory north to the Aral Sea (including modern Turkmenistan and
Uzbekistan). These borders and the people and languages they enclosed set the
foundation for the multilingual, multiethnic Iran of today.
14.4.2 Greek, Arabic, Turkish
In 330 BC, after Cyrus and Darius, Alexander of Macedon and his armies swept through
and conquered the empire. Though a powerful force of destruction, Alexander’s conquest
and his military descendants failed to leave a lasting Greek impression, culturally or
linguistically, on Iran: ‘Greek influence was ultimately passing and superficial despite the
colonies of Greek ex-soldiers’ (Axworthy 2008: 30). Following nearly five centuries more
of ‘outsider’ rule, 226 AD marks the beginning of the Sassanid Dynasty and a revival of
Persian-ness from the minting of coins in Persian (Middle Persian instead of Greek) to
reestablishing Mazdaism, and a renewed tolerance for cultural and religious diversity.
The empire remained a patchwork of languages, cultures and religions, though Persian
was the official court language. During the next four centuries, important scientific and
medical texts of the time were translated into Persian, thus securing the role of Persian as
a language of oral communication and the language of literacy.
Additional contact through more invasions brought in two languages that have had
significant influence on the languages in Iran today. The Arab invasions and conquests of
Iran’s provinces of 641 to 707 AD brought the Arabic language and culture as well as the
religion of Islam to Iran (Axworthy 2008). By 696 AD, Arabic had become the official
language of the court and throughout Iran, with Arabic–Persian bilingualism becoming a
common trait of members of the court (Ostler 2006: 98), although Persian continued ‘to
be widely used as the spoken language’ (Curtis and Hoogland 2008: 14), as Arabic gained
ground as a language of scholarship. The Abbasid caliphate encouraged writings and translations
into Arabic, everything (p. 356) from literature to astronomy. The use of Persian as a
spoken community language marks Iran as one of few nations that did not capitulate to
the use of Arabic as a national language post-invasion (Olmsted 1948). Over the next two
centuries, more regional dynasties would arise (e.g. Samanids, Ghaznavids) headed by
ethnically and linguistically Persian rulers, bringing with them a returned focus on
Persian as a court language through a ‘cultural policy [of] avoiding Arabic
words’ (Axworthy 2008: 86).5
Next in the line of linguistically influential groups are the Seljuks, a Turkic tribe who
defeated the Ghaznavids in the mid-eleventh century, thus bringing a stronger Turkish
language influence into the region,6 one which would later be reinforced by the Safavids.
In 1501, the Turkic Azari-speaking Safavid dynasty began a 200-year renaissance, with a
centralized government, and more linguistic, ethnic, and religious tolerance. The Safavids
themselves were thought to be Kurds who had moved to Azarbaijan, and eventually
became Azari speakers, though their origin was Iranian. They adopted Persian as the
court language. In 1795 the Qajar family established a dynasty that would rule Iran until
1925.
How is this history reflected in the languages spoken in modern-day Iran? Though the
empire shrank over 2,500 years, the remaining borders and the history of the contact
readily explain the existence of the five major languages used in Iran, and why Persian
has the status of official language. It also explains the existence of the many minor
languages spoken in Iran—including the 60,000 Georgian speakers outside Esfahan!
Though not the focus of this article, these historical facts also explain the continued
presence of Persian dialects outside the borders of Iran.
14.4.3 English in Iran
In addition to the languages that have become a part of the multilingual fabric of a nation
over centuries, it’s also worthwhile to address individual bilingualism achieved in
adulthood, through formal instruction. In particular, in many countries,
including Iran, English is learned as a foreign language, resulting in the formation
of enclaves of career/academic bilinguals, or what has been referred to in the literature
as ‘achieved bilingualism’ (Hoffman 1991). These bilinguals don’t identify as a cohesive
community within the larger linguistic landscape, per se, but are still active users of more
than one language, and can be grouped together based on (a) their choice of English as a
second language, and (b) use of English for career advancement and travel. The term
‘school/cultural bilingualism’ (Skutnabb-Kangas 1984) is useful in understanding the
motivation for choosing English as the target foreign language, as it is used to describe a
situation where people become bilingual, usually in adulthood, through formal
instruction. The choice of English as a foreign language (rather than a language used in
the region or closer in, such as Armenian, Turkish, even Spanish or French), reflects the
awareness among Iranians in Iran of the status of English as an academic language and
also as a resource in non-academic global contexts such as travel, entertainment, and
commerce, an awareness (p. 357) shared by much of the world. The English language,
and along with it American culture, has been making its way into a variety of contexts
and venues outside the United States, with greater and lesser impact. For example, it is
widely used in advertisements in much of the world, including in Iran, to convey a
metalinguistic message: English is associated with US pop culture, and as such it is seen
as reflecting and representing a cool, modern, exciting element (see Androutsopoulos
2007; Ben-Rafael 2008; Cheshire and Moser 1994; Haarman 1986; Revaz 2014; among
others).
In fact, English has been, to some degree, a global language for centuries, beginning with
British colonialism and continuing with American capitalism. English was first introduced
into Iran in the nineteenth century under the Qajar dynasty (Nasir al-din Shah’s rule),
along with the science and technology of the West. After the First World War, and during
the decades that followed until the Iranian revolution in 1979, English gradually became
the dominant foreign language, effectively replacing its rival, French.7 The reforms
brought in by the last Shah and the resulting economic and military/political ties between
Iran and the US strengthened English as a necessary and desirable foreign language to
be acquired (Axworthy 2008; Riazi 1995). The importance of English grew during this
time and spread to other domains, including education and careers, to a point that
degrees earned from Iranian universities began to take second place to those obtained in
the United States, and parents were clamouring to send their children off to American
universities.
English became an important requirement in the Iranian military because a good
command of English was needed for the army personnel to go to the US for
further specializations. In addition, teaching English became a social need and
private language schools mushroomed in the capital and many large cities.
Knowledge of English became an essential requirement for many job opportunities
for the younger Generation … [English] was kept as a vehicle to educational
advancement in Iran. Thousands of Iranian students were sent to US universities
to get higher educational degrees.
(Farhady and Sajadi Hezaveh 2010: 10)
Although immediately post-revolution English language teaching was mostly suspended,
it eventually made it back into the high school and college curricula. Borjian (2013)
provides an in-depth overview of the rise, fall and rise again of English beginning with its
first introduction into Iran in the nineteenth century, through the Pahlavi era and, finally,
post-revolution. She states, ‘Post-revolutionary Iran was envisioned with a homegrown
indigenized model of English education—an indigenized English free from the influence of
the English-speaking nations. The indigenization movement began some 30 years ago at
the onset of the 1979 Iranian Revolution’ (p. 20).
The purpose of indigenization was to separate the language from the Western, in
particular American, cultural values that would necessarily accompany English as a
foreign language material and curriculum developed outside of Iran. There was, and still
is to some degree, an additional fear, shared by many other nations, that English would
replace the host/national language as a language of science and education in higher
education. Mahboudi and Javdani (2012) write,
In Iran, there is a case for making pro-active strategies in ELT [English-language
teaching] to protect the national culture. The textbooks do not include anything
about the culture of (p. 358) English speaking countries. For instance, almost all
the names or situations that are presented in the textbooks are Iranian.
(p. 89)
The official English language teaching policy has gone through a number of stages since
1979, from a we-don’t-need-it stance to its present-day status as part of the core
curriculum, carrying the same number of credit units as other core subjects in public
schools: four hours a week at junior high level, and six credits at the high school level
(according to the 2006 report of the Secretariat of the Higher Council of Education, cited
in Farhady and Sajadi Hezaveh 2010).
Riazi states:
the process of globalization has nevertheless exerted its own pressures to promote
the learning of English as a hidden curriculum. This demand has been responded
to in parallel formal and informal schools and language centres and, to a lesser
extent, by the state-run education system. The economic power of the
globalization process manifested in all aspects of people’s lives has resulted in a
social force which dictates English language use despite formal policies
introduced in school curricula.
(Riazi 2005: 113)
All in all, we are left with an image of official policy on the teaching of English pulled in
two directions: a recognition of the global status of English in higher education and in
international relations, and its flip-side, the potential threat of western values displacing
Iranian and Islamic values. Currently, these two forces are in a delicate balance, each
appearing to be unaware of the other’s existence.8
14.5 Language policy and language maintenance
With twenty-three languages on the critical or endangered list, one question among
linguists of Iranian languages in and out of Iran is what steps can and should be taken to
halt shift, revitalize these languages and the cultures and values they encode, and make
them sustainable. A given in this context is a commitment on the part of government, the
minority language community, as well as the individual members of the community to
promote, maintain, and/or revitalize the language. These groups will ultimately shoulder
the responsibility of transmitting the language to subsequent generations and of
establishing social and educational infrastructures. In other words, the causes of
language shift need to be neutralized and reversed. Though we can make generalizations
about the causes of shift, it’s important to remember that each case is unique, a response
to a combination of political, economic and ideological pressures, including psychosocial
forces that can erode the worthiness of a language in the eyes of the speakers.
That said, we can still consider a set of guidelines for maintenance and revitalization, to
be adapted and tailored as determined by the specific language situation. In 2003,
UNESCO’s (p. 359) Ad Hoc Group on Endangered Languages produced a document
identifying and detailing nine factors that, taken together, help determine the level of
vitality/endangerment for a language. In order of importance these factors are:
(1) Intergenerational language transmission;
(2) Absolute number of speakers;
(3) Proportion of speakers within the total population;
(4) Shifts in domains of language use;
(5) Response to new domains and media;
(6) Materials for language education and literacy;
(7) Governmental and institutional language attitudes and policies, including official
status and use;
(8) Community members’ attitudes towards their own language; and
(9) Type and quality of documentation.
Rightfully, intergenerational transmission is crucial and paramount in the survival and
revitalization of a language, but, taken alone, it’s not always sufficient to maintain a
minority language. Nor is a large number of speakers, on its own. Krauss (1992) notes
that the languages that can be considered ‘safe’ are those where ‘we may identify two
obvious positive factors: official state support and very large numbers of speakers’ (p. 7).
As for official state support, the idea here is that the language is used on a regular basis
for at least some official/governmental business and services. The best-case scenario, of
course, is that the state supports and funds bilingual education. Consequently, children
have incentive and purpose to acquire and use the language beyond keeping a connection
with family. Official school use of local languages also signals a respect for the local
language (and culture) by the authority behind the majority language, which in turn
brings with it the recognition of the majority language users. These latter outcomes are
among the factors proposed by the Ad Hoc group as influencing language vitality and
maintenance.
The survival potential of a threatened language is significantly increased where there is a
positive attitude from the community of users and from outside of the language
community. It bears repeating that none of these factors are sufficient, independently, to
determine the future of a minority language. Rather, a combination of the factors will
render a more accurate determination.
14.5.1 Status of minority languages in Iran
In the case of local languages in Iran, the Academy of Persian Language and Literature,
one of four Academies under the supervision of the State established in 1991, has listed
as one of its duties the following:
‫ﺑﻬﺮﻫﺒﺮﺩﺍﺭﻯ ﺻﺤﻴﺢ ﺍﺯ ﺯﺑﺎﻧﻬﺎﻯ ﻣﺤﻠﻰ )ﺩﺭ ﺩﺍﺧﻞ ﻭ ﺧﺎﺭﺝ ﺍﺯ ﺍﻳﺮﺍﻥ( ﺑﻬﻤﻨﻈﻮﺭ ﺗﻘﻮﻳﺖ ﻭ ﺗﺠﻬﻴﺰ ﺍﻳﻦ ﺯﺑﺎﻥ ﻭ ﻏﻨﻰ ﺳﺎﺧﺘﻦ ﻭ‬
‫ﮔﺴﺘﺮﺩﻥ ﺩﺍﻣﻨﮥ ﻛﺎﺭﻛﺮﺩ ﺁﻥ؛‬
‘The appropriate utilization of local languages (both inside and outside of Iran) in
order to strengthen and equip the language and enrich and expand the scope of
its function.’
(p. 360) What this means is that, at least in theory, there is some level of
acknowledgement from the state that (a) local languages exist, and (b) we need to help
them grow. However, this statement of duty, taken together with the section on local
languages in Article 15 of the Iranian Constitution regarding language (‘the use of
regional and ethnic languages in the press and mass media, as well as for teaching of
their literature in schools, is allowed in addition to Persian’), makes it difficult to
determine what responsibilities the state has taken on in revitalizing and maintaining
minority languages. As noted earlier, the attitude of the speakers themselves also contributes
to maintenance or revitalization efforts. If speakers associate the local language with
poverty, unemployment, low social status and other socially negative relationships, it’s
hardly likely they will work at passing the language on to their children.
Moradi and Raeesi (2015) conducted a language choice study in Dehloran in Ilam
province in Iran, where Luri, Kurdish, and Arabic are indigenous languages. The focus of
their study was to discover (a) why parents prefer their children to speak Persian rather
than the indigenous language, and (b) whether language shift was more prevalent among
particular groups within the community. Results were drawn from 206 participants’ responses to a
forty-six-item questionnaire consisting of questions that dealt with reasons for language
shift and ‘possible reasons why parents prefer to speak Farsi [Persian] rather than the
indigenous language to their children’ during their early childhood (p. 31). Of the
participants, 90.73 per cent indicated they wanted their children to be able to speak Farsi [Persian]
comfortably and ‘understand the material better when entering schools’. Responses to
other questions highlighted negative attitudes towards the local language (indicating
Persian is ‘superior’ to the local language), and the association of Persian with greater
academic, economic and social success.
Moradi and Raeesi also report on four other studies which had similar findings, all
showing a decline in the use of the local language in favour of Persian. Hooshmand (2007)
looked at the place of Luri in Mamasani; Zolfaghari (1997) examined the use of Bakhtiari
in Masjid Soleiman; Esmaili et al. (2008) conducted a study on the decline in the use of
Mazandarani; and Nasiri-Alamooti (2009) compared the use of Persian with the use of
Gilaki in Tonekaban and found that women used Persian more often than men.
14.6 Concluding remarks

The many indigenous languages in Iran face many of the same struggles faced by
minority languages in other multilingual nations. Revitalizing and maintaining these
languages is the responsibility of the nation as a whole, the communities, and the
individuals in those communities. Additionally, the linguistic academic community
at large has an obligation to do its part in helping to document the languages, to assist in
developing and providing educational curricula and materials at all levels of K-12, among
other types of assistance that can help to keep the languages vital.
Notes:
(1) None of the source books or websites are clear about the difference between Gilaki
and Mazandarani. Some assert that they are the same or closely related dialects. Native
speakers, using mutual intelligibility as a measure, separate the two as distinct but
related languages/dialects, with shared historical roots (personal communication, Mohsen
Mahdavi Mazdeh, June 2015).
(2) Urbanization is a population shift from rural to urban areas, resulting in an increase in
the proportion of people living in urban areas over time. It predominantly results in the
physical growth of urban areas (World Urbanization Prospects 2014).
(3) For a full description of all five levels, visit http://www.endangeredlanguages.com/assets/information_catalogue_endangered_languages.pdf.
(4) The list of endangered languages from the UNESCO endangered languages site
includes Ashtiani, Bashkardi, Dari, Dzhidi, Gazi, Hawrami, Hulaula (extinct in Iran),
Khalaj, Khunsari, Koroshi, Lari, Lishan Didan (extinct in Iran), Mandaic, Natanzi, Nayini,
Semnani, Senaya, Sivandi, Soi, Suret, Talysh, Tati, Vafsi (see UNESCO 2010).
(5) This policy resulted in Ferdowsi’s Shahname, written in Persian, with an aim to avoid
Arabic. It is regarded by some as a unifying force for Iranians, culturally and
linguistically.
(6) The Abbasid caliphs had originally brought Turkish slaves into the region as
labourers (Axworthy 2008).
(7) The history of French in Iran and its social prestige until the late 1950s dates back to
1839 with the establishment of the first French language school, the Lazarist’s
missionary Jeanne d’Arc School of Tehran (http://www.iranicaonline.org/articles/france-xv).
(8) English language schools can be found in every corner of Iran (a partial list of schools
and cities can be found at http://www.eslbase.com/schools/iran). According to a San
Francisco-based Iran news website, Payvand News, in 2012 there were more than 5,000
foreign language schools in Iran, and weekly English lessons geared toward students
preparing for university entrance exams can be found on a few channels
(www.payvand.com/news).
Shahrzad Mahootian
Shahrzad Mahootian is Program Coordinator and Professor of Linguistics in the
Linguistics Department at Northeastern Illinois University in Chicago. In addition to
her attention to aspects of Persian and Iranian linguistics, her research interests and
publications include topics in language contact, bilingual language acquisition,
structural, cognitive and social aspects of code-switching and language choice,
language and identity, endangered languages, and language documentation and
maintenance.
Persian as a Heritage Language

Anousha Sedighi

This chapter discusses the phenomenon of ‘Heritage Language’ as a whole, with a focus
on Persian as a heritage language. Studying heritage languages is an emerging field,
which has flourished within the past several decades and, as migration and globalization
grow, heritage languages are becoming more important. The first part of this chapter
provides an original contribution by unifying the existing research on Persian as a
heritage language. This is a crucial task, because various researchers have already
explored this topic from different perspectives. However, many scholars are not aware of
the existing research, which causes them to start the work from the ground up. The
second part of this chapter examines various characteristics of heritage Persian speakers
in terms of their linguistic and metalinguistic abilities, compares their profiles with that
of a native speaker and a second-language learner, and sheds light on the current
challenges within this field.
Keywords: heritage language, Persian, heritage speaker, language acquisition, bilingualism, language policy,
language maintenance, Iranian diaspora, bilingual education, migration
15.1 Introduction
WHILE the previous chapter studies multilingualism inside Iran, the current chapter
focuses on Persian as a heritage language outside Iran and mainly in the west. This
chapter discusses the phenomenon of ‘Heritage Language’ as a whole with a specific
focus on Persian as a heritage language. Studying heritage languages is a relatively new
and emerging field, which has flourished within the past several decades and, as
migration and globalization grow, heritage languages are becoming more and more
important. The first part of this study provides an original contribution by unifying the
existing research on Persian as a heritage language. This is a crucial and timely task
because, while Persian as a heritage language is a relatively new field, various
researchers have already explored this topic from the linguistic, sociological,
anthropological, and language policy perspectives. However, most scholars are unaware
of the existing research, which causes them to start the work from the ground up. This
work provides a cohesive overview of the existing literature that serves as a solid
resource for future scholars. The second part of this study builds on the previous work
and examines various characteristics of heritage Persian speakers in terms of their
linguistic and metalinguistic abilities, compares their profiles with that of a native
speaker and a second-language learner, and sheds light on the current challenges within
this field.
The structure of this article is as follows. Section 15.2 discusses heritage languages in
general, provides definitions, and discusses their history. Section 15.3 provides a
comprehensive overview of the existing literature on Persian as a heritage language from
the linguistic, sociological/anthropological, language policy/bilingual education, and
pedagogical points of view. Section 15.4 builds on the existing work of Sedighi (2010) and
explores the linguistic and metalinguistic characteristics of heritage Persian speakers
from the points of view of language style, phonetics, phonology, lexicon, syntax,
sociocultural norms, language attitude, identity, motivation, and community and parental
attitudes and compares their performance with that of a native speaker and a second-
language learner. Section 15.5 discusses the current pedagogical challenges and provides
recommendations for future research and policy making for Persian as a heritage
language. Section 15.6 summarizes the chapter.
(p. 362) 15.2 Heritage language: definition and history
Historically, the term ‘heritage language’ has been used in Canada since the 1970s
(Baker 2000; Cummins 2005), while in Europe and Australia the term ‘community
language’ has been preferred (Horvath and Vaughan 1991). Draper and Hicks (2000: 19)
provide the following description for a heritage speaker: ‘someone who has had exposure
to a non-English language outside the formal education system. It most often refers to
someone with a home background in the language, but may refer to anyone who has had
in depth exposure to another language.’
Fishman (2001) provides an anthropological perspective and divides heritage languages,
based on their sociohistorical relationships within the United States, into three
groups.
1. Indigenous languages, which are spoken by aboriginal Native Americans who
existed before the arrival of Europeans. Many of these languages are now
endangered.
2. Colonial languages such as Spanish, French, German, or Italian, which were
spoken by the earlier settlers.
3. Immigrant languages such as Arabic, Persian, Korean, etc., which were brought by
later immigrants.
Fishman’s (2001) definition has been known as the ‘broad’ definition, since it implies that
the heritage language might be a language in which the person has no language ability at
all, but has a cultural connection to that language. In contrast to studies that provide a
definition based on the affiliation with the language, Valdés (2000, 2001) provides a
proficiency-based definition, commonly known as the ‘narrow’ definition. Valdés (2000:
375) characterizes a heritage speaker as someone:
who is raised in a home where a non-English language is spoken, who speaks or
merely understands the heritage language, and who is to some degree bilingual in
English and the heritage language … For the most part, the experiences of these
heritage speakers have been similar. They speak or hear the language spoken at
home, but they receive all of their education in the official or majority language of
the countries in which they live. What this means is that, in general, such students
receive no instruction in the heritage language.
Among others who have provided a broad definition one can mention Van Deusen-Scholl
(2003) and Foley and Thompson (2003: 99), who define Heritage Language as: ‘the
language, which is frequently the means of establishing and reaffirming consolidation
with ones origins, though linguistic proficiency is not a pre-requisite.’ Adopting a more
neutral approach, Polinsky and Kagan (2007: 368) state, ‘Heritage speakers are people
raised in a home where one language is spoken who subsequently switch to another
dominant language … the version of the home language that they have not completely
acquired is called heritage language’.
Boon and Polinsky (2015: 2) argue that a heritage language is ‘an ethnic or immigrant
minority language which is the weaker of a bilingual’s two languages’. This part of their
definition assumes the speaker is bilingual, which can be considered a strong
statement (see (p. 363) Chapters 13 and 14 for more on bilingualism). The second part of
Boon and Polinsky’s definition provides a broader picture by stating, ‘heritage language
speakers feel a cultural or family connection to their home language, but their most
effective and frequently used language is the one that is dominant in their community’.
Boon and Polinsky (2014) argue that ‘a true heritage speaker, from a linguistic
perspective, is one whose personal experience with the language has led to a real amount
of proficiency’ (see Chapter 16 for more discussion on proficiency).
An in-depth study of heritage languages in general and a review of the entire literature
for different languages is beyond the scope of this article, which mainly focuses on
Persian as a heritage language. However, it is worth mentioning that in terms of
terminology, researchers have used different terms to refer to heritage speakers: Dorian
(1981) uses the term ‘semi-speakers’; Baker and Jones (1998) use the terms ‘unbalanced,
dominant, or pseudo-bilinguals’; Montrul (2002) and Polinsky (2006) call them ‘incomplete
acquirers’; and Kim et al. (2006) consider them ‘early-bilinguals’. Since the background
and setting in which a person is exposed to a heritage language differ from one person
to another, the language abilities of heritage language speakers cover a wide spectrum.
Polinsky and Kagan (2007) argue that heritage
speakers fall along a ‘continuum’ according to the speakers’ distance from the language
abilities of the native speakers. The continuum-based model enables researchers and
educators to classify heritage speakers more accurately and readily.
In response to the question of ‘What is the significance of researching heritage
language?’ Boon and Polinsky (2014: 5) summarize the central goals of studying heritage
languages into four major groups:
1. Describing precisely what it means to be a heritage speaker and identifying the
range of variation among different heritage languages and their speakers.
2. Using patterns in the structure of heritage languages to inform our understanding
of the uniquely human ability to create and use languages in general.
3. Testing the possibility of predicting the degree of heritage language maintenance
or loss for a particular individual or community.
4. Determining the particular pedagogical challenges presented by heritage speakers
in the classroom.
To this list, one can add the following goals:
5. Providing recommendations for educators and curriculum developers of heritage
language instruction.
6. Raising awareness within the respective ethnic communities to foster their
children’s linguistic abilities.
7. Raising awareness of the importance of heritage languages among policy makers in
order to create K-12 environments that reinforce heritage languages.
With this background on heritage languages in general, we now focus on Persian as a
heritage language. The next section aims to provide an extensive overview of the existing
literature on Persian as a heritage language.
(p. 364) 15.3 Overview of literature on Persian as a heritage language
Researchers have studied Persian as a heritage language from a variety of perspectives
including the sociological, anthropological, linguistic, and pedagogical points of view.
However, there has not yet been a study to unify and gather different approaches and
present them as a cohesive topic of research. As such, there is clearly a need for a
comprehensive overview of existing literature on heritage Persian that can help future
researchers know what has already been done and provide a roadmap to further explore
the topic. This section aims to provide a much-needed comprehensive overview of the
literature.
15.3.1 Linguistic studies
The most comprehensive linguistic study to date has been done by Cagri, Jackson, and
Megerdoomian (2007). They examine a wide range of language features in Persian
phonology, morphology, syntax, lexis, collocation, and comprehension, totalling fifty-six
distinct test sets through a computer-based battery project. They control the items in
terms of their frequency, vocabulary level, and register, and measure the responses in
terms of their accuracy and reaction time. They conclude that heritage learners perform
faster than second-language learners across the board and perform better in areas such
as argument structure (causatives) and formation of complex conversational sentences
(sequence of tenses). On the other hand, they conclude that second-language learners
perform better on elements not frequent in conversational language such as Arabic
broken plurals. They also argue that English-like structures such as preposition
subcategorization hinder heritage learners more than second-language learners.
In a study that is aimed at raising awareness about Persian as a heritage language,
Sedighi (2010) explores the linguistic abilities of heritage Persian speakers and examines
them from a variety of perspectives including their syntax, phonetics, phonology, lexicon,
identity, motivation, and parental attitudes, while shedding light on the existing
limitations of Persian heritage-language instruction. She emphasizes the importance for
the Iranian community and academia to pay attention to the maintenance of their
children’s Persian language skills.
Moore and Sadegholvad (2013) conduct a survey on student errors in exams and other
assignments and study morphological, syntactic, and orthographic differences between
heritage Persian students and native speakers. Their data is collected from twenty-six
undergraduate students enrolled in the lower-level Heritage Persian course. Among their
findings, they report that heritage Persian speakers exhibit colloquial usage, transfer
effects, and simplification.
Apart from studies that have explored the linguistic characteristics of heritage speakers
and compared them to native speakers and second-language learners, a couple of studies
have focused on applying the existing theories to heritage Persian learners. Focusing on
language skills, Abasi and Akbari-Saneh (2012) study the expression of interpersonal
meaning in writings of heritage and non-heritage advanced learners of Persian. They
examine the inscription of interpersonal meanings in the genre-specific writings of
heritage and (p. 365) non-heritage learners at the advanced level through the appraisal
theory (Martin and White 2007) and focus on the appraisal choices identifying
divergences and convergences in the way attitudinal choices have been realized in their
writings. They also discuss the implications of an appraisal theoretical perspective for
advanced foreign language instruction and heritage literacy.
While most of the existing studies on heritage Persian have focused on college-level
heritage learners, Sadeghi et al. (2014) examine text processing in what they call
‘English–Persian bilingual’ children and investigate whether a Simple View of Reading
framework could be used as a basis for determining predictors of literacy levels among
children learning to read and write in both Persian and English. Their results indicate the
existence of cross-language effects, specifically in listening comprehension and
vocabulary. This finding goes hand in hand with the general observation that heritage
speakers are stronger in oral skills, specifically in listening comprehension. That is, even
if they cannot converse in the heritage language, they mainly understand it. Moreover,
Sadeghi (2013) investigates potential cognitive-linguistic predictors of reading
comprehension levels among Persian monolingual and Persian–English bilingual primary
school children attending schools in New Zealand or Australia. He uses the findings to
derive a model of Persian reading comprehension similar to the Simple View of Reading
and argues that they are of use for development of cross-language models of reading and
global theories of reading comprehension.
15.3.2 Sociological and anthropological studies
Sociological and anthropological studies on the Iranian-American diaspora flourished
after the mass migration of Iranians due to the Islamic Revolution; among others, one can
mention Askari et al. 1977, Ansari 1988, and Bozorgmehr 1992. Some of these studies
have discussed the role of language maintenance within the Iranian diaspora (see also
Chapter 14 for more on language maintenance). Ansari’s (1988) book conducts a case
study of Iranian-Americans in the United States from a sociological perspective. In terms
of the language, he argues that the phenomenon of language loyalty found in some other
immigrant groups (such as Hungarian) has not prevailed among Iranian immigrant
children, who show some resistance to their parents’ attempts to teach them Persian.
Ansari points out some reasons including the marginal number of Iranian children in
residential areas and the lack of sufficient cultural and religious centres. He states that
there are other social and psychological reasons that need to be further investigated and
explored.
Mahdi (1997, 1998) studies ethnic identity among second-generation Iranians in the
United States. Mahdi conducts a national survey of second-generation Iranian youths in
the United States and examines the concepts of ethnic, cultural, and national identity.
Mahdi’s results show that the majority of participants could speak and understand
Persian. However, 60 per cent of participants could not read or write. Mahdi also
reports the interesting observation that, based on his results, the notion of
‘Iranian identity’ among second-generation Iranians is more ‘symbolic’ than ‘behavioural’
and that values, norms, and symbols are not as easily accessible to them as they are to
their parents. Mahdi also points out the importance of a unified community which has a
positive effect toward the maintenance of the heritage language. For instance, he argues
that in an area with a higher concentration of Iranians (such as Los Angeles) a higher
percentage of heritage speakers (p. 366) maintained their language skills, as opposed to
areas with a lower percentage of Iranians, where the usage of the heritage language is
not a norm in the community.
Through a sociolinguistic lens, Modarresi (2001) conducts a study on heritage Persian
speakers which gives a general view of the status of Persian in the Iranian community in
the United States. He notes the interesting observation that second-generation Iranians
seem to be in the frustrating position of being American by birth on the one hand, and
bearing the stigma of being a foreigner on the other hand, especially due to the anti-Iranian
sentiments during the hostage crisis of 1979–81 and thereafter. Modarresi (2001: 93)
states the following:
The first-generation Iranians have been trying to preserve their linguistic and
cultural heritage and introduce it to their children through various means such as
national Iranian ceremonies, radio and TV programs, newspapers, magazines,
books, etc. However, since the process of Americanization applies powerful
pressure, the second generation needs stronger cultural motivations and supports
to resist language shift.
Following Fishman (1966), Modarresi points out a very interesting factor: the role of
women in the language maintenance of their children. He states that the children of the
families whose mothers appreciate the Iranian cultural heritage have a relatively better
command of Persian. Modarresi (2001: 111) states: ‘Mothers who love the Iranian
cultural heritage and the Persian language try harder to teach Persian to their children,
and, thus, they actually play a very important part in the process of maintenance of
Persian in the next generation.’ Additionally, Modarresi points out the role of
grandparents and relatives visiting from Iran and spending considerable time with
children as well as frequent trips of children with their parents to Iran as crucial factors
toward the maintenance of the Persian language in second-generation Iranians.
Ramezanzadeh (2010) studies the effects of sociopsychological factors on heritage
Persian language loss and maintenance. She conducts a deductive thematic analysis of
coding on twenty-two college students. Similar to Modarresi’s findings, Ramezanzadeh’s
research leads to some interesting issues on identity. She observes that the identification
with Iran among her subjects was complex: Iran is politically, religiously, and ethnically
‘othered’. Thus students strategically align themselves with different aspects of their
identity at different times and spaces depending on the audience and the effect they hope
to achieve.
Motivation plays a crucial role in maintaining heritage language skills. Miremadi (2014)
studies the motivational factors for heritage and second-language learners of Persian
through a forty-item questionnaire. The participants were one hundred heritage learners
and fifty second-language learners from two American universities. Responses to the
survey were subjected to descriptive analyses in order to capture salient features of
students’ responses. His results exhibit differences in motivation of students from the two
universities. Miremadi identifies three main reasons for learning Persian: culture;
geopolitical issues and possible future relations between the United States and Iran; and
interest in knowing their parents’ mother tongue.
Similar to the linguistic studies, most of the sociological and anthropological studies on
heritage Persian have focused on college-level learners. However, a few studies have
focused on children and the issue of Persian as a heritage language. Atoofi (2011, 2013a,
2013b) provides a linguistic-anthropological study to evaluate the linguistic markers of
affect for the students and teachers in heritage Persian classes for children in Los
Angeles. Atoofi observes that Persian teachers’ strategies include linguistic repetition as
a poetic device to influence (p. 367) students as they coached students to do the same. He
also argues that teachers used affective communication for class management and
discipline and that affect was frequently used to create hierarchical differences. On the
other hand, Atoofi observes that students used affective display to align their class
interaction based on the output received from the interlocutor. Furthermore, Mokhatebi
(2014a) conducts a case study of identity and motivation among primary school Persian
community language learners attending four Persian community schools in Sydney,
Australia. Her data was collected by interviewing thirty-five students, ten parents, and
seven teachers. Mokhatebi provides interesting indications on heritage language
learners’ diverse ethnic identity perceptions and goals for studying their heritage
language.
15.3.3 Language policy and bilingual education
Looking at the general picture, the numbers do not sound very promising. In her research
on language policy and bilingual education, Najafi (2009) studies the
maintenance of the Persian language among Iranians in the United States and examines the
data from secondary sources such as the US Census (2000) and the Iranian Survey
(2007). Najafi’s research shows that among second-generation Iranians 70.3 percent
understand Persian, 55 percent speak Persian, 27 percent can read in Persian, and 21.6
percent can write in Persian. Thus, she argues that the rate of Persian loss as an oral
language is 45 percent and that the rate of language loss as a literate language is almost
75 percent. As we saw in 15.3.2, Mahdi’s (1997, 1998) research showed that 60
percent of second-generation Iranians could not read or write. This shows that,
unfortunately, within the ten years between Mahdi’s work and Najafi’s, the rate of
language loss has increased by 15 percentage points. This is a worrisome fact that makes
the focus on Persian as a heritage language all the more vital. Najafi concludes
that as an oral language, Persian might get transferred to the third generation, but, as for
most immigrant languages, the literate language will die after the second generation.
Family plays a crucial role in heritage language maintenance. Rohani et al. (2006)
explore the role of the family in language maintenance focusing on immigrant families
living around New York City from five language groups: Cantonese, Persian, Japanese,
Spanish, and Urdu. The key questions they explored were: What factors contribute to the
maintenance of language within each language group? What specific efforts are made by
families across groups to maintain language? How does each group vary in determination
and attitude towards language maintenance? For each language group, they conduct
interviews with six participants, three born in the United States and three born elsewhere
who came to the United States before sixth grade. The Persian component of the study
focuses on the Iranian Baha’is, who are considered a religious minority in Iran. The
results indicate that the Baha’i families have no sound language maintenance programme
for their children. Issues such as lack of access to the home country due to persecution
and natural assimilation to the American culture are the major factors contributing to this
language loss. Rohani et al.’s result for Persian is somewhat contrary to a later study
done by Shirazi and Borjian (2012), who explore the Persian bilingual and community
education among Iranian-Americans in New York City by conducting a study of nine
Iranian Americans living in New York City and examining the Persian community
language schools in that area. Contrary to Rohani et al.’s (2006) research on Iranian
Baha’is in the same area, the findings of their study show that the (p. 368) Iranian-American
community is actively involved in making an effort for their children to learn
their heritage language as a means of providing them with ‘additional avenues of
expression, additional connections to their families, additional exposure to cultural
diversity, and additional tools to further their intellectual development and academic
proficiency’.
Another study on the impact of family language policy on maintenance of Persian in
second-generation Iranian immigrants is by Kaveh (2003). She conducts a survey of six
Iranian immigrant families in England and the United States to explore the influence of
home language experiences and parental attitudes. Her results show that parents whose
children are fluent Persian speakers consider both languages equally important, while
parents whose children cannot speak Persian or do not use it to speak to them consider
English more important than Persian. Parents whose children can speak Persian
attributed their success to making efforts to introduce their cultural heritage through
language as well as their behaviour. Similarly, Babaee (2013) investigates the factors
contributing to the language maintenance of an Iranian individual during his four-year
stay in England. Data were collected through a semi-structured interview, descriptive and
reflective field notes during and after the interview, and the participant’s journal writing.
Results show that the primary reasons for the participant’s maintenance of his heritage
language were parents and familial attachments, frequent visits to the home country, and
the heritage community.
Looking outside the English-speaking countries, Namei’s (2008) research explores the
language choices among Iranians in Sweden, both inside and outside the home. The data
were collected from 188 participants through structured interviews and questionnaires.
Namei’s results show that Persian is the main instrument of communication in the home
domain between parents and children. However, to some extent the Swedish language is
also used at home, especially by the second generation. An interesting observation in
Namei’s research is that mothers tend to use Swedish more than fathers. Namei argues
that this may be because they are much more involved in their children’s education,
which demands greater second-language skills. Based on Namei’s observation, one may
speculate that perhaps women are more open to accepting a new culture and language
than men are. Another interesting observation by Namei is that the topic of education
gradually constrains the usage of heritage language, as it is more straightforward and
less time-consuming for both mothers and children to talk in Swedish about matters that
concern school.
The existence and implementation of suitable language policy play a vital role in
maintaining heritage languages (see also Chapters 12, 13, and 14 for more on
language policy). Recently, in several parts of the United States, the ‘Seal of Biliteracy’
has been passed, through which public bilingual schools are established and supported.
These initiatives play a crucial role in heritage language maintenance and have significant
short- and long-term effects for a nation. In an interesting study, Hoffman (1988) studies a
secondary school in Los Angeles with a large population of Iranian students. Her results
show that the school was not successful in acknowledging the Iranian students’ national
origin, including their language and national holidays. Iranians collectively were viewed
negatively by the teachers, not because of the lack of academic ambition or poor
performance, but due to the students’ lack of respect for the rules and regulations of the
school. Iranians resisted the implicit educational mission of the school and maintained
their affiliation with Iran. They devised various modes of resistance to overcome what
they perceived as the school’s preoccupation with rules and regulations. The teachers, in
turn, perceived these actions as bypassing the rules.
Traditionally, community education has proven to be an important pillar in (p. 369)
heritage language maintenance. Shirazi (2014) studies how a Persian community-organized
language school serves as a site of diasporic cultural production and examines
how the school serves as a site to teach the Persian language, delimit cultural meanings,
and facilitate a sense of belonging and community membership among a diverse
population of parents and children. Shirazi reports that community education efforts
remain vital to the understanding and exploration of notions of identity and culture that
may be socially contested within diasporic communities. Similarly, Mokhatebi (2014b)
studies the Persian community’s language learning through a case study of four Persian
community language schools in Sydney, Australia. Mokhatebi reports on the challenges
and issues indicated by school principals, teachers, heritage learners, and their parents.
The results show that the issues include an ad hoc curriculum, lack of parental
involvement, a high rate of learner attrition, diversity in learner language proficiency,
and a paucity of professional teachers.
While most supporters of bilingual education argue that promoting English over
the heritage language is harmful to speakers’ cultural identity, Shavarini (2004) argues
that the promotion of English among second-generation Iranians does not necessarily
mean they have lost their ethnic culture and heritage, because their heritage and culture
are nurtured in a variety of contexts that go beyond language alone. In terms of
the maintenance of Persian as a heritage language, Shavarini (2004: 141) states:
One salient component of successful assimilation into American society for
second-generation Iranians has been the acquisition and perfection of English.
Rather than being forbidden or discouraged to speak English at home, they were
encouraged. Mastering English meant that second generation Iranians would
improve their academic performance and further guarantee their success in
higher education.
15.3.4 Pedagogical studies
Compared to anthropological, sociological, and language policy studies on heritage
Persian, pedagogical studies are somewhat underdeveloped. Among the early works on
the pedagogical aspect of heritage Persian is Paribakht and Rezaei (1992), who study
heritage Persian language programmes in Canada and provide a curriculum guide
towards Persian language instruction to heritage language learners. It is worth
mentioning that Canada is one of the pioneer countries to have worked on heritage
languages. At the University of California, Berkeley, Pirnazar (1998) studies the
characteristics of heritage Persian learners and addresses the challenges of teaching
mixed classes. She observes the diversity of students’ religious backgrounds, noting
that while religious minorities constitute less than 5 percent of Iran’s population, they
consistently had around 30 percent of students belonging to the Baha’i, Jewish, Christian,
Sunni, or Zoroastrian communities in their classes. To address the lack of appropriate
material, they created their own material focusing on folk and mythological tales and
cultural highlights such as Nowruz, the Persian New Year, and its rituals. They also
provided exposure to diversity in foods, jokes, and movies in the classroom.
Megerdoomian (2010) studies the pedagogical aspects of heritage Persian. Contra
Lynch (2003), Megerdoomian (2010) argues that an entirely communicative classroom
without the explicit instruction of grammar is not adequate for heritage learners,
particularly when (p. 370) it comes to formal language and literacy skills. Megerdoomian
proposes a novel pedagogical paradigm for heritage language teaching through linguistic
instruction. She argues in favour of a cutting-edge linguistic analysis rather than
grammar rules, so that the students tap into their intuition and discover linguistic
generalizations. Her proposal shares similarities with the ‘explicit focus on form’
approach. However, in her proposal, the explicit focus is on the linguistic forms but the
instruction is still in a content-based format (see Chapter 16 for more on content-based
instruction). Megerdoomian (2012b) studies various factors that play a role in language
loss in the Iranian diaspora, including the lack of systematic language instruction and
teaching material for all levels, ideological issues affecting attitudes towards Persian,
isolation from the home country, lack of firmly established Persian language communities
except for a few major cities, and parental behaviour and attitudes. Megerdoomian
expands on her 2010 work, proposing an instructional approach that emphasizes the
strengths of heritage speakers and caters to their specific needs.
One of the biggest challenges of heritage language instruction comes into play when both
heritage and non-heritage learners are taught in the same class. Hamedani (2015)
explores how to teach Persian to classes with both heritage and non-heritage
language learners, while catering to the needs of both groups of learners. Following
Tomlinson (1999), she argues in favour of ‘differentiated instruction’ and proposes
distinguishing classroom components as process, content, products, and the learning
environment.
At the pre-college level, Mokhatebi and Moloney (2010) conduct a study to evaluate and
develop current curricula for Persian heritage learners in Persian community schools at
the elementary level, K-5, in Sydney, Australia. They examine diverse factors associated
with improving the quality of language teaching and learning of heritage learners of
Persian. Among other implications is the development of a curriculum that satisfies all
the various stakeholders. Fani (2012, 2013) explores the academic profile of heritage
Persian speakers at intermediate and advanced levels and examines the efforts of the
Iranian School of San Diego in creating autonomous curricula that attempt to address the
specific needs of its heritage learners.
In recent years, the US government has funded projects such as STARTALK that organize
teacher training workshops for college-level, K-12 level, and community language school
teachers. These teacher training workshops have been instrumental in providing proper
training and exposure to cutting-edge pedagogical methods to teachers, some of whom,
especially at community language schools, have had little proper training in language
pedagogy (see Chapter 16 for more discussion on pedagogy). Ahmadeian and Aziz (2015)
conduct a project to study the heritage teachers’ pedagogical practices and aim to help
them better manage the learning experiences of their students through a series of
targeted professional development workshops. Their study shows that teachers benefited
more from the professional development workshops that were conducted in their target
language as opposed to the workshops that were conducted in English.
15.3.4.1 Textbooks and instructional materials

The shortage of pedagogically sound and updated textbooks and instructional materials
for second-language learners of Persian is discussed in the next chapter. The same
shortage exists on an even larger scale when it comes to textbooks specifically designed for
heritage (p. 371) Persian learners. Among the existing work, one can mention the
elementary level workbook by Atoofi and Emami (2011), created by the National
Heritage Language Resource Center at UCLA. Similar course packets and readers have
been created by professors at universities such as UCLA (by Hagigi) and UCSD (by
Sadegholvad). At the pre-college level, the Iranian School of San Diego has been by far
the most active in terms of creating a strong curriculum and autonomous instructional
material for heritage Persian children. Among their work is Fani’s (2009) textbook,
designed for the advanced level heritage learners, and Ahmadeian’s textbooks for
elementary and intermediate levels. The significant need for creating age-appropriate
and level-appropriate materials for heritage Persian will be further discussed in section
15.5.
15.4 Current study: exploring the characteristics of heritage Persian speakers
This section builds on previous work by Sedighi (2010) and studies the characteristics of
heritage Persian speakers identifying the recurrent features of their linguistic and
metalinguistic profiles from a variety of perspectives, including language style, phonetics,
phonology, lexicon, syntax, sociocultural norms, language attitude, identity, motivation,
and community and parental attitudes. It also compares their performance with that of
second-language learners. The findings of this section thus provide implications that can
help educators and curriculum and textbook designers identify the areas that require
more focus and attention. The data were collected through a questionnaire, recorded
spontaneous speech prior to entering the Persian language classes, and class performance
in a mixed class with second-language learners. The subjects of this study are heritage
Persian speakers at the
university level residing in the United States. They are mainly second generation with
some exceptions being third generation. The birthplace of the majority of these heritage
students is outside Iran, with a minority group who left Iran prior to primary school. They
have been exposed to the Persian language mainly at home, with their extended family,
and in the Persian community, as well as through media sources such as Iranian TV and
satellite programmes, local Iranian radio stations, newspapers, magazines, and, more
importantly, the internet.
Recent studies differentiate between the terms ‘heritage speaker’ and ‘heritage learner’
and consider a heritage learner to be someone who is actively taking classes to learn the
heritage language. However, the data for this study is gathered from both sources: the
initial interview and the class performance of the heritage speaker. By using the term
‘speaker’ we are also indicating that the data is gathered from participants who have at
least a minimal level of speaking; those who could merely understand but could not speak
the language at all were excluded from this study.
The findings are divided into two major groups and presented in terms of ‘Challenging
areas’ where the performance of heritage speakers differs from that of the native
speakers (p. 372) and ‘Internalized areas’ which are the areas where the heritage
speakers perform closer to native speakers.
15.4.1 Challenging areas for heritage speakers
This section focuses on areas that tend to cause problems for heritage speakers and
differentiate their language performance from that of a native speaker/the baseline. The
differences come from Language style, Polite/Honorific form, Phonetics, Phonology,
Orthography, Lexicon and code switching, Relative clauses, Continuous past tense,
Present perfect tense, Future tense, Passive construction, Prepositions, and Broken plural
formation.
15.4.1.1 Language style

Before starting the discussion, the notion of ‘baseline’ language needs to be discussed.
Several studies such as Kagan and Polinsky (2007) propose establishing the baseline
language, which is the precise variety of the language that heritage speakers were
exposed to during childhood as spoken by native speakers in natural settings. The
baseline of the current study is the Tehrani dialect of Persian. Determining the baseline
language for Persian is not as easy as in languages where there is no major distinction
between the spoken and the written language. For a language like Persian,
where there are clear differences between the written and spoken forms of the language,
establishing the baseline language is not clear-cut.
Persian has two registers. The first form is the ‘Written’ form, also known as Standard,
which is the way the language is written. It is also used in broadcast media. The second form
is the ‘Spoken’ form, also known as Colloquial or Conversational, which is the way native
speakers converse, both in informal and formal settings, i.e. with familiar people and in
more polite settings. The spoken form can also be used in writing letters, texts, and even
in modern poetry (see Chapters 2, 3, 4, 5, 6, 10, 11, 13, and 19 for more on the spoken/
colloquial form). The written form is learned by native speakers at school. Since heritage
speakers’ exposure to the heritage language is through home and family settings, in the
majority of cases they either entirely lack the formal training that one gets at school or
they have attended school minimally (mainly at the level of elementary education), which
still does not provide the proper language training that a high-school graduate or native
speaker has.
Several chapters in this volume discuss colloquial speech from different
perspectives such as phonetics, phonology, and morphology. Table 15.1 attempts to
present the differences between the written and spoken/colloquial forms of the Tehrani
dialect by providing examples for each category. As discussed before, the differences
between the written and spoken forms are mainly at the phonology and phonetics level.
However, there are also lexical differences between the written and spoken form where
certain words are not used in written form and vice versa. There are also a few syntactic
differences between the written and spoken forms. For instance, in spoken form, verbs of
motion (such as to go and to come) move from the sentence-final position to an earlier
position (usually before the goal/destination) and the preposition is also dropped (the
fourth category in Table 15.1). (p. 373)
Table 15.1 Differences between written and spoken forms of Persian (Tehrani dialect)
Category                     Written                          Spoken

Phonological:                be-rav-im!                       be-r-im!
Contraction                  sbjv-go-1pl                      sbjv-go-1pl
                             Let us go!                       Let’s go!

Phonetics:                   vaght                            vakht
Devoicing                    Time                             Time

Lexical:                     baleh                            āreh
Choice of word               Yes                              Yeah

Syntactic:                   mā be dāneshgāh mi-rav-im        mā mi-rim dāneshgāh
Word order change and        we to university cont-go-1pl     we cont-go-1pl university
preposition omission         We go to university              We go to university
Since heritage speakers usually do not have sufficient scholastic education in the heritage
language, they fail to perform in the written form of the language even if they have the
basic literacy skills and know how to read and write.
15.4.1.2 Polite/honorific form

Both spoken and written forms of Persian contain the formal¹/polite/honorific versus the
informal/familiar distinction. The formal/polite/honorific form is used when addressing
people who are older (such as grandparents), higher in rank (such as a teacher or boss), or
people you are not very close to (such as a clerk). The difference between the formal and
informal forms is shown in (1) and (2):
(1)
(2)
Interestingly, second-language learners seem to follow this honorific rule faster than
heritage speakers, or at least try to self-correct (Sedighi 2010). This could potentially be
due to the fact that second-language learners are more mindful of the cultural points of
the new language they are learning. However, heritage learners seem to have overlooked
this honorific (p. 374) rule since childhood and have not generally been corrected by
parents. Thus fossilization has occurred where the wrong form has been internalized and
is more difficult to correct at this stage.
15.4.1.3 Phonetics
In an interesting study, Godson (2004) examines the production of five Armenian vowels
in heritage speakers and reports that, of the five Armenian vowels, only the two that are
rounded in English but not in Armenian (o and u) were similar in heritage and native
speakers’ production. For the other three, heritage speakers’ production diverged from
that of native speakers in the direction of the English vowels.
Heritage speakers of Persian pronounce certain sounds differently from native speakers,
and this is mainly due to the interference of their dominant language, English. The two
typically problematic sounds are /r/ and /gh/. They tend to pronounce /r/ as a postalveolar
approximant, causing them to exhibit a marked English accent. Another source of accent is
the sound /gh/. Most heritage speakers, except those who are highly fluent,
struggle with the /gh/ sound and pronounce it as /g/, which is the closest familiar sound in
their dominant language, i.e. English. See examples (3) and (4).
(3)
(4)
Since the sound /gh/ does not exist in English, the easiest strategy is to use the closest
available English sound, the voiced velar stop /g/. Again, knowing that these are new
sounds, second-language learners make a conscious effort to capture them (which is not
always successful). On the other hand, heritage learners have been using the wrong
sounds in a fossilized form, and mainly without being corrected at home. So it is more
difficult for them to let go of the old habit and to use the correct sound.
Unlike Cagri et al. (2007), this research shows that /kh/, which also does not exist in the
dominant language (English), is much more easily captured by heritage learners than by
second-language learners. However, compared to /gh/, /kh/ is much easier to produce for
both heritage learners and second-language learners.
While the sounds in (3) and (4) may cause an accent for the heritage learners, they
almost always exhibit less accent than the second-language learners and that may be
because they have a stronger command of the stress and intonation system of Persian.
15.4.1.4 Phonology
Megerdoomian (2009, 2010) discusses the phonological alternations that occur in the
utterances of heritage speakers of Persian. Spoken Persian displays a number of
alternations that are not represented in the writing system (see Chapter 5 for a detailed
discussion of phonological processes of colloquial Persian). One of the most common
examples is ‘bilabial (p. 375) assimilation’, whereby the dental nasal phoneme /n/ is
pronounced as the bilabial nasal /m/ when it is followed by a bilabial phoneme. Hence,
certain Persian words that originally contained an /n/ will be pronounced with an /m/ sound,
such as shambeh instead of shanbeh (Saturday), zambur instead of zanbur (bee), and
dombāl instead of donbāl (after someone/something). Heritage speakers tend to follow
these colloquial alternation rules in speaking and writing. It should be noted that some of
the colloquial alternation rules, such as bilabial assimilation, are more often used by
uneducated native speakers, who, like heritage speakers, have not had much scholastic
language training (this is similar to the American pronunciation of ‘aks’ instead of ‘ask’).
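The bilabial assimilation rule described above can be stated compactly as a rewrite rule over romanized forms. The following is only a minimal illustrative sketch: the function name, the choice of /b/, /p/, and /m/ as the set of bilabials, and the use of the chapter's romanization as input are assumptions made for the example, not part of the studies cited.

```python
# Illustrative sketch only: models colloquial /n/ -> /m/ assimilation on
# romanized strings. The function name and the bilabial set are assumptions.
BILABIALS = {"b", "p", "m"}


def bilabial_assimilation(word: str) -> str:
    """Return the colloquial form: /n/ becomes /m/ before a bilabial."""
    out = []
    for i, ch in enumerate(word):
        # Look one segment ahead; the last segment has no following sound.
        nxt = word[i + 1] if i + 1 < len(word) else ""
        out.append("m" if ch == "n" and nxt in BILABIALS else ch)
    return "".join(out)


# The three written forms cited in the text.
for written in ("shanbeh", "zanbur", "donbāl"):
    print(written, "->", bilabial_assimilation(written))
```

Applied to the three written forms cited in the text, the sketch yields the colloquial forms shambeh, zambur, and dombāl. It treats each romanized letter as one segment, which is sufficient for these examples but would need refinement for a full transcription system.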
15.4.1.5 Orthography
Moore and Sadegholvad (2013) study the orthographical patterns of heritage speakers.
Among other factors, they argue that the phonetic and phonological inefficiencies of the
heritage speakers affect their orthography and provide examples such as:
(5)
Example (5) illustrates the orthographic substitution of /g/ for /gh/ by heritage learners.
While this kind of error does occur in second-language learners’ orthography, it is
not as widespread as it is for heritage learners, mainly due to the fossilization of familiar
words for heritage speakers.
15.4.1.6 Lexicon and code switching

Polinsky (2008) argues that heritage speakers have ‘serious problems with lexical access
and retrieval’ and she believes that this inability slows down their speech production
significantly. Polinsky argues that as this problem arises, heritage speakers, especially
adults, use different coping strategies such as ‘code switching’. Modarresi (2001) and
Sedighi (2010) report the same situation for heritage speakers of Persian. In fact the
issue of code switching occurs regularly for native speakers of Persian who have lived
abroad for a certain period of time. Therefore, the occurrence of code switching for
heritage speakers who have a different dominant language is not unexpected. The issue
of code switching in Persian takes a different form, especially because of the existence of
a large body of compound verbs (see Chapters 2, 3, 7, 8, 9, 10, 17, and 19 for a detailed
study of compound verbs). Some versatile verbs such as kardan (to do) and shodan (to
become) can take a large number of non-verbal elements to create new verbs. This is
shown in (6) and (7).
(6)
In (6) the verb kardan (to do) takes the English term ‘miss’ as its non-verbal element and
constitutes the equivalent of ‘to miss’ in English. (p. 376)
(7)
In (7) the English word ‘OK’ replaces the adjective khub (good) and is used frequently
among heritage speakers as well as native speakers living abroad.
In terms of the domain of their lexicon, while heritage speakers have acquired extensive
vocabulary, the range is limited to social and interactional domains at home and in the
neighbourhood. Therefore, they usually have difficulty talking about abstract topics,
literary terms, technical topics, and political terms.
15.4.1.7 Continuous past tense

Moore and Sadegholvad (2013) report that heritage speakers often use the simple past
where the continuous past would be appropriate:
(8)
In Example (8), the simple past has been used instead of the continuous past, causing an
ill-formed structure. They argue that these errors appear to represent transfer effects from
English. Similarly, second-language learners make the mistake of using simple past tense
instead of continuous past.
15.4.1.8 Present perfect tense

Sedighi (2010) reports that because of the phonological similarities of some tenses and
the fact that heritage speakers have not academically learned the grammar, they make no
distinctions between certain tenses such as present perfect and simple past:
(9)
(10)
The verb in (9) contains two separate vowels, ‘e’ and ‘a’, which for ease of pronunciation
and in fast speech are neutralized into one, so that only the second vowel is heard. The
difference between the simple past tense and the spoken form of the present perfect
tense is that in the former stress falls on the final syllable of the past stem (ráft-am),
whereas in the latter, stress (p. 377) falls on the personal ending (raft-ám). Native
speakers, because of their familiarity with the formal form of the language in which both
of these vowels are distinctly pronounced, can easily distinguish between (9) and (10)
while heritage speakers make no distinction in comprehending or producing these two
tenses (simple past and present perfect). Second-language learners, on the other hand,
make a conscious effort to use the present perfect as they have not been previously
exposed to the spoken form (which sounds similar to the simple past).
15.4.1.9 Future tense

Future tense in Persian is used mainly in the written form and only rarely in the spoken
form (except in certain sarcastic phrases such as khāhim did, ‘We shall see’). In spoken
form, the simple present tense is used to convey the future tense and adverbs of time can
be added to specify the exact time in the future. Since heritage speakers mainly lack
scholastic and literacy skills, they do not have a command of the future tense and always
resort to using the simple present tense plus an adverb. Second-language learners, on the
other hand, are very attentive to using the future tense. This can be due to the fact that in
English ‘will’ is always present in the future tense.
15.4.1.10 Relative clauses

Polinsky (2008) reports that heritage speakers have much shorter utterances than native
speakers and show a significantly lower number of embedded clauses. This is also true
for heritage Persian speakers. One point to consider is that the Persian language
produces much longer utterances than English (the dominant language), which can also be a
leading factor in the production of shorter utterances by heritage Persian speakers. Sedighi
(2010) reports that Persian heritage students tend to stay away from clauses with
resumptive pronouns and relative clauses, as in (11).
(11)
Instead they use a simplification strategy and simply say the following by pointing to the
man on TV.
(12)
15.4.1.11 Passive construction

Fani (2012, 2013) reports that heritage learners do not use the passive voice in their
utterances, which is again due to the rare usage of the passive voice in the spoken form
of the language. For more information on passive construction, refer to Chapters 3 and 7.
(p. 378)
15.4.1.12 Prepositions
Moore and Sadegholvad (2013) report problems with the preposition usage of heritage
speakers (see Chapter 10 for more on prepositions). They argue that these errors do not
always correspond to English usage and may instead indicate a limited command of an
unfamiliar system of prepositions:
(13)
In (13) the preposition barāye (for) has been used instead of the preposition be (to). In
section 15.4.1.1, it was indicated that some prepositions are dropped in spoken form.
Since heritage speakers mainly use the spoken form, it is not surprising that they are not
in command of the preposition usage, especially in more complex structures used in the
written form. Problems with the usage of prepositions are equally observed among
second-language learners.
15.4.1.13 Broken plural formation

In contrast to the general rules of plural marking (hā for both animates/inanimates and ān
for animates), there are some broken plurals that come from Arabic and are mainly used
in the written form of the language which is learned at school (see Chapters 3 and 10 for
more on plurals). Cagri et al. (2007) report that heritage speakers have usually not
acquired the Arabic broken plurals. The broken plurals follow a format different from that
of the usual plural formation rules (e.g. madreseh → madāres, school/schools). Since
heritage learners generally lack scholastic training, broken plural formation is another
area where heritage speakers exhibit non-native-like speech. Second-language learners
with a background in Arabic perform better in production of Arabic broken plurals.
15.4.2 Internalized areas for heritage speakers
In contrast with the previous section, this section focuses on areas where heritage
Persian speakers’ abilities and performance resemble those of a native speaker, while
second-language learners usually tend to struggle in such areas. According to the current
study, these areas are: Rate of speech, Word order, Overt subject, Verb agreement,
Psychological verbs, Generic nouns, Specific direct object marker, Ezafe construction,
Order of noun and adjective, Imperatives and vowel harmony, and Language chunks and
formulaic speech.
15.4.2.1 Rate of speech

Polinsky and Kagan (2007) report that heritage speakers have a rate of speech up to 30
per cent slower than native speakers. On the other hand, Kagan and Friedman (2004)
show that (p. 379) some heritage speakers come very close to the baseline rate. This is
true for heritage Persian speakers. Cagri et al. (2007) also report that heritage Persian
speakers perform faster than second-language learners across the board.
15.4.2.2 Word order

The default word order in Persian is subject—direct object—indirect object—verb.
Scrambling and topicalization are very common, so the Persian word order is not as strict
as in some other languages such as English (see Chapters 3, 7, 8, and 10 for more on
word order). Research on other languages (Sanchez 1983; Silva-Corvalán 1994; Halmari
1997) shows that heritage speakers may make changes to the expected word order of a
sentence. Montrul (2009) examines the knowledge of clitic pronouns and word order in
second-language learners and Spanish heritage speakers. Her results show that, overall,
heritage speakers seem to possess more native-like knowledge of Spanish word order
than their second-language-learner counterparts.
However, heritage Persian speakers seem to have a good command of Persian word order.
Second-language learners, on the other hand, struggle with the Persian word order. They
tend to transfer the English rule and place the verb after the subject instead of the
sentence-final position. Even advanced level second-language learners sometimes
struggle with the word order.
15.4.2.3 Overt subject

Persian is a pro-drop language, so a subject does not need to be overtly present in the
sentence as the subject is reflected through the verb inflection. This concept is highly
internalized for most of the heritage learners even with low levels of speaking proficiency.
They tend to have more native-like utterances with no overt subject. This is in contrast
with second-language learners, who tend to include an overt subject in their utterances
mainly due to transfer from English, where sentences need to have an overt subject. While
including an overt subject is by no means ungrammatical, it does set them apart from
native and heritage speakers.
15.4.2.4 Verb agreement

In Persian, verbs agree in person and number with the subject (see Chapter 3 for further
discussion on verb agreement). Heritage Persian speakers possess a very strong
command of verb agreement and most of the time produce the appropriate verb ending,
which appears to come to them naturally. While second-language learners are usually
aware of the verb agreement rule, they may forget the appropriate verbal ending or self-
correct.
15.4.2.5 Psychological verbs

Persian contains a certain class of verbs called psychological verbs (Sedighi 2005, 2009,
2011), in which the verb does not appear to agree with the subject. Rather it agrees with
the ‘psychological state’ that is addressed in that sentence. Similar constructions exist in
Italian (p. 380) (Belletti and Rizzi 1988). This is similar to the contrast between ‘I am
hungry’ and ‘Hunger is occurring to me’. Another example is shown in (14).
(14)
Here, the verb does not agree with the subject man (I) and appears in third-person
singular form. Putting the technicalities aside, Sedighi argues that the goal subject xosh-
am (the psychological state) is the element inducing agreement on the verb. By nature,
the psychological state is in third-person-singular form inducing third-person-singular/
default agreement on the verb. Therefore, the absence of verbal
agreement in such constructions is only apparent. Furthermore, Sedighi argues that the
optional experiencer in sentence-initial position (I) is a topic/high applicative argument
(see Chapters 3 and 8 for more discussion on psychological constructions).
Psychological constructions are highly internalized for heritage Persian speakers who
tend to display a large array of them in their utterances. This is mainly due to the fact
that these constructions are frequently used at home (expressing feelings such as hunger
and thirst), so heritage speakers have a high level of familiarity with them. Second-
language learners, on the other hand, struggle greatly with the agreement rule of
psychological verbs, as this concept is too unfamiliar for them.
15.4.2.6 Generic nouns

Whether countable or uncountable, generic nouns in Persian are always in singular form.
This is in contrast to English in which countable generic nouns are always in plural form:
(15)
Sedighi (2012) argues that generic nouns are highly internalized for heritage speakers
and they seldom transfer the English rule. Second-language learners, on the other hand,
are constantly challenged by this rule even at higher proficiency levels. Thus they
produce ill-formed sentences such as in (16).
(16)
15.4.2.7 Specific direct object marker

In Persian, specific direct objects require rā (see also Chapters 3, 7, 9, and 10):
(17)
(p. 381)
The usage of rā is highly internalized for heritage speakers and they exhibit native-like
performance with respect to its usage. Second-language learners are constantly
challenged by this rule and fail to use rā where needed (e.g. *Mina did-am instead of Mina
rā/ro did-am).
15.4.2.8 Ezafe construction

The Ezafe construction is a vowel /-e/ that connects any two or more related words in
Persian (see Chapters 3, 6, 7, 8, 9, 10, and 19 for more discussion on the Ezafe
construction):
(18)
Heritage speakers have a strong command of the Ezafe construction even at lower
proficiency levels. Since the concept of Ezafe does not exist in English, it creates a major
challenge for second-language learners, who tend to exhibit difficulties with the
production of the Ezafe construction even at higher levels of proficiency.
15.4.2.9 Order of noun and adjective

While Ezafe creates a challenge for second-language learners, its first indicated usage,
which is the order of a noun followed by an adjective, creates the highest level of
challenge for second-language learners. In Persian, adjectives follow the noun they
modify and are connected with Ezafe:
(19)
Heritage speakers follow this rule and seldom transfer the English order, which is the
opposite of Persian. Second-language learners, on the other hand, are challenged by
this order and haunted by the English rule of adjective + noun:
(20)
15.4.2.10 Imperatives and vowel harmony

Sedighi (2010) argues that heritage Persian speakers have a strong command of the
imperative form. This is mainly due to the fact that the imperatives are probably the first
and most frequent verb form that a child encounters from their parents. They are also
very (p. 382) comfortable with the vowel harmony involved in imperatives (see Chapters 5
and 13 for more on vowel harmony). Commands such as examples (21)–(23) are easily
produced by any heritage Persian speaker, even those with a very basic level of
proficiency.
(21)
(22)
(23)
Second-language learners acquire the rule for imperative formation but they constantly
struggle with the vowel harmony that is involved in the imperative form.
15.4.2.11 Language chunks and formulaic speech

Boon and Polinsky (2015) report that heritage speakers of most languages are quite
native-like in the usage of high frequency ‘fossilized forms’. It should be noted that the
term ‘fossilized’ here is used differently from when used to indicate acquisition of a
wrong form/pattern. Perhaps it is more accurate to use the term ‘language chunks’
instead of ‘fossilized forms’. Examples of language chunks (or formulaic speech) are high
frequency prepositions of location (tu khuneh, at home), and high frequency expressions
such as (dastet dard nakoneh, may you not be tired/thanks). Of course, due to the
newness of the concepts, second-language learners are not as comfortable producing
language chunks. That is why recent methods of curriculum design have started focusing
on teaching language chunks at a variety of levels such as lexis, grammar, and pragmatics
(see Chapter 14 for more discussion on curriculum design).
15.4.3 Other aspects
After studying the linguistic characteristics of heritage Persian speakers and comparing
them with the native speakers and second-language learners, we can now focus on other
aspects that are equally important and worthy of attention.
15.4.3.1 Socio-cultural norms

Sedighi (2010) argues that the way heritage learners carry themselves, their discourse,
gestures, facial and linguistic expressions, and even the way they sit in class are more
affected by their dominant language and culture than by their heritage culture. This issue
might create a gap between heritage speakers and other members of the Iranian
community (first generations) as well as those coming from Iran. It will also be a
differentiating factor when they visit Iran and are in contact with authentic interactions
in Iran. Sedighi (2010) states that sometimes, while heritage speakers are behaving
completely politely in their own minds, their mannerisms might come across as rude or
ignorant in the eyes of native speakers who (p. 383) are unfamiliar with the dominant
culture of heritage speakers. They may not completely follow the conventions of tārof (the
social etiquette) as they are mainly immersed in the dominant language’s cultural norms.
Such cultural mismatches can create a barrier to heritage speakers’ assimilation
with their heritage community and culture.
15.4.3.2 Language attitude

As studies on heritage learners of other languages have indicated, children do tend to
prefer to converse in the dominant language as soon as they start pre-school,
kindergarten, or primary school, where they are in daily contact with peers speaking the
dominant language (Modarresi 2001). Peer pressure plays a crucial role in their usage of
the dominant language, as they often feel that their heritage language is a barrier to their
‘fitting in’ with their peers and classmates. At an older age, the urge to learn and speak
their heritage language re-flourishes and that is when they feel discontent with the
Persian culture and their language abilities. They start to express a significant degree of
discontent with respect to their contact with the Persian language and mainly hold their
parents responsible for this issue.
15.4.3.3 Identity

Several studies have addressed heritage speakers and the issue of identity. These studies
have mainly focused on the cultural and sociopsychological struggles of heritage
speakers. For instance, Hornberger and Shuhan (2008: 5) state the following: ‘defining
heritage language learners requires far more than simply assessing their linguistic
abilities and determining the relationship between their dominant language and home
language.’ Heath (1983) and Trueba and Zou (1994), among others, have argued that the
heritage culture and ideologies of heritage speakers have to constantly compete with
their dominant culture and ideologies. Erickson and Schultz (1982) state that heritage
individuals must constantly choose, construct, and perform their social identities based
on the different groups of people they associate with.
As presented in the overview of literature section, the sociopolitical factors that heritage
Persian speakers deal with have been previously studied by several scholars, including
Mahdi (1998), Modarresi (2001), Shivarini (2004), and Ramezanzadeh (2010). Due to
political issues between Iran and western countries, the issue of identity for heritage
Persian speakers differs from that for other languages. How people view themselves with
respect to others and the world around them is a very deep and multifaceted concept.
Also, the way the host country views people of a certain culture or nation will add to the
complexity of identity issues of heritage learners of a language. Various factors, including
the social class of the majority of immigrants from a certain country, as well as the
political, economic, and social factors play a role in the way an immigrant child is raised.
Sedighi (2010) argues that due to the current political situation between Iran and the United
States, heritage speakers exhibit two distinct patterns. They are either extremely mindful
of showing that they have Iranian/Persian origin or they try not to show their ethnicity.
For instance, they tell people that their exotic look is due to Italian origin. Ramezanzadeh
(2010) accurately describes the situation of heritage Persian speakers as being
considered ‘othered’ politically, religiously, and ethnically. She accurately concludes that
students strategically align themselves with (p. 384) different aspects of their identity at
different times and spaces, depending on the audience and the effect they hope to
achieve. As previously mentioned, Hoffman’s (1988) research showed that Iranians were
viewed negatively by teachers, not because of a lack of academic ambition or poor
performance but due to the students’ lack of respect for the rules and regulations of the
school. They devised various modes of resistance to overcome what they perceived as the
school’s preoccupation with rules and regulations.
15.4.3.4 Motivation
Heritage learners of Persian express different motivations for learning Persian. The main
reason they report is literacy. It was noted earlier that, as a general rule, heritage
speakers tend to exhibit stronger oral skills (listening and speaking) than written skills
(reading and writing). Some also state that they wish to concentrate on better oral skills
hoping to communicate with family and extended family. Grandparents always stand out
among people they wish to connect to in Persian. Some also state that they want to learn
the language to learn more about their ethnic background, roots, and Persian culture. A
few reported other reasons such as credit for school, future plans to go to Iran, career
reasons, and fun. As we saw in the overview of literature, Miremadi’s (2014) research
indicated three main reasons for learning Persian: culture; geopolitical issues and
possible future relations between the United States and Iran; and interest in knowing their
parents’ mother tongue.
15.4.3.5 Community and parental attitudes

Most Persian heritage learners complain about the small amount of language training
they received from their parents. Most students regret the fact that their parents did not
continuously speak to them in Persian and if they did, did not request an answer back in
Persian. Sedighi (2010) argues that with the hectic lifestyle of the twenty-first century,
and the unique cultural, social, political, economic, linguistic, and educational challenges
that immigrants face, it is unfair to put too much blame on the parents and the amount of
time they spent speaking in their heritage language to their children. Yet, this crucial task
must not be forgotten and needs to be taken seriously. On the other hand, experience
shows that the usage of the dominant language between parents and children is at times
inevitable. As Namei (2008) observes, there are times when it is more practical and
straightforward for parents to discuss certain topics in the dominant language.
Sedighi (2010) also reports that heritage speakers whose parents are both Iranian tend to
have stronger Persian language skills than those with only one Iranian parent. Some can
speak Persian fluently in various social situations and use the language regularly in their
daily lives. This is mainly due to the fact that immigrant couples often tend to converse in
their native tongue with each other and that creates more opportunities for children to
overhear and perhaps utilize the Persian language. Another interesting observation by
Sedighi (2010) is that typically, among students who have only one Persian parent, those
with a Persian mother have better Persian language skills than those with a Persian
father. This could be due to the fact that mothers usually tend to spend more time with
their children, especially during the early years of plasticity when language acquisition is
taking place in a native-like manner. This concept goes hand in hand with Modarresi’s
(2001) observation (p. 385) on the role of Iranian women in the language maintenance of
their children that was already discussed in the overview of literature.
One of the issues that may have a negative effect on children’s Persian language learning
is the issue of divorce. Sedighi (2010) mentions reports from several community Persian
weekend schools that the language education of some children has stopped after the
divorce of the parents partly because of lack of time and mainly because only one parent
was Iranian. It is, however, true that in a few cases the non-Persian parent puts more time
and effort into providing the opportunity for the child to learn the language she or he has
not had the chance to learn in an effort to connect the child to his/her roots from the
other culture.
The attitude of the extended family, friends, relatives, and the Persian community towards
the Persian language skills of heritage speakers is crucial. Sedighi (2010) argues that it is
very common for a heritage speaker to exhibit a certain level of accent (lahjeh). Although
this accent may sound cute to native speakers, acknowledging how heritage speakers
stand out from native speakers may act as a negative factor in their language-learning
process, especially through the sensitive period of adolescence.
15.5 Challenges and further research

Boon and Polinsky (2015), among others, report that one of the biggest challenges
encountered with heritage language learners is the initial assessment of their abilities, as
their strength in the oral form of the language may cause them to be placed in a higher
proficiency level class than is appropriate to their actual language level.
The second challenge is universities’ budget limitations, which usually mean that heritage
speakers are in the same class as second-language learners. This budget limitation
causes many challenges, as the backgrounds and needs of the two groups of students
differ in many ways. Sedighi (2010) reports that heritage learners usually overestimate
their linguistic knowledge, i.e. they think they know more than they do. In a mixed class
with both heritage and non-heritage students, heritage learners start with a higher
proficiency than second-language learners but end up with almost the same or even lower
grades. This is mainly due to the fact that the curriculum is designed for foreign-language
learners.
In those cases where there are specific classes for heritage speakers, the same challenge
remains, as students come to class with different backgrounds with respect to their
Persian language proficiency. This is due to the fact that some use Persian regularly at
home; some only speak it minimally with grandparents and relatives; some are only
exposed to Iranian television; some explore Persian websites on the internet; and so on.
As mentioned in the overview of literature, following Tomlinson (2000), Hamedani
proposes a differentiated framework of teaching for mixed classes. Tomlinson states: ‘In
differentiated classrooms, teachers provide specific ways for each individual to learn as
deeply as possible and as quickly as possible, without assuming one student’s roadmap
for learning is identical to anyone else’ (Tomlinson 2000: 2).
Another challenge is the lack of heritage language usage by students outside the
classroom (Mahdi 1998; Modarresi 2001; Sedighi 2010; Megerdoomian 2012b). Thus
teachers need to create a need for students to use Persian outside of the classroom.
Sedighi (2010) (p. 386) argues that community-based activities, such as assigning
students to spend time at nursing homes where Persian people live or other similar
activities, are helpful. Teachers need to create an environment in which the cultural
needs, attitudes, behaviours, and identity issues of heritage Persian students can be
addressed and fostered both inside and outside the classroom setting.
A challenge which has already been discussed in section 15.3.4 is the lack of proper
background and training among heritage language instructors, some of whom have only
been hired by virtue of being native speakers. The lack of suitable textbooks and
instructional material is yet another challenge. Fani (2012) argues that curricula designed
to teach Persian as a second language are not age-appropriate and do not correspond to
the level of linguistic exposure and proficiency of Persian heritage students. Class
material for heritage students should be specifically designed according to the needs of
students. Romero (2000: 153) argues that ‘the students themselves should serve as the
point of departure for pedagogical structuring’. Kagan and Dillon (2001, 2008, 2009)
propose that macro approaches to heritage language teaching that take into account
heritage language learners’ global knowledge of the heritage language are particularly
effective; such macro approaches are often characterized as discourse-based, content-
based, genre-based, task-based, or project-based (see Chapter 16 for more discussion on
these topics). The proposed pedagogical paradigm by Megerdoomian (2010), where
instruction is through cutting-edge linguistic analysis rather than grammar rules, has
some merit, since students can tap into their intuition and discover linguistic
generalizations.
Heritage language teaching and especially Persian as a heritage language is a young and
emerging field that requires much more attention, research, and exploration. The current
needs and future areas for research can be summarized as follows.
1. More systematic research on linguistic and cultural profiles, attitudes, behaviours,
beliefs, and motivations of heritage speakers of Persian;
2. Study focused on developing pedagogical strategies for heritage language
teaching and learning;
3. More level-appropriate and age-appropriate instructional material;
4. Appropriate placement tests and assessment tools;
5. Raising awareness in educational institutions on the importance of heritage
Persian instruction;
6. Raising awareness in families/communities to foster the heritage language;
7. Raising awareness at the government level to invest in and promote heritage
languages as a great national resource.
15.6 Summary
This chapter examined the concept of Persian as a heritage language from a wide variety
of perspectives. As an original contribution, this chapter provided a comprehensive
overview of the literature on Persian as a heritage language that unifies the existing
research in a cohesive way and can be used as a resource for future scholars. After
presenting the existing (p. 387) literature, the article studied various characteristics of
heritage Persian speakers in terms of their linguistic and metalinguistic abilities by
examining their language style, phonetics, phonology, orthography, lexicon, syntax,
sociocultural norms, language attitude, identity, motivation, and community and parental
attitudes, and compared their performance with that of a native speaker and a second-
language learner. The chapter also discussed the existing pedagogical and budgetary
challenges and provided recommendations for future research and policy making for
Persian as a heritage language.
Notes:
(1) Note that some linguists tend to use the terms ‘Formal’ and ‘Informal’ for what here
are called ‘Written’ and ‘Spoken’ forms. The former terminology fails to capture the true
nature of this dichotomy as both written and spoken language can have the formal/
honorific and informal/non-honorific way of addressing others.
Anousha Sedighi

Teaching Persian to Speakers of Other Languages

Pouneh Shabani-Jadidi and Anousha Sedighi

Subject: Linguistics, Translation and Interpretation, Languages by Region
Teaching and learning Persian as an additional language is facing unprecedented demand
inside and outside the Persian-speaking countries. This chapter discusses teaching Persian
to speakers of other languages from a variety of perspectives. It provides a short account
of the history of teaching Persian in non-Persian-speaking countries and discusses the
current status of teaching Persian to speakers of other languages both in the east and the
west. The chapter also discusses second-language acquisition studies on Persian as well
as the recent trends in pedagogy and assessment. It also investigates the development of
the Persian instructional material and discusses the available Persian textbooks. Lastly,
the current issues and challenges of teaching Persian to speakers of other languages are
discussed and explored.
Keywords: Persian, teaching, pedagogy, Persian as a second/foreign language, second-language acquisition,
instruction, textbook, curriculum, assessment, additional language
16.1 Introduction
THIS chapter discusses teaching Persian to speakers of other languages from a variety of
perspectives. The format of this chapter is as follows. Section 16.2 provides a short
account of the history of teaching Persian as a second language.1 Section 16.3 discusses
the current status of teaching Persian as a second language both in the east and the west.
Section 16.4 discusses second-language acquisition studies on Persian and Section 16.5
focuses on recent trends in pedagogy and assessment. Section 16.6 investigates the
development of the instructional material in general and the available Persian language
textbooks. Section 16.7 provides a discussion of current issues and challenges of
teaching Persian to speakers of other languages, and Section 16.8 summarizes the
chapter.
16.2 History of teaching Persian to speakers of other languages
The informal instruction of the Persian language perhaps dates back as far as the
language itself, and this topic could be the theme of an entire book. This section
provides a (p. 389) diachronic survey of the history of Persian language instruction beginning
with the Arab conquest of Iran.2 There have been many attempts to document the
grammar of the Persian language for various geopolitical and religious reasons.3 The first
Persian language resource for foreigners was in the form of dictionaries. Among the
earliest dictionaries, one can mention Luqat Al-Fārs by Asadi Tusi, written around 1050
AD. One of the earlier grammar books for foreigners is Mantiq al-Xurs fi Lisān al-Furs
(The Eloquence of the Mute with Regard to the Persian Language) by Abu Hayān Nahvi
(1256–1344 AD). Homā’i (1959) reports at least one other Arabic resource, which has the
description of four languages: Arabic, Mongolian, Turkish, and Persian and was written
during the Ilkhanid era at the turn of the thirteenth century AD by Ibn al-Muhanā.
Fast forwarding to studies done by Europeans, who were mainly interested in Persian due
to ‘missionary and economic, and later for literary and linguistic reasons’ (Windfuhr
1979: 12), one can mention the grammar book by Raymundus (1601), which was
rewritten by Ilaminio Clementino Amerinio in 1614. Another grammar book by de Dieu
(1639) was published in Leiden. Windfuhr (1979) argues that de Dieu offers one of the
first systematic works, which describes items such as irregular verbs and offers a list of
verbs that are frequently used in compounds. Skipping several other grammar books
written in Rome and Vienna, the first real language textbook for missionaries,
language teachers, and merchants was written by Joseph Labrosse in 1684 in Amsterdam.
Labrosse wrote his revolutionary book, which is based on the living language, after
having lived in Esfahan for fourteen years. In England, John Greaves, who was a
professor of Astronomy and a Persianist, published the first grammar of Persian in 1649.
Once the West became interested in the Indian riches, they paid less attention to the
study of the Persian language. The positive side of that was that the Indian scholars
focused their attention on the systematic study of the Persian language, and many inspiring
and widely known manuscripts were produced during this era. Famous works such as
Tabrizi’s (1651) Borhān-e Qāte’ and Tattavi’s (1654) Farhang-e Rashidi were written
during this time. Borhān-e Qāte’ was written by an Iranian named Khalaf Tabrizi who had
migrated to India. Farhang-e Rashidi, yet another widely popular dictionary, was
published by the Asiatic Society of Calcutta.
After the sharp rise of Persian studies in India, the English were inspired to write Persian
textbooks again. Another work that can be considered a ‘textbook’ was written by Jones
(1771) in London. Similar to modern textbooks that define standards for skills that can be
achieved by the time of completion of the book, Jones asserts that with the aid of an
instructor, the learner should be able to converse and to translate a letter in less than a
year. Windfuhr (1979: 15) highlights several interesting discussions from Jones and the
fact that he recognizes
the generic function of the simple Nouns, vs. the particular or limited Nouns
marked by i and rā, the oblique function of the personal suffixes for the genitive,
dative, and accusative, and the differences between the emphatic and reflexive
xod.
One cannot help being impressed by such findings that are still under (p. 390)
investigation, and it is not surprising that this book was in high demand and republished
many times due to its practical nature. In 1826, an Iranian teacher named Mirza
Mohammad Ibrahim went to England and taught Persian there for eighteen years. The
next substantial Persian textbook was written by Ibrahim in 1841. Although the book is
entitled Grammar of the Persian Language, it in fact offers a very useful practical guide
with a series of dialogues that utilize the specific grammatical points being discussed.
The book was received with success and was translated into German and was re-arranged
by Fleischer (1847, 1875).
One of the oldest and most widely cited Persian textbooks, Tazkarat Qāvāed Farsi, was written
around this era in Turkey by Ismail Samadi (1838). Windfuhr reports that the early works
of Russians recording the Persian language date back to the fifteenth century. However, it
was during the eighteenth century that Russians began to work systematically on Persian.
Leaving aside works such as glossaries and pure grammar books, the first Russian
textbook of Persian was written by Berezin (1853) with the help of a Persian assistant
Mirzā Kāzem Beyk. Subsequently Mirza-Dzafar, a Persian teacher at the Lazarevskij
Institute of Oriental Languages, wrote two Persian textbooks in Russian. The first was a
conversational textbook (1833) and the second was a grammar book (1834). This was
followed by yet another great piece of work written by Salemann and Zhukovskij, who had
spent a considerable amount of time in Iran doing fieldwork. The importance of their work
is that in addition to classical Persian, it includes contemporary pronunciations and
idiomatic expressions used in everyday life. This masterpiece was written in Persian
script and included Russian transcription. The renowned scholar Phillott (1919)
published a substantial Persian grammar guide, which discusses Eastern dialects of
Persian as well as colloquial Persian. Another textbook that focused on the spoken form of
Persian, written around this time in Calcutta, is by Abdulhaq Nasrābādi (Tabsarat-ol
Atfāl, or ‘Teaching Spoken Persian’).
In North America, around 1950, there was a wave of interest in writing Persian language
textbooks. Dresden (1958) wrote a very interesting Persian reader (with the assistance of
Musavi from the University of Pennsylvania) for the United States Air Force Institute of
Technology. The book was a selection of written materials in modern Persian with English
translation, typewritten Persian, handwritten Persian, and transcriptions. Among other
books, one can mention the books by Hodge (1960) and Spoken Persian by Obolensky et
al. (1963). The material was taught in the ‘build-up’ fashion, which lists the words first,
followed by complete sentences as part of a dialogue, and various drills, as well as
laboratory use for listening to audio files. Both books focused on the contemporary usage
of the language and colloquial Persian. Elementary Lessons in Persian by the renowned
scholar of Persian linguistics and pedagogy, Jazayery, was published in 1961. Although
mentioning the entire list of Persian material produced around the world is beyond the
scope of this article, the discussion of Persian textbooks continues in section
16.6.3.
While discussing Persian instruction in North America, it is worth noting that the Peace
Corps has had an instrumental role in producing Persian language experts prior to the
Islamic revolution. According to Marashi, most Peace Corps volunteers who spent time in
Iran returned to the US and completed graduate work before going on to assume high-
ranking positions in the US government and/or at prestigious universities.4 Marashi
argues (p. 391) that the Peace Corps programme opened the door to new methodologies,
as it designed a non-traditional curriculum that integrated basic language skills with
communication strategies and cultural understanding.
16.3 Current status of teaching Persian to speakers of other languages
Teaching Persian to speakers of other languages is a growing field in Iran and other
Persian-speaking countries. In Iran, many universities such as the University of Tehran,
Shahid Beheshti University, Ferdowsi University, University of Shiraz, etc. offer
undergraduate and graduate degrees in this field, and thousands of international
students go to Iran annually to learn Persian. Additionally, prestigious Persian language
teaching organizations, such as Dehkhoda Institute and Sa’adi Foundation, offer various
kinds of Persian language classes to international students.
This section, however, focuses on the current status of teaching Persian in non-Persian-
speaking countries and sheds light on its breadth and popularity around the world.
Persian language is taught in various non-Persian-speaking countries at the university
level (both undergraduate and graduate level). While providing a comprehensive list of
the countries that offer Persian programmes is beyond the scope of this article, some
examples from the East and the West are provided below.
16.3.1 The East
Persian language teaching at the university level has a century-old history in India,
dating from the nineteenth century, when Persian was still the official language of the
country and was used to record historical accounts of India. Today, Persian
language programmes at the university level are growing rapidly both quantitatively and
qualitatively in India. For example, the University of Mumbai has recently included
modern Persian literature in their curriculum. In order to bridge the core subjects with
the Persian language, the university encourages students to write two papers in Persian.
It is worth mentioning that most of the Persian collections in the British Library are of
Indian origin (Ursula Sims-Williams 1981) and the same goes for the Persian manuscripts
in France’s Bibliothèque Nationale (Richard 1986).
In China, Persian has been taught in Muslim schools since the 1920s. In 1954 the
government of the People’s Republic of China removed Persian from the curriculum of
Tajik schools in Xinjiang (Sinkiang) Uighur Autonomous Region (Sotudeh 1988).
However, Persian was reintroduced in 1957 into the modern Chinese educational system.
Currently several universities, such as Beijing University and Luoyang University of
Foreign Languages, offer Persian programmes. Their Persian curriculum includes
language study and the translation of literary, historical, geographical, and media texts. Since 1986,
the Beijing centre has been offering a Persian programme at MA level.
(p. 392) As for Japan, Nakanishi (1987: 131) argues
that one of the most striking characteristics of Iranian studies in Japan is
government initiative. The government has long taken the initiative in establishing
institutions and programs for such study. Particularly in the 1930s, the motivation
for studying Iran was partly political and strategic. Since the late 1960s, however,
there has been a more economic orientation.
Persian studies have been rapidly growing in Japan within the last half-century. Nakanishi
(1987) argues that this is because Japanese scholars have begun to overcome their
dependence on Western interpretations and to explore their own approaches. Tokyo
University and Osaka University established Persian programmes as early as the 1910s
and later Kyoto University started studies in Iranian history. The Japan Association for
Oriental Studies, established in 1965, is yet another centre for Persian studies.
Because of its close proximity to Iran, Russia is discussed in this section. Until the 1990s,
only three universities in Russia had a Persian language programme. Since the fall of the
Soviet Union, this has risen to at least ten Russian universities offering a Persian programme, and the
number of interested students is rising even more. Moscow State University has been
holding a Persian language Olympiad for more than a decade. Other universities that
offer Persian include Saint Petersburg State University, University of Bashkortostan, and
Saratov State University.
Among other countries that offer Persian programmes at the university level are Korea,
Armenia, Indonesia, Bangladesh, Turkey, Pakistan, and Georgia.
16.3.2 The West
16.3.2.1 Europe
Persian language began to be studied in Europe only in the seventeenth century, which
coincided with the publication of three major Persian grammar books, namely de Dieu
(published in Leiden, 1639), Greaves (published in London, 1649), and Ignatius of Jesus
(published in Rome, 1661). Today, Persian is taught at many universities including the
University of Oxford, Cambridge University, University of St Andrews, University of
Edinburgh, and SOAS in London. Persian used to be taught at Manchester as well until
very recently, but due to budget cuts, their Persian language programme was cancelled.
At the Universities of Oxford and Cambridge, students of Persian language spend one
year out of the four years of their undergraduate studies in Iran, studying at centres like
Dehkhoda language school, which is affiliated with the University of Tehran.
Hourcade (1987: 2) states:
in order to mark the identity of the Iranian civilization in contrast with the Semitic
or Turkish ones, emphasis has often, in France, been placed upon the Aryan
character of the Iranian peoples. It was not by chance that Montesquieu chose a
Persian instead of a Turk or an Indian to symbolize the independent foreigner.
In France, Persian studies started with the foundation of the École des Jeunes de
Langues, which was originally established in Turkey. The teaching of Persian was
introduced after its merger with the Collège Louis-le-Grand in Paris in 1763. During the late
eighteenth century, Persian started to be taught at the École Spéciale des Langues
Orientales. In the (p. 393) twentieth century, several other institutions began to teach the
Persian language, such as CNRS and Collège de France, where extensive studies have
been done on different aspects of Persian studies. In 1995, all the higher education
institutions teaching Persian and Iranian studies in Paris (i.e. CNRS, Sorbonne Nouvelle,
Institut National des Langues et Civilisations Orientales, EPHE) merged into a single
research team called ‘Monde Iranien’.
De Bruijn (1987: 173) states:
The most noticeable aspect of the Dutch contribution to Persian studies is the
early date of its beginning. Preceding most other nations of Europe, Dutch
scholars of the seventeenth century introduced the Persian language as an
academic subject and produced the tools necessary for studying it. Their
intellectual activity was contemporaneous with the flourishing of trade with Iran
and other countries where Persian was currently used.
Leiden University has the biggest Persian studies programme in the Netherlands with a
long history of focusing on this subject. In addition, Leiden University Press, one of the
leading European publishing houses, has a vibrant series on Iranian Studies.
In Germany, the discipline of Iranistik (Iranistic) is a long-time topic of study, which is the
exclusive domain of linguistics and philology. Fragner and Matthee (1987: 78) argue that
German-language scholarship on Iran often seems fragmented and lacking in autonomous
critical discourse, due to the fact that in the beginning of the nineteenth century, Oriental
scholarship was considered frivolous and lacking substance. Thus, scholars who engaged
in Iranian studies mainly operated on the margin of both academic and political and
social life. The strength of Iranian studies, under these conditions, depended entirely on
the individual scholars such as Ehlers, Roemer, and Spuler, who were able to establish a
‘school’ of their own. Currently, Persian studies is present in many German universities,
such as the Free University of Berlin, Georg-August-Universität Göttingen, University of
Hamburg, University of Bamberg, University of Marburg, etc.
Persian studies in Italy goes back to the second half of the sixteenth century, when there
developed a scholarly approach based on a recognition of Persian linguistic and literary
evidence. Raimondi (1536–1614), the great Italian Orientalist, defined the Persian
language as ‘the most beautiful in the world, divinely endowed with the spirit of
expression of concepts in poetry’ (Piemontese 1987: 101). Lockhart et al. (1973) report
that several works, including the travel accounts of Barbaro and Contarini, available with
ample commentaries, inflamed the passion for Persian studies. Today, Persian studies is
present at many Italian universities, including Sapienza University, University of Bologna,
University of Naples, and University of Venice.
Among other European countries that offer Persian programmes are Spain, Romania,
Bosnia, Poland, and Hungary. In the next section, we will focus on the history and current
status of Persian language programmes in North America.
16.3.2.2 North America

Persian language programmes offered at the university level are numerous in North
America, mainly in the US. In addition to universities and community colleges, other
organizations are involved: the Defense Language Institute, for example, has been teaching
Persian for many decades and currently offers a full immersion programme. Among the
oldest Persian (p. 394) programmes in the US are those at Columbia University,
University of Michigan, University of Pennsylvania, University of Chicago, and University
of Texas at Austin. Persian has been known as a strategic language in the US and
currently more than twenty universities in the US offer various kinds of Persian
programmes. The University of Maryland currently offers the only flagship programme
within the entire US. Many universities offer intensive summer schools, some of which
are in the immersion format. Some universities offer a year abroad during the second or
third year, during which students majoring in Persian attend university courses in Persian
in a Persian-speaking country (usually Tajikistan). The American Association of Teachers
of Persian, established in the 1980s, promotes the study and teaching of the Persian
language and culture within the US. During the last decade, several major Persian
programmes have been established through the allocation of external funding and
endowments from major Iranian–American organizations, which has led to the growth of
some programmes. Due to the fact that these programmes are focused on different core
goals, currently Persian programmes do not follow fixed guidelines or standards. This
causes students attending Persian summer schools to have uneven background levels,
which makes instruction rather tricky. Another source of uneven backgrounds
is students with Persian heritage (discussed in detail in Chapter 15).
The field of Iranian studies has rapidly flourished in Canada within the past half-century.
The two major Persian programmes are offered by McGill University and University of
Toronto. The McGill Persian language programme is housed at the Institute of Islamic
Studies, which was established in 1952. The University of Toronto established their
Iranian Studies centre during the 1970s and it currently houses the Toronto Initiative for
Iranian Studies. Other Canadian universities that offer a Persian programme are York
University, Simon Fraser University, and University of British Columbia. Despite the
extensive and long history of Persian language programmes in Canada, they are not
usually supported by external funding and endowments from major organizations, like
those in the United States. It is hoped that in the future, Iranian community organizations
could also contribute to the promotion of Persian language programmes in Canada.
In the next section, we will discuss the very few existing second-language studies
focusing on Persian.
16.4 Second-language acquisition studies on Persian
Despite the abundance of second-language acquisition studies on more commonly taught
languages such as English and French, studies on less-commonly taught languages, like
Persian, are limited. A simple search reveals the scarcity of such studies, in contrast to the
great number of studies on Persian speakers learning English as a second language.
Most of the existing work focuses on theories of how to teach a particular structure in
Persian. As such, these studies mainly test whether existing theories developed for more
widely studied languages like English apply to the teaching of Persian. The remainder of this section
provides an overview of literature on this topic, dividing it into two subsections of studies
on native speakers of English and studies on native speakers of other languages. These
studies (p. 395) are very important in our understanding of how Persian is acquired by
speakers of different languages. They are also crucial for Persian language textbook
writers who can benefit from studies on second-language acquisition in different settings
such that they can tailor their textbooks according to the findings of such studies.
16.4.1 Acquisition of Persian by native speakers of English
Most of the existing research on second-language acquisition of Persian has been
conducted on L1 speakers of English. In a syntax-oriented study, Tarallo and Myhill
(1983) provide a study of English speakers’ acquisition of relative clauses in several
languages, including Chinese, Japanese, Persian, German, and Portuguese. They test
various structures to separate interlanguage features attributable to first-language
interference from those universal to second-language acquisition. They study certain
features of Persian, such as word order and the relativizer ke, which may be
deleted under certain conditions. Persian indirect objects are marked with a preposition,
as are possessives, and extraction of nouns with a preposition requires leaving a
resumptive pronoun. They also discuss the relativization of direct objects, which may
optionally leave a resumptive pronoun, while relativization of subjects may not. They test
several structures for Persian, including leaving a resumptive pronoun (correct except in
subject position), stranding a preposition (always incorrect), and moving the preposition
in front of the relativizer (also always incorrect). First-language interference has been
attested in different languages and at different levels.
At the phonetic and phonological level, Ghadessy’s (1998) PhD dissertation is about
American learners of Persian as a second language and their problems in the acquisition
of the Persian sound system, particularly stops. It is argued that the identified
phonological problems can stem from first-language interference, certain aspects of the
target language, learner strategies, methodology, and in some cases, inadequate
instruction. However, these sources of errors are the contributing factors in any aspect of
second-language learning, and not only specific to the acquisition of phonemes.
In terms of studying language skills, two studies have been conducted. The first by Abasi
(2012) focuses on writing skills and the second one by Alizadeh et al. (2016) focuses on
reading skills. Abasi (2012) studies the summary writing of American learners of Persian
through a cultural rhetorical lens in line with research on cross-cultural differences in
writing. He explores learners’ perceptions of the rhetorical structure of two texts, an
original Persian one and the other a Persian translation of an English text, while
summarizing them. The participants were all advanced students in a Persian Media class.
All participants found it easier to comprehend and summarize the second text, as it was
written in English first, and it had a structural organization that students were familiar
with. They found the original Persian text disorganized, indirect and vague; therefore, it
took them longer to read the text. When it came to the summarizing task, it took them
longer, they used more conjunctions and they made more mistakes. Abasi argues that the
difference in the performance of students in the two tasks was due to their
unfamiliarity with the cultural cues, and suggests that there should be more studies on the
cultural aspects of texts in order to shed light on the matter.
Alizadeh et al.’s (2016) study focuses on the reading skill of the L2 speaker of Persian.
They investigate the dominant syntactic structures of Persian journalistic texts and their
impact (p. 396) on the reading skill of L2 learners of Persian. Their results indicated that
second-language learners of Persian have difficulty in understanding sentences extracted
from Persian newspapers due to various reasons, including the high frequency of complex
sentences, the scrambling nature of the language (where the word order is frequently
changed), the high frequency of passive constructions, and ellipsis (i.e. omission of
different parts of sentences).
In terms of language-learning strategies, one can mention Mokhtari’s (2007) dissertation,
where she studies the language-learning strategies of 166 students of Persian, and
provides empirical description of the language learning beliefs and strategies of students.
The data were collected from three American universities. Using three sets of
questionnaires (the Individual Background Questionnaire, the Beliefs about Language
Learning Inventory, and the Strategy Inventory for Language Learning), Mokhtari’s
results show that the participants reported holding strong beliefs in ‘motivation and
expectation’ and ‘foreign language aptitude’. The descriptive analyses showed that
participants reported using compensation and social strategies most, followed by
cognitive, metacognitive, memory, and affective strategies. It is interesting to know that
these strategies can be taught explicitly to second-language learners to make them better
equipped and more autonomous learners (Flavell 1979; Livingston 1996; Anderson 2002;
to name a few).
A sociolinguistically oriented study by Vakilifard and Khaleqizadeh (2012) investigates the
effect of gender on the use of learning strategies in second-language learners of Persian.
The results show that although there is not a significant difference in the number of
learning strategies used by male and female subjects, the two groups prioritize their
strategies differently; however, this difference in the selection of strategies is not
statistically significant, either. They point out that the order of these strategies in women
is:
1) metacognitive;
2) social;
3) cognitive;
4) compensation;
5) affective;
6) memory
while for men, the order is:
1) metacognitive;
2) social;
3) compensation;
4) cognitive;
5) memory;
6) affective
(Vakilifard and Khaleqizadeh 2012: 52)
Of course, this is only a selected set of strategies, and the authors could have tested
many more. In general, there are three main categories of learning
strategies: 1) metacognitive; 2) cognitive; and 3) social affective; and in turn, each of
these categories includes several subcategories (for a complete list and detailed
information about these subcategories, see O’Malley et al. 1985; O’Malley and Chamot
1990).
In a lexico-semantically oriented study, Raqibdoust and Jamshidi (2012) investigate
implicit versus explicit teaching of semantic fields of verbs in advanced second-language
learners of Persian, showing that explicit teaching of semantic fields of verbs helps
language learners retain the meanings of verbs. What Raqibdoust and Jamshidi did in
their explicit instruction was make the second-language learners in the experimental
group aware of the semantic fields of the verb under investigation by consciousness-
raising through the presentation of the materials, for example using colour, italics,
underlining, and other techniques to make the relevant information stand out. The
definitions of the verbs were given in a bubble in the margin. They concluded that these
definitions were retained due to the overt presentation of the materials. The effectiveness
of enhancing second-language learners’ knowledge of the semantic fields
of verbs, and more generally of the lexicon, is supported by the results of
experimental studies on the processing of words in first and second language (for a
detailed study of this topic, see Shabani-Jadidi 2014, 2016). The processing of Persian
words in first and second language is discussed in detail in Chapter 17.
16.4.2 Acquisition of Persian by native speakers of other languages
While most existing work on the second-language acquisition of Persian focuses on
the acquisition of Persian by native speakers of English, a few works have turned their
attention to examining the acquisition of Persian by native speakers of other languages,
including Arabic, Korean, and Mandarin.
Aghagolzadeh-Silakhori (2012) investigates the acquisition of the Persian subjunctive mood
by Arabic-speaking participants with intermediate proficiency in Persian. The subjunctive
mood was taught to the experimental group using mental spaces theory and to the
control group without using this theory. The results showed that those who learned the
subjunctive mood through the concepts of mental spaces and space-builders, together with the
probability of occurrence (as the central meaning of different moods), performed better
in recognizing and applying the appropriate mood in various sentences. The
control group was better able to produce sentences whose verbs expressed an
impossible action. The author argues that by using this theory, the teacher can show the
regularity of mood choice in various compound sentences in Persian and
present a documented justification for each choice; thus, learners can recognize mood in
Persian with a lower probability of error.
Hwang and Kwak (2016) examine Persian non-high vowel perception by Korean L2
learners of Persian. They compare vowel perception in the two languages and show that
Persian language textbooks in Korea teach these vowels inaccurately. They call for
more accurate teaching of Persian vowels, especially those that do not correspond to
their Korean counterparts. They also discuss the fact that some Persian consonants do not
exist in Korean, which causes problems in transliteration. The authors raise
important topics in the L2 teaching of Persian, such as the production and perception of
sounds, as well as the transliteration problems that arise when there is no corresponding sound
in the L1.
Another study that investigates the acquisition of the sound system, in particular the stress
pattern of sentences, is that of Sadeqi and Mansoory-Harehdasht (2016). They examine
the phonetic characteristics of Persian sentence stress produced by Mandarin Chinese
speakers. They compared the phonetic correlates of fundamental frequency,
vowel duration, and vowel intensity between Persian native speakers and Mandarin
Chinese learners of Persian. Their results demonstrated that stressed words in Persian
sentences were produced with significantly higher fundamental frequency and shorter
vowel duration by Mandarin Chinese learners of Persian than by native speakers of Persian.
They attributed this difference to the prosodic interference of Chinese in the production
of sentence stress by Chinese learners of Persian. First-language interference is not
limited to Persian L2 acquisition and has been attested in second-language studies of
different languages (e.g. Schwartz and Sprouse 1994; Whong-Barr 2006).
16.5 Pedagogy and assessment

This section discusses pedagogy and assessment with a focus on the Persian language (see
Chapter 15 for more on pedagogy). The conceptualization and organization of language
teaching is a three-tier paradigm, encompassing techniques, method, and approach.
Techniques carry out a method that is consistent with an approach. In such a paradigm,
an approach is a set of assumptions that deal with language teaching and learning; a
method is a sequential manner of presenting the language material consistent with the
selected approach, and techniques are the implementation of the selected method and
approach in the classroom (Anthony 1963). Language teaching methodologies have
undergone quite a number of changes based on the prevailing linguistic approaches of
the time, which were in turn influenced by their contemporary prominent theories in
psychology. Providing a detailed discussion of the classical approaches and
methodologies of second-language teaching is beyond the scope of this article.5 However,
the next section briefly discusses recent trends in pedagogy, as some of these trends
are currently employed in the instruction of Persian to speakers of other languages.
16.5.1 Modern trends on pedagogy and assessment
The focus of language teaching turned from the method of teaching to the outcome of learning after
the emergence of the communicative approach, which has been the origin of modern
trends in language teaching, such as Content-Based Language Instruction, Task-Based
Language Teaching, and Competency-Based Instruction. By the end of the 1990s,
language educators and applied linguists came to the conclusion that no single method is the
best method, hence the emergence of the ‘post-methods era’. However, familiarity with
the traditional, classical, and branded approaches and methods would equip teachers
with different tools, activities, and tasks, as well as a broader understanding of how they
would work and how they would affect language learning. As Richards and Rodgers put it,
‘we can therefore expect the field of second and foreign language teaching in the twenty-
first century to be no less a ferment of theories, ideas and practices than it has in the
past’ (2011: 254).
The American Council on the Teaching of Foreign Languages (ACTFL) has been the
leading organization pioneering modern trends in the teaching and assessment of foreign
languages in North America and (recently) worldwide. The ACTFL proficiency guidelines
were first published in 1986 and later revised twice into the current 2012 version, which
describes the guidelines as:
descriptions of what individuals can do with the language in terms of speaking,
writing, listening, and reading in real-world situations in a spontaneous and non-
rehearsed context. For each skill, these guidelines identify five major levels of
proficiency: Distinguished, Superior, Advanced, Intermediate, and Novice. The
major levels Advanced, Intermediate, and Novice are subdivided into High, Mid,
and Low sublevels.
(ACTFL Proficiency Guidelines 2012: 3)
The ACTFL guidelines are not based on any particular theory, method, or educational
curriculum, and they do not prescribe how an individual should learn a language. Rather,
they form an instrument for the evaluation of functional language ability, which is their
main application. However, the guidelines do have instructional implications. Over
the years, several researchers have examined controversial aspects of the ACTFL
Guidelines. Among others, Bachman and Savignon (1986), Bachman (1988), Fulcher
(1996), and Lantolf and Frawley (1985, 1988, 1992) provide detailed critiques of certain
areas of the guidelines. For instance, Lantolf and Frawley (1992) indicate that they have
‘some concern about the experimental design and statistics’ of the studies underlying
the ACTFL Guidelines. Moreover, Fulcher (1996) reaffirms these concerns.
In collaboration with several other organizations, ACTFL undertook the task of defining
content standards, i.e. what students should know and be able to do in foreign language
education. Standards for Foreign Language Learning: Preparing for the 21st Century was
first published in 1996, and its recent third edition includes some less commonly taught
languages such as Arabic. The tenets of this standards-based instruction are called the 5Cs,
and they include: Communication, Culture, Connections, Comparisons, and Communities.
Communication emphasizes the communicative aspect of the language and the
importance of using the language in real-life situations. Culture is considered crucial in
learning world languages, as each of these languages has its own culture, and
awareness of cultural differences between the first and the second language will help
learners integrate into the second-language culture more effectively. Connections focuses on
bridging the gap between language learning and subject matter so that language
learning becomes more meaningful, less explicit, and more automatic. Comparisons invites
students to make linguistic and cultural comparisons between their first language and
culture and those of the second language. Communities emphasizes the application of
what is learned in class to the outside world by going on field trips, ordering in
restaurants in the second language, and other cultural activities.
In line with the above discussions, Ziahosseini (2006: 14) studies Persian language
pedagogy and skilfully captures the relation between the teaching methods and what
really happens in the language classroom by stating that:
Language teachers never fully and entirely implement what they have learned
about teaching methods in their classrooms. Rather, based on the circumstances
and situation of the class that they are dealing with and their own taste, they may
choose solutions outside a specific method.
16.5.2 Persian proficiency assessment
While a thorough discussion of assessment is clearly beyond the scope of this article,
several studies related to Persian are discussed below. ACTFL offers
several tools for assessing (interpersonal) speaking, (interpretive) reading and listening,
and (presentational) writing skills. However, the Oral Proficiency Interview (OPI) is the
most popular assessment tool offered by ACTFL. The OPI is a reliable tool for
assessing how well a person speaks a language. The interview is interactive and
continuously adapts to the interests and abilities of the speaker. The speaker’s
performance is compared to the criteria outlined in the ACTFL Proficiency Guidelines
2012 for Speaking or the Interagency Language Roundtable Language Skill Level
Descriptors for Speaking.
Marashi (1994) conducts a study on students’ level of oral proficiency in Persian and
compares it across intensive and non-intensive programmes. He conducts his survey
at five American universities (University of Utah, University of Texas in Austin, University
of California in Los Angeles, University of Michigan, and Princeton University) and poses
three initial questions: What level of proficiency do students attain in Persian at the end
of the first year of study? Do students achieve a higher proficiency level through regular
academic year classes or the intensive summer programmes? How do we account for the
proficiency-level differences between the two programmes? After performing his survey
and conducting OPI tests on 130 students at these five universities over a five-year
span, Marashi reports the following findings: the largest number of first-year students
in both the regular academic-year programme and the intensive summer programmes
reached levels ranging from Novice-High to Intermediate-Mid, with an average rating of
Intermediate-Mid. His results also indicated that the level of proficiency attained in the
summer intensive course was not inferior, but rather superior, to that of students
attending an academic-year programme. Marashi lists several factors behind this
observation, including the level of seriousness and motivation of the students attending
intensive programmes as well as the interruption of the academic-year programme by
holidays, breaks, exam periods, etc.
In a more recent study, Sahraei and Jalili (2012) examine the principles of creating
a Persian language proficiency test. They present the preliminaries of the development of
what they call the ‘Persian Language Proficiency Test’ (PLPT), which is intended to assess the Persian
proficiency of students who come to Iran to continue their academic education. Their test
avoids concentrating on language competence and instead focuses on the four language
skills to determine students’ level of proficiency. They argue that the novelty of their work is
that it avoids direct and explicit assessment of grammar, vocabulary, and pronunciation.
Instead, it assumes that these components are required to succeed in those four main
language skills. The authors argue that their proficiency test is similar to the
International English Language Testing System (IELTS), which they consider the most
prominent proficiency test worldwide.
Despite the existing proficiency tests developed for Persian, there is still no standard
proficiency test for Persian, and Persian language programmes all over the world use
their own self-developed tests in order to evaluate the entry proficiency level of the
students and to place them in different levels, or to assess their exit level of proficiency.
In the next section, we will discuss language-teaching materials, with a specific focus on
those written for Persian.
16.6 Instructional materials

Like any other discipline, the writing of Persian language textbooks is a developing field,
which needs to be visited and revisited. This section provides a background on recent
trends in material development and discusses several existing Persian language
textbooks. One must keep in mind that there is no such thing as a bad or a good textbook.
The needs and goals of a class may require material that is not necessarily
aligned with the most recent trends in material development. As such, any textbook or
research in this field will be of benefit to Persian language educators as well as Persian
language learners depending on their needs and circumstances.
16.6.1 Theoretical issues
In order to write effective second-language material, one must remember that first-
language and second-language learning are different, as discussed earlier in this chapter.
Therefore, it is not possible to use first-language textbooks to teach second-language
students. Some of the differences in the way grammatical structures are acquired by first-
language and second-language learners are briefly discussed below. Hypotheses for
grammatical processing in first language versus second language have a major impact on
the decisions of second-language material developers.
Unlike first-language speakers, second-language learners have problems with the online
integration of different information sources, such as lexical, discourse-level, prosodic and
structural (Felser et al. 2003; Papadopoulou and Clahsen 2003; Akker and Cutler 2003;
among others). At the higher levels, second-language learners will be able to cope with
more layers of information at the same time. That is why introductory textbooks must
start with introducing phrases and short sentences, while longer texts should be
presented only after second-language learners have reached a certain level of
competence, when they are able to handle the integration of different information
sources. An ideal second-language textbook will present different linguistic elements in
different sections in each lesson; however, at the end of each lesson, these different
linguistic elements are integrated into one meaningful text so that students can assemble
the pieces of the puzzle that they have just learned. Otherwise, linguistic pieces will be
nothing but scattered information in their mental lexicon and therefore not readily
available at the time of the processing of the linguistic input or the production of the
linguistic output.
Another issue is first-language transfer, which has long been detected
and investigated. Properties of the first language influence second-language processing
(Frenck-Mestre and Pynte 1997; Juffs 1998, 2005; among many others). Typically, first-
language transfer slows down the process of acquisition. For instance, word order is a
major challenge for English speakers learning Persian. When making a Persian sentence,
English speakers tend to place the verb right after the subject, while in Persian, the verb
needs to be placed at the end. A good language textbook is aware of such common
mistakes due to first-language transfer and provides opportunities for these common
mistakes to be highlighted and practised in order to help the learner break the first-
language mould.6 First-language transfer does not always slow down second-
language acquisition. In a study by Oller and Ziahosseini (1970), it is shown that
second-language learners of English whose native language employs a Roman alphabet
made more spelling errors than second-language learners whose native language
uses a non-Roman system. Thus they argued in favour of ‘interference’ of similar patterns
due to false generalizations.
Knowledge of the first language of the learners can sometimes be of great help to the
instructors, as they will be able to predict the types of errors that learners will make as
well as the sources of those errors. A contrastive analysis of the first language and
second language is extremely useful for a language textbook writer. By bringing fine
differences between first-language and second-language structures to the learners’
attention, we can make them aware of potential mistakes and either prevent these from
happening or help learners overcome them more quickly and more efficiently. An
example from Persian and English is the direct object/indirect object dichotomy.
There are certain verbs that require an indirect object in Persian but a direct object in English,
such as enjoy, respect, hate, etc. On the other hand, by reminding second-language
learners of the similarities between first-language and second-language structures, we
help them build a bridge between their old information and the new information, which
leads them to establish connections in their brain and thus remember the new knowledge.
Another example involves French speakers learning Persian. Both French and Persian have
six person conjugations and the honorific (vous/tu) distinction. These contrastive analyses
are not limited to syntax. When writing a textbook, many factors including the sound
system, the morphological system, the semantic system, as well as the pragmatic system
must be taken into account.
16.6.2 Recent advances on material development
As a result of the recent trends in foreign language teaching that were discussed in
section 16.5.1, there has been a paradigm shift in foreign-language education. A direct
result of such a shift is the creation of the Standards-Based Curriculum Design. The
essential question is: What is the best way to package language learning? The
requirements to consider for Standards-Based Language Instruction are as follows:
the focus should be on proficiency across the three communicative modes (discussed in
16.5.1); the content should cover the 5Cs; the curriculum should be based on thematic
units; the use of authentic material and the target language is highly encouraged;
both the communicative and the linguistic aspects should be covered; and instructional strategies and
performance assessment must be implemented.
In the last twenty years, research in psychology, education, and second-language
acquisition has influenced second-language instruction. Doughty and Long (2003) put
together ten methodological principles7 commonly known as MP10, which are:
1) Use task (not text)
2) Promote learning by doing
3) Elaborate input (do not simplify, do not rely solely on ‘authentic’)
4) Provide rich input
5) Create instances of chunk learning
6) Focus on Form
7) Provide negative feedback
8) Consider learner’s inner syllabus
9) Promote cooperation among learners
10) Individualize instruction.
They argue that for a successful language class, all ten guidelines should be adopted to
create an engaging and efficient educational setting.
Another fruit of this paradigm shift is the concept of Backward Curriculum Design, which
argues that a curriculum should begin with the end in mind. The process of Backward
Curriculum Design requires identifying the desired learning outcomes, determining
acceptable performance descriptors, and setting measurable goals and objectives. Once
the standards are determined, the objectives are defined, and the curriculum is designed,
the lessons are written. Each lesson includes some thematic units, including a variety of
topics, as well as subthematic units, which are different aspects and examples of the
thematic units. During each lesson, the focus is on what students will be able to do at the
end of the lesson. Therefore, class activities and tasks are geared towards this objective.
The assessment process is also created based on the desired goal and outcomes.
Task-based Language Instruction emphasizes tasks and activities in class that would help
make a bridge between class activities and real-life activities. Therefore, the tasks that
are done in groups have a pre-determined lesson plan, yet the linguistic content of the
task is determined by the students’ interaction in class. There are six stages in a typical
task-based lesson plan, namely:
1) a pre-task (introduced by the teacher);
2) a task (undertaken by the students);
3) planning (contemplated by the students);
4) report (given by the students);
5) analysis (by the teacher);
6) practice (prompted by the teacher, carried out by the students).
Content-Based Language Instruction (CBI) has also gained
popularity. There are two interpretations of this concept. The broad interpretation defines
CBI as having its main focus on content rather than form. The narrow interpretation
considers CBI to be the instruction of a specific academic topic, using the language as a
tool to deliver that specific content. Regardless of the interpretation, CBI seems to be
receiving a lot of attention and demand among universities and educators. A
good example of a North American university that offers a CBI model for Persian is the
Flagship programme at the University of Maryland. Abasi (2014) reports on the
implementation of CBI in the Persian programme of the University of Maryland and
describes the particular CBI model developed in response to the needs
of the programme, taking into account the views of the students, language
instructors, and the content faculty.
Project-Based Language Learning (PBL) is another approach that has recently
gained attention (Krajcik and Blumenfeld 2006). In this approach, instead of textbooks,
projects are used as vehicles to promote students’ motivation and understanding by
working for an extended period of time to investigate and respond to a complex question,
problem, or challenge. Shadiev (2007) argues that PBL has a positive effect on students’
content knowledge and the development of skills such as collaboration, critical thinking,
and problem solving. However, Brush and Saye (2008) argue that PBL is challenging for
teachers to implement, as they may struggle with planning and implementing PBL
effectively. Students also may struggle to set up the project, direct initial inquiry, organize
their time, and integrate technology into projects effectively.
16.6.3 Instructional materials for Persian
16.6.3.1 General overview

This section provides a general overview of the Persian textbooks written in the twentieth
century and beyond. While listing and examining all the existing textbooks is beyond
the scope of this article, the following general observations can provide future scholars
with some assistance. Earlier Persian language textbooks that were written in the 1900s
have some common features. For the most part, they start with grammar explanations
and move to examples and then to vocabulary and finally to exercises.8 These textbooks
are written in English and are grammar-based. The methodological approach that they
are based upon is the traditional grammar-translation method. They contain transliteration and
specialized linguistic terminology. The exercises are mostly translation tasks. There is an
emphasis on accuracy, but none on fluency.
With some exceptions, most of these textbooks ignore the listening and speaking skills.
The emphasis is mainly on linguistic competence and not on communicative proficiency.
Some textbooks written in the mid-century use the build-up technique, which consists of listing the
words first, followed by complete sentences as part of a dialogue, followed by various
drills. They are written in English, use transliteration, and some even include
handwritten Persian in addition to typewritten Persian. The exercises are mostly drills
with some attention to listening and pronunciation. The emphasis is mainly on accuracy,
and less on fluency.
More recent Persian language textbooks have some other common features. Persian
language textbooks that are published in Iran mostly teach the alphabet in a way similar
to that found in first-language textbooks. They are written in three to four volumes from
introductory to advanced levels. They are mainly written by a team of authors and are
usually sponsored by major governmental organizations.9 The language of instruction is
Persian, and they do not contain transliteration. The most recent ones contain images and
pictures and have a separate exercise book and teacher’s book.10 The exercises are
controlled, but not limited to translation.
On the other hand, Persian language textbooks that are published outside Iran have some
other characteristics that are sometimes similar to and sometimes different from those
published in Iran. For example, they are usually, but not entirely, in Persian.
They do not usually use transliteration. They mainly contain audio files and images/
pictures. They mainly use authentic texts extracted from literary texts.11 They focus
on proficiency as well as on the linguistic competence of the learners.12 More recent
Persian textbooks offer interactive companion websites that are designed to meet the
needs of students in the twenty-first century. They are thematically designed, provide a
clear set of communication goals in each lesson, and contain a ‘Scope and Sequence’
section, which is a regular part of the textbooks written for more commonly taught
languages and has been lacking in previous Persian textbooks.13 Content-based language
textbooks of Persian are mainly geared towards literature.14 Very recently, textbooks
focusing on other subject matters, such as media, have become available, the focus of
which is the presentation of authentic media texts and their corresponding exercises.15
16.6.3.2 Evaluation of Persian textbooks

Ziahosseini (2006: 41) accurately argues that no pedagogical material is without a flaw.
He also states that there are no specific and set parameters to evaluate language
textbooks and thus the evaluations are based on the preferences of the evaluator.
Ziahosseini proposes two different kinds of evaluation: external evaluation and internal
evaluation. The ‘external evaluation’ helps us find the answer to the following questions:
what the purpose of the book is; what the target population is (children, university
students, etc.); what the language level of students is; which approach and method are
adopted by the author; what the audiovisual components of the book are; whether the
book seeks out visual aids for language learning; what level of emphasis is placed on each
of the four skills; whether the book has a teacher’s guide; whether an instructor whose
native language is different from that of the students can use this book; whether the book
includes the new vocabulary at the end of each lesson or at the end of the book so that
students do not require a dictionary; and whether the completion time of the book
matches the instruction time of the class. On the other hand, Ziahosseini argues that the
‘internal evaluation’ of a textbook helps us assess if the book achieves what it claims to
offer. In other words, the external evaluation is the quantitative and structural evaluation
of the textbook, while the internal evaluation is the content evaluation of the book.
Among earlier studies on the evaluation of Persian language textbooks are ketābshenāsi-
ye ketābhā-ye āmuzesh-e zabān-e farsi by Shemasi and Sharifzadeh (1995) and
ketābshenāsi-ye ketābhā-ye āmuzesh-e zabān-e farsi be gheyr-e farsi zabānān by
Zolfaghari (2004).16 The newly established Journal of Teaching Persian Language to Non-
Persian Speakers by Imam Khomeini International University has paved the way for the
publication of some research in this field. This journal has been in print since 2012 and
has already produced some interesting work. Some of the works published in that journal
are reported earlier in this chapter and below.
Davari-Ardekani and Aqa-Ebrahimi (2012) compare the authenticity of texts in
three different Persian language textbooks published in Iran with non-educational texts,
and they conclude that two of the textbooks are closely tied with the non-educational
texts with respect to the authenticity of their texts, while the third one is not.
Rezai and Alipour (2012) study the reading passages of a recently published Persian
textbook based on Halliday’s Seven Functions of Language (1975). These functions are:
1) instrumental (to express needs);
2) regulatory (to express commands);
3) interactional (to make contact with others);
4) personal (to express feelings, ideas, etc.);
5) heuristic (to gain knowledge about the environment);
6) imaginative (to tell stories, jokes, etc.);
7) representational (to convey facts and information).
Their study aims to investigate the presentation of functions in Persian language
textbooks. The specific textbook they studied is the five-volume
Farsi Biamuzim. They focus on the first three volumes. They report that they have
observed function presentations in all three volumes and that the number of functions
increases with the level of the textbook. They also report that from among the seven
functions of Halliday, the most commonly used ones in these volumes are the interactional
functions and the least commonly used ones are the imaginative functions. They then
conclude that, based on their findings, the textbook is successful in acting as a
communicative tool and in meeting learners’ needs in this regard. It would have been
interesting and relevant if the writers had assessed the knowledge of the students about
those functions prior to the courses and after. Such a quantitative study would have
added value to the qualitative research they did on this particular textbook.
Sahraei and Shahbaz (2012) provide a content analysis of some Persian textbooks
published in Iran based on Neil Anderson’s ACTIVE Model (1999). The ACTIVE Model is
based on six activities while reading, which will enhance the reading skill. These six
activities are as follows: Activate prior knowledge; Cultivate vocabulary; Teach for
comprehension; Increase reading rate; Verify reading strategies; Evaluate progress. The
authors evaluate the development of the reading skill in several Persian textbooks
published in Iran, according to the ACTIVE Model. They report that they cannot find any
textbook that makes use of the current findings in language teaching, including the
ACTIVE Model. The only common trend in all the textbooks they studied was the
cultivation of vocabulary followed by the verification of reading strategies. The remaining
four activities are neglected in the textbooks they studied. The analysis would have been
more balanced and useful if the authors had given both the strong points and the shortcomings
of the textbooks in question. The textbooks might have included features similar to the
tenets of ACTFL that the writers failed to notice.
Another recent work, by Ghareh-Gazi and Asgharpour Masooleh (2014), provides a
descriptive bibliography of various books and textbooks for teaching Persian to
non-Persian speakers. The authors list 166 books and examine 146 of them (as the others
were not accessible to them) in terms of factors such as whether or not the textbooks
have an introduction, teacher’s guide, exercise book, key to the exercises, length of the
instruction period, teaching methodology, language of instruction, level of instruction,
audiovisual (p. 407) components, use of transliteration, images and photographs, etc. The
book provides statistics to show the percentage of existence of each factor. While this
study is a significant contribution, it fails to examine more than a dozen Persian textbooks
published outside Iran within the last decade. The next section discusses some current
and crucial issues with regard to the status of teaching Persian outside Iran.
16.7 Current issues and challenges
As mentioned earlier, there is a paucity of scholarly work on learning and teaching
Persian to speakers of other languages, and the reasons for that are multifaceted. These
challenges, some of which may be more directed towards teaching Persian outside Iran,
lie in the curriculum of post-secondary Persian language programmes, which in itself
encompasses several other issues. Almost all higher education centres of Persian
language (both inside and outside Iran) suffer from the lack of standardization of
teaching materials as well as assessment tools. There have been some efforts to produce
standards and assessment tools both inside and outside Iran (see Ghanoonparvar et al.
2004; Mills and Minuchehr 2014; among others), but as of now, no consistent and well-
known standards and assessment tools have been adopted by universities.
Another major issue is the lack of commitment on the part of the universities. In most
cases, universities are not willing to commit to hiring a tenure-line faculty member for
the Persian language position. Instead, they create instructor or lecturer positions with
much less job security, which allows them to make less of a commitment to the position.
As a result, the job requirement is usually lowered to an MA level, and the instructors
are often hired by virtue of their degree in Iranian Studies, which does not necessarily
make them experts in teaching Persian to non-native speakers or give them the necessary
training in language teaching. One way to overcome this problem is to hold teacher-training
workshops for language instructors. This is something that has been started in
some universities with a focus on the teaching of Persian language (such as STARTALK
programmes and the workshops organized by the American Association of Teachers of
Persian). Another reason that universities hire instructors is the higher teaching workload
required from an instructor (usually three courses per term, while tenure-line faculty are
typically required to teach two courses per term). This leaves instructors very little time
to do research, even if they do come with the required background.
An underlying issue is that universities tend not to take language-teaching positions
seriously. The evidence is that they tend to hire tenure-line faculty for disciplines such as
literature and social sciences and leave the language-teaching positions for lecturers and
adjunct instructors. There is clearly a need for educational policy makers to pay more
attention to the importance of teaching languages. The American Association of Teachers
of Persian organized a roundtable at the International Society for Iranian Studies
Conference 2016 in Vienna, entitled ‘The Current State of Persian Instruction at Colleges
and Universities’, to raise awareness about these issues at many levels: at the university
level; among colleagues in Iranian Studies; and also for the foundations which provide
funds for Persian language positions and can perhaps have a stronger voice in requesting
tenure-line positions when granting funds to institutions. The Modern Language
Association and several other professional (p. 408) organizations have already recognized
the recent problems that language faculty are facing and thus have issued advocacy
policy statements and kits.
16.8 Summary
This chapter provided a general overview of the history of teaching Persian to speakers of
other languages and looked at the current status of teaching Persian across the globe.
Issues within second-language acquisition were discussed with an eye on the studies
done on Persian. Modern trends and approaches to second-language teaching were
introduced and discussed. Finally, Persian language textbooks for non-Persian speakers
were discussed both in a theoretical and descriptive way, and the existing literature was
introduced and discussed. This chapter was written in the hope of serving as an invitation
to linguists who work on Persian language, Persian language educators, Persian textbook
authors, and test developers to work hand in hand and do collaborative projects in order
to fill the very clear gaps in the field of Persian language pedagogy. Much more needs to
be done for the advancement of pedagogy and assessment in a less commonly taught
language, such as Persian, to reach the level of commonly taught languages.
Notes:
(1) Many studies differentiate teaching Persian as a second language from teaching
Persian as a foreign language; the important distinction between the two has to do with
whether the instruction takes place in Persian-speaking countries—Iran, Tajikistan, and
Afghanistan—or in an area where Persian is not generally spoken. In other words, the
main difference between the two is the environment outside of the classroom. Studies on
second-language acquisition however only use the term ‘Second Language/L2’ and define
a second language as any language other than one’s first language. Therefore, the third
or fourth language one learns will still be the second language in our definition. In this
article, we adopt the latter format and use the term ‘second language’ when referring to
teaching Persian to speakers of other languages.
(2) The information in this section has been mainly adapted from Windfuhr (1979).
(3) Many grammar books, written by Persians and for Persians, the history of which can
be dated back to the Sasanian era, are not discussed here. Here we only focus on
grammar books written for foreign language learners. Windfuhr (1979: 11) states:
‘Western scholarship, preoccupied with Arabic grammar, has done little to study Persian
treatises and their theoretical background. Not only Avicenna, but many other classical
scholars are likely to offer important insights for comparative linguistic theory’.
(4) Mehdi Marashi, personal communication.
(5) For a detailed discussion of traditional approaches and methodologies, refer to
Richards and Rodgers (2011).
(6) See for instance, Shabani-Jadidi and Brookshaw (2010), Brookshaw and Shabani-Jadidi
(2012), and Sedighi (2015).
(7) Some of these principles have already been proposed and discussed by others, such as
Ellis (2003); Nunan (2003); Skehan (1998).
(8) See for instance, Thackston (1993).
(9) See, for instance, Saffar Moghaddam (2003).
(10) See, for instance, Zolfaghari et al. (2002).
(11) See, for instance, Brookshaw and Shabani-Jadidi (2012).
(12) See, for instance, Marashi (2003).
(13) See, for instance, Sedighi (2015).
(14) See, for instance, Thackston (1994).
(15) See, for instance, Shabani-Jadidi (2015).
(16) There are also some websites that include a comprehensive list of the material
created for teaching Persian to non-Persian speakers. See, for instance:
http://www.iranculturalstudies.com, http://www.persian-language.com, http://www.noormags.ir,
and http://www.saadifoundation.ir.

Anousha Sedighi
Psycholinguistics

Subject: Linguistics, Psycholinguistics, Languages by Region
Psycholinguistics encompasses the psychology of language as well as linguistic
psychology. Although they might sound similar, they are actually distinct. The former is a
branch of linguistics, while the latter is a subdivision of psychology. In the psychology of
language, the means are the research tools adopted from psychology and the end is the
study of language. However, in linguistic psychology, the means are the data derived from
linguistic studies and the end is psychology. This chapter focuses on the first of these two
components; that is, the psychology of language. The goal of this chapter is to give a
state-of-the-art perspective on the small but growing body of research using
psycholinguistic tools to study Persian with a focus on two areas: presenting longstanding
debates about the mental lexicon, language impairments and language processing; and
introducing a source of data for the linguistic analysis of Persian.
Keywords: psycholinguistics, lexical decision making, priming techniques, processing, idiomatic expressions,
aphasic studies, compound words, complex predicates, Persian psycholinguistics
17.1 Introduction
THIS chapter investigates existing research in Persian psycholinguistics. While there is a
small but growing number of studies in Persian psycholinguistics, there are still many
areas in which psycholinguistic studies of Persian could be carried out with potential
profit to Persian linguistics and to psycholinguistics more generally.
With this in mind, this chapter provides an overview of the growing research on
psycholinguistics, its tools and its contribution to other fields in linguistics, as well as a
review of the research that has been done in Persian psycholinguistics. The aim of this
chapter is to introduce prevailing discussions in psycholinguistics to readers whose
specialities may be in other fields of Persian linguistics; to encourage more research into
Persian psycholinguistics in general; and to provide grounds for comparison between
Persian psycholinguistics and psycholinguistic studies with speakers of other languages.
The basic goal of psycholinguistics is to understand how human beings process language
—how they are able to comprehend and produce a series of sounds that carry meaning
mutually understood by speakers of the same language. To this end, theorists have
developed the notion of a mental lexicon. Since most psycholinguistic studies on Persian
in the literature are based on this notion, in what follows we will discuss the mental
lexicon, theories about how the mental lexicon works, the methods used in
psycholinguistic research to test these theories, and how these methods may be applied
in Persian linguistics. In the process, the existing studies on psycholinguistics in Persian
will be reviewed. Finally, the contribution that Persian psycholinguistics could make to
Persian linguistics overall will be pointed out.1
(p. 412) 17.2 The mental lexicon

The mental lexicon is said to be the dictionary in the mind containing all of the words
known by the speaker along with all other syntactic, semantic, morphological,
phonological, spelling, discourse, and register information of a language. It might be said
that the mental lexicon is even greater and broader than a traditional dictionary, since it
encompasses a thesaurus, a bilingual dictionary, a pictorial dictionary, and a topical
dictionary. This claim can be made because when we try to access a word, all the
information that can be found in these dictionaries is retrieved for us to select from in
order to access, process, or produce a given written or spoken word. Research on the
mental lexicon aims at characterizing the nature of this holistic dictionary as well as the mappings
between the different dictionaries at the time of access. Therefore, such studies focus on
the structure of the lexicon as well as on lexical processing and representations. For
example, in studies on the processing of Persian compound words in native speakers of
Persian (Shabani-Jadidi 2014; Nojoumian et al. 2006), the data indicate that such words
are decomposed into their constituting elements orthographically from right to left. These
findings provide evidence for the structure of words in the mental lexicon. One can argue
that since Persian is a very rich language morphologically, the speakers of the language
tend to break the word into its constituents in a linear manner regardless of whether the
remaining element is an existing word or morpheme in the language. This argument has
been made for other morphologically rich languages like German, as well (Smolka et al.
2008).
Studies that focus on the structure of language have attracted theoretical linguists, who
have, in turn, developed different models and theories, such as lexicalist theories (e.g.
Pinker 1989), which consider the lexicon so rich as to include all the information
mentioned above, i.e. words, roots, affixes, and word-formation rules, as well as more
detailed information, such as information about the argument structures of various verbs.
This is in contrast to other kinds of theories, such as ‘constructionist’ theories (e.g.
Marantz 1997), which consider the lexicon to be quite poor and comprised only of atomic
roots, such as sound-meaning pairings, leaving all word building to be done by syntax. In
Marantz’s opinion, the grammar ‘constructs’ all words in the syntax by the same general
mechanisms (‘merge and move’; see Chomsky 1995) that construct phrases. This is in line
with other studies in the framework of Distributed Morphology (DM) that consider the
lexicon to contain only category-neutral roots, which then enter the syntactic derivation
by combining with category-defining functional heads (Borer 2005a, 2005b, 2013).
Many syntactic studies on Persian complex predicates posit a constructionist view of
Persian CPrs (complex predicates are further discussed in Chapters 2, 3, 7, 8, 9, 10, 15,
and 19). For example, Folli et al. (2005) argue that only a syntax-based approach to
argument structure in Persian complex predicates can account for the compositionality
and independence of the elements in these constructions. Their theory is against a
lexicalist approach due to the interdependence and systematicity of the nominal
constituent and the verbal constituent of the complex predicate, as well as their
contributing role in determining the event structure and alternation possibilities of the
whole complex predicate (p. 1369). Similarly, other studies (e.g. Megerdoomian 2001,
2012a) also support a fully compositional, yet syntactic approach to Persian complex
predicates, where the semantic and syntactic properties of the complex predicate are
determined by the syntactic construction of the nominal and the verbal elements rather
than by their lexical entries.
There are also some other theories, such as the dynamical approach, which posit (p. 413)
a computational model (e.g. Elman 2009) as operating in the mental lexicon, in which new
forms and new meanings are derived from morphological computational rules, and
parsing is done partially and locally rather than globally.
Although most studies to date have considered Persian complex predicates to be fully
compositional (e.g. Folli et al. 2005; Megerdoomian 2001, 2012a), there are some studies
that try to take another road. For example, Samvelian and Faghiri (2014) adopt a partially
compositional approach to Persian complex predicates, following Nunberg et al.’s (1994)
approach to Idiomatically Combining Expressions, where the elements of an idiom (e.g.
spill the beans) are ‘assigned a meaning in the context of their combination’ a posteriori
(Samvelian and Faghiri 2014: 65). What they mean by a posteriori seems to be the
argument that the elements of a complex predicate in Persian cannot contain the
meaning that they will be assigned after their being combined with the other element. In
other words, they do not consider the mental lexicon to be so rich as to include
different shades of meaning of words, but rather it is the syntactic combination that is the
determining factor for the meanings of the compound constituents. This argument is in
contrast to the argument made by the findings of the experimental studies on Persian
complex predicates, where all the information is in the lexicon. For example, upon
hearing or reading the word zamin ‘earth’, both the literal (zamin kandan ‘earth-to dig’)
and the idiomatic (zamin xordan ‘earth-to eat / to hit’, meaning ‘to fall’) varieties as well
as its equivalent in other languages known by the individual in addition to their literal
and idiomatic varieties will be activated within milliseconds, which is not even noticeable
by the conscious mind (Shabani-Jadidi 2014, 2016). This finding in the experimental
studies, which will be more thoroughly discussed in section 17.3.2.2, is in line with other
studies in the literature on idiomatic expressions, where, upon the occurrence of the first
word in the idiom string, both the literal and the idiomatic meanings are activated (e.g.
Foss and Jenkins 1973; Swinney and Cutler 1979). For further information on idiomatic
expressions, see Chapters 9 and 11.
Therefore, the findings of studies done by theoretical linguists seem not to be conclusive,
and one way to validate them is to have them confirmed by those of experimental studies
by psycholinguists who are equally interested in the processes involved in the mental
lexicon at the time of processing and access. Psycholinguistic studies have provided evidence
of how a word and its relevant information, such as its root, affixes, and word-formation
rules, are stored in and accessed from the mind of the individuals. Different kinds of
experiments have been devised to investigate these issues. At the word level, for example,
experimental studies have examined the effects of morphological relatedness (kārgar-kār
‘worker-work’), semantic relatedness (kārgar-ra’is ‘worker-boss’), syntactic relatedness
(mādarāne-barādarāne ‘motherly-brotherly’), and orthographic or phonological
relatedness (shenidan-shen ‘to hear-sand’), or even a combination of these on the speed
and accuracy of accessing and processing of the target word (second word) when
preceded by the related prime word (first word).
Depending on the language under investigation, the subjects tested, the conditions in the
study, the types of relatedness examined, and the technique used, such studies have
yielded different results and supported different theories. The decompositional approach
(Longtin et al. 2003 for French; Rastle et al. 2004 for English; Fiorentino and Poeppel
2007 for English; Kazanina et al. 2008 for Russian; Shabani-Jadidi 2014 for Persian;
among others) holds that (p. 414) polymorphemic words are decomposed into their
constituents during processing. In contrast to this approach there are:
(a) the non-decompositional approach (Butterworth 1983), which assumes that
polymorphemic words are processed as a whole;
(b) dual-access theories that support morphological decomposition for some
polymorphemic words, but non-decompositional status for others (Stanners et al.
1979; Schreuder and Baayen 1995; Baayen et al. 1997; among others);
(c) the distributional-connectionist model that challenges a computational theory of
mind (e.g. Seidenberg and Gonnerman 2000; Hay and Baayen 2005; among others)
and which holds that new forms and new meanings are derived from morphological
computational rules;
(d) the symbolic dual-route model, which focuses on orthographic, phonological, or
semantic representation (Coltheart et al. 1993; Coltheart et al. 2001; among others).
17.3 Areas of investigation in psycholinguistic research
There are two major areas of investigation in psycholinguistic research. One involves
the study of what naturally occurs in language production, such as spoonerisms and the
tip-of-the-tongue phenomenon. The second type consists of elicitation techniques,
involving language comprehension through lexical decisions, priming paradigms, and
reading aloud.
Due to the nature of these techniques, there are different research tools and different
statistical analyses that are used by researchers. Some of these are explored below.
17.3.1 Natural phenomena in language production
Some naturally occurring phenomena, especially mistakes in speech production, afford an
opportunity for psycholinguists to study language processing. Two of these phenomena
that have been the focus of psycholinguistic research are spoonerisms (e.g. Baro dasti?
instead of Daro basti? ‘Did you close the door?’) and the tip-of-the-tongue phenomenon.
While the study of these natural phenomena helps us understand the mental
lexicon and how it is structured, collecting samples for study poses certain difficulties.
Because such phenomena occur spontaneously in real life, they are quite difficult to
collect and analyse.
17.3.1.1 Spoonerisms
Spoonerisms are slips-of-the-tongue—the switching mistakes speakers make in the
production of speech, which could be at the segment, word, or morpheme level. Consider,
for example: (p. 415)
(1)
What is happening in a linguistics professor’s mind when he asks his doctor, āyā inflection
dāram? ‘Do I have an inflection?’ rather than, āyā infection dāram? ‘Do I have an
infection?’ Perhaps, the linguist has used ‘inflection’ more frequently than ‘infection’, so
his subconscious mind accesses the former instead of the latter, as it is more readily
available to him. Frequency effect has been tested and attested in a number of studies in
word processing (e.g. Grainger 1990; Perea and Pollatsek 1998; Paap et al. 2000; among
others).
What happens when a Persian native speaker uses mahāsen-āt ‘virtues/beard’ -āt ‘plural
marker’ instead of mahāsen ‘virtues/beard’, which is already plural? In the case of
mahāsen, which is an Arabic loan word in Persian, it seems that—in line with
psycholinguistic research on Arabic (Boudelaa and Marslen-Wilson 2004a, 2004b, 2005)—
not only is the three-letter root (h-s-n) activated, but so is its word pattern as well as
other similar word patterns for pluralization (here, -āt). In other words, when searching
for the word mahāsen in the mental lexicon in order to either process or produce it, not
only is mahāsen activated, but also other plurals of the same root, one being mohassan-āt
‘virtues’; somehow the plural mahāsen is blended with the plural marker -āt and the
erroneous word mahāsen-āt is produced.
From the above examples, we can discern that the entirety of the word or phrase must be
planned in the mind before we produce it, or else we would not have switched the
elements within one word or the words within one phrase.
17.3.1.2 Tip-of-the-tongue
The tip-of-the-tongue phenomenon is something that we have all experienced, especially
when we are tired or switch languages. We dig in our mind to access a word on the basis
of its meaning, initial letter, spelling, rhyme, etc. For instance:
Esme doktoret chie? ‘What is the name of your doctor?’ … Sare zabuname! ‘It is on the tip
of my tongue!’ … Bā alef shuru mishe. ‘It starts with an A.’ … Bar vazne āsemānie. ‘It
rhymes with āsemāni’, … āhān, āterāni! ‘Oh, Aterani!’
This phenomenon indicates that accessing the mental lexicon, that is, the dictionary in
the mind, is very quick. It also means that we store a word together with its semantic,
phonological, orthographic, morphemic, and syntactic information in our mental lexicon.
Not much has been done on the tip-of-the-tongue phenomenon in Persian: the only study
is by Askari (1999), who investigated the phenomenon in Persian–English bilinguals.
Her subjects heard a definition in either Persian or English, followed by a prime word
in either Persian or English that was related to the target in meaning, related in sound,
or unrelated. The subjects were asked to supply the target word that would best fit the
description. Askari’s results indicated that primes similar in sound accelerated the
resolution of tip-of-the-tongue states. In addition, she observed the same effect of
primes in same-language and different-language conditions, from which she concluded that
both languages map onto a single lexicon, which, in turn, supports the single-store model
of bilingual memory (Askari 1999).
(p. 416) 17.3.2 Experimental techniques in language comprehension
The experimental techniques that psycholinguists use to study language comprehension
come mainly from the field of psychology and thus entail observing change in behaviour
and measuring this change by different means. Some of the most commonly used
techniques are lexical decision, priming paradigm, reading aloud, and self-paced reading.
Below, these techniques as well as other relevant topics are briefly explained.
17.3.2.1 Lexical decision task

The most commonly used technique in psycholinguistic research is the lexical decision task,
due mainly to its relative simplicity, as it does not require complicated, costly lab
technology. It can be run on a laptop anywhere in the world. It basically measures how
fast people classify written or spoken stimuli as words or non-words. Variations in
experimental conditions provide clues as to the characteristics of the mental word
processing involved.
In order to make a decision on the lexicality of a word, the subjects must access their
mental lexicon to see whether the word exists, following which they will then judge the
string of letters to be a word. But if the word does not exist in the mental lexicon, they
will then judge the string of letters to be a non-word. It is interesting to note that based
on the frequency of encounters with a given word, its lexicality becomes more robust and
thus the response latency becomes shorter (see Chapter 11 for more information about
frequency). In other words, we make a quicker decision on the lexicality of a word if we
have heard or read it many times. This frequency effect on priming will be discussed in
section 17.3.2.2.
In a typical lexical decision task, a string of letters, be it a word or a non-word, flashes on
a computer screen or is delivered audibly through a speaker, and the subjects are
instructed to decide as quickly and as accurately as possible whether it is a word in their
language or not. In a masked priming experiment, which is the most commonly used
lexical decision technique, what the subjects do not notice is the presence of a prime
word that flashes on the computer screen for only 50 milliseconds. This is not enough
time for their conscious mind to notice, but their subconscious mind does indeed record
it. This technique has been used extensively to measure various kinds of relatedness
between the prime (the word recorded by the subconscious) and the target (the word that
the conscious mind records). They can be phonologically/orthographically,
morphologically, semantically, or syntactically related or unrelated. They can also be in
different languages known to the subjects in order to examine the effect of first language
on the second, or vice versa.
What gives us meaningful information is the time that it takes for the subject to react
during lexical decision tasks; that is, the time it takes to decide whether the word
displayed is a word or a non-word. There are various types of software that are used to
measure the reaction times as well as the accuracy of lexical decisions made from the
onset of the target display on the screen. Through such software, what is measured
(dependent variable) is the response latency; that is, how long it takes the subjects to
decide if the word is a real word or a nonsensical word, and the response accuracy; that
is, whether the decision made is correct or incorrect.
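The two dependent variables described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trial records and the names `trials` and `summarize` are hypothetical, and the latencies are invented values, not data from any study. The sketch computes response accuracy and the mean latency of correct responses for a related-prime and an unrelated-prime condition; the difference between the two mean latencies is what is commonly read as a facilitation effect.

```python
from statistics import mean

# Hypothetical trial records, invented purely for illustration.
# Each record: (condition, stimulus is a real word?, answered "word"?, latency in ms)
trials = [
    ("related", True, True, 540),
    ("related", True, True, 560),
    ("related", True, False, 610),   # incorrect response
    ("unrelated", True, True, 600),
    ("unrelated", True, True, 620),
    ("unrelated", True, True, 640),
]


def summarize(records, condition):
    """Return (accuracy, mean latency of correct responses) for one condition."""
    subset = [r for r in records if r[0] == condition]
    # A response is correct when the answer matches the stimulus' true status.
    correct = [r for r in subset if r[1] == r[2]]
    accuracy = len(correct) / len(subset)
    mean_latency = mean(r[3] for r in correct)
    return accuracy, mean_latency


related_acc, related_rt = summarize(trials, "related")
unrelated_acc, unrelated_rt = summarize(trials, "unrelated")

# Shorter latencies after related primes show up as a positive difference.
facilitation = unrelated_rt - related_rt
print(related_acc, related_rt, unrelated_acc, unrelated_rt, facilitation)
```

Restricting the latency average to correct responses is a design choice made here for the sketch; the chapter itself does not specify how incorrect trials are treated.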
There are some psycholinguistic studies on the Persian language that make use of (p. 417)
the lexical decision task. For example, using the masked priming technique, Shabani-Jadidi
(2014) investigated three kinds of relatedness in Persian complex predicates:
1) relatively transparent (e.g. ghazā-xordan ‘food-to eat’—GHAZĀ ‘food’ to eat);
2) relatively opaque (e.g. qasam-xordan ‘oath-to eat’—QASAM ‘oath’ to swear);
3) orthographically overlapping (e.g. shenāxtan ‘to recognize’—SHENĀ ‘swimming’).
The study aimed at investigating whether Persian compound verbs are decomposed into
their constituent elements during processing, and whether this decomposition depends on
a particular kind of relatedness, such as semantic transparency or morphological
relatedness, or whether merely orthographic relatedness would trigger decomposition (see
Chapters 2, 3, 7, 8, 9,
10, 15, and 19 for more discussion on compound verbs). The findings of this study are
discussed in detail in section 17.3.2.2.
Similarly, Nojoumian et al. (2006) examined three kinds of relatedness in their
investigation of Persian compound nouns:
1) transparent (e.g. sar-angosht-SAR ‘fingertip / HEAD’);
2) opaque (e.g. sanjāb-SANJ ‘squirrel / MEASURE’);
3) orthographic (e.g. badraghe-BADR ‘seeing off / MOON’).
Similarly, the goal of this research was to examine the pattern of decomposition of
Persian compound nouns. The findings of this study will be discussed further in section
17.3.2.2.
Another study on Persian is that of Golshaie and Golfam (2015), where the processing of
conventional conceptual metaphors in Persian is investigated by measuring reading-time
latency. In this study, the subjects read a set of scenarios containing non-conventional
metaphors, conventional metaphors, and non-metaphor expressions, followed by novel
target sentences. The response latencies of reading these novel target sentences
indicated that non-conventional metaphors facilitated the reading of these novel target
sentences. The facilitation effect was not very strong for the conventional metaphors and
there was no facilitation effect in the non-metaphor condition. They came to the
conclusion that conventionality in metaphors inhibits the processing of conceptual
metaphors.
Like any other technique, the lexical decision technique has some problems, such as the
artificiality of the stimuli presented, and in general the artificiality of the task itself. The
fact that the subject has to sit in front of a computer is not conducive to natural
processing of the language. Another problem is that after several words, the subject
develops some kind of strategy to make lexical decisions.
Other than these problems that emanate from the task itself, there are some extraneous
factors that affect the decisions made by the subject. For instance, a non-word that is
similar to a word in the cohort of languages known by the subject might be erroneously
recognized as a word. The frequency of the stimuli is another source of noise, which can
be controlled to some extent, but not fully. Another intervening factor is the fatigue or
distraction experienced by the subjects when undertaking the task. Therefore, nothing
can guarantee that their responses are all informed and conscious ones.
Despite these shortcomings, the lexical decision task is still the one most commonly used,
as it is somewhat easy to develop, quite easy to administer and analyse, and still
informative.
(p. 418) 17.3.2.2 Masked versus unmasked priming techniques

The lexical decision task described in section 17.3.2.1 is an example of masked priming—
the prime is displayed for a period of time too short to be consciously recognized. There
are various kinds of masked priming techniques, such as the four-field paradigm (mask–
prime–target–mask), where both the prime and the target are displayed very briefly
(Evett and Humphreys 1981), and the three-field (mask–prime–target) paradigm or the
‘sandwich’ technique, where the prime is sandwiched between a forward pattern mask (a
series of hash marks, #########)—the length of which is compatible with the length
of the prime—and the target stimulus, which is a backward mask (Forster and Davis
1984).
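The three-field sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of any study cited here: the function name three_field_trial is hypothetical, the lowercase-prime and uppercase-target convention is borrowed from the examples in this chapter, and display durations are deliberately left out.

```python
def three_field_trial(prime: str, target: str) -> list[tuple[str, str]]:
    """Build the mask-prime-target sequence of the 'sandwich' technique.

    Hypothetical helper for illustration only; timing is not modelled.
    """
    # Forward pattern mask: hash marks matching the length of the prime.
    forward_mask = "#" * len(prime)
    return [
        ("mask", forward_mask),
        ("prime", prime.lower()),    # prime shown in lowercase
        ("target", target.upper()),  # target doubles as the backward mask
    ]


trial = three_field_trial("gol", "gol")
for field, stimulus in trial:
    print(f"{field:>6}: {stimulus}")
```

The only design point the sketch tries to capture is that the forward mask is derived from the prime, so that its length always matches the prime's length.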
If both the prime (the first word) and the target (the second word) are displayed for
enough time to be noticed consciously, then the experiment is called an unmasked
priming experiment. Unlike in the masked priming paradigm, in the unmasked priming
paradigm, both the prime and the target are displayed for over 100 milliseconds, or until
a decision is made by the subject (Longtin et al. 2003; Smolka et al. 2014; among others).
Both these techniques have been widely used in psycholinguistic research, and a review
of the literature shows that they have yielded very different results. While the results of
the masked priming experiments are used to explore earlier stages of processing, the
findings of the unmasked technique tap into later stages of processing. Therefore, use of
both these experimental techniques gives us a more holistic picture of what happens in
the mind at the time of processing. In both masked and unmasked priming techniques,
different prime–target relations are investigated, such as:
1) repetition priming (e.g. gol–GOL ‘flower–FLOWER’);
2) form priming (e.g. gol–POL ‘flower–BRIDGE’);
3) semantic priming (e.g. gol–XĀR ‘flower–THORN’);
4) morphological priming (e.g. gol–GOLESTĀN ‘flower–FLOWER GARDEN’);
5) translation priming (e.g. gol–FLOWER).
The effect of the prime on the accelerated identification of the target as a word or non-
word is compared to that of a completely unrelated control prime–target pair (e.g. āheste-
GOL ‘slowly-FLOWER’). The unrelated control pair can vary in degrees of relatedness as
the control prime is unrelated to the control target and thus does not have any effect on
its priming (Perea and Rosa 2000). In fact, the control prime can even be replaced by
hash marks, or dollar signs, or any other string of symbols, as long as it is the same
length as the target and the experimental prime (Davis 2003).
The process of the priming effect can be explained by Forster’s (1999) bin model of lexical
access. According to this model of lexical access, lexical entries are categorized into
subcategories or ‘bins’ on the basis of their orthographic/phonological similarities.
Upon the presentation of the prime, closely matching, less closely matching, and non-
matching entries are accessed. Then upon the presentation of the target, the closely
matching entries, which have already been accessed, will be accelerated. In other words,
the perfectly matched and the closely matched entries are tentative candidates when
processing a word, and response times are typically shorter. This search will continue
until the bottom of the bin is reached and the appropriate entry is activated. For example,
when hearing or reading the verb shenidan ‘to hear’, Persian native speakers start
dissecting it linearly until they reach the (p. 419) end of the word. In this case, the
dissection path will be shen ‘sand’, sheni ‘sandy’, shenid ‘s/he heard’, and finally shenidan
‘to hear’.
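The linear dissection path for shenidan can be illustrated with a short Python sketch. It is a toy illustration rather than an implementation of the bin model: the lexicon below is hypothetical and contains only the four forms mentioned in the text, and the function name dissection_path is an assumption.

```python
# Hypothetical toy lexicon: only the forms named in the example above.
TOY_LEXICON = {
    "shen": "sand",
    "sheni": "sandy",
    "shenid": "s/he heard",
    "shenidan": "to hear",
}


def dissection_path(word: str, lexicon: dict[str, str]) -> list[str]:
    """Scan the word left to right and collect every prefix that is a lexical entry."""
    return [word[:end] for end in range(1, len(word) + 1) if word[:end] in lexicon]


for entry in dissection_path("shenidan", TOY_LEXICON):
    print(entry, "=", TOY_LEXICON[entry])
```

Each step of the scan extends the candidate by one letter, so shorter entries such as shen are encountered before the full form shenidan is reached.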
1) Influence of linguistic factors on the priming effect
While masked and unmasked priming research techniques can be used with any
language, the structural characteristics of the language under investigation have an
impact on the priming effect. These characteristics include but are not limited to word
order, orthographical rules, morphological composition, creativity, and linearity.
For example, in experimental studies on polymorphemic words in concatenative (linear)
languages, like English, where the suffixes and prefixes are added to the root in a linear
manner, semantic transparency is said to influence processing (Marslen-Wilson et al.
1994; Longtin et al. 2003; among others). In other words, semantically transparent
complex words (government) prime their root (GOVERN), whereas semantically opaque
complex words (apartment) do not prime their root (APART).
However, in non-concatenative languages, like Hebrew or Arabic, semantic transparency
is said not to have an effect on processing even under the unmasked priming paradigm
(Frost et al. 1997; Boudelaa and Marslen-Wilson 2000; among others). Boudelaa (2013)
illustrates this point clearly by comparing root priming in transparent polymorphemic
words (e.g. kitābah ‘writing’ as the prime and maktab ‘office’ as the target) versus opaque
polymorphemic words (e.g. katibah ‘squadron’ as the prime and maktab ‘office’ as the
target). He asserts that the facilitatory effect exerted by the prime on the target is
observed not only in masked (covert) priming, but also in unmasked or overt priming (e.g.
cross-modal and auditory-auditory), unlike in English, where opaque pairs (e.g.
department–depart, quarterback–quarter) facilitate each other only in covert tasks
(Boudelaa 2013: 383).
The observed priming effect in both transparent and opaque polymorphemic words is said
to be caused by the morphological richness of the language. In morphologically rich
languages, the number of morphologically complex words is correspondingly high, which
leads to their being more frequent and less marked. This will cause the opaque words, as
well as the transparent words, and even the pseudo-complex word (e.g. shenidan ‘to hear’
as explained in the previous section) to be parsed or decomposed as much as possible,
regardless of the lexicality of the remaining segment.
The morphological richness effect on priming is attested not only for
non-concatenative languages, like Arabic and Hebrew, but also for concatenative
languages with a rich morphology, such as German and Persian (Smolka et al. 2008;
Shabani-Jadidi 2014; among others). Smolka et al. (2008) reported a strong priming effect
for semantically opaque prime–target pairs in unmasked priming experiments in German,
where the target was a verb (KOMMEN ‘come’), and the primes were: a purely
semantically related verb (nahen ‘approach’), a morphologically and semantically related
verb (mitkommen ‘come along’), and a purely morphologically related verb (umkommen
‘perish’), versus an unrelated verb (schaden ‘harm’).
Using the masked priming paradigm, Shabani-Jadidi (2014) investigated the priming
effect in three kinds of relatedness in Persian complex predicates:
1) relatively transparent (e.g. ghazā-xordan–GHAZĀ ‘food-to eat, to eat–FOOD’);
2) relatively opaque (e.g. qasam-xordan–QASAM ‘oath-to eat, to swear–OATH’);
3) orthographically overlapping (e.g. shenāxtan–SHENĀ ‘to recognize–SWIMMING’).
(p. 420)
The author investigated the priming effect of these compounds and pseudo-compounds
(i.e. the orthographically overlapping prime–target pairs) on their nominal constituent or
pseudo-constituent word (Experiment 2a), as well as on their verbal constituent or
pseudo-constituent word (Experiment 2b).
To illustrate the processing of Persian complex predicates through experimental studies,
the conditions and sample stimuli used in each of these two experiments,2 as well as the
results of each experiment are summarized in Tables 17.1–17.4.
Table 17.1 Examples of stimuli: Experiment 2a
Condition                   Test prime        Control prime                 Target
Transparent compound prime  chaai-rixtan      davaa-xordan                  CHAAI
                            tea-to pour       medicine-to eat               tea
                            ‘to pour tea’     ‘to take medicine’            ‘tea’
Opaque compound prime       zabaan-rixtan     zur-zadan                     ZABAAN
                            tongue-to pour    strength-to hit               tongue
                            ‘to flatter’      ‘to try very hard’            ‘tongue’
Orthographic form overlap   shenaaxtan        resaandan                     SHENAA
                            ‘to recognize’    ‘to give somebody a ride’     ‘swimming’
Table 17.1 depicts the conditions in Experiment 2a, which evaluated the priming of the
nominal constituent, which is the non-head and in word-initial position in Persian
compound verbs. There are mainly three conditions, namely: transparent, opaque, and
orthographic-form overlap. (p. 421)
Table 17.2 Priming of nominal constituent in non-head/word-initial position (response times in milliseconds)
Condition                        Target mean (SE)  Target accuracy  Control mean (SE)  Control accuracy  Mean difference (priming)
Transparent compound primes      519 (5.04)        98%              559 (5.41)         96%               –40*
Opaque compound primes           513 (5.3)         98%              538 (5.59)         96%               –25
Orthographic overlapping primes  542 (6.43)        97%              568 (5.61)         95%               –26
(*) p < 0.005
(**) p < 0.001
In Table 17.2, the mean response times and the response accuracy for each condition are
presented. The mean difference in the last column is the difference between the mean
response time to targets preceded by the test prime and the mean response time to the same
targets preceded by the control prime. This difference was used to analyse the results.
Since the response time after the test prime was shorter than the response time after the
control prime in all three conditions, and because the difference is statistically
significant, we can conclude that there is a priming effect in all three conditions. In
other words, both the compound verb, regardless of its transparency, and the
pseudo-complex word, regardless of its compositionality, are decomposed into their
minimal constituents while processing.
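The arithmetic behind the last column of Table 17.2 can be reproduced directly. The sketch below copies the Target and Control mean response times from the table; the variable and function names are hypothetical, and no significance testing is attempted.

```python
# Mean response times (ms) copied from Table 17.2: (target, control).
TABLE_17_2 = {
    "transparent": (519, 559),
    "opaque": (513, 538),
    "orthographic": (542, 568),
}


def priming_effect(target_rt: int, control_rt: int) -> int:
    """Mean difference: negative values mean faster responses after the test prime."""
    return target_rt - control_rt


effects = {condition: priming_effect(*rts) for condition, rts in TABLE_17_2.items()}
print(effects)  # {'transparent': -40, 'opaque': -25, 'orthographic': -26}
```

A negative difference in every condition corresponds to the facilitation described above.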
Table 17.3 Examples of stimuli: Experiment 2b
Condition                   Test prime         Control prime             Target
Transparent compound prime  kaado-daadan       lebaas-pushidan           DAADAN
                            gift-to give       clothes-to put on         to give
                            ‘to give gifts’    ‘to put on’               ‘to give’
Opaque compound prime       del-daadan         lab-duxtan                DAADAN
                            heart-to give      lip-to sew                to give
                            ‘to fall in love’  ‘to be silent’            ‘to give’
Orthographic form overlap   xandidan           xaabaandan                DIDAN
                            ‘to laugh’         ‘to make somebody sleep’  ‘to see’
In Table 17.3, the conditions and sample stimuli of Experiment 2b are summarized. In this
experiment, the priming of the verbal constituent, which is the head and
occupies the word-final position in Persian compound verbs, is investigated.
Table 17.4 Priming of head/word-final position (response times in milliseconds)
Condition                        Target mean (SE)  Target accuracy  Control mean (SE)  Control accuracy  Mean difference (priming)
Transparent compound primes      515 (6.10)        98%              532 (6.14)         97%               –17
Opaque compound primes           507 (5.59)        98%              532 (6.15)         97%               –25
Orthographic overlapping primes  526 (6.40)        96%              551 (6.64)         93%               –25
(*) p < 0.005
Table 17.4 summarizes the results of Experiment 2b for the three prime conditions
(transparent, opaque, and orthographically overlapping). The targets were either
the (p. 422) second constituent, i.e. the verbal constituent of noun–verb compound verbs,
or a pseudo-verbal constituent within a pseudo-polymorphemic word.
The results of Experiment 2b support the results observed in Experiment 2a, as there is a
priming effect in all three conditions. The only difference in the results of the two
experiments is the numerically greater mean difference observed in the nominal
constituent targets in comparison to the mean difference observed in the verbal
constituent targets in the transparent condition.
The discrepancy in the results of the transparent condition in Experiments 2a and 2b can
be accounted for by the possibility of competing verbs that can be matched with the noun
in the transparent condition unlike the opaque and orthographically overlapping
conditions. In opaque compounds, this competition does not occur as they are fixed,
frozen expressions. In other words, in order to form a transparent compound verb with
kādo ‘gift’, one can select from among different verbs, such as gereftan ‘to get a gift’,
kardan ‘to wrap a gift’, dādan ‘to give a gift’, xaridan ‘to buy a gift’, bordan ‘to take a gift’.
Therefore, it takes the parser longer to select from among these competing alternatives.
This phenomenon results in an increase in the processing load, which is reflected in the
smaller priming effect in the verbal constituent of transparent compound verbs. However,
in order to form the opaque compound verb qasam xordan ‘to swear’, qasam ‘oath’ has
only one option; that is, xordan ‘to eat’, since opaque compounds are idiomatic and fixed
expressions. The results of this study mean that at early stages of processing, Persian
speakers seem to decompose any decomposable element.
A similar study has been done on Persian compound nouns, in this case using a masked
priming paradigm and investigating three kinds of relatedness:
1) transparent (e.g. sar-angosht-SAR ‘fingertip / HEAD’);
2) opaque (e.g. sanjāb-SANJ ‘squirrel / MEASURE’);
3) orthographic (e.g. badraqe-BADR ‘seeing off / MOON’).
The results revealed a priming effect not only in the transparent condition but also in the
opaque and orthographic conditions (Nojoumian et al. 2006).
In another study, Shabani-Jadidi (2016) investigates compound verb processing in second
language speakers of Persian, where she compares the processing of transparent and
opaque compound verbs under the masked priming paradigm. The results showed a
significant nominal priming effect in the opaque condition and a numerically stronger
nominal priming effect in the transparent condition. In other words, morphological but not
semantic relatedness (+M –S) seems to influence the priming effect, as indicated by the
significant nominal priming effect in the opaque condition. In addition, semantic and
morphological relatedness (+M +S) seem to affect priming, as revealed by the
numerically stronger nominal priming effect in the transparent condition. On the other
hand, syntactic-form relatedness does not seem to affect the priming effect, yet the
inhibitory effect in the opaque–opaque and transparent–opaque prime–target pairs could
be attributed to the increase in the processing load that semantic opacity adds to
compound verb processing. In addition, although opaque compounds are expected to be
less frequent than transparent compounds for second language speakers, the fact that
both high-frequency and low-frequency compound verbs are decomposed into their
constituting elements confirms the automatic and online decompositionality of Persian
compound verbs.
The experimental Persian studies outlined above seem to support the idea that in
morphologically rich languages, opaque words, transparent words, and pseudo-complex
words (p. 423) are parsed or decomposed to their smallest constituting elements
(Boudelaa and Marslen-Wilson 2000; Smolka et al. 2008; Shabani-Jadidi 2014; among
others).
The morphological richness of a language is measured by different linguistic and
non-linguistic factors, such as:
1) the structure of the inflectional and derivational systems;
2) the productivity of the compounding system;
3) the proportion of semantically transparent versus semantically opaque compounds
(Smolka et al. 2008).
The structure of the inflectional and derivational systems is different in different
language families. For example, both Persian and English are Indo-European languages,
and thus affixes follow or precede the stem linearly (e.g. book, book-s; ketāb, ketāb-hā),
whereas Semitic languages such as Arabic and Hebrew have their affixes intervene
within their usually three-letter stems (e.g. k-t-b, ketāb, kotob).
The productivity of the compounding system allows native speakers of the language to
easily coin novel words and equally easily be understood by other native speakers of the
language. For example, in Persian, novel compound verbs are frequently used that
include a Persian verbal constituent and a nominal constituent from another language
(e.g. kilik-kardan, ‘click-to do’, to click).
The proportion of semantically transparent compared to semantically opaque compounds
corresponds to the frequency of idiomatic expressions and their frequency of use in the
language. Idiomatic Persian compound verbs are a good example of this factor (e.g. del-
dādan, ‘heart-to give’, to fall in love, or sar-be-sar-gozāshtan, ‘head-to-head-to put’, to
tease).
2) Irregular polymorphemic words
As mentioned in the previous section, linear decomposition happens for any
decomposable morpheme or pseudo-morpheme in processing. So the question is, for
example, in the case of Persian, is shenāxtan ‘to know’ processed as shen + āxtan ‘sand-
non-word’ or as shenā+xtan ‘swimming+non-word’ or as shenāxt + an ‘past stem of to
know + infinitive suffix’? What happens in case of irregular polymorphemic words? For
example, is ‘rang’ processed as ‘ring’ + ‘past’, and ‘ring’ activated along with ‘rang’ in
processing? Is the past tense marker ‘d’ in ‘told’ stripped off in processing just like in
regular verbs, such as ‘phoned’? If so, this would imply that irregular and regular
polymorphemic words are processed similarly. Experimental studies seem to confirm
these predictions. Stockall and Marantz (2006) reported that irregular past-tense forms
prime their stems to the same extent as regular ones. Kielar et al. (2008) discovered that
suffixed irregulars (e.g. slept–sleep, kept–keep) prime their stems to the same extent as
regulars, unlike their corresponding stem-change irregulars (e.g. taught–teach, sank–
sink). Fruchter et al. (2013) discovered similar priming effects in the three conditions of
identity, regular, and irregular, yet no significant priming effect in the pseudo-irregular
condition. These findings seem to be in line with the results of priming experiments on
non-concatenative languages, like Arabic or Hebrew, which support a lexical approach to
complex words, positing that not only the root information but also the structure
information is retrieved when processing complex words.
Most of these experiments investigate the word in isolation, but what happens if the word
is observed in a context? Pollatsek et al. (2010) performed an experiment on
polymorphemic (p. 424) word decomposition by putting words in context and employing
sentence processing. This technique will be explained in section 17.3.2.4. Example (2)
below is a sample sentence they gave to their subjects. These sentences are presented
here so that the reader sees what complexities the processing system has to deal with
and still does so perfectly.
(2)
Their results showed no difference in first fixation durations or first pass reading times
(fixation and eye-movement experiments are discussed in Section 17.3.2.4). Based on
their results, they argued that initial morphological structure building is not affected by
context and that the default segmentation for unXable items is [[unX] able]. Therefore, it
seems that, regardless of the existence of context, the polymorphemic word is
decomposed into its constituents.
However, the boundary of a word’s dissection is still a controversial topic in
psycholinguistic studies. While some researchers (e.g. Pollatsek et al. 2010; among others)
argue for [[unX] able] decomposition, many others (Pylkkanen et al. 2009; Shabani-Jadidi
2014; Kazanina et al. 2008; among others) recognize an early decompositional route. In
other words, upon encountering any decomposable element, the parser dissects that
element and continues on to the next decomposable element.
As demonstrated in section 17.3.2.2 in the results from Experiments 2(a–b), in the case of
Persian, the same mechanism (that is, the early decompositional route) is at work,
although with the direction reversed as the writing system is from right to left. In fact,
this pattern seems to be analogous to the structure of polymorphemic words in Semitic
languages, like Arabic, for which decomposition in such words has also been reported: it is
claimed that while processing polymorphemic words, not only is the root activated, but also
the information regarding its plural form and its corresponding word pattern (Boudelaa and
Marslen-Wilson 2004a,b, 2005).
This would lead us to the hypothesis that there might exist a universal mechanism in the
processing of natural languages, despite the apparent discrepancies in psycholinguistic
studies when it comes to different languages.
3) Influence of non-linguistic factors on the priming effect
Apart from the linguistic factors, there are other non-linguistic factors that still influence
the priming effect. One of these non-linguistic factors is the frequency effect. This
frequency effect is the result not only of the target frequency, but also of the prime
frequency and the degree of matching between these two frequencies. Earlier in the
chapter, different theories of polymorphemic word processing, such as decompositional,
non-decompositional, and dual-access theories were discussed. On the one hand, the
proponents of the dual-route models (Baayen, Dijkstra, and Schreuder 1997; Kuperman,
Bertram, and Baayen 2008; among others) consider decomposition to occur for low-
frequency words, and whole-word recognition for high-frequency words. On the other
hand, some proponents of the dual-access theory (Hay 2001; among others) postulate that
only when the frequency of the target is higher than the frequency of the prime, is the
complex word decomposed into its constituents.
Therefore, another non-linguistic factor that affects the priming effect is the ratio
between the frequencies of the prime; that is, the polymorphemic word (surface
frequency) (p. 425) and its constituents (base frequency). The surface frequency is more
important in non-decompositional theories of polymorphemic word processing, whereas
the base frequency is more important in decompositional theories. By the same logic,
both surface and base frequencies are equally significant in dual-access theories.
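The two frequency relations discussed above can be made concrete with a small Python sketch. It is an illustration only: the function names are hypothetical, the frequency counts are invented, and the second function simply restates the relative-frequency condition attributed above to Hay (2001), in which the complex word is the prime and its constituent is the target.

```python
def surface_to_base_ratio(surface_freq: float, base_freq: float) -> float:
    """Ratio of whole-word (surface) frequency to constituent (base) frequency."""
    if base_freq <= 0:
        raise ValueError("base frequency must be positive")
    return surface_freq / base_freq


def decomposition_predicted(prime_freq: float, target_freq: float) -> bool:
    """Relative-frequency view: decompose only if the target outranks the prime."""
    return target_freq > prime_freq


# Invented counts: a complex word seen 20 times, its constituent 80 times.
print(surface_to_base_ratio(20, 80))    # 0.25
print(decomposition_predicted(20, 80))  # True
print(decomposition_predicted(80, 20))  # False
```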
The type of task employed in an experiment also seems to influence the priming effect.
For example, Balota and Chumbley (1984) investigated the word frequency effect in three
different tasks: lexical decision, pronunciation, and category verification. They observed
that the lexical decision task is affected the most by the frequency, whereas the category
verification task is affected the least.
The frequency effect has also been examined in morpho-orthographic segmentation in:
1) a high-frequency prime condition (e.g. government–GOVERN);
2) a low-frequency prime condition (e.g. concretely–CONCRETE);
3) a pseudoword prime condition (e.g. monkage–MONK).
In such cases, similar priming effects were observed for all the conditions (24 ms, 27 ms,
and 22 ms, respectively) (McCormick, Brysbaert, and Rastle 2009).
As far as studies on frequency effect in Persian compound verbs are concerned, there is
little research in this domain. One reason could be the lack of an exhaustive corpus of
Persian compounds, especially compound verbs (see Ghayoomi et al. 2010 for the
problems of corpus building in Persian). Shabani-Jadidi (2014) compared the frequency of
compound constituents through a frequency rating study, where twenty Persian native
speakers rated the frequency of compound verbs and their constituents on a seven-point
scale (1 being the lowest and 7 the highest). After discarding the prime–target pairs that
did not match in frequency level, she calculated the ratio of the prime frequency over the
target frequency and ran an ANOVA test to see if there was any significant difference
between them. No significant difference was observed between the remaining
experimental prime–target pairs (e.g. del dādan ‘heart-to give’ to fall in love, as the prime,
and dādan ‘to give’, as the target) and control prime–target pairs (e.g. lab duxtan ‘lip-to
sew’, to be silent, as the prime, and dādan ‘to give’, as the target). In addition, since in all
conditions (i.e. transparent, opaque, and orthographic), a significant priming effect was
observed, it could mean that frequency plays no role in the priming effect in Persian
compound verb processing, although more research on this topic is required before a
definitive claim can be made.
17.3.2.3 Reading-aloud experiments

Reading aloud is another technique used to determine how the mental lexicon is
structured through discovering how a word is accessed and processed. For example,
Forster and Davis (1991) hypothesize that if the prime and the target share the same
onset (e.g. custom–CARPET), the speech onset latency for the target word is
accelerated, compared to an unrelated prime–target pair (e.g. powder–CARPET). This is
called the Masked Onset Priming Effect (MOPE) and it has been reported in reading
aloud experiments in different languages (Kinoshita 2000, 2003; Malouf and Kinoshita
2007 in English; Schiller 2004, 2007, 2008 in Dutch; Grainger and Ferrand 1996 in
French; Carreiras et al. 2005; Carreiras et al. 2009; Dimitropoulou et al. 2010 in
Spanish). Some studies have reported results that suggest that the masked onset priming
effect is caused by phonological onset-overlap (p. 426) (e.g. kernel [kernəl]–CARPET
[kɑrpət]) rather than graphemic onset-overlap (e.g. circus [sırkəs]–CARPET [kɑrpət])
(Mousikou et al. 2010; Rastle and Brysbaert 2006; Schiller 2007; among others).
However, there are some other studies that consider a dual route for phonological and
graphemic onset-overlap (Coltheart et al. 2001; Mousikou et al. 2010; among others). The
Dual-Route Cascaded (DRC) hypothesis considers the masked onset priming effect to
reflect the serial process of converting graphemes into phonemes (GPC), first taking the
non-lexical route (where the graphemes are converted into phonemes one by one) and
then the lexical route (where the phonology of a word is retrieved as a whole).
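The contrast between phonological and graphemic onset-overlap can be illustrated with the three words cited above. The sketch below is a simplification under stated assumptions: the transcriptions are copied as they appear in the text, the onset is reduced to the first letter or first transcribed segment, and the helper names are hypothetical.

```python
# Transcriptions copied from the examples in the text.
TRANSCRIPTION = {
    "kernel": "kernəl",
    "circus": "sırkəs",
    "carpet": "kɑrpət",
}


def graphemic_onset_match(prime: str, target: str) -> bool:
    """True if prime and target start with the same letter."""
    return prime[0].lower() == target[0].lower()


def phonological_onset_match(prime: str, target: str) -> bool:
    """True if prime and target start with the same transcribed segment."""
    return TRANSCRIPTION[prime.lower()][0] == TRANSCRIPTION[target.lower()][0]


for prime in ("kernel", "circus"):
    print(
        prime,
        "graphemic:", graphemic_onset_match(prime, "CARPET"),
        "phonological:", phonological_onset_match(prime, "CARPET"),
    )
```

The two primes dissociate the overlap types: kernel shares only the initial sound with CARPET, whereas circus shares only the initial letter.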
Therefore, the lexical route is analogous to the holistic route in complex word processing,
whereas the non-lexical route is similar to the decompositional route of polymorphemic
word processing. However, at early stages of processing, there is constantly a race
between these two routes, with both of them being activated simultaneously (Forster and
Davis 1991; Schiller 2004; among others). Interestingly, Forster and Davis (1991) found
that in words that have irregular pronunciations (e.g. pint), the non-lexical route is
ignored while the lexical route is activated (see also Mousikou et
al. 2010; among others). This is in line with the results of priming experiments on
idiomatic expressions, where the whole-word route wins over the decompositional route
in processing (Gibbs et al. 1989; among others).
The masked onset priming effect is assumed to reflect only the non-lexical route, as it
deals with the speech onset latency for the target word in prime–target pairs (Coltheart
et al. 2001; Mousikou et al. 2010; among others). However, there is evidence that
supports both routes being activated simultaneously. This evidence comes from reading
aloud studies that involve reading regular (e.g. mint, hint, tint) and irregular words (e.g.
pint), where the regular words are read faster during a conditional naming task. In this
task, the subjects are required to read aloud the words but not the non-words. The slower
response latencies for irregular words are accounted for by competing pronunciations
retrieved from both routes (Kinoshita and Woollams 2002). This is in line with the
competing alternative effect, observed in transparent Persian compound verbs (as
discussed in section 17.3.2.2), where the competing verbal elements for the nominal
element slow down the processing of such verbs, in comparison to the processing of
opaque compound verbs (Shabani-Jadidi 2014).
Timmer et al. (2012) investigate whether native Persian speakers read aloud transparent
words (i.e. words containing long vowels, which are written), but not opaque words (i.e.
words containing short vowels, which are not written), faster when preceded by
phonologically similar, onset-matching primes (e.g. respectively, sāl ‘year’–SOT ‘voice’;
SOLH ‘peace’) compared to phonologically dissimilar, onset-mismatching primes (e.g.
respectively, tāb ‘swing’–SOT ‘voice’; SOLH ‘peace’). They observed that the subjects read
the phonologically matching prime–target pairs faster than their mismatching
counterparts in the transparent Persian words; whereas in the opaque Persian words they
found no priming effect, in line with reading aloud studies on other Indo-European
languages (Forster and Davis 1991; Carreiras et al. 2009; Mousikou et al. 2010; among
others). Although no priming effect was observed in the opaque Persian words using the
reading aloud technique, through using an early ERP (Event-Related Potentials) time
window (i.e. 80–160 ms), Timmer et al. (2012) reported an identical masked onset priming
effect for both transparent and opaque Persian words, in line with some other studies
that support the dual-route cascaded model (Coltheart et al. 2001; Schiller 2008;
Mousikou et al. 2010; among others).
(p. 427) 17.3.2.4 Sentence-processing techniques

The assumption behind sentence-processing research techniques is that we have to parse
a sentence in order to understand it. In other words, we understand a sentence by
processing the meaning and structure of its component parts. There are mainly three
types of sentence-processing experiments: timed-reading experiments, eye-movement
experiments, and brain activity experiments. Below, each of these experiments is briefly
explained.
1) Timed-reading experiments
Studies on sentence processing mostly involve timed-reading experiments. The basic
assumption of these experiments is that the more difficult a sentence is, the longer it
takes to be parsed. Therefore, one way to assess the difficulty level of a sentence is to
time it when it is being processed. There are mainly two types of timed-reading
experiments.
In the bar-pressing paradigm, the sentence is presented word by word on the screen;
the subjects thus read the sentence one word at a time and press the space bar on the
computer to indicate they have processed the word.
In the moving-window paradigm, the sentence appears on the screen with all the words
obscured by dashes, only to be revealed, word-by-word, upon the pressing of the space
bar. Each time the space bar is pressed, the previous word is replaced by dashes and the
next word appears on the screen.
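The sequence of displays in the moving-window paradigm can be sketched as follows. This is a minimal illustration, assuming that each word is obscured by as many dashes as it has letters; the function name moving_window_frames is hypothetical, and key presses and timing are not modelled.

```python
def moving_window_frames(sentence: str) -> list[str]:
    """Return the successive displays of a moving-window trial.

    The first frame shows every word as dashes; each later frame reveals
    exactly one word while all other words stay masked.
    """
    words = sentence.split()
    masked = ["-" * len(word) for word in words]
    frames = [" ".join(masked)]  # initial display, before any key press
    for index, word in enumerate(words):
        frame = masked.copy()
        frame[index] = word      # reveal the current word only
        frames.append(" ".join(frame))
    return frames


for frame in moving_window_frames("the dog barked"):
    print(frame)
```

In an actual experiment, the interval between two successive key presses would be recorded as the reading time for the word revealed in between.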
In both of these paradigms, the semantic load of the word determines the processing
time. In general, the content words (i.e. nouns, verbs, adjectives, adverbs, etc.) take
longer to process than function words (i.e. prepositions, demonstratives, pronouns,
articles, etc.). In addition, in both of these paradigms, the parser seems to be sensitive to
the syntactic structure, as the subjects pause at the end of clause boundaries. What can
be deduced from timed-reading experiments is that the length of time that it takes to
process a sentence informs us about aspects of its syntactic structure and the semantics
of that sentence.
Raghibdoost and Mehrabi (2010) investigate Persian verb processing during sentence
comprehension. They are interested in whether or not syntactic and semantic
complexities of transitive and intransitive Persian verbs affect real-time sentence
processing. The method used was a cross-modal lexical task to examine online sentence
processing. The results indicated that verb processing time during sentence
comprehension is affected not only by verb type (i.e. transitive versus
intransitive) but also by sentence syntactic structure (i.e. simple versus complex). These
findings confirm that the semantic and syntactic structure of a sentence influences the
processing time in timed-reading experiments.
2) Eye-movement experiments
Eye-movement experiments require more sophisticated instruments, as they track the eye
movements of the subjects (called saccades) while they are reading a sentence. Most of
these experiments reveal that subjects tend to fixate on semantically loaded or content
words. At the end of the clause boundary, if the subject notices that something has not
been parsed correctly, their eyes move backwards in the sentence to reconfigure the parse.
Therefore, sentences that contain semantically anomalous parts (e.g. I went to the library
and swam) or syntactically complex parts (e.g. The dog walked around the park was tired)
create many backward eye movements. In eye-movement experiments, the assumption is
that eye movement reflects processing; therefore, the greater the number of long
fixations and backward movements is, the more difficult a sentence is.
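The two indicators mentioned above, long fixations and backward movements, can be counted from a fixation record as in the following sketch. The data format (a list of word positions with fixation durations), the 300 ms cut-off for a 'long' fixation, and the sample record are all assumptions made for illustration; they are not taken from any study discussed in this chapter.

```python
def difficulty_indicators(fixations, long_ms=300):
    """Count long fixations and backward movements in one reading trial.

    fixations: list of (word_index, duration_ms) pairs in order of occurrence.
    long_ms:   hypothetical cut-off above which a fixation counts as long.
    """
    long_fixations = sum(1 for _, duration in fixations if duration >= long_ms)
    # A backward movement is a move to a word earlier in the sentence.
    regressions = sum(
        1
        for (previous, _), (current, _) in zip(fixations, fixations[1:])
        if current < previous
    )
    return {"long_fixations": long_fixations, "regressions": regressions}


# Invented record for an eight-word sentence (word indices 0 to 7).
trial = [(0, 180), (1, 200), (2, 190), (3, 170), (4, 150),
         (5, 210), (6, 420), (2, 350), (6, 260), (7, 230)]
print(difficulty_indicators(trial))
```

On this reading, a sentence that yields higher counts on both measures would be taken to be harder to process.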
(p. 428) 3) Brain activity (ERP)
ERP stands for Event-Related Potentials, and it is used in sentence-processing
experiments to measure the brain’s activity while reading a sentence. Just like
timed-reading experiments and eye-movement experiments, ERP experiments are sensitive to
semantically anomalous structures and syntactically infelicitous sentences. There is a
negative voltage change approximately 400 milliseconds after stimulus onset (N400) when
reading semantically anomalous structures (e.g. The book was too hard to eat), while there
is a positive voltage change at around 600 milliseconds (P600) when reading a syntactically
infelicitous sentence (e.g. I of is school to).
What brain activity experiments tell us is that sentence processing happens immediately
and online, rather than when the parser finishes the sentence and has time to rethink it.
This observation is in line with priming experiments on morphologically complex words,
where processing is reported to be online and immediate (Longtin et al. 2003; Shabani-
Jadidi 2014; among others).
One of the few studies on Persian using ERP is that of Timmer et al. (2012), explained in
section 17.3.2.3. Although a masked onset priming effect was observed only for the
transparent Persian words, not the opaque ones, and only in the phoneme-matching
condition, not in the phoneme-mismatching condition, they reported an identical masked
onset priming effect for both transparent and opaque Persian words in an early ERP time
window, i.e. 80–160 ms. This effect continued into a later ERP time window, i.e.
300–480 ms, but only for the transparent words, not for the opaque words.
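Effects in an ERP time window are commonly summarized as the mean voltage within that window. The sketch below shows the arithmetic for the two windows mentioned above (80–160 ms and 300–480 ms). It does not reproduce the analysis of Timmer et al. (2012): the function name, the 100 Hz sampling rate, and the waveform are invented for illustration.

```python
def mean_amplitude(samples_uv, sampling_rate_hz, window_ms):
    """Mean voltage (in microvolts) within a time window after stimulus onset.

    samples_uv:       voltages, with samples_uv[0] recorded at stimulus onset.
    sampling_rate_hz: number of samples per second.
    window_ms:        (start, end) of the window in milliseconds, end exclusive.
    """
    start_ms, end_ms = window_ms
    start = int(start_ms * sampling_rate_hz / 1000)
    end = int(end_ms * sampling_rate_hz / 1000)
    window = samples_uv[start:end]
    return sum(window) / len(window)


EARLY_WINDOW = (80, 160)    # ms, as mentioned in the text
LATE_WINDOW = (300, 480)    # ms, as mentioned in the text

# Invented difference wave (one condition minus the other), 500 ms at 100 Hz:
# a 1 microvolt deflection between 80 and 160 ms, flat elsewhere.
difference_wave = [0.0] * 8 + [1.0] * 8 + [0.0] * 34

print(mean_amplitude(difference_wave, 100, EARLY_WINDOW))
print(mean_amplitude(difference_wave, 100, LATE_WINDOW))
```

A non-zero mean in the early window together with a zero mean in the late window would correspond to an effect that is present early but does not continue.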
17.4 Theories of language processing

In this section, the prevailing theories of language processing are presented. The
first distinction to be made is the direction of processing, whether it is top-down or
bottom-up. Then we will discuss idiomatic expression processing and syntactic
processing.
17.4.1 Top-down versus bottom-up processing
In top-down processing we interpret a sentence immediately and automatically on the
basis of the information available to us. In other words, we do not wait until the end of a
sentence to put all the words together to figure out the meaning of the whole sentence,
but rather we process the sentence as we go. That is why, when reading a sentence, we
sometimes guess what comes next or read what we expect to come next, not always
accurately. That is also why, when we read a sentence with function words missing, we
can still understand the sentence perfectly (e.g. Marc went library borrow book ordered).
This is true at the word level too: our parser is capable of reading words with the vowels
missing, provided that the vowels are not meaning-distinctive (e.g. blckbrd).
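The word-level case can be illustrated with a toy lookup that matches a vowelless string against the consonant skeletons of known words. This is only a sketch of the idea, not a model of the human parser; the function names and the five-word lexicon are hypothetical.

```python
VOWELS = set("aeiou")


def skeleton(word):
    """Return the word with its vowel letters removed."""
    return "".join(ch for ch in word.lower() if ch not in VOWELS)


def candidates(vowelless, lexicon):
    """Return every lexicon entry whose consonant skeleton matches the input."""
    return [word for word in lexicon if skeleton(word) == vowelless.lower()]


toy_lexicon = ["blackbird", "bluebird", "library", "bed", "bad"]

print(candidates("blckbrd", toy_lexicon))   # a single match in this lexicon
print(candidates("bd", toy_lexicon))        # two matches: vowels are distinctive
```

When only one entry matches, the missing vowels carry no distinctive information and the word can be recovered; when several entries match, as with bd, the vowels are meaning-distinctive and the string alone is not enough.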
Bottom-up processing entails performing an analysis to isolate phonemes or letters,
morphemes, words, phrases, and clauses and ultimately relate all these bits to the mental
lexicon. Therefore, there is neither prediction nor any forward projection involved.
(p. 429) 17.4.2 Processing idiomatic expressions
Idiomatic expressions, or idioms, are phrases situated on a continuum between
completely compositional (transparent) and completely idiomatic (opaque) expressions.
Psycholinguists are interested in studying idiomatic expressions as the results will
indicate which direction is taken by the parser when processing an ambiguous
expression. In addition, the study of idiomatic expressions will reveal interesting facts
about the syntactic versus semantic relationships, as the non-literal status of these
expressions may bring about dissociations between these two levels of relationship.
Processing idiomatic expressions can be viewed from two distinct approaches, namely
bottom-up and top-down. If idiom processing is bottom-up, then the syntactic parser will
process the idiom literally. However, if the processing of idioms is top-down, then the
semantic parser will cancel the syntactic input in order to produce the figurative meaning
of the idiom. Therefore, the processing system will be involved in bottom-up and top-
down processing simultaneously.
There are many studies in the psycholinguistic literature on the processing of idiomatic
expressions. For instance, Swinney and Cutler (1979) conducted two Phrase Classification
experiments to examine the nature of access, storage, and comprehension of idiomatic
phrases. They observed that, since classification times were significantly faster for idioms
(e.g. see the light) than for their matched non-idiom control phrases (e.g. get the light),
their data supported a Lexical Representation Hypothesis for the processing of idioms.
The Lexical Representation Hypothesis holds that idioms are stored in and accessed from
the lexicon in the same manner as any other word and that, upon the occurrence of the first word
in the idiom string, both the idiomatic and the literal meanings of the idiomatic phrase
are activated (Foss and Jenkins 1973).
The opposite view to the Lexical Representation Hypothesis is called the Idiom List
Hypothesis. This latter hypothesis holds that idioms are stored in and accessed from a
special list that is not part of the normal lexicon. Therefore, an idiom mode of processing
is activated when processing idioms. An important condition in the Idiom List Hypothesis
is that a literal analysis always precedes an idiomatic one while processing.
Another study that investigated different levels of ambiguity in idiomatic expressions is
that of Gibbs et al. (1989), who examined three kinds of idioms (literal, such as to pop the
question; semi-idiomatic, such as to carry a torch; and idiomatic, such as to chew the fat)
in order to investigate idiom processing. They reported slower processing for idiomatic
phrases, which were non-compositional, in comparison to literal phrases, which were
compositional. Therefore, it seems that the degree of transparency/opacity of idiomatic
expressions has a role in the choice of top-down/bottom-up routes of processing. In
addition, different languages are likely to be compatible with different approaches to
processing idiomatic expressions, as the frequency of these expressions in a language as
well as the productivity of forming such expressions in a language may well play a role in
how these expressions are processed.
Studies on Persian idioms are quite rare. Sadat Safavi (2013) compared structural and
semantic processing of Persian idioms and reported that processing of idioms seems to be
more demanding than the processing of non-idioms and that both hemispheres of the
brain are equally involved in idiom processing. It would be interesting to compare
different kinds of idioms in Persian, both structurally different ones (e.g. idiomatic
compound verbs, (p. 430) idiomatic compound nouns, or proverbs) and semantically
different ones (e.g. fully idiomatic, semi-idiomatic and non-idiomatic). These topics are
open for further research, as the field is quite a young one in Persian.
17.4.3 Syntactic processing
Syntactic parsing posits that the grammar of the language determines the order in which
elements of a sentence are processed. It is interesting to note that some grammatically
complex sentences are easy to parse (e.g. Mary ate the apple pie that her grandmother
made especially for her), while some grammatically easy sentences are hard to parse (e.g.
The dog walked around the park barked).
The latter type of structure is called a garden-path sentence. In such cases, researchers
see longer fixations, recalculations, backward eye movements, etc. For the parser to
make a correct interpretation, a bottom-up processing route is not useful, as it will lead to
the wrong syntactic interpretation of (S + V + PP). To reach the correct syntactic
interpretation of (S + modifying phrase structure + V), the parser must take the top-
down processing route. However, the results of experimental studies on garden-path
sentences reveal that both routes are taken, first the bottom-up and then the top-down.
To understand the peculiar case of garden-path sentences, look at the following example:
(3)
This is a garden-path sentence because only when we read it to the end, do we realize
that man is the verb and not a part of the NP. However, the parser preference is to parse
it quickly as a part of the NP rather than the beginning of the VP. The word on which
fixation happens is the second the, which is the disambiguating word and which makes
our eyes move backwards in order to recalculate the sentence. Therefore, the bottom-up
strategy and immediate parsing lead us to an incorrect calculation of the structure of this
sentence. We will then recalculate it by considering the top-down parsing strategy and
considering man as the VP rather than as part of the NP.
In (3), the garden-path sentence is within one simple sentence. However, sometimes it is
the absence of punctuation marks that transforms a sentence into a garden-path, and one
can disambiguate the sentence by placing the appropriate punctuation. Note the
following examples:
(4)
The disambiguating word in (4a) is the verb ran, and in (4b) it is the verb spit. However,
in both cases, placing a comma will eliminate the garden-path miscalculations, as
illustrated below.
(5)
(p. 431)
Another way to eliminate garden-path miscalculations is to switch the order of the main
and subordinate clause, as illustrated below.
(6)
In (5a–b) as well as (6a–b), the parser can arrive at the correct interpretation by taking
the bottom-up route, whereas in (3) as well as (4a–b), the parser needs to take both
routes simultaneously in order to elicit the correct interpretation of the sentence. In other
words, the incorrect proposition must be inhibited in order for the parser to get to the
correct proposition, and this is costly and leads to longer reaction time or response
latency (Ferreira et al. 2001).
Comparable examples in Persian include sentences containing nouns that could be read
as the subject or as the object. As in English, if punctuation is added, the sentence will be
disambiguated.
(7)
In (7), the students are addressed, while the subject is professor. Yet, because students is
used in the initial position, the reader has to wait until the verb to disambiguate the
sentence and recognize that professor is the subject rather than students, as the verb is
singular and in Persian, we observe subject–verb agreement. Of course, if there were a
comma after students, the sentence would be disambiguated. Note that in Persian, the
word dāneshjuyān would need an ezafe for the sentence to be garden-path; however, as
the word ends in a consonant, the ezafe is not written and only read, resulting in the
garden-path.
Besides the lack of punctuation, structural similarities can also make a Persian sentence
ambiguous. For example, in (8), suzānde shode
‘burnt’ can be read as an adjective for kotob (which will be read with an ezafe) or a
present perfect verb. Until the reader gets to the main verb bud ‘was’, both of these
readings are possible, which results in a garden-path sentence.
(8)
Marefat and Arabmofrad (2008) studied garden-path sentences and their processing in
Persian. Although their results did not show any effect in the grammaticality judgement
times, they indicated that garden-path sentences increase the processing load. In fact,
the farther the disambiguating word is from the verb, the greater the processing load will
be.
17.5 Aphasic studies

Part of the evidence for psycholinguistics comes from clinical studies on aphasic patients,
that is, patients who have lost some linguistic ability due to brain injuries. For more
(p. 432) information about studies on aphasia in Persian and in other languages, refer to
Chapter 18. The errors made by patients with aphasia can help researchers understand
the structure of the mental lexicon and how it is processed. For example, some studies
investigated the knowledge of compounds in the mental lexicon, revealing that patients
substituted simple words for simple words and compound words for compound words.
Therefore, it seems as if knowledge of the compound as well as knowledge of the
compound structure must be a part of the lexicon independent of the ability to identify or
produce it (Semenza et al. 1992, 1997). Similarly, other studies showed that aphasic
patients tend to replace noun–noun compounds with well-formed noun–noun neologisms
and verb–noun compounds with well-formed verb–noun neologisms (see Chapter 13 for
more information on neologisms). Therefore, the knowledge of the compound structure
and knowledge of word-building rules also seem to be a part of the lexicon (Hittmair-
Delazer et al. 1994; Semenza et al. 1997; among others).
Delazer and Semenza (1998) studied the processing of Italian compound words and
different levels of processing based on observations of a patient with aphasia who had
naming difficulties only in compound words. They observed that the patient made
frequent errors in substituting semantically adequate constituents for the correct ones,
and they attributed this to the separate processing of constituents at the lemma level,
where processing is done subconsciously. Lemmas are activated when their semantic
conditions are met, which then activate their corresponding syntactic specifications.
The errors made by Delazer and Semenza’s (1998) subject tended to share systematic
characteristics, in that they all possessed a compound structure and they were
semantically adequate substitutions of the target. In addition, they all respected Italian
word-building rules. Consider, for example:
(9)
On the basis of this observation, Delazer and Semenza (1998) proposed that the
compound structure as well as the position of each constituent is specified before lemmas
are accessed. Since the patient substituted both first and second constituents, they
concluded that both constituents are activated simultaneously and in parallel.
Delazer and Semenza (1998) did not report any frequency or headedness effect. They
reported that in half of the cases, the paraphasias contained one part of the target while
the first and second constituents were often equally retained (thirteen times for the first
part and fourteen times for the second part). The retained constituents mostly remained
in their original position.
However, Rochford and Williams (1965), who examined the relative surface-base
frequencies using a naming task with English-speaking aphasic patients, reported a
frequency effect. They classified the stimuli as high–high, high–low, low–low, and low–high
frequency combinations. They reported that the frequency of the first constituent
determines the success in retrieving the complex word. (p. 433)
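The four-way classification of stimuli can be made concrete with a short sketch. The function name, the threshold, and the frequency counts below are hypothetical and serve only to show how a two-constituent item would be assigned to one of the four frequency combinations; they are not taken from Rochford and Williams (1965).

```python
def frequency_combination(first_freq, second_freq, threshold):
    """Label a two-constituent word by the frequency of each constituent.

    A constituent counts as 'high' when its frequency reaches the threshold
    and as 'low' otherwise. The threshold is an assumed, study-specific value.
    """
    def label(freq):
        return "high" if freq >= threshold else "low"

    return f"{label(first_freq)}-{label(second_freq)}"


# Invented frequency counts (occurrences per million words).
print(frequency_combination(120, 8, threshold=50))
print(frequency_combination(8, 120, threshold=50))
```

Under the frequency effect described above, items whose label begins with 'high' would be expected to be retrieved more successfully than items whose label begins with 'low'.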
Turning now to aphasic studies on Persian, Nilipour (2000) studied the grammar of two
right-handed monolingual native speakers of Persian, who became aphasic due to
traumatic left brain damage. The tasks tested both writing and speech modalities. Both
patients used simple syntax with little variation. Most of the utterances lacked the lexical
verb or erroneously replaced the verbs with the ‘filler’ verb ast ‘is.’ In addition, there was
a tendency to use the present rather than the past tense, despite it being longer in form.
Nilipour (1989) reported verb impairment, shorter phrases, slower speech rate, reduced
syntax, omission of free grammatical morphemes, and omission of the object marker ra
(in two aphasic patients) in the paraphasias he observed.
When comparing the paraphasic evidence in Persian and that in other languages, such as
Italian (e.g. Delazer and Semenza 1998), we find that the neologisms tend to respect
the word-building rules of the respective languages. In addition, the knowledge of the
compound seems to be retained in paraphasias. That is why in Persian, the aphasic
patients tend to keep the nominal constituent and replace the verbal constituent with
another verb, in this case, the most frequently used verb. This is in line with the
frequency effect reported by Rochford and Williams (1965) discussed above.
Later, Nilipour and Raghibdoust (2001) investigated linguistic deficits in seven aphasic
native speakers of Persian. Their aim was to discover the general characteristics of the
agrammatic language in Persian as well as to provide a general pattern for aphasic
deficits in different types of aphasia in Persian. They describe every one of the seven
patients and give an account of their paraphasias. Overall, the results observed in these
seven patients revealed interruption in the verb, deletion of free grammatical
morphemes, substitution of bound grammatical morphemes, and reliance on nouns rather
than verbs, for example:
(10)
The prominence of the nominal constituent in compound verbs in Nilipour and
Raghibdoust (2001) supports the results of compound verb processing reported by
Shabani-Jadidi (2014), where the nominal constituent showed a stronger priming effect
than the verbal constituent.
In a later study, Ghonchepour and Raghibdoust (2012) investigated the processing of
compound nouns in transcortical aphasics. They report that their subjects tend to
decompose the compound noun into its constituents. Their evidence comes from the
paraphasic substitutions, such as nāxon ‘nail’ for nāxon-gir ‘nail clipper’ at the word level,
and margush for xargush ‘rabbit’ at the segment level. The linear segmentation of
compounds in Ghonchepour and Raghibdoust’s (2012) study is in line with Nojoumian et
al.’s (2006) study on compound noun processing and Shabani-Jadidi’s (2014) study on
compound verb processing.
(p. 434) These studies on aphasic patients together with priming, eye-movement, and
brain activity experiments provide us with some bits and pieces of the whole picture, that
is, the structure of the mental lexicon and its contents.
17.6 Conclusion
This chapter has provided an overview of psycholinguistics, that is, the psychology of
language, and more particularly the psycholinguistics of Persian. After defining the main
tenets of the subject, the longstanding debates engaged in by experts were introduced,
and different experimental tools used in the field were presented. The results of a number
of studies were discussed, with special focus on the few psycholinguistic studies on
Persian language speakers that appear in the experimental research literature. The
results of experimental studies on the Persian language provide linguists who are
working on the Persian language with some means to evaluate and validate their findings.
For example, Samvelian and Faghiri’s (2014) partially compositional approach to Persian
complex predicates and Folli et al.’s (2005) and Megerdoomian’s (2001, 2012a) fully
compositional approach to Persian complex predicates can benefit from the findings of
the masked priming experiments reported in this chapter (e.g. Shabani-Jadidi 2014):
since masked priming experiments tap into the subconscious, the results elicited
from them demonstrate automatic online decomposition. This automaticity could be
evidence that the formation of complex predicates is a priori and takes place at the
subconscious level before the complex predicate is produced. More experimental research in the field is
needed to reach a definitive conclusion, and the present chapter is an attempt to pave the
way for similar studies to be done.
Moreover, studies discussed in this chapter and their corresponding findings provide
psycholinguists working on other languages with a comparison between their findings
and the findings of studies on yet another language. These results will contribute to our
understanding of the mental lexicon and how it is structured. Through different tasks and
experiments, different subject populations, and different languages, we gain access to a
more universal understanding of the mental lexicon and natural language processing.
The author of this chapter has thus aimed to give a brief presentation of a several-
decades-old attempt to shed light on the mental lexicon. There are various linguistic
theories that are yet to be validated, and experimental studies can provide the means to
this end. Such studies can also benefit more applied fields of linguistics, such as first- and
second-language acquisition and second-language pedagogy, as they reflect the mental
processes involved in language acquisition and production. Through scientific methods of
data elicitation in psycholinguistics, linguists of different disciplines can test and validate
their theoretical assumptions and provide more reliable arguments.
It is to be hoped that, with a clearer view of the discipline, readers and especially the
younger generation of scholars of Persian linguistics will pursue further the topics of
psycholinguistic studies outlined in this chapter. Psycholinguistics in Persian is an under-
studied discipline, and future research on this topic will be a great contribution not only
to the field of Persian psycholinguistics, but also more generally to the field of Persian
linguistics, and linguistics as a whole.
Notes:
(1) Some of what is discussed in this chapter is based on the author’s older work, mainly
Shabani-Jadidi (2014). More general discussions and further experimental studies on
Persian are added here in an attempt to present the current state of the art in Persian
psycholinguistic research.
(2) Shabani-Jadidi (2014).


Neurolinguistics

Neurolinguistics
Reza Nilipour

Subject: Linguistics, Language and Cognition, Languages by Region
This chapter summarizes some of the first neurolinguistic studies conducted in Persian, using
patholinguistic data taken from monolingual and bilingual brain-damaged patients, as
well as the first five neuroimaging studies in healthy native speakers of Persian. The
patholinguistic data are extracted from formal clinical linguistic assessments of a
heterogeneous group of brain-damaged patients with different etiologies and brain lesion
sites. The data are indicative of general agrammatic features of ‘syntactic simplification’
and ‘morphological regression’ reported in cross-language studies, along with language-
particular agrammatic features in spoken and written modalities for Persian consequent
to brain damage. The present patholinguistic data are also suggestive of a ‘non-unitary’
model of aphasia as a symptom complex phenomenon with disruptions of independent
linguistic levels consequent to different lesion sites. The data are not supportive of
independent production and comprehension language centres claimed in ‘classical model’
of brain and language but in support of new non-narrow localization brain–language
models.
Keywords: neurolinguistics, clinical linguistic, Persian, language neuroimaging, aphasia, morphological

regression, syntactic simplification
18.1 Introduction
IF theoretical linguistics is mostly concerned with the nature of the structure and
functioning of the language system, the major concern of neurolinguistics and cognitive
neuroscience is to seek out the neural basis of organization of human language and
mechanisms of language processing in the healthy brain based on experimental evidence.
In classical aphasiological studies, the major concern was about the location of language
processing in the brain. In other words, they were mostly concerned with ‘where’
questions which led to the ‘classical model’ of brain and language and independent
production and comprehension centres (see Geschwind 1970). But in cognitive
neuroscience, new lesion studies and brain-imaging studies have revealed much
important information relevant to answering ‘where’, ‘what’, ‘when’ and ‘why’ questions
about the mechanisms of language processing which may shed more light on the neural
basis of language. The viable patholinguistic data taken from systematic cross-linguistic
studies based on normative data and lesion studies as well as experimental brain imaging
studies on language in healthy and damaged brain have revealed much important
information relevant to answering some of the questions pertinent to language
representation and localization in the brain (Pulvermüller 2014; Poeppel et al. 2012).
This chapter is intended to discuss some current neurolinguistics research from two
major subfields of neurolinguistic studies on monolingual and bilingual speakers of
Persian. One major subfield is patholinguistic studies in brain-damaged speakers with
emphasis on documented systematic aphasiological data based on the available formal
clinical linguistic batteries. The second group of data comes from the first homemade
experimental fMRI studies on healthy monolingual and bilingual speakers of Persian.
The main focus of the chapter will be on sorting out language-particular patholinguistic
findings from general patterns of impairments in native speakers of Persian following
brain damage in order to provide a more unified perspective on the issue of brain–
language relation. Since Iran is an example of a multilingual country, reference will also
be made to the patholinguistic bilingual brain data based on documented findings related
to two major Iranian bilingual populations speaking Azari or Kurdish and one Persian
dialect based on reports using the same validated clinical linguistic batteries in Iran (see
Tables 18.1 and 18.2). The second source of Persian neurolinguistic data will be based on
(p. 436) some available neurodynamic studies using brain imaging techniques on healthy
monolingual and bilingual Persian-speaking individuals (see Table 18.4).
18.2 Clinical linguistic batteries
A great deal of what we know today about the structure and functioning of the language
system in the brain has come from language assessment in clinical settings. An important
step in neurolinguistic studies in monolingual as well as bilingual patients is to collect
verifiable data for the purposes of diagnosis, therapy procedures and research (Paradis
and Libben 1987: 18). There are several sources of viable clinical linguistic data:
• comprehensive clinical linguistic batteries
• clinical linguistic specific tasks
• connected speech samples
• clinical linguistic screening batteries.
Since clinical linguistic data come from brain-damaged patients, the first step in
making any reasonable interpretation of the data is to have solid evidence about
the validity of the obtained clinical data as compared with the performance of healthy
normal speakers of each language.
For research purposes, the results of a valid clinical linguistic assessment allow
researchers to correlate the patient’s language impairments with the lesion site to make
a viable reference to a brain–language model. The data can also be used to correlate the
patient’s language impairments with the performance of other patients of the same
language with different lesion sites in monolinguals or with the pattern of recovery of
bilingual patients with respect to various neurological and pathological factors involved
in bilingualism. The results obtained from valid clinical linguistic batteries should
eventually give some indications about the organization of one or two languages in the
same brain.
Hence, what is of interest for pure research into the neurofunctional organization of the
monolingual or bilingual brain is also useful to the clinician for diagnostic purposes and
to the therapist for prescribing or evaluating the most efficient course of treatment.
A valid clinical linguistic assessment will define the patient’s language deficits and ability
to communicate in an everyday situation. Therefore, age, gender, culture, spoken
language, written language, orthography, and education have all been shown to affect the
quality of the data as compared with the normative data taken from healthy subjects in
each language. If the confounding factors in collecting normative data in clinical
linguistic assessment are observed, neurolinguistic assessment, for the most part, can be
implemented as a tool to localize lesions associated with language deficits.
Based on these premises a valid clinical linguistic battery for a specific language cannot
be a direct translation of the existing batteries of a Western language but needs
necessary linguistic and cultural transpositions in order to make it as equivalent as
possible to the target language (Stemmer and Whitaker 1998: 74).
Bearing in mind these neurolinguistic considerations about the quality of the data taken
from clinical linguistic batteries, a short introduction will be devoted to the development
of (p. 437) the major clinical linguistic batteries which have been implemented to collect
patholinguistic data from monolingual and bilingual speakers of Persian.
18.2.1 Comprehensive clinical linguistic batteries
The history of documented neurolinguistic studies in Iran can be traced back to the
development of the first clinical linguistic batteries and their applications in the 1980s, when
the development of the Persian version of the BAT (the Bilingual Aphasia Test) was initiated
through an international multilingual aphasiological research project (Paradis and Libben
1987). In fact, it was through this initial multilingual neurolinguistic research project that
the first clinical and experimental clinical linguistic battery was developed and normed
for Persian aphasiological studies in Iran. Following the development of the Farsi version
of the BAT (Paradis et al. 1987), two more versions of the BAT were developed and
normed on healthy bilingual speakers of Azari (Paradis et al. 1987) and Kurdish (Paradis
et al. 1989), the two major bilingual populations living in Iran (see Table 18.1).
Since the development of the three versions of the BAT for the Iranian aphasic
population, many Persian-speaking monolingual or bilingual patients with aphasia have
been assessed in clinical settings for research and therapeutic purposes, and some of
these cases have been reported at international conferences or published in Iranian or
international journals (e.g. Nilipour 1988, 1989, 2000; Nilipour and Ashayeri 1989). The
BAT has also been implemented as a valid research tool in some doctoral dissertations (e.g.
Raghibdoust 1999) and master’s theses (e.g. Rezaei 2008) in different academic institutions
outside and inside of Iran.
Following the development of the three versions of the BAT, the second comprehensive Persian
clinical linguistic battery, the Persian Aphasia Battery (PAB), was initiated as a local
research project for further aphasiological studies (Nilipour 1994, 2010). The general
structure of the clinical linguistic skills and subtests of the PAB was based on the format of the
Boston Diagnostic Aphasia Examination (BDAE) (Goodglass and Kaplan 1972), but the
scoring system and profiling were based on the BAT (+, –, and 0 for correct, incorrect,
and unanswered questions). The connected speech sample was also based on the Bird Nest
picture story of the BAT, with the pictures rearranged to run from
right to left for Persian speakers. After the necessary linguistic and cultural transpositions of
the first draft of the PAB, the battery was normed on thirty healthy adult native speakers
of Persian. The PAB has recently been revised based on the results from fifty-seven aphasic
patients with different lesion sites and has been a clinical linguistic tool for clinicians and
researchers in assessing language impairments of Persian-speaking aphasic patients
since 1994.
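The three-way scoring convention described above (+, –, and 0) lends itself to a simple per-subtest tally. The following is a minimal Python sketch under stated assumptions, not the scoring procedure of the PAB or the BAT: the function name `tally_profile`, the subtest labels, and the responses are all hypothetical, and an ASCII hyphen stands in for the minus sign.

```python
from collections import Counter


def tally_profile(responses):
    """Tally '+', '-' and '0' marks per subtest.

    `responses` maps a subtest label to a list of marks, where
    '+' = correct, '-' = incorrect, '0' = unanswered.
    """
    profile = {}
    for subtest, marks in responses.items():
        counts = Counter(marks)
        total = len(marks)
        profile[subtest] = {
            "correct": counts["+"],
            "incorrect": counts["-"],
            "unanswered": counts["0"],
            # Percent correct over all items administered in the subtest
            "percent_correct": 100.0 * counts["+"] / total if total else 0.0,
        }
    return profile


# Hypothetical responses, invented purely for illustration
example = {
    "syntactic_comprehension": ["+", "+", "-", "0", "+"],
    "repetition": ["+", "-", "-", "+"],
}
result = tally_profile(example)
```

A profile of this kind only summarizes raw marks; comparing it against norms from healthy speakers would be a separate step.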
In both batteries, connected speech samples are collected using a revised
format of the Bird Nest Story adapted from the BAT, which is used to measure and
evaluate the speech rate, the content, and the grammatical structure of the connected
speech samples obtained from each patient as compared with the norms collected from
healthy speakers. The norms have been collected in the same
format for different features of connected speech, including speech rate
(fluency), TTR (type/token ratio), and MLU (mean length of utterance), for adult
Persian speakers. Based on the norms collected from sixty healthy adult speakers on the
Bird Nest Story picture of the BAT, the mean fluency of Persian speakers is
reported at 105 words per minute, with a mean TTR of 0.63 for descriptive speech
(Nilipour 1994, 2010). (p. 438)
Table 18.1 Clinical linguistic batteries for assessing acquired language impairments in Iranian brain-damaged patients
Author/s | Name | Target group | Clinical linguistic measure | Linguistic skills tested | Measures of linguistic impairment
Paradis et al. (1987) | Bilingual Aphasia Test (BAT) | Persian aphasics | Clinical linguistic profile, based on 32 linguistic tasks | Comprehension, repetition, judgement, lexical access, propositionalizing, reading, and writing | Phonology, morphology, syntax, lexicon, semantic and connected speech quality; short and comprehensive versions
Paradis et al. (1987) | BAT: Azari version | Azari aphasics | Clinical linguistic profile, based on 23 linguistic tasks | Short version of BAT (23 tasks) | Phonology, morphology, syntax, lexicon, semantic, and connected speech quality; short version of BAT
Paradis et al. (1987) | BAT: Kurdish version | Kurdish aphasics | Clinical linguistic profile, based on 23 linguistic tasks | Comprehension, production, reading and writing; short version (23 tasks) | Phonology, morphology, syntax, lexicon, semantic, and connected speech quality; short version of BAT
Nilipour (1993/2010) | Persian Aphasia Battery (PAB) | Persian aphasics | Clinical linguistic profile, based on 25 linguistic tasks | Comprehension, production, reading, and writing | BDAE linguistic levels and all linguistic skills (25 skills)
Nilipour (2010) | Persian Picture Naming Battery | Mild aphasics and Alzheimer’s | Profile of lexical accessibility | Impairment of naming skills, based on 50 normed items of different categories | Diagnosis of naming access or cognitive impairment, based on phonological or semantic cuing
Bakhtiar et al. 2013 | Predictors of timed picture naming in Persian | Persian population | Timed picture naming profile | Psycholinguistic variables of naming: AoA (age of acquisition), imageability, name agreement, visual complexity | Database of 200 normative data of coloured pictures for Persian; lexical access of timed-picture naming
Nilipour (2014) | P-WAB-1/bedside version of P-WAB | Persian aphasics and Alzheimer’s | AQ (aphasia quotient) as a severity measure | Connected speech (content and fluency), auditory comprehension, commands, repetition, naming | AQ as a measure of severity based on six major linguistic skills
Nilipour et al. (2014) | Persian N and V Picture Naming Battery | Persian aphasics and Alzheimer’s | Lexical level accessibility | Object and action naming | Lexical accessibility of object and action naming based on psycholinguistic variables
Nilipour (2015) | P-DAB-2, P-DAB-3, P-DAB-4 | Persian-speaking brain-damaged and Alzheimer | AQ (aphasia quotient), LQ (language quotient), CQ (cortical quotient) | AQ: severity impairment of spoken skills; LQ: severity impairment of written skills; CQ: severity impairment of cognitive skills | Quantitative measure of severity of impairment of spoken, written and cognitive skills based on three quantitative measures of AQ, LQ, and CQ
(p. 439)
(p. 440) 18.2.2 Clinical linguistic-specific tasks
To test the level of impairment of any specific linguistic skill of a patient, either a section
of a comprehensive battery or a task-specific battery may be used. For instance, the
Grammaticality Judgment (items 173–82) and Syntactic Comprehension (items 66–152)
sections of the BAT are good examples of well-structured tasks
to be used for assessing the level of grammatical judgement or syntactic
comprehension of the patient in one or two languages. At the same time, there are some
independent task-specific batteries developed for assessing the level of impairment of
different linguistic skills. At present, the three task-specific batteries listed below have been
developed and are available at the lexical level in Persian. Normative data have been
collected on normal adult speakers of Persian to assess different levels and aspects of
lexical impairment in brain-damaged patients (see Table 18.1 for details):
• Persian Picture Naming Battery

• Predictors of Timed Picture Naming in Persian
• Persian Object and Action Picture Naming Battery
These formal task-specific batteries have been applied and reported on by
some researchers and SLP clinicians (for more information, see Yadegari et al. 2008, and
the first fMRI studies using nouns as their stimuli in Table 18.4).
18.2.3 Clinical linguistic-screening batteries
Clinical linguistic-screening batteries can be useful as a quick diagnostic tool to screen
the level of language impairments in large groups of brain-damaged patients or two
groups of patients with different etiologies based on an operational index of severity. With
this aim in mind, P-WAB-1 has been developed as the first clinical linguistic screening test
to measure severity of aphasia in different groups of Persian-speaking brain-damaged
patients.
P-WAB-1, a bedside clinical linguistic assessment tool, has been adapted from the Western
Aphasia Battery (WAB-R) (Kertesz 2003) and is used as a quick clinical
linguistic screening test to measure the severity of aphasia based on an overall operational index
called AQ (Aphasia Quotient). There are six different subtests, including the content as
well as the fluency of a sample of the patient’s descriptive speech based on a
modified format of the Bird Nest Story of the BAT. P-WAB-1 has been validated and
normed based on the criterion validity ratio (CVR) taken from the expert panel and the
performance of thirty healthy adult native speakers, and can be used as a baseline for
quick screening and diagnosis of aphasia among different groups of Persian-speaking
brain-damaged patients (see Nilipour et al. 2014).
The first version of P-WAB-1 has been used to measure the severity of language impairments
of sixty consecutive brain-damaged patients referred to different university clinics for
rehabilitation and forty monolingual age-matched epileptic patients studied at Kashani
Hospital, Isfahan University of Medical Sciences (Nilipour et al. 2014). This study is the
initial step in the adaptation of other versions of WAB-R (Kertesz 2003) for Persian to
measure the severity of impairment in brain-damaged patients based on three different operational
measures called Aphasia Quotient (AQ), Language Quotient (LQ), and Cortical Quotient
(CQ), proposed by Kertesz (2003) to assess spoken and written linguistic skills as well as
cognitive deficits (p. 441) following brain damage. These quantitative measures can be
used to classify Persian-speaking aphasic patients into different types.
Since WAB-R as a comprehensive clinical linguistic battery is reported to have prognostic
value in stroke patients and in brain degenerative conditions, and the measures can be
interpreted to stage Alzheimer’s disease and primary progressive aphasia (P-83) (Kertesz
2006), the development of other versions of WAB for Persian is part of an ongoing
research project by the author. At present, P-WAB-1 is used as a measure of differential
diagnosis among patients with brain-degenerative diseases in two hospital clinics in
Tehran and Mashhad and will be available online at the Persian Clinical Linguistic Database
(pcld.uswr.ac.ir) for researchers. The taxonomic and diagnostic approach of different
versions of the Persian Diagnostic Aphasia Battery (P-DAB), based on measures of AQ, LQ, and
CQ to classify Persian aphasics into different types or syndromes, is also under way and
the final report will be available soon.
18.3 Patholinguistic data on Persian

As proposed by Whitaker, impairments of grammatical structures are essentially linguistic
aspects of aphasia, relevant to the structure of language at the complex word, phrase,
sentence, and discourse levels (Whitaker 1997). For more information about studies on
aphasia in Persian and in other languages, refer to Chapter 17. Also in this context, one of
the most important features of neurolinguistic research is the recognition that valid
empirical data taken from clinical resources can be used to support or reject a proposed
brain–language model (Stemmer and Whitaker 1998: 49). A short review
of the literature on major documented clinical linguistic studies using the available formal
batteries shows that Persian neurolinguistic studies are not abundant but are not in
their infancy either; they are, rather, at a developing stage.
In principle, aphasiological studies in the absence of war are usually based on data
from cerebral vascular accident (CVA) patients. In fact, however, some of the first
well-documented clinical linguistic studies on Persian were based on one trilingual trauma
patient and two monolingual Persian speakers with war injuries (Nilipour and Ashayeri 1989;
Nilipour 2000). The documented neurolinguistic studies discussed in this chapter are
based on the application of the available formal batteries and consist of a combination of
monolingual and bilingual aphasiological studies, the great majority of them on
monolingual patients with CVA etiology (see Table 18.2 for details).
18.3.1 Bilingual clinical linguistic studies
From the present aphasiological studies to be reported in this chapter, nine studies are
devoted to bilingual patients whose first language was Persian and the second or third
language was English, German, or Azari. In all of these bilingual patients, relevant
versions of the BAT were used to assess the recovery pattern and the level of language
impairments in each language.
There is also one bilingual study on three Azari–Persian CVA patients with left subcortical
lesions (Azarpazhooh et al. 2010). Based on the results, the authors report a differential
pattern of recovery with the possibility of subcortical representation of L1. The etiologies
of the reported bilingual Persian-speaking patients were: CVA (six cases), trauma (two
cases), and AVM (one case) (see Table 18.2 for details). (p. 442)
Table 18.2 Neurolinguistic studies on monolingual and bilingual Persian-speaking brain-damaged patients
Authors | Assessment Battery | No. of patients & languages | Etiology | Lesion Site | Main Findings
Nilipour & Ashayeri (1989) | BAT: short versions of Persian, German & English | AS (trilingual): Persian, German & English | Brain Trauma | LT (left temporal lobe) | Alternating antagonistic recovery between 2 Ls with successive recovery of a third L; Involuntary control of mother language & evidence of mixing 3 languages; Languages can be independently functionally impaired by cerebral trauma in the same brain
Nilipour (1988) | BAT: short versions of Persian, English, Azari & Armenian | 14 bilingual speakers of Persian with different pairs of languages | CVA, AVM, Brain Trauma | Different lesion sites | Different patterns of recovery in patients with the same or different lesion site; Individual variables of bilingualism may lead to different patterns of recovery; Pattern of recovery may change in the same person over time during acute period; The first recovered language may be the language of environment
Nilipour (1989) | BAT: short versions of Persian & English | PA (bilingual) | CVA | LF-T | Different patterns of morphological violations in Persian & English; Modality-specific (out-loud reading) morphological changes of verb in Persian leading to past tense; More deletions of free function words in English as opposed to more morphological violations in Persian; Task-specific violations in repetition task in both languages with different manifestations
Nilipour & Paradis (1995) | BAT: short versions of Persian, German & English | TB (bilingual); PA (bilingual); AH (trilingual) | AVM; CVA; Brain Trauma | LT-P-O; LF-T; LF-T | Syntactic simplification in favor of canonical word order in Persian; Substitution of inflectional morphemes in both languages irrespective of diagnosis; More vulnerability of morphological violations of VP in Persian than in English; Deletion of free grammatical words in English
Raghibdoust (1999) | BAT & grammatical judgment tasks | Two Persian | CVA | LH | Agrammatic deficits may be attributed to a reduction or disruption of the efficiency of language processing mechanism; Complexity of linguistic material and slowness of language processing mechanism have converged to diminish or reduce linguistic performance
Nilipour (2000) | BAT: short version and several connected speech samples | HB and MN | Brain Trauma (shrapnel) | LFT; LP and subcortical involvement | Broca’s aphasia due to different lesion sites in two different patients with different degrees of severity of impairments; General violations of agrammatism vs. language-specific impairments; Substitution of /budan/ as a low content default filler verb for different inflected transitive and intransitive verbs; Substitution of Persian polymorphemic infinitive in written mode in contexts of inflectional verb ignoring tense, person and number of the inflected verb; Evidence against unitary models (single-level) disruption of language & independence of production & comprehension
Nilipour & Raghibdoust (2001) | BAT: Persian & English | 7 cases of Persian monolingual & bilingual | CVA, AVM, & Brain Trauma | Different lesion sites | More violations of VP than NP as a language-particular feature of Persian; More morphological violations in Persian than in English; Different patterns of VP violation in spoken & written modalities in Persian; Simplification of syntax & complex inflected verb form
Nilipour et al. (2004) | Auditory tasks: sound identification; asemantic sound recognition; sound localization; sound motion perception | 5 monolingual patients | CVA | LH lesion; RH lesion; Basal Ganglia | Selected sound identification deficits in 3 patients; Two patients with normal performance but with increased RTs; Anatomo-clinical correlations suggest increased RTs reflect processing in alternative networks
Yadegari et al. (2008) | Persian Picture Naming Battery | 20 healthy, 17 aphasics & 20 Alzheimer | CVA & Alzheimer | LH | Significant difference in naming between aphasic and Alzheimer using phonological & semantic cuing; Alzheimer patients answered to semantic cueing better but aphasics answered better to phonological cueing
Nilipour et al. (2010) | Auditory tasks: verbal dichotic & diotic tasks; nonverbal semantic & asemantic sound recognition & localization | 10 patients (Persian) | L and R CVA patients | 4 LH lesion; 6 RH lesion & Basal Ganglia | Different profiles of ear extinction & hemispheric neglect in L hemispheric subjects; RHD with BG showed mild hemi-spatial inattention; A network of L & R BG & cortical regions might be necessary for non-verbal sound recognition; L BG has a different role in sound object segregation versus sound localization
Azarpazhooh et al. (2010) | BAT: Persian & North East version of Azari | 3 patients | CVA | L subcortical lesion (striatocapsular) | Subcortical aphasia with differential recovery; L1 might have more subcortical representation than L2; Subcortical organization of languages in the bilingual brain might correlate with age of acquisition of L2
Koumanidi Knoph (2011) | BAT versions of Farsi & Norwegian | One bilingual patient | CVA | LH | Differential impairment of Farsi & Norwegian; Impairments are in line with the structural differences between the two languages; Farsi morphology and lexical access was better than Norwegian
Nilipour et al. (2012) | BAT versions of Persian & Badrudi dialect & PAB | 25 Persian patients & 5 Badrudi speakers | CVA | Heterogeneous sites of LH lesion | Selective impairments of different linguistic levels in patients with the same lesion site; Specific language impairments are in line with structural properties of each language; Double dissociation between impairments of Broca’s and Wernicke’s patients supports Pulvermüller’s hypothesis; Broca’s syntactic comprehension was 4 times better than Wernicke’s; Wernicke’s MLU mean was 3 times higher than Broca’s; Data from heterogeneous group of patients with different lesion sites support underspecification of functional anatomy of the classical brain-language model of independent language centers; Language is not monolithic
Mira Goral et al. (2014) | BAT & WAB; connected speech samples | One trilingual: Persian, English & German | CVA | LF | Global inhibition processes in the stronger language in a bilingual patient due to therapy
Nilipour et al. (2014) | P-WAB-1 Persian | 30 normal; 60 CVA, 40 epileptic patients | CVA | Patients with LH anterior & posterior lesion sites | AQ as an operational measure of severity of aphasia in Persian aphasic patients; Classification of aphasia based on a cut-off point of 93 AQ as a measure of degrees of severity
(p. 443) (p. 444) (p. 445)
(p. 446)
The conclusions drawn from the results of the present bilingual reports on the recovery
pattern and level of language impairments in the bilingual brain consequent to different
etiologies and lesion sites are as follows.
• Languages in the same brain can be independently functionally impaired by cerebral
trauma.
• The differential pattern of recovery of two languages in the same brain can be a function
of language acquisition variables, with the possibility of subcortical representation of
L1 as claimed in the study by Azarpazhooh et al. (2010).
• Brain damage in the left hemisphere of the bilingual brain may cause involuntary
control or inhibition of the mother tongue, as reported in Nilipour and Ashayeri (1989),
and is not an indication of representation of the second or third language in the right
hemisphere.
• There can be global inhibition processes in the stronger language of the bilingual
brain due to therapy (Goral et al. 2014).
• The first recovered and/or the best recovered language can be the language of the
environment.
• The subcortical organization of L1 in the bilingual brain might correlate with early
acquisition of L1 as compared to late acquisition of L2, as demonstrated by
Azarpazhooh et al. (2010).
• Different etiologies may cause alternating patterns of recovery or involuntary mixing
of languages, as indicated in the trilingual patient in Nilipour and Ashayeri (1989).
Two major conclusions can be drawn from these clinical linguistic bilingual reports. The first is that the
recovery pattern of the languages in the damaged bilingual brain is basically controlled
by the language acquisition variables in the history of the bilingual patient. As evidenced
in Nilipour and Ashayeri’s (1989) study, the inhibited language is not lost in alternating or
differential recovery patterns of languages. The second major conclusion, about the
pattern of language impairments, is that the surface manifestations and severity of
impairments in each language are not based on the law of all-or-nothing but, as Paradis
(2001) has shown, are a function of the specific structural properties of each language. In
principle, the impairments of each language in the bilingual brain as manifested in PA
(Nilipour 1989b) and Nilipour and Paradis (1995) are in line with the structural properties
and differences between the two or three languages in the multilingual brain.
18.3.2 Persian aphasiological studies in monolinguals
A summary of the heterogeneous group of Persian aphasiological studies devoted to
different monolingual patients who became aphasic consequent to focal cortical brain
damage is presented in Table 18.2. These studies can be classified into three types.
There is one unpublished MA thesis on eleven cases of CVA patients (Rezaei 2008). There
is also one unpublished PhD study on two monolingual CVA patients, which used a Persian
version of the BAT (Paradis et al. 1987) to determine the general clinical linguistic
profiles, and two comprehensive Persian morphosyntactic tasks designed by the author
based on the control data to assess the level of each patient’s grammatical
comprehension and judgement (Raghibdoust (p. 447) 1999). Based on the results, both
patients were reported to be agrammatic in their production performance despite their
retained knowledge of and ability to access morphological elements in extracting the
syntactic structure of sentences (Raghibdoust 1999: 129).
The second group of studies was a comprehensive clinical linguistic study of two
young monolingual Iraq–Iran war injury patients (HB and MN) with focal left-hemisphere
traumatic lesions caused by shrapnel at different lesion sites (LFT and
LP lesions respectively; see Nilipour 2000 for details). In both patients the short version of
the BAT was used to assess the general clinical linguistic profiles of their impairments.
Based on the results, both were diagnosed with Broca’s aphasia. One major feature of this
study is a comprehensive interlinear analysis of six different connected and spontaneous
speech samples based on the model of CLAS I connected speech analysis procedures in a
global cross-language study (Menn and Obler 1990, Ch. 2), with the aim of sorting out
general patterns of agrammatic features of speech from language-particular ones. With
this aim in mind, all the spoken as well as written connected speech samples were
presented with interlinear transcriptions, analysed, and compared with the content and
structure of the norms collected from the same speech samples of an age-matched
healthy speaker (for details, see Nilipour 2000, Appendices A and B).
A third group of neurolinguistic studies on monolinguals consisted of two new group
studies: one group consisted of eleven monolingual CVA patients (Rezaei 2008) and
the other consisted of five (two female and three male) chronic CVA patients who
spoke both Persian and Badrudi, a central Iranian dialect spoken in the vicinity of Kashan (for
details of both studies, see Nilipour et al. 2012).
Rezaei’s (2008) group study consisted of eleven monolingual Persian-speaking CVA
patients. There were seven male and four female right-handed educated native speakers
of Persian with an average age of fifty with a left-hemisphere CVA. The lesion sites were
six FTP, three FT, and two TP.
The eleven monolingual CVA patients with three different lesion sites in Rezaei’s (2008)
study were assessed using the Persian Aphasia Battery (Nilipour 1993) to determine their
clinical linguistic profile and whether they were non-fluent (Broca’s) or fluent (Wernicke’s)
aphasics. Based on the first assessment, nine of them were diagnosed as Broca’s
(non-fluent) and two as Wernicke’s (fluent) aphasics. In the second phase of the study, the level
of syntactic comprehension of fluent and non-fluent patients was assessed using the Syntactic
Section (items 66–152) of the Persian version of the BAT. Samples of their
descriptive speech based on the Bird Nest Story of the BAT were also analysed to measure
MLU (mean length of utterance) as an index of syntactic complexity and TTR (type/token
ratio) as an index of semantic richness of their connected speech. These two measures
may be used as indications of a neuropsychological double dissociation at the syntactic
and lexical levels between Broca’s and Wernicke’s patients.
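The connected speech measures named above can be illustrated with a short computation. This is a minimal Python sketch under several assumptions that do not come from the studies cited: utterances are already segmented, tokens are obtained by whitespace splitting, MLU is counted in words rather than morphemes, and the function name `speech_measures` and the English toy sample are hypothetical. Persian samples would require their own tokenization and segmentation conventions.

```python
def speech_measures(utterances, duration_seconds):
    """Compute TTR, MLU (in words) and speech rate for a transcribed sample.

    utterances: list of strings, one per utterance (pre-segmented).
    duration_seconds: length of the recorded sample in seconds.
    """
    if not utterances or duration_seconds <= 0:
        raise ValueError("need at least one utterance and a positive duration")
    # Tokens: every running word; types: distinct word forms
    tokens = [word.lower() for utt in utterances for word in utt.split()]
    if not tokens:
        raise ValueError("sample contains no words")
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    return {
        "ttr": n_types / n_tokens,                  # type/token ratio
        "mlu_words": n_tokens / len(utterances),    # mean length of utterance
        "words_per_minute": n_tokens * 60.0 / duration_seconds,
    }


# Toy sample invented for illustration; not patient or normative data
sample = [
    "the bird sits in the nest",
    "the boy climbs the tree",
    "the nest falls",
]
measures = speech_measures(sample, duration_seconds=10)
```

Because TTR falls as a sample gets longer, values are only comparable across speakers when the samples are elicited in the same way and are of similar length, which is one reason a fixed picture story is a convenient elicitation format.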
The Badrudi group study consisted of five (two female and three male) chronic CVA
patients who knew both the Badrudi dialect and Persian. The therapist, who knew the dialect,
adapted the Badrudi version of the BAT under the supervision of the author and
assessed the patients using the short Persian and Badrudi versions of the BAT. The
connected speech samples based on the Bird Nest Story were also collected and analysed
to compare the improvement of their speech consequent to therapy. Based on the results
of their clinical linguistic profiles, the Badrudi patients, who had different
lesion sites (three frontoparietal and two temporoparietal), were diagnosed as Broca’s
aphasics (Rezaei 2008).
(p. 448) 18.3.3 Neuropsychological studies in Persian
Two neuropsychological studies have been reported on two groups of native speakers of
Persian with brain lesions consequent to CVA. In the first study, five CVA patients with
right-hemisphere (three cases) and left-hemisphere (two cases) cortical lesions were
involved (Nilipour et al. 2004). In the second study, ten CVA patients with unilateral
cortical and/or unilateral and bilateral subcortical lesions were involved. In both studies,
the objective was to look at the correlation between different aspects of sound processing
and the lesion site (Nilipour et al. 2010). The CVA patients in both studies were assessed
using the Farsi short version of the BAT to look at the level of each patient’s clinical
linguistic impairments. In both studies, several auditory tasks were designed to look at
the involvement of cortical and subcortical regions in verbal and non-verbal sound
processing. The following verbal and non-verbal auditory tasks were designed for Persian
to assess different features of sound processing in patients with different lesion sites (for
details, see Nilipour et al. 2004, 2010). The auditory tasks in the first study were:
• Sound identification
• Asemantic sound recognition
• Sound localization
• Sound motion perception
Verbal and non-verbal auditory tasks in the second study included:
• Verbal tasks: monosyllabic and disyllabic dichotic and diotic tasks
• Non-verbal tasks: semantic and asemantic sound recognition tasks; sound localization task
Both groups of CVA patients were initially assessed using the short version of the BAT to
determine the pattern of their clinical linguistic impairments. With respect to the auditory
tasks, the data taken from age-matched healthy native speakers were used as
control data to check the level of each patient’s performance on each task. In the first
study, six healthy age-matched adults were selected as the control group. In the second
study, the mean performance of forty age-matched adults was used as the control data.
The basic objective in both studies was to look at the ‘where’ and ‘what’ of sound
processing in patients with different lesions consequent to CVA. The following results are
a summary of the neuropsychological findings of the two studies on auditory processing.
The results of the first study indicated sound identification deficits in
three patients and normal performance, but with increased RTs, in two patients. Based on
these results, anatomo-clinical correlations suggest that increased RTs
reflect sound processing in alternative neural networks.
The results of the second study indicated that there are different profiles of ear extinction
and hemispheric neglect in patients with LH lesion. Also patients with RH and BG (basal
ganglia) lesions showed mild hemi-spatial inattention. These results suggest a network of
L and R BG as well as cortical regions for non-verbal sound recognition. Also L BG (left
basal ganglia) has a different role in sound object segregation versus sound localization.
(p. 449) 18.3.4 Task-specific studies
As mentioned in section 18.1, there are currently three formal task-specific batteries in
Persian that have been developed based on norms collected from healthy speakers. The
task-specific batteries have had several clinical applications by Iranian speech and
language pathologists and other clinical linguistic researchers. One published report
on the application of the Persian Picture Naming Battery as a diagnostic clinical linguistic
tool with CVA and Alzheimer patients is the study by Yadegari et al. (2008). Based on the results,
the CVA and Alzheimer patients differed in their responses to the Picture Naming
Battery at several levels. There was a significant difference in correct
responses between the two groups, with aphasic patients giving fewer correct responses in
naming the pictures. Significant differences were also seen in the responses of the two groups
when phonological or semantic cues were delivered:
Alzheimer patients did better when a semantic cue was delivered, whereas
aphasic patients responded better when a phonological cue was delivered (for details,
see Yadegari et al. 2008; Nilipour 2010).
18.3.5 Screening studies
Among the published neurolinguistic studies, one is devoted to the
application of the bedside version of the Persian P-WAB, designated P-WAB-1. P-WAB-1
was initially normed on thirty healthy adult native speakers of Persian for psychometric
data. In a later study, P-WAB-1 was used as a screening test on sixty monolingual aphasics
and forty age-matched epileptic patients (see Tables 18.1 and 18.3; Nilipour et al. 2014).
The mean scores of the aphasic and epileptic patients on the different
subtests of P-WAB-1 differ, indicating that the scores can reliably be used to
differentiate the level and severity of impairments between aphasic and epileptic
patients (Nilipour et al. 2014). As indicated in Table 18.3, AQ can be
used as a functional measure of the severity of aphasia in Persian-speaking brain-damaged
patients, who can then be classified into distinct groups of severity. AQ can also be used as a
quantitative measure for diagnosing different types of aphasia as well as for measuring
pre- and post-treatment efficacy.
As can be seen from the mean subtest scores and AQ of the two groups, P-WAB-1 is sensitive
enough to differentiate the performance of two different groups of patients. One of the
main findings of this clinical linguistic study is that P-WAB-1 is a sensitive clinical
linguistic tool for differentiating levels of linguistic impairment and can be used as a
valid clinical measurement tool to assess language impairments in patients with different
types of brain damage and etiology. It can also be used to measure the severity of
impairments in one patient or in different groups of patients, based on AQ as an
operational index proposed by Kertesz (1982a&b, 2013).
The adaptation of P-WAB-1 in Persian makes it possible to quantify the severity of aphasia
among Persian aphasic patients with different lesion sites. It is also possible to classify
the patients into different clinical subtypes based on AQ, and define the relationship
between the type of aphasia and the lesion site. Another application of P-WAB-1 is to
differentiate the level and severity of language impairments in different groups of brain-
damaged and neurodegenerative patients (see Table 18.3). (p. 450)
Table 18.3 Mean scores, SDs of subtests, and AQ of Persian aphasic and epileptic patients

Group      No.  Content       Fluency       Comprehension*  Repetition    Naming        AQ
Aphasic    60   4.39 (±2.76)  3.17 (±2.26)  6.28 (±2.69)    4.9 (±3.28)   4.52 (±3.44)  49.77 (±23.49)
Epileptic  40   8.8 (±1.45)   7.98 (±1.37)  9.58 (±0.72)    9.88 (±0.38)  9.88 (±0.38)  92.98 (±6.76)
T**        100  9.27          12.04         7.56            9.56          10.01         11.31

(*) Mean scores of comprehension are means of auditory comprehension and sequential command subtests. Degree of freedom was 98.
(**) P-value < 0.001
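The chapter does not state which test produced the T row of Table 18.3. The reported 98 degrees of freedom (60 + 40 − 2) is what a two-sample t-test with pooled variance would give, so the following minimal sketch assumes that test and recomputes the statistic from the published means and SDs. The function name `pooled_t` is illustrative only and is not taken from Nilipour et al. (2014).

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic from summary statistics, assuming equal variances."""
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return abs(mean1 - mean2) / std_err, df

# Values copied from Table 18.3 (aphasic group first, epileptic group second).
t_content, df = pooled_t(4.39, 2.76, 60, 8.80, 1.45, 40)
t_aq, _ = pooled_t(49.77, 23.49, 60, 92.98, 6.76, 40)
print(f"Content: t = {t_content:.2f}, df = {df}")
print(f"AQ:      t = {t_aq:.2f}, df = {df}")
```

Under this assumption the recomputed values fall within rounding distance of the tabulated 9.27 (Content) and 11.31 (AQ), which suggests, but does not prove, that a pooled-variance test was used.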
(p. 451) 18.3.6 Neurolinguistic impairments in Persian
Based on systematic cross-linguistic studies, researchers can achieve a better theoretical
understanding of the nature of aphasia consequent to brain damage and eventually a
better understanding of brain–language relationship. Since different languages have
different morphosyntactic properties, data from systematic cross-language studies should
help researchers draw a clear line between the universal and language-particular
features of agrammatism consequent to brain damage (Menn and Obler 1990). The
presentation of patholinguistic data in this section is meant to depict a picture of
universal and language-particular features of language impairments observed among
Persian-speaking brain-damaged adults, based on systematic studies reported in the
clinical linguistic literature (see Table 18.2 for details).
Based on the data from clinical linguistic assessments and analysis of connected speech
samples of each patient, different levels of grammatical violations can be diagnosed for
each patient. In what follows, an effort will be made to distinguish some universal
features of language impairments from language-particular impairments in Persian. With
this aim in mind, the universal features of language impairments consequent to brain
damage will be classified under ‘syntactic simplification’ and ‘morphological regression’,
while the surface manifestations of ‘syntactic simplification’ and ‘morphological
regression’ will be referred to as language-particular features of impairments, which are
in accord with the structural properties of the Persian language.
18.3.6.1 Language impairments in bilingual patients

If each language has different morphosyntactic properties, one would expect to see
different linguistic violations of the two languages in the damaged bilingual brain. In this
context, the performance of Persian-speaking bilingual aphasic patients can be expected
to have different morphosyntactic manifestations in each language. In what follows, some
manifestations of ‘syntactic simplification’ and ‘morphological regression’ of the Persian-
speaking bilingual patients will be presented. The data are based on the performance of
each bilingual patient on relevant versions and subtests of the BAT, as well as analysis of
the connected speech samples of the patients. With respect to the results of cross-
language studies, the following general features are expected to be seen among the
impairments of the languages of a bilingual patient:
• Simplification of syntax and more reliance on canonical form
• Less accessibility to verbs than nouns
• More reliance on nouns than verbs
• More instances of deletion or disruption of verbs than nouns.
Many instances of these general features can be observed in the linguistic profiles and
connected speech samples of PA (Nilipour 1989b), and in the connected written samples and
interlinear transcriptions of spoken connected speech in Nilipour (2000).
With respect to manifestations of language-specific impairments, some interesting
instances can best be seen in the performance of patient PA, a Persian–English bilingual
aphasic, on two equivalent tasks designed in the Persian and English versions of the BAT.
The tasks are ‘Sentence Reading’ (ten items) and ‘Sentence Repetition’ (ten sentences).
Each task is designed to have ten sentences with different syntactic patterns and
complexities based on the structural properties of Persian and English (Paradis and
Libben 1987). (p. 452)
The results of PA’s performance on the two tasks can be summarized as follows.
There were instances of reconstruction of verb inflectional morphology in Persian,
resulting in a change in tense from present to past in all ten sentences in the same
task. Two instances of morphological regression in Persian are seen in the following
sentences (for details, see Nilipour 1989):
(1)
(2)
There were also three items of free morpheme substitutions and twelve cases of omission
of grammatical particles in ten Persian sentences. In contrast to Persian, PA’s
performance in the same task in English consisted of thirty-seven cases of omission of
grammatical particles with no case of substitution (see Nilipour 1989 for details). As
manifested in the following two English sentences, all of the grammatical particles have
been deleted in the performance of PA (for details, see Nilipour 1989).
(3)
(4)
Another example of different morphosyntactic violations in the bilingual brain is the
performance of patient TB (Nilipour 1988). TB’s morphosyntactic violations in Persian
consisted of deletion of the obligatory grammatical morphemes /be/ and /rā/, pre-posed and
post-posed particles which do not exist in English. As instances of language-specific
violations, some evidence of subject–verb mis-agreement in TB’s connected speech
samples is seen in the following examples:
(5)
(6)
18.3.6.2 Language impairments in monolingual patients

One important instance of morphological regression can be observed in the category
specificity impairments of noun vs. verb, or as referred to in clinical linguistic assessment
literature (p. 453) as object and action naming impairments. The possibility of selective
impairments of noun and verb and/or more reliance on nouns than verbs after brain
damage has been reported in neuropsychological studies and cross-language reports
(Stemmer and Whitaker 1998: 11).
Among the patholinguistic data reported from Persian brain-damaged patients one can
find a variety of instances of noun and verb selective impairments. Deletion or more
vulnerability of verbs than nouns can also be seen as a general trend of impairment in
cross-language literature (Menn and Obler 1999) and in Persian patholinguistic literature
in both monolingual as well as bilingual patients (Nilipour 2000, 2008; Nilipour and
Paradis 1995). But the polymorphemic nature of the Persian infinitive, as in /khor-d-an/
‘to eat’ or /khāb-ān-d-an/ ‘to put to sleep’, and the morphological complexity of verb
inflection for aspect, number, and tense, as well as certain phonological processes
between present and past stems (e.g. /sukh-t/ and /suz/ for the infinitive form
/sukh-t-an/ ‘to burn’) and different present and past stems for certain verbs (e.g. /bin/
and /did/ for /did-an/ ‘to see’), can be good candidates as major sources of the
vulnerability of verbs and of different manifestations of language-specific impairments in
both spoken and written connected speech samples of Persian brain-damaged patients. Some
manifestations of these language-specific morphological regressions in spoken and/or
written connected samples are given below (for more details, see Nilipour 2000 and
Appendices A and B in this chapter):
a- Broken-off forms of inflected verb
(7)
b- Deletion and/or substitution of the inflectional morphemes and verb prefixes
(8)
c- Substitution of a filler verb /budan/ ‘to be’ for the appropriate transitive and
intransitive lexical verb in context
(9)
(10)
d- Substitution of the infinitive of the main verb and auxiliary for the proper inflected
form of the verb in written connected speech samples. (p. 454)
(11)
(12)
(13)
(14)
Further instances of syntactic simplification and/or morphological regression as well as
manifestations of language-specific impairments and mixing can be seen in the clinical
linguistic profiles and connected speech samples of other monolingual and bilingual
patients. Some of these instances can best be seen in different contexts of spoken and
written connected speech samples of AH, HB, and MN (for details, see Nilipour and
Ashayeri 1989; Nilipour 2000, Appendices A and B).
18.4 Neuroimaging studies in Persian

There are five published Persian neuroimaging studies using different lexical tasks to be
briefly discussed in this chapter. The fMRI studies were basically concerned with the
functional anatomy of speech and language comprehension and production at the lexical
level in healthy Persian speakers. In three of the fMRI studies, adult monolinguals, and in
two studies, bilingual speakers with English or French as the second language,
participated (see Table 18.4 for details).
The stated objective of the first three neuroimaging studies (Mahdavi et al. 2008, 2010,
2011) was to design protocols for pre-surgical clinical applications, but no
published pre-surgical application has so far been reported. As reported in the first study,
Word Production and Reverse Word Reading tasks were employed to collect data from
nine native speakers of Persian. Based on the results, the authors reported a robust
cortical activation of the classic language regions of Broca’s area in the left inferior
frontal gyrus in eight subjects out of nine in both tasks. They also reported activation of
Broca’s (p. 455) (p. 456) homologous area with much higher intensity in the right
hemisphere when responding to these tasks.
Table 18.4 fMRI studies in Persian

Authors: Mahdavi et al. (2008)
Imaging tasks: Word production, reverse word reading
Stimuli: Nouns
Imaging design: Block fMRI
Participants: 9 Persian native speakers
Results: Strong activation of Broca’s area in both tasks

Authors: Mahdavi et al. (2010)
Imaging tasks: Word production (Persian), reverse word reading (Persian), word generation (English)
Stimuli: Nouns
Imaging design: Block fMRI
Participants: 9 Persian–English bilinguals
Results: Common regions were activated by both Persian and English stimuli specifically in Left Inferior Frontal Gyrus (LIFG) and other perisylvian areas

Authors: Mahdavi et al. (2011)
Imaging tasks: Word generation, object naming, word reading, word production, reverse word reading
Stimuli: Nouns
Imaging design: Block fMRI
Participants: 16 Persian native speakers
Results: Word production, reverse word reading, and word generation robustly activated language-related areas, while object naming and word reading failed to sufficiently delineate these activation areas

Authors: Ghazi Saeedi et al. (2013)
Imaging tasks: Picture naming (overt)
Stimuli: Nouns
Imaging design: Functional connectivity
Participants: 12 Persian–Persian speakers with French lexical self-training via computer
Results: Increased proficiency in a second language results in a higher degree of automaticity and lower cognitive effort in the bilingual brain

Authors: Momenian et al. (2016)
Imaging tasks: Cued, covert sentence completion task
Stimuli: Nouns and verbs
Imaging design: Block fMRI
Participants: 14 Persian native speakers
Results: Common regions were activated by both verbs and nouns in occipital cortex, temporal cortex, and cerebellum. In the direct comparisons, verb processing revealed larger activation in middle temporal gyrus (bilaterally) and left fusiform gyrus
In the second study (2010) the authors employed three tasks: Word production (Persian);
Reverse Word Reading (Persian); Word Generation (English). The participants were
Persian–English healthy bilinguals. The objective was to differentiate cortical activation of
Persian as compared with English but they did not use the same tasks in both languages.
The results indicated that common regions were activated by both Persian and English
stimuli specifically in the Left Inferior Frontal Gyrus (LIFG) and other perisylvian areas.
They also reported activation in RH occipital cortex. As can be seen, in spite of using
different tasks in Persian and English, they reported activation of the same common
regions.
In the third study, five different tasks were employed (Word Generation, Object Naming,
Word Reading, Word Production, and Reverse Word Reading) with sixteen healthy
monolingual native Persian speakers. The authors intended to find the optimal
neuroimaging tasks for clinical use and protocols (Mahdavi et al. 2012: 417). The authors
reported that Word Production, Reverse Word Reading, and Word Generation robustly
activated lateralized language-related areas, while Object Naming and Word Reading
failed to sufficiently delineate these areas. The authors also reported robust activation of
classical language areas in all paradigms except the Object Naming task (Mahdavi et
al. 2012).
In these three studies, there is no evidence of normative data in designing the language
tasks adapted from previous neuroimaging experiments used in other languages. Also in
the bilingual study, the tasks employed in order to see patterns of activation in the two
languages are not the same. Another inconsistency in these studies is that in most cases
different lexical tasks have resulted in the same patterns of activation. It seems that these
tasks have not been sensitive enough to tease out relevant patterns of activation related
to different lexical tasks in Persian or English.
In a recent fMRI study by Momenian et al. (2016), the objective was to look at neural
correlates of Persian object and action naming in fifteen healthy native speakers of
Persian. One novelty about task design in this study as compared to previous naming
studies in other languages was the application of a cued, covert sentence completion to
name each object and action.
The second feature of this first study on nouns and verbs was that the stimuli were
selected from normative data in two different databases in Persian (Bakhtiar et al. 2013;
Nilipour 2015). Objects were selected from different categories (living, non-living, and
tools) and verbs were a combination of common transitive and intransitive high-frequency
verbs. The objective of the study was to look at the neural correlates of Persian nouns and
verbs and to determine whether language-specific properties of Persian influence the
activation patterns of noun and verb processing in the brain. Based on the results, the
authors reported bilateral activation for object naming in the inferior occipital gyrus,
inferior and middle temporal gyri, superior and inferior parietal gyri, but for action
naming they observed bilateral activation in occipital lobe, temporal gyrus, superior and
inferior parietal gyri. The verb generation is reported to have larger activation in bilateral
middle temporal gyrus and left fusiform which may be an indication of morphological
complexity of verb inflection of Persian as a language-specific feature.
In summary, among the four fMRI studies discussed, the activation results of the first
three were robust cortical activation of the classic language regions of Broca’s area. It
seems that the authors had a preconception that language is only processed in the
classical language centres and that extrasylvian areas have no role in language
processing. In contrast to these three studies, Momenian et al. (2016) reported bilateral
activation as well as activation (p. 457) of areas outside of perisylvian areas. While the
first three studies support the classical models of brain and language, the results
of the Momenian et al. (2016) study seem to be in line with non-narrow localization models
of brain and language.
Another new language imaging study was an fMRI connectivity study by Ghazi Saeedi et al.
(2014), in which the authors looked at functional connectivity patterns during French
vocabulary learning using a computerized lexical-learning programme in twelve bilingual
Persian speakers in Canada. The authors reported that functional connectivity remained
unchanged across learning phases for L1 (Persian), whereas total, between- and within-
network integration levels decreased as proficiency for French as L2 increased. Based on
connectivity results, the authors concluded that increased proficiency in second-language
learning results in a higher degree of automaticity and lower cognitive effort. The authors
claimed that this study provides the first functional connectivity evidence regarding the
dynamic role of the language processing and cognitive control networks in L2 learning
(Ghazi Saeedi et al. 2014).
18.5 Concluding remarks on neurolinguistics of Persian
This chapter intended to give a summary of some current neurolinguistics research from
two major subfields of neurolinguistic studies conducted on monolingual and bilingual
speakers of Persian. A short introduction of nine different clinical linguistic batteries
developed in Persian based on normative data taken from healthy speakers and
applicable for diagnosis, research, and therapy purposes among brain-damaged Persian
speakers was presented (Table 18.1). The patholinguistic studies from a heterogeneous
group of brain-damaged monolingual as well as bilingual Persian speakers were
presented, based on documented systematic aphasiological studies using formal clinical
linguistic batteries developed for Persian clinical linguistic purposes (Table 18.2). The
second group of neurolinguistic studies, concerning functional anatomy of language,
consisted of five fMRI studies on healthy monolingual and bilingual speakers of Persian
(Table 18.4).
Based on the results, the present clinical linguistic data from a heterogeneous group of
Persian aphasic patients with different lesion sites suggest the compatibility of the data
with general features of agrammatic language reported by other researchers in a global
cross-linguistic study (Menn and Obler 1990). Within the framework of cross-language
research, two general features of ‘syntactic simplification’ and ‘morphological regression’
were discussed. With respect to language-specific features of agrammatism, as observed
by Paradis (2001), the same underlying deficit may cause different surface manifestations
in different languages. Paradis has also observed that the larger the number of choices in
a paradigm, the more vulnerable the item (2001: 4). The hypothesis was confirmed in
Persian in different instances of vulnerability of the multifaceted VP and replacement of
the uninflected polymorphemic infinitive for the inflected verb in both spoken and written
contexts for different types of verbs (see Nilipour 2000, Appendices A and B for details).
Also, the co-occurrence of different types of language impairments at various linguistic
levels (morphology, lexicon, syntax), observed in the clinical linguistic profiles of Persian
aphasic patients with different focal lesions, argues against the monolithic linguistic
domains and independent language centres in the brain (p. 458) (production vs.
comprehension) as observed in other languages and reported by other researchers (Menn
and Obler 1990; Poeppel and Hickok 2004: 5).
The present Persian clinical linguistic data also seem to support the
neuropsychological double dissociation proposed by Pulvermüller in patients with Broca’s
and Wernicke’s aphasia after focal lesions (Pulvermüller 2004: 66–73). As the data in
Rezaei’s study indicated, the behaviour of patients with Broca’s and Wernicke’s aphasia
supports Pulvermüller’s proposed neuropsychological double dissociation at the syntactic
and lexical levels (Nilipour 2012). The comparison of mean syntactic comprehension scores
and MLU of connected speech samples of patients with Broca’s and Wernicke’s aphasia
can be applied as an index of neuropsychological double dissociation at the syntactic
level. On the other hand, the comparison of figures on TTR and function words (content
vs. function words) in the patients with Broca’s and Wernicke’s aphasia, in which one
feature is selectively damaged in one group of aphasic patients and not in the other
group, suggests a neuropsychological double dissociation at the lexical level (Nilipour et
al. 2012, Figs 2 and 3).
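The two connected-speech indices named above can be illustrated with a minimal sketch. It assumes the common definitions of TTR (distinct word forms divided by total words) and MLU (mean number of words per utterance); the chapter does not specify whether MLU was counted in words or morphemes in the cited study, and the tokens below are placeholders rather than patient data.

```python
def type_token_ratio(tokens):
    """Distinct word forms (types) divided by the total number of words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def mean_length_of_utterance(utterances):
    """Average number of words per utterance (word-based MLU, an assumption here)."""
    if not utterances:
        return 0.0
    return sum(len(u) for u in utterances) / len(utterances)

# Placeholder sample: each utterance is a list of already-segmented word tokens.
sample = [
    ["w1", "w2", "w3"],
    ["w2"],
    ["w1", "w4", "w5"],
]
tokens = [word for utterance in sample for word in utterance]
ttr = type_token_ratio(tokens)          # 5 types over 7 tokens
mlu = mean_length_of_utterance(sample)  # 7 words over 3 utterances
print(f"TTR = {ttr:.3f}, MLU = {mlu:.3f}")
```

In a comparison of the kind described, these indices would be computed per patient and then averaged within each aphasia group before the groups are contrasted.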
As the clinical linguistic profiles of patients with different lesion sites indicate, there is
no one-to-one relationship between lesion site and aphasic syndromes. This finding
speaks to the shortcomings of the classical anatomical models regarding major aphasic
syndromes (Broca’s, Wernicke’s, and Conduction aphasia). As observed by Poeppel and
Hickok (2004: 5) and other researchers, not only are the classical areas of the brain–
language model underspecified for each major aphasic syndrome, but there are other
areas outside the classical regions implicated in language processing. As the major
syndromes from Persian aphasic patients indicated (for details, see Nilipour et al. 2012,
Tables 3 and 4), several cases of Broca’s and Wernicke’s aphasia consequent to the same
and/or different lesion sites are in support of Poeppel and Hickok’s theory about the
shortcomings and functional under-specification of the classical model (Poeppel and
Hickok 2004).
With respect to the results of the few neuroimaging studies, it can be argued that if the
tasks and the stimuli implemented in a neuroimaging study are controlled for norms, as in
Momenian et al. (2016) and Ghazi Saeedi et al. (2013), the analysis of the activation
patterns can support the new claims made by Poeppel and Hickok (2004) about the
anatomical shortcomings and functional under-specification of the classical narrow
localization brain–language model.
Finally, although the present data from Persian patholinguistic and neuroimaging studies
are limited, given their compatibility in some respects with previous global
cross-language reports, the manifestations of language-specific impairments, and the
proposed new neuropsychological models of language, we hope the present Persian
clinical linguistic data and some of the neuroimaging data will cast a little light on the
theoretical nature of the neuropsychology of language. Much remains to be learned about
the neuropsychology of language in future neurolinguistic studies of Persian.
Acknowledgements
The research in this chapter was supported by INSF (Iranian National Science
Foundation) research grant to the author. The author is grateful to M. Momenian for his
cooperation in extracting fMRI data reported in this study. The constructive comments of
the editors and anonymous reviewer on the first draft of the chapter are also greatly
appreciated.
(p. 459) Appendices A and B

Appendices A and B represent two written samples with language specific “morphological
regression” and “syntactic simplification” written by two young (21 & 20) brain-damaged
educated (11 & 10) native speakers of Persian on two different topics. For more examples
of violations in their oral connected speech samples, see Nilipour (2000, Appendices A &
B, pp. 1227–1241). The violations mostly concern verb construction and morphological
simplification. They are marked in bold in the transcripts, with the correct inflected verb
form underneath.
Appendix A
The following text is a sample of MN’s written connected speech explaining his war injury
experience. Verb violations are marked in bold. The context requires a clearly inflected
verb form, but MN bypasses the decomposition process of the inflected form of the verb and
substitutes the uninflected multi-morphemic infinitive.
(p. 460) Transcription of Appendix A
*Line one: morphological transcription of each sentence with the violated verb in bold
** Line two: correct morphological form of the required verb in bold
*** Verb deletions are in parentheses
Appendix B
The following text is a sample of MN’s written connected speech explaining Nowruz
(the Iranian New Year ceremony). The written text is quite legible, with samples of
“syntactic simplifications” and verb morphological regressions in which a simpler default
verb form (/ast/ = “is”) is substituted for the proper inflected form of the verb in all
contexts. Violations are in bold.
Eyde nouruz ghadim ast*. Pāyāne sal chārshanbe shab ātash va qheyre ast. avale
ruz
sā?ate . … . . eyd khāne sofre haft (sabze, sir, somāgh va. …) va māhi va
Ghor?ān, āyene va qheyre ast. eyd fāmil va khishān va. … . ast. va bābā khāne va
fāmil kudak va bache pul (eyd) ast. sizdah eyd khāne mosāfer bāgh
*All verb violations are in third person singular ( /ast/= “is”) or deleted in each sentence
Reza Nilipour
Reza Nilipour is Emeritus Professor of Neurolinguistics and Clinical Linguistics and
former Chairman of the Department of Speech Therapy, University of Social Welfare and
Rehabilitation Sciences. He developed the first PhD programme in Speech Therapy
in Iran. He has been developing several clinical linguistic batteries and is the author
and co-author of neurolinguistics book chapters and research articles in Brain and
Language, Neurolinguistics, Aphasiology, and Basic and Clinical Neuroscience. He is
a member of the Academy of Sciences in charge of the Linguistics department. He was guest
professor at the European Masters of Clinical Linguistics, University of Potsdam,
Germany in 2005.

Computational Linguistics

Karine Megerdoomian

Subject: Linguistics, Computational Linguistics, Languages by Region
This chapter introduces the fields of Computational Linguistics (CL)—the computational

modelling of linguistic representations and theories—and Natural Language Processing
(NLP)—the design and implementation of tools for automated language understanding
and production—and discusses some of the existing tensions between the formal
approach to linguistics and the current state of the research and development in CL and
NLP. The chapter goes on to explain the specific challenges faced by CL and NLP for
Persian, many of them derived from the intricacies presented by the Perso-Arabic script in
automatically identifying word and phrase boundaries in text, as well as difficulties in
automatic processing of compound words and light verb constructions. The chapter then
provides an overview of the state of the art in current and recent CL and NLP for Persian.
It concludes with areas for improvement and suggestions for future directions.
Keywords: computational linguistics, natural-language processing, linguistic modelling, light verb constructions,
writing system, automatic boundary recognition

19.1 Introduction
COMPUTATIONAL Linguistics (CL) is an interdisciplinary field that develops
computational methods to explore the scientific questions of linguistics. CL is mainly
concerned with identifying classes of linguistic representations, and building grammars
and algorithms that can best capture the linguistic patterns in text or speech. The
methodology has also been used to develop software tools that can facilitate linguistic
analysis, language documentation, and language teaching. Computational Linguistics is
closely related to Natural Language Processing (NLP), which employs computer science
technologies and resources to understand, translate, or generate human language.
Although CL and NLP are sometimes used interchangeably, NLP research may focus less on
developing models of linguistic theory and more on building tools that
obtain effective solutions, regardless of whether the results represent theoretical
concepts in linguistics. The software tools developed in the fields of NLP and CL can be
applied to various domains where large amounts of text or speech need to be processed,
such as political science, economics, business, medicine, and digital humanities.
As the main goal of CL and NLP systems is to effectively process and analyse language
text or speech, emphasis is often given to coverage and speed, rather than an accurate
representation of linguistic theory or explanation of language phenomena. As a result,
dominant linguistic formalisms such as Government and Binding Theory (Chomsky 1981),
the Minimalist Program (Chomsky 1992), or Distributed Morphology (Halle and Marantz
1983) have rarely been implemented within CL or NLP systems. The research on
predominant formal theories typically focuses on a particular set of linguistic
constructions and the formalisms remain too vague to be used in building computational
systems that could process the full set of constructions encountered in the data. The
theories modelled in CL therefore tend to be a certain class of formal linguistic theories
that are reasonably computationally tractable. These include constraint-based grammars
(e.g. Lexical Functional Grammar or LFG (Bresnan 1982), Head-Driven Phrase Structure
Grammar or HPSG (Pollard and Sag 1987), and Construction Grammar or CG (Goldberg
1995)), as well as (p. 462) Categorial Grammar (Ajdukiewicz 1935), Tree-Adjoining
Grammar (Joshi, Levi, and Takahashi 1975), and Dependency Grammar (Agel et al. 2003).
The application of computational approaches to the Persian language began in earnest about
two decades ago. Since then, Persian has become the focus of an increasing number of
research and commercial projects in NLP and CL. In the US and Canada, Persian
language tools and resources are in high demand among industry, academic centres, and
governmental institutions. In addition, there are several computational projects on
Persian language in Iranian universities and private companies, as well as in Europe. This
recent emphasis on Persian language analysis, the growing availability of online and
digitized Persian data, combined with the latest advances in processing power have given
rise to many advances in the fields of Persian CL and NLP.
This chapter provides an overview of the field of Persian CL by presenting the essential
components of computational linguistic analysis, discussing the main challenges of
Persian CL and NLP, and showcasing some of the important resources and methodologies
developed in the field.
19.2 Statistical vs. symbolic approaches

There exist two main approaches in the field: statistical approaches that employ
probabilistic methods for learning from annotated corpora; and symbolic methods that
take advantage of a knowledge-based system of rules. Knowledge-based or rule-based
NLP systems analyse data using a grammatical model where the linguistic properties of a
language are encoded (possibly using a metalanguage). Such a system would capture
linguistic patterns such as morphology or syntactic structures using a set of rules.
Statistical systems use probabilistic algorithms to analyse a document and build a
language model, which consists of the probability of occurrence of, for example, a
sequence of words. These systems need to be given as input a large corpus for the
domain or subject of interest in order to derive the distributional probabilities that allow
them to estimate the likelihood of a term appearing next to or near another term in that
language. Based on the application, statistical systems may require as input a pre-
annotated corpus where all words or structures have been associated with a tag (often
marked manually). The statistical system is then trained on the corpus and builds a
probability matrix that stores the probability of an individual word or phrase belonging to
a certain linguistic category of interest, as well as its distributional probability. (See
Chapters 3 and 11 for more discussion on corpus studies.)
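The notion of a language model built from the probability of word sequences can be made concrete with a minimal sketch. The example below assumes a toy corpus of transliterated Persian sentences and a bigram model with add-one smoothing; the function names and the corpus are illustrative only and are not taken from any particular Persian NLP system.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Count unigrams and bigrams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens          # sentence-start marker
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_probability(prev, word, unigrams, bigrams, vocab_size):
    """Estimate P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Toy training corpus (transliterated; an assumption for illustration).
corpus = [
    ["man", "ketāb", "xāndam"],
    ["man", "ketāb", "xaridam"],
    ["u", "ketāb", "xānd"],
]
unigrams, bigrams = train_bigram_model(corpus)
vocab_size = len(unigrams)

# A sequence seen in training receives a higher probability than an unseen one.
p_seen = bigram_probability("ketāb", "xāndam", unigrams, bigrams, vocab_size)
p_unseen = bigram_probability("ketāb", "man", unigrams, bigrams, vocab_size)
```

Smoothing is used here so that unseen word pairs still receive a small non-zero probability; practical systems typically rely on much larger corpora and more refined estimation methods.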
Each of these computational approaches presents its own advantages and shortcomings.
Rule-based systems provide consistent results and can often correctly analyse complex
and long structures, but they are generally unable to provide analyses for constructions
that have not been included in the rule set. The advantage of a probabilistic or statistical
system is that, for instance, when it encounters an unknown word, it can use the
distributional information gathered from the sequence of words in the training corpus to
determine (or guess) the grammatical class of the unknown word given its nearby
context. Statistical systems can reach high accuracy quickly, given a sufficiently large
corpus, but are limited by the type of data they are trained on. Statistical machine-
learning approaches can readily be adapted to a new language if provided with the data
annotated with linguistic information for that (p. 463) language, whereas rule-based
approaches require the development of new grammars for each new language. Many
modern systems combine both statistical and rule-based methodologies; these systems
are known as hybrid. Since rule-based systems can accurately annotate large
grammatical constructions, they are used to analyse and mark-up data as a first pass.
Statistical systems are then applied to disambiguate the results or to guess the analyses
for the unknown elements. Another approach is to include syntactic or semantic
knowledge in the pre-annotated corpus used for training a statistical system. This method
allows the statistical system to detect deeper level linguistic features and to use them in
analysing or classifying new data sets.
Most current approaches in the field show a demonstrable emphasis on statistical
methods at the expense of the implementation of theoretical linguistic knowledge. This
tension and interplay between the statistical and symbolic approaches can also be seen in
the domain of Persian NLP and CL.
19.3 NLP applications

Vast quantities of textual data are becoming available in digital form, including online
media and news sources, published documents, user-generated content (e.g. blogs and
social network posts), private databases (e.g. medical histories, legal records), digitized
historical documents, and personal emails. Anywhere language comes in contact with
information technology, or where humans need to interact with computers, language
needs to be organized so that it can be handled and processed by computational means.
These changes, combined with major advancements in computer processing power, have
opened the path for a myriad of new domains of application for CL and NLP methods.
This section provides an overview of the main applications in CL and NLP, and provides
examples of how these approaches can be applied for linguistic processing, language
analysis, and language pedagogy.
19.3.1 Textual analysis
Textual analysis depends on technologies of varying levels of difficulty. Optical Character
Recognition (OCR) can be used to convert hardcopy documents into a digitized text
format to facilitate deeper automatic processing. The ability to search for terms in a
document (string matching), spell-checking and auto-correction, word frequency analysis,
and basic grammar checkers are now very commonplace. Yet these applications require a
number of resources and technologies such as efficient dictionary access or stemming
(removing affixes to obtain the stem of words) that work in the background.
19.3.2 Translation and teaching aids
Translation memory tools are databases that continually capture previously translated
segments. These segments are reused to aid translators and speed up performance. They
are also useful in making technical or domain terminology available to translators.
(p. 464) NLP technology has also been used in the classroom by building automatic
plagiarism detectors that can identify documents that are too similar in content, or
automated essay-scoring programs that can grade student writing in an educational
setting. Among other applications, tools have been incorporated in the foreign-language-
learning classroom to offer students access to information on correct syntactic and
inflectional formations, and to provide exercise and test material.
19.3.3 Information retrieval
A number of NLP approaches can help sift through the large amounts of textual
information that we are faced with each day. Information Retrieval (IR) provides the user
with the ability to search through large databases or online sites and retrieve documents
that match the query. A subtype of IR is Cross-Language Information Retrieval (CLIR)
that allows the user to search for terms in one language (e.g. English) and retrieve
documents satisfying the query in other languages. This is typically done by leveraging
bilingual dictionaries prior to search.
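A dictionary-based approach to CLIR can be sketched as follows. The example assumes a small hypothetical English-to-Persian dictionary and a document collection in transliterated Persian; the query terms are translated first and documents are then ranked by the number of translated terms they contain. All names and data are illustrative.

```python
def translate_query(query, bilingual_dict):
    """Map each query word to its target-language translations."""
    terms = []
    for word in query.lower().split():
        terms.extend(bilingual_dict.get(word, []))  # unknown words are skipped
    return terms

def retrieve(query, documents, bilingual_dict):
    """Return ids of documents containing translated terms, best match first."""
    terms = set(translate_query(query, bilingual_dict))
    scored = []
    for doc_id, text in documents.items():
        score = len(terms & set(text.split()))
        if score:
            scored.append((score, doc_id))
    # Sort by descending score, then by document id for a stable order.
    return [doc_id for score, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]

# Hypothetical resources for illustration only.
bilingual_dict = {"book": ["ketāb"], "student": ["dānešju"]}
documents = {
    "d1": "dānešju ketāb xarid",
    "d2": "gāv āb xord",
    "d3": "ketāb xub ast",
}
results = retrieve("student book", documents, bilingual_dict)
```

This sketch matches whole tokens only; a realistic system would also need the tokenization, normalization, and stemming steps discussed elsewhere in this chapter.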
19.3.4 Knowledge discovery
Further analyses performed on textual data allow for extraction of certain data types
usually called entities, such as references to people, places, organizations, or dates.
Certain tools can be used to also extract events (activities and their associated
participants), or relations between various entities, as well as relations between events
such as causation. These approaches make use of name matching and resolution
techniques that can unify various ways of spelling the same entities (e.g. U.S.A. and USA,
Khaddafi and Qaddafi). These technologies aim at answering the question ‘who did what
to whom’ with a reasonable level of confidence by automatically discovering the
information in the text.
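Name matching of the kind mentioned above can be approximated with simple string normalization followed by a similarity measure. The sketch below assumes ASCII spellings, uses the standard-library `difflib.SequenceMatcher`, and picks an arbitrary similarity threshold of 0.75; it is one possible heuristic and not the method of any specific system.

```python
import re
from difflib import SequenceMatcher

def normalize_name(name):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def same_entity(a, b, threshold=0.75):
    """Guess whether two spellings refer to the same entity."""
    na, nb = normalize_name(a), normalize_name(b)
    if na == nb:                      # e.g. 'U.S.A.' vs 'USA'
        return True
    # Fall back to character-level similarity for transliteration variants.
    return SequenceMatcher(None, na, nb).ratio() >= threshold

match_acronym = same_entity("U.S.A.", "USA")
match_variant = same_entity("Khaddafi", "Qaddafi")
match_other = same_entity("Tehran", "Tajikistan")
```

The threshold is an assumption that would need tuning on real data, since a value that is too low merges distinct names and one that is too high misses legitimate spelling variants.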
With the growing use of social media, citizens, governments, and companies have an
interest in understanding the online discussions about issues, products, or policies,
identifying the relevant or most influential opinions, and anticipating emerging issues and
trends. This has given rise to a myriad of tools performing topic detection, sentiment
analysis, and predictive modelling.
The results of the various knowledge discovery components can be combined and
presented to the user as a summary of the content of all documents. For instance, a social
media analytics system can generate a summary of the top topics discussed on Twitter in
a certain location, the general sentiment of the posters on those issues, and how the
opinions change through time.
19.3.5 Machine translation
The goal of Machine Translation (MT) is to automatically translate text from one language
to another. Although many advances have recently been made in this domain, it is still
considered one of the more difficult problems in the field, as it depends on the accurate
(p. 465) application of various subcomponents such as the analysis of morphology,
syntactic structures, semantics, and even pragmatics.
19.3.6 Speech processing
Much of natural language occurs in audio or video form. Some of the important
applications in this domain include the use of speech technology to speak to smart phones
and attempts to make the content of online videos searchable. The challenges faced by
automatic speech processing systems are manifold, as it is necessary to segment the
speech patterns, to analyse the content of the language and in certain cases, to perform
machine translation. These systems often also need to generate human speech.
19.3.7 Foundational components
In order to successfully develop the applications introduced in this section, computational
linguists need access to resources such as machine-readable dictionaries and digital
corpora (collections of text or speech material). In addition, these higher-level
applications are built on foundational technologies that perform analysis at the word
level, identify phrasal elements, and parse syntactic structures. A sample analytic
pipeline is illustrated in the schema in Figure 19.1, leveraging various foundational
components and applications. Depending on the domain of study and the final goal, each
application may make use of distinct components.
(p. 466)

Figure 19.1 Sample language processing pipeline.
Adapted from ParsiPardaz, a Persian language toolkit
(Sarabi et al. 2013)
19.4 Issues in Persian NLP

The Persian language poses a number of important challenges to automatic analysis, ranging
from issues raised by the writing system in word sense disambiguation and boundary
detection, challenges due to the relatively free word order in parsing, to difficulties
presented by the large number of Multi-Word Expressions (MWEs) or compounds used in
the language. In addition, although the traditional written form of Persian in Iran,
Afghanistan, and Tajikistan is very similar, there is a strong diglossia within each nation
due to the distinction between the conversational and literary variants of the language.
This section presents some of the main challenges raised by the intricacies of Persian
language and script for automatic computational approaches. The focus of this chapter,
however, remains on text analysis; applications in speech processing will not be
discussed. In addition, we will limit the discussion mainly to text written in the Perso-
Arabic script. Since the medium of analysis is text, the writing system’s properties
crucially affect the tools’ efficiency and implementation. Once the surface words have
been segmented and retrieved, linguistic properties of Persian play an important role for
providing deeper language analysis.
19.4.1 Character set encoding
In order for any natural language document to be machine-readable, its characters must
be represented in a character encoding, in which one or more bytes in a file map to a
known character. Most Persian language digital text is nowadays encoded in the Unicode
standard (Unicode Consortium 2006), most commonly implemented in the UTF-8
character encoding. The advantage of the Unicode standard is that it has been developed
for multilingual applications and allows the encoding of all known language characters in
distinct form, removing all ambiguities. However, encoding issues often occur in
processing of Persian text. For instance, besides the range of Unicode characters
designed for Persian, users sometimes employ Arabic characters when creating digital
text. Hence, the letters kāf (ک) and ye (ی) can be expressed by either the Persian
encoding (\u06a9 for kāf and \u06cc for ye) or by the Arabic Unicode (\u0643 for kāf and
\u064a or \u0649 for ye). Any Persian processing system should be able to process all of
these possible input versions.
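One common way of handling such mixed input is to normalize the text before any further processing. The following minimal sketch maps the Arabic code points for kāf (U+0643) and ye (U+064A, U+0649) to their Persian counterparts (U+06A9, U+06CC). The function name is hypothetical, and the unconditional mapping of U+0649 is a simplifying assumption that may not be appropriate for every text.

```python
# Arabic code points and the Persian code points they are mapped to.
ARABIC_TO_PERSIAN = {
    "\u0643": "\u06a9",  # ARABIC LETTER KAF          -> ARABIC LETTER KEHEH
    "\u064a": "\u06cc",  # ARABIC LETTER YEH          -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06cc",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
}
_TRANSLATION_TABLE = str.maketrans(ARABIC_TO_PERSIAN)

def normalize_persian(text):
    """Replace Arabic kāf/ye variants with the Persian code points."""
    return text.translate(_TRANSLATION_TABLE)

# 'ketābi' typed with Arabic kaf and Arabic yeh.
raw = "\u0643\u062a\u0627\u0628\u064a"
normalized = normalize_persian(raw)
```

After this step, two spellings of the same word that differ only in these code points compare as equal, which simplifies dictionary lookup and frequency counting.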
19.4.2 Word boundary
The first essential challenge of CL is to clearly define the characters, words, and
sentences in a digital natural language text. This step, known as tokenization or text
segmentation is, in fact, not a trivial matter due to the various ambiguities presented by
human language and writing systems. The extended Arabic script used in writing Persian
texts naturally brings about certain ambiguities, since the diacritic vowels are usually not
written, yet the system should be flexible enough to be able to detect diacritics when they
appear in the text. Furthermore, the inconsistent usage of the whitespace in Persian
documents gives rise to problems in detecting word, phrase, and sentence boundaries.
(p. 467)
The optional use of white space can be mitigated if the first word ends in a final form
character.1 For instance, in (zendegi-e ānhā) ‘their life’, the first word
zendegi ‘life’ is followed by the pronoun ‘their’ without any overt spacing. However, a
reader can distinguish the two words as distinct, since the first word terminates in a final
form character. In digital text, this is accomplished by inserting a control character,
known as the zero-width non-joiner (ZWNJ).2 A text segmenter can therefore treat the
ZWNJ akin to a white space and use it to delimit token boundaries. Ambiguity arises,
however, if the first word ends in a character that does not include a final form. In such
cases, distinct words appear as a single token. For example, in (raftand
mardom) ‘people left’, the first word raftand ‘left’ terminates in a non-final form and
merges with the next term with no characters distinguishing the boundary between the
two tokens. In order to analyse these adjoined words, certain systems use a post-
segmentation script to separate unrecognized tokens at possible boundary points and to
look up the resulting words in the lexicon.
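The two steps described above, treating the ZWNJ as a token delimiter and splitting unrecognized tokens with the help of a lexicon, can be sketched as follows. For readability the example uses transliterated strings as stand-ins for Perso-Arabic text; the lexicon and function names are assumptions made for illustration.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner

def tokenize(text):
    """Split on whitespace and on the ZWNJ control character."""
    return [tok for tok in re.split(r"[\s\u200c]+", text) if tok]

def split_unknown(token, lexicon):
    """Post-segmentation step: try to split an unknown token into two known words."""
    if token in lexicon:
        return [token]
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        if left in lexicon and right in lexicon:
            return [left, right]
    return [token]  # leave the token untouched if no split is found

# Hypothetical lexicon (transliterated stand-ins for Persian word forms).
lexicon = {"zendegi", "ānhā", "raftand", "mardom"}

tokens_zwnj = tokenize("zendegi" + ZWNJ + "ānhā")      # boundary marked by ZWNJ
tokens_merged = split_unknown("raftandmardom", lexicon)  # no visible boundary
```

The splitting step returns the first split whose two parts are both in the lexicon; a fuller implementation would need to rank competing splits and handle tokens made up of more than two words.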
Optionality of the white space also raises issues in the analysis of morphemes that can
appear detached. Inflectional morphemes such as the progressive prefix (mi), the
plural morpheme (hā), or the superlative suffix (tarin), can appear either as
bound to the host, as free affixes separated by a final form character (or ZWNJ), or
separated with an intervening space. These three options are represented in Table 19.1,
where the ZWNJ character is represented with a dot in the transliteration. Components
such as the morphological analyser that needs to automatically identify the morphological
structure of words, and the stemmer that is used to remove affixes and maintain the stem
form of the word, should be able to recognize all of these forms and to provide the correct
analysis.
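As a rough illustration of how the three written variants of an affix might be reduced to a single form, the sketch below rewrites the ZWNJ-detached and space-separated plural as the attached form. It operates on transliterated text and handles only the plural suffix hā by default; both choices are simplifying assumptions.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner

def normalize_spacing(text, suffixes=("hā",)):
    """Attach suffixes that are separated from their host by a space or ZWNJ."""
    for suffix in suffixes:
        # A word character, one separator, then the suffix as a whole token.
        pattern = rf"(\w)[ {ZWNJ}]{re.escape(suffix)}\b"
        text = re.sub(pattern, lambda m: m.group(1) + suffix, text)
    return text

attached = normalize_spacing("ketābhā")
detached = normalize_spacing("ketāb" + ZWNJ + "hā")
separated = normalize_spacing("ketāb hā")
```

Normalizing in this direction is only one option; a system could equally normalize every variant to the ZWNJ form, as long as the morphological analyser and the stemmer agree on the convention.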
Complex Tokens refer to multi-element forms, which consist of affixes that represent a
different lexical category from the one they attach to. These tokens occur in writing when
(p. 468) they consist of elements that can optionally appear attached to other words, such
as the preposition (be) ‘to’, the determiner (in) ‘this’, the postposition (rā), the
relativizer (ke) ‘that’, or the copula (ast) ‘is’. Examples of these complex
tokens are provided in Table 19.2, where the English translations clearly show that these
single words in fact represent compound forms consisting of distinct lexical categories.
Table 19.1 Plural morpheme example in attached, detached, and separated forms in
the Persian writing system
                 Attached form   Detached form (with ZWNJ)   Separated by space
Persian
Transliteration  ketābhā         ketāb⋅hā                    ketāb hā
Gloss            book-PL         book⋅PL                     book PL
Translation      ‘books’         ‘books’                     ‘books’
Table 19.2 Complex tokens consisting of two distinct but attached lexical categories
Persian
Transliteration  bešive            inkār              behtarast
Gloss            to-manner         this-work          better-is
Translation      ‘in manner (of)’  ‘this work/task’   ‘it’s better’
The Part-of-Speech (POS) Tagger is used to associate grammatical tags to words, such as
noun, preposition, verb, pronoun, etc. Distinguishing the different elements in these
complex tokens is therefore important at this level. The POS tagger is often used in many
higher-level applications, such as machine translation or distinguishing the meaning of
polysemous words.
As Persian allows some level of agglutination, single words may correspond to a full
sentence in a language like English. For example, the term
(puldārtarinhāyešānand), glossed as ‘rich-SUP-PL-CLIT.3PL-COP.3PL’, can be translated
as ‘They are the richest among them’. In order to provide accurate analysis and
translation, the NLP system for Persian language needs to be able to parse the various
elements within this token and map them to their correct meanings and translations.
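The decomposition of such a token can be illustrated with a greedy suffix-stripping sketch that works from right to left on the transliterated form. The suffix inventory, the slot order, and the one-word lexicon below are assumptions chosen to cover this single example; a real morphological analyser would need a far richer grammar.

```python
# Suffix slots ordered from the outermost (rightmost) to the innermost.
SUFFIX_SLOTS = [
    ("COP.3PL", ["and"]),
    ("CLIT.3PL", ["yešān", "ešān", "šān"]),  # longer variants are tried first
    ("PL", ["hā"]),
    ("SUP", ["tarin"]),
]

def segment(word, lexicon):
    """Strip at most one suffix per slot; return (stem, tags) or None."""
    tags = []
    rest = word
    for tag, forms in SUFFIX_SLOTS:
        for form in forms:
            if rest.endswith(form) and len(rest) > len(form):
                rest = rest[: -len(form)]
                tags.append(tag)
                break
    if rest not in lexicon:
        return None
    return rest, list(reversed(tags))  # tags in left-to-right order

lexicon = {"puldār": "rich"}  # hypothetical stem lexicon
stem, tags = segment("puldārtarinhāyešānand", lexicon)
gloss = "-".join([lexicon[stem]] + tags)
```

Because the stripping is greedy and purely string-based, it can over-segment words whose stem happens to end in a suffix-like sequence, which is one reason full systems pair such rules with a lexicon check and disambiguation.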
19.4.3 Word sense ambiguity
Ambiguous words are a common problem in NLP applications. The English word ‘bank’
can refer to the financial institution, the edge of a river, or the verb indicating holding an
account at a financial institution. Word Sense Disambiguation (WSD) components attempt
to associate the correct semantic or grammatical tag with polysemous words encountered
in text. Polysemy in Persian is reinforced by the intricacies of the writing system. In
Perso-Arabic script, the diacritics (the vowels /æ/, /e/, and /o/) are usually not represented
in the text. This in turn creates certain ambiguities: The word (written as (krm)), for
instance, can be pronounced with different vowel combinations resulting in five possible
lexical elements: kerm ‘worm’; karam ‘generosity’; kerem ‘cream’; korom ‘chrome’; karm
‘vine’. A reader uses the context to determine the correct sense (and pronunciation) of
the word in the sentence.
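The ambiguity of the written form (krm) can be simulated by indexing transliterated readings under their unvocalized skeleton and then using context words to choose among the candidates. In the sketch below, the removal of the short vowels a, e, and o is a rough approximation of the orthography, and the context cue words attached to each reading are invented purely for illustration.

```python
from collections import defaultdict

SHORT_VOWELS = set("aeo")  # short vowels, usually unwritten in the script

# Hypothetical mini-lexicon: reading -> (gloss, illustrative context cues).
LEXICON = {
    "kerm": ("worm", {"xāk", "sib"}),
    "karam": ("generosity", {"lotf"}),
    "kerem": ("cream", {"pust", "dast"}),
    "korom": ("chrome", {"felez"}),
    "karm": ("vine", {"angur"}),
}

def skeleton(translit):
    """Approximate the written form by dropping the short vowels."""
    return "".join(ch for ch in translit if ch not in SHORT_VOWELS)

def build_index(lexicon):
    """Group all readings that share the same written skeleton."""
    index = defaultdict(list)
    for reading in lexicon:
        index[skeleton(reading)].append(reading)
    return index

def disambiguate(written, context, lexicon, index):
    """Pick the reading whose cue words overlap most with the context."""
    candidates = index.get(written, [])
    scored = [(len(lexicon[r][1] & set(context)), r) for r in candidates]
    best_score, best = max(scored, default=(0, None))
    return best if best_score > 0 else None

index = build_index(LEXICON)
choice = disambiguate("krm", ["pust", "xošk"], LEXICON, index)
```

When no cue word is found in the context, the function returns no reading at all rather than guessing; statistical WSD components would instead fall back on the most frequent sense.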
19.4.4 Phrasal boundary
Parsing refers to the process of identifying and representing the syntactic structures in a
sentence. This task is complicated in Persian text due to difficulties in automatically
detecting (p. 469) phrasal boundaries. Numerous factors contribute to the ambiguity of
the Persian phrasal structure: diacritics are typically not written, which produces lexical
ambiguities; there are very few overt morphemes in the language to mark the boundaries
of a Noun Phrase (NP) or Preposition Phrase (PP); and there are often no particles in written
text linking the constituents of a noun phrase, since the ezafe morpheme is usually an
unwritten diacritic. (For more information about ezafe, refer to Chapters 2, 3, 6, 7, and 9
in this volume.) Furthermore, since the basic word order in Persian is Subject–Object–
Verb, the lack of overt morphology for marking boundaries makes it very difficult to
determine where the subject ends and the object begins. All of these factors, coupled with
very long sentences in written media, a relatively free word order and the optionality of
the subject, cause immense difficulties in parsing Persian text.
There are, nevertheless, some cues that can be leveraged in identifying NP and PP
boundaries: Two of the most consistent lexical items that delimit the boundary are the
Pronoun and the Proper Name. Oftentimes, a proper noun demarcates the end of the NP
in Persian as in (vazir xāreje-ye āyande-ye āmrikā) ‘the
future Secretary of State of America’. The postposition (rā), (ro) in conversational
language, always marks the boundary of an object noun phrase or a topicalized phrase.3
In addition, a number of affixes such as the pronominal clitic (tān/šān), the
indefinite article (i), and the relativizing affix (i), typically indicate that the end of a
noun phrase has been reached. On the other hand, the presence of an ezafe morpheme
indicates that the boundary of the NP has not been reached and the nominal or adjectival
element needs to be joined with the constituent that follows it. Note that the lack of an
ezafe morpheme can also be used in detecting the boundary of a noun phrase. Hence, if a
noun or adjective ends in a vowel and is not followed by the ezafe suffix as in
(enfejārhā) ‘explosions’—as opposed to (enfejārhāye) ‘explosions (of)’
where -ye is the transcribed form of the ezafe affix—it can be used by a parser to denote a
phrasal boundary.
19.4.5 Phonological rules
In Persian, the form of morphological affixes varies based on the ending character of the
stem. Hence, if an animate noun ends in a consonant, it receives the plural morpheme –ān
as in (zanān) ‘wives/women’. If the animate noun ends in a vowel, the glide /y/ is
inserted between the stem and the suffix, as in (gedāyān) ‘the poor’. If the word
ends in /e/ which is represented in writing with what is known as a final ‘h’ character
(known as the ‘silent he’) as in (bigāne) ‘foreigner’, then the last character of the
word is eliminated in writing and is replaced by the -gān suffix, as in
(bigānegān) ‘foreigners’. These phonological rules apply across categories and are not
limited to the plural formation, e.g. glide insertion before the indefinite morpheme on the
word (dānešju), as in (dānešjuyi) ‘a university student’. In
order to recognize these constructions, it is usually more efficient to implement the
phonological rules that apply in these cases instead of listing all the possible morphemes
as independent affixes. (p. 470)
A problem arises in Persian with characters that may be either vowels or consonants
depending on the context and cannot be analysed correctly simply based on the
orthography. For example, the letter (vāv) can either be a vowel (pronounced /u/) or a
consonant (pronounced /v/)—as illustrated in the contrast between (dānešju)
(university student) and (gāv) ‘cow’. Thus, any morphological system would need to
distinguish the words based on their pronunciation, since the phonetic representation of
Persian nouns and adjectives plays a crucial role in the type of phonological rule that
should apply to morpheme boundaries.
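The rules described in this section can be expressed compactly when they operate on a phonemic transliteration, in which the vowel /u/ and the consonant /v/ are already distinguished. The sketch below covers only the cases mentioned in the text (the animate plural and glide insertion before the indefinite morpheme after a vowel); the function names are hypothetical and other stem types are deliberately left out.

```python
VOWELS = set("āiuaeo")  # vowel symbols of the transliteration used here

def animate_plural(stem):
    """Form the animate plural of a transliterated stem."""
    if stem.endswith("e"):
        return stem + "gān"   # bigāne -> bigānegān ('silent he' stems)
    if stem[-1] in VOWELS:
        return stem + "yān"   # gedā -> gedāyān (glide insertion)
    return stem + "ān"        # zan -> zanān

def indefinite(stem):
    """Attach the indefinite morpheme, inserting a glide after a vowel.

    Stems ending in /e/ are not modelled in this sketch.
    """
    if stem[-1] in VOWELS:
        return stem + "yi"    # dānešju -> dānešjuyi
    return stem + "i"

plurals = [animate_plural(s) for s in ("zan", "gedā", "bigāne")]
student = indefinite("dānešju")
cow = indefinite("gāv")  # final /v/ is a consonant, so no glide is inserted
```

Applied to raw orthography instead, the same rules would need an additional step that decides whether a final vāv stands for /u/ or /v/, which is exactly the difficulty noted above.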
19.4.6 Multiword expressions
One of the biggest problems in Persian language analysis is the presence of a large
number of Multi-Word Expressions (MWEs), which include certain compound tenses such as
future or modal forms, light verb constructions (also known as phrasal verbs, compound
verbs, or complex predicates), and compound nouns. (For more information about light
verbs, refer to Chapters 3 and 8 in this volume.)
These elements range from lexical units such as (banā-bar-in)
‘therefore’ (literally: ‘based-on-this’) where the subelements behave as single units but
can be separated from each other with white space in text, to phrasal verbs where the
subelements can be separated from each other with intervening words. For instance, the
compound verb (ezhār kardan) ‘to express’ (literally: ‘expression do’)
can be separated by the intervening noun (taasof) ‘regret’ in
(ezhāre taasof kardand) ‘they expressed regret’. Given the
range of MWEs that appear in language, a uniform approach cannot be applied. Instead,
different techniques should be used to capture the distinct types of MWEs, ranging from
lexicalized phrases, to the more compositional expressions, and taking into account
discontinuous elements that should be treated as a single unit (Sag et al. 2002).
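The treatment of discontinuous elements can be illustrated with a small matcher that looks for the non-verbal part of a compound verb and then searches a limited window to its right for a form of the light verb. The lexicon entry, the optional ezafe ending -e on the first element, and the window size are assumptions chosen to fit the example above.

```python
# Hypothetical lexicon: (non-verbal element, light verb stem) -> meaning.
LVC_LEXICON = {("ezhār", "kard"): "to express"}

def find_lvcs(tokens, lexicon=LVC_LEXICON, max_gap=3):
    """Return (start, end, meaning) for each compound verb found.

    Up to `max_gap` tokens may intervene between the two parts.
    """
    matches = []
    for i, tok in enumerate(tokens):
        for (preverb, lv_stem), meaning in lexicon.items():
            # Accept the bare form or the form carrying the ezafe ending -e.
            if tok not in (preverb, preverb + "e"):
                continue
            for j in range(i + 1, min(len(tokens), i + 2 + max_gap)):
                if tokens[j].startswith(lv_stem):
                    matches.append((i, j, meaning))
                    break
    return matches

adjacent = find_lvcs(["ezhār", "kardand"])
separated = find_lvcs(["ezhāre", "taasof", "kardand"])
```

Matching the light verb by its stem prefix is a shortcut that stands in for proper morphological analysis of the inflected verb form.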
Analysis of the unit-like elements can generally be accomplished easily by listing them in
the lexicon. Certain compound forms can be analysed in the morphological module by
undergoing recursive rules. Thus, for verbs, once a participle is formed in the
morphological analyser, it may combine with an auxiliary to create a compound tense. For
example, the compound imperfect tenses are formed by joining the past participle and the
present auxiliary ‘to be’ as in (1). In the double compound past tense shown in (2), the
participle is combined with the past auxiliary, which could itself be conjugated (i.e. past
participle form of the auxiliary is followed by the present auxiliary).
(1)
(2)
(p. 471)
Table 19.3 Examples of Persian light verb constructions
                 Noun + LV    Adj + LV          Adv + LV       PP + LV
Compound Verb
Transliteration  āb dādan     xošk šodan        birun kardan   be yād āvardan
Gloss            water give   dry become        out make/do    to memory bring
Translation      ‘to water’   ‘to dry’ (intr.)  ‘to dismiss’   ‘to recall’
One of the most challenging yet fascinating elements in Persian linguistics is the high
frequency of compound verbs (also known as light verb constructions). These MWEs lie at
the interface between lexicon, morphology and syntax, and their subparts may also
appear separated from each other by intervening words in a sentence. These verbs
consist of a nominal, adjectival, adverbial or PP element followed by a light verb (LV), as
exemplified in Table 19.3. These constructions range from compositional elements, where
the meaning of the construction can be derived compositionally by the combination of the
two units, to idiomatic expressions.
The subparts of these elements can be separated from each other in text by intervening
morphemes (e.g. object pronoun clitics) as shown in (3), by modifiers as in (4), or by
phrasal elements (e.g. noun phrase or preposition phrase) illustrated in (5). If these
elements are simply listed in the lexicon as single units, they would not be accurately
recognized by the system. Given the productivity of the light verb constructions in
Persian and the ability to separate the two components, listing all possible forms in the
lexicon is not a viable option.
(3)
(4)
(5)
Another instance of phrasal elements is the nominal compound, in which the head
noun is the first token. As an example, consider the nominals (xamare
sorx) ‘Red Khmer’ or (māšine lebāsšuyi) ‘washing machine’,
where the plural affix (hāye) appears on the first subpart instead of attached to the end of
the compound, as shown: (p. 472) (xamarhāye sorx) and
(māšinhāye lebāsšuyi). These compounds are also best
treated as a phrasal element rather than as a single unit in the lexicon.
19.4.7 Conversational language
Persian language demonstrates a strong form of diglossia, which here refers to situations
where the language spoken by the people in a society differs considerably from the
traditional written variant (Ferguson 1959). For more discussion on diglossia see
Chapters 2 and 13. Although most Persian NLP systems have traditionally been developed
for analysing the high-level language variant found in news text, the recent explosion of
blogs, microblogs (e.g. Twitter), forums and social networks has created a large amount
of text written in the conversational or low-level variant of the language.
The main differences found in conversational language involve phonological alternations,
differences in word-formation rules, deviance from the verb-final word order found in
high-level speech, and a large number of neologisms (newly coined words) and loans. In
addition, code switching—occasions where a speaker alternates between two or more
languages, or variants of the language, in the context of a single conversation—is often
encountered in online social media text. NLP systems need to be able to accurately
process the features of the conversational variant of the language as well as the forms of
the literary variant. In some cases, it is crucial for social media analytic tools to
understand the structure of Pinglish—Persian language written in Latin characters—as its
use is prevalent in chat forums and other social media sites.
19.4.8 Dialectal variants
Most of the discussion in this section has revolved around the issues presented by the
Persian spoken in Iran and the features of its Perso-Arabic script. Needless to say,
a comprehensive Persian computational linguistic system will need to be able to process
and analyse all major variants of the language.
Tajiki Persian is distinct from the other two variants of Persian in that it employs an
extended version of the Cyrillic alphabet, which does not present the same challenges as
the Iranian and Afghan Persian variants (see also Chapters 2, 3, 11, and 13 for more on
Tajiki and Dari Persian). Tajiki text is much less ambiguous than its corresponding Perso-
Arabic script, as all the vowels are generally represented in this writing system and
capitalization is used for proper names and at the beginning of sentences. The
orthography corresponds more directly to the Persian language pronunciation. For
instance, the sounds /s/ and /t/ are represented with the Cyrillic character ‘с’ and ‘т’
respectively, regardless of the original spelling.4 This has of course created certain
homonyms in Tajiki Persian which are differentiated in the Arabic-based orthography. For
instance, the two words of Arabic origin ‘concealment’ and ‘line’, where /t/ is
represented with different letters in each instance, are both written as сатр /satr/ in
Tajik. Since (p. 473) Iranian and Tajiki Persian have developed differently, they also have
distinct lexical borrowings and divergent pronunciation which are represented in text.
Table 19.4 Examples of separable affixes in Afghan Persian (Dari) text
Afghan Persian ‫ﮐﺎﻓﯿﺴﺖ‬ ‫ﻣﻮﺍﻓﻘﺎﻡ‬ ‫ﻧﮕﺎﻫﻬﺎﯾﺸﺎﻥ‬
Transliteration kāfi⋅st muwāfaq⋅am nagāh⋅hāya⋅šān
Gloss enough-is agreed-am look-PL-their
Translation ‘it’s enough’ ‘I agree’ ‘their looks/glances’
The Persian spoken in Afghanistan, or Dari, presents its own challenges for computational
analysis as the writing system represents even more flexibility than the Iranian Persian
variant. Various Dari publications choose different ways of representing the spacing on
the affixes and compounds. The spacing for a particular word may even vary within the
same document. For instance, affixes that are typically written as joined in certain
contexts in the Perso-Arabic script of Iran may appear as detached in Dari text as
illustrated in Table 19.4, where the ZWNJ character is represented by a dot in the
transliteration.
In addition, the ezafe affix and the indefinite article are sometimes written
interchangeably in Dari orthography, even in news or literary publications. For instance,
the indefinite article following a ‘silent he’ letter can appear in several forms as shown in
the examples in (6). It is interesting to note that, of these three forms, only the first
instance is considered ‘correct’ in the standard Persian orthography taught in Iran. And
the third form is actually considered ‘incorrect’ since it is the form of the ezafe and not of
the indefinite article. However, all forms are present and considered relatively correct in
the writings of educated Dari speakers.
(6)
Both Afghan Persian and Tajiki Persian make use of distinct vocabulary and display
syntactic variations, as compared to Iranian Persian, that need to be captured and
processed by the NLP systems.
19.5 Tools and resources

Most approaches to Persian NLP rely on statistical machine-learning techniques that
require a large data set for the domain of interest (i.e. news, conversational language,
medical, or technical text). Based on the application, the data need to be annotated with
the relevant linguistic information. Annotated corpora are also helpful in evaluating
statistical (p. 474) and symbolic NLP systems. Furthermore, both rule-based and
statistical approaches make use of dictionaries and lexicons, as well as foundational
components for processing the Persian data. Following many years where there was a
dearth of such material for Persian language,5 a number of important resources have
been developed and shared with the scientific community, which have in turn fuelled the
development of a growing number of NLP applications for Persian. This section provides
an overview of the existing Persian language resources such as corpora, lexicons,
language processing tools, and more advanced systems for Persian NLP and CL.
19.5.1 Language grammars
Existing grammar books and descriptions of Persian linguistic properties are not always
adequate for development of NLP tools. One of the main challenges to performing
computational analysis involves the properties of the writing system as discussed in
section 19.4, which are typically not addressed in either traditional grammatical
descriptions or linguistic analyses of Persian language. Thus, specialized grammatical
descriptions are needed that address the intricacies of the writing system as well as
linguistic properties and patterns that are relevant for computational applications
(Riazati 1997; Megerdoomian 2000; Shamsfard 2011).
19.5.2 Corpora
Language-specific corpora are needed for training statistical systems. In particular,
annotated data sets facilitate the automatic induction of linguistic features and properties
that can be used by the statistical system to analyse new documents. Text corpora can be
monolingual—containing a single language—or parallel—containing the same content in
two languages. These data sets can be important for linguists in identifying language
usage or discovering linguistic structures. In addition, corpora can be studied in the
classroom to teach vocabulary and linguistic patterns. Recent years have seen an
explosion in the number of Persian language corpora, some of which are described in
this section. Most of these resources are freely available for research purposes.
The Hamshahri corpus (Darrudi et al. 2004), the Persica corpus (Eghbalzadeh et al. 2012)
and the Tehran Monolingual Corpus (TMC)6 are large-scale collections of online news
text. TMC contains 250 million tokens (300,000 unique words), the second version of the
Hamshahri corpus includes over 300,000 articles, and Persica contains more than 1.5
million articles. The articles included in Persica and Hamshahri have been annotated for a
subject category, e.g. politics, culture, sports. The Bijankhan corpus (Bijankhan et al.
2011) consists of news and common texts categorized into subjects. In addition, its 2.6
million tokens have been manually tagged for part-of-speech information (see Chapter 11
for more on part-of-speech tagging). The original version of the Bijankhan corpus
contains 550 distinct POS tags describing the inflectional information on each word, but
variants have (p. 475) since been created with fewer tags: Bijankhan processed version
(forty tags) and Uppsala Persian corpus (thirty-one tags). The irBlogs corpus (AleAhmad
et al. 2016) and TalkBank Persian7 are collections of blog posts.
The Tehran English–Persian (TEP) parallel corpus (Pilevar et al. 2011) includes 61,000
aligned sentences, which are very useful for developing MT systems. The English–Persian
parallel corpus8 includes 100,000 aligned sentences tagged by subject. Persian Wikipedia
has also been treated as a bilingual (or multilingual) corpus that can serve as a parallel
data set (Mohammadi and QasemAghaee 2009). There exist, in addition, Persian language
speech corpora such as the Farsi Speech Database (Farsdat) (Bijankhan et al. 1994) that
includes 100 speakers representing different dialects of Iran, the CHILDES corpus9
representing children’s speech, and the CALLFRIEND Farsi corpus (Canavan et al. 2014).
In recent years, Persian language Treebanks have been developed such as the Persian
Treebank (PerTreeBank) (Ghayoomi 2012), the Uppsala Persian Dependency Treebank
(UPDT) (Seraji et al. 2014), and the Persian Dependency Treebank (PerDT) (Rasooli et al.
2013). A Treebank is a corpus of sentences annotated with syntactic analysis and can be
used to train statistical parsers.
Corpora can be used by both language teachers and linguists to identify phenomena of
interest. Using a concordance software program, the user can identify the most frequent
terms used in a corpus and detect patterns of language use, per subject or register, which
can guide the teaching of important vocabulary items and language constructions within
context. A corpus-based analytic study of linguistic patterns in adult and child language
can inform linguistic theory and modelling.
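The kind of concordance query described above can be illustrated with a minimal sketch. It assumes whitespace tokenization and an invented toy sentence in Latin transliteration; the function names (`tokenize`, `most_frequent`, `kwic`) are hypothetical and do not correspond to any particular concordance program.

```python
from collections import Counter


def tokenize(text):
    """Split on whitespace (an assumption: a real Persian tokenizer would
    also have to handle the zero-width non-joiner and attached clitics)."""
    return text.split()


def most_frequent(tokens, n=3):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokens).most_common(n)


def kwic(tokens, keyword, window=2):
    """Keyword-in-context: (left context, keyword, right context) per hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Invented toy sentence: 'I read the book and s/he bought the book'.
corpus = "man ketāb rā xāndam va u ketāb rā xarid"
tokens = tokenize(corpus)

for left, word, right in kwic(tokens, "ketāb", window=1):
    print(f"{left:>10} [{word}] {right}")
```

A teacher or linguist would run such a query over a full corpus, optionally restricted to one subject category or register, to see which constructions a word typically occurs in.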
19.5.3 Lexical resources
Lexical resources can take many forms including a list of lemma (dictionary) forms of
Persian words, machine-readable bilingual dictionaries to be used for translations
(Amtrup et al. 2000), domain lexicons (e.g. an electronic glossary of nuclear terms), and
gazetteers that contain lists of named entities such as people, organizations, and places.
More specialized lexicons include ontologies that attempt to capture pragmatic or world
knowledge (Shamsfard and Barforoush 2002), WordNet-type lexicons that represent the
various senses of a word with relations of synonymy and hypernymy (Montazery and Faili
2010), valency lexicons that include the subcategorization requirements for verbs and
deverbal elements (Rasooli et al. 2011), or sentiment lexicons listing a set of positive and
negative polarity terms (Amiri et al. 2015).
19.5.4 Foundational technology
Persian NLP practitioners have invested considerable effort in developing crucial foundational
technology and methods. These range from tools that process the basic script to full
morphological and syntactic analysis. For instance, users can type in Latin (p. 476)
characters and see an automatic transliteration into Persian script (Google Input Tools10)
or employ tools to transcribe from the Tajiki Cyrillic writing system to the Perso-Arabic
script. Spellcheckers and software for automatic grammar error correction for Persian
are now available (Ehsan and Faili 2013), and recent work has gone into building
a Persian-language plagiarism detection tool (Rakian et al. 2015). Tools have been
developed for stemming (Jadidinejad et al. 2009 (PerStem)) and for full
morphological analysis of the words in a text, outputting the associated lemma and affixes
(Megerdoomian 2004; Sagot and Walther 2010).
Knowledge-based taggers (Sagot et al. 2011 (PerLex); Mohseni and Minaei-bidgoli 2010)
provide the POS information on each word based on the output of the morphological
analysers, whereas statistical taggers (Seraji et al. 2012a (TagPer); Tasharofi et al. 2007)
have been developed by training the system on a pre-annotated corpus. In order to obtain
syntactic parses, several grammars have been built for analysing the syntactic
constituents in a text (Müller and Ghayoomi 2010 (PerGram)). In addition, Persian
language Treebanks have been used to construct statistical parsers (Seraji et al. 2012b).
Although morphological analysers and POS taggers can reach high levels of accuracy on
specific types of textual data (e.g. news), Persian syntactic parsers are still in their
infancy.
19.5.5 Advanced NLP systems
The first major NLP system developed for Persian was the Shiraz Persian to English
Machine Translation system (Amtrup, Mansouri-Rad, and Zajac 2000). The Shiraz system
was inspired by an HPSG type grammar with typed feature structure rules to describe
words and their linguistic features, and unification operations to capture agreement and
to build larger structures. The system segmented the tokens in a text, performed full
morphological analysis, and provided parses for NP, PP, and VP structures. The final
Persian parses were transformed into the English syntactic word order and translated
using a bilingual lexicon. Since then, researchers have developed other knowledge-based
grammars (Systran) and trained several statistical MT systems on parallel corpora
(Google Translate, Language Weaver, Persian SMT; see Pilevar and Faili 2010). Although
many advances have been made in Persian MT, the resulting translations are typically
useful for conveying the main topics and gist of a document rather than serving as an
accurate final product. This is the case for the field of MT in general, where results do not represent
perfect translations, yet important improvements are obtained by developing systems for
specific domains of application (e.g. legal, aviation, medical).
An interesting approach to MT involves leveraging a pivot or bridge language for which
more resources and tools exist. For instance, Bakhshaei et al. (2010) build a
Persian-to-German MT system by first translating the Persian text to English and then applying an
English to German MT system to obtain the final results. Parvaz and Megerdoomian
(2008) develop a Tajiki to English MT tool by first converting Tajiki script to the
Perso-Arabic script and then leveraging existing Persian to English MT to translate the Tajiki
Persian-language text. (p. 477)
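The pivot idea amounts to composing two translation steps. The following sketch is only an illustration under strong assumptions: each "MT system" is replaced by a hypothetical word-for-word lexicon lookup over invented entries, with no reordering or morphology, and it does not reproduce the systems cited above.

```python
def make_lexicon_translator(lexicon):
    """Build a toy word-for-word translator from a bilingual word list.
    Unknown words are passed through unchanged."""
    def translate(text):
        return " ".join(lexicon.get(word, word) for word in text.split())
    return translate


def pivot(source_to_pivot, pivot_to_target):
    """Compose two translators: source -> pivot language -> target."""
    def translate(text):
        return pivot_to_target(source_to_pivot(text))
    return translate


# Invented toy lexicons (Persian shown in Latin transliteration).
fa_en = make_lexicon_translator({"ketāb": "book", "xub": "good"})
en_de = make_lexicon_translator({"book": "Buch", "good": "gut"})

# Persian-to-German obtained without any direct Persian-German resource.
fa_de = pivot(fa_en, en_de)
print(fa_de("ketāb xub"))
```

The design makes the main weakness of pivoting visible: any error or gap in the first step is handed on to the second, which has no access to the original source text.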
Figure 19.2 Sample block diagram of a Persian–English speech-to-speech system (Georgiou et al. 2006)
Word Sense Disambiguation (WSD) is the task of assigning the appropriate sense or
meaning to polysemous words in a text or discourse. Statistical approaches to WSD
require a pre-tagged corpus marked with the correct sense of each word in order to provide
adequate training material for the machine-learning algorithm. As of today, no semantically tagged corpora exist for
Persian, so researchers have opted to combine information extracted from raw (i.e.
unmarked) or POS-tagged corpora with knowledge-based resources such as a thesaurus
or WordNet (Miangah and Khalafi 2005; Hamidi et al. 2007; Makki and Homayounpour
2008; Soltani and Faili 2010).
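One simple way of using a knowledge-based resource without sense-tagged training data is a simplified Lesk-style overlap count, sketched below. This is an illustrative stand-in, not the method of the studies cited above. The sense inventory for the polysemous word šir ('milk' or 'lion') and its signature words are invented for the example and given in Latin transliteration.

```python
# Hypothetical sense inventory: each sense of 'šir' is paired with a few
# words that a thesaurus or WordNet-type entry might associate with it.
SENSES = {
    "milk": {"nušidan", "livān", "sard"},    # 'to drink', 'glass', 'cold'
    "lion": {"jangal", "heyvān", "šekār"},   # 'forest', 'animal', 'hunt'
}


def lesk(context_tokens, sense_signatures):
    """Pick the sense whose signature shares the most words with the context.
    Ties are resolved in favour of the sense listed first."""
    context = set(context_tokens)
    best_sense, best_overlap = None, -1
    for sense, signature in sense_signatures.items():
        overlap = len(context & signature)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# 'The šir hunts in the forest' -> two signature words of the 'lion' sense.
print(lesk("šir dar jangal šekār mikonad".split(), SENSES))
```

In practice the context words would first be lemmatized, since inflected Persian forms rarely match dictionary entries exactly.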
A number of researchers have focused on developing Information Retrieval systems
whose goal is to find the documents that contain text relevant to the query entered by the
user (Karimpour et al. 2008). In addition, Cross-Language Information Retrieval (CLIR)
systems are built and evaluated as part of the text retrieval experiments on Persian at the
CLEF (Conference and Labs of the Evaluation Forum) international conference (AleAhmad et al. 2008).
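The retrieval task itself can be sketched with a standard TF-IDF weighting and cosine similarity. This is a generic textbook technique shown on an invented three-document collection; it is not claimed to be the approach of the systems cited above, and the function names are hypothetical.

```python
import math
from collections import Counter


def build_index(docs):
    """Compute smoothed IDF weights and a TF-IDF vector for each document."""
    tokenized = {doc_id: text.split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))  # document frequency: one count per document
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    vectors = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[doc_id] = {term: tf[term] * idf[term] for term in tf}
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, idf, vectors):
    """Return ids of documents with non-zero similarity, best match first."""
    tf = Counter(query.split())
    q_vec = {term: tf[term] * idf[term] for term in tf if term in idf}
    scored = [(cosine(q_vec, vec), doc_id) for doc_id, vec in vectors.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]


# Invented toy collection standing in for news articles.
docs = {
    "d1": "football match result",
    "d2": "election result parliament",
    "d3": "football league",
}
idf, vectors = build_index(docs)
print(search("football league", idf, vectors))
```

For Persian text the whitespace split would be replaced by proper tokenization and normalization, since inconsistent spacing and character variants otherwise prevent query terms from matching document terms.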
In recent years, research funding in the United States has supported the development of
English-to-Persian and Persian-to-English speech-to-speech systems, with a focus on the
Dari dialect (see Figure 19.2). These systems allow an English speaker to communicate
with a Persian speaker in close to real time. Such systems include an Automatic Speech
Recognition (ASR) component that converts spoken language to text, an MT component,
and a Speech Generation component that converts the translated text to speech. These
handheld tools typically perform well in domain-specific areas or topics, such as the
medical domain or military applications (Georgiou et al. 2006; Prasad et al. 2013).
Among the important systems developed for the Persian language are classifiers that
automatically assign documents to a set of pre-defined categories. The categories can
represent diverse concepts; thus the classification method can be applied to the
identification of topics (Farhoodi et al. 2011; Jadidinejad and Marza 2015), ideological
discourse, or metaphors (Ghavidel et al. 2015) found in a given document. The same
approach has also been used to (p. 478) automatically recognize Persian handwritten
numbers or text within documents to improve OCR results (Karimi et al. 2015).
With the advent of user-generated content such as blogs, microblogs (e.g. Twitter), and
social networking sites, a large amount of opinionated data has been made available
online. These include ideological blogs, debate forums, opinion pieces, and customer
reviews. The large-scale availability of these data sets in Persian has aided the
development of systems for automatic sentiment analysis, also known as opinion mining,
which aims to identify the attitude, sentiment, and emotions of the user towards a topic
(Hajmohammadi and Ibrahim 2012; Saraee and Bagheri 2013).
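The simplest form of sentiment analysis counts matches against a polarity lexicon. The sketch below assumes a tiny invented lexicon in Latin transliteration and ignores negation, intensifiers, and irony; it is meant only to show the basic mechanism, not the methods of the studies cited above.

```python
# Hypothetical toy polarity lexicon; a real sentiment lexicon would be far
# larger and written in Perso-Arabic script.
POSITIVE = {"xub", "āli"}    # 'good', 'excellent'
NEGATIVE = {"bad", "zešt"}   # 'bad', 'ugly'


def polarity(text, positive=POSITIVE, negative=NEGATIVE):
    """Label a text by counting positive and negative lexicon hits."""
    tokens = text.split()
    score = (sum(1 for t in tokens if t in positive)
             - sum(1 for t in tokens if t in negative))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented toy review: 'this film was good'.
print(polarity("in film xub bud"))
```

Statistical systems typically use such lexicon counts as one feature among many rather than as the final decision.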
19.6 Future directions

Recent years have seen a steady and fast-paced growth in the field of Persian CL and
NLP. This is, in large part, due to the availability of research funding from government
institutions provided to both academic and industry researchers in Iran and in Western
nations. There has been a strong focus on developing Persian language resources for NLP
as well as statistical systems within university research labs in Iran and to a lesser degree
in Tajikistan, allowing computer science graduate students to move into the field of NLP.
In addition, Iranian students pursuing graduate degrees in various international
institutions with strong NLP emphasis have contributed to the field in important ways.
The Persian NLP field has made major strides in tackling some of the basic challenges
presented by Persian language and the Perso-Arabic writing system. The development of
several corpora and lexical resources has allowed the training of various statistical
systems, and has led to a number of interesting corpus-based studies. The existence of
tagged corpora has further facilitated the extraction of syntactic and semantic
information to build or improve lexicons. Although research and development in these
domains can still be improved upon, important gaps remain in a number of lesser-studied
yet fascinating domains of analysis. Novel technical approaches are being promoted in the
field of NLP at large, and these same methodologies can be applied to improve existing
capabilities in Persian language systems. These include improvements to statistical MT
systems, the use of neural networks (known as Deep Learning) in identifying closely
related terms, and the incorporation of semantic information to enhance results. The field
of Persian CL can benefit from the enhancement of syntactic parsers, and event and
relations analysis software. Furthermore, research on social media data sets can provide
new directions for Persian CL and NLP including processing of Pinglish (Persian
transcribed in Latin characters), conversational language MT, narrative analysis, more in-
depth work on sentiment analysis and ideology detection, and the processing of
metaphors, irony, and sarcasm.
One of the major challenges, however, is the lack of collaboration between NLP
practitioners and linguists and teachers of Persian. Further emphasis should be placed
on transitioning NLP tools into the classroom. One of the areas that has recently brought
together linguists and computational linguists of Persian is the phenomenon of Persian
light verb constructions or compound verbs. Recent projects have focused on
automatically recognizing these multiword expressions (MWEs) and extracting information on the verbs’
subcategorization frames or their semantics. Leveraging existing corpora, researchers
have built a database of Persian (p. 479) light verb constructions (PersPred: Samvelian
and Faghiri 2013), enhanced the Persian WordNet (Mansoory et al. 2012), or developed
grammars for analysing the compound verbs (Müller and Ghayoomi 2010). The
continuation of such collaboration between linguists and computational linguists of
Persian can help detect new patterns in the data to inform linguistic analysis, provide
tools to facilitate documentation of dialects, test theoretical hypotheses, and design
models to evaluate linguistic theories of specific language phenomena.
Notes:
(1) In the Perso-Arabic writing system, letters in a word are often connected to each
other. Most characters have a different form depending on their position within the word.
The initial form indicates that no element is attached to the character from the right (i.e.
there is no ‘attaching’ character before it, but there is one following the character). Note
that an initial form does not mean that the character is at the beginning of a word; it only
indicates that the character is not at the end of the word. Characters are in medial form if
they have an attaching character both before and after them. The final form denotes that
the character is at the end of a word. Certain characters (alef (ا), dāl (د), zāl (ذ), re (ر), ze (ز),
že (ژ), vāv (و)) have only one form regardless of their position within the word.
(2) This is accomplished in Microsoft Word documents by typing Ctrl-Shift-2.
(3) Note, however, that the postposition ‘ro’ can be followed by a modifying relative clause
as in (film-i-ro ke mā diruz didim … ) (movie-IND-POST that we yesterday saw … ) ‘The
movie that we saw yesterday … ’.
(4) The Perso-Arabic script maintains the distinct forms of /s/ (س, ص, ث) or /t/ (ت, ط)
representing the historical root of words. These letters are pronounced distinctly in
standard Arabic but have lost their distinctions in modern Persian pronunciation.
(5) Although Persian language resources for CL and NLP applications existed, they
typically remained proprietary and were not openly shared with the research community.
(6) Available at http://ece.ut.ac.ir/en/node/940.
(7) https://www.sketchengine.co.uk/talkbank-persian/.
(8) Available from the European Language Resources Association (ELRA) at
http://catalog.elra.info/product_info.php?products_id=1111.
(9) http://childes.talkbank.org/.
(10) https://www.google.com/inputtools/try/.
Karine Megerdoomian
Karine Megerdoomian is a Principal Computational Linguist at MITRE, a federally
funded research and development centre, and Adjunct Faculty at the Communication,
Culture, and Technology department at Georgetown University. Karine’s expertise is
in the domains of social media analytics and linguistically informed Natural
Language Processing with a focus on Middle Eastern Languages. Her current
research focuses on the relationship between language in online media and
associated socio-political issues—with emphasis on sentiment analysis, and automatic
framing and narrative analysis. Karine’s linguistic research has focused on the study
of complex predicates and the syntax–semantics interface.