L.BalaSundaraRaman, Ishwar Sridharan
Abstract Tamil is a classical language belonging to the Dravidian language family. Poetry in Tamil literature, especially from the post-classical and neo-classical periods, largely adheres to the well-defined rules of metre described in tolkāppiyam, and later in yāpparunkalam, yāpparunkalakkārigai and other books on Tamil grammar. [1] Metre in Tamil is defined in terms of six elements: eḻuttu (phone), acai (metreme), cīr (metrical foot), taḷai (linkage), aṭi (metrical line) and toṭai (ornamentation). Based on the metre, Tamil poems are classified into five types of verse (pā), viz. 1. veṉpā 2. āciriyappā 3. vañcippā 4. kalippā and 5. maruṭpā. The authors claim that the grammar governing metre in Tamil is structured well enough to be written as a Context-free Grammar (CFG). The productions in this CFG are written using the six elements. With the formal grammar for the metres expressed in EBNF, it becomes possible to write a parser for the grammar. The authors have developed such a parser, named visaineri [2], which analyses a given metrical text in Tamil and generates a tree consisting of the basic elements of metre. The parser is built on the SPARK framework. [3] It takes Tamil verses as input, checks them for conformance to the formal grammar specified in EBNF, and produces as output an XML file specifying the various metremes in a schema along the lines of TEI P5 [4] and [JLC]. [5] As a direct outcome, visaineri helps automate the intensive task of annotating literary works with metrical elements, which is otherwise done by laborious manual effort. After annotation, the text can be fed to a search engine to search for metrical patterns. Various statistical analyses can also be performed on the annotated corpus. Since a copious amount of literature remains to be annotated and analysed, the parser should be very helpful for researchers.
In addition to analysing existing corpora, visaineri has been used by modern poets who want to write poems conforming to classical metre.
[1] Niklas, Ulrike (1988). "Introduction to Tamil Prosody". Bulletin de l'Ecole française d'Extrême-Orient 77 (1): 165–227. doi:10.3406/befeo.1988.1744. ISSN 0336-1519.
[2] http://www.visaineri.net/
[3] http://pages.cpsc.ucalgary.ca/~aycock/spark/
[4] TEI Consortium, eds. TEI P5: Guidelines for Electronic Text Encoding and Interchange. [November 1, 2007]. TEI Consortium.
[5] Jean-Luc Chevillard, Critical editions of Tamil works: exploratory survey and future perspectives (INFITT 2009, Köln, 25th October).
System of Tamil Prosody
The system of Tamil prosody is defined in terms of the following six elements: [6]
1. eḻuttu (phone)
2. acai (metreme)
3. cīr (metrical foot)
4. taḷai (linkage)
5. aṭi (metrical line)
6. toṭai (ornamentation)
Among the above, all the elements except a few toṭai are identified by phonological rules. These well-defined rules have a direct bearing on the “metrical essence” of any Tamil pā. Based on these rules, a stanza of Tamil poetry can be classified into one of the four pā or their 12 variants. [7]
eḻuttu or phone
eḻuttu form the basic tokens of the Tamil prosodic system. Specifically, acai are composed of them. Certain phonological rules in tolkāppiyam specify the length of a metrical line in terms of phones. Several features of ‘ornamentation’ depend upon the attributes of phones occurring in specific positions of metrical feet. Tamil eḻuttu are classified into primary phones and secondary phones. The primary phones are further classified into vowels and consonants. The 12 vowels are divided into short (‘kuṟil’) and long (‘neṭil’) vowels. The 18 consonants are divided into three groups, viz. hard, soft, and middle.
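The classification of primary phones described above can be sketched as a small lookup. This is a minimal illustration rather than the authors' implementation: the phones are written in the transliteration used in this paper, the function name `classify_phone` is hypothetical, and secondary phones (such as the overshort u) are deliberately left out.

```python
import unicodedata

def _nfc(items):
    # Normalise so that precomposed and decomposed diacritics compare equal.
    return {unicodedata.normalize("NFC", s) for s in items}

# The 12 vowels, split into short (kuṟil) and long (neṭil).
SHORT_VOWELS = _nfc(["a", "i", "u", "e", "o"])
LONG_VOWELS = _nfc(["ā", "ī", "ū", "ē", "ai", "ō", "au"])

# The 18 consonants, split into the three traditional groups.
HARD = _nfc(["k", "c", "ṭ", "t", "p", "ṟ"])
SOFT = _nfc(["ṅ", "ñ", "ṇ", "n", "m", "ṉ"])
MIDDLE = _nfc(["y", "r", "l", "v", "ḻ", "ḷ"])

def classify_phone(phone: str) -> str:
    """Return the class of a transliterated primary phone."""
    p = unicodedata.normalize("NFC", phone)
    if p in SHORT_VOWELS:
        return "short vowel"
    if p in LONG_VOWELS:
        return "long vowel"
    if p in HARD:
        return "hard consonant"
    if p in SOFT:
        return "soft consonant"
    if p in MIDDLE:
        return "middle consonant"
    raise ValueError(f"not a primary phone in this sketch: {phone!r}")
```

For example, `classify_phone("ā")` falls in the long-vowel group and `classify_phone("ḻ")` in the middle consonants.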
acai or metreme
acai is composed of phones and forms the basic unit of Tamil metres. Each acai or metreme can have one or two syllables, and in certain specific scenarios, three syllables. A single short or long syllable, whether or not it ends in a consonant, that does not form part of a disyllabic sequence is called nēr-acai. A sequence of an open short syllable followed by a short or long syllable, which may either end in a consonant or remain open, is called nirai-acai. When an overshort u (a secondary phone) follows nēr-acai or nirai-acai in specific locations of a pā, the unit is called nērpu or niraipu respectively.
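The nēr/nirai distinction can be illustrated with a short greedy segmenter. The sketch below assumes the input has already been reduced to a string over three symbols, K (kuṟil), N (neṭil) and O (oṟṟu, a closing consonant); that encoding and the function name `segment_acai` are assumptions made for illustration. It prefers the disyllabic nirai whenever an open short syllable is followed by another syllable, absorbs any run of closing consonants, and ignores nērpu and niraipu.

```python
from typing import List, Tuple

def segment_acai(tokens: str) -> List[Tuple[str, str]]:
    """Split a K/N/O token string into (kind, tokens) pairs, left to right."""
    result = []
    i, n = 0, len(tokens)
    while i < n:
        if tokens[i] not in "KN":
            # An acai must start with a vowel-bearing syllable.
            raise ValueError(f"unexpected token {tokens[i]!r} at position {i}")
        start = i
        if tokens[i] == "K" and i + 1 < n and tokens[i + 1] in ("K", "N"):
            # Open short syllable followed by another syllable: nirai.
            kind = "nirai"
            i += 2
        else:
            # A lone short or long syllable: nēr.
            kind = "nēr"
            i += 1
        # Closing consonants belong to the acai they follow.
        while i < n and tokens[i] == "O":
            i += 1
        result.append((kind, tokens[start:i]))
    return result
```

Under these assumptions, `segment_acai("KOKK")` gives a nēr (`KO`) followed by a nirai (`KK`): the first short syllable is closed by a consonant, so it cannot open a disyllabic sequence.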
[6] In Introduction to Tamil Prosody, Ulrike Niklas elaborates how unique the elements of Tamil prosody are. She favours a distinct terminology aligned with the original Tamil definitions. The same terminology is followed in this paper.
cīr or metrical foot
One to four metremes can form a cīr or metrical foot. The metrical feet in turn form the aṭi or metrical line. The metrical feet are classified according to the number and sequence of the metremes they are formed of. Each of these types has a mnemonic or pattern word as its name. The pattern word itself has the sequence of metremes that it represents. The full list of metrical feet can be found in the Context-free grammar given below.
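Because the pattern words are built systematically, the name of a foot can be composed from its metreme sequence. The sketch below derives the names for feet of two, three and four metremes and agrees with the productions listed in the grammar of this paper; the function name `cir_name` and the decomposition into stem and suffix tables are illustrative assumptions, and one-metreme feet are not covered.

```python
# Stem chosen by the first two acai.
STEM = {
    ("nēr", "nēr"): "tēmā",
    ("nirai", "nēr"): "puḷimā",
    ("nēr", "nirai"): "kūviḷa",
    ("nirai", "nirai"): "karuviḷa",
}
# Suffix chosen by the third acai of a three-acai foot.
THIRD = {"nēr": "ṅkāi", "nirai": "ṅkaṉi"}
# Suffix chosen by the last two acai of a four-acai foot.
TAIL = {
    ("nirai", "nirai"): "naṟuniḻal",
    ("nirai", "nēr"): "naṟumpū",
    ("nēr", "nirai"): "ntaṇṇiḻal",
    ("nēr", "nēr"): "ntaṇpū",
}

def cir_name(acai):
    """Return the pattern word for a sequence of 2 to 4 acai."""
    acai = tuple(acai)
    if len(acai) not in (2, 3, 4) or any(a not in ("nēr", "nirai") for a in acai):
        raise ValueError(f"unsupported acai sequence: {acai!r}")
    stem = STEM[acai[:2]]
    if len(acai) == 2:
        # kūviḷa and karuviḷa take a final m when they stand alone.
        return stem + "m" if stem.endswith("viḷa") else stem
    if len(acai) == 3:
        return stem + THIRD[acai[2]]
    return stem + TAIL[acai[2:]]
```

For instance, `cir_name(["nēr", "nirai", "nēr"])` composes the stem for nēr nirai with the suffix for a final nēr, which matches the production kūviḷaṅkāi ::= nēr nirai nēr.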
taḷai or linkage
The constraints on how one cīr can be linked with the next constitute the taḷai. A taḷai defines the relationship between the last acai of a cīr and the first acai of the cīr that follows it. There are seven types of taḷai, based on the acai composition of the cīr under consideration (‘nilaimoḻi’) and the first acai of the following cīr (‘varumoḻi’).
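As an illustration of how a linkage depends on the nilaimoḻi and the varumoḻi, the sketch below covers only the two taḷai that this paper later names for Venpa metre. The rules encoded here are an assumption taken from the traditional statement of the Venpa linkage and are not spelled out in this paper: a two-acai foot must be followed by the opposite acai (iyarcīrveṇṭaḷai), and a three-acai foot ending in nēr must be followed by nēr (veṇcīrveṇṭaḷai). The function name `venpa_talai` is hypothetical, and the other five taḷai are not modelled.

```python
from typing import List, Optional

def venpa_talai(nilaimoli: List[str], varumoli_first: str) -> Optional[str]:
    """Name the Venpa linkage between a foot and the first acai of the next foot.

    nilaimoli: acai sequence of the current foot, e.g. ["nēr", "nēr"].
    varumoli_first: first acai of the following foot.
    Returns None when neither of the two Venpa linkages applies.
    """
    last = nilaimoli[-1]
    if len(nilaimoli) == 2 and last != varumoli_first:
        # Assumed rule: a foot ending in nēr meets nirai, or the reverse.
        return "iyarcīrveṇṭaḷai"
    if len(nilaimoli) == 3 and last == "nēr" and varumoli_first == "nēr":
        # Assumed rule: a three-acai foot ending in nēr meets nēr.
        return "veṇcīrveṇṭaḷai"
    return None
```

A checker built this way can flag the exact pair of feet at which a verse departs from the metre, since a `None` result identifies the offending junction.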
aṭi or metrical line
An aṭi is a line in a pā. It can be defined variously in terms of its component metrical feet or the sequence of taḷai. In addition, tolkāppiyam specifies the counts of phones in an aṭi. U. Niklas lists the types of aṭi, together with the number of feet and the range of phone counts for each. The types are kuṟaḷaṭi (‘dwarf’), cintaṭi (‘short’), aḷavaṭi (‘standard’), neṭilaṭi (‘long’), and kaḻineṭilaṭi (‘extremely long’).
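A line type can be looked up from its foot count. The counts used below (two, three, four, five, and six or more feet) are the values usually given in descriptions of Tamil prosody; they are an assumption here, since this paper names the types without listing the numbers, and the phone-count ranges are not modelled. The function name `ati_type` is hypothetical.

```python
def ati_type(foot_count: int) -> str:
    """Classify a metrical line by its number of feet (assumed counts)."""
    if foot_count < 2:
        raise ValueError("a line needs at least two feet in this sketch")
    names = {
        2: "kuṟaḷaṭi",   # 'dwarf'
        3: "cintaṭi",    # 'short'
        4: "aḷavaṭi",    # 'standard'
        5: "neṭilaṭi",   # 'long'
    }
    # Anything longer than five feet is treated as 'extremely long'.
    return names.get(foot_count, "kaḻineṭilaṭi")
```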
toṭai or ornament
There are five kinds of ‘ornament’ and seven methods of ‘ornamentation’ that can occur in corresponding phones within or across metrical lines. Ornaments include alliteration and rhyme. Ornamentation occurs within a single metrical line, across its metrical feet. Ornaments and ornamentation together form the toṭai. Although many of the toṭai are phonological relationships, the authors leave them to future research.
Context-free grammar for Tamil prosody
At the level of phones, metremes, and metrical feet, the phonological rules are shared between the various kinds of pā. These rules form a Context-free grammar, which is expressed below in Extended Backus-Naur Form (EBNF).
pā ::= aṭi pā
pā ::= aṭi
aṭi ::= cīr DELIMITER aṭi
aṭi ::= cīr aṭi TERMINAL
aṭi ::= īṟṟuccīr aṭi TERMINAL
cīr ::= īracaiccīr
cīr ::= mūvacaiccīr
cīr ::= nālacaiccīr
īṟṟuccīr ::= ōracaiccīr
īṟṟuccīr ::= niraipu
īṟṟuccīr ::= nērpu
(yāpp. 6,7)
nēr ::= kuṟil oṟṟu
nēr ::= neṭil oṟṟu
nēr ::= kaṭaikkuṟil
nēr ::= kaṭaineṭil
nēr ::= kuṟil
(yāpp. 8,9)
nirai ::= kuṟil kuṟil oṟṟu
nirai ::= kuṟil kaṭaikkuṟil
nirai ::= kuṟil kuṟil
nirai ::= kuṟil neṭil oṟṟu
nirai ::= kuṟil kaṭaineṭil
nirai ::= kuṟil neṭil
malar ::= kuṟil kuṟil oṟṟu
malar ::= kuṟil kaṭaikkuṟil
malar ::= kuṟil neṭil oṟṟu
malar ::= kuṟil kaṭaineṭil
nērpu ::= nēr overshort_u
niraipu ::= nirai overshort_u
(yāpp. 14)
ōracaiccīr ::= malar
ōracaiccīr ::= nēr
(yāpp. 11)
īracaiccīr ::= karuviḷam
īracaiccīr ::= puḷimā
īracaiccīr ::= kūviḷam
īracaiccīr ::= tēmā
karuviḷam ::= nirai nirai
puḷimā ::= nirai nēr
kūviḷam ::= nēr nirai
tēmā ::= nēr nēr
(yāpp. 12)
mūvacaiccīr ::= karuviḷaṅkaṉi
mūvacaiccīr ::= karuviḷaṅkāi
mūvacaiccīr ::= puḷimāṅkaṉi
mūvacaiccīr ::= puḷimāṅkāi
mūvacaiccīr ::= kūviḷaṅkaṉi
mūvacaiccīr ::= kūviḷaṅkāi
mūvacaiccīr ::= tēmāṅkaṉi
mūvacaiccīr ::= tēmāṅkāi
karuviḷaṅkaṉi ::= nirai nirai nirai
puḷimāṅkaṉi ::= nirai nēr nirai
kūviḷaṅkaṉi ::= nēr nirai nirai
tēmāṅkaṉi ::= nēr nēr nirai
karuviḷaṅkāi ::= nirai nirai nēr
puḷimāṅkāi ::= nirai nēr nēr
kūviḷaṅkāi ::= nēr nirai nēr
tēmāṅkāi ::= nēr nēr nēr
(yāpp. 13)
nālacaiccīr ::= karuviḷanaṟuniḻal
nālacaiccīr ::= karuviḷanaṟumpū
nālacaiccīr ::= karuviḷantaṇṇiḻal
nālacaiccīr ::= karuviḷantaṇpū
nālacaiccīr ::= puḷimānaṟuniḻal
nālacaiccīr ::= puḷimānaṟumpū
nālacaiccīr ::= puḷimāntaṇṇiḻal
nālacaiccīr ::= puḷimāntaṇpū
nālacaiccīr ::= kūviḷanaṟuniḻal
nālacaiccīr ::= kūviḷanaṟumpū
nālacaiccīr ::= kūviḷantaṇṇiḻal
nālacaiccīr ::= kūviḷantaṇpū
nālacaiccīr ::= tēmānaṟuniḻal
nālacaiccīr ::= tēmānaṟumpū
nālacaiccīr ::= tēmāntaṇṇiḻal
nālacaiccīr ::= tēmāntaṇpū
karuviḷanaṟuniḻal ::= nirai nirai nirai nirai
kūviḷanaṟuniḻal ::= nēr nirai nirai nirai
puḷimānaṟuniḻal ::= nirai nēr nirai nirai
tēmānaṟuniḻal ::= nēr nēr nirai nirai
karuviḷanaṟumpū ::= nirai nirai nirai nēr
kūviḷanaṟumpū ::= nēr nirai nirai nēr
puḷimānaṟumpū ::= nirai nēr nirai nēr
tēmānaṟumpū ::= nēr nēr nirai nēr
karuviḷantaṇṇiḻal ::= nirai nirai nēr nirai
kūviḷantaṇṇiḻal ::= nēr nirai nēr nirai
puḷimāntaṇṇiḻal ::= nirai nēr nēr nirai
tēmāntaṇṇiḻal ::= nēr nēr nēr nirai
karuviḷantaṇpū ::= nirai nirai nēr nēr
kūviḷantaṇpū ::= nēr nirai nēr nēr
puḷimāntaṇpū ::= nirai nēr nēr nēr
tēmāntaṇpū ::= nēr nēr nēr nēr
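Productions of this kind can be held as plain data and checked with a small generic recogniser. The sketch below is not the SPARK-based parser described in this paper. It is a hand-written top-down recogniser, assumed here for illustration, that encodes only the nēr, nirai and two-metreme productions and treats kuṟil, neṭil, oṟṟu, kaṭaikkuṟil and kaṭaineṭil as opaque terminal names. It works because this fragment has no left recursion.

```python
# A fragment of the grammar: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "nēr": [("kuṟil", "oṟṟu"), ("neṭil", "oṟṟu"), ("kaṭaikkuṟil",),
            ("kaṭaineṭil",), ("kuṟil",)],
    "nirai": [("kuṟil", "kuṟil", "oṟṟu"), ("kuṟil", "kaṭaikkuṟil"),
              ("kuṟil", "kuṟil"), ("kuṟil", "neṭil", "oṟṟu"),
              ("kuṟil", "kaṭaineṭil"), ("kuṟil", "neṭil")],
    "karuviḷam": [("nirai", "nirai")],
    "puḷimā": [("nirai", "nēr")],
    "kūviḷam": [("nēr", "nirai")],
    "tēmā": [("nēr", "nēr")],
    "īracaiccīr": [("karuviḷam",), ("puḷimā",), ("kūviḷam",), ("tēmā",)],
}

def ends(symbol, tokens, start):
    """Return every position at which a derivation of symbol from start can end."""
    if symbol not in GRAMMAR:
        # Terminal: consume exactly one matching token.
        if start < len(tokens) and tokens[start] == symbol:
            return {start + 1}
        return set()
    found = set()
    for rhs in GRAMMAR[symbol]:
        positions = {start}
        for part in rhs:
            positions = {e for p in positions for e in ends(part, tokens, p)}
        found |= positions
    return found

def derives(symbol, tokens):
    """True if the whole token list can be derived from symbol."""
    return len(tokens) in ends(symbol, tokens, 0)
```

The recogniser only reports whether some derivation exists. A sequence such as kuṟil kuṟil neṭil oṟṟu can be read both as nirai followed by nēr and as nēr followed by nirai under these productions alone, so a practical parser needs an additional preference, such as the requirement stated earlier that a nēr-acai must not form part of a disyllabic sequence.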
Based on the P5 guidelines [4] developed by the Text Encoding Initiative, Jean-Luc Chevillard has proposed a format to represent the parse tree of Tamil metrical text. [5] The authors have adapted his proposal and added attributes such as the type of cīr and taḷai. An example output for a pā from tirukkuṟaḷ is given below.
[Example XML output for a tirukkuṟaḷ verse; the markup and most of the Tamil text did not survive extraction.]
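The general shape of such an output can be sketched with Python's standard `xml.etree.ElementTree` module. The element and attribute layout below (`lg`, `l` and nested `seg` elements with `type` and `subtype` attributes) is an assumption loosely modelled on TEI verse markup. It is not the schema used by visaineri or the one proposed by Chevillard, and the function name `verse_to_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET

def verse_to_xml(lines):
    """Serialise a parsed verse to an XML string.

    lines: list of metrical lines; each line is a list of feet; each foot is
    a dict with a 'type' (pattern word) and 'acai' (list of (kind, text)).
    """
    verse = ET.Element("lg", {"type": "pā"})
    for line in lines:
        line_el = ET.SubElement(verse, "l")
        for foot in line:
            foot_el = ET.SubElement(
                line_el, "seg", {"type": "cīr", "subtype": foot["type"]})
            for kind, text in foot["acai"]:
                acai_el = ET.SubElement(
                    foot_el, "seg", {"type": "acai", "subtype": kind})
                acai_el.text = text
    return ET.tostring(verse, encoding="unicode")

# One line holding one foot: the word akara, read as nirai followed by nēr.
sample = [[{"type": "puḷimā", "acai": [("nirai", "aka"), ("nēr", "ra")]}]]
xml_text = verse_to_xml(sample)
```

Keeping the foot type and metreme kind in attributes rather than in element names lets a search tool select, for example, all feet of one pattern with a single attribute query.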
With the grammar for Tamil prosody expressed in EBNF, it is now possible to write a parser that checks metrical texts for conformance to the grammar. The authors have developed the visaineri parser using the SPARK framework [10] in the Python programming language. The parser follows the standard four-stage parsing process. Each stage performs a well-defined task and passes an output data structure on to the next stage.
The four stages of the visaineri parser are given below:
1. Lexical Analysis: This stage breaks the input text into a list of tokens (eḻuttu).
2. Syntax Analysis: In this stage, the parser checks whether the list of tokens (eḻuttu) combines to form metremes (acai), metrical feet (cīr) and metrical lines (aṭi) in conformance with the CFG. The result of the parsing is an Abstract Syntax Tree (AST).
3. Semantic Analysis: In this stage, the tree is traversed and information on the various acai, cīr and aṭi is collected to check whether they accord with the taḷai rules. The corresponding nodes in the AST are updated with the result.
4. XML Generation: Once the AST is complete, the XML file is generated by traversing the AST.
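The first of these stages can be sketched for text in Tamil script. The tokenizer below is an illustrative assumption rather than the visaineri lexer: it relies only on the Unicode Tamil block, labels each eḻuttu as kuṟil, neṭil or oṟṟu, and ignores secondary phones (overshort u, āytam) and anything outside the basic letters. The function name `tokenize` is hypothetical.

```python
import unicodedata

# Independent vowels: a i u e o / ā ī ū ē ai ō au.
SHORT_VOWELS = set("\u0b85\u0b87\u0b89\u0b8e\u0b92")
LONG_VOWELS = set("\u0b86\u0b88\u0b8a\u0b8f\u0b90\u0b93\u0b94")
# Dependent vowel signs attached to a consonant.
SHORT_SIGNS = set("\u0bbf\u0bc1\u0bc6\u0bca")
LONG_SIGNS = set("\u0bbe\u0bc0\u0bc2\u0bc7\u0bc8\u0bcb\u0bcc")
PULLI = "\u0bcd"  # marks a pure consonant (oṟṟu)

def is_consonant(ch: str) -> bool:
    # Consonant letters occupy U+0B95..U+0BB9 in the Tamil block.
    return "\u0b95" <= ch <= "\u0bb9"

def tokenize(text: str):
    """Return a list of (eḻuttu, kind) pairs; other characters are skipped."""
    text = unicodedata.normalize("NFC", text)  # compose two-part vowel signs
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in SHORT_VOWELS:
            tokens.append((ch, "kuṟil")); i += 1
        elif ch in LONG_VOWELS:
            tokens.append((ch, "neṭil")); i += 1
        elif is_consonant(ch):
            if nxt == PULLI:
                tokens.append((ch + nxt, "oṟṟu")); i += 2
            elif nxt in SHORT_SIGNS:
                tokens.append((ch + nxt, "kuṟil")); i += 2
            elif nxt in LONG_SIGNS:
                tokens.append((ch + nxt, "neṭil")); i += 2
            else:
                # Bare consonant letter carries the inherent short a.
                tokens.append((ch, "kuṟil")); i += 1
        else:
            i += 1  # spaces, punctuation and unsupported characters
    return tokens
```

The token kinds produced here are the terminals that the syntax-analysis stage would then group into acai, cīr and aṭi.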
To show the utility of a metrical parser beyond outputting a parse tree for a verse, the authors ran it on a classical work of ethics, the Tirukkuṟaḷ. It is a poetic text from the patiṉeṇkīḻkaṇakku collection, dated to the post-Sangam period. All the poems in the text are written in Venpa metre. For the purposes of this paper, the authors fed 1323 verses from the text through the parser and obtained the XML output. The statistical analysis of the tirukkuṟaḷ poems is presented below.
Frequency distribution of various prosodic features
Prosodic feature              Frequency
nēr-acai                      14301
nirai-acai                    6456
karuviḷaṅkāi cīr              507
kūviḷaṅkāi cīr                364
puḷimāṅkāi cīr                1008
kūviḷam cīr                   866
karuviḷam cīr                 602
tēmā cīr                      2549
puḷimā cīr                    1339
nāḷ                           174
malar                         661
iyarcīrveṇṭaḷai               4868
veṇcīrveṇṭaḷai                3070
nēr acai at cīr beginning     5144
nēr acai at cīr end           7132
nirai acai at cīr beginning   4117
nirai acai at cīr end         2129
The table lists the various prosodic features and their observed frequencies. The first two features show the distribution of nēr and nirai acai across the poems. Due to the nature of Venpa grammar, the numbers are skewed towards nēr. The next nine features show the distribution of metrical foot patterns (cīr). The subsequent two features show the distribution of linkages between cīr. Since tirukkuṟaḷ verses are written in Venpa metre, iyarcīrveṇṭaḷai and veṇcīrveṇṭaḷai are the only types of linkage allowed between cīr. The last four features show the distribution of nēr and nirai occurring at the beginning and end of each cīr.
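The reported frequencies can be cross-checked with simple arithmetic. The sketch below assumes that every verse is a two-line kuṟaḷ of seven feet (four in the first line and three in the second), and therefore has six linkages between consecutive feet; that structural assumption is not stated in this paper. The dictionary names are chosen for illustration.

```python
VERSES = 1323
FEET_PER_VERSE = 7                    # assumption: 4 feet + 3 feet per verse
LINKS_PER_VERSE = FEET_PER_VERSE - 1  # one linkage between consecutive feet

acai = {"nēr": 14301, "nirai": 6456}
talai = {"iyarcīrveṇṭaḷai": 4868, "veṇcīrveṇṭaḷai": 3070}
first_acai_of_cir = {"nēr": 5144, "nirai": 4117}
last_acai_of_cir = {"nēr": 7132, "nirai": 2129}

total_feet = VERSES * FEET_PER_VERSE
total_links = VERSES * LINKS_PER_VERSE

# Each foot has exactly one first acai and one last acai, and each pair of
# consecutive feet has exactly one linkage.
checks = {
    "feet counted by first acai": sum(first_acai_of_cir.values()) == total_feet,
    "feet counted by last acai": sum(last_acai_of_cir.values()) == total_feet,
    "linkages": sum(talai.values()) == total_links,
}

# Share of nēr among all acai, as a fraction between 0 and 1.
ner_share = acai["nēr"] / sum(acai.values())
```

Under the stated assumption all three checks hold, and roughly two thirds of all metremes are nēr, which is the skew noted above.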
The theoretical outcome of the research described here is the establishment of indigenous Tamil prosody as a Context-Free Grammar. In addition, the parser saves much of the manual labour spent by linguists who identify the prosodic elements by hand and aggregate statistics from them, a process that is both cumbersome and error-prone. With plenty of texts still to be parsed and analysed, automation has significant utility.
Most of the Tamil works available from ancient to modern times are in poetic form. Going forward, the authors intend to run the available Tamil works in metrical form through the parser. The parsed information will be used to collect aggregate information about the distribution of prosodic elements and for other analyses. Interest in writing modern poems according to metrical rules has increased recently. The parser will be used to enable people to write metrical poetry online by providing real-time feedback on conformance to the phonological rules.