Rahman Ali
University of Peshawar
Proceedings of the Conference on Language & Technology 2009
Some other rules are mentioned in other subsections.

4.3. Lexicon

The lexicon used here is of the simplest kind [13]: it contains words and their corresponding part-of-speech tags, taken from the tagset. Initially, a small set of Pashto text is tagged manually. The manually tagged set covers all words that belong to closed classes, such as prepositions, conjunctions, interjections and pronouns. The lexicon grows as real text from sources such as newspapers, books and the Internet is run through the POS tagger. The words that are new to the lexicon are then extracted from the tagged text and inserted into the lexicon.

4.4. Extraction of new words

A word that is not yet in the lexicon can be extracted from the output text. The tagger first tags the words that are in the lexicon and gives a specific mark to the words that are not. The marked new words are then tagged using rules. When the extraction is done, the result is displayed and checked manually. Incorrect tags are corrected, and the words are then entered into the lexicon.

4.5. Tagging rules

Many words in a language can take more than one tag, so assigning a specific tag to a word requires rules. Pashto text also contains words that are candidates for multiple tags. Consider the following example.

[da] [yewbal] [nǝ] [bǝ] [mo] [da] [xpali] [xpali] [s̠obǝy] [aw] [ςlaqǝy] [motaςliq] [maςlomat] [kawal].
[from] [each other] [to] [future-particle] [ours-clitic] [from] [own] [own] [province] [and] [area] [about] [information] [take].
“We got information from each other about each other’s province and area.”

[similarly] [ours] [another] [friend] [talking] [do].
“Another of our friends has the same notion.”

The word [yewbal] may be a Reciprocal Pronoun (as in the first example) or an Adjective (as in the second example). The rule for disambiguation is as follows, where +1 refers to the word immediately after the input word:

Given input: [yewbal]
If (+1 is Noun)
Then assign Adjective tag
Else assign Reciprocal Pronoun tag

Similarly, consider other examples.

[haγǝ] [muskǝy] [šo] [aw] [laṛ] [pǝ] [xpalǝ] [maxǝ]
[he] [laugh] [to-become] [and] [went] [to] [own] [way]
“He smiled and went on his way.”

Consider a further example:

[aos] [bǝ] [darlǝ] [haγǝ] [ǰamǝy] [raoṛam] [če] [naoiane] [ye] [aγondi].
[now] [future-particle] [you (directional-pronoun)] [that] [clothes] [bring] [that] [bride] [her (clitic)] [wear].
“Now, I shall bring those clothes for you that are worn by brides.”

The word [haγǝ] may be a personal pronoun (as in the first example) or a demonstrative pronoun (as in the second example). The rule for disambiguation is:

Given input: [haγǝ]
If (+1 is Noun)
Then assign Demonstrative Pronoun tag
Else assign Personal Pronoun tag

By the same token, consider the following examples.
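Before turning to those examples, the lexicon lookup of Section 4.3, the marking of unknown words from Section 4.4 and the two next-word rules above can be put together in a short sketch. This is only an illustration under stated assumptions, not the tagger described in this paper: the names `LEXICON`, `NEXT_NOUN_RULES`, `UNKNOWN` and `tag_sentence` are hypothetical, the toy lexicon is keyed by transliteration, the first rule is assumed to apply to [yewbal], and the test phrases are constructed rather than taken from the corpus.

```python
from typing import Dict, List, Tuple

UNKNOWN = "UNK"    # hypothetical mark for words that are not in the lexicon
YEWBAL = "yewbal"  # assumed to be the ambiguous word of the first rule
HAGHA = "haγǝ"     # the ambiguous word of the second rule

# Toy lexicon: transliterated word -> candidate tags (illustrative entries only).
LEXICON: Dict[str, List[str]] = {
    YEWBAL: ["PR", "Adj"],  # reciprocal pronoun or adjective
    HAGHA: ["PP", "PD"],    # personal or demonstrative pronoun
    "halak": ["N1CD"],      # "boy", a singular common direct noun
    "aw": ["Con"],          # "and"
}

# word -> (tag if the next word is a noun, tag otherwise)
NEXT_NOUN_RULES: Dict[str, Tuple[str, str]] = {
    YEWBAL: ("Adj", "PR"),
    HAGHA: ("PD", "PP"),
}


def is_noun_tag(tag: str) -> bool:
    """Noun tags look like N1CD or N2PO; 'Num' must not match."""
    return len(tag) == 4 and tag[0] == "N" and tag[1].isdigit()


def tag_sentence(words: List[str]) -> List[Tuple[str, str]]:
    """Look every word up, then resolve rule words from the next word."""
    candidates = [LEXICON.get(word, [UNKNOWN]) for word in words]
    tagged: List[Tuple[str, str]] = []
    for i, word in enumerate(words):
        if word in NEXT_NOUN_RULES:
            following = candidates[i + 1] if i + 1 < len(words) else []
            # Only an unambiguous noun counts as "+1 is Noun" in this sketch.
            next_is_noun = len(following) == 1 and is_noun_tag(following[0])
            tag_if_noun, tag_otherwise = NEXT_NOUN_RULES[word]
            tagged.append((word, tag_if_noun if next_is_noun else tag_otherwise))
        else:
            # Words without a rule simply take their first candidate tag.
            tagged.append((word, candidates[i][0]))
    return tagged
```

Keeping the rules in a table separate from the lexicon means a new disambiguation rule of the same "+1 is Noun" shape is one more dictionary entry. A word absent from the lexicon comes back with the `UNKNOWN` mark so that it can be reviewed manually.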
“People used to be very happy to the sun in this season.”

[haγǝ] [mor] [ḍǝyr] [manςe] [kawalo] [če] [bačǝy] [da] [tǝ] [pǝ] [komǝ] [rawan] [ye] [da] [ṣ̌ǝ] [lar] [nǝ] [dǝ].
[he] [mother] [very] [forbid] [to-do] [that] [son] [it] [you] [on] [which] [go] [to-be] [it] [good] [way] [not] [to-be].
“He was always stopped by his mother from the way he was following as it was not the right one.”

In the same way:

[yǝ] [xore] [!] [tǝ] [gorǝ] [kanǝ] [haγǝ] [balǝ] [wraj] [ye] [zama] [zamano] [tǝ] [wayali] [di] [če] [da] [mamagano] [srǝ] [mo] [da] [mor] [zmakǝ] [dǝ].
[o] [sister] [!] [you] [see] [tick] [that] [other] [day] [him (clitic)] [mine] [sons] [to] [talk] [to-do] [that] [to] [uncles] [has] [their (clitic)] [her] [mother] [land] [to-be].
“O sister! You see that it has been said to my sons that their mother has a share in their uncles’ property.”

The word [tǝ] may be a personal pronoun or a postposition. In the first sentence it is used as a postposition, whereas in the second sentence it is a personal pronoun. In the third sentence both forms occur. The rule for disambiguation is as follows, where -1 refers to the word immediately before the input word:

Given input: [tǝ]
If (-1 is Noun)
Then assign Postposition tag
Else assign Personal Pronoun tag

Similarly, words that are not present in the lexicon can also be input to the POS tagger, so a proper tag must be assigned to a new word as well. We know that the word order of Pashto sentences is SOV [1]. When a word is scanned and that word is not in the lexicon, it is highly likely that the word is a noun or a verb, because these are open classes. Since the verb comes last in an SOV sentence, a new word that is directly followed by punctuation, an auxiliary verb or a copula verb is tagged as a verb; otherwise it is tagged as a noun:

Given input: “New word”
If (+1 is Punctuation/Auxiliary verb/Copula verb)
Then assign verb tag
Else assign noun tag

5. Test and result

Initially, the POS tagger contained only a very limited lexicon and few rules, so its tagging accuracy was low. As more text was tagged and the tags of new words were corrected manually, the lexicon grew and rules were added for those new words, and the accuracy increased. When the lexicon reached 100,000 words and the rule set reached 120 rules, the accuracy was 88%. These results are summarized in Table 1.

Table 1: Results of POS tagger

No. of words in lexicon   No. of rules   POS tagger accuracy
100                       10             40%
1,000                     40             62%
10,000                    70             76%
100,000                   120            88%

The table shows that the accuracy increases as the number of words in the lexicon and the number of disambiguation rules increase.

6. Conclusion and future work

To enable most of our common people to communicate with the computer and to use it more efficiently, computers must be able to process our language. Part-of-speech tagging plays a vital role in natural language processing. This paper presents a reasonably accurate POS tagger for the Pashto language. Part-of-speech tagging helps in the creation of a parser [8]; future work therefore includes the design of a parsing algorithm for Pashto. The structure of Pashto sentences must be identified during the development of the Pashto parser.

7. References

[1] F. Babrakzai, “Topics in Pashto Syntax”, Ph.D. thesis, Linguistics Department, University of Hawai’i, 1999.
[3] M. A. Khan and F. Zuhra, “A Computational Approach to the Pashto Pronoun”, PUTAJ Science, University of Peshawar, 2006.

[4] F. Zuhra and M. A. Khan, “Towards the Computational Treatment of the Pashto Verb”, in Proceedings of the Conference on Language and Technology (CLT07), Bara Gali Summer Campus, University of Peshawar, August 7-11, 2007.

[5] S. Khoja, R. Garside, and G. Knowles, “A tagset for the morpho-syntactic tagging of Arabic”, Computing Department, Lancaster University.

[6] E. Marsi, A. V. Bosch, and A. Soudi, “Memory based morphological analysis generation and part of speech tagging of Arabic”, Center for Computational Linguistics, Rabat, Morocco.

[7] D. T. Hu, “Development of Part of Speech Tagging and Syntactic Analysis Software for Chinese Text”, Bachelors & Masters thesis of Engineering in Electrical Engineering and Computer Science.

[8] M. Dalrymple, “How much can part-of-speech tagging help parsing?”, Centre for Linguistics and Philology, University of Oxford, Oxford OX1 2HG, UK.

[9] H. Penzl, “A Grammar of Pashto”, University of Michigan, February 1954.

[10] S. Reshteen, “Pashto Grammar”.

[11] T. Rahman, “The Pashto language and identity-formation in Pakistan”, http://www.informaworld.com/smpp/title~content=t713411866

[12] H. Tegey and B. Robson, “A Reference Grammar of Pashto”, Center for Applied Linguistics, Washington, D.C., 1996.

[13] D. Jurafsky and J. H. Martin, “Speech and Language Processing”, Pearson Education (Singapore).

[14] M. A. Zyar, “Pashto Grammar”, Danish Publishing Association, Qissa Khwani Bazar, Peshawar, 2003.

[15] H. Schmid, “Part-Of-Speech Tagging With Neural Networks”, Institute for Computational Linguistics, Azenbergstr. 12, 70174 Stuttgart, Germany.

[16] M. J. Yar, “Gul Meena”, a Pashto novel published by Pashto Academy, University of Peshawar, ISBN 969-418-026-0.

Tag    Description                     Example       Meaning
N1CD   Singular Common Direct Noun     [halak]       Boy
N1CO   Singular Common Oblique Noun    [halak]       Boy
N2CD   Plural Common Direct Noun       [halakān]     Boys
N2CO   Plural Common Oblique Noun      [halakāno]    Boys
N1PD   Singular Proper Direct Noun     --
N1PO   Singular Proper Oblique Noun    --
N2PD   Plural Proper Direct Noun       --
N2PO   Plural Proper Oblique Noun      --
PP     Personal Pronoun                [zә]          I
PS     Possessive Pronoun              [zmā]         Mine
PD     Demonstrative Pronoun           [dáγa]        This
PI     Indefinite Pronoun              [her]         Every
PX     Reflexive Pronoun               [xpǝl jān]    Self
PR     Reciprocal Pronoun              [yew bәl]     Each Other
PN     Intensive Pronoun               [paxplǝ]      Oneself
PG     Interrogative Pronoun           [cúk]         Who
PL     Relative Pronoun                [kúm yaw]     Which One
PT     Directional Pronoun             [rā]          I
PC     Clitic Pronoun                  [me]          I
PB     Distributive Pronoun            [her yew]     Each One
VS     Present Verb                    [xoram]       Eat
VT     Past Verb                       [woxṛǝ]       Ate
VI     Infinitive Verb                 [garjedal]    Walking
PF     Future Particle                 [bǝ]          Indicate Future
VA     Auxiliary Verb                  [kawal]       To Do
VSC    Present Copula Verb             [yәm]         To Be
VTC    Past Copula Verb                [wom]         To Be
Pre    Preposition                     [da]          From
Pos    Postposition                    [kṣ̌e]         In
Num    Numeral                         [yew]         One
Int    Interjection                    [haәy]        For Sorrow
Con    Conjunction                     [aw]          And
Adv    Adverb                          [niždəy]
Adj    Adjectives                      [ṣ̌kolәy]      Beautiful
FW     Foreign Word                    --
Abb    Abbreviation                    --
Sym    Symbol                          --
.      Full Stop                       -
,      Comma                           ،
?      Question Mark                   ؟
!      Exclamation Mark                !
:      Colon                           :
;      Semi-Colon                      ;
“      Open Quotation Mark             ”
”      Closed Quotation Mark           “
(      Open Parenthesis                )
)      Closed Parenthesis              (
[      Open Square Bracket             ]
]      Closed Square Bracket           [
{      Open Curly Bracket              }
}      Close Curly Bracket             {
"      Neutral Quotation               "
‘      Open Single Quotation Mark      ’
’      Closed Single Quotation Mark    ‛
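
The eight noun tags in the tagset (N1CD through N2PO) are compositional: the letter N is followed by a number code (1 singular, 2 plural), a class code (C common, P proper) and a case code (D direct, O oblique). The sketch below decodes such a tag back into its description. The function name `describe_noun_tag` and the three lookup tables are hypothetical and introduced only for illustration; the code mappings themselves are read off the tagset listing.

```python
# Code letters as they appear in the noun tags of the tagset.
NUMBER = {"1": "Singular", "2": "Plural"}
KIND = {"C": "Common", "P": "Proper"}
CASE = {"D": "Direct", "O": "Oblique"}


def describe_noun_tag(tag: str) -> str:
    """Expand a noun tag such as 'N1CD' into its description.

    Raises ValueError for anything that is not one of the eight noun tags,
    including 'Num', which also starts with 'N'.
    """
    if len(tag) != 4 or tag[0] != "N":
        raise ValueError(f"not a noun tag: {tag!r}")
    try:
        return f"{NUMBER[tag[1]]} {KIND[tag[2]]} {CASE[tag[3]]} Noun"
    except KeyError:
        raise ValueError(f"not a noun tag: {tag!r}") from None
```

A check of this kind is also a cheap way to test whether a neighbouring word is a noun when applying the "+1 is Noun" and "-1 is Noun" rules of Section 4.5, since every noun tag, and no other tag, passes it.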