FEATURES FOR NOORI NASTALIQUE
Aamir Wali, Atif Gulzar, Ayesha Zia, Muhammad Ahmad Ghazali, Muhammad Irfan Rafiq, Muhammad Saqib Niaz, Sara Hussain, and Sheraz Bashir
ABSTRACT
 
Most scripts in use today have a large inventory of characters. Every character has certain features that help readers perceive it as different from the others. The Noori Nastalique script, like all other scripts, relies on characteristic features that can be defined in both visual and articulatory terms. This paper uncovers these features and analyzes them.
1. INTRODUCTION
To date, many technological advances have taken place in the development of character recognition systems, from pattern matching to the more dynamic feature extraction techniques. Developing such a system for Nastalique, a widely used Urdu font, by means of pattern or template matching would require a huge template inventory. This is because of the context-sensitive nature of Nastalique: according to previous research (Aamir..,2001), there are as many as 474 shapes for 20 characters. Recognition through feature extraction can be an efficient solution to the Nastalique problem.

This paper analyzes and lists some features that can be employed to uniquely distinguish the orthography of all letters of the alphabet and to assist in their recognition. Note that all features are logical; that is, they are features that a human mind will look for in order to perceive a character as different from other characters.
2. LITERATURE REVIEW
 
Noori Nastalique is one of the most widely used Urdu fonts. The categorical division of written script is analogous to that used in phonology, so we define the corresponding concepts in a similar way. Readers of this language systematically ignore certain properties of the script and perceive two different shapes as the same character. We call the stored versions of written script graphemes. Graphemes are thus the smallest units of shape that the mind recognizes as a letter; that is, graphemes are how we mentally store the different shapes of characters in memory. All the different surface realizations of an underlying grapheme are its allographs. The features devised should therefore cover all allographs of Nastalique.

The majority of the world's languages are unwritten (Fromkins, Victoria. 2000, p. 528). Of those that are written, most have writing systems that are simple and context free. Others are much more complex and have context-sensitive writing systems. Urdu is one such language. Its complexity is mainly due to a couple of reasons. One is that the Urdu writing system is cursive: more than one character joins together to form a ligature. The important thing to observe here is that characters change their shape depending upon their position in the ligature. Each letter is written in a slightly different form depending on whether it comes at the beginning, middle or end of a word, or whether it occurs on its own, i.e. in a detached form. This is shown below:

In the above figure the character has formed four shapes according to its position.

Another interesting property of the Urdu writing system is that characters change their shapes depending upon the characters following and preceding them, as shown below. This change follows rules based on the shape of the next letter to join with and the shape of the character that is joining.
 
Center for Research in Urdu Language Processing
In the above figure it can be clearly seen that the second character, seen, changes its shape in accordance with the following and preceding characters. Thus there can be multiple allographs of seen, since we are mapping different visual images to the same grapheme.

Graphical units can be in parallel or complementary distribution, depending on their environments. If two graphical units in the same environment have different meanings, they are in parallel distribution, as in the case of graphemes. If the environments of two graphical units are mutually exclusive, they are in complementary distribution, as in the case of allographs.

The primary purpose of the present study was to examine all the characters of Noori Nastalique along with their allographs, and to devise features at a logical level to describe and identify them.
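The distinction between parallel and complementary distribution drawn above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an environment is modelled as a hypothetical (preceding, following) pair of character names, the function name `distribution` is invented for the example, and the sample environments are made up rather than taken from the study. The sketch also simplifies the definition by not modelling meaning: any shared environment is treated as the sign of parallel distribution.

```python
from typing import FrozenSet, Optional, Tuple

# Hypothetical model: an environment is the pair (preceding, following)
# of neighbouring character names; None marks a ligature boundary.
Environment = Tuple[Optional[str], Optional[str]]


def distribution(envs_a: FrozenSet[Environment],
                 envs_b: FrozenSet[Environment]) -> str:
    """Classify two graphical units by the environments they occur in."""
    if envs_a & envs_b:
        # The two units can appear in at least one shared environment.
        return "parallel"
    # The two sets of environments are mutually exclusive.
    return "complementary"


# Invented sample data, for illustration only.
seen_initial = frozenset({(None, "bay"), (None, "jeem")})
seen_final = frozenset({("bay", None), ("jeem", None)})
bay_initial = frozenset({(None, "jeem"), (None, "seen")})

print(distribution(seen_initial, seen_final))   # complementary
print(distribution(seen_initial, bay_initial))  # parallel
```

In this toy data the two shapes of seen never share an environment, which is the pattern expected of allographs, while initial seen and initial bay can both precede jeem, which is the pattern expected of distinct graphemes.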
3. Methodology
The basic study was done at a school (Z.N High School) with students in grades 2 to 4. The students were given some ligatures and asked to classify the characters in them. When asked how they distinguished characters, grade 2 students did not give any justification. Grade 3 and 4 students, however, could identify the characters mainly by the number and position of dots and by unique characteristics of some characters, such as a dot-like head or the number of dents.

The detailed study was done at CRULP in accordance with how children perceive characters and their various forms (allographs). The results of this analysis are stated and discussed next.
4. Results
 
Following is the list of features that can uniquely identify all the allographs identified in the previous section.
1. Number of Dots: A single diacritic or a group of diacritics similar to a dot. This feature can have the values 0, 1, 2, 3.
2. Position of Dots: As can be seen, these dots can be below or above a character body. A dot diacritic along a character body or below it is considered to have the feature [-above].
3. Ascenders: A prominent vertical stroke above the baseline. Below, (a), (b) and (c) are examples of ascenders, while (d) and (e) are non-ascenders.
   (a) (b) (c) (d) (e)
4. Descenders: A feature describing an allograph that is visually prominent below the baseline.
5. Connected Forward: Signifies whether an allograph is connected to the following allograph. This can also be +ve or -ve. For all isolated and last-position allographs (in a ligature) this value is negative. From the above feature, [] and [] are positively connected forward.
6. Kink: Sharp edges within an allograph.
7. Diacritic: Diacritics other than dots, such as [], play a major role in allograph identification.
8. Concavity: Specifies the direction of a narrow opening within an allograph above the baseline. An allograph can have concavity left or right. An opening upward or downward is not considered a concavity.
9. Circular head: This feature describes allographs that have a circular head. The head may be filled or hollow.
10. Ellipse: This feature describes allographs that have a prominent ellipse-like shape.
11. Dot-like head: A connected dot; in other words, a dot forming part of an allograph. This is not to be confused with a normal dot, which is visibly separated from the allograph body.
12. Diagonal: A long, prominent stroke inclined at a certain angle.
13. Number of Dents: The number of tooth-like structures.
 
5. Discussion
In Urdu, diacritics play a major role not only in pronunciation but also in identification. Many Urdu letters differ only in the number of dots and their position. So let us first consider the features related to dot diacritics. Consider the following data, where a dot below or along the character body is perceived as bay.

A dot at the same position can be confused with jeem:

But this does not happen. The kink-containing head makes the difference. So we can describe jeem as +kink and bay as -kink.

Having said this, khay (similar to jeem but having a dot above) and ghain both have the features [dots=1, +above, +kink]. This gave rise to another feature, concavity, already described in the previous section. Khay and ghain have concavity left and right respectively.

Coming back to the diacritics, consider the following pairs of data.
[]
 
[][] []
Clearly the character bodies in each pair are exact duplicates of one another, the only difference being their diacritics. Diacritics could therefore not be ignored and have been included in the list of features.

In Urdu, lam at the start of or within a word looks similar to alif. The cue for telling them apart is the knowledge that alif can only occur in word-final position or isolated. Since lam looked like alif at the start or in the middle of the word, this led to the reason