The segmentation of words is completed by simply comparing the distance $G$ with this optimal distance or threshold $G_0$. The case $G > G_0$ identifies two words, and the alternative case identifies two sub-words.
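The decision rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the gaps between consecutive sub-words have already been measured in pixels, in reading order, and that $G_0$ has been computed beforehand. The function name `group_subwords` is hypothetical.

```python
def group_subwords(gaps, g0):
    """Group sub-words into words using the threshold rule G > G0.

    gaps[i] is the distance between sub-word i and sub-word i + 1 (reading
    order); g0 is the pre-computed optimal threshold. Returns a list of words,
    each given as the list of indices of the sub-words it contains.
    """
    words = [[0]]                      # the first sub-word always opens a word
    for i, g in enumerate(gaps, start=1):
        if g > g0:                     # G > G0: the gap separates two words
            words.append([i])
        else:                          # otherwise: two sub-words of one word
            words[-1].append(i)
    return words


# Four gaps between five sub-words, with an assumed threshold of 5 px.
print(group_subwords([2, 9, 1, 12], 5))  # [[0, 1], [2, 3], [4]]
```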
www.intechopen.com
Interactive Knowledge Discovery for Baseline
Estimation and Word Segmentation in Handwritten Arabic Text 629
4. Experimental Results and Discussions
Text recognition systems can be classified into two areas: printed text and handwritten text. Printed texts keep similar shapes even when they are printed many times on different devices. Handwritten texts, however, are highly variable because writers have different styles. The main goal of a handwriting recognition system is to determine the class of a character or word. Designing a system that recognizes the handwriting of many people is much harder than designing one for the handwriting of a single writer. To evaluate a handwriting recognition system, its accuracy and speed must be measured and compared with those of an average human reader. Some recognition systems in the literature report high recognition rates. These rates are generally explained by their test data, which consisted of a small set of words written by a few writers rather than a standard database.
Any recognition system needs a large database for training and testing. Real data from banks or postal services are confidential and inaccessible for non-commercial research. Some work has been conducted on Arabic handwritten words, but it generally used the authors' own small databases or databases that were not available to the public. Most recognition systems have been developed for specific applications such as reading postal addresses or cheques. One example of a large standard English database suitable for developing, training and testing recognition systems was created by Hull (Hull 1994). It was developed for the Centre of Excellence for Document Analysis and Recognition (CEDAR) at the State University of New York at Buffalo and consists of 5000 city names, 5000 state names, 10000 ZIP codes and 50000 alphanumeric characters. The National Institute of Standards and Technology (NIST) has also provided a handwritten database that includes English letters in lower and upper case and numeric digits. The Computer and Communication Research Laboratory of the Industrial Technology Research Institute in Taiwan has released a database of handwritten Chinese characters written by 2000 writers (Huang 1993).
The data set used for the experiments is the IFN/ENIT database, released by Pechwitz et al. (Pechwitz, Maddouri et al. 2002).
Early work on Arabic handwritten words generally relied on small individual databases or reported results on databases that were not available to the public. As a result, there was no benchmark for comparing the results obtained by different researchers. This changed in 2002, when the IFN/ENIT database (www.ifnenit.com) became freely available for non-commercial research. The IFN/ENIT database is therefore very important in this context and has been used as a standard test set [9]. More than 1000 different people were selected to write their names and to fill in one or more forms with handwritten, pre-selected names of Tunisian towns and villages and the corresponding postcodes. All the forms were scanned at 300 dpi and converted to binary images.
The database consists of 937 Tunisian town and village names together with their postcodes. The images are divided into five sets so that researchers can use some sets for training and others for testing. Several pre-processing tasks, including noise removal, text block segmentation, binarization and word segmentation, were carried out during the development
of the IFN/ENIT database so that cropped binary images of the names of towns and villages could be made available.
Based on the test data sets described above, we designed two phases of experiments to evaluate the proposed algorithms. Phase 1 evaluates the performance of baseline estimation. Phase 2 evaluates the performance of connected component analysis and of word segmentation.
Phase 1 focuses on baseline estimation. Because the baseline is part of the ground truth of the IFN/ENIT database, the estimated baseline can be evaluated directly. Figure 7 shows the phase-1 results on baseline estimation. The proposed algorithm worked well when applied to 4000 images from four different sets, using the first 1000 images of each of sets a, b, c and d. Baseline estimation reaches an average accuracy of 97.675%, which shows that the proposed algorithm is effective at estimating a word baseline. Table 2 summarizes the experimental results for the proposed baseline estimation algorithm.
Set                a      b      c      d      Average
Percentage (%)     97.4   97.8   97.9   97.6   97.675
Table 2. Performance of the baseline estimation algorithm
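As an arithmetic check on Table 2: the reported average is the unweighted mean of the four per-set rates. An unweighted mean is valid here because every set contributes the same number of images. The snippet assumes exactly 1000 images per set, as stated above, and converts each rate back to a count of correctly estimated baselines.

```python
# Per-set baseline estimation accuracy (%) from Table 2; 1000 images per set.
rates = {"a": 97.4, "b": 97.8, "c": 97.9, "d": 97.6}
images_per_set = 1000

# Number of correctly estimated baselines implied by each rate.
correct = {s: round(r / 100 * images_per_set) for s, r in rates.items()}

# Pooled accuracy over all 4000 images, and the plain mean of the four rates.
overall = 100 * sum(correct.values()) / (images_per_set * len(rates))
mean_rate = sum(rates.values()) / len(rates)

print(correct)             # {'a': 974, 'b': 978, 'c': 979, 'd': 976}
print(f"{overall:.3f}")    # 97.675
print(f"{mean_rate:.3f}")  # 97.675
```

Because all four sets have the same size, the pooled rate and the simple mean coincide. With unequal set sizes, only the pooled figure would be meaningful.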
Compared with existing work, our baseline algorithm performs better in estimating the baseline. Table 3 compares the results of our algorithm with the results of existing work.
Method             Hough Projection [31]   Skeleton Based [31]   Proposed Algorithm
Percentage (%)     88                      88                    88.9
Table 3. Performance of the proposed algorithm vs. other methods
In general, baseline quality is assessed by calculating the baseline error. This error is the area, in pixels, between the ground truth baseline and the estimated baseline. Figure 8 shows an example of calculating the baseline error, and Figure 9 shows the relation between the estimated baseline and the ground truth baseline.
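The area-based baseline error can be sketched as follows. The sketch assumes that each baseline is represented by one row coordinate per image column across the word's width. Under that assumption, the area between the two curves is the sum of the per-column vertical distances. This representation and the helper names `line_rows` and `baseline_error` are illustrative assumptions. The IFN/ENIT ground-truth baseline may be stored differently and would first need converting to this form.

```python
def line_rows(y_start, y_end, width):
    """Row coordinate of a straight baseline sampled at each of `width` columns."""
    if width == 1:
        return [float(y_start)]
    step = (y_end - y_start) / (width - 1)
    return [y_start + step * x for x in range(width)]


def baseline_error(gt_rows, est_rows):
    """Area in pixels between two baselines given as one row value per column.

    Each column contributes the vertical distance between the two baselines,
    so the sum is the discrete area enclosed between them.
    """
    if len(gt_rows) != len(est_rows):
        raise ValueError("both baselines must cover the same columns")
    return sum(abs(g - e) for g, e in zip(gt_rows, est_rows))


# Ground truth: horizontal baseline at row 50; the estimate drifts to row 60.
gt = line_rows(50, 50, 101)
est = line_rows(50, 60, 101)
print(baseline_error(gt, est))  # about 505: 0.1 px per column summed over 101 columns
```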
Fig. 7. Example baseline estimation results
Fig. 8. Baseline error
Fig. 9. The relation between the estimated and the ground truth baseline
In the phase-2 experiments, vertical histogram and connected component analysis are carried out for word segmentation. Many word segmentation approaches assume that text lines are straight. This assumption works well for machine-printed documents but fails on handwritten documents with curvilinear text lines. In our approach, the distances between sub-words are measured and compared with an optimal threshold to decide whether each distance separates two words. The segmentation algorithm searches for horizontal gaps between the connected components and compares them with a pre-estimated threshold. Compared with existing work, our word segmentation algorithm has a significant advantage. For mis-spaced words, where the algorithm failed to determine bounding boxes, the spaces were adjusted automatically rather than manually with graphical tools.
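The gap search between connected components can be sketched on a binary image (1 = ink) represented as a list of rows. This is a minimal pure-Python illustration under stated assumptions, not the authors' code. It uses 8-connected flood filling to find components and their bounding boxes. It then merges components whose column ranges overlap or touch, such as a dot above a stroke, before measuring the blank-column gaps between neighbours. For 8-connected components, this merging gives the same gaps as the runs of empty columns in the vertical histogram. The function names are hypothetical. The resulting gaps are the distances that would be compared with $G_0$.

```python
from collections import deque


def connected_components(img):
    """Return bounding boxes (x0, y0, x1, y1) of 8-connected ink regions (1 = ink)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


def horizontal_gaps(boxes):
    """Widths (in blank columns) of the gaps between component column ranges.

    Ranges that overlap or touch are merged first. Gaps are returned from
    right to left, the Arabic reading order.
    """
    spans = []
    for x0, x1 in sorted((b[0], b[2]) for b in boxes):
        if spans and x0 <= spans[-1][1] + 1:
            spans[-1][1] = max(spans[-1][1], x1)
        else:
            spans.append([x0, x1])
    gaps = [right[0] - left[1] - 1 for left, right in zip(spans, spans[1:])]
    return gaps[::-1]


# Toy image: a stroke with a separate dot above it, then two more sub-words.
img = [
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0, 1, 1],
]
boxes = connected_components(img)
print(len(boxes))              # 4 (the dot is its own component)
print(horizontal_gaps(boxes))  # [3, 1]
```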
In general, several types of error occur during segmentation, whatever approach is used. These errors can be summarized as:
1) Over-segmentation, when the number of segments is greater than the actual number.
2) Under-segmentation, when the number of segments is less than the actual number.
3) Misplaced segmentation, when the number of segments is right but the limits are wrong.
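The three error types can be expressed as a comparison between the predicted and ground-truth word segments of one image. In this sketch, each word is assumed to be a (start, end) column range in pixels. The chapter does not say how close a limit must be to count as correct, so the tolerance `tol` is a hypothetical parameter, and `classify_segmentation` is an illustrative name.

```python
def classify_segmentation(predicted, truth, tol=2):
    """Label one image's word segmentation against its ground truth.

    predicted, truth: lists of (x_start, x_end) word extents in pixels.
    tol: hypothetical tolerance, in pixels, for a limit to count as correct.
    Returns "over", "under", "misplaced" or "correct".
    """
    if len(predicted) > len(truth):
        return "over"        # more segments than actual words
    if len(predicted) < len(truth):
        return "under"       # fewer segments than actual words
    # Same number of segments: check that every limit matches its counterpart.
    for (p0, p1), (t0, t1) in zip(sorted(predicted), sorted(truth)):
        if abs(p0 - t0) > tol or abs(p1 - t1) > tol:
            return "misplaced"
    return "correct"


truth = [(0, 40), (55, 90)]
print(classify_segmentation([(0, 20), (25, 40), (55, 90)], truth))  # over
print(classify_segmentation([(0, 90)], truth))                      # under
print(classify_segmentation([(0, 30), (45, 90)], truth))            # misplaced
print(classify_segmentation([(1, 41), (55, 89)], truth))            # correct
```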
We tested our techniques on a set of 500 images and compared the results with the ground truth, based on grouping the bounding boxes into words. Table 4 summarizes the word segmentation results, and some of the results are presented in Figure 10.
Table 4 shows that the correct segmentation rate over the images is 85%. The segmentation error of 15% is caused by variations in handwriting, especially irregular spacing between sub-words and words. Spaces between words that are too small lead to under-segmentation, because two words are incorrectly merged. Spaces between sub-words that are too large may be mistaken for word boundaries and lead to over-segmentation. Examples of these errors are illustrated in Figure 11.
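As a consistency check on these figures, the snippet converts the Table 4 rates into image counts. It assumes that the Table 4 entries are percentages of the 500 test images, which agrees with the 85% correct rate and 15% error rate quoted above.

```python
# Table 4 rates (%) over the 500 test images.
n_images = 500
rates = {"correct": 85, "under": 9, "over": 4, "misplaced": 2}

# Integer image counts implied by each rate.
counts = {k: n_images * r // 100 for k, r in rates.items()}
error_rate = rates["under"] + rates["over"] + rates["misplaced"]

print(counts)      # {'correct': 425, 'under': 45, 'over': 20, 'misplaced': 10}
print(error_rate)  # 15
assert sum(counts.values()) == n_images  # every image falls into one category
```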
Our work is difficult to compare with [35], because those authors used different criteria and selected 200 images. They did not state which 200 images of the database they used, so our algorithm cannot be applied to the same data. Moreover, for mis-spaced words where the algorithm failed to determine the bounding boxes, our algorithm performs better because it reduces the number of such errors. The distances between words and sub-words were normalized automatically using knowledge of the Arabic language, which allows the word case to be determined, rather than adjusted manually with graphical tools. For example, the word in Figure 11 (a) was over-segmented, but after distance normalization it is segmented correctly, as shown in Figure 12.
In addition, the rules of Arabic writing can be exploited in the distance normalization. The original image is scanned from right to left, column by column, and the white (blank) columns are detected and resized to reduce the distances between sub-words. An illustration of distance normalization is given in Figure 12. After distance normalization is applied, the Arabic words in this example are segmented correctly. Because each handwritten image has Ground Truth (GT) information for evaluation purposes, the results are compared with the IFN/ENIT GT information.
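The column scan described above can be sketched as follows, using a binary image (1 = ink) stored as a list of rows. The chapter specifies the scan direction and that blank columns are detected and resized. It does not give the rule, based on knowledge of Arabic writing, that chooses the new width. That rule is therefore passed in as a caller-supplied function `new_width`. The cap used in the example is a hypothetical placeholder, not the authors' rule. The name `normalize_distances` is also illustrative.

```python
def normalize_distances(img, new_width):
    """Resize interior runs of blank columns in a binary image (1 = ink).

    The image is scanned column by column from right to left (Arabic reading
    order). Each run of blank columns lying between two inked columns is
    replaced by new_width(run_length) blank columns. Blank margins at the
    left and right edges are kept unchanged.
    """
    h, w = len(img), len(img[0])
    columns = [[img[y][x] for y in range(h)] for x in range(w)]
    blank = [not any(col) for col in columns]
    out = []  # output columns, collected from right to left
    x = w - 1
    while x >= 0:
        if not blank[x]:
            out.append(columns[x])
            x -= 1
            continue
        start = x
        while x >= 0 and blank[x]:
            x -= 1
        run = start - x                      # length of this blank run
        interior = start < w - 1 and x >= 0  # ink on both sides of the run
        keep = new_width(run) if interior else run
        out.extend([0] * h for _ in range(keep))
    out.reverse()                            # back to left-to-right order
    return [[col[y] for col in out] for y in range(h)]


# Hypothetical rule: cap every gap between components at 2 blank columns.
row = [1, 0, 0, 0, 0, 1, 0, 0, 1]
print(normalize_distances([row], lambda n: min(n, 2)))  # [[1, 0, 0, 1, 0, 0, 1]]
```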
No. of Images   Correct segmentation (%)   Under-segmentation (%)   Over-segmentation (%)   Misplaced segmentation (%)
500             85                         9                        4                       2
Table 4. Overall segmentation results
Fig. 10. Example successful word segmentation results
Fig. 11. Example failed word segmentation results
(Panels: original image; distance normalized image in double (1); distance normalized image in double (2); distance normalized image in binary)
Fig. 12. Example failed word segmentation results
5. Conclusion
Arabic handwriting recognition depends on accurate pre-processing and segmentation. This chapter proposes a robust method for baseline estimation and a statistical analysis to determine an optimal threshold for word segmentation. Using knowledge of the potential positions of the baseline gives more accurate results than estimation without such knowledge. In addition, the optimal threshold was found to be very effective for robustly segmenting words in Arabic text.
A component-based method is introduced to segment words from handwritten Arabic texts. Because many researchers have concentrated on either segmentation-free methods or letter- or stroke-based approaches, word segmentation has not been well addressed. Our work provides
a practical way of accurately segmenting words from the text. It is useful and more flexible than segmentation-free approaches because it can make good use of the component parts of images in further recognition. It is also simpler and more robust than letter-based methods, which have great difficulty segmenting arbitrary handwritten characters effectively. We found that distance information is very useful for segmenting words, but further improvements are still desirable. A distance normalization technique that uses knowledge of the language was applied to reduce the number of over- and under-segmentation errors. Future work will aim to improve word segmentation further by using language knowledge for validation.
6. References
Abuhaiba, I. S. I., M. J. J. Holt, et al. (1996). "Processing of binary images of handwritten text documents." Pattern Recognition 29(7): 1161-1177.
Al-Badr, B. and R. M. Haralick (1995). Segmentation-free word recognition with application to Arabic. Proceedings of the Third International Conference on Document Analysis and Recognition.
Al-Ma'adeed, S., D. Elliman, et al. (2002). A data base for Arabic handwritten text recognition research. Eighth International Workshop on Frontiers in Handwriting Recognition.
Al-Rashaideh, H. (2006). "Preprocessing phase for Arabic Word Handwritten Recognition." Information Transmissions in Computer Networks 6: 11-19.
Alma'adeed, S. (2006). Recognition of Off-Line Handwritten Arabic Words Using Neural Network. Geometric Modeling and Imaging--New Trends.
Alma'adeed, S., C. Higgins, et al. (2002). "Recognition of Off-Line Handwritten Arabic Words Using Hidden Markov Model Approach." 16th International Conference on Pattern Recognition (ICPR'02) 3: 481-484.
Alma'adeed, S., C. Higgins, et al. (2004). "Off-line recognition of handwritten Arabic words using multiple hidden Markov models." Knowledge-Based Systems 17(2-4): 75-79.
Almuallim, H. and S. Yamaguchi (1987). "A method of recognition of Arabic cursive handwriting." IEEE Transactions on Pattern Analysis and Machine Intelligence 9(5): 715-722.
Amin, A. (1997). Off line Arabic character recognition: a survey. Fourth International Conference on Document Analysis and Recognition.
Amin, A. (1998). "Off-line Arabic character recognition: the state of the art." Pattern Recognition 31(5): 517-530.
Amin, A. (2000). "Recognition of printed Arabic text based on global features and decision tree learning techniques." Pattern Recognition 33(8): 1309-1323.
Amin, A., H. Al-Sadoun, et al. (1996). "Hand-printed Arabic character recognition system using an artificial network." Pattern Recognition 29(4): 663-675.
Amin, A. and H. B. Al-Sadoun (1992). A new segmentation technique of Arabic text. Proceedings, 11th IAPR International Conference on Pattern Recognition, 1992. Vol. II. Conference B: Pattern Recognition Methodology and Systems.
Farooq, F., G. Venu, et al. (2005). Pre-processing methods for handwritten Arabic documents. Proceedings Eighth International Conference on Document Analysis and Recognition.
Freeman, H. (1961). "On the encoding of arbitrary geometric configuration." IEEE Trans. Electronic Computer 10: 260-268.
Gray, R. M. (1989). "Vector quantization." IEEE Trans. ASSP (1): 4-29.
Guo, Z. and R. W. Hall (1989). "Parallel thinning with two-subiteration algorithms." Communications of the ACM 32(3): 359-373.
Huang, J. S. (1993). Optical handwritten Chinese character recognition. Handbook of Pattern Recognition and Computer Vision. World Scientific Publishing Co., Inc.: 595-624.
Hull, J. J. (1994). "A database for handwritten text recognition research." IEEE Transactions on Pattern Analysis and Machine Intelligence 16(5): 550-554.
Khorsheed, M. S. (2000). Automatic Recognition of Words in Arabic Manuscripts. Computer Laboratory, University of Cambridge. Ph.D.: 220.
Khorsheed, M. S. (2002). "Off-Line Arabic Character Recognition - A Review." Pattern Analysis & Applications 5(1): 31-45.
Khorsheed, M. S. (2003). "Recognising handwritten Arabic manuscripts using a single hidden Markov model." Pattern Recognition Letters 24(14): 2235-2242.
Khorsheed, M. S. (2007). "Offline recognition of omnifont Arabic text using the HMM ToolKit (HTK)." Pattern Recognition Letters 28(12): 1563-1571.
Khorsheed, M. S. and W. F. Clocksin (1999). Structural Features of Cursive Arabic Script. The Tenth British Machine Vision Conference, The University of Nottingham, UK.
Khorsheed, M. S. and W. F. Clocksin (2000). Multi-font Arabic word recognition using spectral features. Proceedings 15th International Conference on Pattern Recognition, 2000.
Lorigo, L. and V. Govindaraju (2005). Segmentation and pre-recognition of Arabic handwriting. Eighth International Conference on Document Analysis and Recognition.
Lorigo, L. M. and V. Govindaraju (2006). "Offline Arabic handwriting recognition: a survey." IEEE Transactions on Pattern Analysis and Machine Intelligence 28(5): 712-724.
Motawa, D., A. Amin, et al. (1997). Segmentation of Arabic cursive script. The Fourth International Conference on Document Analysis and Recognition.
Parker, J. R. (1997). Algorithms for Image Processing and Computer Vision. John Wiley and Sons, Inc.
Pechwitz, M., S. S. Maddouri, et al. (2002). IFN/ENIT - Database of Arabic Handwritten Words. Colloque International Francophone sur l'Ecrit et le Document (CIFED).
Pechwitz, M. and V. Margner (2002). Baseline estimation for Arabic handwritten words. Eighth International Workshop on Frontiers in Handwriting Recognition.
Rabiner, L. and B. Juang (1986). "An introduction to hidden Markov models." IEEE ASSP Magazine [see also IEEE Signal Processing Magazine] 3(1): 4-16.
Syiam, M., T. M. Nazmy, et al. (2006). Histogram clustering and hybrid classifier for handwritten Arabic characters recognition. Proceedings of the 24th IASTED International Conference on Signal Processing, Pattern Recognition, and Applications.
Young, S., G. Evermann, et al. (2001). The HTK Book. Cambridge University Engineering Department.
Zhang, T. Y. and C. Y. Suen (1984). "A fast parallel algorithm for thinning digital patterns." Communications of the ACM 27(3): 236-239.
Recent Advances in Technologies
Edited by Maurizio A Strangio
ISBN 978-953-307-017-9
Hard cover, 636 pages
Publisher InTech
Published online 01, November, 2009
Published in print edition November, 2009
The techniques of computer modelling and simulation are increasingly important in many fields of science
since they allow quantitative examination and evaluation of the most complex hypotheses. Furthermore, by
taking advantage of the enormous amount of computational resources available on modern computers
scientists are able to suggest scenarios and results that are more significant than ever. This book brings
together recent work describing novel and advanced modelling and analysis techniques applied to many
different research areas.
How to reference
In order to correctly reference this scholarly work, feel free to copy and paste the following:
Jawad H AlKhateeb, Jianmin Jiang, Jinchang Ren and Stan Ipson (2009). Interactive Knowledge Discovery for
Baseline Estimation and Word Segmentation in Handwritten Arabic Text, Recent Advances in Technologies,
Maurizio A Strangio (Ed.), ISBN: 978-953-307-017-9, InTech, Available from:
http://www.intechopen.com/books/recent-advances-in-technologies/interactive-knowledge-discovery-for-
baseline-estimation-and-word-segmentation-in-handwritten-arabic-