Recognition of Handwritten Script
1. INTRODUCTION
Recognition of handwritten text is a procedure in which input is taken from manual data sources such as paper documents, photographs, or images captured with touch screens and other devices. It is the formal process of converting printed or scanned material into text or word files that are stored, after which the automated data is easy to maintain. This paper addresses handwritten script recognition. A scanned image of a handwritten paragraph is segmented into lines, the segmented lines are processed to separate the words in each line, and isolated characters are finally extracted from each word. After segmentation, these characters are recognized and the output is displayed as computerized text.
In any automated system it is easy to search, add, or edit records, and in the long run all the data is saved and stored for a lifetime. Manual-format data, by contrast, can be destroyed by human mistakes, natural accidents, or mishaps such as fire. Similarly, copies of manual or paper-form data degrade over time, but digital data and its thousands of copies do not degrade over a lifetime, and digital data can be reused as often as needed without loss. Computer technology also requires less storage space for records than a hard-copy file storage system.
For example, a police criminal record stored in paper files could run to a million pages, and a detective searching it for anything would need a long time. If the same record is available in automated form, any entry can be found very quickly. The problem is that making such a record digitally available means typing the entire record, which is time-consuming and introduces the chance of mistakes while entering the data. Automatic recognition of handwritten text addresses this by converting scanned text into machine-readable text that is useful for further text-processing applications.
The main applications of handwriting recognition are:
- Signature Verification recognizes the signature of the writer.
- Postal Address Interpretation includes the recognition of addresses, zip codes, etc.
- Bank Cheque Processing involves the recognition of the amount written on a bank cheque.
- Writer Recognition interprets the writing and then identifies the writer.
- Proper interpretation of data filled in manually on any kind of form or application.
1 Aroosh Zahra is an undergraduate student in the Department of Software Engineering, Fatima Jinnah Women University, The Mall, Rawalpindi, Pakistan.
2 Memoona Khanam is a Professor in the Department of Software Engineering, Fatima Jinnah Women University, The Mall, Rawalpindi, Pakistan.
3 Asim Munir is an Assistant Professor in the Department of Computer Science, Islamic International University, Islamabad, Pakistan.
4 Malik Sikandar Hayat Khiyal is Professor and Chairman of the Department of Computer Science, Fatima Jinnah Women University, The Mall, Rawalpindi, Pakistan.
1.1. Limitation
It is not a new technology. Much research has already been done in this area, but the ultimate goal of a handwritten character recognition system with 100% accuracy has not yet been achieved. One reason is that even people are often unable to recognize every handwritten text without any doubt, and many people cannot reliably read even their own writing. It is therefore very important that the writer has written clearly.
JOURNAL OF COMPUTING, VOLUME 3, ISSUE 12, DECEMBER 2011, ISSN 2151-9617 https://sites.google.com/site/journalofcomputing WWW.JOURNALOFCOMPUTING.ORG
2. LITERATURE REVIEW
Bandaru [1] proposed a system for identifying handwritten characters in which a multilayered neural network is implemented. The user inputs one character or pattern and the system identifies it. He designed a graphical user interface in which the user can train or identify only one character at a time. In the proposed system, a multilayered neural network with two hidden layers is used to train on the characters, and output is produced after one hundred epochs for each character.
Fabrizio et al. [2] describe a process for extracting text and apply it to street-level images taken in a city. It uses morphological operations for feature extraction and segmentation, while for classification and recognition it uses a combination of SVM (support vector machine) classifiers. The system is efficient but still shows some flaws in classification accuracy and in the proper selection of text.
Devireddy et al. [3] present a system for recognizing handwritten characters entered with a mouse. The system first trains on the input data and then classifies it to recognize the character or pattern using the back-propagation network algorithm. It does not recognize every input pattern, but if input is fed into the system continuously, its learning ability lets it gradually recognize the letter.
Leary [4] describes the preprocessing of handwritten text. First, line segmentation is done by assuming that lines of text are horizontal: a histogram of black pixels in the x direction is generated, and its minima are taken as cut positions. Skew correction then aligns the segmented lines with the x-axis. The lower baseline and its angle to the horizontal axis are estimated by fitting the baseline with least-squares linear regression. The arctangent of the fitted slope gives the skew angle, and rotating the image by that angle removes the skew. Slant correction is also handled, to keep the writer's text upright: an affine transformation, which preserves collinearity and ratios of distances, is used to remove the slant. Afterwards, baseline positioning is done by calculating the gradient and analyzing the slope, which finds the boundaries of the line. Word segmentation is then done using vertical projections and k-means.
Rehman et al. [5] compare implicit and explicit segmentation methods for offline cursive handwriting. All processing is the same for both techniques except the actual segmentation algorithm. The results show that recognition based on explicit segmentation is more efficient than recognition based on implicit segmentation.
Rehman et al. [6] propose a very simple and fast approach to character segmentation of unconstrained handwritten words. The segmentation algorithm over-segments in a few cases because of the inherent nature of cursive writing. To boost its effectiveness, an artificial neural network is trained on a large number of segmentation points from cursive words and efficiently identifies incorrectly segmented points. The benchmark IAM database is used for testing. The authors first locate candidate segmentation points as the columns whose pixel sum is only 0 or 1. The segmentation algorithm is then integrated with a neural network trained with the back-propagation algorithm. Because over-segmentation is minimal, the neural network carries little load and speed is optimal.
Ganapathy et al. [7] improve character recognition accuracy to as much as 85%. They first use a multiscale neural network to train on characters present in high-resolution images, and then use thresholding to increase the accuracy of the system.
Som et al. [8] use a neural network to recognize the characters of handwritten text and then apply a Euclidean distance metric to all mismatched characters, which increases recognition accuracy.
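The projection-based line segmentation and least-squares skew estimate described for Leary [4] can be sketched in a few lines of Python. This is a minimal illustration, not code from [4]. The image is assumed to be a list of rows of 0/1 values with ink = 1. Rows whose black-pixel count is at or below a hypothetical `threshold` are treated as cut positions, which simplifies "minima of the histogram". The lowest ink pixel in each column serves as a crude baseline sample. Rotating the image by the returned angle is left out.

```python
import math


def horizontal_profile(binary):
    """Black-pixel count of each row (the histogram taken along the x direction)."""
    return [sum(row) for row in binary]


def segment_lines(binary, threshold=0):
    """Return (first_row, end_row) pairs, end exclusive, for each text line.

    Rows whose count is <= threshold are treated as cut positions.
    """
    lines, start = [], None
    for y, count in enumerate(horizontal_profile(binary)):
        if count > threshold and start is None:
            start = y                      # a text line begins
        elif count <= threshold and start is not None:
            lines.append((start, y))       # the text line ends at a cut row
            start = None
    if start is not None:                  # last line touches the bottom edge
        lines.append((start, len(binary)))
    return lines


def lower_baseline_points(binary):
    """(x, y) of the lowest ink pixel in every column that contains ink."""
    points = []
    for x in range(len(binary[0])):
        ys = [y for y in range(len(binary)) if binary[y][x]]
        if ys:
            points.append((x, max(ys)))
    return points


def skew_angle_degrees(points):
    """Least-squares fit y = m*x + c through the points; return atan(m) in degrees.

    Image rows grow downwards, so a positive angle means the line
    descends to the right. Needs at least two distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return math.degrees(math.atan(sxy / sxx))


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
        [1, 0, 1, 1],
    ]
    print(segment_lines(page))  # [(1, 3), (4, 5)]
```

In practice a small non-zero threshold tolerates stray pixels between lines. A per-line skew angle would be computed by passing each segmented line's baseline samples to `skew_angle_degrees`.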
3. PROPOSED FRAMEWORK
Figure 1 shows the proposed framework of the system:
Figure 1: Proposed Framework
4. PROPOSED TECHNIQUE
4.1 Preprocessing
Preprocessing of the image is a very important step for all further processing. The system accepts a scanned image in any format. This RGB image, shown in Figure 2, is first converted into a grayscale image. A threshold level is then calculated for the grayscale image and used to convert it into a binary image. A median filter is then applied to the binary image. After that, the negative of the image is taken, and dilation and thinning are applied.
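The first steps of this preprocessing chain can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation. The input is assumed to be a list of rows of (r, g, b) tuples in the range 0-255, and the grayscale weights are the common ITU-R BT.601 luma coefficients. The unspecified threshold "level" is assumed to be Otsu's threshold. Dilation and thinning are omitted to keep the example short.

```python
def to_grayscale(rgb):
    """Convert rows of (r, g, b) tuples (0-255) to integer gray levels (BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in rgb]


def otsu_level(gray):
    """Return the gray level t maximizing between-class variance (pixels <= t are dark)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]                  # pixels at or below t
        sum0 += t * hist[t]
        w1 = total - w0                # pixels above t
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def median_filter3(binary):
    """3x3 median filter; the window is clipped at the image border.

    With an even window size the upper median is taken, so ties become 1 (paper).
    """
    h, w = len(binary), len(binary[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = sorted(
                binary[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            )
            row.append(window[len(window) // 2])
        out.append(row)
    return out


def preprocess(rgb):
    """RGB -> gray -> binary (paper = 1) -> median filter -> negative (ink = 1)."""
    gray = to_grayscale(rgb)
    level = otsu_level(gray)
    binary = [[1 if v > level else 0 for v in row] for row in gray]
    smoothed = median_filter3(binary)
    return [[1 - v for v in row] for row in smoothed]
```

The median filter removes isolated specks before the image is inverted. After inversion, the result follows the "ink = 1" convention that later projection and labelling steps can sum directly.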
As with character segmentation, word segmentation is not simple. Gaps between words are generally expected to be larger than gaps between characters within a word. A slightly less robust but significantly faster approach is implemented. Using a vertical projection histogram of the line, as shown in Figure 4, minima below a certain threshold near zero are located. This generally separates all words, but the line is highly over-segmented. Because the false positives are almost always narrow, k-means clustering with k = 2 is performed on the data to separate significant divisions from insignificant ones (see Figure 5).
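The word segmentation step above can be sketched as follows. This is a hypothetical implementation, not the authors' code. A text-line image is assumed to be rows of 0/1 values with ink = 1, and columns with an ink count at or below `threshold` are treated as gap columns. The clustered "data" is assumed to be the widths of the candidate gaps. One-dimensional k-means with k = 2 is initialized at the smallest and largest widths, and gaps in the wider cluster are taken as word boundaries. Gaps touching the left or right margin are ignored. If every gap has the same width, the sketch arbitrarily treats all of them as word gaps.

```python
def vertical_profile(line):
    """Ink count of every column of a text-line image (rows of 0/1, ink = 1)."""
    return [sum(row[x] for row in line) for x in range(len(line[0]))]


def find_gaps(profile, threshold=0):
    """Interior runs of columns with count <= threshold, as (start, end) end exclusive."""
    gaps, start = [], None
    for x, count in enumerate(profile):
        if count <= threshold and start is None:
            start = x
        elif count > threshold and start is not None:
            gaps.append((start, x))
            start = None
    # A run still open at the right edge is never appended; drop the left margin too.
    return [(s, e) for s, e in gaps if s > 0]


def kmeans_two(values, iterations=100):
    """1-D k-means with k = 2; label 1 marks the cluster with the larger centroid."""
    c0, c1 = float(min(values)), float(max(values))
    if c0 == c1:
        return [1] * len(values)  # arbitrary: treat equal-width gaps as word gaps
    labels = []
    for _ in range(iterations):
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        new_c0, new_c1 = sum(g0) / len(g0), sum(g1) / len(g1)
        if (new_c0, new_c1) == (c0, c1):
            break                   # centroids stopped moving
        c0, c1 = new_c0, new_c1
    return labels


def segment_words(line, threshold=0):
    """Return (first_col, end_col) spans, end exclusive, for each detected word."""
    profile = vertical_profile(line)
    ink = [x for x, count in enumerate(profile) if count > threshold]
    if not ink:
        return []
    left, right = ink[0], ink[-1] + 1
    gaps = find_gaps(profile, threshold)
    if not gaps:
        return [(left, right)]
    labels = kmeans_two([e - s for s, e in gaps])
    words, start = [], left
    for (s, e), lab in zip(gaps, labels):
        if lab == 1:               # wide gap: a word boundary
            words.append((start, s))
            start = e
    words.append((start, right))
    return words
```

For a one-row line `[1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1]` the gap widths are 1, 3, and 1. Only the width-3 gap falls in the wide cluster, so the line splits into the spans `(0, 5)` and `(8, 11)`.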
Figure 2: Original Scanned Image
Figure 3: Projection of Image
Figure 4: Vertical Projection of a Line of Image
Figure 5: Word Detection in a Line of Image
Note: All processing, in segmentation as well as in recognition, is applied to negative images. The displayed images are inverted back for display using the not command.
Using their boundaries, the segmented characters are scaled properly. Using area, components with an area of less than 50 pixels are considered junk. This is done to ignore all dots and commas.
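The junk-removal rule above can be sketched as connected-component labelling followed by an area filter. Each surviving component is then cropped to its bounding box and can be rescaled. The following assumptions are not taken from the paper: 8-connectivity, area measured in pixels, and nearest-neighbour scaling. The output size passed to `resize_nearest` is a hypothetical choice left to the caller.

```python
from collections import deque


def connected_components(binary):
    """Pixel lists of the 8-connected ink regions of a 0/1 image (ink = 1)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:                      # breadth-first flood fill
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def extract_characters(binary, min_area=50):
    """Crop every component with area >= min_area; smaller ones (dots, commas) are junk."""
    characters = []
    for pixels in connected_components(binary):
        if len(pixels) < min_area:
            continue
        top = min(y for y, _ in pixels)
        left = min(x for _, x in pixels)
        bottom = max(y for y, _ in pixels) + 1
        right = max(x for _, x in pixels) + 1
        crop = [[0] * (right - left) for _ in range(bottom - top)]
        for y, x in pixels:                   # copy only this component's pixels
            crop[y - top][x - left] = 1
        characters.append(((top, left, bottom, right), crop))
    return characters


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour scaling of a 0/1 image to out_h x out_w."""
    h, w = len(image), len(image[0])
    return [[image[y * h // out_h][x * w // out_w] for x in range(out_w)] for y in range(out_h)]
```

The crop copies only the pixels of the component itself, so a neighbouring stroke that intrudes into the bounding box does not leak into the character image.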
Figure 6: Final Segmented Characters
Sample 1 has 56 words, all of which are detected properly. Sample 2 has 61 words, of which 60 are detected, and sample 3 has 61 words, all properly detected. Sample 4 has 61 words, of which 60 are properly detected, and sample 5 has 60 words, all detected properly. The last test sample, sample 6, has 58 words, of which 52 are properly detected. Figure 8 is the accuracy graph, which shows that an accuracy of 97.7% is achieved for word segmentation within the segmented lines of the scanned image.
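The reported 97.7% can be checked against the per-sample counts. The short calculation below treats sample 6's 58 items as words, consistent with this being a word-level figure. The text does not say whether the accuracy is pooled over all words or averaged over samples, so both values are computed. Both come out close to the reported 97.7%.

```python
# (words in sample, words detected correctly) for samples 1-6, from the text above.
samples = [(56, 56), (61, 60), (61, 61), (61, 60), (60, 60), (58, 52)]

total = sum(words for words, _ in samples)
detected = sum(ok for _, ok in samples)
pooled = 100 * detected / total                          # all words counted together
per_sample = [100 * ok / words for words, ok in samples]
mean_of_samples = sum(per_sample) / len(per_sample)      # every sample weighted equally

print(f"{detected}/{total} words, pooled {pooled:.2f}%, mean per sample {mean_of_samples:.2f}%")
# prints: 349/357 words, pooled 97.76%, mean per sample 97.73%
```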
Figure 8: Accuracy Graph of Word Detection
5. EXPERIMENTAL RESULTS
The accuracy results are shown in the following graphs:
Figure 7: Accuracy Graph of Line Detection
Figure 9: Accuracy Graph of Character Detection
Figure 10 is the accuracy graph, which shows that an accuracy of 43% is achieved for recognition of the properly segmented characters.
Figure 10: Accuracy Graph of Character Recognition
Segmentation of lines is done using horizontal projections, and segmentation of words is done through vertical projections. The accuracy of line segmentation is 100%, while word segmentation achieves an accuracy of 97.7%. Character segmentation is strongly affected by the pen tip used for writing, so its accuracy is 74.38%, but it has the advantage of working with any writing style. Recognition of properly segmented characters achieves an accuracy of 43%. To recognize characters with the highest accuracy in a reasonable amount of time, the robust approach is to use template correlation as the main recognition method.
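Template correlation scores a segmented character against stored templates of the same size and picks the best match. The minimal sketch below assumes that characters and templates have already been scaled to a common size. It uses the 2-D Pearson correlation coefficient, the same quantity MATLAB's `corr2` returns, as the score. The 3x3 template set and its labels are hypothetical and only illustrate the mechanics.

```python
import math


def corr2(a, b):
    """2-D correlation coefficient of two equal-sized images (0.0 if either is constant)."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    if len(fa) != len(fb):
        raise ValueError("images must have the same size")
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - ma) ** 2 for x in fa) * sum((y - mb) ** 2 for y in fb))
    return num / den if den else 0.0


def recognize(character, templates):
    """Return (label, score) of the template that correlates best with the character."""
    label = max(templates, key=lambda name: corr2(character, templates[name]))
    return label, corr2(character, templates[label])


# Hypothetical 3x3 templates, for illustration only.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "-": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}

if __name__ == "__main__":
    noisy_i = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
    print(recognize(noisy_i, TEMPLATES)[0])  # I
```

Because the coefficient subtracts each image's mean and normalizes by its spread, the score is insensitive to overall stroke density. This is one reason correlation against templates tolerates small noise such as the extra pixel in `noisy_i`.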
6. CONCLUSION
The most important phases in the recognition of handwritten script are the segmentation of lines, words, and characters. Segmentation down to the word level does not create much difficulty, as it is easy to detect the space between two lines and between two words. For character detection, however, finding the space between two characters is undoubtedly a difficult task. Proper recognition of characters depends on correctly isolated characters.
7. REFERENCES
[1] Sunith Bandaru, Handwritten Character Recognition using Neural Networks, Department of Mechanical Engineering, Indian Institute of Technology Kanpur, India, 2010.
[2] J. Fabrizio, M. Cord and B. Marcotegui, Text Extraction From Street Level Images, IAPRS, Vol. XXXVIII, Part 3/W4, September 2009.
[3] Srinivasa Kumar Devireddy and Settipalli Appa Rao, Hand Written Character Recognition Using Back Propagation Network, Journal of Theoretical and Applied Information Technology, March 2009.
[4] Ryan E. Leary, Unrestricted Off-Line Handwriting Recognition: A Preprocessing Approach, Rensselaer Polytechnic Institute, December 15, 2009.
[5] Amjad Rehman, Zulkifli Mohamad and Ghazali Sulong, Implicit Vs Explicit based Script Segmentation and Recognition: A Performance Comparison on Benchmark Database, International Journal of Open Problems in Computer Science and Mathematics (IJOPCM), Vol. 2, No. 3, pp. 352-364, September 2009.
[6] Amjad Rehman Khan and Zulkifli Mohammad, A Simple Segmentation Approach for Unconstrained Cursive Handwritten Words in Conjunction with the Neural Network, International Journal of Image Processing, Vol. 2, Issue 3, June 2008.
[7] Velappa Ganapathy and Kok Leong Liew, Handwritten Character Recognition Using Multiscale Neural Network Training Technique, World Academy of Science, Engineering and Technology 39, 2008.
[8] Tanmoy Som and Sumit Saha, Handwritten Character Recognition by Using Neural Network and Euclidean Distance Metric, Department of Mathematics, Assam University, Silchar, India, 2008.
BIOGRAPHIES
Aroosh Zahra is an undergraduate student in the Department of Software Engineering, Fatima Jinnah Women University, The Mall, Rawalpindi, Pakistan.
Memoona Khanam is a Professor in the Department of Software Engineering, Fatima Jinnah Women University, The Mall, Rawalpindi, Pakistan. Her qualifications are MSCS and M.Ed., she is now pursuing a PhD, and her area of interest is artificial intelligence.
Asim Munir is an Assistant Professor in the Department of Computer Science, Islamic International University, Islamabad. His qualifications are M.Sc. (Computer Science), M.S. (Computer Science) and Ph.D. (pursuing).
Dr. M. Sikandar Hayat Khiyal is Chairman of the Department of Computer Sciences and Software Engineering, Fatima Jinnah Women University, Pakistan. He served in the Pakistan Atomic Energy Commission (PAEC) for 25 years and was involved in different research and development programs of the PAEC. He developed software for underground flow and advanced fluid dynamics techniques. He was also involved in teaching at the Computer Training Centre, PAEC, and at International Islamic University. His areas of interest are Numerical Analysis of Algorithms, Theory of Automata and Theory of Computation. He has more than one hundred research publications in national and international journals and conference proceedings. He has supervised three PhD students and more than one hundred and thirty research projects at the graduate and postgraduate levels. He is a member of SIAM, ACM, the Informing Science Institute and IACSIT. He is an associate editor of IJCTE and co-editor of the journals JATIT and International Journal of Reviews in Computing. He is a reviewer for the journals IJCSIT, JIISIT, IJCEE and CEE of Elsevier.