Thanks Google for Hindi OCR Guidelines

Published by: Brijesh Verma on Jan 20, 2013
Copyright: Attribution Non-commercial


 
Installation guidelines for Hindi/Indic languages OCR (Windows)
Thanks to Google, Ubuntu, and all open source resources for Internet-based Gayatri and Yagya, which helped us develop Hindi/Indic-language OCR (Optical Character Recognition) for our mega Unicode conversion project of Vedic literature.
Thank you to Google, Ubuntu, and all open source resources for making Hindi/Indian-language OCR (optical character recognition) available for our mega Unicode conversion project of Vedic literature.
This will help us propagate and implement Our Will/Our Solemn Pledge for everyone to have a life like our P. Gurusatta.
This help will support us in spreading our firm resolve that everyone live a life like P. Gurusatta.
Vandaniya Mataji: "Son! Never separate me and Guruji." Then she said, "Son, in the time to come the world will look for the solutions to its problems in my songs and in Guruji's discourses." It is true, after all: how could Shiva and Shakti ever be separated? - ...-Jhanki, p. 73
Overview of Hindi/Indic/Multilingual OCR:
1. Scan the document at 300 DPI for better output, as an image or PDF file.
2. If the images are not of high quality, save/export the PDF as images (.tif, .png) into one folder for post-processing of the scanned pages.
3. Use Scan Tailor for post-processing of the scanned pages. Detailed user guidelines follow.
4. Make a PDF file from the images by creating a PDF: Files > Create PDF from multiple files > Add files.
5. Install Tesseract, one of the most accurate open source OCR engines available. Restart your computer so that the new system path is assigned for Tesseract.
6. Use gImageReader for OCR. Save the file in the same folder. Detailed user guidelines follow.
7. Check and correct spelling using any spell checker software, for example http://www.awgp.in/spellchecker/ or http://www.bhashagiri.com/ for Hindi.
8. Convert fonts. Use Hindi Lekhak: http://www.awgp.in/hindilekhak/ or download.
9. Print for manual proofreading. Detailed user guidelines follow.
10. Manually check the document for logical errors.
11. Feedback is always welcome.
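The OCR step can also be run in a batch from a script instead of through gImageReader. The following is a minimal sketch, assuming Tesseract is installed with the Hindi language data and is reachable on the system path; the helper names `build_ocr_command` and `ocr_folder` are illustrative and not part of this guide or of Tesseract.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path

# Image types produced by the export/post-processing steps (.tif, .png).
IMAGE_SUFFIXES = {".tif", ".tiff", ".png"}


def build_ocr_command(image: Path, lang: str = "hin") -> list[str]:
    """Return the Tesseract command that writes <image stem>.txt next to the image."""
    # Tesseract appends ".txt" to the output base name by itself.
    output_base = image.with_suffix("")
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_folder(folder: Path, lang: str = "hin") -> list[Path]:
    """Run Tesseract on every image in `folder` and return the expected text files."""
    if shutil.which("tesseract") is None:
        # Typical cause on Windows: no restart after installing Tesseract.
        raise RuntimeError("tesseract was not found on the system path")
    results = []
    for image in sorted(folder.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        subprocess.run(build_ocr_command(image, lang), check=True)
        results.append(image.with_suffix(".txt"))
    return results
```

Calling `ocr_folder(Path("scans"))`, where `scans` is a hypothetical folder of page images, would leave one `.txt` file beside each image, in line with the instruction to save the output in the same folder.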
NOTE:
Google's Tesseract hin.traineddata was found to work well for Chanakya/Arial Unicode MS-like fonts. Nitin's hin.traineddata was found to work well for Mitra/Mangal/KrutiDev-like fonts.
Required Installation instructions:
(Stay connected to the Internet throughout the installation process.)
1. gs905w32.exe - GPL Ghostscript. http://sourceforge.net/projects/ghostscript/
2.
3. vcredist_x86.exe - MS VC++ Redistributable Setup. http://www.microsoft.com/en-in/download/details.aspx?id=5555
4. Scan Tailor - An interactive post-processing tool for scanned pages. http://sourceforge.net/projects/scantailor/
5. tesseract-ocr-setup-3.02.02.exe. http://tesseract-ocr.googlecode.com/files/tesseract-ocr-setup-3.02.02.exe
   a. Make sure the Internet connection is on.
   b. Under Choose Components, select "Download & Install Hindi Language Data" and "Download & Install Math / Equation Detect".
   c. The installation completes successfully.
6. Restart immediately; this is important.
7. Copy the hin.traineddata file into the C:\Program Files\Tesseract-OCR\tessdata folder if it was not downloaded there. It is available from http://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.hin.tar.gz
8. You may try more Indic-language traineddata files from http://code.google.com/p/parichit/downloads/list and paste them into the above folder. Thanks to Indu and RKVS Raman (http://code.google.com/p/parichit/) for their Parichit (परिचित) project. Accuracy is low.
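The traineddata check in step 7 can be scripted. This is a minimal sketch, assuming the .tar.gz archive has already been extracted so that a plain hin.traineddata file exists somewhere on disk; `ensure_traineddata` is an illustrative helper name, and the default folder is the one named in step 7.

```python
import shutil
from pathlib import Path

# Install location named in step 7; adjust if Tesseract lives elsewhere.
TESSDATA_DIR = Path(r"C:\Program Files\Tesseract-OCR\tessdata")


def ensure_traineddata(source: Path, tessdata_dir: Path = TESSDATA_DIR) -> bool:
    """Copy `source` into `tessdata_dir` unless a file of that name already exists.

    Returns True if a copy was made and False if the file was already present.
    """
    if not tessdata_dir.is_dir():
        raise FileNotFoundError(f"tessdata folder not found: {tessdata_dir}")
    target = tessdata_dir / source.name
    if target.exists():
        return False  # the installer already downloaded it
    shutil.copy2(source, target)
    return True
```

Writing into C:\Program Files usually requires administrator rights, so the script may need to be started from an elevated prompt.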
 
Scan Tailor - An interactive post-processing tool for scanned pages.
http://sourceforge.net/projects/scantailor/
1. Download and install.
2. Put all scanned images / exported images from the PDF file into one folder.
3. Start Scan Tailor. Open a new project. Select the folder.
4. Select all files / required files. Click “Fix DPI even if …..”. Click OK.
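One possible way to produce the exported images for Scan Tailor is to let Ghostscript render each PDF page to a 300 DPI TIFF. The guide does not say which tool performs the export, so this is only a sketch under that assumption; it also assumes the 32-bit Windows console executable `gswin32c` is on the system path. The helper names and the `page-%03d.tif` naming pattern are illustrative.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_pdf_to_tiff_command(pdf: Path, out_dir: Path, dpi: int = 300,
                              gs_exe: str = "gswin32c") -> list[str]:
    """Return a Ghostscript command that renders each PDF page to a grayscale TIFF."""
    # %03d is expanded by Ghostscript to the page number (001, 002, ...).
    output_pattern = out_dir / "page-%03d.tif"
    return [
        gs_exe,
        "-dNOPAUSE", "-dBATCH", "-dSAFER",   # run non-interactively
        "-sDEVICE=tiffgray",                 # 8-bit grayscale TIFF output
        f"-r{dpi}",                          # resolution in DPI
        f"-sOutputFile={output_pattern}",
        str(pdf),
    ]


def export_pdf_pages(pdf: Path, out_dir: Path, dpi: int = 300) -> list[Path]:
    """Render `pdf` into `out_dir` and return the page images that were written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_pdf_to_tiff_command(pdf, out_dir, dpi), check=True)
    return sorted(out_dir.glob("page-*.tif"))
```

The resulting folder can then be selected as the project folder in Scan Tailor.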
