Prof Amol Bhilare, Varad Kulkarni, Prem Chavhan, Adesh Ramgude, Mayur Jadhav
The idea of OCR technology has been around for a long time and even predates electronic computers. The original OCR design was proposed by Paul W. Handel in 1931. He applied for a patent for a device “in which successive comparisons are made between a character and a character image.” A photo-electric apparatus would be used to respond to a coincidence of a character and an image. This means you would shine a light through a filter and, if the light matches up with the correct character of the filter, enough light will come back through the filter and

Illustration of 2-D reduction to 1-D by a slit. (a) An input numeral “4” and a slit scanned from left to right. (b) Black area projected onto the x-axis, the scanning direction of the slit.
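The slit reduction in the illustration can be sketched in a few lines of Python. This is only an illustrative sketch, not the historical apparatus or the authors' code: the function name `slit_projection` and the tiny 5x5 bitmap of the numeral “4” are assumptions made for the example. The image is taken to be binary, with 1 for black.

```python
def slit_projection(image):
    """Reduce a 2-D binary image to 1-D by summing the black pixels
    seen through a vertical slit at each x position (column)."""
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


# Hypothetical 5x5 bitmap of the numeral "4" (1 = black, 0 = white).
FOUR = [
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
]

profile = slit_projection(FOUR)
print(profile)  # [3, 1, 1, 5, 1]
```

The resulting 1-D profile peaks at the column holding the vertical stroke of the “4”, which is the kind of signature a slit-based reader would compare against stored profiles.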
2. Peephole Method:-
This is the simplest logical template matching method. Pixels from different zones of the binarized character are matched to template characters. An example would be the letter A, where a pixel would be selected from the white hole in the center, the black section of the stem, and then some others outside of the letter. Each template character would have its own mapping of these zones that could be matched with the character that needs to be recognized. The peephole method was first executed with a program called Electronic Reading Automation in 1957.

For local binarization we choose:
1) Niblack Method
2) Adaptive Method
3) Sauvola Method
4) Bernsen Method

In global binarization methods, the fixed thresholding binarization method uses a fixed threshold value to assign 0’s and 1’s to all pixel positions in a given image. It is given as follows:

Otsu’s thresholding method is used for automatic binarization level decision, based on the shape of the histogram. It is given by
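As a minimal sketch of the two global methods (not the authors' implementation), the code below binarizes a grayscale image with a fixed threshold and picks an Otsu threshold by maximizing the between-class variance w0 * w1 * (mu0 - mu1)^2 over all candidate gray levels. The function names, the list-of-rows image layout, the 0 to 255 gray range, and the convention that pixels at or above the threshold map to 1 are all assumptions made for illustration.

```python
def fixed_threshold(pixels, t):
    """Assign 1 to pixels with gray level >= t and 0 otherwise (assumed convention)."""
    return [[1 if p >= t else 0 for p in row] for row in pixels]


def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes the between-class variance
    of the gray-level histogram (class 0: levels < t, class 1: levels >= t)."""
    hist = [0] * levels
    for row in pixels:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of class 0
    sum0 = 0  # sum of gray levels in class 0
    for t in range(1, levels):
        w0 += hist[t - 1]
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Hypothetical 2x3 image with dark (10) and bright (200) pixels.
image = [[10, 10, 200], [200, 10, 200]]
t = otsu_threshold(image)
binary = fixed_threshold(image, t)
```

Using pixel counts instead of normalized probabilities changes the variance only by a constant factor, so the selected threshold is the same.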
OCR results are mainly attributed to the OCR recognizer software, but other factors can have a considerable impact on the results. The simplest of these factors can be the scanning technique and parameters. The table below summarizes the factors which affect the performance of OCR.

S.No   Factors affecting the performance of OCR
1      Quality of original source

VI.COMPOSITION OF CHARACTERS AND WORDS FOR WRITTEN WORDS
A horizontal line is drawn on top of all characters of a word; it is referred to as the header line or shirorekha. It is convenient to visualize a Devanagari word in terms of three strips: a core strip, a top strip, and a bottom strip. The core and top strips are separated by the header line. The following figure shows the image of a word that contains five characters, two lower modifiers, and a top modifier. The three strips and the header line have been marked.
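One common way to locate the header line is a horizontal projection profile: the row containing the most ink is taken as the shirorekha. The sketch below assumes this heuristic and a binary word image given as a list of rows with 1 for ink; the paper does not state that its system works this way. The function names and the toy bitmap are hypothetical, and separating the bottom strip (lower modifiers) from the core strip is not attempted here.

```python
def find_header_line(word):
    """Return the index of the row with the most ink pixels,
    taken here as the header line (assumed heuristic)."""
    row_sums = [sum(row) for row in word]
    return max(range(len(word)), key=lambda r: row_sums[r])


def split_at_header(word):
    """Split a word image into the top strip (above the header line)
    and everything below it (core strip plus any bottom strip)."""
    h = find_header_line(word)
    return word[:h], h, word[h + 1:]


# Hypothetical 5x6 word bitmap: row 2 is the header line.
WORD = [
    [0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
]

top, header_row, below = split_at_header(WORD)
```

On real scans the header line is usually several pixels thick, so a practical version would group adjacent high-ink rows rather than pick a single row.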
VII.CONCLUSION
We tested our Devanagari OCR system on various images taken from the internet as well as from other sources. An accuracy of approximately 93% at the character level is obtained. The input and output images are given below.

VIII.REFERENCES
1. Sean O’Brien, Dhia Ben Haddej, “Optical Character Recognition”.
2. Veena Bansal, R. M. K. Sinha, “A Complete OCR for Printed Hindi Text in Devanagari Script”.
3. Zineb Hadjadj, Abdelkrim Meziane, Yazid Cherfa, Mohamed Cheriet and Insaf Setitra, “ISauvola: Improved Sauvola’s Algorithm for Document Image Binarization”.
4. Google, “Tesseract OSCON”.
Input Image
Output Image
The image output after whole processing is given in output