
MINOR PROJECT

SYNOPSIS



Title: Optical character recognition

Team Members:

Sushil Rawat (0621153108)
Shailesh Mishra (0721153108)
Sourav Singh (0761153108)
Vaibhav Goel (0831153108)





Bharati Vidyapeeth's College of Engineering
A-4 Paschim Vihar, Rohtak Road, New Delhi-110063
Affiliated to: Guru Gobind Singh Indraprastha University
(2008-12)

OCR (Optical Character Recognition):

Optical Character Recognition (OCR) uses a device that reads pencil marks and converts
them into a computer-usable form. OCR technology recognizes characters on a source
document using the optical properties of the equipment and media. OCR improves the
accuracy of data collection and reduces the time required by human workers to enter the
data.
This is what OCR does: it looks at each line of the image and attempts to determine whether
the black and white dots represent a particular letter or number. OCR was originally
developed to help sight-impaired individuals gain access to printed
information. That same technology has been updated and improved and is now used to
"read" computer files.
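The idea of deciding whether a pattern of black and white dots represents a particular letter can be illustrated with a very small template-matching sketch. This is only an illustration under stated assumptions, not the method this project commits to: the 3x3 templates, the names TEMPLATES, match_score and recognize, and the simple "count agreeing dots" score are all hypothetical.

```python
# Hypothetical 3x3 templates: "1" is a black dot, "0" is a white dot.
# Real OCR systems use far larger glyph images and more robust features.
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}


def match_score(glyph, template):
    """Count the positions where the glyph and the template have the same dot."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the template character that agrees with the glyph on the most dots."""
    return max(TEMPLATES, key=lambda ch: match_score(glyph, TEMPLATES[ch]))


clean_l = ["100", "100", "111"]
noisy_l = ["100", "100", "110"]  # one dot lost, e.g. by a poor scan
print(recognize(clean_l), recognize(noisy_l))
```

Because the score tolerates a few disagreeing dots, a slightly damaged glyph can still be matched to the closest template, which is also why heavy noise eventually leads to misreads.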


Objective/Aim:

1. Search is the entry gate to the digitized papers
2. Full-text searching
3. Searching for in-text references
4. Given the massive size of the digitized material, the
only practical way to achieve this is very good OCR

Future scope

Manually reading all of those pages is a very tedious task. Fatigue, boredom, and human
error almost ensure that a page will be missed here and there. It is a gamble
whether the pages that are missed are important or not. Given the huge amount
of time involved in the task, no one is going to pay for a second pass.

With OCR, though, this whole process is simplified and made more accurate. Once the
documents have been scanned and processed through the OCR module, there is a text
version of every page available. Now someone can launch a search for 'John Jones' and
let the computer do the searching. It will find every page of every document where that
name appears. The process may take some time, depending upon how many pages are to
be searched, but no matter how many pages there are, there is no labor cost involved: no one
has to dedicate any time to the process once it starts.

When the search is completed, it will have assembled a list of every page from
every document that contains the word or words that were used in the search. Those
pages can be selected, reviewed, or even printed. OCR is a great research tool and can
provide far better access to critical information than manual searches can.
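The search step described above can be sketched as a small inverted index over the OCR text of each page. This is a minimal sketch under assumptions: the sample pages, the document names, and the functions build_index and search are hypothetical, and the query matches pages that contain every query word, not necessarily as an exact phrase.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of (document, page) pairs containing it."""
    index = defaultdict(set)
    for (doc, page_no), text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add((doc, page_no))
    return index


def search(index, query):
    """Return the sorted (document, page) pairs that contain every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical OCR output, keyed by (document name, page number).
pages = {
    ("contract.pdf", 1): "Agreement between John Jones and ACME",
    ("contract.pdf", 2): "Signed by Mary Smith",
    ("memo.pdf", 1): "Jones, John called today",
}
index = build_index(pages)
print(search(index, "John Jones"))
```

Building the index once means each later search only looks up a few words instead of rescanning every page, which matters when the number of pages is large.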

Applications in the current Android market also create strong demand for optical character
recognition; for example, handwriting character recognition is based on the same
concept. OCR is therefore a promising and broad topic to study and explore.

Limitations
It is important to understand the limitations and capabilities of OCR. While it is a great
tool, it is not perfect. The biggest factor in the success or failure of an OCR process is the
quality of the original documents. It has been our experience that if the original
documents were clean, laser-printed pages, OCR should read 98% of the words
correctly. Some words may not be read correctly if there is handwriting over them, or if there
are stamps or other marks that partially cover the text.

If the original documents were faxes or multi-generational photocopies, or were printed
with a dot matrix printer, the success rate of OCR drops off quickly. These types of
documents may only have a 60-80% successful read rate. The same is true of even laser-
printed documents that have lots of lines and boxes. The lines and boxes confuse OCR,
because OCR tries to read the lines as part of the text. If the original documents were
handwritten, OCR will NOT read the information at all.
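The read rates quoted above are word-level figures. One way such a figure could be measured is to compare the OCR output with a manually typed reference for the same page. The sketch below is an assumption about how to do this, not the procedure behind the numbers in this synopsis: the function name word_accuracy and the sample sentences are hypothetical, and the alignment uses Python's standard difflib module.

```python
from difflib import SequenceMatcher


def word_accuracy(reference_text, ocr_text):
    """Fraction of reference words that the OCR output reproduces in order."""
    reference = reference_text.split()
    ocr = ocr_text.split()
    if not reference:
        return 0.0
    matcher = SequenceMatcher(None, reference, ocr, autojunk=False)
    # Each matching block is a run of identical words in both sequences.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


reference = "the quick brown fox jumps"
ocr_output = "the qu1ck brown fox jumps"  # one word misread
print(word_accuracy(reference, ocr_output))
```

In this hypothetical sample four of the five reference words are read correctly, so the function reports a word accuracy of 0.8.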
