part3

Published by Rohit Gupta

Published by: Rohit Gupta on Dec 09, 2012
Copyright:Attribution Non-commercial

1. Introduction
Optical Character Recognition (OCR) is an important and practical technology in the computer age. More people than ever before are using personal computers, laptops, tablets, and e-readers to read documents and books. This means that old print media must be scanned and converted to a digital format in order to be accessed from these devices. OCR programs are used to read scanned images and convert them into a digital character-based format.

This project provides a general survey and basic implementation of Optical Character Recognition. First, the history and current state of OCR technology are examined. Then, there is an overview of the methods employed by OCR programs and the classification algorithms therein. Finally, there is a basic MATLAB implementation of an OCR program that takes text in an image and converts it to plain text.
2. History
Since the late 1920s, many engineers have attempted to develop OCR systems; however, it was not until the 1950s that the first commercial OCR system became available. This was because the technology was not needed in many places, and it was too expensive to implement because the algorithms of the era were immature. David H. Shepard, however, invented the technology and developed the first machine to convert printed text into machine language for computer processing. The machine, called “Gismo,” was issued U.S. Patent number 2,663,758. Based on the technology, he also founded Intelligent Machines Research Corporation (IMR) and became the first to sell commercial OCR machines. The IMR scanner was more practical because it scanned differently from Gismo: while Gismo could scan printed text only when it was reasonably close and vertically aligned, IMR scanners were able to scan any characters in the scanned field.
3. Applications
Although the market for OCR isn't large, many developers are taking advantage of its basic principles. Currently, the largest uses of OCR among consumers are in Adobe Acrobat and Microsoft OneNote. These allow for easy character recognition and conversion to digital form. The software incorporates simple drop-down menus and scans into digital format quickly. Supported material includes business cards, books, and whiteboards, and receipts can even be exported to spreadsheets. The software even goes one step further and does text-to-speech for visually impaired users who are using OCR as a second set of eyes.

Figure no. 1: An Industrial OCR station manufactured by Vision Group Inc.
3.1 CAPTCHA
 
CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) uses a form of anti-OCR. Its main purpose is to tell whether an input comes from a user or from a script. It prompts for a randomized code to be re-input into a text field. The code is distorted or disguised to fool OCR into generating false positives, ultimately limiting access. This is to limit spam from scripts and other computer-generated access to a site.
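The prompt-and-check step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ALPHABET`, `new_challenge`, and `verify` are hypothetical, the code length of six is arbitrary, and the image-distortion step that actually defeats OCR is left out.

```python
import hmac
import secrets
import string

# Assumed alphabet: uppercase letters and digits, minus look-alike characters.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "O0I1"
)


def new_challenge(length=6):
    """Return a random code; a real system would render it as a distorted image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def verify(expected, response):
    """Accept the response only if it matches the code, ignoring case and padding."""
    return hmac.compare_digest(
        expected.upper().encode(), response.strip().upper().encode()
    )
```

A site would store the result of `new_challenge()` on the server, show only the distorted image to the visitor, and grant access when `verify` returns `True` for the text typed into the field.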
 
Figure no. 2: A typical CAPTCHA
 
A service owned by Google, Inc., known as reCaptcha, uses humans to help digitize old books. This software takes words not easily recognized by OCR software and pairs them with words that it knows the answer to. Many OCR systems throw flags for words that are ambiguous or for which there is no answer. If such a flag is thrown, the software saves the word into a database and uses it in conjunction with generated known words in a Captcha field. If the known word is answered correctly in the text field, the program assumes that the answer given for the other word is also correct. This process is repeated until the certainty for the unknown word increases. This is an innovative method, and it also improves OCR recognition by comparing questionable positives with human input.

Currently, Captcha is displayed over 100 million times a day, with the most popular uses being on Facebook, Twitter, TicketMaster, and online forums.

[...] document in very little time. These features are useful when scanning textbooks, as search functions become available. One may also export novels to text for easier storage after scanning.

Another application of OCR technology is at the post office. Addresses and zip codes are often handwritten on the envelope. Optical Character Recognition allows the post office to automatically read the address of a piece of mail and sort it to the appropriate bin for delivery.
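The reCaptcha word-pairing scheme described in this section can be sketched as follows. This is a minimal illustration and not the service's actual implementation: the class `WordPairChallenge`, its `submit` method, and the `VOTES_NEEDED` threshold are all hypothetical, and a simple vote count stands in for whatever certainty measure the real system uses.

```python
from collections import Counter, defaultdict

# Assumed threshold: agreeing answers required before an unknown word is accepted.
VOTES_NEEDED = 3


class WordPairChallenge:
    """Pairs a known control word with a word the OCR engine flagged as unknown."""

    def __init__(self):
        self.votes = defaultdict(Counter)  # unknown word id -> answer counts
        self.resolved = {}                 # unknown word id -> accepted text

    def submit(self, known_answer, known_response, unknown_id, unknown_response):
        """Return True if the user passes; record a vote for the unknown word."""
        if known_response.strip().lower() != known_answer.lower():
            return False  # control word wrong: reject and discard the guess
        guess = unknown_response.strip().lower()
        self.votes[unknown_id][guess] += 1
        answer, count = self.votes[unknown_id].most_common(1)[0]
        if count >= VOTES_NEEDED:
            self.resolved[unknown_id] = answer
        return True
```

Each passing submission adds one vote for the flagged word, so repeated agreement between independent users is what raises the certainty; a word moves into `resolved` only once enough of them have typed the same text.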
