Published by Rohit Gupta


Categories:Types, Research, Science
Published by: Rohit Gupta on Dec 09, 2012
Copyright:Attribution Non-commercial







Optical Character Recognition is an important and practical technology in the computer age. More people than ever before are using personal computers, laptops, tablets, and e-readers to read documents and books. This means that old print media must be scanned and converted to a digital format in order to be accessed from these devices. Optical Character Recognition (OCR) programs are used to read scanned images and convert them into a digital character-based format.

This project provides a general survey and basic implementation of Optical Character Recognition. First, the history and current state of OCR technology are examined. Then, there is an overview of the methods employed by OCR programs and the classification algorithms therein. Finally, there is a basic MATLAB implementation of an OCR program that takes text in an image and converts it to plain text.
2. History
Since the late 1920s, many engineers have attempted to develop OCR systems; however, it was not until the 1950s that the first commercial OCR system became available. This was because the technology was not needed in many places, and because the immature algorithms of the era made OCR too expensive to implement.

David H. Shepard developed the first machine to convert printed text into machine language for computer processing. The machine was called “Gismo,” and he was issued U.S. Patent number 2,663,758 for it. Based on the technology, he also founded Intelligent Machines Research Corporation (IMR) and became the first to sell commercial OCR machines. The IMR scanner was more practical than Gismo because of the way it scanned: Gismo could read printed text only when it was reasonably close and vertically aligned, whereas IMR scanners could read any characters within the scanned field.
3. Applications
Although the market for OCR isn't large, many developers are taking advantage of its basic principles. Currently, the largest use of OCR among consumers is through Adobe Acrobat and Microsoft OneNote. This software allows for easy character recognition and conversion to digital form. It incorporates simple drop-down menus and converts scans to digital format quickly. It handles business cards, books, and whiteboards, and can even export receipts to spreadsheets. The software goes one step further and offers text to speech for visually impaired users, who use the OCR as a second set of eyes.

Figure no. 1:
An Industrial OCR station manufactured by Vision Group
CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) uses a form of anti-OCR. Its main purpose is to tell whether an input comes from a user or from a script. It prompts the user with a randomized code to be re-input into a text field. The code is distorted or disguised to fool OCR software into misreading it, which ultimately limits access. This limits spam from scripts and other computer-generated access to a site.
Figure no. 2:
A typical CAPTCHA
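The challenge-and-response part of a CAPTCHA can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular CAPTCHA service works: the names `ALPHABET`, `new_challenge`, and `verify` are hypothetical, and the step that renders the code as a distorted image is left out entirely.

```python
import secrets
import string

# Hypothetical alphabet: uppercase letters and digits, minus easily confused ones.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "O0I1"
)


def new_challenge(length=6):
    """Return a randomized code. A real CAPTCHA would show it as a distorted image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def verify(expected, typed):
    """Accept the response if it matches the code, ignoring case and outer spaces."""
    return typed.strip().upper() == expected.upper()
```

A site would store the code from `new_challenge()` on the server, show only the distorted image to the visitor, and grant access when `verify` returns `True` for the text typed into the field.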
Consumer OCR software such as Acrobat and OneNote can scan a document in very little time. These features are useful when scanning textbooks, as search functions become available. One may also export novels to text for easier storage after scanning.

Another application of OCR technology is at the post office. Addresses and zip codes are often handwritten on the envelope. Optical Character Recognition allows the post office to automatically read the address of a piece of mail and sort it to the appropriate bin for delivery.

A service owned by Google, Inc., known as reCaptcha, uses humans to help digitize old books. This software takes words not easily recognized by OCR software and pairs each with a word that it knows the answer to. Many OCR systems throw flags for words that are ambiguous or for which there is no answer. If such a flag is thrown, reCaptcha saves the word into a database and uses it together with a generated known word in a Captcha field. If the correct answer for the known word is input into the text field, the program assumes the answer given for the other word is also correct. This process is repeated until the certainty for the unknown word increases. This is an innovative method, and it also improves OCR algorithm recognition by comparing questionable positives with human input.
Currently, Captcha is displayed over 100 million times a day, with the most popular uses being on Facebook, Twitter, TicketMaster, and online forums.
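The word-pairing scheme described for reCaptcha can be sketched roughly as follows. This is a minimal illustration, not the service's actual implementation: the class `WordPairPool`, its method names, and the thresholds `min_votes` and `min_agreement` are all assumptions made for the example. A transcription of the unknown word is counted only when the known word in the same challenge was typed correctly, and the unknown word is accepted once enough answers agree.

```python
from collections import Counter, defaultdict


class WordPairPool:
    """Collects human transcriptions of words that OCR flagged as unknown."""

    def __init__(self, min_votes=3, min_agreement=0.75):
        # Hypothetical thresholds for when an unknown word counts as certain.
        self.min_votes = min_votes
        self.min_agreement = min_agreement
        self.votes = defaultdict(Counter)  # unknown word id -> transcription counts

    def submit(self, known_answer, known_typed, unknown_id, unknown_typed):
        """Count the unknown-word answer only if the known word was typed correctly."""
        if known_typed.strip().lower() != known_answer.lower():
            return False  # failed the known word, so the other answer is discarded
        self.votes[unknown_id][unknown_typed.strip().lower()] += 1
        return True

    def resolved(self, unknown_id):
        """Return the agreed transcription, or None while certainty is too low."""
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        total = sum(counts.values())
        word, top = counts.most_common(1)[0]
        if total >= self.min_votes and top / total >= self.min_agreement:
            return word
        return None
```

Each accepted submission raises the certainty for the flagged word, so repeating the challenge with different visitors eventually yields a transcription the OCR program could not produce alone.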
