Jagruthi Institute of Engineering and Technology: Optical Character Recognition

JAGRUTHI INSTITUTE OF ENGINEERING AND
TECHNOLOGY
OPTICAL CHARACTER RECOGNITION

UNDER THE GUIDANCE OF
Mrs. NAGA SUNEETHA
A. MADHAVI (15P81A0573)
K. SAIKUMAR (15P81A0576)
K. MANISHA (15P81A0579)
INTRODUCTION
Optical Character Recognition (OCR) detects text in images and translates it
into machine-encoded text that a computer can process. An image containing
text is scanned and analyzed to identify the characters in it, and each
identified character is converted to machine-encoded text. Scanning first
converts the text and graphics elements into a bitmap, which is essentially a
matrix of black and white dots.
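As a small illustration of the "matrix of black and white dots", the sketch
below stores a made-up 5x5 bitmap of the letter T as a list of rows and prints
it. The bitmap, the name LETTER_T and the render helper are assumptions for
illustration only; they are not taken from the project.

```python
# Hypothetical 5x5 bitmap of the letter "T": 1 = black dot, 0 = white dot.
LETTER_T = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]


def render(bitmap):
    """Return the bitmap as text, one line per row, '#' for black dots."""
    return "\n".join(
        "".join("#" if dot else "." for dot in row) for row in bitmap
    )


print(render(LETTER_T))
```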
EXISTING SYSTEM
There is a growing demand to convert printed documents into electronic
documents in order to keep the data secure. In the existing system the user
can only read the text present in the file and cannot edit it directly. To
make any changes, the user has to digitize the text by typing it manually.
DISADVANTAGES
The drawback of the existing system is that text documents are digitized by
manually typing the text on the computer. Any correction to text in an image
requires typing the whole text into digital format, so making corrections
takes more time.
PROPOSED SYSTEM
Our proposed system is an OCR system that supports multiple operations on
files, including editing and searching, whereas the existing system only lets
the user read the document. Files are digitized simply by uploading them.
Supported file types include jpg, png, etc. The proposed system can also
recognize handwritten text.
ADVANTAGES
• Supports multiple functionalities such as editing and searching, which
overcomes the drawback of the existing system.
• Less time is taken to digitize the text.
• Easy conversion of files from printed text to digital format.
• The system can also display text that is handwritten.
HARDWARE REQUIREMENTS
• Processor - Pentium
• Speed - 200 MHz
• RAM - 256MB(min)
• Hard Disk - 4 GB(min)
SOFTWARE REQUIREMENTS
• Operating System : Windows XP
• Programming Language : Python
MODULES
• Image Acquisition
• Pre-processing
• Segmentation
• Feature Extraction
• Classification and Recognition
Image Acquisition
Pre-processing is a series of operations performed on the scanned input image.
It enhances the image and renders it suitable for segmentation. The role of
pre-processing is to separate the pattern of interest from the background.
Noise filtering, smoothing and normalization are generally done in this step.
Pre-processing also defines a compact representation of the pattern.
Binarization converts a grayscale image into a binary image. Dilation of edges
in the binarized image is done using the Sobel technique.
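The binarization and Sobel steps can be sketched in plain Python as follows.
This is only a minimal sketch: the fixed threshold of 128, the function names
and the tiny test image are assumptions, and the Sobel operator is shown here
as a plain edge detector, which may differ from how the project combines it
with dilation.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background.

    The threshold of 128 is an assumed default, not a value from the project.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def convolve_at(img, kernel, y, x):
    """Apply a 3x3 kernel centred on pixel (y, x)."""
    return sum(
        kernel[j][i] * img[y + j - 1][x + i - 1]
        for j in range(3)
        for i in range(3)
    )


def sobel_edges(img):
    """Mark interior pixels whose Sobel gradient is non-zero as edges."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = convolve_at(img, SOBEL_X, y, x)
            gy = convolve_at(img, SOBEL_Y, y, x)
            edges[y][x] = 1 if abs(gx) + abs(gy) > 0 else 0
    return edges


# Bright background on the left, dark ink on the right.
gray = [[200, 200, 200, 50, 50] for _ in range(3)]
binary = binarize(gray)
print(binary[1])               # [0, 0, 0, 1, 1]
print(sobel_edges(binary)[1])  # [0, 0, 1, 1, 0]
```

Border pixels are left unmarked because the 3x3 kernel does not fit there; a
fuller version would pad the image first.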
Pre-processing
Depending on how it was acquired, the raw data is subjected to a number of
preliminary processing steps to make it usable in the descriptive stages of
character analysis. The image resulting from the scanning process may contain
a certain amount of noise. Depending on the scanner resolution and the
inherent thresholding, the characters may be smeared or broken. Some of these
defects, which may cause poor recognition rates, are eliminated by the
pre-processor by smoothing the digitized characters.
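One simple way to smooth a digitized character is a 3x3 majority vote over the
binary image, which removes isolated specks and fills single-pixel holes. The
sketch below assumes this filter purely for illustration; the source does not
say which smoothing method the project uses, and the name smooth is
hypothetical.

```python
def smooth(img):
    """3x3 majority vote on a binary image (1 = ink, 0 = background).

    Each interior pixel takes the value held by at least 5 of the 9 pixels
    in its neighbourhood. Border pixels are copied unchanged.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            total = sum(
                img[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
            out[y][x] = 1 if total >= 5 else 0
    return out


# An isolated speck of noise on a blank background is removed.
speck = [[0] * 5 for _ in range(5)]
speck[2][2] = 1
print(smooth(speck)[2])  # [0, 0, 0, 0, 0]
```

A majority vote also thins strokes that are only one pixel wide, so it suits
scans where the characters are several pixels thick.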
Segmentation
The pre-processing stage yields a clean character image, in the sense that a
normalized image with a sufficient amount of shape information, high
compression, and low noise is obtained. The next OCR component is
segmentation. Here the character image is segmented into its subcomponents.
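A common way to split a line of text into characters is to look for columns
that contain no ink. The sketch below assumes this column-projection approach
for illustration; the source does not name the segmentation method used, and
segment_columns is a hypothetical helper.

```python
def segment_columns(img):
    """Return (start, end) column ranges, end exclusive, that contain ink.

    img is a binary image (1 = ink). Runs of ink-bearing columns separated
    by at least one blank column are treated as separate characters.
    """
    width = len(img[0])
    has_ink = [any(row[x] for row in img) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a new character begins
        elif not ink and start is not None:
            segments.append((start, x))    # the character ended at x - 1
            start = None
    if start is not None:
        segments.append((start, width))    # character touches the right edge
    return segments


line = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
]
print(segment_columns(line))  # [(0, 2), (4, 6)]
```

Touching or overlapping characters, which are frequent in handwriting, would
not be separated by this simple rule.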
Feature Extraction
The objective of feature extraction is to capture the essential
characteristics of symbols. Feature extraction is accepted as one of the most
difficult problems of pattern recognition. The most straightforward way of
describing a character is by its actual raster image. Another approach is to
extract certain features that characterize symbols but leave out the
unimportant attributes.
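The two approaches can be contrasted in a short sketch: raster_features uses
the raw pixels as the description, while zone_densities keeps only the
fraction of ink in each block of a grid. Zoning is an assumed example of a
compact feature, not necessarily the one used in the project, and both
function names are hypothetical.

```python
def raster_features(img):
    """Describe a character by its raw pixels, flattened row by row."""
    return [px for row in img for px in row]


def zone_densities(img, zones=2):
    """Split img into zones x zones blocks and return the ink fraction of each.

    Assumes the image is at least `zones` pixels high and wide.
    """
    h, w = len(img), len(img[0])
    feats = []
    for zy in range(zones):
        for zx in range(zones):
            y0, y1 = zy * h // zones, (zy + 1) * h // zones
            x0, x1 = zx * w // zones, (zx + 1) * w // zones
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(block) / len(block))
    return feats


char = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
print(len(raster_features(char)))  # 16
print(zone_densities(char))        # [1.0, 0.0, 0.0, 0.75]
```

The zoned description has 4 values instead of 16, and small shifts of a stroke
within a block do not change it.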
Classification and Recognition
The classification stage is the decision-making part of a recognition system
and it uses the features extracted in the previous stage. A feed-forward
back-propagation neural network with two hidden layers is used to perform the
classification. The hidden layers use the log-sigmoid activation function, and
the output layer is a competitive layer, since exactly one of the characters
is to be identified.
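The forward pass of such a network can be sketched as below: two hidden layers
with the log-sigmoid function, followed by an output layer where the unit with
the highest score wins. The layer sizes, weights and labels are made-up values
chosen only so the example runs; the source does not give the real
architecture, and back-propagation training is not shown.

```python
import math


def logsig(x):
    """Log-sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """Weighted sums for one layer; weights[j] holds the weights of unit j."""
    return [
        sum(w * v for w, v in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def classify(features, layers, labels):
    """Forward pass with log-sigmoid hidden layers and a competitive output."""
    *hidden, output = layers
    activ = features
    for weights, biases in hidden:
        activ = [logsig(z) for z in dense(activ, weights, biases)]
    scores = dense(activ, *output)
    return labels[scores.index(max(scores))]  # winner takes all


# Hypothetical weights: 2 inputs, two hidden layers of 2 units, 2 outputs.
LAYERS = [
    ([[4, 0], [0, 4]], [-2, -2]),  # hidden layer 1
    ([[4, 0], [0, 4]], [-2, -2]),  # hidden layer 2
    ([[1, 0], [0, 1]], [0, 0]),    # output layer
]
LABELS = ["A", "B"]

print(classify([1, 0], LAYERS, LABELS))  # A
print(classify([0, 1], LAYERS, LABELS))  # B
```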
UML DIAGRAMS
• Use case diagram
• Sequence diagram
USE CASE DIAGRAM
SEQUENCE DIAGRAM
Output Screens
• Output screen of handwritten text of 68
• Output screen of handwritten text of 32
• Images of several text digits
• Accuracy of output text
• Different styles of handwritten digits (Dataset)
TEST CASES
Test Id | Test Case | Input Description | Expected Output | Test Case Status
1 | Uploading image | When the user wants to open a file | Image file is selected and uploaded | Pass
2 | To pre-process image | Image will be taken for pre-processing | Conversion from RGB to B/W image | Pass
3 | Feature extraction | A grayscale image | Character features should be extracted | Pass
4 | Output file | Normalized character to the neural network | File containing only the text | Pass
CONCLUSION
• This work shows and explains the use of the K-Nearest Neighbor algorithm in
an Optical Character Recognition program. The experiment shows that the
K-Nearest Neighbor algorithm can be used to classify character images into
letters of the alphabet in an OCR system. It performs the job fairly well,
achieving a precision of 76.9%.
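The K-Nearest Neighbor idea can be sketched in a few lines: a new feature
vector receives the label that is most common among its k closest training
examples. The value k = 3, the Euclidean distance and the toy training data
are assumptions for illustration; they are not the settings or the dataset
behind the 76.9% figure.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy two-dimensional features for two made-up classes.
train = [
    ([0, 0], "O"), ([0, 1], "O"), ([1, 0], "O"),
    ([5, 5], "X"), ([5, 6], "X"), ([6, 5], "X"),
]
print(knn_predict(train, [0.5, 0.5]))  # O
print(knn_predict(train, [5.5, 5.5]))  # X
```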
Future Enhancement
• The Optical Character Recognition software can be enhanced in the future in
several ways, such as:
• Training and recognition speeds can be increased further, and the software
can be made more user-friendly.
• Extensive features can also be added to the software, such as:
  1. Translation
  2. Voice reading
THANK YOU
ANY QUERIES?
