
JAGRUTHI INSTITUTE OF ENGINEERING AND TECHNOLOGY

OPTICAL CHARACTER RECOGNITION


UNDER THE GUIDANCE OF
Mrs. NAGA SUNEETHA

A. MADHAVI (15P81A0573)
K. SAIKUMAR (15P81A0576)
K. MANISHA (15P81A0579)
INTRODUCTION

Optical Character Recognition (OCR) involves detecting the text content in images and translating it into encoded text that a computer can easily process. An image containing text is first scanned, and its text and graphics elements are converted into a bitmap, which is essentially a matrix of black and white dots. The bitmap is then analyzed to identify the characters in it, and each identified character is converted to machine-encoded text.
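To make the idea of a bitmap concrete, the sketch below stores a single character as a matrix of 0s and 1s (1 for a black dot) and prints it as text. The 5x5 glyph and the `render` helper are hypothetical illustrations written in Python, the project's stated language; they are not taken from the described system.

```python
# A hypothetical 5x5 bitmap of the letter "T": 1 = black dot, 0 = white dot.
GLYPH_T = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]


def render(bitmap):
    """Return the bitmap as text: '#' for black dots, '.' for white dots."""
    return "\n".join("".join("#" if px else "." for px in row) for row in bitmap)


print(render(GLYPH_T))
```

Every later stage of the OCR pipeline works on a matrix of this kind rather than on the original image file.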
EXISTING SYSTEM

There is a growing demand for users to convert printed documents into electronic documents to keep their data secure. In the existing system, the user can only read the text present in a file and cannot edit it directly. To make any changes to a file, the user has to digitize the text by typing it manually.
DISADVANTAGES

The drawback of the existing system is that text documents are digitized by manually typing the text on the computer. Any correction to the pictured text can be made only by typing the whole text into digital format, so making corrections to the text takes more time.
PROPOSED SYSTEM

Our proposed system is an OCR system that supports multiple functionalities on the files, including editing and searching, whereas the existing system only allows the document to be read. Files can be digitized easily just by uploading them; supported file types include JPG, PNG, etc. The proposed system can also recognize handwritten text.
ADVANTAGES

• Unlike the existing system, the proposed system supports multiple functionalities such as editing and searching.
• Less time is taken to digitize the text.
• Files are easily converted from printed text to digital format.
• The system can also recognize and display handwritten text.
HARDWARE REQUIREMENTS

• Processor - Pentium
• Speed - 200 GHZ
• RAM - 256 MB (min)
• Hard Disk - 4 GB (min)
SOFTWARE REQUIREMENTS

• Operating System : Windows XP
• Programming Language : Python
MODULES

• Image Acquisition
• Pre-processing
• Segmentation
• Feature Extraction
• Classification and Recognition
Image Acquisition

Pre-processing is a series of operations performed on the scanned input image. It enhances the image, making it suitable for segmentation. The role of pre-processing is to separate the pattern of interest from the background. Generally, noise filtering, smoothing and normalization are done in this step. Pre-processing also produces a compact representation of the pattern. Binarization converts a grayscale image into a binary image. The edges in the binarized image are then found with the Sobel operator and dilated.
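The binarization, Sobel edge detection and dilation steps above can be sketched in plain Python as follows. This assumes a grayscale image given as a list of rows of 0-255 values, a fixed threshold of 128 and a 3x3 dilation window; the function names and parameter values are hypothetical, and a real implementation would more likely use an image-processing library.

```python
# Pre-processing sketch: grayscale -> binary -> Sobel edge map -> 3x3 dilation.
# The threshold (128) and window size (3x3) are illustrative assumptions.

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def binarize(gray, threshold=128):
    """Map each pixel to 1 (ink) if darker than the threshold, else 0 (paper)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def sobel_edges(img):
    """0/1 edge map: 1 where the Sobel gradient is non-zero (borders left at 0)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    v = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * v
                    gy += SOBEL_Y[dy][dx] * v
            out[y][x] = 1 if gx * gx + gy * gy > 0 else 0
    return out


def dilate(img):
    """3x3 binary dilation: 1 if any pixel in the neighbourhood (incl. itself) is 1."""
    h, w = len(img), len(img[0])
    return [[1 if any(img[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))) else 0
             for x in range(w)]
            for y in range(h)]


# Hypothetical 5x5 grayscale image: white background with one dark pixel.
gray = [[255] * 5 for _ in range(5)]
gray[2][2] = 0
binary = binarize(gray)
edges = sobel_edges(binary)
thick = dilate(edges)
```

Thresholding with a single global value keeps the sketch short; adaptive thresholds are often preferred when scan lighting is uneven.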
Pre-processing

Depending on the data acquisition type, the raw data is subjected to a number of preliminary processing steps that make it usable in the descriptive stages of character analysis. The image produced by the scanning process may contain a certain amount of noise, and depending on the scanner resolution and the inherent thresholding, the characters may be smeared or broken. Some of these defects, which can cause poor recognition rates, are eliminated by the pre-processor by smoothing the digitized characters.
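As one possible smoothing step, the sketch below applies a 3x3 majority filter to a binary image: an isolated noise dot disappears, and a one-pixel break in a stroke is filled. The choice of filter and the `majority_smooth` name are assumptions; the slides do not specify which smoothing method is used.

```python
def majority_smooth(img):
    """3x3 majority filter on a 0/1 image.

    An interior pixel becomes 1 if at least 5 of the 9 pixels in its window
    are 1, otherwise 0. Border pixels are copied unchanged.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # copy so the input is not modified
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ink = sum(img[yy][xx]
                      for yy in range(y - 1, y + 2)
                      for xx in range(x - 1, x + 2))
            out[y][x] = 1 if ink >= 5 else 0
    return out


# Hypothetical inputs: a lone noise dot, and a thick stroke with a one-pixel gap.
noisy = [[0] * 5 for _ in range(5)]
noisy[2][2] = 1
broken = [[0, 1, 1, 1, 0] for _ in range(5)]
broken[2][2] = 0

print(majority_smooth(noisy))   # the isolated dot is removed
print(majority_smooth(broken))  # the gap in the stroke is filled
```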
Segmentation

The pre-processing stage yields a clean character image: a normalized image that retains sufficient shape information, is highly compressed and has little noise. The next OCR component is segmentation, in which the character image is divided into its subcomponents.
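One simple way to split a binarized text line into characters is a vertical projection profile: count the ink pixels in each column and cut wherever a column is empty. The sketch below assumes that neighbouring characters do not touch; the slides do not name their segmentation method, so this is an illustration only, and `segment_columns` is a hypothetical name.

```python
def segment_columns(img):
    """Split a 0/1 text-line image at blank columns.

    Returns a list of (start_col, end_col) pairs, with end_col exclusive,
    one pair per run of columns that contain ink.
    """
    h, w = len(img), len(img[0])
    profile = [sum(img[y][x] for y in range(h)) for x in range(w)]
    segments, start = [], None
    for x, count in enumerate(profile):
        if count and start is None:        # entering a character
            start = x
        elif not count and start is not None:  # leaving a character
            segments.append((start, x))
            start = None
    if start is not None:                  # character touches the right edge
        segments.append((start, w))
    return segments


# Hypothetical 2-row text line containing two "characters".
line = [[int(c) for c in row] for row in ["0110010", "0100011"]]
print(segment_columns(line))  # → [(1, 3), (5, 7)]
```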
Feature Extraction

The objective of feature extraction is to capture the essential characteristics of symbols. Feature extraction is accepted as one of the most difficult problems of pattern recognition. The most straightforward way of describing a character is by its actual raster image. Another approach is to extract certain features that characterize symbols while leaving out the unimportant attributes.
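The two approaches above can be contrasted in a few lines: `raster_features` uses the raw raster image directly, while `zoning_features` summarizes it as the ink density of each cell in a coarse grid, discarding finer detail. Zoning is one common hand-crafted feature, chosen here as an assumption; the slides do not say which features the project extracts, and both function names are hypothetical.

```python
def raster_features(img):
    """Use the raw raster image as the feature vector (row-major flattening)."""
    return [px for row in img for px in row]


def zoning_features(img, zones=2):
    """Ink density in each cell of a zones x zones grid (row-major order).

    Assumes the image is at least `zones` pixels high and wide, so that
    no cell is empty.
    """
    h, w = len(img), len(img[0])
    feats = []
    for zy in range(zones):
        for zx in range(zones):
            y0, y1 = zy * h // zones, (zy + 1) * h // zones
            x0, x1 = zx * w // zones, (zx + 1) * w // zones
            cell = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cell) / len(cell))
    return feats


# Hypothetical 4x4 character whose ink lies only in the top-left corner.
char = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(len(raster_features(char)))  # 16 values, one per pixel
print(zoning_features(char))       # 4 values, one per 2x2 cell
```

The zoning vector is much shorter than the raster vector, which is the trade-off described above: fewer, more informative values at the cost of fine detail.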
Classification and Recognition

The classification stage is the decision-making part of a recognition system, and it uses the features extracted in the previous stage. A feed-forward neural network with two hidden layers, trained by back-propagation, is used to perform the classification. The hidden layers use the log-sigmoid activation function, and the output layer is a competitive layer, since exactly one of the characters is to be identified.
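A minimal forward pass through such a network might look like the sketch below: two hidden layers with the log-sigmoid activation, followed by a winner-take-all competitive output. The layer sizes (4 inputs, 8 and 8 hidden units, 3 classes) and the random weights are hypothetical, since the slides do not state the architecture; training the weights by back-propagation is not shown.

```python
import math
import random


def logsig(z):
    """Log-sigmoid activation: 1 / (1 + e^(-z)), output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, act):
    """Fully connected layer: apply act(w . x + b) for every neuron."""
    return [act(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def competitive(net):
    """Competitive (winner-take-all) layer: 1 for the largest input, 0 elsewhere."""
    winner = max(range(len(net)), key=lambda i: net[i])
    return [1 if i == winner else 0 for i in range(len(net))]


def init_layer(n_in, n_out, rng):
    """Small random weights and biases for one layer (illustrative values only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def forward(features, layers):
    """Two log-sigmoid hidden layers, then a competitive output layer."""
    hidden = features
    for weights, biases in layers[:-1]:
        hidden = dense(hidden, weights, biases, logsig)
    weights, biases = layers[-1]
    net = dense(hidden, weights, biases, lambda z: z)  # linear net input
    return competitive(net)


rng = random.Random(0)
sizes = [4, 8, 8, 3]  # hypothetical: 4 features, 8 + 8 hidden units, 3 classes
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
print(forward([1.0, 0.0, 0.0, 0.0], layers))  # a one-hot vector of length 3
```

The competitive layer guarantees that exactly one output is 1, so the index of that output can be read directly as the recognized character class.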
UML DIAGRAMS

• Use case diagram
• Sequence diagram
USE CASE DIAGRAM
SEQUENCE DIAGRAM
Output Screens

• Output screen of handwritten text of 68
• Output screen of handwritten text of 32
• Images of several text digits
• Accuracy of output text
• Different styles of handwritten digits (Dataset)
TEST CASES
Test Id | Test Case | Input Description | Expected Output | Test Case Status
1 | Uploading image | The user wants to open a file | Image file is selected and uploaded | Pass
2 | Pre-processing the image | The image is taken for pre-processing | Conversion from RGB to B/W image | Pass
3 | Feature extraction | A grayscale image | Character features should be extracted | Pass
4 | Output file | Normalized character fed to the neural network | File containing only the text | Pass
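Test case 2 could be automated along the lines of the sketch below, which converts RGB pixels to black and white and checks the result. The ITU-R BT.601 luma weights (0.299, 0.587, 0.114), the threshold of 128 and the function names are assumptions, not taken from the project.

```python
def rgb_to_bw(rgb_image, threshold=128):
    """Convert rows of (R, G, B) tuples to a 0/1 image (1 = dark ink).

    Brightness uses the ITU-R BT.601 luma weights; the threshold is an
    assumed value.
    """
    bw = []
    for row in rgb_image:
        out_row = []
        for r, g, b in row:
            luma = 0.299 * r + 0.587 * g + 0.114 * b
            out_row.append(1 if luma < threshold else 0)
        bw.append(out_row)
    return bw


def test_preprocess_rgb_to_bw():
    """Mirrors test case 2: the output must be a pure black-and-white image."""
    image = [[(255, 255, 255), (0, 0, 0)],
             [(200, 30, 30), (20, 220, 20)]]
    bw = rgb_to_bw(image)
    assert all(px in (0, 1) for row in bw for px in row)
    assert bw == [[0, 1], [1, 0]]  # white, black / dark red, bright green


test_preprocess_rgb_to_bw()
```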
CONCLUSION

• This research shows and explains the use of the K-Nearest Neighbor algorithm in an Optical Character Recognition program. The experiment shows that the K-Nearest Neighbor algorithm can be used in OCR to classify images into letters of the alphabet. It performs this task fairly well, achieving a precision of 76.9%.
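For reference, the core of a K-Nearest Neighbor classifier over feature vectors fits in a few lines: find the k training samples closest to the query (Euclidean distance here) and take a majority vote of their labels. The toy vectors, labels and the `knn_predict` name are hypothetical, and they are not the data behind the reported 76.9% precision.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical zoning-style feature vectors for three letters.
train = [
    ([1.0, 0.0, 0.0, 0.0], "A"),
    ([0.9, 0.1, 0.0, 0.0], "A"),
    ([0.0, 0.0, 1.0, 1.0], "B"),
    ([0.1, 0.0, 0.9, 1.0], "B"),
    ([0.0, 1.0, 0.0, 0.0], "C"),
]
print(knn_predict(train, [0.95, 0.05, 0.0, 0.0]))  # → A
```

K-Nearest Neighbor needs no training phase beyond storing the labelled samples, which makes it easy to try in an OCR prototype; the cost is that every prediction compares the query with all stored samples.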
Future Enhancement

• The Optical Character Recognition software can be enhanced in the future in several ways, such as:
• Increasing the training and recognition speeds.
• Making the software more user-friendly.
• Adding extensive features to the software, such as:
• 1. Translation
• 2. Voice reading
THANK YOU
ANY QUERIES?
