A Mini Project
Submitted in partial fulfillment of the
Requirements for the award of the Degree of
BACHELOR OF TECHNOLOGY
In
(2016-2020)
1|Page
SRI KRISHNADEVARAYA UNIVERSITY
COLLEGE OF ENGINEERING AND TECHNOLOGY
ANANTAPUR – 515003
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
CERTIFICATE
Certified that this is a bonafide record of the dissertation work entitled, “Image to Text App”,
done by E.B.MEGHANA bearing Admn. No: 1610116 and G.MAHALAKSHMI bearing
Admn. No: 1610121 submitted to the faculty of Computer Science and Engineering, in partial
fulfillment of the requirements for the Degree of BACHELOR OF TECHNOLOGY with
specialization in COMPUTER SCIENCE AND ENGINEERING from Sri Krishnadevaraya
University College of Engineering and Technology, Anantapur.
DECLARATION
We hereby declare that the project report entitled “IMAGE TO TEXT APP”, submitted to the
Department of Computer Science and Engineering, Sri Krishnadevaraya University, Anantapuramu, in
partial fulfilment of the academic requirements for the degree of Bachelor of Technology in Computer
Science and Engineering, is an authentic record of our work carried out during the final year under the
esteemed guidance of Mrs. D. GOUSIYA BEGUM, M.Tech (CSE), Lecturer, Department of Computer
Science and Engineering, College of Engineering and Technology, Sri Krishnadevaraya
University, Ananthapuramu.
1.
2.
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any
task would be incomplete without mentioning the people who made it possible, whose constant
guidance and encouragement crowned our efforts with success. It is a pleasure that I now
have the opportunity to express my gratitude to all of them.
My special thanks to the faculty of the CSE Department for providing the information required
for my project work. Not to forget, I thank all the non-teaching staff, my friends, and
classmates who directly or indirectly helped and supported me in completing my project on time.
Finally, I wish to convey my gratitude to my parents, who provided all the requirements and
facilities that I needed.
E.B.MEGHANA (1610116)
G.MAHALAKSHMI (1610121)
ABSTRACT
Most of the information available today is either on paper or in the form of still
photographs and videos. To build digital libraries, this large volume of information needs to be
digitized into images and the text converted to ASCII for storage, retrieval, and easy
manipulation.
Text recognition is a technique for recognizing text in a paper document and producing it
in a desired format (such as .doc or .txt). “Image to Text App” is a mobile application that
applies techniques for converting the textual content of a paper document into a machine-
readable format, so that the user can easily copy, edit, paste, and reuse the output text
wherever needed.
Contents
Chapter-1: PROJECT OVERVIEW
1.1 Introduction
1.2 Objective
1.3 Literature Survey
1.4 Motivation
1.5 Applications
1.6 Advantages
Chapter-5: IMPLEMENTATION
5.1 Programming Languages
5.1.1 Java
5.2 Source Code
5.2.1 MainActivity.java
5.2.2 activity_main.xml
5.3 Steps for implementing
5.3.1 Enable USB debugging on your Android phone
5.3.2 Get started with remote debugging Android devices
LIST OF FIGURES
ABBREVIATIONS
REFERENCES
APPENDICES
STUDENT BIODATA
Chapter 1
PROJECT OVERVIEW
1.1 Introduction
Whenever we scan documents through a scanner, the documents are
stored as images in the computer system. These images contain text that cannot be
edited by the user. To reuse this information, it is very difficult for the computer
system to read the individual contents and search through these documents
line-by-line and word-by-word. The reason for this difficulty is that the font
characteristics of the characters in paper documents differ from the fonts of the
characters in the computer system.
As we read words, our eyes and brain continuously carry out optical character
recognition without our even being aware of it. Our eyes recognize
the luminous pattern of printed characters, and our brain uses this to figure out what
they say. Apart from humans, nowadays even computers are capable of
performing this task, using a technique called OCR.
OCR helps bring text available in analog format into digital
form. Nowadays many organizations depend on OCR systems to eliminate
human interaction for better performance and efficiency. The objective of this project is to
utilize this capability of the computer through an Android app. This visual capability is
realized using an Android mobile phone running the Tesseract OCR engine. The
app allows the user to recognize text from an image stored in the
gallery, an image taken with the camera, a document stored on the mobile, or a
location name from the map application available on the phone. This app can be
used for automatic number plate recognition, extracting business card information into
the contact list, and automatic extraction of key information from insurance documents;
the converted text can also be fed to a text-to-speech application and used as
assistive technology for visually impaired users.
1.2 Objective
Our objective is to utilize the visual capabilities of an Android mobile phone to
extract information from a business card. We use the camera features of the Android phone to
capture data. Extracting information from the business card requires accurate recognition
of its text. Any camera image of the business card is subject
to several environmental conditions, such as variable lighting, reflection, rotation, and
scaling (we would want the same data to be extracted from the business card regardless
of the distance from the camera), among others.
To achieve high speed in data processing, it is necessary to convert analog data
into digital data. Storing a hard copy of any document occupies a large space, and
retrieving information from that document is time-consuming. An optical character
recognition system is an effective way of recognizing printed characters. It provides an
easy way to recognize and convert the printed text in an image into editable text. It also
increases the speed of data retrieval from the image. An image containing characters
can be scanned through a scanner, after which the recognition engine of the OCR system
interprets the image and converts the images of printed characters into machine-readable
characters, improving the interface between man and machine in many applications.
The objective of OCR software is to recognize the text and then convert it to editable
form; thus, developing computer algorithms to identify the characters in the text is the principal
task of OCR. A document is first scanned by an optical scanner, which produces an image form
of it that is not editable. Optical character recognition involves translation of this text image
into editable character codes such as ASCII. Figure 1.1 shows the processing
mechanism of an OCR system.
1.3 Literature Survey
Benjamin Z. Yao, Xiong Yang, Liang Lin, Mun Wai Lee and Song-Chun Zhu [1]
proposed an image-parsing-to-text-description framework that generates text for images and video
content. Image parsing and text description are the two major tasks of their framework. It
computes a graph of the most probable interpretations of an input image. This parse graph
includes a tree-structured decomposition of the scene contents, pictures, or parts that cover all
pixels of the image.
Over the past decade, many researchers from the computer vision and Content Based
Image Retrieval (CBIR) domains have been actively investigating possible ways of
retrieving images and videos based on features such as color, shape, and
objects [2][3][4][5][6].
Paper [7], introduced by Yi-Ren Yeh, Chun-Hao Huang, and Yu-Chiang Frank
Wang, presents a novel domain adaptation approach for solving cross-domain pattern
recognition problems, where the data and features to be processed and recognized are collected
from different domains.
S. Shahnawaz Ahmed, Shah Muhammed Abid Hussain and Md. Sayeed Salam
[8] introduced a model of image-to-text conversion for electricity meter readings in
kilowatt units, by capturing an image of the meter and sending it in the form of a Multimedia
Message Service (MMS) message to a server. The server processes the received image in
sequential steps: 1) read the image and convert it into a three-dimensional array of pixels,
2) convert the image from color to black and white, 3) remove shades caused by
non-uniform light, 4) turn black pixels into white ones and vice versa, 5) threshold the
image to eliminate pixels which are neither black nor white, 6) remove small
components, 7) convert to text.
In [10], Fan-Chieh Cheng, Shih-Chia Huang, and Shanq-Jang Ruan gave a
technique for eliminating the background model from a video sequence to detect foreground
objects, for applications such as traffic security, human-machine interaction,
object recognition, and so on. Accordingly, motion detection approaches can be broadly
classified into three categories: temporal flow, optical flow, and background subtraction.
Iasonas Kokkinos and Petros Maragos [11] formulate the interaction between
image segmentation and object recognition using the Expectation-Maximization (EM)
algorithm. These two tasks are performed iteratively, simultaneously segmenting an
image and reconstructing it in terms of objects. Objects are modeled using the Active
Appearance Model (AAM), as it captures both shape and appearance variation. During
the E-step, the fidelity of the AAM predictions to the image is used to decide about
assigning observations to the object. They first start with an over-segmentation of the image and
then softly assign segments to objects; they then use curve evolution to minimize a
criterion derived from a variational interpretation of EM, introducing AAMs as shape
priors.
Mina Makar, Vijay Chandrasekhar, Sam S. Tsai, David Chen and Bernd Girod
[13] observed that streaming mobile augmented reality applications require both real-
time recognition and tracking of objects of interest in a video sequence. They propose a
temporally coherent keypoint detector and design efficient inter-frame predictive coding
techniques for canonical patches, feature descriptors, and keypoint locations. Mobile
Augmented Reality (MAR) systems are becoming more important with the growing interest
in applications that use image-based retrieval on mobile devices.
First-generation OCR systems: the first commercialized OCR of this generation
was the IBM 1418, which was designed to read a special IBM font, 407. The recognition
method was template matching, which compares the character image with a library of
prototype images for each character of each font [14].
Third-generation OCR systems: for the third generation of OCR systems, the
challenges were documents of poor quality and large printed and hand-written character
sets. Low cost and high performance were also important objectives. Commercial OCR
systems with such capabilities appeared during the decade 1975 to 1985 [14].
OCRs today (fourth-generation OCR systems): the fourth generation can be
characterized by OCR of complex documents intermixing text, graphics, tables and
mathematical symbols, unconstrained handwritten characters, color documents, low-quality noisy
documents, etc. Among the commercial products, postal address readers and reading aids for the
blind are available in the market [14].
1.4 Motivation
As we can see in our daily lives, people take images of documents when
they have no other way to take the document with them, but later they have to read
each and every word from the image. So we decided to make a project in which we just take an
image and process it to extract the text present in the image, saving a lot of time in reading
the text from an image.
1.5 Applications
1. Banking -
The uses of image text recognition vary across different fields. One widely
known application is in banking, where it is used to process checks without human
involvement.
A check can be inserted into a machine, the writing on it is scanned instantly,
and the correct amount of money is transferred. This technology has nearly been
perfected for printed checks and is fairly accurate for handwritten checks as well,
though it occasionally requires manual confirmation. Overall, this reduces wait times
in many banks.
2. Legal -
In the legal industry, there has also been a significant movement to digitize
paper documents. In order to save space and eliminate the need to sift through boxes
of paper files, documents are being scanned and entered into computer databases.
3. Healthcare –
By using image recognition technology, healthcare providers are able to extract information
from forms and put it into databases, so that every patient's data is promptly recorded. As a
result, they can focus on delivering the best possible service to every
patient.
4. Image –
Optical character recognition has been applied to a number of applications. Some
of them have been explained below.
5. Legal Industry –
OCR is used in the legal industry to digitize documents, which are entered directly into
computer databases. Legal professionals can then search huge databases for the documents
they require by simply typing a few keywords [15].
6. Healthcare –
Healthcare professionals always have to deal with large volumes of forms for each
patient, including insurance forms as well as general health forms. To keep up with all of this
information, it is useful to input relevant data into an electronic database that can be accessed as
necessary. Form processing tools, powered by OCR, are able to extract information from forms
and put it into databases, so that every patient's data is promptly recorded [15].
7. Music –
Optical music recognition was initially aimed at recognizing printed sheet music, which can be
edited into playable form with the help of electronic methods. It has many applications, such as
processing of different classes of music, large-scale digitization of musical data, and handling
diversity in musical notation [15].
9. Handwriting Recognition –
Handwriting recognition is the ability of a computer system to scan an image of handwritten
text and extract the handwritten characters from that image [16].
1.6 Advantages
❖ Retrieving invoice data is easy with the use of OCR. There’s no need to manually
input all of the data into digital format when the technology can do the job for
you!
❖ Extracting tables from documents can be a very lengthy task. One of the best
benefits of OCR is that you can swap hours of computer work for a one-minute task.
❖ Typing, typing, typing. Is that all you seem to do? OCR can help when a retyping
task is on your to-do list. There’s no need to spend hours at your computer desk
when the technology is capable of whizzing through the text for you. Save time
and effort, and benefit from a fully searchable digital document at the end.
❖ Do you want to spend hours searching for data? Bulky files are gone thanks to the
use of OCR. You can scan all the documents that you need and they’ll be text-
searchable too. Simply enter a keyword into the ‘searchable PDF’ next time you
need a piece of data quickly.
❖ You won’t need to worry about physical storage; documents can be found in one
place – digitally. Digital versions are far easier to back up and you won’t risk any
tea spillages either. Control exactly where your data is saved and access it
whenever you need to.
❖ Editing scanned documents can cause a headache. One of the best benefits of OCR
is that it makes this task a breeze. It’ll swiftly convert them and allow you to
make the changes you need in the format of your choice.
❖ Stop keeping clients on hold and benefit from running a simple search on your
computer. Get the data that you need quickly, avoid the filing cabinets, and keep
your customers engaged.
Chapter 2
REVIEW OF LITERATURE
As discussed earlier, text recognition from images is still an active research area in the
field of pattern recognition. To address the issues related to text recognition, many
researchers have proposed different technologies, each of which tries to
address the issues in a different way. In this section we present a detailed survey of
approaches proposed to handle the issues related to text recognition.
Rhead considered real-world UK number plates and related these to ANPR
(Automatic Number Plate Recognition), considering aspects of the relevant legislation
and standards when applying them to real-world number plates. The varied manufacturing
techniques and varied specifications of component parts are also noted, and the varied
fixing methodologies and fixing locations are discussed, as well as their impact on image capture.
Malakar described that extraction of text lines from document images is
one of the important steps in the process of an Optical Character Recognition (OCR)
system. In the case of handwritten document images, the presence of skewed, touching, or
overlapping text lines makes this process a real challenge for researchers.
Tirthraj Dash et al. have discussed handwritten character recognition using an associative
memory net (AMN) in their paper. They worked directly at the pixel level. The dataset was
designed in MS Paint 6.1 with a normal Arial font of size 28, and the image
dimensions were kept at 31 × 39.
[Vaidya 1999] use a feature-based approach for numeral recognition. They have
used a statistical method by assigning weights to each feature and assessing the numerals
using these weights. We also use the feature-based approach to recognize the handwritten
words. Our approach is different from theirs as words may be written in cursive writing
whereas numerals aren't. It is not always possible to break words into characters. Hence
we have to use a continuous process of matching the set of features to a database while
accounting for the permutations as new features come into view and old ones are
discarded. Also, the set of features in the case of alphabets is larger than that in the case
of numerals.
[Spitz 1998] use character shape coding process for typed word recognition. They
have a small dictionary to which all the words in the document belong. After scanning the
words, they are classified on the basis of the regions that they occupy (extending above
middle-line, extending below bottom-line or completely between the two). This narrows
down the range of possibilities for the word, which is then matched against all these
possibilities. We had considered this approach, but it would have been highly inefficient
in our case, which is more general: ours is neither restricted to a small fraction of a
dictionary nor to typed documents where the characters are easily
distinguishable.
In 2004, N. M. Noor, M. Razaz and P. Manley-Cooke proposed a system using global
geometric feature extraction and a geometric density classifier for feature extraction, with
neuro-fuzzy logic used for classification.
Evaluation of the system achieved accuracy rates of 77.89% for geometric density and
76.44% for geometric features [6]. In 2010, Dewi Nasien, Habibollah Haron and Siti Sophiayati
Yuhaniz took three datasets from the NIST database, considering 189,411 lowercase
letters, 217,812 uppercase letters, and 407,223 combined uppercase and lowercase
letter samples.
Those samples are divided into 80% for training and 20% for testing. Freeman chain
code (FCC) was used for feature extraction, and a support vector machine (SVM) was selected
for the recognition step; nearest neighbour achieved 61.53% accuracy while the neural network
gave 57.69%. MATLAB was used for feature extraction and recognition. The evaluation
outcome suggests that nearest neighbour is a better recognizer than an artificial neural
network when applied to English characters.
In 2015, Ashraf Abdel Raouf, Colin A. Higgins, Tony Pridmore and Mahmoud I. Khalil
studied an approach for recognizing Arabic characters using a Haar Cascade Classifier (HCC).
The classifiers were trained and tested on some 2,000 images, using Haar-like
feature extraction and boosting of a classifier cascade. The system was tested with real text
images and produced an 87% accuracy rate for Arabic character recognition.
In 2017, N. Lamghari, M. E. H. Charaf and S. Raghay divided the data in their research
into three parts: of 34,000 characters, 70% are used for training, 15% for the testing phase,
and 15% for validation. Hybrid feature extraction (pixel density, resizing,
Freeman code, structural features, invariants) was used, and a feed-forward back-
propagation neural network was used for recognition. The system achieved a high
recognition rate of 98.27%.
In 2018, Noor A. Jebril, Hussein R. Al-Zoubi and Qasem Abu Al-Haija proposed a
method that, in addition to the preprocessing step, includes three phases. In the first phase, they
employed word segmentation to extract characters.
In the second phase, Histograms of Oriented Gradients (HOG) are used for feature
extraction. The final phase employs a Support Vector Machine (SVM) for classifying
characters.
They applied the proposed method to the recognition of Jordanian
city, town, and village names as a case study, in addition to many other words
that provide character shapes not covered by Jordanian place names. The set was carefully
selected to include each Arabic character in all its forms. To this end, they
constructed their own dataset consisting of more than 43,000 handwritten Arabic
words (30,000 used for training and 13,000 used for the testing stage). Recognition results show
a 99% accuracy rate.
In 2011, Gyanendra K. Verma, Shitala Prasad, and Piyush Kumar presented an
approach for Hindi handwritten character recognition using the curvelet transform.
The study used a dataset containing 200 images of characters (each image contains all
Hindi characters). Features were extracted using the curvelet transform, with k-nearest
neighbour used for recognition; the experimental results show more than 90% accuracy.
In 2013, Divakar Yadav, Sonia Sánchez-Cuadrado and Jorge Morato developed an optical
character recognition system for Hindi characters using a neural network trained with a
dataset of 1000 samples. The feature extraction technique is a histogram of projections based
on mean distance, pixel values, and vertical zero crossings.
Classification then uses a back-propagation neural network with two hidden layers.
Experimental results show 98.5% correct recognition.
In 2015, Akanksha Gaur and Sunita Yadav presented a system that extracts features using
k-means clustering and classifies them using a support vector machine, with a linear kernel
and with Euclidean distance. The evaluation shows that the SVM gives better results using
the linear kernel than using Euclidean distance.
In 2018, Nikita Singh presented a system with the title “An Efficient Approach for
Handwritten Devanagari Character Recognition Based on Artificial Neural Network” for
recognizing Hindi characters. For feature extraction they used histograms of oriented gradients
(HOG), and for recognition an artificial neural network (ANN) classifier. The system achieved
a high accuracy of 97.06%.
Chapter 3
TEXT RECOGNITION SYSTEM
In this section we briefly describe the overall architecture of text recognition system
as shown in figure 3.1.
A text recognition system receives input in the form of an image which contains
some text information. The output of this system is in electronic format, i.e. the text information
in the image is stored in computer-readable form. The system consists of three modules:
(A) pre-processing,
(B) text recognition, and
(C) post-processing.
A. Pre-processing Module
The paper document is generally scanned by an optical scanner and converted
into the form of a picture. A picture is a combination of picture elements, which are also
known as pixels.
At this stage we have the data in the form of an image, and this image can be further
analyzed so that the important information can be retrieved. To improve the quality of the
input image, a few operations are performed for image enhancement, such as noise removal,
normalization, binarization, etc.
1) Noise Removal
Noise removal is one of the most important processes. It increases the quality of the
image and aids the recognition process, yielding better text recognition in
images and more accurate output at the end of text recognition
processing. There are many methods for image noise removal, such as the mean filter, min-
max filter, Gaussian filter, etc.
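To make the idea concrete, here is a minimal pure-Python sketch of a 3×3 mean filter; this is illustrative only (not taken from the app's source code), and the function name and list-of-rows image representation are assumptions:

```python
def mean_filter(img):
    """Smooth a grayscale image (list of rows of ints) with a 3x3
    mean filter; border pixels are left unchanged for simplicity."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            s = sum(img[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = s // 9
    return out

# A single bright noise pixel in a dark image is smoothed away.
noisy = [[0, 0, 0], [0, 90, 0], [0, 0, 0]]
print(mean_filter(noisy)[1][1])  # 10
```

In practice an image-processing library would be used rather than hand-rolled loops; the point is that each output pixel becomes the average of its neighbourhood, which suppresses isolated noise.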
2) Normalization
3) Binarization
Binarization is a technique by which gray-scale images are converted to binary
images. This separates the text from the background, which is required for some operations
such as segmentation. Figure 3.2 shows a colour image.
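A hedged sketch of this binarization step, assuming a fixed global threshold (the names are illustrative, not from the project's code):

```python
def binarize(img, threshold=128):
    """Global binarization: pixels at or above the threshold become
    white (1), all others become black (0)."""
    return [[1 if p >= threshold else 0 for p in row] for row in img]

gray = [[12, 200], [130, 40]]
print(binarize(gray))  # [[0, 1], [1, 0]]
```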
B. Text Recognition Module
This module can be used for text recognition in output image of pre-
processing model and give output data which are in computer understandable form.
Hence in this module following techniques are used.
1) Segmentation
2) Feature Extraction
Feature extraction is the process of retrieving the most important data from
the raw data, that is, the data on the basis of which the characters can be
represented accurately. Common feature extraction techniques include
zoning,
histograms, etc.
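As an illustration of the zoning technique mentioned above (a sketch under assumed names, not the app's code), the character image is split into zones and the foreground-pixel density of each zone becomes one feature:

```python
def zoning_features(img, zh=2, zw=2):
    """Zoning: split a binary character image into zh x zw zones and
    use the foreground-pixel density of each zone as a feature."""
    h, w = len(img), len(img[0])
    feats = []
    for zy in range(zh):
        for zx in range(zw):
            ys = range(zy * h // zh, (zy + 1) * h // zh)
            xs = range(zx * w // zw, (zx + 1) * w // zw)
            total = sum(img[y][x] for y in ys for x in xs)
            feats.append(total / (len(ys) * len(xs)))
    return feats

# A 4x4 glyph whose top-left quadrant is fully inked.
glyph = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
print(zoning_features(glyph))  # [1.0, 0.0, 0.0, 0.0]
```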
3) Classification
This process uses the extracted features of the text image for classification, i.e. the input
to this stage is the output of the feature extraction process.
Classifiers compare the input features with stored patterns and find the best
matching class for the input. There are many techniques used for classification, such as
template matching.
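A minimal sketch of template matching (illustrative only; a real classifier would normalize and align the glyphs first):

```python
def classify(img, templates):
    """Template matching: score each stored template by the number of
    matching pixels and return the label of the best match."""
    def score(a, b):
        return sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return max(templates, key=lambda label: score(img, templates[label]))

templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
sample = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]  # a noisy "I"
print(classify(sample, templates))  # I
```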
C. Post-processing Module
The output of the text recognition module is text data understood by the
computer, so it needs to be stored in some proper format (e.g. plain text or MS Word) for further
use, such as editing or searching within that data.
Figure 3.3 shows the different processes performed in an OCR system.
1. Image acquisition
The input image for an OCR system may be acquired by scanning a document or by capturing
a photograph of it. This is also known as the digitization process [17].
2. Preprocessing
Preprocessing consists of a series of operations used to enhance an image and make it
suitable for segmentation. Noise gets introduced during document generation, so a proper filter
such as a mean filter, min-max filter, or Gaussian filter may be applied to remove noise from the
document. The binarization process converts a gray-scale or colored image to a black-and-white
image. To enhance the visibility and structural information of characters, binary morphological
operations like opening, closing, thinning, hole filling, etc. may be applied to the image.
3. Segmentation
Generally, a document is processed in a hierarchical way. At the first level, lines are segmented
using a row histogram. From each row, words are extracted using a column histogram, and finally
characters are extracted from the words. The accuracy of the final result depends highly on the
accuracy of segmentation [17].
4. Feature extraction
Feature extraction is an important part of any pattern recognition application. Feature
extraction techniques such as Linear Discriminant Analysis (LDA), Principal Component Analysis
(PCA), Independent Component Analysis (ICA), Chain Code (CC), Scale-Invariant Feature
Transform (SIFT), gradient-based features, and histograms may be applied to extract the features of
individual characters. These features are used to train the system [17].
5. Classification
When an image is provided as input to the OCR system, its features are extracted and given as
input to a trained classifier such as an artificial neural network or a support vector machine.
Classifiers compare the input features with stored patterns and find the best matching class for
the input [17].
6. Post processing
This step is not compulsory, but it helps to improve the accuracy of recognition. Higher-level
concepts such as syntax analysis and semantic analysis may be applied to check the context of a
recognized character [17].
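One simple form of such context checking is lexicon-based correction; here is a hedged sketch (the scoring rule is an illustrative assumption, not a method used by this project):

```python
def correct_word(word, lexicon):
    """Post-processing sketch: replace a recognized word with the
    closest lexicon entry (by simple per-character agreement) when
    the word itself is not in the lexicon."""
    if word in lexicon:
        return word
    def score(a, b):
        return sum(x == y for x, y in zip(a, b)) - abs(len(a) - len(b))
    return max(lexicon, key=lambda w: score(word, w))

lexicon = ["image", "text", "scan"]
# "rn" misread for "m" is a classic OCR confusion.
print(correct_word("irnage", lexicon))  # image
```

Production systems typically use edit distance or a character-level language model instead of this naive agreement score.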
The overall image-to-speech conversion involves the following phases:
1) Pre-Processing.
2) Feature Extraction.
3) Image Segmentation.
4) Text Conversion.
5) Text-to-Speech synthesis.
1. Edge Detection
A set of connected pixels that forms a boundary between two disjoint regions is known as
an edge. Edge detection is the task of segmenting an image at regions of discontinuity. Edges
usually occur on the boundary between two different regions in an image. Edge detection helps
to clearly identify changes in the regions of an image where the gray scale or texture
changes.
There are many available edge detection techniques for extracting edges from images,
such as the Roberts, Prewitt, and Sobel operators, which were not very efficient. Then, in 1986,
John F. Canny developed an algorithm which provided a high probability of edge detection with
a low error rate.
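For illustration, a pure-Python Sobel gradient sketch (the full Canny pipeline adds Gaussian smoothing, non-maximum suppression, and hysteresis thresholding on top of such gradients; names and representation are assumptions):

```python
def sobel_magnitude(img):
    """Approximate gradient magnitude with 3x3 Sobel kernels
    (|Gx| + |Gy|); border pixels are set to 0."""
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)
    return out

# A vertical step edge between dark (0) and bright (9) columns.
step = [[0, 0, 9, 9]] * 3
print(sobel_magnitude(step)[1])  # [0, 36, 36, 0]
```

The gradient is large exactly where intensity changes abruptly, which is what the edge detectors above exploit.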
2. Canny Algorithm
This algorithm focuses on three main aims when detecting the edges in an image: a low
error rate, minimizing the distance between the real edge and the detected edge, and a minimal
response, i.e. one detector response per edge.
3. Image Segmentation
Segmentation is often one of the critical steps in analyzing images because of the additional
overhead of visiting each new pixel of an image while working with objects in it. Once
image segmentation is done successfully, the other stages in image analysis are much easier.
When considering a fully automatic conversion algorithm, the success of image
segmentation is partial and sometimes requires manual intervention. Segmentation mainly has
two main objectives:
2) to reorganize the pixels of the image into higher-level units so that the
objects become more meaningful.
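Grouping pixels into higher-level units can be illustrated with connected-component labeling (a sketch with assumed names, not the project's implementation):

```python
def connected_components(img):
    """Count 4-connected foreground regions with a simple flood fill,
    grouping pixels into higher-level units (objects)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                count += 1
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    if 0 <= cy < h and 0 <= cx < w and img[cy][cx] and not seen[cy][cx]:
                        seen[cy][cx] = True
                        stack += [(cy + 1, cx), (cy - 1, cx),
                                  (cy, cx + 1), (cy, cx - 1)]
    return count

blobs = [[1, 0, 1],
         [1, 0, 0],
         [0, 0, 1]]
print(connected_components(blobs))  # 3
```

Each component can then be treated as a candidate object (for example, a character) in the later stages.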
They are:
1)Input image:
2)Pre-processing:
In this phase, pre-requisite processing of the input images, such as removing noise and making
them more usable and recognizable by the system, is carried out.
3)Feature extraction:
This phase is one of the most important. Preliminary features are extracted and divided into
geometric elements like arcs, lines, and circles, and these elements are compared with a known
set of characters which are stored in the database.
4)Matching:
After feature extraction, the system requires the assistance of the database in order to recognize
the objects in the image, so matching is done.
5)Generate text:
After successful recognition of the objects, it is the system's task to generate
appropriate text for every input image.
6)Speech output:
The appropriate speech output for the generated text is given in this final phase.
Widely used as a form of data entry from printed paper data records – whether passport
documents, invoices, bank statements, computerized receipts, business cards, mail, printouts of
static-data, or any suitable documentation – it is a common method of digitizing printed texts so
that they can be electronically edited, searched, stored more compactly, displayed on-line, and
used in machine processes such as cognitive computing, machine translation, (extracted) text-to-
speech, key data and text mining. OCR is a field of research in pattern recognition, artificial
intelligence and computer vision.
Early versions needed to be trained with images of each character, and worked on one
font at a time. Advanced systems capable of producing a high degree of recognition accuracy for
most fonts are now common, and with support for a variety of digital image file format inputs.
Some systems are capable of reproducing formatted output that closely approximates the
original page including images, columns, and other non-textual components.
3.5. Preprocessing
The OCR success rate is contingent on the success percentage of each stage.
A. Factors Affecting the Text Recognition Quality
Many factors influence the precision of characters recognized using OCR: scan
resolution, scanned image quality, the category of printed document (photocopied or laser
printed), the quality of the paper, and linguistic complexities. Uneven illumination and
watermarks are further factors faced in an OCR system that influence its accuracy.
B. Need for Preprocessing
The preprocessing step is necessary to obtain a better text recognition rate; using efficient
preprocessing algorithms makes the text recognition method robust through operations such as
noise removal and
skew correction.
C. Preprocessing methods
The majority of OCR applications use binary or grey images. The images may have
watermarks and/or a non-uniform background that make the recognition process difficult without
performing the preprocessing stage. There are several steps needed to achieve this.
1. The initial step is to adjust the contrast or to eliminate noise from the image, called
the image enhancement technique.
2. The next step is thresholding to remove the watermarks and/or noise,
followed by page segmentation to isolate the graphics from the text.
3. The next step is text segmentation to separate individual characters, followed by
morphological processing.
The morphological processing is required to add pixels if the preprocessed image has
eroded parts in the characters.
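The "adding pixels" operation the text refers to is morphological dilation. The following is a minimal sketch with a 3x3 structuring element on a hypothetical binary image; it is not code from the application itself.

```java
// Morphological dilation with a 3x3 structuring element: a pixel becomes
// foreground (1) if any of its neighbours is foreground, which repairs
// eroded character strokes. The test image is a made-up example.
public class Dilate {
    public static int[][] dilate(int[][] img) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                for (int dy = -1; dy <= 1 && out[y][x] == 0; dy++)
                    for (int dx = -1; dx <= 1; dx++) {
                        int yy = y + dy, xx = x + dx;
                        if (yy >= 0 && yy < h && xx >= 0 && xx < w && img[yy][xx] == 1) {
                            out[y][x] = 1; break;
                        }
                    }
        return out;
    }

    public static void main(String[] args) {
        // A vertical stroke with a one-pixel gap (an "eroded" character part).
        int[][] stroke = {{0, 1, 0}, {0, 0, 0}, {0, 1, 0}};
        int[][] fixed = dilate(stroke);
        System.out.println(fixed[1][1]); // prints 1: the gap is filled
    }
}
```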
Filters are applied to suppress the high or low frequencies present in the image.
Eliminating the high frequencies in the image is smoothing, and eliminating the low frequencies is
enhancing or edge detection in the image.
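Smoothing by suppressing high frequencies can be illustrated with a simple 3x3 mean (box) filter; the 5x5 test image below is an invented example, not data from the project.

```java
// A minimal 3x3 mean (box) filter: averaging over a neighbourhood
// suppresses high frequencies, i.e. the "smoothing" described above.
public class BoxBlur {
    public static int[][] blur(int[][] img) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                int sum = 0, n = 0;
                // Average over the in-bounds part of the 3x3 window.
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++) {
                        int yy = y + dy, xx = x + dx;
                        if (yy >= 0 && yy < h && xx >= 0 && xx < w) { sum += img[yy][xx]; n++; }
                    }
                out[y][x] = sum / n;
            }
        return out;
    }

    public static void main(String[] args) {
        int[][] img = new int[5][5];
        img[2][2] = 255;               // a single bright "noise" pixel
        System.out.println(blur(img)[2][2]); // prints 28: 255/9, the spike is flattened
    }
}
```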
The following figure 3.4 shows the original image, and figures 3.5 and 3.6 show the images
with the Prewitt and Canny edge detection methods applied. These filtering techniques may give
effective text detection for images of natural scenes.
Fig 3.6 : Edge detection Prewitt method
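For reference, the Prewitt method applies two 3x3 gradient kernels and combines them into an edge magnitude. The sketch below uses a hypothetical synthetic image containing one vertical edge; real implementations, as in the figures, operate on full grey-level images.

```java
// Prewitt edge detection: horizontal (KX) and vertical (KY) gradient
// kernels are convolved with the image and combined into a magnitude.
public class PrewittEdge {
    static final int[][] KX = {{-1, 0, 1}, {-1, 0, 1}, {-1, 0, 1}};
    static final int[][] KY = {{-1, -1, -1}, {0, 0, 0}, {1, 1, 1}};

    public static int[][] magnitude(int[][] img) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][w];
        for (int y = 1; y < h - 1; y++)
            for (int x = 1; x < w - 1; x++) {
                int gx = 0, gy = 0;
                for (int i = -1; i <= 1; i++)
                    for (int j = -1; j <= 1; j++) {
                        gx += KX[i + 1][j + 1] * img[y + i][x + j];
                        gy += KY[i + 1][j + 1] * img[y + i][x + j];
                    }
                out[y][x] = Math.min(255, (int) Math.hypot(gx, gy)); // clamp to 8 bits
            }
        return out;
    }

    public static void main(String[] args) {
        // Left half dark, right half bright: a vertical edge between x=2 and x=3.
        int[][] img = {
            {0, 0, 0, 255, 255},
            {0, 0, 0, 255, 255},
            {0, 0, 0, 255, 255},
            {0, 0, 0, 255, 255},
        };
        int[][] e = magnitude(img);
        System.out.println(e[1][1] + " " + e[1][3]); // prints "0 255": flat region vs edge
    }
}
```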
1. Global thresholding:
Image thresholding is the method of isolating the foreground information from its background.
This method is normally applied to grey-level or scanned colour images, and it is categorized as
global and local thresholding.
The global method of thresholding chooses one threshold value for the complete image from
the intensity histogram. Global thresholding automatically reduces a grey-level image to a binary
image. The local adaptive thresholding method, in contrast, uses a different value for each pixel
based on the information of its local area.
Figure 3.8 shows the global threshold applied using Otsu's method.
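Otsu's method picks the threshold that maximizes the between-class variance computed from the grey-level histogram. A minimal sketch follows; the grey levels in main are invented for illustration.

```java
// Otsu's global thresholding: scan all candidate thresholds t and keep
// the one maximizing the between-class variance wB*wF*(mB-mF)^2.
public class Otsu {
    public static int threshold(int[] pixels) {
        int[] hist = new int[256];
        for (int p : pixels) hist[p]++;
        int total = pixels.length;
        long sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (long) i * hist[i];

        long sumB = 0; int wB = 0; double best = -1; int bestT = 0;
        for (int t = 0; t < 256; t++) {
            wB += hist[t];            // background class weight (levels <= t)
            if (wB == 0) continue;
            int wF = total - wB;      // foreground class weight
            if (wF == 0) break;
            sumB += (long) t * hist[t];
            double mB = (double) sumB / wB;
            double mF = (double) (sumAll - sumB) / wF;
            double between = (double) wB * wF * (mB - mF) * (mB - mF);
            if (between > best) { best = between; bestT = t; }
        }
        return bestT;
    }

    public static void main(String[] args) {
        int[] img = {10, 12, 11, 200, 199, 201}; // made-up bimodal grey levels
        int t = threshold(img);
        StringBuilder sb = new StringBuilder();
        for (int p : img) sb.append(p > t ? 1 : 0); // reduce to binary
        System.out.println(sb); // prints "000111"
    }
}
```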
CHAPTER-4
SYSTEM REQUIREMENTS
I. Android
Android is an open-source operating system, which means it is free and anyone can
use it. Android has millions of apps available that can help you manage your life in
one way or another, and it is available at low cost in the market; for these reasons Android is
very popular.
Android Architecture:
• Linux kernel
• Libraries
• Android runtime
• Application framework
• Applications
Linux kernel:
Android uses the powerful Linux kernel, which supports a wide range of hardware
drivers. The kernel is the heart of the operating system: it manages input and output requests
from the software and provides basic system functionalities such as process management, memory
management, and device management for the camera, keypad, display, etc. Linux is also very
good at networking, and applications do not need to interface directly with the peripheral
hardware. The kernel itself does not interact directly with the user but rather interacts with the
shell and other programs as well as with the hardware devices on the system.
Libraries:
On top of the Linux kernel there is a set of libraries, including the open-source web
rendering engine WebKit and the C library libc. There are libraries used to play and record audio
and video; SQLite is a database engine useful for the storage and sharing of application data; the
SSL libraries are responsible for internet security; and so on.
Android Runtime:
The Android runtime provides a key component called the Dalvik Virtual Machine, which is a
kind of Java virtual machine specially designed and optimized for Android. The Dalvik VM
is the process virtual machine in the Android operating system: it is the software that runs apps on
Android devices.
Application framework:
The application framework layer provides many higher-level services to applications such as
windows manager, view system, package manager, resource manager, etc. The application
developers are allowed to make use of these services in their application.
You will find all the Android applications at the top layer; you write your own application and
install it on this layer. Examples of such applications are contacts, books, browsers, services, etc.
Each application performs a different role in the overall system.
1. ANDROID STUDIO
Android Studio is the official integrated development environment (IDE) for Android
application development. It is based on IntelliJ IDEA, a Java integrated development
environment, and incorporates its code-editing and developer tools.
Android Studio uses an Instant Run feature to push code and resource changes to a
running application. A code editor assists the developer with writing code, offering code
completion, refactoring, and analysis. Applications built in Android Studio are then compiled into
the APK format for submission to the Google Play Store.
4.3 Hardware Requirement:
RAM is short for “random access memory” and while it might sound mysterious, RAM is one of
the most fundamental elements of computing. RAM is the super-fast and temporary data storage
space that a computer needs to access right now or in the next few moments.
Chapter 5
IMPLEMENTATION
Implementation is the process of converting a new system into an operational one. The
designed system is converted into an operational one using suitable programming languages.
Implementation includes all those activities that take place to convert the old system to the new one.
For implementation of this project we used Android studio IDE and Java Programming
language.
A programming language is our way of communicating with software. The people who
use programming languages are often called programmers or developers. The things we tell
software using a programming language could be to make a mobile application look a certain
way, or to make an object on the page move if the human user takes a certain action.
5.1.1 JAVA
Java is a programming language that has been around a lot longer than Android. It is an
object-oriented language. This means it uses the concept of reusable programming objects. If this
sounds like technical jargon, another analogy will help. Java enables us and others (like the
Android development team) to write Java code that can be structured based on real-world things,
and here is the important part: it can be reused.
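The reuse idea can be shown in miniature: a class written once serves as a blueprint for any number of objects. The Contact class below is a hypothetical example, not part of the app's code.

```java
// One reusable "blueprint" class modelling a real-world thing...
class Contact {
    private final String name;
    Contact(String name) { this.name = name; }
    String greet() { return "Hello, " + name; }
}

// ...reused to create as many objects as needed, without rewriting the code.
public class ReuseDemo {
    public static void main(String[] args) {
        Contact a = new Contact("Ada");
        Contact b = new Contact("Alan"); // same code, second object
        System.out.println(a.greet());   // prints "Hello, Ada"
        System.out.println(b.greet());   // prints "Hello, Alan"
    }
}
```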
5.2 Source Code:
5.2.1 MainActivity.java
public class MainActivity extends AppCompatActivity {

    private static final int CAMERA_REQUEST_CODE = 200;
    private static final int STORAGE_REQUEST_CODE = 400;
    private static final int IMAGE_PICK_GALLERY_CODE = 1000;
    private static final int IMAGE_PICK_CAMERA_CODE = 1001;

    EditText mResultEt;
    ImageView mPreviewIv;
    String[] cameraPermission;
    String[] storagePermission;
    Uri image_uri;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        mResultEt = (EditText) findViewById(R.id.resultEt);
        mPreviewIv = (ImageView) findViewById(R.id.imageIv);
        cameraPermission = new String[]{Manifest.permission.CAMERA,
                Manifest.permission.WRITE_EXTERNAL_STORAGE};
        storagePermission = new String[]{Manifest.permission.WRITE_EXTERNAL_STORAGE};
    }

    @Override
    public boolean onCreateOptionsMenu(Menu menu) {
        getMenuInflater().inflate(R.menu.menu_main, menu);
        return true;
    }

    @Override
    public boolean onOptionsItemSelected(MenuItem item) {
        int id = item.getItemId();
        if (id == R.id.addImage) showImageImportDialog();
        if (id == R.id.settings) { /* settings (not implemented) */ }
        return super.onOptionsItemSelected(item);
    }

    private void showImageImportDialog() {
        // Let the user choose the image source: camera or gallery.
        AlertDialog.Builder dialog = new AlertDialog.Builder(this);
        dialog.setTitle("Select Image");
        dialog.setItems(new String[]{"Camera", "Gallery"}, (d, which) -> {
            if (which == 0) {
                if (!checkCameraPermission()) requestCameraPermission();
                else pickCamera();
            }
            if (which == 1) {
                if (!checkStoragePermission()) requestStoragePermission();
                else pickGallery();
            }
        });
        dialog.create().show();
    }

    private void pickGallery() {
        Intent intent = new Intent(Intent.ACTION_PICK);
        intent.setType("image/*");
        startActivityForResult(intent, IMAGE_PICK_GALLERY_CODE);
    }

    private void pickCamera() {
        ContentValues values = new ContentValues();
        image_uri = getContentResolver()
                .insert(MediaStore.Images.Media.EXTERNAL_CONTENT_URI, values);
        Intent cameraIntent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
        cameraIntent.putExtra(MediaStore.EXTRA_OUTPUT, image_uri);
        startActivityForResult(cameraIntent, IMAGE_PICK_CAMERA_CODE);
    }

    /* Check camera permission and return the result.
       In order to get a high-quality image we have to save the image to
       external storage first before inserting it into the image view,
       which is why storage permission will also be required. */
    private boolean checkCameraPermission() {
        boolean camera = ContextCompat.checkSelfPermission(this,
                Manifest.permission.CAMERA) == PackageManager.PERMISSION_GRANTED;
        boolean storage = ContextCompat.checkSelfPermission(this,
                Manifest.permission.WRITE_EXTERNAL_STORAGE) == PackageManager.PERMISSION_GRANTED;
        return camera && storage;
    }

    private void requestCameraPermission() {
        ActivityCompat.requestPermissions(this, cameraPermission, CAMERA_REQUEST_CODE);
    }

    private boolean checkStoragePermission() {
        return ContextCompat.checkSelfPermission(this,
                Manifest.permission.WRITE_EXTERNAL_STORAGE) == PackageManager.PERMISSION_GRANTED;
    }

    private void requestStoragePermission() {
        ActivityCompat.requestPermissions(this, storagePermission, STORAGE_REQUEST_CODE);
    }

    @Override
    public void onRequestPermissionsResult(int requestCode, @NonNull String[] permissions,
                                           @NonNull int[] grantResults) {
        switch (requestCode) {
            case CAMERA_REQUEST_CODE:
                if (grantResults.length > 0 &&
                        grantResults[0] == PackageManager.PERMISSION_GRANTED) pickCamera();
                else Toast.makeText(this, "permission denied", Toast.LENGTH_SHORT).show();
                break;
            case STORAGE_REQUEST_CODE:
                boolean writeStorageAccepted = grantResults.length > 0 &&
                        grantResults[0] == PackageManager.PERMISSION_GRANTED;
                if (writeStorageAccepted) pickGallery();
                else Toast.makeText(this, "permission denied", Toast.LENGTH_SHORT).show();
                break;
        }
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (resultCode == RESULT_OK) {
            if (requestCode == IMAGE_PICK_GALLERY_CODE)
                CropImage.activity(data.getData())
                        .setGuidelines(CropImageView.Guidelines.ON).start(this);
            if (requestCode == IMAGE_PICK_CAMERA_CODE)
                CropImage.activity(image_uri)
                        .setGuidelines(CropImageView.Guidelines.ON).start(this);
        }
        if (requestCode == CropImage.CROP_IMAGE_ACTIVITY_REQUEST_CODE) {
            CropImage.ActivityResult result = CropImage.getActivityResult(data);
            if (resultCode == RESULT_OK && result != null)
                mPreviewIv.setImageURI(result.getUri());
        }
    }
}
5.2.2 Activity_main.xml
<RelativeLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context="MainActivity">

    <ScrollView
        android:layout_width="match_parent"
        android:layout_height="wrap_content">

        <LinearLayout
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:orientation="vertical">

            <android.support.v7.widget.CardView
                android:layout_width="match_parent"
                android:layout_height="wrap_content"
                app:cardBackgroundColor="#fff"
                app:cardUseCompatPadding="true"
                app:cardCornerRadius="3dp"
                app:cardElevation="3dp">

                <LinearLayout
                    android:layout_width="match_parent"
                    android:layout_height="wrap_content"
                    android:orientation="vertical"
                    android:padding="5dp">

                    <TextView
                        android:text="Result"
                        android:textColor="@color/colorPrimary"
                        android:textSize="20sp"
                        android:layout_width="match_parent"
                        android:layout_height="wrap_content"/>

                    <EditText
                        android:id="@+id/resultEt"
                        android:autoLink="all"
                        android:background="@null"
                        android:padding="5dp"
                        android:textColor="#000"
                        android:layout_width="match_parent"
                        android:layout_height="wrap_content"/>

                </LinearLayout>
            </android.support.v7.widget.CardView>

            <android.support.v7.widget.CardView
                android:layout_width="match_parent"
                android:layout_height="wrap_content"
                app:cardBackgroundColor="#fff"
                app:cardUseCompatPadding="true"
                app:cardCornerRadius="3dp"
                app:cardElevation="3dp">

                <LinearLayout
                    android:layout_width="match_parent"
                    android:layout_height="wrap_content"
                    android:orientation="vertical"
                    android:padding="5dp">

                    <TextView
                        android:textColor="@color/colorPrimary"
                        android:textSize="20sp"
                        android:layout_width="match_parent"
                        android:layout_height="wrap_content"/>

                    <ImageView
                        android:id="@+id/imageIv"
                        android:layout_width="wrap_content"
                        android:layout_height="wrap_content"/>

                </LinearLayout>
            </android.support.v7.widget.CardView>

        </LinearLayout>
    </ScrollView>
</RelativeLayout>
Fig5.1: Remote Debugging lets you inspect a page running on an Android device from your
development machine.
4. Open DevTools.
5. In DevTools, click the Main Menu button, then select More tools > Remote devices.
Fig 5.2: Opening the Remote Devices tab via the Main Menu
8. Connect your Android device directly to your development machine using a USB cable.
The first time you do this, you usually see that DevTools has detected an unknown
device. If you see a green dot and the text Connected below the model name of your
Android device, then DevTools has successfully established the connection to your
device. Continue to Step 2.
Fig5.4: The Remote Devices tab has successfully detected an unknown device that is
pending authorization
9. If your device is showing up as Unknown, accept the Allow USB Debugging permission
prompt on your Android device.
Step 2: Debug content on your Android device from your development machine
2. In the Remote Devices tab, click the tab that matches your Android device model name.
At the top of this page, you see your Android device's model name, followed by its serial
number. Below that, you can see the version of Chrome that's running on the device, with
the version number in parentheses. Each open Chrome tab gets its own section. You can
interact with that tab from this section. If there are any apps using WebView, you see a
section for each of those apps, too. In Fig 5.5 there are no tabs or WebViews open.
Inspect elements
Go to the Elements panel of your DevTools instance, and hover over an element to highlight it
in the viewport of your Android device.
You can also tap an element on your Android device screen to select it in
the Elements panel. Click Select Element on your DevTools instance, and then tap the element
on your Android device screen. Note that Select Element is disabled after the first touch, so you
need to re-enable it every time you want to use this feature.
Click Toggle Screencast to view the content of your Android device in your DevTools
instance.
• Clicks are translated into taps, firing proper touch events on the device.
• To scroll, use your trackpad or mouse wheel, or fling with your mouse pointer.
Some notes on screencasts:
• Screencasts only display page content. Transparent portions of the screencast represent
device interfaces, such as the Chrome address bar, the Android status bar, or the Android
keyboard.
• Screencasts negatively affect frame rates. Disable screencasting while measuring scrolls
or animations to get a more accurate picture of your page's performance.
• If your Android device screen locks, the content of your screencast disappears. Unlock
your Android device screen to automatically resume the screencast.
Chapter 6
RESULTS AND ANALYSIS
6.1 Initial View
In the initial view of the application we can see:
1. the Image Preview tab below, which gives a preview of the image, and
2. the Result tab above it, which shows the output text. We can also see an icon in the
top-right corner, which is used to select the image.
6.2 Selecting Image
If we click on that icon, these are the options we can see. We can directly take a photo of a
printed copy using the Camera option, or pick a photo from the Gallery.
6.3 Cropping Image
After selecting the image, we can crop it to the region we want.
6.4 Output
Finally we get the output. The Result tab at the top shows the output text, and at the bottom we
can see the image that we have cropped.
6.5 Further usage of data
The resulting output is editable. As we can see from the options there, we can cut, copy, paste and
share the selected data.
6.6 Sample outputs :
The application was tested on different types of images: posters, handwritten text,
webpage screenshots, etc. The outputs of the application are given below.
[Panels (a)-(h): sample outputs of the application on different image types]
Chapter 7
CONCLUSION AND FUTURE SCOPE
7.1 Conclusion
Nowadays, a lot of documents are produced in paper form, but it is obvious that
automatic data recognition systems are very popular. A document is repeatedly copied and
changed during subsequent processing steps, so it exists in many different copies. In some
applications such systems can successfully help humans, but in some cases they are useless.
The processing of digital images is faster and more cost-effective. One needs less time for
processing, as well as less film and other photographing equipment. It is also more ecological:
no processing or fixing chemicals are needed to take and process digital images.
In this work we surveyed many techniques which are necessary to implement an image-to-text
as well as a text-to-speech system. Our contribution towards this work will surely be helpful for blind
as well as physically disabled people of our society. This is a small help from our side to make
such people interact more easily with the real world. More focus is on the recognition of objects
in an image, which will ultimately result in identifying important objects from an image. This
report contains an abstract view of various techniques proposed in recent years for image-to-text
conversion and text-to-speech conversion.
7.2 Future Work
We can further extend the project to recognize the scripts of other languages. That would help
digitize many sacred and important ancient books, Upanishads, novels, holy books, etc.
The project can be implemented on an intranet in the future. It can also be updated as and
when new requirements arise, as it is very flexible in terms of expansion. With the proposed
software fully functional, the client is able to manage and hence run the entire work in a much
better, more accurate and error-free manner.
The work reported in this thesis can be extended in the following directions.
1. Font-Independent OCR: An optical character recognition system could be developed by
considering the multiple font styles in use. Our approach is very useful for the font-independent
case because, for any font or character size, it finds the string, and the strings are parsed to
recognize the characters. Once a character is identified, the corresponding character could be
produced through an efficient editor. Efforts have been taken to develop a compatible editor for
Tamil and English.
2. OCR for Tamil and Other Indian Languages: Except for Bangla and Hindi, all other Indian
languages require the development of an OCR for printed characters, and for handwritten
characters an OCR has to be developed for all languages (including Bangla and Hindi). Of
course, OCR for printed characters is easy when compared to handwritten cursive scripts. Even
for printed document recognition, an OCR should be able to perform, besides character
recognition, features such as spell checking and sentence and grammar checking; an editor with
keyboard encoding and font encoding is also required. With this approach, printed and
handwritten characters are recognizable easily, with less effort and more accuracy. A module for
skew correction and line separation, and for word and character separation, along with an editor
with spell checker and grammar checker, could be designed for developing a complete OCR.
Further, with a little fine-tuning of these modules, a complete OCR could be designed for the
handwritten scripts of any language. It is proposed to apply the approach to manuscript
recognition for all South Indian languages. Since some characters in some of the languages are
similar (Tamil and Malayalam share features among a few characters, and Telugu and Kannada
are similar for most characters), our approach could be applied to these languages and extended
to all other languages.
3. Cursive-Character OCR: There is heavy demand for an OCR system which recognizes
cursive scripts and manuscripts such as palm leaves. This actually avoids keyboard typing and
font encoding too. Steps have been taken in our laboratory to develop an OCR for handwritten
Tamil characters.
4. Language Converter through OCR: Once a complete OCR has been developed for two
languages with font encoding, spell checking and grammatical sentence checking, a converter
could be implemented to convert sentences from one language to another.
LIST OF FIGURES
Fig 5.1: Remote Debugging lets you inspect a page running on an Android device from your development machine
Fig 5.2: Opening the Remote Devices tab via the Main Menu
Fig 5.3: The Discover USB Devices checkbox is enabled
Fig 5.4: The Remote Devices tab has successfully detected an unknown device that is pending authorization
Fig 5.5: A connected remote device
Fig 6.1: Initial View of the Application
Fig 6.2: Selecting an Image in the Application
Fig 6.3: Cropping the Image in the Application
Fig 6.4: Output of the Application
Fig 6.5: Further Usage of Data
Fig 6.6: Outputs on different types of images
ABBREVIATIONS
REFERENCES
[1] Benjamin Z. Yao, Xiong Yang, Liang Lin, Mun Wai Lee and Song-Chun Zhu, "I2T: Image Parsing to Text Description," IEEE Conference on Image Processing, 2008.
[2] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta and R. Jain, "Content-based image retrieval at the end of the early years," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, 2000.
[3] Y. Rui, T. S. Huang, and S. F. Chang, "Image retrieval: Current techniques, promising directions, and open issues," Journal of Visual Communication and Image Representation, vol. 10, 1999.
[4] M. S. Lew, N. Sebe, C. Djeraba, and R. Jain, "Content-based multimedia information retrieval: State of the art and challenges," ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 2, no. 1, pp. 1-19, Feb. 2006.
[5] C. Snoek and M. Worring, "Multimodal video indexing: A review of the state-of-the-art," Multimedia Tools and Applications, vol. 25, no. 1, 2005.
[6] R. Datta, D. Joshi, J. Li, and J. Z. Wang, "Image retrieval: Ideas, influences, and trends of the new age," ACM Computing Surveys, vol. 40, no. 2, pp. 1-60, Apr. 2008.
[7] Yi-Ren Yeh, Chun-Hao Huang, and Yu-Chiang Frank Wang, "Heterogeneous Domain Adaptation and Classification by Exploiting the Correlation Subspace," IEEE Transactions on Image Processing, vol. 23, no. 5, May 2014.
[8] S. Shahnawaz Ahmed, Shah Muhammed Abid Hussain and Md. Sayeed Salam, "A Novel Substitute for the Meter Readers in a Resource Constrained Electricity Utility," IEEE Transactions on Smart Grid, vol. 4, no. 3, Sept. 2013.
[9] A. Abdollahi, M. Dehghani and N. Zamanzadeh, "SMS-based reconfigurable automatic meter reading system," in Proc. 16th IEEE Int. Conf. Control Applications (part of IEEE Multi-Conf. Systems and Control), Singapore, Oct. 1-3, 2007, pp. 1103-1107.
[10] Fan-Chieh Cheng, Shih-Chia Huang and Shanq-Jang Ruan, "Illumination-Sensitive Background Modeling Approach for Accurate Moving Object Detection," IEEE Transactions on Broadcasting, vol. 57, no. 4, Dec. 2011.
[11] Iasonas Kokkinos and Petros Maragos, "Synergy between Object Recognition and Image Segmentation using the Expectation-Maximization Algorithm," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 8, Aug. 2009.
[12] T. Cootes, G. J. Edwards and C. Taylor, "Active Appearance Models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 681-685, June 2001.
[13] Mina Makar, Vijay Chandrasekhar, Sam S. Tsai, David Chen and Bernd Girod, "Interframe Coding of Feature Descriptors for Mobile Augmented Reality," IEEE Transactions on Image Processing, vol. 23, no. 8, Aug. 2014.
[14] Mansi Shah and Gordhan B. Jethava, "… Character Recognition," Department of Computer Science & Engineering, Parul Institute of Technology, and Information Technology Department, Parul Institute of Engg. & Technology, Gujarat, India.
Appendices
A. https://www.slideshare.net/IAMINURHEARTS1/ocr-ppt-35272335
B. https://www.sciencedirect.com/topics/engineering/image-processing
C. https://shodhganga.inflibnet.ac.in/bitstream/10603/36771/16/16_chepter%207.pdf
D. https://shodhganga.inflibnet.ac.in/bitstream/10603/9849/11/11_chapter%206.pdf
E. https://mail.google.com/mail/u/1/#sent/QgrcJHsbgZXbhtDcSmXrbzQHVWVwQLSjZxl?projector=1&messagePartId=0.1
F. https://www.slideshare.net/avisek_roy91/digital-image-processing-12632314#:~:text=Conclusion%20The%20processing%20of%20images,take%20and%20process%20digital%20images.
G. http://cas.sdss.org/DR6/en/proj/advanced/processing/conclusion.asp
H. https://shodhganga.inflibnet.ac.in/bitstream/10603/176215/13/14_chapter%205.pdf
Student Bio-Data :
1.
Name: E.B. Meghana
Father Name: E.B. Ravi teja goud
Roll. No: 1610116
Date of Birth: 26/09/1998
Nationality: Indian
Communication Address:
Town/Village: Veldurthi Mandal: Veldurhi District: Kurnool
PIN Code: 518216
Ph. No: 8341200889
e-mail: meghanabashyam@gmail.com
Permanent Address:
Town/Village: Veldurthi Mandal: Veldurhi District: Kurnool
PIN Code: 518216
Ph. No: 8341200889
e-mail: meghanabashyam@gmail.com
Qualifications:
Degree: Bachelor of Technology
Branch: Computer Science and Engineering
Technical Skills:
Languages : C, Java, Html, Css
Softwares : Android Studio, Eclipse IDE, Star UML, R Studio
Basic Computer Skills : MS Office, Power point, DB
Management
Advanced Computer Skills : Web Development, Data
Structures, Coding, Debugging
Area of Interest:
Big Data, Data Base Management, Web Development
2.
Name: G. Mahalakshmi
Father Name: G. Ramakrishna
Roll. No: 1610121
Date of Birth: 19/07/1999
Nationality: Indian
Communication Address:
Town/Village: Veldurthi Mandal: Veldurhi District: Kurnool
PIN Code: 518216
Ph. No: 7386053083
e-mail: mahalakshmig1967@gmail.com
Permanent Address:
Town/Village: Veldurthi Mandal: Veldurhi District: Kurnool
PIN Code: 518216
Ph. No: 7386053083
e-mail: mahalakshmig1967@gmail.com
Qualifications:
Degree: Bachelor of Technology
Branch: Computer Science and Engineering
Technical Skills:
Languages : C, Java, Html, Css
Softwares : Android Studio, Eclipse IDE, Star UML, R
Studio
Basic Computer Skills : MS Office, Power point, DB
Management
Advanced Computer Skills : Web Development, Data
Structures, Coding, Debugging
Area of Interest:
Big Data, Data Base Management, Web Development.