
AUTOMATICALLY DETECT AND RECOGNIZE TEXT IN

NATURAL IMAGES USING MATLAB


Mini-project Report submitted for the award of the degree of
Bachelor of Engineering
in
ELECTRONICS & COMMUNICATION ENGINEERING
(OSMANIA UNIVERSITY, HYDERABAD)
By
RUCHITHA REDDY T. H.T. No. 2451-19-735-107
MANISHA REDDY K. H.T. No. 2451-19-735-111
TWINKLE LUDHANI H.T. No. 2451-19-735-116
Under the Guidance of
T. KAVITHA
Associate Professor, Dept. of Electronics and Communication Engineering,
Maturi Venkata Subba Rao Engineering College, Hyderabad.

Department of Electronics & Communication Engineering

MATURI VENKATA SUBBARAO (MVSR)

ENGINEERING COLLEGE

(Sponsored by Matrusri Education Society, Estd. 1981)


(Affiliated to OU : Accredited by NBA & NAAC)
February 2020-2021

1|Page
CERTIFICATE

This is to certify that the Project Report entitled “AUTOMATICALLY DETECT AND
RECOGNIZE TEXT IN NATURAL IMAGES USING MATLAB” is the bonafide record of the
project work carried out under my guidance and supervision by

RUCHITHA REDDY.T R.NO 2451-19-735-107

MANISHA REDDY.K R.NO 2451-19-735-111

TWINKLE LUDHANI R.NO 2451-19-735-116

in partial fulfillment of the requirements for the award of Bachelor of Engineering in


Electronics and Communication Engineering and is submitted in the Department of
Electronics and Communication Engineering, Maturi Venkata Subba Rao Engineering
College, Hyderabad.

Project Guide Head of the Dept.

Mrs. T. KAVITHA Dr. S. P. Venu Madhav Rao

Assoc. Prof., ECED Professor, ECED
Maturi Venkata Subba Rao Maturi Venkata Subba Rao
Engineering College Engineering College

CANDIDATE'S DECLARATION

We hereby certify that the work which is being presented in the report entitled
“AUTOMATICALLY DETECT AND RECOGNIZE TEXT IN NATURAL IMAGES USING MATLAB”
in partial fulfilment of the requirements for the award of the degree of B.E. (Electronics &
Communication Engineering), submitted in the Department of Electronics & Communication
Engineering at MATURI VENKATA SUBBA RAO (MVSR) ENGINEERING COLLEGE under OSMANIA
UNIVERSITY, Hyderabad, is an authentic record of our own work carried out under the
supervision of Mrs. T. KAVITHA. The matter presented in this report has not been submitted by us
to any other University / Institute for the award of any degree.

Signature of the Student

RUCHITHA REDDY .T

MANISHA REDDY.K

TWINKLE LUDHANI

This is to certify that the above statement made by the candidate is correct to the best of
my knowledge.

Signature of the Guide

Mrs. T. KAVITHA,
Assoc. Prof., ECED

ACKNOWLEDGEMENTS

This is an acknowledgement of the intensive drive and technical competence of many
individuals who have contributed to the success of our project.

It is with sincere gratitude that we express our profound thanks to our guide,
Mrs. T. KAVITHA, Assoc. Prof., Department of Electronics and Communication
Engineering, for her valuable guidance and support. She has been a constant source of
encouragement and inspiration for us in completing this work.

We express our sincere thanks to Dr. G. Kanaka Durga, Principal, MVSR Engineering College,
for facilitating us to carry out our project work inside the campus and for providing us with
all the necessary facilities and for her constant encouragement.

Finally, we express our sincere thanks to our family members for their continuous co-
operation and encouragement extended during the project work.

RUCHITHA REDDY.T H.T. No. 2451-19-735-107

MANISHA REDDY.K H.T. No. 2451-19-735-111

TWINKLE LUDHANI H.T.NO. 2451-19-735-116

ABSTRACT

Text characters in natural scenes and surroundings provide us with valuable information
about a place and sometimes even important legal information. However, it is not easy to
recognize such text because of distortions, so it is important to detect and recognize it
automatically.

Automated text detection is a difficult computer vision task. In order to accurately
detect and identify text in an image, two major problems must be addressed. The primary
problem is implementing a strong and reliable method to distinguish between text and non-
text regions in images. Part of the difficulty stems from the unlimited combinations of fonts,
lighting conditions, distortions and other variations that can be found in images.

This project explores key properties of some popular and proven methods of text
detection: maximally stable extremal regions (MSER), geometric properties and stroke width
variation. To detect text regions, some pre-processing is first applied to the natural image;
after MSERs are detected, geometric properties and stroke width variation are used to
remove the non-text regions. The text regions are then merged to obtain a final bounding
box, and finally OCR is performed on the selected region, i.e. on the final bounding box.

This text detection and recognition helps the visually impaired, as it gives them a
computerized aid, and it also helps in automatic licence plate checking systems.

LIST OF CONTENTS

List of Figures 8

CHAPTER 1 – INTRODUCTION
1.1 Objective Of The Project 10

CHAPTER 2 – LITERATURE SURVEY
2.1 Existing Methods And Backgrounds 12

CHAPTER 3 – THEORETICAL ASPECTS OF PROJECT
3.1 Introduction 14
3.2 Maximally Stable Extremal Regions (MSER) 14
3.3 Stroke Width Transform (SWT) 15
3.4 Optical Character Recognition (OCR) 17
3.5 Applications Of The Project 18
3.6 Advantages 18
3.7 Disadvantages 19

CHAPTER 4 – BLOCK SCHEMATIC OF THE PROJECT
4.1 Block Diagram 20
4.2 Algorithm / Flowchart 21
4.3 Implementation 22

CHAPTER 5 – OVERVIEW OF THE SOFTWARE TOOL USED
5.1 Introduction To MATLAB 29
5.2 Features Of MATLAB 30

CHAPTER 6 – RESULTS 32

CHAPTER 7 – CONCLUSION AND FUTURE SCOPE
7.1 Conclusion 35
7.2 Future Scope 35

REFERENCES 36

LIST OF FIGURES

Figure No Figure Page No
1 Maximally stable extremal regions (MSER) 15
2 Stroke Width Transform (SWT) 16
3 Optical character recognition (OCR) 18
4 Block diagram 20
5 Algorithm / Flowchart 21
6 Preprocessing the image 22
7 Detect candidate regions using MSER 23
8 Removing non-text regions (geometric) 24
9 Removing non-text regions (stroke width) 25
10 Removing non-text regions (stroke width) 26
11 Merge text regions 27
12 Merge text regions 28
13 Recognize detected text using OCR 28
14 Introduction to MATLAB 29
15 Features of MATLAB 31
16 Results 32
17 Results 34

CHAPTER 1 – INTRODUCTION

Text detection in natural images plays a vital role in the fields of artificial intelligence,
augmented reality and other innovations. It helps to identify text even in noisy images.
Nonetheless, it is a difficult problem because of the variability in imaging conditions, for
example lighting, specular reflections, noise, blur, and the presence of occlusions over the
text, and because of the variability of the text itself, for example its scale, orientation, font,
and style. Good text detection algorithms should therefore be robust against such
variations. Text detection also plays an important role in everyday life, as it is used in many
vision-type applications.

Currently, text detection faces difficulties such as background complexity, different
orientations of the text, diversity of scene text, and interference factors. To overcome these
difficulties, CNNs (convolutional neural networks) and related methods are used, and in
some cases valley and ridge techniques are used. The goal is to detect the text and increase
the accuracy level by implementing different methods for text detection and classification.

Natural images contain much technical and digital information used in different
fields of computer vision. In recent years, visual detection and recognition of text from
images has become desirable because of its applications in content-based image searching,
automatic number plate recognition, extracting passport, business card or bank statement
information, converting handwriting to real-time control of a computer, etc. But owing to
irregular backgrounds and variations in font style, size, colour, orientation, and geometric
and photometric distortion, text must be robustly detected from any natural scene image.

Text detection and recognition is a mandatory requirement for many content-based
image analysis applications. The recognition part is handled, for most languages, by Optical
Character Recognition (OCR) engines with high accuracy by employing deep learning-based
approaches. However, OCR engines work quite poorly when they are given whole images, as
reported in the literature. Therefore, in order to detect text regions in an image, elimination
of non-text regions is one way to approach the problem. Both natural and computer-generated
images have some specific issues to deal with.

Natural images may be cluttered and full of object scenes. In contrast, computer-generated
images usually have a wide range of colour schemes and flashy graphics, especially if taken
from a high-resolution movie or a video game. Our approach utilizes Maximally Stable
Extremal Regions (MSERs) first. Although MSER has a disadvantage in blurry images, it is
quite powerful at detecting stable regions in an image by applying a series of thresholds.
Then, separating text and non-text regions using some metrics related to stroke widths is
the critical part of the system. These metrics mostly benefit from the geometric attributes
of text and non-text regions. After these processes, the expectation is to have only text
characters, since only these regions in an image have constant stroke values. Combining this
with connecting the close components lets us acquire words and text groups in the end. In
case there are still non-text groups, as a final stage, OCR engine assistance is applied as a
decision maker. Although this OCR engine elimination step has been proposed before,
combining it with the previous steps we mentioned is our main contribution in this work.

1.1 Objective Of The Project:

The objective of the project is to detect text in an image and recognize it. In recent
decades, detecting text in complex natural scenes has been a hot topic in computer vision,
since text in images provides much semantic information that helps humans understand the
environment. Reading text is challenging because of the variations in images. Text detection
is useful for many navigational purposes, e.g. text on Google APIs and traffic panels.
Moreover, text detection is a prerequisite for several purposes, such as content-based image
analysis, image retrieval, etc. By detecting the text from images such as sign boards, we can
warn a driver in advance.
CHAPTER 2 – LITERATURE SURVEY

2.1 Existing Methods And Background

As a prerequisite for this project, we gathered information from the following
papers:

In this paper [1] the researchers proposed an innovative algorithm to find the value of stroke
width in natural images. This algorithm helps to detect many fonts and languages. It
includes pre-processing, extraction or text localization, classification and character detection.
The different classification methods used are SVM, AdaBoost, CNN, Text-CNN etc. This paper
provides a detailed study of the evolution of text detection in natural images. It compares,
analyses and also discusses the different methods to overcome existing challenges in text
detection.

In this paper [2] the researchers used a hybrid technique for enhancement of the image, a
robust technique for background subtraction, a supervised machine learning algorithm that
maps an input to an output based on example input-output pairs, a Gaussian filter for
removing noise, and histogram of oriented gradients (HOG) for detection of the object,
together with connected component extraction, finding connected component groups, an
SVM classifier, classification of CCs, and CRF-based post-processing.

In this paper [3] they used the methods of image pre-processing, edge detection for finding
the edges, the stroke width transform for finding stroke width, connected component
formation, filtering for removing noise, feature extraction to extract the feature
components, and finally classification using a random forest.

The researchers in [4] used methods such as a classification scheme for text detection and
character recognition, component linking and word partition, error correction in character
recognition, case disambiguation and training data. They also used connected component
extraction, finding connected component groups, an SVM classifier, classification of CCs,
and CRF-based post-processing.

In this paper [5] the researchers first extract the ‘and valley image’ (AVI) and ‘and ridge
image’ (ARI) from the input images. The components of the characters are detected from
the AVI and ARI, and finally the text in each image is found. Text is an important way of
conveying information, and several methods have been proposed to extract text from
images. One of the major difficulties is retrieving indoor and outdoor texts.

In this research journal they implemented methodologies such as edge segmentation using
the stroke width transform, colour description for describing the colours, edge merging for
combining the edges, and edge classification.

In this paper, new techniques and algorithms were introduced for the OCR (optical
character recognition) system, and many methods were applied to improve its accuracy. It
mainly considers old-font, i.e. historical, text. Different learning systems, such as neural
networks and expert systems, can also be used.

In this paper, a text region detection method (HOG) is used to detect the text, along with
image segmentation (Niblack's local binarization algorithm), an image centroid and zone
(ICZ) based distance metric feature extraction system, a zone centroid and zone (ZCZ) based
distance metric feature extraction system, and an ICZ+ZCZ based distance metric feature
extraction system.

CHAPTER 3 - THEORETICAL ASPECTS OF PROJECT

3.1 Introduction:

Automated text detection is employed in modern software systems to perform tasks such
as detecting landmarks in images and video, surveillance, visual sign translation in mobile
apps, etc. Many of these applications require a robust automated text detection system that
can accurately detect text on signs in natural and unnatural images.

The task of extracting text data in a machine-readable format from real-world images is
one of the challenging tasks in the computer vision community. Reading the text in natural
images has gained a lot of attention due to its practical applications in updating inventory,
analysing documents, scene understanding, robot navigation, and image retrieval. Although
there is significant progress in text detection and text recognition fields, this problem is still
challenging due to the complexity of the natural scene images.

3.2 Maximally stable extremal regions (MSER) :

In computer vision, maximally stable extremal regions (MSER) are used as a method
of blob detection in images. This technique was proposed by Matas et al. to find
correspondences between image elements from two images with different viewpoints. This
method of extracting a comprehensive number of corresponding image elements
contributes to wide-baseline matching, and it has led to better stereo matching and
object recognition algorithms. It basically finds regions that remain the same under a wide
range of threshold values. While thresholds are being applied, groups of connected
components form the set of all regions. These regions are extremal, since the pixels inside a
region have either higher or lower intensities than those outside it. All regions with
minimum variation across different thresholds are defined as maximally stable.

The MSER algorithm has been used in text detection by Chen by combining MSER
with Canny edges. Canny edges are used to help cope with the weakness of MSER to blur.
MSER is first applied to the image in question to determine the character regions. To
enhance the MSER regions, any pixels outside the boundaries formed by Canny edges are
removed. The separation provided by the edges greatly increases the usability of
MSER in the extraction of blurred text. An alternative use of MSER in text detection is the
work by Shi using a graph model. This method again applies MSER to the image to generate
preliminary regions. These are then used to construct a graph model based on the position
distance and colour distance between each MSER, which is treated as a node. Next, the nodes
are separated into foreground and background using cost functions. One cost function
relates the distance from the node to the foreground and background. The other penalizes
nodes for being significantly different from their neighbours. When these are minimized, the
graph is cut to separate the text nodes from the non-text nodes. To enable text
detection in a general scene, Neumann uses the MSER algorithm in a variety of projections.
In addition to the greyscale intensity projection, he uses the red, blue, and green colour
channels to detect text regions that are colour-distinct but not necessarily distinct in
greyscale intensity. This method allows for the detection of more text than solely using the
MSER+ and MSER- functions discussed above. Sample regions found by MSER are shown
below.

Figure 1 – Detection of MSER regions

3.3 Stroke Width Transform (SWT):

SWT is a local image operator which computes the most likely stroke width for each
pixel included in the corresponding stroke. Normally, this transformation exposes stroke
information at each pixel, and pixels are grouped by the similarity of their stroke width
values. Here, however, the stroke width values of candidate regions are utilized to eliminate
non-text regions. Stroke widths are found as follows. In the edge map of the original
image, each pixel that lies on an edge point is connected with the opposite side of the
stroke. For this, a direction is selected by the gradient of the starting pixel. However, to
obtain the correct side of the stroke (inside or outside), both sides should be traversed,
since we do not know whether the text is written in a dark colour on a light background or
vice versa. Two edges are then connected if their gradients are almost in opposite
directions; usually a difference of π/6 is sufficient. In the end, a set of stroke width values is
acquired for a region. For a typical text region, the very first parameter that comes to mind
is the variance, and it is expected to be low. All parameters related to stroke widths (SW)
are used to eliminate non-text regions.

Count indicates the total number of stroke widths found in a region, σ is their standard
deviation, and σ² is their variance. Width is the accepted stroke width of the region, which is
the most common value for that region, and size is max(width, height) of the region. Figure
4 shows the final state of all regions after the geometric and SW eliminations. There are still
too many unwanted regions left; most of these regions come from the foliage area.

Figure 2 – Stroke width variation

Implementation of the SWT. (a) A typical stroke; the pixels of the stroke in this example are
darker than the background pixels. (b) p is a pixel on the boundary of the stroke. Searching
in the direction of the gradient at p leads to finding q, the corresponding pixel on the other
side of the stroke. (c) Each pixel along the ray is assigned the minimum of its current
value and the found width of the stroke.

3.4 Optical character recognition or optical character reader (OCR):

OCR is the electronic or mechanical conversion of images of typed, handwritten or
printed text into machine-encoded text, whether from a scanned document, a photo of a
document, a scene photo (for example the text on signs and billboards in a landscape
photo) or from subtitle text superimposed on an image (for example, from a television
broadcast).[1]

Widely used as a form of data entry from printed paper data records – whether passport
documents, invoices, bank statements, computerized receipts, business cards, mail,
printouts of static-data, or any suitable documentation – it is a common method of digitizing
printed texts so that they can be electronically edited, searched, stored more compactly,
displayed on-line, and used in machine processes such as cognitive computing, machine
translation, (extracted) text-to-speech, key data and text mining. OCR is a field of research in
pattern recognition, artificial intelligence and computer vision.

Early versions needed to be trained with images of each character, and worked on one font
at a time. Advanced systems capable of producing a high degree of recognition accuracy for
most fonts are now common, with support for a variety of digital image file formats as
input.[2] Some systems are capable of reproducing formatted output that closely
approximates the original page, including images, columns, and other non-textual
components. As can be observed, there may still be unwanted regions left if the image
contains foliage or small-shaped objects. However, the characteristics of these regions are
irregular and random. So, as a final step, giving these regions to an OCR engine will
hopefully leave us with only text regions. The Tesseract OCR engine is used for this
assistance. Regions that give no character results are eliminated. Images in which not too
many regions are left at this stage may not require OCR assistance. After this process, the
final regions left are surrounded by bounding boxes.

Figure 3 – OCR detection

3.5 Applications of the project:

The applications of this project are-

1. Automatic car number plate recognition,


2. Extracting passport or business card or bank statement information,
3. Book cover recognition,
4. Computerized aid for visually impaired,
5. Automatic detection of street name, location, traffic warning, and name of
commercial goods, and
6. Converting handwriting into real time control of computer.

3.6 Advantages:

The advantages of this project are-

 A paper-based form can be converted into an electronic form, which is straightforward
to store or send by mail.

 It is cheaper than paying someone to manually enter a great deal of text data.
Moreover, it takes less time to convert it into electronic form.

3.7 Disadvantages:

The disadvantages of this project are-

 A lot of space is required by the image produced.
 The quality of the image can be lost during this process.
 It fails on some natural scene images which have very poor contrast text and strong
illumination.

CHAPTER 4 - BLOCK SCHEMATIC OF THE PROJECT
The project of image text detection and recognition consists of several steps: finding
regions with MSER features, geometric elimination, finding stroke widths with SWT, and
connecting characters to obtain text groups. Although MSER and SWT are known methods,
they are explained briefly in the following sub-sections. For geometric elimination and
connecting characters, there are various parameters and some intermediary steps that are
contributed in this study.

4.1 Block diagram:


In natural scenes, text is generally found on nearby sign boards or other objects.
The extraction of such text is difficult because of noisy backgrounds and diverse fonts and
text sizes. The method of text extraction is divided into two processes:

(1) Text detection – the process of localizing the regions of the scene which contain
text. It helps in removing most of the non-text regions, which act as noise during
the extraction of the required text.
(2) Text recognition – the process of converting pixel-based text (image text) into
readable code. Its main purpose is to distinguish between different text types
and compose them properly.

Figure 4 – Block diagram to acquire the text

4.2 Algorithm / Flowchart:
The text detection application created for this research project implements the
following workflow, which performs pre-processing, text detection in a given image
and optical character recognition when possible.

1) Pre-process images. This includes converting images to high-contrast images, converting
colour images to grayscale images and rotating images (or the bounding boxes themselves)
as needed.

2) Process each image with MSER feature detector algorithm.

3) Perform first pass of non-text region filtering on the features detected by MSER algorithm
using discriminating geometric properties.

4) Perform second pass of non-text region filtering on the remaining features using stroke
width variation filtering.

5) On the remaining regions (text regions), calculate the overlapping bounding boxes of
these regions and combine them to create one bounded region of text.

6) Detect the text in the image and pass the detected text bounding boxes to an OCR
algorithm to determine the text that was found in the image.
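As a rough sketch, the workflow above can be expressed in MATLAB (the Computer Vision Toolbox and Image Processing Toolbox are assumed; the file name 'sign.jpg' and all parameter values are illustrative placeholders, not the exact values used in the project):

```matlab
% End-to-end sketch of the workflow (each step is detailed in Section 4.3).
colorImage = imread('sign.jpg');                 % hypothetical input image
grayImage  = rgb2gray(colorImage);               % 1) pre-processing

% 2) MSER feature detection
[mserRegions, mserConnComp] = detectMSERFeatures(grayImage, ...
    'RegionAreaRange', [200 8000], 'ThresholdDelta', 4);

% 3) geometric properties of each MSER region
mserStats = regionprops(mserConnComp, 'BoundingBox', 'Eccentricity', ...
    'Solidity', 'Extent', 'EulerNumber', 'Image');

% 3)-5) filter non-text regions and merge the surviving bounding boxes
% (see Steps 3-5 in Section 4.3 for the filtering and merging details)

% 6) OCR on the final merged bounding box(es), e.g.:
% ocrResult = ocr(grayImage, textBBoxes);
```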

Figure 5 – Flowchart for text detection and recognition

4.3 Implementation:
There are six steps in detecting and recognizing text from an image:

STEP 1 – Pre-processing the image:

The image is pre-processed, i.e. the colour image is converted into a grayscale image, and
images with low contrast are converted into high-contrast images.

Let's take an example for easy understanding.
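A minimal pre-processing sketch in MATLAB might look as follows ('sign.jpg' is a placeholder file name, and imadjust is one possible way to raise the contrast):

```matlab
% Step 1: read the image, convert to grayscale, stretch the contrast.
colorImage = imread('sign.jpg');     % hypothetical input file
grayImage  = rgb2gray(colorImage);   % colour image  -> grayscale image
grayImage  = imadjust(grayImage);    % low contrast  -> high contrast
imshow(grayImage)
```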

Figure 6

STEP 2 – Detecting text regions using MSER:

The text detection application implemented for this project uses MSER to perform the initial
region detection in images. MSER is a popular feature detection algorithm that detects
stable, connected component regions of gray-level images called MSERs. Though MSER is
sensitive to noise and blur, it offers important advantages over other region-based detection
algorithms, as it provides affine invariance, stability, and highly repeatable results.

Since text characters generally have consistent colour, we begin by finding regions of
similar intensity in the image using MSER. The figure below shows the MSER regions
detected.
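MSER detection can be sketched with the Computer Vision Toolbox function detectMSERFeatures; the RegionAreaRange and ThresholdDelta values shown are illustrative and may need tuning per image:

```matlab
% Step 2: detect candidate text regions with MSER.
% grayImage is the pre-processed image from Step 1.
[mserRegions, mserConnComp] = detectMSERFeatures(grayImage, ...
    'RegionAreaRange', [200 8000], 'ThresholdDelta', 4);

% Visualize the detected regions on the grayscale image.
figure
imshow(grayImage)
hold on
plot(mserRegions, 'showPixelList', true, 'showEllipses', false)
title('MSER regions')
hold off
```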

Figure 7
As we can see in the above image, the image has been converted to grayscale and the
MSER regions have been detected. We can also observe that some non-text regions have
been detected as well. To remove these non-text regions we use (1) geometric properties
and (2) stroke width variation.

STEP 3 – Removing non-text regions / filtering based on geometric properties:

Although the MSER algorithm picks out most of the text, it also detects many other stable
regions in the image that are not text. A rule-based approach, i.e. geometric properties,
can be used to filter out the non-text regions using simple thresholds.

The geometric property thresholds selected for filtering non-text regions vary from
image to image and are often affected by characteristics such as font style, image distortions
(blur, skew, etc.) and textures, to name a few. These property values are a fast and easy way
to discern non-text regions from text features in images.

There are several geometric properties that are good for discriminating between text and
non-text regions. The properties used in the system implemented for this project are –

(1) Aspect ratio – the ratio of the width to the height of a bounding box.

(2) Eccentricity – used to measure the circular nature of a given region. It is defined as the
ratio of the distance between the foci of the region's equivalent ellipse to the length of its
major axis.

(3) Solidity – the ratio of the pixels in the convex hull area that are also in a given region. It
is calculated by

Solidity = Area / Convex Area

(4) Extent – the ratio of the region's area to the area of the bounding rectangle that
encloses it.

(5) Euler Number – a feature of a binary image. It is the number of connected components
minus the number of holes (in those components).
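Assuming mserConnComp from Step 2, the geometric filtering can be sketched as follows; all threshold values below are illustrative starting points, not tuned constants:

```matlab
% Step 3: measure geometric properties and filter non-text regions.
mserStats = regionprops(mserConnComp, 'BoundingBox', 'Eccentricity', ...
    'Solidity', 'Extent', 'EulerNumber', 'Image');

% Aspect ratio from the bounding boxes ([x y width height] per region).
bbox = vertcat(mserStats.BoundingBox);
aspectRatio = bbox(:,3) ./ bbox(:,4);

% Mark regions that violate any of the geometric thresholds.
filterIdx = aspectRatio' > 3;                                    % long, thin regions
filterIdx = filterIdx | [mserStats.Eccentricity] > 0.995;        % nearly line-like
filterIdx = filterIdx | [mserStats.Solidity] < 0.3;              % sparse regions
filterIdx = filterIdx | [mserStats.Extent] < 0.2 ...
                      | [mserStats.Extent] > 0.9;                % box fill ratio
filterIdx = filterIdx | [mserStats.EulerNumber] < -4;            % too many holes

% Remove the marked (non-text) regions.
mserStats(filterIdx) = [];
mserRegions(filterIdx) = [];
```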

The image below shows the text region after removing non-text regions based on geometric
properties.

Figure 8
STEP 4 – Removing non-text regions / filtering based on stroke width variation:

Stroke width variation is another common metric used to distinguish between text and non-
text regions. The main idea behind using stroke width to filter non-text regions is the fact
that text, unlike most other elements in an image, usually has nearly constant stroke width.
This means that the binary images can be transformed into stroke width images, i.e.
skeleton images, and these skeleton images are used to calculate the stroke width variation.

Figure 9
In the image shown above, the stroke width image has very little variation over most of the
region. This indicates that the region is most likely a text region, because the lines and
curves that make up the region all have similar widths.

In order to use stroke width variation to remove non-text regions using a threshold value,
the variation over the entire region must be quantified into a single metric as follows:

Stroke Width Variation Metric = Standard Deviation of Stroke Width Values / Mean of Stroke Width Values
Then, a threshold can be applied to remove the non-text regions. Note that this threshold
value may require tuning for images with different font styles. The image below shows the
text region after removing non-text regions based on stroke width variation.
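One way to compute this metric for a single region, using its binary image from regionprops, is to thin the region to a skeleton and read stroke widths off the distance transform (the 0.4 threshold is an illustrative value):

```matlab
% Step 4: stroke width variation for one candidate region.
regionImage = mserStats(1).Image;            % binary image of the region
regionImage = padarray(regionImage, [1 1]);  % pad so the border is background

distanceImage = bwdist(~regionImage);              % distance to background
skeletonImage = bwmorph(regionImage, 'thin', inf); % skeleton of the strokes

% Stroke widths are the distance values sampled along the skeleton.
strokeWidthValues = distanceImage(skeletonImage);
strokeWidthMetric = std(strokeWidthValues) / mean(strokeWidthValues);

% Keep the region only if its stroke width variation is small.
strokeWidthThreshold = 0.4;
isTextRegion = strokeWidthMetric < strokeWidthThreshold;
```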

Figure 10
STEP 5 – Calculating the overlapping bounding boxes and merging them to create one
bounded region:

Here, all the detection results are composed of individual text characters. To use these
results for recognition tasks, the individual text characters must be merged into words or
text lines. This enables recognition of the actual words in an image, which carry more
meaningful information than just the individual characters; for example, the string ‘HELP’
vs. the set of individual characters (‘E’,’H’,’P’,’L’), where the meaning of the word is lost
without the correct ordering.

To merge individual text regions into words or text lines we need to first find neighboring
text regions and form a bounding box around these regions. This makes the bounding box of
neighboring text regions overlap such that the text regions that are part of the same word
or text line form a chain of overlapping bounding boxes as shown in the figure.

Figure 11
Now the overlapping bounding boxes can be merged together to form a single bounding box
around individual words or text lines. To do this we compute the overlap ratio between all
bounding box pairs. This quantifies the distance between pairs of text regions, so that it is
possible to find groups of neighbouring text regions by looking for non-zero overlap ratios.
Once the pair-wise overlap ratios are computed, we use a graph to find all the text regions
‘connected’ by a non-zero overlap ratio.

The output is the index of the connected text group to which each bounding box belongs.
We use these indices to merge multiple neighbouring bounding boxes into a single bounding
box by computing the minimum and maximum extents of the individual bounding boxes
that make up each connected component.

Finally, before showing the final detection results, false text detections are suppressed by
removing bounding boxes made up of just one text region. This removes isolated regions
that are unlikely to be actual text, given that text is usually found in groups, i.e. words and
sentences. The image below shows the final detected text.
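The merging step can be sketched as follows, assuming mserStats holds the regions that survived Steps 3 and 4; the 2% expansion amount is illustrative:

```matlab
% Step 5: expand, overlap, and merge character boxes into word boxes.
bboxes = vertcat(mserStats.BoundingBox);     % [x y width height] per region

% Convert to corner form and expand slightly so same-word boxes overlap.
xmin = bboxes(:,1);                ymin = bboxes(:,2);
xmax = xmin + bboxes(:,3) - 1;     ymax = ymin + bboxes(:,4) - 1;
expansion = 0.02;
xmin = (1 - expansion) * xmin;     ymin = (1 - expansion) * ymin;
xmax = (1 + expansion) * xmax;     ymax = (1 + expansion) * ymax;
expandedBBoxes = [xmin ymin xmax - xmin + 1 ymax - ymin + 1];

% Non-zero overlap ratios connect boxes; graph components are words/lines.
overlapRatio = bboxOverlapRatio(expandedBBoxes, expandedBBoxes);
overlapRatio(1:size(overlapRatio,1)+1:end) = 0;   % ignore self-overlap
g = graph(overlapRatio);
componentIndices = conncomp(g);

% One merged box per connected component.
xmin = accumarray(componentIndices', xmin, [], @min);
ymin = accumarray(componentIndices', ymin, [], @min);
xmax = accumarray(componentIndices', xmax, [], @max);
ymax = accumarray(componentIndices', ymax, [], @max);
textBBoxes = [xmin ymin xmax - xmin + 1 ymax - ymin + 1];

% Suppress boxes made up of a single region (likely false detections).
numRegionsInGroup = histcounts(componentIndices);
textBBoxes(numRegionsInGroup == 1, :) = [];
```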

Figure 12
STEP 6 - Recognize Detected Text Using OCR:

The primary objective of all experiments is to visually verify text detection by plotting
individual character and line or word bounding boxes on the processed images. As an
additional verification step, we use MATLAB's OCR function, which is implemented with
Tesseract, a popular and robust open-source OCR engine. Without any additional training,
the OCR function's default English language model is able to recognize the text in these
natural images. Therefore, after detecting the text regions, we perform OCR to recognize
the text within each bounding box.
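Before recognition, each detected bounding box is cropped out of the image and handed to the OCR engine. A minimal Python/NumPy sketch of the cropping step is below; the box format is an assumption, and the actual recognition in the report is done by MATLAB's ocr function (backed by Tesseract), not by this snippet.

```python
import numpy as np

def crop_regions(image, boxes):
    """Crop each detected bounding box from a 2-D image array so the
    crops can be passed to an OCR engine. Boxes are (xmin, ymin, xmax, ymax)
    in pixel coordinates."""
    return [image[y0:y1, x0:x1] for (x0, y0, x1, y1) in boxes]

# Hypothetical 100x200 image and one detected box.
image = np.zeros((100, 200))
crops = crop_regions(image, [(10, 20, 60, 50)])
# Each crop would then go to an OCR call, e.g. MATLAB's ocr() or a
# Python binding such as pytesseract (not invoked here).
```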

The final detected and recognized text is –

' HANDICIXPPED
PARKING
SPECIXL PLATE
REQUIRED
UNAUTORIZED
VEHICLES
MAY BE TOWED
AT OWNERS
EXPFNSE

'

Figure 13

CHAPTER 5 - OVERVIEW OF THE SOFTWARE TOOL USED

5.1 INTRODUCTION TO MATLAB:

 MATLAB (an abbreviation of "MATrix LABoratory") is a software package for high-
performance mathematical computation, visualization and programming
developed by MathWorks. It provides an interactive environment with
hundreds of built-in functions for technical computing, graphics and animations.
 Matlab was written initially to provide simple access to the matrix software
developed by the LINPACK (Linear System Package) and EISPACK (Eigensystem
Package) projects.
 Matlab is a modern programming-language environment: it has refined data
structures, includes built-in editing and debugging tools, and supports object-
oriented programming.
 Matlab is multi-paradigm, so it can work with multiple programming
approaches, such as functional, object-oriented and visual.

Figure 14

It is both a programming language and a programming environment. It allows the
computation of statements in the command window itself.

 Command Window:
In this window one can type and immediately execute statements, which makes it
suitable for quick prototyping. These statements cannot be saved, so this window is
used for small, easily executed programs.
 Editor (Script):
In this window one can write and execute larger programs with multiple statements
and complex functions. These can be saved in files with the extension '.m'.
 Workspace:
In this window the values of the variables created over the course of the
program (in the editor) are displayed.
 Current Folder:
This window displays the exact location (path) of the program file being created.

The MATLAB library comes with many built-in functions. These functions perform
mathematical operations such as sine, cosine and tangent, as well as more complex
operations such as finding the inverse and determinant of a matrix, or computing cross
and dot products. Although MATLAB is coded in C, C++ and Java, it is much easier to use
than these three languages: unlike them, it requires no header files to be included at the
beginning of a file, and a variable can be declared without specifying its data type. It also
provides an easier alternative for vector operations, which can be performed with one
command instead of multiple statements in a for or while loop.
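As an illustration of the loop-versus-vectorized contrast described above, here is a small NumPy example of the same idea (in MATLAB the vectorized form would be `values .^ 2`); the data are made up:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Loop version: square one element at a time.
squared_loop = np.empty_like(values)
for i in range(len(values)):
    squared_loop[i] = values[i] ** 2

# Vectorized version: one statement over the whole array.
squared_vec = values ** 2

# Both approaches produce the same result.
assert np.array_equal(squared_loop, squared_vec)
```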

5.2 FEATURES OF MATLAB:

 It is a high-level programming language with data structures, control-flow
statements, functions, input/output and object-oriented programming.
 It offers a huge library of mathematical functions needed for computing statistics,
linear algebra, numerical integration, filtering, Fourier analysis, optimization and
solving ordinary differential equations.
 A 'toolbox' is a set of functions designed for a specific purpose and compiled as a
package.
 Matlab can natively handle sensor, video, image and other real-time data, as well
as data from JDBC/ODBC databases.
 It offers built-in graphics useful for visualizing data and tools for generating custom
plots.

Figure 15: A screenshot of the MATLAB environment

CHAPTER 6 – RESULTS

The text detection application used in these experiments is not overly robust, but it shows
the power of the approach employed. Even with a modest implementation, the application
was able to detect primary and, in some cases, secondary text in a given image after a few
iterations of manually tuning thresholds.

As expected, though, some discriminating geometric properties and other values were more
important to tune for detecting text in non-natural images than in natural images.

 Issues With Detecting Text Using Custom, Highly Stylized Fonts:

In non-natural images there can be a lot of variation among the subjects in an image,
which can affect text detection.
Characteristics such as high contrast, line spacing and the spacing between letter
characters presented the greatest challenge for the approaches employed in the
implemented text detection application. Despite these challenges, with some tuning
the application was able to detect primary and secondary text regions in the
non-natural images with reasonable proficiency.

The following geometric property thresholds and related values appear to have some
sensitivity to high-contrast images where the text uses highly stylized, custom fonts:

(1) Expansion amount

(2) High contrast regions

Figure 16: Detecting and recognizing text in an image (panels: input image; detected MSER
regions; non-text regions removed based on geometric properties; non-text regions removed
based on stroke width variation; OCR detection; recognized image)

Figure 17: Detecting and recognizing text in an image

CHAPTER 7 - CONCLUSION AND FUTURE SCOPE

7.1 CONCLUSION:

The approach of using Maximally Stable Extremal Region (MSER) based feature
detection, stroke width variation and geometric property thresholding yields reasonably
proficient results even in a modest text detection application. Although highly stylized,
custom fonts have some effect on the geometric property thresholds used to filter non-text
regions, the effect was not dramatic and did not have the anticipated impact.

Text detection is applicable in real-world scenarios such as optical character recognition,
artificial intelligence, distinguishing between human and machine inputs, and spam removal.
Variation in the environment in which an image is captured makes it a difficult process.

The identified characters are classified into meaningful words or sensible sentences. In
this work, a system to detect and classify text is presented. The accuracy of the proposed
system is 92.31%.

7.2 FUTURE SCOPE:

The next step is to automate the process of finding optimal threshold values for each
image and for each region geometric property. By automating this process we will have a
more robust and scalable means of conducting further analysis on different types of images
and the properties that affect text detection the most.

REFERENCES:

1. Chen, Huizhong, et al. "Robust Text Detection in Natural Images with Edge-Enhanced
Maximally Stable Extremal Regions." Image Processing (ICIP), 2011 18th IEEE
International Conference on. IEEE, 2011.
2. Li, Yao, and Huchuan Lu. "Scene text detection via stroke width." Pattern
Recognition (ICPR), 2012 21st International Conference on. IEEE, 2012.
3. Neumann, Lukas, and Jiri Matas. "Real-time scene text localization and recognition."
Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE,
2012.
4. Gonzalez, Alvaro, et al. "Text location in complex images." Pattern Recognition
(ICPR), 2012 21st International Conference on. IEEE, 2012.
5. Amritha S. Nadarajan and Thamizharasi A. "A Survey on Text Detection in Natural
Images." International Journal of Engineering Development and Research (IJEDR),
ISSN: 2321-9939, Volume 6, Issue 1, pp. 60-66, January 2018.
6. Chandio, A., Pickering, M., and Shafi, K. "Character classification and recognition for
Urdu texts in natural scene images." Computing, Mathematics and Engineering
Technologies (iCoMET), 2018 International Conference on (pp. 1-6). IEEE, March 2018.
7. Tridib Chakraborty. "Text recognition using image processing." International
Journal of Advanced Research in Computer Science, 8 (5), May-June 2017, pp. 765-768.
8. Teresa Nicole Brooks. "Exploring Geometric Property Thresholds for Filtering Non-
Text Regions in a Connected Component Based Text Detection Application."
Seidenberg School of CSIS, Pace University, arXiv:1709.03548, September 2017.
