
Computer vision resource for OCR (Optical character recognition) and perform OCR on the images

A Dissertation submitted in partial fulfillment of the requirements for the award of degree of

MASTER OF COMPUTER APPLICATIONS

By

ADARSH KUMAR SINGH
1NH21MC004

Under the Guidance of

DR. V. ASHA
Internal Guide: Dr. V. Asha, Prof. Binju Saju, Dept. of MCA, NHCE

External Guide: Guide Name, Designation, Company Name
2022-2023

DEPARTMENT OF MASTER OF COMPUTER APPLICATIONS

CERTIFICATE
This is to certify that ADARSH KUMAR SINGH, bearing USN
1NH21MC004 has successfully completed his final year IV
semester Industry Internship / Project work entitled Computer
vision resource for OCR (Optical character recognition)
and perform OCR on the images as a partial fulfillment of
the requirements for the award of MASTER OF COMPUTER
APPLICATIONS degree, during the Academic Year 2022-23 under
my supervision. This report has not been submitted to any other
Organization/University for any award of degree.

Signature of the Internal Guide Head of the Department Principal

External Viva

Internal Examiner External Examiner


Date:
DECLARATION

I, ADARSH KUMAR SINGH, student of IV Semester MCA, bearing USN
1NH21MC004, hereby declare that the Industry Internship / Project work entitled
computer vision resource for OCR (Optical character recognition) and
perform OCR on the images has been carried out by me under the supervision of
Internal Guide Dr. V. Asha and External Guide <Name of the Guide>,
<Designation of Guide> and submitted in partial fulfillment of the requirements for
the award of the Degree of Master of Computer Applications by the Department of
Master of Computer Applications, New Horizon College of Engineering, an
Autonomous Institution, Affiliated to Visvesvaraya Technological University
during the academic year 2022-23. This report has not been submitted to any other
Organization/University for any award of degree.

Name :
Signature :
Date :
ACKNOWLEDGEMENT

I would like to thank Dr. Mohan Manghnani, Chairman of New Horizon College of
Engineering, for providing good infrastructure and Hi-Tech lab facilities to develop
and improve students’ skills.

I sincerely express my gratitude to the college Principal, Dr. Manjunatha, for
supporting the students in all their technical activities and giving guidance to them.
I would like to thank Dr. V. Asha, Head, Department of MCA, New Horizon
College of Engineering, for granting permission to undertake this project. I would
like to express my gratitude to the project guide, Prof. Vishwanath C.R., for giving
instructions and guidelines at every stage of the project work.

I thank all the staff members of the Department of Master of Computer
Applications for extending their constant support to complete the project. I express
my heartfelt thanks to my parents and friends, who were a constant source of
support and inspiration throughout the project.
COMPANY PROFILE
TABLE OF CONTENTS
Chapter No. Title
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
1.1 General Introduction
1.2 Problem Statement
1.3 Existing System
1.4 Objective of the Work
1.5 Proposed System with Methodology
1.6 Feasibility Study

2 REVIEW OF LITERATURE
2.1 Review Summary
3 SYSTEM CONFIGURATION
3.1 Hardware requirements
3.2 Software requirements
4 MODULE DESCRIPTION
4.1 Module 1
4.2 Module 2
5 SYSTEM DESIGN
5.1 DFD/ UML Diagrams
5.2 Data Base Design
6 SYSTEM IMPLEMENTATION
6.1 Implementation
6.1.1 Pre – Implementation Technique
6.1.2 Post – Implementation Technique
6.2 Screen Shots
7 SYSTEM TESTING
7.1 Test Cases
7.2 Maintenance
8 RESULTS AND DISCUSSIONS
8.1 Conclusion
8.2 Limitations
8.3 Future Enhancements
9 REFERENCES
9.1 Text References
9.2 Web References
ABSTRACT
The Optical Character Recognition (OCR) project aims to develop a robust and efficient
system for extracting text from images and scanned documents. OCR technology plays a
crucial role in various domains, including document digitization, data entry automation,
and text extraction from images.

This project leverages state-of-the-art OCR techniques and algorithms to achieve accurate
and reliable text recognition. The process begins with image acquisition, where images or
scanned documents containing text are obtained. Preprocessing techniques are then
applied to enhance the image quality, including noise reduction, normalization, and
deskewing.

Next, the OCR system employs text detection algorithms to locate and identify text
regions within the image. Character segmentation techniques are used to divide the text
regions into individual characters or groups of characters. Character recognition
algorithms, based on pattern recognition and machine learning, are then applied to
determine the most likely character for each segment. To improve the accuracy and
reliability of the extracted text, post-processing steps are implemented. These steps
include language modeling, spell-checking, and context analysis. The final output of the
OCR system is the extracted text, presented in a machine-readable format, such as plain
text, HTML, or PDF with selectable text.

Throughout the project, various resources are utilized, including the Tesseract OCR
engine and its Python wrapper PyTesseract, the OpenCV image-processing library, and
cloud-based services like OCR.space or the Google Cloud Vision API. These resources
provide tools and APIs for integrating OCR functionality into applications.
LIST OF TABLES

Sl. No.   Table No.   Title   Page No.
LIST OF FIGURES

Sl. No.   Figure No.   Title   Page No.
CHAPTER 1

INTRODUCTION

1.1 General Introduction

Optical Character Recognition (OCR) is a technology that enables the recognition and
extraction of text from images or scanned documents. OCR systems analyze the shapes
and patterns of characters in the image and convert them into machine-readable text. OCR
can be used in various applications such as document digitization, data entry automation,
text extraction from images, and more. Here's a general overview of the OCR process:
1. Image acquisition: The OCR process starts with obtaining an image or document
that contains the text you want to extract. This can be a scanned document, a photo,
or any other image containing text.

2. Preprocessing: The acquired image may require preprocessing to enhance its
quality and improve OCR accuracy. Preprocessing steps can include noise
reduction, image normalization, binarization (converting the image to black and
white), deskewing (correcting the image rotation), and other techniques to improve
text readability.

3. Text detection: In this step, the OCR system locates and identifies regions of text
within the image. This is done by analyzing the image for patterns, shapes, and
characteristics that resemble text.

4. Character segmentation: Once text regions are identified, the system divides them
into individual characters or groups of characters. This step is crucial for systems
that analyze characters individually.

5. Character recognition: Each segmented character is then analyzed and matched
against a database of known characters. OCR algorithms employ pattern
recognition and machine learning techniques to determine the most likely character
for each segment.

6. Post-processing: After character recognition, post-processing steps may be applied
to refine the results. These steps can include language modeling, spell-checking,
and context analysis to improve the accuracy and reliability of the extracted text.

7. Output: The final output of an OCR system is the extracted text, typically in a
machine-readable format such as plain text, HTML, or PDF with selectable text.

It's worth noting that the accuracy of OCR systems can vary depending on factors such as
image quality, font style, language, and layout complexity. Some OCR engines allow
fine-tuning and customization to improve performance in specific domains or languages.
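
To make the binarization step of the preprocessing stage concrete, the following is a minimal sketch of Otsu's thresholding in plain Python. It is an illustration only and not the method used by any particular OCR engine; the function names `otsu_threshold` and `binarize`, and the convention that 1 marks ink and 0 marks background, are assumptions introduced here.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that best separates dark from light pixels.

    Otsu's method picks the threshold that maximizes the between-class
    variance of the two resulting pixel groups.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += hist[level]
        if weight_dark == 0:
            continue  # no pixels at or below this level yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        sum_dark += level * hist[level]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(image):
    """Map a 2-D list of grey levels to 1 (ink, dark) / 0 (background, light)."""
    flat = [p for row in image for p in row]
    threshold = otsu_threshold(flat)
    return [[1 if p <= threshold else 0 for p in row] for row in image]
```

In practice a library such as OpenCV would normally perform this step; the pure-Python version is shown only so that the logic is visible without external dependencies.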
1.2 Problem Statement

The problem addressed by the OCR (Optical Character Recognition) project is the development
of an accurate and robust system that can effectively extract text from various types of
images and scanned documents. The goal is to overcome the challenges posed by image
quality, font styles, language variations, and layout complexities to achieve high-quality
text recognition.
Specifically, the project aims to address the following key challenges:
1. Image Quality: Images obtained from different sources may have variations in
resolution, lighting conditions, noise, and distortion. The OCR system should be
capable of handling these variations and employ preprocessing techniques to
enhance the image quality before performing text recognition.

2. Font Styles and Language Variations: Text can appear in various font styles, sizes,
and languages within the same document or image. The OCR system should be
able to accurately recognize and interpret different font styles, including
handwritten text, and support multiple languages.
3. Layout Complexities: Documents and images may contain complex layouts, such
as tables, columns, or overlapping text. The OCR system should have the
capability to identify and interpret the structure of the text, preserving the original
document's layout and formatting during the recognition process.

4. Accuracy and Reliability: The OCR system should strive for high accuracy in text
recognition to minimize errors and ensure reliable results. It should employ
advanced algorithms, including machine learning techniques, to improve character
recognition accuracy and handle ambiguous or distorted characters effectively.

5. Efficiency and Scalability: The OCR system should be efficient in terms of
processing time and resource utilization, allowing for real-time or near-real-time
text extraction. It should also be scalable to handle large volumes of images or
documents without compromising performance.

By addressing these challenges, the OCR project aims to deliver a comprehensive solution
that can handle a wide range of image types and document layouts, providing accurate and
reliable text extraction. The successful implementation of the OCR system will contribute
to the automation and digitization of documents, enabling faster and more efficient
information retrieval from image-based sources.

1.3 Existing System


Existing OCR systems have made significant advancements in recent years, utilizing a
combination of computer vision techniques and machine learning algorithms to extract
text from images and scanned documents. Here are some key components and features
commonly found in existing OCR systems:

1. Preprocessing: Existing OCR systems employ preprocessing techniques to enhance
image quality and improve OCR accuracy. This may include noise reduction, image
normalization, binarization (converting the image to black and white), deskewing
(correcting image rotation), and other image enhancement techniques.
2. Text Detection: OCR systems use text detection algorithms to locate and identify
text regions within the image. Various techniques, such as edge detection,
connected component analysis, or deep learning-based methods (e.g., object
detection models), are utilized to identify and localize text regions accurately.
3. Character Segmentation: Once text regions are identified, OCR systems segment
the text into individual characters or groups of characters. This step is crucial for
systems that analyze characters individually rather than recognizing entire words.
4. Character Recognition: Character recognition is a core component of OCR systems.
Various algorithms are used to analyze and recognize characters. These algorithms
can range from traditional template matching or feature-based methods to more
advanced techniques like neural networks, deep learning models (such as
convolutional neural networks or recurrent neural networks), or hybrid approaches.
5. Language Support: Many OCR systems support multiple languages, allowing for
text recognition in various scripts and character sets. Language models and
character classifiers are trained to handle different languages, enabling accurate
recognition across diverse linguistic contexts.

Overall, existing OCR solutions, including the open-source Tesseract engine (often
combined with OpenCV for image processing), commercial products like ABBYY FineReader
and Adobe Acrobat OCR, and cloud-based services like Google Cloud Vision API or
Microsoft Azure Cognitive Services, incorporate these components and features to
provide text recognition capabilities for a wide range of applications.
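
As an illustration of the connected component analysis mentioned under text detection, the sketch below labels 4-connected groups of ink pixels in a binarized image and returns their bounding boxes. It is a simplified example and not the algorithm of any specific product; the function name `find_components` and the input convention (a 2-D list with 1 for ink and 0 for background) are assumptions made for this example.

```python
from collections import deque


def find_components(binary):
    """Return bounding boxes (left, top, width, height) of 4-connected ink blobs.

    `binary` is a 2-D list where 1 marks ink and 0 marks background.
    Boxes are returned in row-major order of each blob's first pixel.
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, bottom, left, right = r, r, c, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((left, top, right - left + 1, bottom - top + 1))
    return boxes
```

A real detector would typically go on to merge neighbouring boxes into words and lines and to discard blobs that are too small or too large to be characters.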

1.4 Objective of the Work


The objective of the work on OCR (Optical Character Recognition) is to develop and
improve the accuracy, robustness, and efficiency of text extraction from images and
scanned documents. The objectives of the work on "computer vision resource for OCR
(Optical character recognition) and perform OCR on the images" are as follows:
1. Accuracy: Enhance the accuracy of OCR systems by implementing advanced
algorithms, machine learning techniques, and deep neural networks. The objective
is to minimize recognition errors and improve the overall quality of extracted text.
2. Robustness: Address challenges related to various image qualities, font styles,
language variations, and layout complexities. The objective is to develop OCR
systems that can handle diverse sources of text-containing images while
maintaining high recognition accuracy.
3. Language Support: Extend language support to recognize and extract text from
multiple languages and scripts. The objective is to enable OCR systems to handle a
wide range of languages and character sets effectively.
4. Preprocessing Techniques: Develop and optimize preprocessing techniques to
enhance image quality, reduce noise, correct distortions, and improve text
readability. The objective is to ensure accurate text detection and segmentation.
5. Layout Analysis: Enhance the capability of OCR systems to handle complex
document layouts, such as tables, columns, and multi-column text. The objective is
to preserve the original document structure during text extraction, maintaining the
overall context and formatting.
6. Real-Time and Scalable Processing: Improve the efficiency and scalability of OCR
systems to enable real-time or near-real-time text extraction. The objective is to
optimize processing time and resource utilization, and to handle large volumes of
images or documents efficiently.
7. Integration and Usability: Provide integration options, APIs, and user-friendly
interfaces for developers and end-users to easily incorporate OCR functionality into
applications. The objective is to facilitate seamless integration and usage of OCR
technology.
By achieving these objectives, the work on OCR aims to advance the state-of-the-art in
text extraction from images and scanned documents, contributing to improved document
digitization, data entry automation, and efficient information retrieval from image-based
sources.
1.5 Proposed System with Methodology
The proposed system aims to develop an OCR (Optical Character Recognition) system
using computer vision techniques to accurately extract text from images and scanned
documents. The methodology involves several key steps:
Methodology:
Create a Cognitive Services resource
You can use the Computer Vision service by creating either a Computer
Vision resource or a Cognitive Services resource.

If you haven’t already done so, create a Cognitive Services resource in your Azure
subscription.

1. In another browser tab, open the Azure portal at https://portal.azure.com, signing in
with your Microsoft account.
2. Click the +Create a resource button, search for Cognitive Services, and create
a Cognitive Services resource with the following settings:
o Subscription: Your Azure subscription.
o Resource group: Select or create a resource group with a unique name.
o Region: Choose any available region.
o Name: Enter a unique name.
o Pricing tier: Standard S0
o By checking this box I acknowledge that I have read and understood
all the terms below: Selected.
3. Review and create the resource and wait for deployment to complete. Then
go to the deployed resource.
4. View the Keys and Endpoint page for your Cognitive Services resource. You
will need the endpoint and keys to connect from client applications.
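
To show how a client application might use the endpoint and key from the Keys and Endpoint page, the sketch below prepares (but does not send) an HTTP request using only the Python standard library. The request path `/vision/v3.2/ocr`, the JSON body with a `url` field, and the function name `build_ocr_request` are assumptions made for illustration and should be checked against the current Computer Vision REST documentation; the endpoint and key values shown in the usage note are placeholders.

```python
import json
import urllib.request


def build_ocr_request(endpoint, key, image_url, path="/vision/v3.2/ocr"):
    """Prepare a POST request asking the service to analyze an image URL.

    `path` is an assumed API route; adjust it to the API version in use.
    The request is only constructed here, not sent.
    """
    url = endpoint.rstrip("/") + path
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        # Cognitive Services authenticates requests with this header.
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

A call such as `build_ocr_request("YOUR_ENDPOINT", "YOUR_KEY", image_url)` returns a request object that could then be passed to `urllib.request.urlopen` once real values are supplied. Keeping the key out of source control (for example, reading it from an environment variable) is advisable.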

Run Cloud Shell

To test the capabilities of the Computer Vision service, we’ll use a simple command-line
application that runs in the Cloud Shell on Azure.

1. In the Azure portal, select the [>_] (Cloud Shell) button at the top of the page to the
right of the search box. This opens a Cloud Shell pane at the bottom of the portal.
2. The first time you open the Cloud Shell, you may be prompted to choose the type of
shell you want to use (Bash or PowerShell). Select PowerShell. If you do not see
this option, skip the step.
3. If you are prompted to create storage for your Cloud Shell, ensure your subscription
is specified and select Create storage. Then wait a minute or so for the storage to
be created.

4. Make sure the type of shell indicated on the top left of the Cloud Shell pane is
switched to PowerShell. If it is Bash, switch to PowerShell by using the drop-down
menu.

5. Wait for PowerShell to start. You should see the following screen in the
Azure portal:
Configure and run a client application

Now that you have a Cognitive Services resource, you can run a simple client application
that uses the OCR service.
1. In the command shell, enter the following command to download the sample
application and save it to a folder called ai-900.

git clone https://github.com/MicrosoftLearning/AI-900-AIFundamentals ai-900

2. The files are downloaded to a folder named ai-900. Now we want to see all of the
files in your Cloud Shell storage and work with them. Type the following
command into the shell:
code .
Notice how this opens up an editor like the one in the image below:

3. In the Files pane on the left, expand ai-900 and select ocr.ps1. This file contains
some code that uses the Computer Vision service to detect and analyze text in an
image, as shown here:
4. Don’t worry too much about the details of the code; the important thing is that it
needs the endpoint URL and either of the keys for your Cognitive Services
resource. Copy these from the Keys and Endpoint page for your resource in
the Azure portal and paste them into the code editor, replacing
the YOUR_KEY and YOUR_ENDPOINT placeholder values respectively.

After pasting the key and endpoint values, the first two lines of code should look
similar to this:
$key="1a2b3c4d5e6f7g8h9i0j...."
$endpoint="https..."

5. At the top right of the editor pane, use the … button to open the menu and
select Save to save your changes. Then open the menu again and select Close
Editor. Now that you’ve set up the key and endpoint, you can use your Cognitive
Services resource to extract text from an image.

Let’s use the Read API. In this case, you have an advertising image for the
fictional Northwind Traders retail company that includes some text.
The sample client application will analyze the following image:

6. In the PowerShell pane, enter the following commands to run the code to read
the text:
cd ai-900
./ocr.ps1 advert.jpg

7. Review the details found in the image. The text found in the image is organized
into a hierarchical structure of regions, lines, and words, and the code reads these
to retrieve the results.

Note that the location of text is indicated by the top-left coordinates, and the width
and height of a bounding box, as shown here:
8. Now let’s try another image:
To analyze the second image, enter the following command:
./ocr.ps1 letter.jpg

9. Review the results of the analysis for the second image. It should also
return the text and bounding boxes of the text.
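
The hierarchical result described in step 7 (regions, lines, and words, each word with a bounding box given by its top-left corner, width, and height) can be walked with a few nested loops. The sketch below is written in Python rather than PowerShell and is not the code in ocr.ps1. The field names `regions`, `lines`, `words`, `text`, and `boundingBox` (as a comma-separated "left,top,width,height" string), as well as the `sample` response and its coordinates, are hypothetical values chosen to match the description above.

```python
def extract_words(result):
    """Yield (text, left, top, width, height) for every word in the result."""
    for region in result.get("regions", []):
        for line in region.get("lines", []):
            for word in line.get("words", []):
                left, top, width, height = (
                    int(value) for value in word["boundingBox"].split(",")
                )
                yield word["text"], left, top, width, height


def lines_of_text(result):
    """Join the words of each line into one string, in reading order."""
    return [
        " ".join(word["text"] for word in line.get("words", []))
        for region in result.get("regions", [])
        for line in region.get("lines", [])
    ]


# Hypothetical response shaped like the structure described in step 7.
sample = {
    "regions": [
        {
            "lines": [
                {
                    "words": [
                        {"boundingBox": "20,61,145,27", "text": "Northwind"},
                        {"boundingBox": "170,61,110,27", "text": "Traders"},
                    ]
                }
            ]
        }
    ]
}

for text, left, top, width, height in extract_words(sample):
    print(f"{text}: left={left}, top={top}, width={width}, height={height}")
```

Using `.get(..., [])` at each level means that an image with no detected text simply produces no output instead of raising an error.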

1.6 Feasibility Study

The feasibility study aims to assess the viability and practicality of implementing the
"computer vision resource for OCR (Optical character recognition) and perform OCR on
the images" project. The study evaluates the technical, economic, operational, and time
aspects to determine the feasibility of developing and utilizing the proposed system.

1. Technical Feasibility:
o Technology Compatibility: The Computer Vision service is a cloud-based
service provided by Microsoft Azure and is available through a Cognitive
Services resource. It exposes OCR functionality to client applications
through an endpoint and keys, making it technically feasible to extract text
from images without building a recognition engine from scratch.
o Infrastructure Requirements: The project requires an Azure subscription, a
Cognitive Services resource, and Azure Cloud Shell, which runs in the
browser from the Azure portal. These resources are readily available and
compatible with most modern computing environments, ensuring technical
feasibility.

2. Economic Feasibility:
o Cost Analysis: The cost of utilizing the Cognitive Services resource and
other Azure services must be considered. Azure offers pricing tiers (this
project uses Standard S0), allowing users to choose the appropriate tier
based on their needs. The economic feasibility depends on the budget
allocated for the project and the potential cost savings in terms of
infrastructure and maintenance compared to on-premises solutions.
o Return on Investment (ROI): The potential benefits, such as increased
productivity, scalability, and reduced infrastructure costs, should be
evaluated against the investment required to implement and maintain the
OCR solution. If the benefits outweigh the costs, the project is
economically feasible.

3. Operational Feasibility:
o User Acceptance: The project's success relies on user acceptance and
adoption. The feasibility study should assess the willingness of users, such
as developers, to adopt the OCR solution and their readiness to learn and
utilize the Azure platform.
o Training and Support: Adequate training resources and support should be
provided to users to ensure a smooth transition to the new system.
Documentation, tutorials, and hands-on learning materials should be
developed to facilitate the learning process and enhance the operational
feasibility of the project.

4. Time Feasibility:
o Ease of Provisioning: Creating a Cognitive Services resource in the Azure
portal is a short, guided process. This enables users to set up the necessary
environment and begin their project without significant delays.
o Efficient Text Extraction: Once the key and endpoint are configured,
analyzing an image takes a single command in Cloud Shell (for example,
./ocr.ps1 advert.jpg), so results can be reviewed without a lengthy
development cycle.
After a comprehensive assessment of technical, operational, economic, and time
feasibility, the "computer vision resource for OCR (Optical character recognition) and
perform OCR on the images" project has been determined to be feasible. The project's
practicality and viability are supported by the availability of Azure Cloud Shell as a
reliable cloud-based solution, accompanied by the necessary tools and resources.

With the feasibility of the project established, users can confidently proceed to
perform OCR on images from Azure Cloud Shell. This work provides valuable hands-on
experience and knowledge, enabling users to gain proficiency in utilizing Azure
Cloud Shell effectively.
CHAPTER 2
REVIEW OF LITERATURE

2.1 Review Summary

The OCR (Optical Character Recognition) computer vision project focuses on developing
an efficient and accurate system for extracting text from images and scanned documents.
The project employs a series of steps and methodologies to achieve this objective.

The project's proposed system begins with image acquisition, where images or scanned
documents containing text are obtained from various sources. Preprocessing techniques
are applied to enhance image quality, including noise reduction, normalization,
binarization, and deskewing. This step optimizes the images for subsequent text
extraction.

Text detection algorithms are utilized to locate and identify text regions within the
preprocessed images. Character segmentation is then performed to separate individual
characters or groups of characters, enabling independent analysis during the recognition
process.
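
One classical way to perform the character segmentation described above is a vertical projection profile: the ink pixels in each column of a binarized text line are counted, and the line is cut wherever a column contains no ink. The sketch below illustrates this idea under simplifying assumptions (a single, already deskewed text line with non-touching characters); the function name `segment_columns` and the 1-for-ink input convention are introduced here for illustration.

```python
def segment_columns(binary):
    """Split a binarized text line into (start, end) column spans, one per glyph.

    `binary` is a 2-D list with 1 for ink and 0 for background.
    `end` is exclusive, so a span covers columns start .. end - 1.
    """
    cols = len(binary[0])
    # Vertical projection profile: amount of ink in each column.
    profile = [sum(row[c] for row in binary) for c in range(cols)]

    spans, start = [], None
    for c, ink in enumerate(profile):
        if ink and start is None:
            start = c                  # a glyph begins at this column
        elif not ink and start is not None:
            spans.append((start, c))   # an empty column ends the glyph
            start = None
    if start is not None:
        spans.append((start, cols))    # glyph touching the right edge
    return spans
```

Touching or overlapping characters, as in cursive handwriting, defeat this simple approach, which is one reason modern systems often recognize whole words or lines instead.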

Character recognition is a core component of the system, utilizing machine learning
algorithms and neural networks to match characters with known patterns or features.
Post-processing techniques refine the recognized text, including language modeling,
spell-checking, contextual analysis, and error correction algorithms.
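
As a small illustration of the spell-checking step in post-processing, the sketch below replaces each recognized word that is not in a known vocabulary with its closest vocabulary entry, using `difflib.get_close_matches` from the Python standard library. This is a deliberately simple stand-in for a real language model; the function name `correct_words`, the similarity cutoff of 0.75, and the tiny vocabulary in the usage note are assumptions chosen for the example.

```python
import difflib


def correct_words(words, vocabulary, cutoff=0.75):
    """Replace words missing from `vocabulary` by their closest known word.

    Words with no sufficiently similar candidate are left unchanged, so that
    names and numbers are not "corrected" into unrelated dictionary words.
    """
    corrected = []
    for word in words:
        if word.lower() in vocabulary:
            corrected.append(word)
            continue
        matches = difflib.get_close_matches(
            word.lower(), vocabulary, n=1, cutoff=cutoff
        )
        corrected.append(matches[0] if matches else word)
    return corrected
```

For example, with the vocabulary `["optical", "character", "recognition"]`, the misread words "0ptical" and "charactcr" are mapped back to "optical" and "character", while an unrelated token such as "xyz" is returned as is.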

The output of the system is the extracted text presented in a machine-readable format.
Integration options, APIs, or user interfaces are provided to seamlessly incorporate the
OCR functionality into other applications or workflows.

Throughout the project, various computer vision libraries, frameworks, and algorithms
are utilized, such as Tesseract, OpenCV, and deep learning frameworks like TensorFlow or
PyTorch. The system undergoes rigorous evaluation to assess its performance in terms of
accuracy, speed, and robustness. User feedback and corrections are incorporated to
continuously refine and improve the OCR system.
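
Accuracy in such an evaluation is commonly reported as the character error rate (CER): the edit distance between the recognized text and a reference transcription, divided by the length of the reference. The sketch below shows one way to compute it; it is an illustrative implementation and not a measurement carried out in this project, and the function names are chosen here for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance between the texts divided by the reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

A CER of 0 means the OCR output matches the reference exactly; a single misread character in a nine-character word, for instance, gives a CER of 1/9.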

Overall, the OCR computer vision project aims to develop a reliable OCR solution capable
of handling diverse image sources, recognizing multiple languages, and providing accurate
text extraction. By leveraging computer vision techniques and advanced machine learning
algorithms, the project strives to achieve high accuracy and usability in OCR applications.
CHAPTER 3
SYSTEM CONFIGURATION

3.1 Hardware Requirements:

Processor: Intel i3, i5, i7
RAM: 8GB
Hard Disk: 250GB

3.2 Software Requirements:


Operating System: Windows, macOS, Linux
Technology Used: Azure Cognitive Services (Computer Vision), Azure Cloud Shell (PowerShell)
