By
ADARSH KUMAR SINGH
1NH21MC004
CERTIFICATE
This is to certify that ADARSH KUMAR SINGH, bearing USN
1NH21MC004, has successfully completed his final-year IV-semester
Industry Internship / Project work entitled "Computer Vision
Resource for OCR (Optical Character Recognition) and Performing
OCR on Images" in partial fulfillment of the requirements for the
award of the MASTER OF COMPUTER APPLICATIONS degree, during the
Academic Year 2022-23, under my supervision. This report has not
been submitted to any other Organization/University for the award
of any degree.
External Viva
Name :
Signature :
Date :
ACKNOWLEDGEMENT
I would like to thank Dr. Mohan Manghnani, Chairman of New Horizon College of
Engineering, for providing good infrastructure and hi-tech lab facilities to develop
and improve students' skills.
2 REVIEW OF LITERATURE
2.1 Review Summary
3 SYSTEM CONFIGURATION
3.1 Hardware requirements
3.2 Software requirements
4 MODULE DESCRIPTION
4.1 Module 1
4.2 Module 2
5 SYSTEM DESIGN
5.1 DFD / UML Diagrams
5.2 Database Design
6 SYSTEM IMPLEMENTATION
6.1 Implementation
6.1.1 Pre-Implementation Technique
6.1.2 Post-Implementation Technique
6.2 Screen Shots
7 SYSTEM TESTING
7.1 Test Cases
7.2 Maintenance
8 RESULTS AND DISCUSSIONS
8.1 Conclusion
8.2 Limitations
8.3 Future Enhancements
9 REFERENCES
9.1 Text References
9.2 Web References
ABSTRACT
The Optical Character Recognition (OCR) project aims to develop a robust and efficient
system for extracting text from images and scanned documents. OCR technology plays a
crucial role in various domains, including document digitization, data entry automation,
and text extraction from images.
This project leverages state-of-the-art OCR techniques and algorithms to achieve accurate
and reliable text recognition. The process begins with image acquisition, where images or
scanned documents containing text are obtained. Preprocessing techniques are then
applied to enhance the image quality, including noise reduction, normalization, and
deskewing.
Next, the OCR system employs text detection algorithms to locate and identify text
regions within the image. Character segmentation techniques are used to divide the text
regions into individual characters or groups of characters. Character recognition
algorithms, based on pattern recognition and machine learning, are then applied to
determine the most likely character for each segment. To improve the accuracy and
reliability of the extracted text, post-processing steps are implemented. These steps
include language modeling, spell-checking, and context analysis. The final output of the
OCR system is the extracted text, presented in a machine-readable format, such as plain
text, HTML, or PDF with selectable text.
Throughout the project, various resources are utilized, including popular OCR engines
such as Tesseract, OpenCV, PyTesseract, and cloud-based services like OCR.space or
Google Cloud Vision API. These resources provide powerful tools and APIs to integrate
OCR functionality into applications effectively.
LIST OF TABLES
Sl. No.    Table No.    Title    Page No.

LIST OF FIGURES
Sl. No.    Figure No.    Title    Page No.
CHAPTER 1
INTRODUCTION
Optical Character Recognition (OCR) is a technology that enables the recognition and
extraction of text from images or scanned documents. OCR systems analyze the shapes
and patterns of characters in the image and convert them into machine-readable text. OCR
can be used in various applications such as document digitization, data entry automation,
text extraction from images, and more. Here's a general overview of the OCR process:
1. Image acquisition: The OCR process starts with obtaining an image or document
that contains the text you want to extract. This can be a scanned document, a photo,
or any other image containing text.
2. Preprocessing: Techniques such as noise reduction, normalization, and deskewing
are applied to enhance the image quality before recognition.
3. Text detection: In this step, the OCR system locates and identifies regions of text
within the image. This is done by analyzing the image for patterns, shapes, and
characteristics that resemble text.
4. Character segmentation: Once text regions are identified, the system divides them
into individual characters or groups of characters. This step is crucial for systems
that analyze characters individually.
5. Character recognition: Algorithms based on pattern recognition and machine
learning determine the most likely character for each segment.
6. Post-processing: Steps such as language modeling, spell-checking, and context
analysis are applied to improve the accuracy of the extracted text.
7. Output: The final output of an OCR system is the extracted text, typically in a
machine-readable format such as plain text, HTML, or PDF with selectable text.
It's worth noting that the accuracy of OCR systems can vary depending on factors such as
image quality, font style, language, and layout complexity. Some OCR engines allow
fine-tuning and customization to improve performance in specific domains or languages.
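As a toy illustration of the character-recognition step described above, the sketch below matches small binary glyph bitmaps against a template set and picks the closest match. The 3x3 templates and glyphs are invented for illustration only; a production engine such as Tesseract uses far richer features and machine-learned models.

```python
# Toy character recognition by template matching: each glyph is a 3x3
# binary bitmap, and we pick the template with the fewest mismatching pixels.
TEMPLATES = {
    "I": ((0, 1, 0),
          (0, 1, 0),
          (0, 1, 0)),
    "L": ((1, 0, 0),
          (1, 0, 0),
          (1, 1, 1)),
    "O": ((1, 1, 1),
          (1, 0, 1),
          (1, 1, 1)),
}

def recognize(glyph):
    """Return the template label whose bitmap differs from `glyph`
    in the fewest pixels (a crude stand-in for pattern recognition)."""
    def distance(a, b):
        return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return min(TEMPLATES, key=lambda label: distance(TEMPLATES[label], glyph))

if __name__ == "__main__":
    noisy_L = ((1, 0, 0),
               (1, 0, 0),
               (1, 1, 0))    # one pixel flipped from the "L" template
    print(recognize(noisy_L))  # L
```

Even with one pixel corrupted, the nearest template still wins, which is the intuition behind tolerance to ambiguous or distorted characters.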
1.2 Problem Statement
The problem statement for the OCR (Optical Character Recognition) project is to develop
an accurate and robust system that can effectively extract text from various types of
images and scanned documents. The goal is to overcome the challenges posed by image
quality, font styles, language variations, and layout complexities to achieve high-quality
text recognition.
Specifically, the project aims to address the following key challenges:
1. Image Quality: Images obtained from different sources may have variations in
resolution, lighting conditions, noise, and distortion. The OCR system should be
capable of handling these variations and employ preprocessing techniques to
enhance the image quality before performing text recognition.
2. Font Styles and Language Variations: Text can appear in various font styles, sizes,
and languages within the same document or image. The OCR system should be
able to accurately recognize and interpret different font styles, including
handwritten text, and support multiple languages.
3. Layout Complexities: Documents and images may contain complex layouts, such
as tables, columns, or overlapping text. The OCR system should have the
capability to identify and interpret the structure of the text, preserving the original
document's layout and formatting during the recognition process.
4. Accuracy and Reliability: The OCR system should strive for high accuracy in text
recognition to minimize errors and ensure reliable results. It should employ
advanced algorithms, including machine learning techniques, to improve character
recognition accuracy and handle ambiguous or distorted characters effectively.
Overall, existing OCR systems, such as Tesseract, OpenCV, commercial solutions like
ABBYY FineReader and Adobe Acrobat OCR, and cloud-based services like the Google
Cloud Vision API or Microsoft Azure Cognitive Services, incorporate these components
and features to provide accurate, efficient, and scalable text recognition capabilities for a
wide range of applications.
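As a small illustration of the image-quality challenge noted above, the following stdlib-only sketch applies a 3x3 median filter, a common noise-reduction step, to a grayscale image represented as a plain list of lists. A real pipeline would typically use a library routine such as OpenCV's cv2.medianBlur instead.

```python
def median_filter(image):
    """Apply a 3x3 median filter to a grayscale image (list of lists).
    Border pixels are left unchanged for simplicity."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            window.sort()
            out[y][x] = window[4]  # median of the 9 neighborhood values
    return out

if __name__ == "__main__":
    # A flat gray image with one salt-noise pixel in the middle.
    img = [[10] * 5 for _ in range(5)]
    img[2][2] = 255
    cleaned = median_filter(img)
    print(cleaned[2][2])  # the outlier is replaced by the neighborhood median
```

Median filtering removes isolated salt-and-pepper noise without blurring character edges as much as a mean filter would, which is why it is a common OCR preprocessing choice.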
If you haven’t already done so, create a Cognitive Services resource in your Azure
subscription.
To test the OCR capabilities of the Computer Vision service, we'll use a simple
command-line application that runs in the Cloud Shell on Azure.
1. In the Azure portal, select the [>_] (Cloud Shell) button at the top of the page to the
right of the search box. This opens a Cloud Shell pane at the bottom of the portal.
2. The first time you open the Cloud Shell, you may be prompted to choose the type of
shell you want to use (Bash or PowerShell). Select PowerShell. If you do not see
this option, skip the step.
3. If you are prompted to create storage for your Cloud Shell, ensure your subscription
is specified and select Create storage. Then wait a minute or so for the storage to
be created.
4. Make sure the type of shell indicated on the top left of the Cloud Shell pane is
switched to PowerShell. If it is Bash, switch to PowerShell by using the drop-down
menu.
5. Wait for PowerShell to start. You should see the following screen in the
Azure portal:
Configure and run a client application
Now that you have a Cognitive Services resource, you can run a simple client application
that uses the OCR service.
1. In the command shell, enter the following command to download the sample
application and save it to a folder called ai-900.
2. The files are downloaded to a folder named ai-900. Now we want to see all of the
files in your Cloud Shell storage and work with them. Type the following
command into the shell:
code .
Notice how this opens up an editor like the one in the image below:
3. In the Files pane on the left, expand ai-900 and select ocr.ps1. This file contains
some code that uses the Computer Vision service to detect and analyze text in an
image, as shown here:
4. Don't worry too much about the details of the code; the important thing is that it
needs the endpoint URL and either of the keys for your Cognitive Services
resource. Copy these from the Keys and Endpoints page for your resource in
the Azure portal and paste them into the code editor,
replacing the YOUR_KEY and YOUR_ENDPOINT placeholder values
respectively.
After pasting the key and endpoint values, the first two lines of code should look
similar to this:
$key="1a2b3c4d5e6f7g8h9i0j...."
$endpoint="https..."
5. At the top right of the editor pane, use the … button to open the menu and
select Save to save your changes. Then open the menu again and select Close
Editor. Now that you’ve set up the key and endpoint, you can use your Cognitive
Services resource to extract text from an image.
Let’s use the Read API. In this case, you have an advertising image for the
fictional Northwind Traders retail company that includes some text.
The sample client application will analyze the following image:
6. In the PowerShell pane, enter the following commands to run the code to read
the text:
cd ai-900
./ocr.ps1 advert.jpg
7. Review the details found in the image. The text found in the image is organized
into a hierarchical structure of regions, lines, and words, and the code reads these
to retrieve the results.
Note that the location of text is indicated by the top-left coordinates, and the width
and height of a bounding box, as shown here:
8. Now let’s try another image:
To analyze the second image, enter the following command:
./ocr.ps1 letter.jpg
9. Review the results of the analysis for the second image. It should also
return the text and bounding boxes of the text.
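For readers who prefer Python to PowerShell, a rough equivalent of what ocr.ps1 does can be sketched as follows. The endpoint path and the regions/lines/words response hierarchy follow the Azure Computer Vision OCR REST API; the key, endpoint, and sample response values here are placeholders, not real credentials or real API output.

```python
import json
import urllib.request

# Placeholder credentials: substitute the key and endpoint shown on the
# Keys and Endpoints page of your own Cognitive Services resource.
KEY = "YOUR_KEY"
ENDPOINT = "https://YOUR_RESOURCE.cognitiveservices.azure.com"

def build_ocr_request(image_url):
    """Build (but do not send) a request to the Computer Vision OCR endpoint."""
    return urllib.request.Request(
        ENDPOINT + "/vision/v3.2/ocr",
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": KEY,
            "Content-Type": "application/json",
        },
    )

def extract_text(ocr_result):
    """Walk the regions -> lines -> words hierarchy and join the words."""
    lines_out = []
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            lines_out.append(" ".join(w["text"] for w in line.get("words", [])))
    return "\n".join(lines_out)

# A minimal, made-up response in the shape the OCR API returns;
# each boundingBox is "left,top,width,height" in pixels.
sample = {
    "regions": [{
        "boundingBox": "21,16,300,50",
        "lines": [{
            "boundingBox": "21,16,300,20",
            "words": [
                {"boundingBox": "21,16,80,20", "text": "Northwind"},
                {"boundingBox": "110,16,70,20", "text": "Traders"},
            ],
        }],
    }]
}

if __name__ == "__main__":
    print(extract_text(sample))  # Northwind Traders
```

Sending the built request (urllib.request.urlopen) against a live resource returns JSON in this shape, which the same extract_text walk converts to plain text, mirroring what the PowerShell script does.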
The feasibility study aims to assess the viability and practicality of implementing the
"Computer Vision Resource for OCR (Optical Character Recognition)" project. The study
will evaluate the technical, economic, and operational aspects to determine the feasibility
of developing and utilizing the proposed system.
1. Technical Feasibility:
Technology Compatibility: Azure Cognitive Services is a widely used cloud-
based AI platform provided by Microsoft Azure. Its Computer Vision service
offers comprehensive features for detecting and extracting text from images,
making it technically feasible to build the proposed OCR workflow on top of it.
Infrastructure Requirements: The project requires access to Azure cloud
services, a Cognitive Services resource, and Azure Cloud Shell. These resources
are readily available and compatible with most modern computing
environments, ensuring technical feasibility.
2. Economic Feasibility:
Cost Analysis: The cost of utilizing Azure Cognitive Services and other Azure
services must be considered. Azure offers flexible pricing options, allowing
users to choose the appropriate tier and configuration based on their needs.
The economic feasibility depends on the budget allocated for the project
and the potential cost savings in terms of infrastructure and maintenance
compared to on-premises solutions.
Return on Investment (ROI): The potential benefits, such as increased
productivity, scalability, and reduced infrastructure costs, should be
evaluated against the investment required to implement and maintain the
OCR system. If the benefits outweigh the costs, the project is economically
feasible.
3. Operational Feasibility:
User Acceptance: The project's success relies on user acceptance and
adoption. The feasibility study should assess the willingness of users, such
as developers and document-processing professionals, to embrace the
Computer Vision OCR workflow and their readiness to learn and utilize the
Azure platform.
Training and Support: Adequate training resources and support should be
provided to users to ensure a smooth transition to the new system.
Documentation, tutorials, and hands-on learning materials should be
developed to facilitate the learning process and enhance the operational
feasibility of the project.
4. Time Feasibility:
Ease of Provisioning: Provisioning a Cognitive Services resource is a
quick and straightforward process, typically taking just a few minutes to
complete. This enables users to swiftly set up the necessary environment
and begin their project without significant delays.
Efficient Text Extraction: The Computer Vision service processes typical
images quickly, returning the extracted text and bounding boxes within
seconds. This allows batches of images to be analyzed with short turnaround
times and improved productivity.
After a comprehensive assessment of technical, operational, economic, and time
feasibility, the "Computer Vision Resource for OCR (Optical Character Recognition)"
project has been determined to be feasible. The project's practicality and viability are
supported by the availability of Azure Cloud Shell as a reliable cloud-based solution,
accompanied by the necessary tools and resources.
With the established feasibility of the project, users can confidently embark on the
journey of performing OCR using Azure Cloud Shell. This endeavor will provide valuable
hands-on experience and knowledge, enabling users to gain proficiency in utilizing Azure
Cloud Shell effectively.
CHAPTER 2
REVIEW OF LITERATURE
The OCR (Optical Character Recognition) computer vision project focuses on developing
an efficient and accurate system for extracting text from images and scanned documents.
The project employs a series of steps and methodologies to achieve this objective.
The project's proposed system begins with image acquisition, where images or scanned
documents containing text are obtained from various sources. Preprocessing techniques
are applied to enhance image quality, including noise reduction, normalization,
binarization, and deskewing. This step optimizes the images for subsequent text
extraction.
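The binarization step mentioned above can be illustrated with Otsu's method, which picks the threshold maximizing the between-class variance of the pixel histogram. This is a stdlib-only sketch over a flat list of 8-bit pixels; a real pipeline would typically call cv2.threshold with the cv2.THRESH_OTSU flag.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a flat list of 8-bit grayscale pixels,
    i.e. the split that maximizes the between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * hist[i] for i in range(256))
    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += hist[t]          # pixels with value <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (total_sum - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(pixels, threshold):
    """Map dark pixels (likely ink) to 1 and light pixels to 0."""
    return [1 if p <= threshold else 0 for p in pixels]

if __name__ == "__main__":
    # Two clearly separated intensity clusters: dark text on a light page.
    sample = [20, 25, 30, 22, 200, 210, 205, 220, 215]
    t = otsu_threshold(sample)
    print(binarize(sample, t))  # [1, 1, 1, 1, 0, 0, 0, 0, 0]
```

Because the threshold is derived from the image's own histogram, the same code adapts to dark or light scans without hand-tuned constants, which is exactly why Otsu binarization is a common preprocessing default.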
Text detection algorithms are utilized to locate and identify text regions within the
preprocessed images. Character segmentation is then performed to separate individual
characters or groups of characters, enabling independent analysis during the recognition
process.
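The character-segmentation step described above can be sketched with a vertical projection profile: columns containing no ink mark the gaps between characters. This stdlib-only toy assumes a clean binary image (1 = ink) and would not handle touching, overlapping, or italic characters, which need more sophisticated methods.

```python
def segment_columns(image):
    """Split a binary image (list of rows, 1 = ink) into character slices.
    Returns (start, end) column ranges, end exclusive, one per ink run."""
    width = len(image[0])
    # Vertical projection: count of ink pixels in each column.
    profile = [sum(row[x] for row in image) for x in range(width)]
    segments, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                    # an ink run begins
        elif count == 0 and start is not None:
            segments.append((start, x))  # a blank column ends the run
            start = None
    if start is not None:
        segments.append((start, width))  # run extends to the right edge
    return segments

if __name__ == "__main__":
    # Two 2-column "characters" separated by a blank column.
    img = [
        [1, 1, 0, 1, 1],
        [1, 0, 0, 0, 1],
        [1, 1, 0, 1, 1],
    ]
    print(segment_columns(img))  # [(0, 2), (3, 5)]
```

Each returned column range can then be cropped out and passed independently to the recognition stage, which is the "independent analysis" the paragraph above refers to.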
The output of the system is the extracted text presented in a machine-readable format.
Integration options, APIs, or user interfaces are provided to seamlessly incorporate the
OCR functionality into other applications or workflows.
Throughout the project, various computer vision libraries, frameworks, and algorithms
are utilized, such as Tesseract, OpenCV, and deep learning frameworks like TensorFlow or
PyTorch. The system undergoes rigorous evaluation to assess its performance in terms of
accuracy, speed, and robustness. User feedback and corrections are incorporated to
continuously refine and improve the OCR system.
Overall, the OCR computer vision project aims to develop a reliable OCR solution capable
of handling diverse image sources, recognizing multiple languages, and providing accurate
text extraction. By leveraging computer vision techniques and advanced machine learning
algorithms, the project strives to achieve high accuracy and usability in OCR applications.
CHAPTER 3
SYSTEM CONFIGURATION
3.1 Hardware Requirements
Processor: Intel i3, i5, or i7
RAM: 8 GB
Hard Disk: 250 GB