
OCR (Optical Character Recognition):

An OCR system transforms a two-dimensional image of text, containing machine-printed or handwritten characters, from its image representation into machine-readable text. This is done through a pipeline of steps: image preprocessing, text localization, character segmentation, character recognition, and post-processing. The main aim of this process is to identify and capture all the words written in the text, potentially across different languages. The technology still holds immense potential thanks to the various use cases of deep-learning-based OCR, such as digitizing invoices and ID cards. OCR has two parts. The first part is text detection, where the textual regions within the image are determined; this localization of text within the image is important for the second part, text recognition, where the text is extracted from the image. Used together, these techniques let you extract text from almost any image.
For recognizing characters in images, one widely used method is Tesseract. Tesseract is an open-source text recognition engine, with an API, for extracting printed text from images. It can be used with its existing layout analysis to recognize text within a large document, or in conjunction with an external text detector to recognize text from an image of a single text line.
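
As a minimal sketch of driving Tesseract from Python, assuming the pytesseract wrapper and the Tesseract engine are installed (the input file name is a placeholder):

from PIL import Image
import pytesseract

# Load an image containing printed text and run Tesseract's recognizer on it
image = Image.open("invoice.jpg")  # placeholder input file
text = pytesseract.image_to_string(image, lang="eng")
print(text)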

Working:
The API passes the input image to a preprocessor, the text in the image is recognized by the Tesseract OCR engine, and the recognized text is post-processed using trained data sets. The engine is built around an LSTM (Long Short-Term Memory) network and is commonly driven from Python to extract data from images. The input image (e.g. a JPEG) is first put through adaptive binarization, where each pixel is thresholded to produce a binary image. The binary image then undergoes a connected component analysis phase to separate the character outlines for analysis, followed by line and word detection, which organizes the outlines from paragraph lines into words for two-step recognition. Finally, the words are separated from the image to retrieve the characters as an editable document.
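
To make the adaptive binarization step concrete, here is a rough Python sketch using OpenCV; the window size and offset are illustrative choices, not the thresholding Tesseract performs internally:

import cv2

# Read the page as grayscale and threshold each pixel against its local
# neighbourhood, producing the binary image that later stages consume
gray = cv2.imread("page.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder input file
binary = cv2.adaptiveThreshold(
    gray, 255,
    cv2.ADAPTIVE_THRESH_GAUSSIAN_C,  # threshold from a Gaussian-weighted window
    cv2.THRESH_BINARY,
    31, 10)                          # window size and offset, chosen for illustration
cv2.imwrite("page_binary.png", binary)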

Recognition starts with word finding, which organizes text lines into blobs; the lines and regions are analyzed for fixed-pitch or proportional text. In line finding, text lines are broken into words differently according to the kind of character spacing. In character classification, each word that is recognized satisfactorily is passed to an adaptive classifier as training data; the adaptive classifier then gets a chance to recognize text lower down the page more accurately.
ABBYY OCR:

ABBYY is a family of optical character recognition tools: ABBYY FineReader PDF uses the engine to recognize characters in PDFs, while ABBYY FlexiCapture scans documents with automatic document classification. The neural-based automatic document classification technology enables sorting of documents into categories and subcategories by text content and image patterns, and automatically processes all types of documents from files and scanners in a single flow, including office documents and image formats, email attachments, and message bodies. It is used for recognition of printed and hand-printed text (OCR and ICR), barcodes, and optical marks (OBR and OMR), accessible via a Web API.

It extracts data from predefined fields or utilizes ready-to-use data capture algorithms. During the document analysis stage, the document is split into individual pages, and the layout of each page is checked to detect the placement of text, images, barcodes, and table elements. At the same time, the document as an entity and its logical structure are detected. This way the function of text elements is understood; for example, headers and footers are identified. Information about text, pictures, and formatting elements is saved and used later during the final document reconstruction, so the result is an exactly reconstructed document. At the recognition stage, the document images are assembled into document sets, and their content and data are intelligently extracted and validated automatically. Assembly can be done either by separators and page counters or with the help of ABBYY's neural-based classification algorithms, which identify documents automatically. The ABBYY engine runs consistency checks, comparing key fields, seals, photos, or signatures of different documents by displaying their main fields on the same case, to ensure all case-related documents are assembled correctly into a full document set. The engine automatically extracts data from the various papers of the documents; each extracted field is set for verification, to ensure optimal recognition, and can be viewed in a web-based format. The AI-based document classification utilizes OCR, natural language processing, and a pre-trained convolutional neural network, which allows the classification module to be trained quickly on your own documents and document classification to be implemented.

Process:

Step1: Input: Document images or screenshots can be uploaded. All types of PDFs can be processed; their annotations, metadata, bookmarks, and other data can be kept.

Step2: Image preprocessing: Pre-processing tools optimize the images prior to the text recognition step. To achieve high-quality recognition results, images are rotated, cropped, de-skewed, and binarized; distortions are corrected and backgrounds are filtered out.

Step3: Layout Analysis: Document analysis is performed to detect text areas and collect information
about the document, its structure and the layout of each page. Choose from several document analysis
modes or manually define text recognition blocks.

Step4: Recognition: Using its recognition algorithms, printed and hand-printed text in more than 120 languages, in many different fonts, writing styles, and language combinations, can be recognized. Barcode and checkmark values can also be extracted.

Step5: Verification: Internal recognition results, such as character coordinates, fonts and formatting are
accessible and can be used to implement automated correction or manual verification.

Step6: Export: The recognition results are delivered according to the requested settings. These can range from individual field values in TXT or CSV format to completely reconstructed PDF or Word documents with their internal links and original formatting.
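
A hypothetical Python sketch of this upload-process-export flow against a cloud OCR REST service, using the requests library; the endpoint URL, credentials, and parameter names are placeholders, not ABBYY's documented API:

import requests

# Upload a scan for processing; authentication and parameters are placeholders
with open("contract.pdf", "rb") as f:
    resp = requests.post(
        "https://ocr.example.com/processDocument",  # placeholder endpoint
        auth=("APP_ID", "APP_PASSWORD"),            # placeholder credentials
        params={"language": "English", "exportFormat": "docx"},
        data=f,
    )
resp.raise_for_status()
print(resp.text)  # the service's reply, e.g. a task id or result location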

GOOGLE LENS:

Google Lens uses machine learning models, exposed through an API, that let the user have images recognized with vision technology. The Vision API can quickly classify images into thousands of categories and assign them sensible labels; it can even detect individual objects, faces, and pieces of text within an image. To do so it needs to make sense of shapes and letters, which is vital for text recognition tasks. Optical character recognition here utilizes a region proposal network (RPN) to detect character-level bounding boxes that can be merged into lines for text recognition. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. By applying the Knowledge Graph in addition to the context of the surrounding words, characters can be recognized even though the text may be obscured, stylized, or blurry, which can otherwise cause the model to misunderstand words. To improve word accuracy, Lens utilizes the Knowledge Graph to provide contextual clues, such as whether a word is likely a proper noun and should not be spell-corrected. Google Lens uses convolutional neural networks to detect text blocks, such as columns or text in a consistent style or color, and then, within each block, uses signals like text alignment, language, and the geometric relationship of the paragraphs to determine their final reading order.

Steps for recognition:

Using the Google Cloud Vision client library for Python, requests are sent to the API to perform label detection on a large dataset of images. The images are uploaded (for example to Google Cloud Storage buckets), and batch processing is run so that text can be extracted from each image for recognition. Each image is processed by the model to produce an accurate result, with the neural network creating region proposals for recognizing the image. These steps make it possible to use the API and graphical UI to classify images with a predefined, already-trained model, and to deploy a highly accurate machine learning model. The features include detecting objects from pre-recognized data, enabling Vision product search to compare photos, detecting handwritten text using OCR, and analyzing detected faces through image recognition.
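
A short sketch with the google-cloud-vision Python client, assuming credentials are configured via GOOGLE_APPLICATION_CREDENTIALS (the image file name is a placeholder):

from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("storefront.jpg", "rb") as f:  # placeholder input image
    image = vision.Image(content=f.read())

# Label detection: classify the image into categories with confidence scores
for label in client.label_detection(image=image).label_annotations:
    print(label.description, round(label.score, 2))

# Text detection (OCR): the first annotation holds the full detected text
texts = client.text_detection(image=image).text_annotations
if texts:
    print(texts[0].description)
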
AWS Textract:
Amazon Textract is used to ingest documents, perform text detection, and provide a machine learning model for building analysis into applications. It can detect text in various documents and extract text, forms, and tables, using deep learning technology to detect the text of the documents. Its APIs are simple to use and can analyze image files and PDF files. Amazon Textract provides you with control over how text is grouped as an input for NLP, and the data it extracts from different sources is normalized. It provides scalable document analysis, integrating document text detection at low cost.

It enables you to detect and analyze text in single-page or multipage input documents, and to analyze that text for deeper relationships such as form data and tables. Text detection is offered as both synchronous and asynchronous operations that return only the text detected in a document. For both sets of operations, the following information is returned in multiple Block objects: the lines and words of detected text, the relationships between those lines and words, the page the detected text appears on, and the location of the lines and words of text on the document page. Amazon Textract also analyzes documents and forms for relationships between detected text. Its analysis operations return three categories of text extraction: text, forms, and tables. Text is the raw text extracted from a document. Form data consists of linked text items extracted from a document, represented as key-value pairs. Table extraction returns tables, table cells, and the items within table cells. Textract operations also return the location and geometry of items found on a document page; the geometry is expressed as a bounding box and as points (a polygonal reference).
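
For example, a minimal boto3 sketch that requests forms and tables analysis and prints each returned Block with its bounding box; the file name is a placeholder and AWS credentials are assumed to be configured:

import boto3

textract = boto3.client("textract")
with open("form.png", "rb") as f:          # placeholder input document
    response = textract.analyze_document(
        Document={"Bytes": f.read()},
        FeatureTypes=["FORMS", "TABLES"])  # ask for key-value pairs and tables

# Each Block carries its type, any recognized text, and its geometry
for block in response["Blocks"]:
    box = block.get("Geometry", {}).get("BoundingBox")
    print(block["BlockType"], block.get("Text", ""), box)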

The process is carried out by calling Amazon Textract's synchronous operations for detecting document text and for analyzing document text. You pass a document image to an Amazon Textract operation as a byte array, or by referencing a stored file in the input JSON that Amazon Textract processes. To detect text in a document, you use the DetectDocumentText operation and pass a document file as input. DetectDocumentText returns a JSON structure that contains the lines and words of detected text, the location of the text in the document, and the relationships between the detected text. For multipage documents, text lines and words are detected with the asynchronous operations StartDocumentTextDetection and GetDocumentTextDetection; for text analysis, relationships between detected text across a multipage document are identified with StartDocumentAnalysis and GetDocumentAnalysis.
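
A minimal synchronous sketch of DetectDocumentText with boto3 (the file name is a placeholder, credentials assumed configured):

import boto3

textract = boto3.client("textract")
with open("page.png", "rb") as f:  # placeholder input document
    response = textract.detect_document_text(Document={"Bytes": f.read()})

# Print each detected line of text from the returned Block objects
for block in response["Blocks"]:
    if block["BlockType"] == "LINE":
        print(block["Text"])
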
AWS Comprehend:

Amazon Comprehend uses natural language processing to extract insights about the content of
documents. It recognises the entities, key phrases, language, sentiments, and other common elements
in a document. You can customize Comprehend for your specific requirements without the skill set required to build machine learning-based NLP solutions. Using automatic machine learning (AutoML), Comprehend Custom builds customized NLP models using data you already have.
Custom Classification is used to create custom document classifiers to organize your documents into
your own categories. For each classification label, provide a set of documents that best represent that
label and train your classifier on it. Once trained, a classifier can be used on any number of unlabeled
document sets. Custom Entities is used to create custom entity types that analyze text for your specific
terms and noun-based phrases. You can train custom entities to extract terms like policy numbers, or
phrases that imply a customer escalation. To train the model, you provide a list of the entities and a set
of documents that contain them. Once the model is trained, you can submit analysis jobs against it to
extract their custom entities. The following process can be used to extract data and find documents about a particular subject with Amazon Comprehend topic modeling: scan a set of documents to determine the topics discussed and to find the documents associated with each topic. Amazon Comprehend can then tell you what customers think of your products: send each customer comment to the DetectSentiment operation, and it will tell you whether customers feel positive, negative, neutral, or mixed about a product, helping you discover what matters to the customer. Amazon Comprehend removes the
complexity of building text analysis capabilities into your applications by making powerful and accurate
natural language processing available with a simple API. Amazon Comprehend uses deep learning
technology to accurately analyze text and enables you to analyze millions of documents so that you can
discover the insights that they contain.
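
For instance, a minimal boto3 call to the DetectSentiment operation; the sample comment is made up for illustration:

import boto3

comprehend = boto3.client("comprehend")
comment = "The checkout flow keeps failing on my phone."  # sample customer comment
result = comprehend.detect_sentiment(Text=comment, LanguageCode="en")
print(result["Sentiment"])       # POSITIVE, NEGATIVE, NEUTRAL, or MIXED
print(result["SentimentScore"])  # confidence score for each sentiment class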

Amazon Comprehend uses a pre-trained model to examine and analyze a document or set of documents and gather insights from them. This model is continuously trained on a large body of text, so there is no need for you to provide training data, and it can examine and analyze text in different languages. With Amazon Comprehend you can perform the following on your documents: detect the dominant language, by examining text to determine the main language used; detect entities, such as textual references to the names of people, places, and items, as well as references to dates and quantities; detect key phrases, such as "good morning", in a document or set of documents; detect personally identifiable information, by analyzing documents for personal data that could be used to identify an individual, such as an address, bank account number, or phone number; determine sentiment, by analyzing documents for the dominant sentiment of the text; and analyze syntax, by parsing the words in your text and showing the part of speech for each word, which helps you understand the content of the document.
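
A short boto3 sketch covering a few of these operations; the sample text is made up for illustration:

import boto3

comprehend = boto3.client("comprehend")
text = "Amazon was founded by Jeff Bezos in Seattle in 1994."

# Detect the dominant language, then reuse its code for the other calls
lang = comprehend.detect_dominant_language(Text=text)["Languages"][0]["LanguageCode"]

# Entities: people, places, dates, quantities, and so on
for entity in comprehend.detect_entities(Text=text, LanguageCode=lang)["Entities"]:
    print(entity["Type"], entity["Text"])

# Key phrases found in the text
for phrase in comprehend.detect_key_phrases(Text=text, LanguageCode=lang)["KeyPhrases"]:
    print(phrase["Text"])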

Amazon Comprehend supports many languages, each identified by a suitable language code, such as en for English or nl for Dutch.
Amazon SageMaker:

Amazon SageMaker is a machine learning service with which models can be built and trained quickly and easily. It also provides common machine learning algorithms that are optimized to run efficiently against extremely large data, and offers flexible distributed training options that adjust to your specific workflows. SageMaker has several features. Studio is an integrated machine learning environment where you can build, train, deploy, and analyze your models all in the same application. Ground Truth builds training datasets by using human workers along with machine learning to create labeled datasets. Preprocessing tools analyze and pre-process data, tackle feature engineering, and evaluate models. SageMaker Experiments handles experiment management and tracking; you can use the tracked data to reconstruct an experiment, incrementally build on experiments conducted by peers, and trace model lineage for compliance and audit verification. SageMaker Debugger inspects training parameters and data throughout the training process, and automatically detects and alerts users to commonly occurring errors such as parameter values getting too large or small. SageMaker Autopilot lets users without machine learning knowledge quickly build classification and regression models. Reinforcement learning maximizes the long-term reward that an agent receives as a result of its actions. Batch Transform preprocesses datasets, runs inference when you don't need a persistent endpoint, and associates input records with inferences to assist the interpretation of results. SageMaker Model Monitor analyzes models in production to detect data drift and deviations in model quality. SageMaker Neo trains machine learning models once so they can run anywhere, in the cloud and at the edge. SageMaker Elastic Inference speeds up the throughput and decreases the latency of getting real-time inferences.

Steps in AWS SageMaker:

Generate or fetch sample data and explore it before training the model: fetch the data, clean it, and prepare or transform it to improve performance. Then train the model using machine learning algorithms; the algorithm you choose depends on a number of factors. After training the model, evaluate it to determine whether the accuracy of the inferences is acceptable. Finally, deploy the model to the application.
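
A hedged sketch of this train-then-deploy flow with the SageMaker Python SDK, here using the built-in XGBoost algorithm as an example; the IAM role and S3 paths are placeholders you would supply:

import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
estimator = Estimator(
    image_uri=sagemaker.image_uris.retrieve(
        "xgboost", session.boto_region_name, version="1.5-1"),
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/model-output")            # placeholder bucket

# Train on prepared CSV data, then deploy the fitted model behind an endpoint
estimator.fit({"train": TrainingInput("s3://my-bucket/train/",  # placeholder data
                                      content_type="csv")})
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")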

The training algorithm that is run uses deep learning frameworks to produce a model appropriate to the data.
AWS Elasticsearch:

It is an open-source search and data analytics engine for use cases such as log analytics, real-time application monitoring, and clickstream analysis, and it works through clustering. With Amazon ES you create a domain; an Amazon ES domain is synonymous with an Elasticsearch cluster. Domains are clusters with the settings, instance types, instance counts, and storage resources that you specify, and each instance acts as one Elasticsearch node. The service performs functions like indexing the data, with naming restrictions for indices: all letters must be lowercase, index names cannot begin with _ or -, and index names can't contain spaces, commas, etc. URI searches are performed to find the necessary data, complemented by k-NN search, cross-cluster search, and ranking of the data. After the data is processed it can be reindexed, with indices managed through Index State Management and rotated using Curator. Thus Amazon Elasticsearch Service lets you monitor your data proactively with its alerting and anomaly detection features.
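
An illustrative sketch of indexing a document and running a URI search against a domain endpoint with Python's requests library; the domain URL is a placeholder, and in practice the requests must also be signed or permitted by the domain's access policy:

import requests

endpoint = "https://search-my-domain.us-east-1.es.amazonaws.com"  # placeholder domain

# Index a document; index names must be lowercase and must not start with _ or -
requests.put(
    f"{endpoint}/app-logs/_doc/1",
    json={"level": "error", "message": "timeout contacting payment service"})

# URI search: the query is passed directly in the URL
hits = requests.get(f"{endpoint}/app-logs/_search", params={"q": "message:timeout"})
print(hits.json())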
