
Capabilities of Automatic Data Extraction

1 CONFIDENTIAL © 2013 Kofax, All rights reserved


1. Various Recognition Technologies (OCR/ICR/OMR/Barcode)
The Kofax Capture platform supports numerous configuration-based ways to extract meta-data
from documents.

i. OCR/ICR/OMR Extraction

Optical Character Recognition (OCR)


OCR is the most prevalent of all the technologies. It stands for “optical character recognition” and
is the reading of machine-printed text on a document and the conversion of the text into a digital
format.

Intelligent Character Recognition (ICR) Software


ICR Software provides the ability to recognize handwritten or hand-printed text. This process
can be extremely accurate when the text is constrained by boxes or combed form fields.

Optical Mark Recognition (OMR) Software


OMR Software, sometimes called "mark sense", provides the ability to read checked boxes on
forms or documents. The software compares each box against a baseline reading of the unmarked
box and uses the difference to decide whether the box has been marked.
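The baseline comparison can be illustrated with a minimal sketch. This is not the Kofax implementation; the function name, the pixel representation (1 for a dark pixel) and the 0.10 margin are assumptions chosen purely for illustration.

```python
def is_marked(box_pixels, baseline_ratio, margin=0.10):
    """Return True when the box is darker than its unmarked baseline by `margin`.

    box_pixels: rows of 0/1 values, 1 meaning a dark pixel (assumed format).
    baseline_ratio: dark-pixel ratio measured on an empty box of the same form.
    margin: illustrative threshold, not a product setting.
    """
    total = sum(len(row) for row in box_pixels)
    if total == 0:
        raise ValueError("empty box region")
    dark = sum(sum(row) for row in box_pixels)
    return dark / total - baseline_ratio >= margin


# An empty box still contains dark pixels from its printed border.
empty_box = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
ticked_box = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
]
baseline = sum(map(sum, empty_box)) / 16  # 12 dark pixels out of 16

empty_result = is_marked(empty_box, baseline)
ticked_result = is_marked(ticked_box, baseline)
print(empty_result, ticked_result)
```

Measuring against a baseline rather than a fixed darkness level keeps the printed border of the box from being mistaken for a mark.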

The following screenshot shows how a combination of the above techniques is used to identify a
document and extract meta-data from it.

[Screenshot: sample document with regions labelled ICR, OMR and OCR]

ii. Barcode Detection / Support


The Kofax Platform is able to read both 1D and 2D barcode standards that are commonly used
in the market.

Following are some samples of such standards:



The barcode locator within Kofax Transformation Modules (KTM) can be configured to read various
1D and 2D barcode standards, as illustrated below:

Using the barcode locator, we are able to read one or more barcodes located on the same page.

2. Cheque Recognition

For cheque recognition, the platform integrates with a specialized recognition engine that
accurately reads cheque information such as the MICR (Magnetic Ink Character Recognition) line,
the legal amount (LAR) and the courtesy amount (CAR).

The Kofax Platform locates and reads the amount written in figures (CAR) and the amount
written in letters (LAR), and combines the two results to improve overall amount recognition
performance.
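How two readings can reinforce each other is sketched below. The combination rule (summing the confidences of candidates that propose the same amount) and the function name are assumptions for illustration only; the source does not describe how the recognition engine actually combines CAR and LAR.

```python
from decimal import Decimal


def combine_amounts(car_candidates, lar_candidates):
    """Pick the amount best supported by both readings.

    Each argument is a list of (amount, confidence) pairs with confidence in [0, 1].
    An amount proposed by both readers accumulates both confidences
    (an illustrative rule, not the engine's documented behaviour).
    """
    scores = {}
    for amount, confidence in car_candidates + lar_candidates:
        scores[amount] = scores.get(amount, 0.0) + confidence
    if not scores:
        raise ValueError("no candidates to combine")
    best = max(scores, key=scores.get)
    return best, scores[best]


# The figures are ambiguous ('7' versus '1'); the amount in words is not.
car = [(Decimal("1700.00"), 0.55), (Decimal("1100.00"), 0.45)]
lar = [(Decimal("1100.00"), 0.80)]

amount, score = combine_amounts(car, lar)
print(amount)
```

In this made-up example the CAR reading alone would have preferred 1700.00; the LAR reading tips the decision to 1100.00.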

Date, payee name and MICR recognition for cheques is also possible with the Kofax Platform.



3. Intelligent Capture

Several intelligent recognition methods are available to choose from.

3.1 Format Locator


One of the most commonly used intelligent capture methods is the Format Locator. It adds
flexibility to meta-data capture by using regular expressions.
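As a rough illustration of format-based capture, the sketch below uses Python's `re` module rather than the Format Locator itself. The date pattern and the sample text are hypothetical, and the regular expression dialect used inside the product may differ.

```python
import re

# Hypothetical format: dates written as DD/MM/YYYY or DD.MM.YYYY.
DATE_FORMAT = re.compile(r"\b(\d{2})[/.](\d{2})[/.](\d{4})\b")


def find_alternatives(text):
    """Return every substring that matches the date format, with its offset."""
    return [(m.group(0), m.start()) for m in DATE_FORMAT.finditer(text)]


sample = "Order date: 02/01/2013  Invoice date: 15.01.2013  Due date: 14/02/2013"
found = find_alternatives(sample)
found_texts = [text for text, _ in found]
print(found_texts)
```

A single format typically matches several places in a document, which is why each match is treated as a candidate rather than a final result.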



In most cases, when a format is used, several “alternative” results are found in the document. For
example, a search for a date on an invoice could find several alternatives: order date, amount
due date, invoice date, delivery date, etc. Keywords are used to evaluate these results further.
Keywords are words whose geometrical relationship with each alternative governs how the
alternatives are evaluated. After this evaluation, the alternative with the highest confidence is
selected as the result.
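The keyword evaluation can be pictured with the following sketch. The scoring formula (confidence falling off with distance to the nearest keyword), the coordinates and the function name are all assumptions; the product's actual evaluation of geometrical relationships is not described in this document.

```python
import math


def score_alternatives(alternatives, keyword_positions):
    """Rank alternatives by distance to the nearest keyword occurrence.

    alternatives: list of (text, (x, y)) pairs.
    keyword_positions: non-empty list of (x, y) positions where the keyword was found.
    Confidence is 1 / (1 + distance), an illustrative formula only.
    """
    scored = []
    for text, position in alternatives:
        nearest = min(math.dist(position, kw) for kw in keyword_positions)
        scored.append((text, 1.0 / (1.0 + nearest)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical page coordinates in pixels.
date_alternatives = [
    ("02/01/2013", (400, 100)),  # printed next to "Order date"
    ("15.01.2013", (400, 130)),  # printed next to "Invoice date"
    ("14/02/2013", (400, 700)),  # printed next to "Due date"
]
invoice_date_keyword = [(300, 130)]  # where "Invoice date" was found

ranked = score_alternatives(date_alternatives, invoice_date_keyword)
best_text, best_confidence = ranked[0]
print(best_text)
```

The alternative on the same line as the keyword receives the highest confidence and is selected.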



3.2 Database Locator
The Database Locator searches the incoming document to see whether any records in a pre-configured
database appear on the document. Matching rules that define which columns to match can be set up,
and multiple matching rules are allowed.
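The idea of matching rules can be sketched as follows. The record layout, the exact substring test and the "first rule with a hit wins" ordering are assumptions for illustration; the Database Locator's real matching behaviour may be more tolerant or may combine rules differently.

```python
def match_records(document_text, records, rule_columns):
    """Return records whose value for every column in the rule appears in the text."""
    haystack = document_text.lower()
    return [
        record for record in records
        if all(str(record[column]).lower() in haystack for column in rule_columns)
    ]


def locate(document_text, records, rules):
    """Try each matching rule in order and return the first rule that yields hits."""
    for rule_columns in rules:
        hits = match_records(document_text, records, rule_columns)
        if hits:
            return rule_columns, hits
    return None, []


# Hypothetical vendor table and rules (strictest rule first).
vendors = [
    {"id": "V001", "name": "Acme Supplies", "postcode": "10115"},
    {"id": "V002", "name": "Globex Trading", "postcode": "20095"},
]
rules = [["name", "postcode"], ["name"]]
text = "INVOICE\nGlobex Trading\nHamburg 20095\nTotal: 120.00"

rule, hits = locate(text, vendors, rules)
print(rule, [hit["id"] for hit in hits])
```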

Configuration of database to look up.



Screen to define the matching rules



4. Meta Data Capture at Import

Regardless of the input channel, fields from the channel can be automatically extracted and
populated into the fields of the imported documents.

This can be done by simple configuration that maps import channel fields to
document index fields.
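Conceptually, such a mapping is a lookup from channel field names to index field names. The sketch below is only an illustration; the field names and the e-mail channel are hypothetical, and in the product the mapping is configured rather than coded.

```python
def apply_mapping(channel_fields, mapping):
    """Copy values from import-channel fields to document index fields.

    mapping: {channel_field_name: index_field_name}.
    Channel fields that the source did not supply are skipped.
    """
    return {
        index_field: channel_fields[channel_field]
        for channel_field, index_field in mapping.items()
        if channel_field in channel_fields
    }


# Hypothetical e-mail import channel and index field names.
email_fields = {
    "from": "supplier@example.com",
    "subject": "Invoice 4711",
    "received": "2013-03-01",
}
mapping = {
    "from": "SenderAddress",
    "subject": "DocumentTitle",
    "fax_number": "SenderFax",  # not supplied by an e-mail channel
}

index_fields = apply_mapping(email_fields, mapping)
print(index_fields)
```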

The following information can be extracted from import channels

5. Extraction of Table in Different Column Formats

The platform can recognize tables in different column formats. The platform can learn
different header rows of the same table type.
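One way to picture learning several header rows for the same table type is a lookup from header labels to canonical column names. The class, its methods and the column names below are hypothetical and do not reflect how the platform stores trained headers.

```python
class HeaderKnowledge:
    """Maps header labels seen in training to canonical column names."""

    def __init__(self):
        self._labels = {}

    def learn(self, header_row, canonical_columns):
        """Record one trained header row, label by label."""
        if len(header_row) != len(canonical_columns):
            raise ValueError("header row and column list differ in length")
        for label, column in zip(header_row, canonical_columns):
            self._labels[label.strip().lower()] = column

    def resolve(self, header_row):
        """Translate a header row to canonical names; unknown labels become None."""
        return [self._labels.get(label.strip().lower()) for label in header_row]


columns = ["quantity", "description", "amount"]
knowledge = HeaderKnowledge()
knowledge.learn(["Qty", "Description", "Amount"], columns)
knowledge.learn(["Quantity", "Item", "Line Total"], columns)

# A new document mixes labels from both trained variants and adds an unknown one.
resolved = knowledge.resolve(["qty", "Item", "Line total", "VAT"])
print(resolved)
```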

Training in progress to allow the system to learn different header rows



6. Online Learning

When pre-configured rules cannot extract the data fields or table, the system learns the
newly specified data position after the user manually validates the fields.

A “+” sign on the document icon in the left panel indicates that the system will
learn the data position of this document.

The learning is saved to the internal knowledge base after validation and takes
effect immediately for the next extraction.
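The learn-after-validation cycle can be sketched in a few lines. This is a deliberately simplified assumption: positions are matched exactly and documents are reduced to a mapping from position to text, whereas the real knowledge base is internal to the product and not described here.

```python
class PositionKnowledgeBase:
    """Remembers where a validated field was found for a given document layout."""

    def __init__(self):
        self._positions = {}

    def extract(self, layout_id, field, words):
        """Return the text at the learned position, or None if nothing was learned.

        words: {(x, y): text} for one document (simplified, hypothetical format).
        """
        position = self._positions.get((layout_id, field))
        return words.get(position) if position is not None else None

    def learn(self, layout_id, field, position):
        """Store the position the user confirmed during validation."""
        self._positions[(layout_id, field)] = position


kb = PositionKnowledgeBase()
first_doc = {(400, 130): "INV-0001", (400, 160): "15.01.2013"}
next_doc = {(400, 130): "INV-0002", (400, 160): "22.01.2013"}

before = kb.extract("vendor-a", "InvoiceNumber", first_doc)  # nothing learned yet
kb.learn("vendor-a", "InvoiceNumber", (400, 130))            # user validates the field
after = kb.extract("vendor-a", "InvoiceNumber", next_doc)    # next extraction benefits
print(before, after)
```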
