Conference Paper · January 2003 · DOI: 10.1117/12.476328


SPIE Document Recognition and Retrieval X - San Jose, Jan. 2003

Form Type Identification for Banking Applications and

Its Implementation Issues

Hisao Ogata*1, Shigeru Watanabe*1, Atsuhiro Imaizumi*1, Tsukasa Yasue*1, Naohiro Furukawa*2, Hiroshi
Sako*2, and Hiromichi Fujisawa*2

*1
Mechatronics Systems Division, Hitachi, Ltd.
hisao-ogata@itg.hitachi.co.jp
*2
Central Research Laboratory, Hitachi, Ltd.
fujisawa@crl.hitachi.co.jp

Abstract
This paper presents a new type of ATM, called the image-ATM, and an image workflow system developed for
banking applications. The system, including the image-ATM, captures the paper forms brought by the
clients at the very front-end, identifies the type of each form, and recognizes the data on the form
automatically. The image-ATM can accept over 400 different kinds of forms. The system is presently in
operation at some of the major Japanese banks. By introducing the image workflow system and by
centralizing the back-office work at a few operation centers, these banks were able to reduce the
workforce of their branch offices considerably. Technically, form recognition, especially form type
identification, was one of the keys to this success. This paper discusses a method for form type
identification and its technical issues.
Keywords: forms processing, image workflow, form identification, form recognition, banking application

1. Introduction
Because everything seems to be going electronic, it has been asked why we still need to be concerned with paper
documents and their imaging. Forms processing is a good example: it is often discussed in contrast to web
computing such as e-Banking. In reality, however, it is worthwhile to support multiple ways of doing things. By
doing so, we can respond to the digital divide, human factors, social factors, and the resistance to
change. A good blending of new technologies and conventional ones is the key in this discussion. This paper presents
a new style of forms processing applied in a new banking system in Japan.
The banks in Japan have been re-engineering their branch office workflow to improve service quality, to
reduce the workforce, and to initiate new services. One of the tasks they wanted to improve was paper form
handling and the related work process. The paper forms brought to the bank counter by clients are used to make
payments for utilities such as electricity, gas and water, local taxes, health insurance, telephone charges, and so on. It
is said that there are thousands of such different forms. The problems are that the forms are produced by many
different organizations, that they come in many formats, and that it is hard to ask those organizations to standardize
the forms or make them all electronic. Therefore, there are many realistic reasons why these forms will not
disappear, even though the banks also offer and encourage electronic payment schemes including e-Banking.


The solutions we have provided are the image-ATM (Automatic Teller Machine) and the image workflow
system. The image-ATM is a new type of machine with which clients can withdraw and pay money (Fig. 1). To make
a payment, the client brings a paper form such as an electricity bill or a local tax notice, and then enters the piece of
paper into a small window of the machine to let it be read. Through minimal interactions between the client and the
machine, he/she can pay money and receive an official receipt. Presently, the image-ATM can accept about 400
kinds of forms. The other solution, the image workflow system, is a large-scale workflow system
that realizes the improved forms processing. The system links several hundred branch offices, several tens of
operation centers, and a few data centers. By introducing such a system, a bank could centralize the so-
called back-office work, which once existed in each branch office, at a few operation centers, and as a result
reduce the corresponding workforce. Even after accounting for the additional workforce at the operation centers and
data centers, the reported reduction was several hundred people.
The technical issues involved are varied, ranging from system issues such as RAS (reliability, availability
and serviceability) to system performance, data security and human factors. The technical issues discussed
in this paper, however, are those around forms recognition. The paper forms put into the image-ATM and those
presented to the tellers are scanned and converted into digital images. The images are analyzed, the type of the
form is identified, and then the content of the form is recognized. As mentioned above, the forms must be handled
just as they exist, and therefore there are many pattern recognition problems. This paper discusses the important
problems around form type identification.

Fig. 1. Image-ATM
Fig. 2. Blinkscan on a teller station

[Fig. 3 diagram: retail branch offices (about 400 offices, 9,000 terminals) with banking teller stations, image-ATMs
and branch office servers, linked through 64/128/512 kbps lines to the data/operation centers (2 data centers,
30 operation centers) with mainframes, hub servers, batch OCRs and application servers on a 100 Mbps LAN,
where staff handle form sorting, verification and approval]

Fig. 3. System architecture for image workflow


2. System Architecture and Workflow


The new banking system described here, developed by Hitachi, is now in practical operation at some of the major
Japanese banks. The hardware related to forms processing and image workflow is depicted
in Fig. 3. The system consists of subsystems for retail branch offices, operation centers and data centers as shown.
They are linked with each other through dedicated private lines. The front-end of this system consists of a set of
image-ATMs (Fig. 1) and a set of banking teller stations equipped with a front-end scanner (Fig. 2). The front-end
scanner (overhead type) shown in Fig. 2 is specially designed so that the teller can capture form images very
quickly; the product is named “Blinkscan” after this. Its scanning resolution is 200/300/400 dpi, and the scanning
time for an A4-size color document is about 3 seconds. Another feature of this scanner is that it does not occupy
much desk space, as seen in Fig. 2.
The design philosophy of this system is that the information on the paper forms is captured at the very front-end of
the system; the image-ATM and the front-end scanner are the devices that embody this philosophy. Digitized images
are analyzed and the contents are recognized automatically. The results of recognition and the key-entered data are
sent to one of the remote operation centers for verification and approval. The digital information put onto the data
workflow makes this possible.
The system works as follows (Fig. 4). The image of a form is captured by a front-end scanner on the desktop or
by an image-ATM. Examples of the forms to be processed are shown in Fig. 5: a standard money transfer
form and a local tax notice having three parts. The image captured by the scanner is sent to a form recognition
program on a branch office server, and the form type is automatically identified. The identification is done
against the predefined form types, which makes it possible for the computer to invoke an appropriate application
program and open the right application window. At the same time, the form recognition program locates the
positions of the data fields and determines the types of data to read. While the human teller processes the form, the
data fields such as account number, amount of money, and due date are automatically recognized in the background.
The recognized data, key-entered data and the corresponding image are sent to one of the remote operation centers
for verification and appropriate management operations. Finally, the central database is updated correspondingly,
and the form images are stored in the image archive server with their index information. Of course, the detailed
operational steps differ depending on the type of form.

[Fig. 4 diagram: (1) image capture and recognition, (2) key-in of the amount of money, and (3) transfer of image and
data by tellers at the retail branch offices; (4) image and data forwarded to the operation centers, where operators
(5) key in payee and payer names, followed by (6) system verification, (7) flagging of mismatched data, and
(8) check and correction; (9) big transactions go to managers for (10) approval; (11) the central database at the
data centers (mainframe computers, application servers, transaction records) is updated]

Fig. 4. Form workflow for the money order process


In the case of the image-ATM, the process is somewhat different because the whole process is automated
without human teller support. For this, the machine has a form recognition engine, and it automatically identifies the
form type and recognizes the form content locally after the client enters information such as the payer identification
and the amount of money through the keyboard. The data fields to be read by the recognition engine depend on the
definition of the form. When the information entered by the client and the information recognized by the recognition
engine coincide, the client is requested to put the money into the machine. When the form cannot be identified or
when the two sets of information do not coincide, the client is requested to proceed to a bank counter where a
human teller may help. This should happen in less than 5% of the transactions.
When the money is properly accepted, the form is cut into two or three parts inside the machine, and the receipt
part with an official stamp is returned to the client. Of course, a record is kept and transferred to the central
computer. This completes the payment transaction. The benefit of this new type of ATM is that the client does not
need to tell the machine which of the roughly 400 kinds of forms is being presented. In fact, it is not realistic to
require the client to enter the full character string of the form title printed on the form; moreover, the form
title alone is not enough information to specify the form type. Regarding recognition of the content, the way
automatic recognition is used is unique: it is used to verify the key-entered data. Therefore, the result of recognition
is not shown on the terminal screen of the ATM.
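The verification logic described above can be sketched as follows. This is a minimal illustration of the decision flow, not the actual image-ATM software; all function names, field names and return values are assumptions for the sake of the example.

```python
# Sketch of the image-ATM decision flow: recognition verifies the key-entered
# data, and any identification failure or mismatch routes the client to a
# human teller. Names and decision values are illustrative assumptions.

def atm_decision(form_type, entered, recognized):
    """Return "ACCEPT_MONEY" when the identified form's recognized fields
    coincide with what the client keyed in; otherwise route to a teller."""
    if form_type is None:
        return "GO_TO_COUNTER"            # form could not be identified
    for field, value in entered.items():
        if recognized.get(field) != value:
            return "GO_TO_COUNTER"        # mismatch: human teller may help
    return "ACCEPT_MONEY"
```

A transaction proceeds only on the `"ACCEPT_MONEY"` outcome; in the real system this branch triggers cutting the form and issuing the stamped receipt.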

(a) Standard money transfer form
• Left part: Sent to the payee
• Right part: Receipt kept by the payer
(b) Tax notice form for a local government
• Left part: Local tax notice and receipt to be kept by the tax payer
• Middle part: Notice to be kept by the bank
• Right part: Notice to be kept by the authority
Fig. 5. Example forms: standard money transfer form and local tax notice

[Fig. 6 diagram: taxonomy tree —
Form for payments
  Money order: bank cheque, bank money order, postal services
  Utility bill: electricity (E1, E2 companies), water (W1, W2 cities), gas, telephone (T1, T2 companies), TV broadcast
  Local taxation: resident’s tax (C1, C2 cities; years 2001, 2002), automobile tax (C3 city; years 2001, 2002),
  fixed property tax
  Insurance fee: health insurance (H1, H2 companies), pension deposit (H3 company), nursing insurance
  Others]

Fig. 6. Taxonomy of the forms


The taxonomy of the form types for the image-ATM is shown in Fig. 6. The thick boxes are generic types,
while the thin boxes are individual types, which are shown only in part. For better service, the number of form types
has been increased from 100 in the initial operation in the year 2000 to 408 to date. Such image-ATM
operation is currently limited to the Tokyo area. Still, it covers 7 power companies, 10 telephone companies, 31 water
suppliers, 38 local governments for resident’s taxes, 30 local governments for health insurance, 32 private
organizations and companies, and so on. Because each of these entities has more than one kind of form, the total
number of different forms amounts to 408.

3. Technical Issues and General Approach


There are many kinds of forms, and they keep researchers and engineers busy solving the corresponding technical
problems. As shown in Fig. 5, some forms are rather simple while others are complex. The type of form shown
in Fig. 5 (a) is a standardized one for money transfer used at banks and post offices. Companies and entities that
want to receive payments print such forms and send them to the payers by mail. The payers then bring them to a
bank or a post office with money. As described earlier, the banks want to reduce the cost of this kind of transaction.
Because this form was designed to be processed by machine, it has the following favorable features:
• The background of borders and boxes is printed in a drop-out color.
• An OCR font is used in part.
• Some of the forms have barcodes printed on them.
• The OCR characters and barcodes carry redundant information, so both can be read to attain
higher recognition reliability.
Therefore, there are few technical issues, except that the background information printed in a drop-out color needs to
be scanned into memory to extract the form structure information; otherwise it would be dropped out.

(a) Mixed numbers of lines (two dimensional layout)

(b) Different gray levels of backgrounds (c) Different fonts in a single line

(d) Different field design: border shape, thickness, digit separators

(e) Different syntax of dates, and printing displacement


Fig. 7. Difficulties in the non-standard forms
On the other hand, non-standard forms such as the one shown in Fig. 5 (b) present a number of technical problems
because of the following characteristics; examples are shown in Fig. 7.
• A lot of information is packed into a limited space, causing the following problems:
• Character lines are laid out in a two-dimensional way. In the case of Fig. 7 (a), each of the lines partially
divides into two lines.


• Highlighting is applied to important fields, making binarization difficult. In the case of Fig. 7 (b), the three
fields have three different gray levels.
• Font sizes, font spacing and line spacing are not uniform.
• A single line may contain different fonts, as in the first line of Fig. 7 (c); i.e., the month is the foreground,
printed later in OCR-B font, which differs from the background of the year number and year name.
• Field borders may have different shapes and thicknesses, as in Fig. 7 (d).
• The foreground printing sometimes does not align properly with the background printing, as in Fig. 7 (e).
Because the position of the field in the background is used to register the form definition profile, the
displacement (positioning error) may cause a segmentation error in the recognition phase. In addition,
foreground characters may overlap the background characters and lines.
• Printed date information is important in form reading applications. However, the syntax of date
representation has many variations; some examples are shown in Fig. 7 (e). The right image of
the figure shows a syntax which reads as “Year-name YY year MM month DD day” in English.
• Some forms are very similar to each other, except for a few printed characters. For example,
the difference may be only in the company names, as in “East Japan” and “West Japan” (i.e. the
difference is just one Kanji character).
• On the other hand, some information is printed at two or three positions. This helps recognition produce
more reliable results.
• It is very difficult to gather samples to define new forms and to train the system, although pattern
recognition solutions require many samples to set up the system. There are several reasons for this
difficulty. One reason is that there are so many organizations that produce such forms. Another is
that those organizations are not the entities that receive the benefit of the system. Still another is
that the banks, for example, do not keep stocks of those forms. Furthermore, there are privacy and
security concerns.
• Another system-level technical issue is the methodology for defining new forms. Defining new forms
turned out to be a very costly process. As the number of form types grew, a manual
process of form definition became almost impossible, and automatic or semi-automatic form definition
became mandatory. This was partly because the pattern recognition accuracy for form type identification
was not reliable enough at the start; in that situation, the form-definition staff were
requested to repeatedly retry different ways of defining a form. With improvements in the recognition
engine, this problem became less significant, but it is still valuable to have a (semi-)automatic method
for this.

In summary, the major pattern recognition problems lie in adaptive binarization, character line
segmentation, character extraction from the background, word recognition and word spotting, and learning of new
forms. The recognition technology that solves these pattern recognition problems is the basis of the systematic
function of form type identification.

[Fig. 8 diagram: Preprocessing (binarization, skew detection, line segment extraction) → Form Type Identification
(structure extraction, keyword extraction, keyword pattern matching) → Form Content Recognition (field
segmentation, segmentation-recognition, self-verification), all drawing on a Knowledge Base (form models, form
features, character features, word lexicon)]

Fig. 8. Form recognition engine


Before discussing how to solve these problems, the form recognition engine is described with reference to
Fig. 8. The engine consists of four modules: preprocessing, form type identification, form content recognition, and a
knowledge base. Preprocessing includes binarization, form edge detection, skew detection, and line-segment
extraction from horizontal and vertical rule-lines. Binarization is first applied with a standard threshold for the
normal case. Skew is detected after edge detection, but the image is not normalized; instead, the skew parameters
are used in the following processes to adjust positions.
Form type identification is done by a template matching method called “constellation matching” [1]. The name
was given by analogy between stars scattered in the sky and keywords printed over the whole space of a
form. Conventional form recognition methods looked at the line structures which construct tabular forms.
However, as discussed above, the non-OCR forms do not necessarily have much of such line structure, or such
structure does not necessarily discriminate between different types of form. What is important to note here is that the
“researchers” in this technical field cannot control the definition of categories or the grouping of such forms.
Rather, it is the “customers,” or the system management people, who determine the categories. Furthermore, the
categorization is the result of human factor considerations as well. As a result, it may happen that very similar forms
are regarded as different types; the banks request, for example, the differentiation of very similar telephone bills from
different companies. Therefore, the final resort for form type identification is to recognize the text on the forms.
Namely, the keywords and their positions are used as the features of a form. For this, a set of tuples of a keyword and
its position coordinates is defined for each form type. We will refer to this set as a template or form type profile. To
make identification reliable, each template needs to have multiple tuples. The profile actually carries more
information in addition to such tuples, for the sake of content recognition and business processes. Concretely, this
includes the data field positions and their attributes, which define the character types and a verification method.
Template matching for form type identification is a standard, straightforward method, but the problem is the
computational cost of matching the input against each of a few hundred templates. The approach taken here is so-
called hierarchical template matching for acceleration, so that full template matching is carried out for only a few
templates. Details will be described later.
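The tuple-based template idea can be sketched as follows. This is an illustrative simplification, not the constellation matching implementation of [1]; the data structures, the position tolerance and the acceptance threshold are assumptions.

```python
# Sketch of form-type templates as sets of (keyword, position) tuples, in the
# spirit of constellation matching. All names and numeric values here are
# illustrative assumptions.
from dataclasses import dataclass

@dataclass
class KeywordTuple:
    keyword: str
    x: int          # expected x position on the form
    y: int          # expected y position on the form

@dataclass
class FormTemplate:
    form_type: str
    tuples: list    # list of KeywordTuple

def match_template(spotted, template, tol=20):
    """Score a template against words spotted on the input image.

    `spotted` is a list of (word, x, y) word-spotting results. A tuple
    counts as matched when the same word is found near the expected
    position; the score is the fraction of matched tuples.
    """
    hits = 0
    for t in template.tuples:
        if any(w == t.keyword and abs(x - t.x) <= tol and abs(y - t.y) <= tol
               for (w, x, y) in spotted):
            hits += 1
    return hits / len(template.tuples)

def identify(spotted, templates, threshold=0.8):
    """Return the best-scoring form type, or None (reject) below threshold."""
    best = max(templates, key=lambda t: match_template(spotted, t))
    score = match_template(spotted, best)
    return best.form_type if score >= threshold else None
```

Requiring the score to clear a threshold, rather than simply taking the maximum, gives the reliable rejection behavior the system needs for unknown forms.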
When the form type is identified, its profile tells where and what to read. As shown in Fig. 8, form content
recognition starts with field segmentation, for which the position data is given by the type profile. To cope with the
problems discussed above with respect to Fig. 7, field images are cut out of the original image with somewhat
larger margins. This may solve the problem of the placement error of foreground printing, but at the same time it
may cause another problem of “noise” inclusion: the segmented image may include patterns of adjacent fields.
Patterns intruding from the edges of the segmented image are analyzed to determine whether they should be
discarded. If the field is a date field, syntax checking is conducted to see if the string is consistent with one of the
date representation formats. Finally, a redundancy check is made if redundant information exists on the input form.
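The margin-enlarged field segmentation can be sketched as follows; the function name, the data layout and the margin value are illustrative assumptions, not the engine's actual interface.

```python
# Sketch of field segmentation with enlarged margins: the crop box from the
# form-type profile is padded so that foreground placement errors still leave
# the data inside the segment. Names and the margin value are assumptions.

def crop_with_margin(image, field, margin=12):
    """Cut a field image out of the binarized form image.

    `image` is a 2-D list (rows of pixels); `field` is (x, y, w, h) taken
    from the form-type profile. The crop is padded by `margin` pixels on
    every side and clamped to the image bounds; patterns intruding from
    neighbouring fields must be filtered out afterwards.
    """
    x, y, w, h = field
    height, width = len(image), len(image[0])
    x0, y0 = max(0, x - margin), max(0, y - margin)
    x1, y1 = min(width, x + w + margin), min(height, y + h + margin)
    return [row[x0:x1] for row in image[y0:y1]]
```

The padding is what makes the later noise-inclusion analysis necessary: the wider the margin, the more likely adjacent-field patterns enter the segment.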
The last issue, the lack of sample forms, is one of the toughest problems. It is critical to have a good
relationship and close collaboration with the banks that install the system. Technically, it is very effective to
generate many samples artificially from a very small number of real samples; we may use perturbation to generate
similar but somewhat noisy samples.

4. Form Type Identification and Content Recognition


4.1 Form Type Identification
Form type identification (Fig. 8) is done in four steps to speed up the process; i.e., it is based on hierarchical
template matching:
1) Size matching
2) Line-segment matching
3) Fixed-area word matching
4) Form definition template matching
The first three steps are a kind of screening: the first step yields about 30 candidates on average, the second
about 20 candidates, and the third about 3 candidates. Figures 9 (b) and (c) show the directional features used at the
second stage. These features have been used effectively in handwritten and machine-printed Chinese character
recognition [2, 3]. The mesh size of these feature patterns is 16 by 16, which is just sufficient for this purpose. At the
fourth step, a left-upper rectangular sub-region of the input form, a fixed region (120 mm x 40 mm) for every
candidate, is analyzed to recognize character strings. To cope with the difficulty of character segmentation, a
lexically directed approach is adopted as shown in Fig. 10 [4, 5]. The figure shows how the problem of non-uniform
character size and spacing is approached. Because the parentheses are thin and the last five characters are in a small
font, multiple segmentation choices are kept in the character hypothesis network shown in Fig. 10 (b). Each edge
of the network is assigned a corresponding segmented image, which is then assigned a vector of character
classification scores. Given a list of words as a lexicon, dynamic programming can be applied to search for the
optimum partial path for each word. The word(s) that gives the best score is (are) the recognition result. In the case
of Fig. 10 (c), two words are found. Then, the set of recognized words is matched against the word lists of the
defined form types, producing the three best form types as the candidates for the final step.
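The dynamic programming search over the character hypothesis network can be sketched as follows. This is a minimal illustration of lexically directed matching, not the method of [4, 5]; the lattice representation and all names are assumptions.

```python
# Sketch of lexically directed matching over a character hypothesis network.
# Each edge (i, j) of the segmentation lattice carries classification scores
# for the image segment between cut points i and j; for each lexicon word,
# dynamic programming finds the best path that spells the word.

def word_score(edges, n_nodes, word):
    """Best total score of a path 0 -> n_nodes-1 spelling `word`.

    `edges[(i, j)]` maps a character class to its recognition score on the
    segment between cut points i and j. Returns -inf if no path spells it.
    """
    NEG = float("-inf")
    # best[k][i]: best score spelling word[:k] and ending at node i
    best = [[NEG] * n_nodes for _ in range(len(word) + 1)]
    best[0][0] = 0.0
    for k, ch in enumerate(word):
        for (i, j), scores in edges.items():
            if ch in scores and best[k][i] > NEG:
                cand = best[k][i] + scores[ch]
                if cand > best[k + 1][j]:
                    best[k + 1][j] = cand
    return best[len(word)][n_nodes - 1]

def spot_words(edges, n_nodes, lexicon):
    """Return lexicon words ranked by their best path score."""
    scored = [(w, word_score(edges, n_nodes, w)) for w in lexicon]
    return sorted([ws for ws in scored if ws[1] > float("-inf")],
                  key=lambda ws: -ws[1])
```

Each DP level consumes exactly one lattice edge per character, so competing segmentations (thin parentheses, small fonts) are resolved by whichever path scores best for a lexicon word.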
The final step of form type identification is to match the three candidate templates against the input. Namely,
for each form type template, the areas at the positions specified by the template are analyzed to see if they contain
the specified character strings. To do this, those areas of the input form image are segmented and recognized by the
method shown in Fig. 10. Because this is verification rather than recognition, one character string is hypothesized for
each image segment, and the recognition process gives a total matching score over the plural areas. The best
match whose total score is greater than a pre-determined threshold is the output of form type identification.
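The four-step hierarchy above can be summarized as a screening cascade. The sketch below shows only the control flow; the step functions, candidate counts and threshold are assumptions, and each real step (size, line-segment, fixed-area word, full template matching) would be a substantial module of its own.

```python
# Sketch of the four-step hierarchical screening cascade: each step narrows
# the candidate set so that full template matching runs on only a few types.

def identify_form(image, all_types, steps, final_match, threshold=0.8):
    """Run the screening steps, then full template matching on the survivors.

    `steps` is an ordered list of functions (image, candidates) -> candidates,
    e.g. size matching, line-segment matching, fixed-area word matching.
    `final_match(image, form_type)` returns a verification score.
    """
    candidates = all_types
    for screen in steps:
        candidates = screen(image, candidates)
        if not candidates:
            return None                      # reject early
    best = max(candidates, key=lambda t: final_match(image, t))
    return best if final_match(image, best) >= threshold else None
```

The benefit is purely computational: the expensive verification step is never applied to more than the few candidates that survive screening.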

(a) Sample form image (b) Horizontal feature pattern (c) Vertical feature pattern
Fig. 9. Sample form and its directional features for hierarchical template matching

(a) Segmented character line

(b) Character hypothesis network

(c) Recognized string

Fig. 10. Character segmentation network for lexically directed string recognition

4.2 Form Content Recognition


Form content recognition is the recognition of the data fields specified by the form type profile identified in the
previous step of form type identification. In most cases, the fields to be recognized are the amount of money, the
due date, and a verification code. The verification code is printed in an OCR font or as a barcode; some forms do not
carry a verification code. Examples of problematic images of the amount of money and the due date are shown in
Fig. 7.
Recognition of machine-printed character strings of digits is not new at all. The problems here are, as discussed
in the previous chapter, adaptive binarization, line segment removal, separation of character patterns from line
segments, and noise elimination. Again, algorithms that solve each of these problems already exist; the
question is how to organize the whole. The approaches taken here are the multiple strategy
approach, the multiple hypotheses approach and the information integration approach [6].
The multiple strategy approach prepares multiple solutions to a single problem. It is effective, for
example, when the background characters are faint. One solution is to attempt to eliminate them by
morphological filtering; another is to try to recognize them by lowering the binarization threshold. The strategy is
to “do both” and to choose the better result. In practice it is not so simple, because the engine does not know when
and where the printing is faint (too light) and when it is strong (too dark). So, if the standard recognition operation
ends in rejection, an abnormal situation is hypothesized and such a strategy is applied. To make it work reliably, it
is important not to produce false accepts: abnormal input should be rejected reliably. Contextual recognition, such
as the information integration described later, is one way to achieve this.
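The "do both and choose the better result" control flow can be sketched as follows; the strategy interface, the confidence values and the acceptance threshold are all illustrative assumptions.

```python
# Sketch of the multiple strategy approach: when the standard pass rejects,
# the alternative strategies are tried and the most confident accepted
# result is kept; if none is confident, the field is rejected reliably.

def recognize_field(field_image, strategies, accept=0.9):
    """`strategies` is an ordered list of functions returning (text, conf).

    The first (standard) strategy is trusted when confident enough;
    otherwise every alternative is run and the best result is chosen,
    or the field is rejected when no result clears the threshold.
    """
    results = []
    for recognize in strategies:
        text, conf = recognize(field_image)
        results.append((conf, text))
        if conf >= accept:                 # confident enough: stop early
            break
    conf, text = max(results)
    return text if conf >= accept else None  # reliable rejection otherwise
```

Rejecting whenever no strategy is confident is the point the text stresses: a false accept is worse than a rejection, which a human can still handle.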
The multiple hypotheses (and test) approach is effective when there is ambiguity in a decision. For
example, a segmented field image with wider margins may include noise (or character patterns) intruding from
outside the true area. Such “noisy” patterns are checked to see whether they really come from outside. In such a
situation, it is effective to set a gray zone for this determination: patterns that lie in the gray zone are treated both
as noise and as real patterns, and these multiple hypotheses are tested by recognition.
The information integration approach is especially effective for recognizing the due date. The date has specific
syntax rules, and these constraints can be used to recognize it accurately, just as in address
phrase recognition; it is one way of doing contextual recognition. It is more reliable to recognize the string “Year-
name YY year MM month DD day” as a whole than to recognize “Y Y M M D D” separately without any
contextual information. A trickier situation, however, is that the background characters of “Year-name year month
day” do not align nicely with “YY MM DD.” The worst situation is that some characters from the two groups
touch each other because of printing placement error. If the recognition results from these two groups of characters
can be interpreted contextually as a date, the result can be accepted.
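The syntax-checking side of this can be sketched with pattern matching; the two patterns below (a Japanese era-style date and a numeric date) are illustrative examples only, and the real system would cover all the observed variations mentioned with respect to Fig. 7 (e).

```python
# Sketch of syntax-directed date verification (information integration): a
# recognized string is accepted only if it parses as one of the expected
# date formats. Patterns and range checks are illustrative assumptions.
import re

DATE_PATTERNS = [
    # era-name YY 年 MM 月 DD 日 ("Year-name YY year MM month DD day")
    re.compile(r"^(平成|昭和)(\d{1,2})年(\d{1,2})月(\d{1,2})日$"),
    # plain numeric YYYY-MM-DD or YYYY/MM/DD
    re.compile(r"^(\d{4})[-/](\d{1,2})[-/](\d{1,2})$"),
]

def parse_date(text):
    """Return (year_part, month, day) if `text` is a syntactically valid
    date, else None; a simple range check rejects impossible values."""
    for pat in DATE_PATTERNS:
        m = pat.match(text)
        if m:
            g = m.groups()
            month, day = int(g[-2]), int(g[-1])
            if 1 <= month <= 12 and 1 <= day <= 31:
                year = g[0] + g[1] if len(g) == 4 else g[0]
                return year, month, day
    return None
```

A recognition result that fails every pattern is rejected, which is exactly how the contextual constraint turns into reliable rejection of misrecognized strings.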

5. Form Definition

Form definition is the process of preparing the system for a new form type, given a few samples of it. It has been a
costly task, and it has long been hoped to automate it. There are four subtasks involved:
1) Define the geometric parameters of the form
2) Find features of the new form for automatic form type identification
3) Specify the data fields to be read and define the field properties
4) Specify the machine operations and workflow after recognition

Step 2 is the one discussed in this paper. As described before, form type identification is based on four-step
hierarchical template matching; therefore, the form definition task should create a definition template for each step.
Creating the templates for the first three steps is rather straightforward; the method for the final step
is described below.
The template used at the final step of template matching consists of multiple tuples of keyword character strings
and their position coordinates, as described earlier. Because a complex form may have about a hundred words printed
on its surface, it is important to choose keywords (i.e. character substrings) that meet the following conditions:
• discriminative enough with respect to other similar forms
• accurately and reliably recognizable
• minimal in number
The method presented here for creating form definition templates can find keywords that meet those
conditions by analyzing a sample form image. Using the modules of the form recognition engine, the form image is
analyzed to recognize the character strings on the sample form. The idea behind this is that, by using the same
recognition modules, the method will find character strings that are accurately and reliably recognizable. To assure
accuracy and reliability, it is possible to use plural samples of this kind and to apply a perturbation technique.
Recognition of character strings is actually word spotting. After character line segmentation, lexically directed
string recognition is conducted for each character line to recognize words (Fig. 10). A lexicon of about 5,000 words
has been created from a huge sample set and is used for this word spotting. Figure 11 shows a sample result of
character line segmentation, and Figure 12 shows the character substrings (words) obtained by word spotting. The
spotted words are candidates for the keywords of the type definition.
Keyword selection is based on keyword uniqueness and recognition stability (reliability). The uniqueness of a keyword is defined as the reciprocal of the number of forms, among the whole set of forms, in which the same keyword appears at the same position. If the keyword appears at a specific position in only one form in the whole set, its uniqueness is 1. The stability of a keyword is defined as the number of samples in which the keyword is properly recognized, divided by the total number of samples. Therefore, if the keyword is recognized in all samples, its stability is 1. Every keyword candidate is evaluated, and those whose uniqueness and stability are high enough are selected as the keywords for the type-defining template. The number of keywords to select can be determined by a probabilistic model [1].
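The two selection criteria can be sketched as follows. The data layout (per-form sets of (keyword, position) pairs and per-candidate recognition counts) is an assumption for illustration, not the production representation, and the thresholds are placeholders; the paper determines the number of keywords via a probabilistic model [1].

```python
def uniqueness(keyword, position, form_keyword_sets):
    """Reciprocal of the number of form types containing (keyword, position).

    form_keyword_sets : one set of (keyword, position) pairs per form type
    """
    n = sum(1 for kwset in form_keyword_sets if (keyword, position) in kwset)
    return 1.0 / n if n else 0.0

def stability(recognized_count, total_samples):
    """Fraction of samples in which the keyword was properly recognized."""
    return recognized_count / total_samples

def select_keywords(candidates, form_keyword_sets, min_u=1.0, min_s=0.9):
    """candidates: list of (keyword, position, recognized_count, total_samples)."""
    return [
        (kw, pos)
        for kw, pos, ok, total in candidates
        if uniqueness(kw, pos, form_keyword_sets) >= min_u
        and stability(ok, total) >= min_s
    ]

forms = [
    {("Tokyo", (0, 0)), ("Tax", (0, 1))},   # form type A
    {("Osaka", (0, 0)), ("Tax", (0, 1))},   # form type B
]
candidates = [("Tokyo", (0, 0), 10, 10),    # unique and stable -> kept
              ("Tax", (0, 1), 10, 10),      # appears in both types -> dropped
              ("Osaka", (0, 0), 5, 10)]     # unique but unstable -> dropped
print(select_keywords(candidates, forms))
```

"Tax" is rejected because it appears at the same position in both form types (uniqueness 0.5), and "Osaka" because it was recognized in only half the samples.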
When it is difficult to collect many real-life samples, a perturbation method can be used to artificially generate physically different samples. Perturbation parameters include the binarization threshold, artificial skew, displacement (linear translation), and so on. This approach has proved practically useful and effective [1].
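A minimal sketch of this perturbation idea is given below, varying only the binarization threshold and the displacement (skew is omitted for brevity). The image representation (a list of grayscale rows) and the parameter values are illustrative assumptions.

```python
import itertools

def shift(img, dx, dy, fill=255):
    """Translate a grayscale image (list of rows) by (dx, dy), padding with background."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out

def perturb(img, thresholds=(100, 128, 156), shifts=(-1, 0, 1)):
    """Yield binarized variants of one scan: 1 = ink, 0 = background."""
    for t, dx, dy in itertools.product(thresholds, shifts, shifts):
        shifted = shift(img, dx, dy)
        yield [[1 if px < t else 0 for px in row] for row in shifted]

# Tiny hypothetical grayscale patch (0 = black ink, 255 = white paper)
gray = [[255, 120, 255],
        [110,  90, 130],
        [255, 140, 255]]
variants = list(perturb(gray))
print(len(variants))  # 3 thresholds x 3 dx values x 3 dy values = 27 variants
```

Each of the 27 variants is a physically different binary sample derived from a single scan, which can then be fed to the keyword-stability evaluation.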

Fig. 11. Character line segmentation

Fig. 12. Candidate keywords extracted by word spotting (thick boxes)

Fig. 13. Generated form template for form type identification

6. Conclusion
This paper presented a large-scale image workflow system and a new type of ATM (Automatic Teller Machine) applied to a banking system. The system, including these ATMs called image-ATMs, has already been in operation at some major Japanese banks. It has been successful in reducing the workforce of the branch offices and in providing better services. The form images captured at the very front end of the system are analyzed to identify the kind of form and to read its contents. Form type identification is the first step of this pattern recognition task, and it determines the subsequent steps. The image-ATM presented can process over 400 kinds of forms, covering money orders, local governments' tax notices, utility bills, health insurance payment slips, and so on.
The method for form type identification is one of the key technologies of this kind of system. The presented method utilizes printed text information as well as ruled lines. The ruled-line information is effective for reducing the number of candidate types, but it is not sufficient to identify an unknown form type out of hundreds of types; local tax forms, for example, are similar except for the city names. The presented approach is to word-spot, in the input form image, the keywords defined for each of the types. A score is calculated for each candidate template that defines a form type. Selecting good keywords when a new type is defined is an important task for guaranteeing a high recognition rate.
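The scoring step can be sketched as follows: each template lists its keywords with expected positions, and the score is the fraction of template keywords spotted near their expected locations. This is an assumed scoring scheme for illustration; the actual system uses constellation matching of keywords [1], and the templates, tolerance, and rejection threshold here are invented.

```python
def score_template(template, spotted, tol=2):
    """template: {keyword: (line, col)}; spotted: set of (keyword, line, col).

    Returns the fraction of template keywords found within `tol` columns
    of their expected position on the expected line.
    """
    matched = sum(
        1
        for kw, (line, col) in template.items()
        if any(w == kw and l == line and abs(c - col) <= tol
               for w, l, c in spotted)
    )
    return matched / len(template)

def identify(templates, spotted, reject_below=0.8):
    """Return the best-scoring form type, or None if the score is too low."""
    best_name, best_score = None, 0.0
    for name, tpl in templates.items():
        s = score_template(tpl, spotted)
        if s > best_score:
            best_name, best_score = name, s
    return best_name if best_score >= reject_below else None

templates = {
    "tokyo_tax": {"Tokyo": (0, 6), "Tax": (0, 11)},
    "osaka_tax": {"Osaka": (0, 6), "Tax": (0, 11)},
}
spotted = {("Tokyo", 0, 6), ("Tax", 0, 12)}   # "Tax" is off by one column
print(identify(templates, spotted))
```

The shared keyword "Tax" matches both templates, but only "tokyo_tax" also matches "Tokyo", so it wins with a full score while "osaka_tax" falls below the rejection threshold.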
As for the performance of form type identification, an experiment showed a correct identification rate of 97.0%, a false alarm rate of 0.0%, and a rejection rate of 3.0% for 1,505 test forms from 291 form types. The rejection threshold was set such that the false alarm rate was virtually zero. The average processing time was 3.6 sec on a 1.7-GHz Pentium 4 processor with 1 GB of memory. In practical operation, the image-ATM and the image workflow system can accept over 95% of the forms without human intervention, though still better performance is always desired.
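One simple way to set such a rejection threshold is to raise it just above the highest score achieved by any misidentified form in a validation set, so that every remaining acceptance is correct. The sketch below uses hypothetical validation data and is only one plausible procedure, not the calibration method the authors used.

```python
def zero_false_alarm_threshold(results, eps=1e-6):
    """Choose a rejection threshold so that no misidentified form is accepted.

    results : list of (score, is_correct) pairs from a validation run
    """
    wrong_scores = [s for s, correct in results if not correct]
    # Just above the best-scoring mistake; accept everything if there are none.
    return max(wrong_scores) + eps if wrong_scores else 0.0

# Hypothetical validation scores: three correct identifications, two wrong
results = [(0.95, True), (0.90, True), (0.72, False), (0.85, True), (0.60, False)]
thr = zero_false_alarm_threshold(results)
accepted = [(s, ok) for s, ok in results if s >= thr]
print(thr, accepted)  # every accepted form is a correct identification
```

The price of driving the false alarm rate to zero is that some correctly identified forms may also be rejected, which is the trade-off behind the 3.0% rejection rate reported above.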
This type of form processing may find applications other than banking systems. For example, e-Government systems are expected to provide more human-friendly interfaces to bridge the digital divide. We expect that this technology can be applied to local governments and other needs as well.

Acknowledgement
The authors would like to express their gratitude to the research staff at the Central Research Laboratory and the engineers at the Mechatronics Systems Division for their detailed improvements and form definition work, which made it possible for the developed method to work under practical conditions.

References
[1] Hiroshi Sako, Naohiro Furukawa, Masakazu Fujio, and Shigeru Watanabe, “Document-Form Identification
Using Constellation Matching of Keywords Abstracted by Character Recognition,” D. Lopresti, et al. (Eds.),
DAS 2002, LNCS 2423, Springer-Verlag, pp. 261-271, 2002.
[2] Michio Yasuda and Hiromichi Fujisawa, “An Improvement of Correlation Method for Character Recognition,”
Systems•Computers•Controls, Scripta Publishing Co., Vol. 10, No. 2, 1979, pp. 29-38.
[3] Hiromichi Fujisawa and Osamu Kunisaki, “On the Characteristics of Directional Pattern Matching,” National
Conv. Rec. Inst. Electron. Comm. Eng. Japan, No. 1372, Vol. 5, 1983, p. 243 (in Japanese).
[4] Masashi Koga, Ryuji Mine, Hiroshi Sako, and Hiromichi Fujisawa, "Lexical Search Approach for Character-
String Recognition," Proc. IAPR Workshop on Document Analysis Systems, DAS'98, Nagano, Nov. 1998, pp.
237-251.
[5] Cheng-Lin Liu, Masashi Koga, Hiromichi Fujisawa, “Lexicon-Driven Handwritten Character String
Recognition for Japanese Address Reading,” Proc. 6th IAPR International Conference on Document Analysis and
Recognition, ICDAR2001, Seattle, USA, Sep. 2001, pp. 877-881.
[6] Hiromichi Fujisawa, Masashi Koga, Hisao Ogata, Tatsuhiko Kagehiro, and Hiroshi Sako, “Recognition
Strategies for Japanese Mailpiece Recipient Addresses,” Proc. 2nd Int. Conf. Multimodal Interface, Kowloon
Tong, Hong Kong, Jan. 5-7, 1999, pp. III69-74.
[7] Hiromichi Fujisawa, Yasuaki Nakano, and Kiyomichi Kurino, “Segmentation Methods for Character
Recognition: From Segmentation to Document Structure Analysis,” Proceedings of the IEEE, Vol. 80, No. 7,
July 1992, pp. 1079-1092.
[8] Takashi Hirano, Yasuhiro Okada, and Fumio Yoda, “Field Extraction Method from Existing Forms Transmitted
by Facsimile,” Proc. 6th IAPR International Conference on Document Analysis and Recognition, ICDAR2001,
Seattle, USA, Sep. 2001, pp. 738-742.
[9] Hiroshi Shinjo, Kazuki Nakashima, Masashi Koga, Katsumi Marukawa, Yoshihiro Shima, and Eiichi Hadano,
“A Method for Connecting Disappeared Junction Patterns on Frame Lines in Form Documents,” Proc. 4th IAPR
International Conference on Document Analysis and Recognition, ICDAR1997, Ulm, Germany, Aug. 1997, pp.
667-670.
[10] Akira Amano and Naoki Asada, “Complex Table Form Analysis Using Graph Grammar,” D. Lopresti, J. Hu,
and R. Kashi (Eds.), DAS 2002, LNCS 2423, Springer-Verlag, pp. 283-286, 2002.
