
Pattern Recognition 41 (2008) 2435 -- 2446


Forty years of research in character and document recognition---an industrial perspective
Hiromichi Fujisawa ∗
Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-koigakubo, Kokubunji, Tokyo 185-8601, Japan

A R T I C L E   I N F O

Article history:
Received 15 February 2008
Received in revised form 10 March 2008
Accepted 11 March 2008

Keywords:
OCR
Character recognition
Handwriting recognition
Kanji recognition
Postal address recognition
Robustness design
Information integration
Hypothesis-driven approaches
Digital pen

A B S T R A C T

This paper presents an overview of the last 40 years of technical advances in the field of character and document recognition. Representative developments in each decade are described. Then, key technical developments in the specific area of Kanji recognition in Japan are highlighted. The main part of the paper discusses robustness design principles, which have proven effective in solving complex problems in postal address recognition. Included are the hypothesis-driven principle, the deferred decision/multiple-hypotheses principle, the information integration principle, the alternative solution principle, and the perturbation principle. Finally, future prospects, the 'long-tail' phenomena, and promising new applications are discussed.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

This paper presents an industrial view of character and document recognition technology, based on material presented at ICDAR [1]. Commercial optical character readers (OCRs) emerged in the 1950s, and since then, character and document recognition technology has advanced significantly, providing products and systems that meet industrial and commercial needs. At the same time, the profits from businesses based on this technology have been invested in research and development of more advanced technology. We can observe here a virtuous cycle: new technologies have enabled new applications, and the new applications have supported the development of better technology. Character and document recognition has been a very successful area of pattern recognition.

The main business and industrial applications of character and document recognition in the last forty years have been form reading, bank check reading, and postal address reading. By supporting these applications, recognition capability has expanded in multiple dimensions: mode of writing, scripts, types of documents, and so on. The recognizable modes of writing are machine printing, handprinting, and script handwriting. Recognizable scripts started with Arabic numerals and expanded to the Latin alphabets, Japanese Katakana syllabic characters, Kanji (the Japanese version of Chinese) characters, Chinese characters, and Hangul characters. Work is now being done to make Indian and Arabic scripts readable. Many different kinds of paper forms can be read by today's OCRs, including bank checks, postcards, envelopes, book pages, and business cards. Typeface standards such as the OCR-A and OCR-B fonts contributed to making OCRs reliable enough even in the early stages. In the same context, specially designed OCR forms simplified the segmentation problem and made handprinted character OCRs usable even with immature recognition technology. Today's OCRs are successfully used to read any type of font and freely handwritten characters.

The field of character and document recognition has not always been peaceful. It has twice been disturbed by waves of new digital technologies that threatened to diminish the role of OCR technology. The first such wave was office automation in the early 1980s. Starting then, most information seemed destined to be 'born digital', potentially diminishing demand for OCRs, and some researchers were pessimistic about the future. However, it turned out that sales of OCRs in Japan, for example, peaked in the 1980s, ironically because of the promoted introduction of office computers. It is well known that the use of paper has kept increasing.

We are now facing the second wave. IT and Web technologies might have a different impact. Many kinds of applications can now be completed on the Web, and information can flow around the world in an instant. However, it is still not known whether the demand for character and document recognition will decrease or whether new applications requiring more advanced technology will be created. Search engines have become ubiquitous and are expanding their reach into the areas of image documents, photographs, and videos. People are re-evaluating the importance of handwriting and trying to integrate it into the digital world. It seems that paper is still not going to disappear. Mobile devices with micro cameras now have CPUs capable of real-time recognition. The future prospects of these developments are discussed here.

∗ Tel.: +81 42 323 1111; fax: +81 42 327 7700.
E-mail address: hiromichi.fujisawa.sb@hitachi.com.

0031-3203/$30.00 © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.patcog.2008.03.015

2. Brief historical view

2.1. Overview

The first practical OCR appeared in the United States in the 1950s, in the same decade as the first commercial computer, UNIVAC. Since then, each decade has seen advances in OCR technology. In the early 1960s, IBM produced its first models of optical readers, the IBM 1418 (1960) and IBM 1428 (1962), which were capable of reading printed numerals and handprinted numerals, respectively. One of the models of those days could read 200 printed document fonts and was used as an input apparatus for IBM 1401 computers. Also in the 1960s, postal operations were automated using mechanical letter sorters with OCRs, which for the first time automatically read postal codes to determine destinations. The United States Postal Service first introduced address-reading OCRs, which in 1965 began reading the city/state/ZIP line of printed envelopes [2]. In Japan, Toshiba and NEC developed handprinted numeral OCRs for postal code recognition and put them into use in 1968 [3]. In Germany, a postal code system was introduced for the first time in the world in 1961 [4]. However, the first postal-code-reading letter sorter in Europe was introduced in Italy in 1973, and the first letter sorter with an automatic address reader was introduced in Germany in 1978 [5].

Fig. 1. First desktop Hitachi OCR HT-560 (1982).

Japan started to introduce commercial OCRs in the late 1960s. Hitachi produced its first OCR for printed alphanumerics in 1968 and its first handprinted numeral OCR for business use in 1972. NEC developed the first OCR that could, in addition, read handprinted Katakana in 1976. The Japanese Ministry of International Trade and Industry (since renamed the Ministry of Economy, Trade and Industry) conducted a 10-year, 20-billion-yen national project on pattern information processing starting in 1971. Among other research topics, Toshiba worked on printed Kanji recognition, and Fujitsu worked on handwritten character recognition. The ETL character databases, including Kanji characters, were created as part of this project and contributed to research and development of Kanji OCRs [6]. As a by-product, the project attracted many students and researchers into the pattern recognition area. In the United States, IBM introduced a deposit processing system (the IBM 3895) in 1977, which was able to recognize unconstrained handwritten check amounts. The author had a chance to observe it in operation at Mellon Bank in Pittsburgh in 1981; it could reportedly read about 50% of handwritten checks, with the remaining half being hand coded. The state of the art in character recognition in the 1960s and 1970s is well documented in the literature [7,8].

The 1980s witnessed significant technological advances in semiconductor devices such as CCD image sensors, microprocessors, dynamic random access memories (DRAMs), and custom-designed LSIs. For example, OCRs became smaller than ever, fitting on desktops (Fig. 1). Cheaper megabyte-size memories and CCD image sensors then enabled whole-page images to be scanned into memory for further processing, in turn enabling more advanced recognition and wider applications. For example, handwritten numeral OCRs that could recognize touching characters were introduced for the first time in 1983, making it possible to relax physical form constraints and writing constraints. In the late 1980s, Japanese vendors of OCRs introduced into their product lines new OCRs that could recognize about 2400 printed and handprinted Kanji characters. These were first used to read names and addresses for data entry. More detailed technology reviews are available in the literature [9,10].

The office automation boom of the 1980s, which was influential in Japan, had two features. One was Japanese language processing by computers and Japanese word processors; the emergence of Kanji OCRs was a natural consequence of this development. The other was optical disks used as computer storage systems, which were developed and put into use in the early 1980s. A typical application was patent automation systems in the United States and Japan that stored images of patent specification documents. The Japanese patent office system then stored approximately 50 million documents, or 200 million digitized pages, on 12-in optical disks. Each disk could store 7 GB of data, the equivalent of 200 000 digitized pages. The system used 80 Hitachi optical disk units and 80 optical library units. These systems can be considered among the first digital libraries. This kind of new computer application directly and indirectly encouraged studies on document understanding and document layout analysis in Japan. More importantly, it was in this decade that documents became the focus of computer processing for the first time.

The changes in the 1990s were due to the upgraded performance of UNIX workstations and then personal computers. Though scanning and image preprocessing were still done by hardware, a major part of recognition was implemented in software on general-purpose computers. The implication was that programming languages like C and C++ could be used to code recognition algorithms, allowing more engineers to develop more complicated algorithms and expanding the research community to include academia. During this decade, commercial software OCR packages running on PCs also appeared on the market. Techniques for recognizing freely handwritten characters were extensively studied and successfully applied to bank check readers and postal address readers. Advanced layout analysis techniques enabled recognition of wider varieties of business forms. Research institutions specializing in this field, such as CENPARMI, led by Prof. Suen, and CEDAR, led by Prof. Srihari and Prof. Govindaraju, contributed to these advances. New high-tech vendors appeared, including A2iA, which was started by the late Prof. Simon in France [11], and Parascript, which was started in Russia to do business in the United States. In Japan, the Japanese Postal Ministry conducted the third-generation postal automation project between 1994 and 1996, in which Toshiba, NEC, and Hitachi joined to develop postal address recognition systems that could support sequence sorting. This project enabled significant advances in Japanese address reading.

The International Association for Pattern Recognition began holding conferences such as ICDAR, IWFHR, and DAS in the early 1990s. Many intensively studied methods have been reported at these conferences. Examples are artificial neural networks, hidden Markov models (HMMs), polynomial function classifiers, modified quadratic discriminant function (MQDF) classifiers [12], support vector machines (SVMs), classifier combination [13--15], information integration, and lexicon-directed character string recognition [16--19], some of which are based on original ideas from the 1960s [20,21]. Most of these play key roles in today's systems. In contrast with previous decades, in which industry mostly used proprietary in-house technology, the 1990s witnessed important interactions between academia and industry. Academics studied real technical problems and developed sophisticated theory-based methods, enabling industry to benefit from their research. Readers may find the state of the art of character recognition systems, including image preprocessing, feature extraction, pattern classification, and word recognition, well described in the literature [22].

In the following subsections, major pre-1990s technical achievements in the areas of Kanji character classifiers, character segmentation algorithms, and linguistic processing are described.

2.2. Kanji character classifiers

In the 1970s, there were two competing approaches to character recognition: structural analysis and template matching (the statistical approach). Contemporary commercial OCRs used structural methods to read handprinted alphanumerics and Katakana, and template matching methods to read printed alphanumerics. Template matching methods had been experimentally proven applicable to printed Kanji recognition by the late 1970s [23--26], but their applicability to handwritten (or handprinted) Kanji was in question. The problem of recognizing handwritten Kanji seemed like a steep, unexplored mountain. It was clear that neither the structural nor the simple template matching approach could conquer it alone. The former had difficulty with the huge number of topological variations due to complex stroke structures, while the latter had difficulty with nonlinear shape variations. However, in light of previous work on handwritten numeral recognition using a template matching approach, the latter seemed to have a greater chance of success [27].

The key was the concept of blurring as feature extraction, which was applied to directional features and found to be effective in recognizing handwritten Kanji [27,28]. The introduction of continuous spatial feature extraction made the optimum amount of blurring surprisingly large. The first Hitachi OCR for reading handprinted Kanji used simple template matching based on blurred directional features, where the feature templates were four sets of 16 × 16 arrays of gray values. The directional feature, which was patented in Japan in 1979, was computed using a two-dimensional gradient to determine stroke direction (Fig. 2) and was even applicable to grayscale images [29]. Although it was only indirectly relevant, Hubel and Wiesel's work encouraged our view that the directional feature was promising [30]. Nonlinear shape normalization [31--33] and statistical classifier methods [12,34] boosted recognition accuracy. We learned that blurring should be considered a means of obtaining latent dimensions (a subspace) rather than a means of reducing computational cost, though the effects might seem similar. For example, the mesh size of 8 × 8 used in statistical approaches was determined by the optimum blurring parameter in light of the Shannon sampling theorem, and bigger mesh sizes with the same blurring parameter did not give better recognition performance.

The thorough studies of the research group led by Prof. Kimura contributed to advancing statistical quadratic classifiers [12], which were successfully applied to handwritten Kanji recognition. Actually, the basic theory had been known, but computers of the 1970s did not have sufficient computational power for studies of such statistical approaches. Today, the four-directional feature vector for Kanji patterns consists of 8 × 8 × 4 elements, and the subspace obtained by statistical covariance analysis has from 100 to 140 dimensions. The size of the 8 × 8 array is, however, surprisingly (counter-intuitively) small in light of the many complex Kanji characters. Recognition accuracy for individual freely handwritten Kanji is not yet high enough, however, so linguistic context such as names and addresses is used to enhance total recognition accuracy. To reduce computational cost, cluster-based two-stage classification is used to reduce the number of templates that must be matched. One of the recent advances in Kanji (and Chinese character) recognition is the reduced size of recognition engines designed especially for mobile phone applications. A compact recognition engine reported in Refs. [35,36] requires only 613 kB of memory to store the parameters for recognizing 4344 classes of printed Chinese characters.

2.3. Character segmentation algorithms

In the 1960s and 1970s, a flying-spot scanner or a laser scanner with a rotating mirror was used together with a photomultiplier to convert optical signals into electrical signals. Character segmentation was usually carried out with the help of these kinds of scanning mechanisms. For example, forms for handprint reading used marks on an edge that signaled the presence of a character line to be scanned. In addition, the locations of writing boxes on the forms were registered beforehand, and the colors of the boxes were transparent to the scanner sensor. Therefore, OCRs could easily extract images that contained exactly one handprinted character.

Then, in the 1980s, semiconductor sensors and memories appeared, enabling OCRs to scan and store images of whole pages. This was an epoch-making change that was significant to users because it relaxed strict conditions on OCR form specifications, for example, by enabling them to use smaller, non-separated writing boxes. However, it required solving the problem of touching numerals and changing how images are represented in memory [37]. Before this change, scanned images had been arrays of binary pixels and segmentation was pixel-based, but from this time on, the binary image in memory was represented by run-length codes. The run-length representation was well suited to connected component analysis and contour following; the connected components were processed as black objects rather than as pixels. In 1983, Hitachi produced one of the first OCRs that could segment and recognize touching handwritten numerals based on a multiple-hypothesis segmentation--recognition method (Fig. 3). Contour shape analysis identified candidate touching points, and multiple pairs of forcedly separated patterns were fed into the classifier. By consulting the confidence values from the classifier, the recognizer was able to choose the right hypothesis. This direction of changes has led us to forms processing whose ultimate goal is to read unknown forms, or at least forms that are not specifically designed for OCRs. However, this means that users might become less careful in their writing, so OCRs have to be more accurate for freely handwritten characters as well.

The segmentation problem was far tougher in postal address recognition. Fig. 4 shows horizontally handwritten addresses. The width of a character varies by as much as a factor of two, and some of the radicals and components are themselves valid characters. As shown in Fig. 4, it is difficult to group the right components to form the right character patterns when some characters are quite wide and others narrow. To resolve the grouping problem, linguistic information (or address knowledge) is required in addition to geometric and similarity information. This issue is discussed in more detail in Section 3.

d1 = [  1  0 -1 ;  1  0 -1 ;  1  0 -1 ]   (horizontal gradient)
d2 = [  1  1  1 ;  0  0  0 ; -1 -1 -1 ]   (vertical gradient)

Fig. 2. Directional features based on 2D gradient: the two 3 × 3 gradient kernels d1 and d2.
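The gradient-based directional feature described above can be sketched as follows. This is a minimal illustration, not the production algorithm: block averaging stands in for Gaussian blurring, and the toy image, mesh size, and direction quantization are assumptions.

```python
import math

# 3x3 gradient kernels from Fig. 2: D1 responds to horizontal intensity
# change (vertical strokes), D2 to vertical change (horizontal strokes).
D1 = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
D2 = [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]

def convolve3x3(img, k):
    """Valid-mode 3x3 convolution on a 2D list-of-lists image."""
    h, w = len(img), len(img[0])
    return [[sum(img[y + i][x + j] * k[i][j]
                 for i in range(3) for j in range(3))
             for x in range(w - 2)] for y in range(h - 2)]

def directional_planes(img, n_dirs=4):
    """Quantize gradient orientation into n_dirs planes weighted by magnitude."""
    gx, gy = convolve3x3(img, D1), convolve3x3(img, D2)
    h, w = len(gx), len(gx[0])
    planes = [[[0.0] * w for _ in range(h)] for _ in range(n_dirs)]
    for y in range(h):
        for x in range(w):
            mag = math.hypot(gx[y][x], gy[y][x])
            if mag == 0:
                continue
            ang = math.atan2(gy[y][x], gx[y][x]) % math.pi  # stroke dir mod 180
            planes[int(ang / (math.pi / n_dirs)) % n_dirs][y][x] = mag
    return planes

def mesh_feature(plane, mesh=8):
    """Blur-and-subsample a plane to mesh x mesh by block averaging
    (a crude stand-in for Gaussian blurring plus sampling)."""
    h, w = len(plane), len(plane[0])
    feat = []
    for my in range(mesh):
        for mx in range(mesh):
            vals = [plane[y][x]
                    for y in range(my * h // mesh, (my + 1) * h // mesh)
                    for x in range(mx * w // mesh, (mx + 1) * w // mesh)]
            feat.append(sum(vals) / max(len(vals), 1))
    return feat

# Toy 34x34 image containing a vertical stroke.
img = [[1.0 if 15 <= x <= 17 else 0.0 for x in range(34)] for y in range(34)]
feature = [v for p in directional_planes(img) for v in mesh_feature(p)]
print(len(feature))  # 4 planes x 8 x 8 = 256 elements
```

The 8 × 8 × 4 layout mirrors the feature size quoted in Section 2.2; only the blurring method and the toy stroke image are simplifications.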

Fig. 3. Separation and recognition of touching numerals: (a) only geometric information used; (b) similarity information used in addition to geometric information; and (c)
address information used in addition to geometric and similarity information.
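The multiple-hypothesis segmentation--recognition idea behind Fig. 3 can be sketched as follows, with strings standing in for images and a table lookup standing in for the real classifier; all names and confidence values here are made up.

```python
# Toy sketch of multiple-hypothesis segmentation-recognition: candidate
# touching points yield candidate splits; the split whose two halves get
# the highest combined classifier confidence wins.

def classify(segment):
    """Stand-in classifier: (label, confidence) for a string 'image'.
    Confidence is high only for known single-digit shapes."""
    KNOWN = {"one": "1", "two": "2", "three": "3"}
    if segment in KNOWN:
        return KNOWN[segment], 0.9
    return "?", 0.1

def split_hypotheses(touching):
    """Candidate touching points: every internal cut position."""
    return [(touching[:i], touching[i:]) for i in range(1, len(touching))]

def segment_and_recognize(touching):
    best = None
    for left, right in split_hypotheses(touching):
        (l_lab, l_conf), (r_lab, r_conf) = classify(left), classify(right)
        score = l_conf * r_conf  # combined confidence of this hypothesis
        if best is None or score > best[0]:
            best = (score, l_lab + r_lab)
    return best[1]

print(segment_and_recognize("onetwo"))  # the cut after "one" scores highest: "12"
```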

Fig. 4. Segmentation of Kanji characters: triangles show erroneous segmentation.

2.4. Integration of linguistic information

Major business uses of handprinted Kanji OCRs have been the reading of names and addresses on application forms. In such applications, forms have separate preprinted fixed boxes to avoid the segmentation problem, but how to achieve highly accurate word/phrase recognition is still a question.

We can utilize a priori linguistic knowledge to choose the right options from the candidate lattice to accurately recognize words and phrases. Here, the lattice is a table in which each column carries candidate classes and each row corresponds to a character on the sheet. If a string consists of N Kanji characters and there are K candidates for each, there are K^N possible interpretations (or word recognition results). Linguistic processing consists of choosing one of the many possible interpretations. To do this, we developed a method based on a finite state automaton as a key technique [38]. The basic idea is to throw L lexical terms at the automaton and see which terms the automaton accepts, where the model of the automaton is dynamically generated from the lattice (Fig. 5). L is usually as big as several tens of thousands, but only the terms whose first character appears in the first column of the lattice can be accepted. To improve accuracy, we may also consider the terms whose second character appears in the second column of the lattice. Such terms are fed into the automaton one by one, and the state transitions determine a path (a series of edges). The corresponding penalties are then summed up and associated with the input term. Passing the first edge gives a penalty of zero, and passing the last gives a penalty of 15 when K = 16. In this way, the term with the smallest penalty is determined to be the recognized word. The number of candidates for each character is adaptively controlled to be equal to or less than K, to exclude extremely unlikely word candidates. This algorithm has been used successfully for address phrases, provided that the characters are reliably segmented. Marukawa et al.'s experiments showed that character recognition accuracy was raised from 90.2% to 99.7% for a lexicon with 10 828 terms, resulting in address phrase recognition

accuracy of 99.1%. Here, we can note that error occurrences are not statistically independent. Linguistic processing that solves difficult segmentation problems (cf. Fig. 4) is discussed in Section 3.

Fig. 5. Finite state automaton.

3. Robustness design to deal with uncertainty and variability

Postal address recognition was an ideal application for researchers in the sense that it presented many technical challenges; at the same time, the innovation was expected for post office automation, and the investments really paid off. In the 1990s, R&D projects were conducted in the United States, Europe, and Japan to develop address readers that could recognize freely handwritten and printed full addresses. These were intended to automate carrier sequence sorting, a tedious task for postal workers. The recognition task was to identify an exact delivery point by recognizing the full destination address, including street and apartment numbers. The problem in Japan is to identify one of 40 000 000 address points. In this section, the main issues of robustness design intended to deal with uncertainty and variability are discussed based on the experience of the author's team [39,40].

Japanese address recognition is a difficult task, as shown in Fig. 6. The read rates for printed and handwritten mail are higher than 90% and 70%, respectively. Images of the rejected mail pieces are sent to video-coding stations where human operators enter address information. The results of automatic recognition and human coding are transformed into address codes, which are then sprayed on the corresponding mail pieces as they run through the sorting machine. After the address codes are mapped to numbers that indicate a carrier sequence, the mail pieces can be sorted in sequence using the two-pass radix sort method.

The recognition system consists of a high-speed scanner, image preprocessing hardware, and computer software that carries out layout analysis for address block location, character line segmentation, character string recognition (i.e., address phrase interpretation), character classification, and post-processing (Fig. 7). As can be seen in the block diagram, there are many modules that make imperfect decisions; i.e., uncertainty is always involved. Algorithms to solve specific problems are susceptible to variations in the images, so the most basic questions are how to deal with uncertainty and variability and how to implant robustness into the system. A more appropriate question may be how to compose such a recognition system from small pieces of recognition modules, or how to connect those modules.

In answering these questions, it should be recognized that there are design principles that can guide researchers and engineers. We may call them robustness design principles. Table 1 lists them and gives simple explanations. In the following subsections, five such principles are discussed.

3.1. Hypothesis-driven principle

Variability means that no one solution can fit all situations. Therefore, problems must often be divided into a certain number of cases, with a different solution (problem-solver) for each case. However, the case to which a given input belongs is unknown. The hypothesis-driven principle can be applied in such cases, and the problem of Japanese address block identification is one example. There are basically six layout types, but in real life there are actually twelve, because envelopes are sometimes used upside-down. The approach we take is to choose salient features that distinguish between such cases and to evaluate the likelihood of each case based on the observed values of those salient features.

As a general framework of the hypothesis-driven approach, we call the case a hypothesis and the observed salient features evidence, and a statistical hypothesis test method may be used to evaluate likelihood. The a posteriori probability of the k-th hypothesis after observing evidence for this hypothesis can be computed as in Eq. (1), where H_k represents the k-th hypothesis and e_k the feature vector for the k-th hypothesis. In Eq. (1), L is the likelihood ratio of hypothesis H_k to the null hypothesis \bar{H}_k and is computed as in Eq. (2), assuming statistical independence of the features. The functions P(e_{ki} | H_k) and P(e_{ki} | \bar{H}_k) can be learned from training samples.

P(H_k | e_k) = \frac{\frac{P(H_k)}{P(\bar{H}_k)} L(e_k | H_k)}{1 + \frac{P(H_k)}{P(\bar{H}_k)} L(e_k | H_k)}    (1)

L(e_k | H_k) = \frac{P(e_k | H_k)}{P(e_k | \bar{H}_k)} = \prod_{i=1}^{n} \frac{P(e_{ki} | H_k)}{P(e_{ki} | \bar{H}_k)}    (2)

Therefore, observing evidence {e_k | k = 1, . . . , K} for all hypotheses makes it possible to compute L(e_k | H_k) and P(H_k | e_k) accordingly, and thus to find the most probable hypothesis [41].

In the hypothesis-driven approach, after candidate hypotheses are identified, the corresponding problem-solvers applicable only to that kind of input are called to process the input.

3.2. Deferred decision/multiple-hypotheses principle

In a complex pattern recognition system, many decisions must be made to obtain the final result. As always, no decision is 100% accurate, so the decision-making modules cannot simply be cascaded. Each module should not make a final decision but should defer the decision and forward multiple hypotheses to the next module. The idea itself is a simple one. In the case of postal address recognition, there can be as many functional modules as shown below:

• Line orientation detection
• Character size (large/small) determination
• Character line formation and extraction
• Address block identification
• Character type (machine-printed/handwritten) identification
• Script (Kanji/Kana) identification
• Character orientation identification
• Character segmentation
• Character classification
• Word recognition
• Phrase interpretation
• Address number recognition
• Building/room number recognition
• Recipient name recognition
• Final decision making (accept/reject/retry)

Fig. 6. Japanese address recognition: rectangles show identified postal code and address blocks.
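The lexicon-directed matching of Section 2.4 can be illustrated in a few lines; the lattice, candidate ranks, penalty scheme, and lexicon below are hypothetical stand-ins for the automaton-based method.

```python
# Toy sketch: lexicon-directed interpretation of a candidate lattice.
# lattice[i] is the ranked candidate list (rank 0 = best) for character i;
# a term's penalty is the sum of the ranks of its characters.

def match_term(term, lattice):
    """Return the total penalty, or None if the term is not accepted."""
    if len(term) != len(lattice):
        return None
    penalty = 0
    for ch, candidates in zip(term, lattice):
        if ch not in candidates:
            return None          # state transition fails
        penalty += candidates.index(ch)
    return penalty

def recognize(lattice, lexicon):
    # Prefilter: only terms whose first character appears in column 0.
    feasible = [t for t in lexicon if t and t[0] in lattice[0]]
    scored = [(match_term(t, lattice), t) for t in feasible]
    scored = [(p, t) for p, t in scored if p is not None]
    return min(scored)[1] if scored else None

# Hypothetical 3-character lattice with K = 3 candidates per position.
lattice = [["T", "I", "F"], ["o", "a", "e"], ["p", "k", "b"]]
lexicon = ["Top", "Tab", "Ink", "Fee"]
print(recognize(lattice, lexicon))  # "Top" has penalty 0+0+0
```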

[Block diagram: a mail-piece image passes through scanning/preprocessing hardware; layout analysis (line orientation detection, line extraction, postal code location, address block location, character type recognition); character string recognition (HW/MP numeral, HW Kanji, MP Kanji, and MP Kana string recognition) backed by character classification engines (HW numerals, HW Kanji, MP numerals, MP Kanji, MP Kana) and address knowledge integration; and post-processing (address point verification, name recognition, retry control/perturbation) to produce the recognition results.]

Fig. 7. Postal address recognition system.
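The carrier-sequence sorting mentioned in Section 3 relies on a two-pass radix sort; a minimal sketch follows, with made-up sequence numbers (the base of 100 is an assumption, not the machine's actual digit grouping).

```python
# Sketch of two-pass radix sorting of mail pieces by carrier-sequence
# number: a stable bucket sort on the low digit group, then on the high
# one, as a sorting machine does with two physical passes.

def bucket_pass(items, key, n_buckets):
    buckets = [[] for _ in range(n_buckets)]
    for it in items:
        buckets[key(it)].append(it)      # stable: order within a bucket kept
    return [it for b in buckets for it in b]

def two_pass_radix_sort(seq_numbers, base=100):
    once = bucket_pass(seq_numbers, lambda n: n % base, base)
    return bucket_pass(once, lambda n: n // base, base)

mail = [1203, 17, 904, 1199, 18, 903]
print(two_pass_radix_sort(mail))  # [17, 18, 903, 904, 1199, 1203]
```

Stability of each pass is what makes the second (high-order) pass preserve the low-order ordering.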

Table 1
Design principles for robustness

P1   Hypothesis-driven principle: when the type of a problem is uncertain, set up hypotheses and test them.
P2   Deferred decision/multiple-hypotheses principle: do not decide; leave the decision to the next experts, carrying over multiple hypotheses.
P3   Information integration principle:
     P3a  Process integration: solve a problem with multiple different-field experts working as a team.
     P3b  Combination-based integration: decide as a team of multiple same-field experts.
     P3c  Corroboration-based integration: utilize other input information; seek more evidence.
P4   Alternative solutions principle: solve a problem by multiple alternative approaches.
P5   Perturbation principle: modify the problem slightly and try again.
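Principle P1's hypothesis evaluation, Eqs. (1) and (2) of Section 3.1, can be sketched numerically; the priors and per-feature probabilities below are made-up numbers, and the binary-feature model is an illustrative assumption.

```python
# Sketch of Eqs. (1)-(2): posterior of hypothesis Hk from a prior and
# per-feature likelihood ratios, assuming feature independence.

def likelihood_ratio(evidence, p_given_h, p_given_not_h):
    """Eq. (2): product of per-feature likelihood ratios for binary features."""
    ratio = 1.0
    for e, ph, pnh in zip(evidence, p_given_h, p_given_not_h):
        ratio *= (ph if e else 1 - ph) / (pnh if e else 1 - pnh)
    return ratio

def posterior(prior, ratio):
    """Eq. (1): P(Hk|ek) from the prior odds times the likelihood ratio."""
    odds = prior / (1.0 - prior) * ratio
    return odds / (1.0 + odds)

# Two binary salient features, both observed present for this hypothesis.
evidence = [True, True]
p_h = posterior(prior=1.0 / 12,                     # e.g. 12 layout hypotheses
                ratio=likelihood_ratio(evidence,
                                       p_given_h=[0.9, 0.8],
                                       p_given_not_h=[0.2, 0.3]))
print(round(p_h, 3))  # 0.522
```

Repeating this for every hypothesis and taking the largest posterior selects the most probable layout case, as described in Section 3.1.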

Fig. 8. Series of decisions required for recognizing a postal address.

Fig. 9. Information integration: process integration.

These functional modules generate multiple hypotheses, each of which is then forwarded to the next module, which again generates multiple hypotheses. This process therefore creates the kind of hierarchical tree of hypotheses shown in Fig. 8. The question here is how to find which optimum branches to follow to reach the best possible answer in the shortest possible time. Among the well-known search methods, we basically use hill-climbing search with backtracking, by which we can reach the optimum solution in the shortest time. When an optimum branch is rejected at a later stage because its confidence value is smaller than a preset threshold, other branches are processed. The use of beam search at the later stages effectively boosts the recognition accuracy, while

tion, and (3) corroboration-based integration. The first approach,


process integration, integrates two or three processes to form a
single problem-solver. Examples are segmentation--recognition
methods and segmentation--recognition--interpretation methods.
This approach started in the area of speech understanding back in
the 1970s. The second combination-based integration approach is
the one taken in character classification and known as classifier
combination or classifier ensemble [13--15]. Different classifiers
such as statistical and structural classifiers and neural networks are
combined (integrated) to deduce a single result, in the expectation
that the classifiers will behave complementarily. Methods known
as majority voting and Dempster--Shafer approaches can be used
to implement the algorithm. Finally, corroboration-based integra-
tion is the approach of finding additional evidence that supports
Fig. 10. Contents of short-term memory: presegmentation network. the result or looking for multiple input information sources for the
same information. A good example is reading bank check amounts
by recognizing both the courtesy amount (numerals) and the legal
its use in earlier stages is too costly. Search control on the number amount (numbers in words). In postal address recognition, both the
of hypotheses to generate is important trade-off between time and postal code and the address phrase in words are read to obtain more
accuracy because computational time is limited to 3.7 s in our case. accurate results. Recipient name recognition is another example of
Of course, shorter is better because it requires less computational corroboration. This approach is taken when street numbers are not
power. recognized.
In postal address recognition, the most important consideration
3.3. Information integration principle is to integrate the three processes of character segmentation, char-
acter classification, and interpretation of the phrases (or linguistic
We recognize three kinds of information integration known in processing). As described in previous sections, address knowledge is
the character and document recognition field to attack the uncer- required to resolve the ambiguities in segmentation incorporation
tainty issue: (1) process integration, (2) combination-based integra- with geometrical information [42] and character similarity, so simple
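The combination-based integration described above can be illustrated with the simplest combiner, a plurality vote over the labels output by several classifiers. This is a toy sketch only; the combination logic of the production systems is not specified here:

```python
from collections import Counter

def combine_by_voting(predictions):
    """Plurality vote over the labels output by several classifiers;
    ties are broken in favor of the earliest classifier in the list."""
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:            # earliest-wins tie-break
        if counts[label] == top:
            return label

# Hypothetical statistical, structural, and neural classifiers disagree
# on a digit image; the vote settles on the majority label.
print(combine_by_voting(["7", "1", "7"]))  # -> 7
```

The expectation, as the text notes, is that the classifiers err in complementary ways, so the majority is more often right than any single member.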
Fig. 11. Contents of long-term memory: linguistic knowledge in terms of an RTN (recursive transition network) generated from a context-free grammar.

An approach known as the lexicon-directed or lexicon-driven approach has been developed, and it can be considered a hypothesis-driven approach, as explained below. The approach is illustrated in Fig. 9, where an input pattern is interpreted by searching for the path in the presegmentation network (Fig. 10) that best matches a path in the network that represents linguistic knowledge (Fig. 11). We can say that this is the equivalent of searching for a path in the linguistic network that best matches a path in the presegmented network [18,19]. This interpretation of the knowledge-directed recognition process is in line with an explanation given by Simon [43]:

When it is solving problems in semantically rich domains, a large part of the problem-solving search takes place in long-term memory and is guided by information discovered in that memory.

In our case, the long-term memory refers to the linguistic knowledge, and the short-term memory refers to the presegmented network.

We have developed several versions of such algorithms, one of which (Fig. 12) was presented by Liu et al. [19]. The recognition rate of the lexicon-driven handwritten address recognition algorithm was 83.7% with 1.1% error in an experiment, which was done using 3589 actual mail pieces and a lexicon containing 111,349 address phrases. The linguistic model was represented in a TRIE structure, and the search was controlled by the Beam Search method. Recognition time was about 100 ms using a Pentium III/600 MHz machine.

3.4. Alternative solutions principle

There are many image-level problems, including touching characters, touching underlines, window shadow noise, cancellation stamps covering or touching address characters, and so on. The alternative solutions approach is to provide more than one solution to a problem; it effectively provides solutions that are complementary to each other. For example, the problem of touching characters may be solved using a holistic approach or a forced separation (dichotomizing) approach. Especially when dealing with numerals, a pair of touching numerals can be treated as one character out of 100 classes. Training such holistic classifiers enables the results of the holistic and dichotomizing classifiers to be merged, producing more reliable recognition results. Another example of the alternative solutions approach is used to solve the window noise problem. When the existence of window noise is suspected, two problem-solvers are needed. One attempts to eliminate such noise by an erosion (thinning) operation, assuming the shadow is thin or faint. The other attempts to extract the line segments that form a frame, assuming the shadow is rather solid. These two problem-solvers are used in the hope that one will succeed.

3.5. Perturbation principle

The principle of perturbation is to modify the problem slightly when it is difficult to solve and to try again to solve it. If pattern recognition were a continuous process, the perturbation principle would not work. In reality, however, it is often a discontinuous process: very small modifications may change the final recognition results. It is hoped that the change is from rejection to correct recognition, or from error to correct recognition. This approach was used in the 1980s to recognize handwritten numerals with a structural method; because slight topological variations caused rejection, perturbation of parameters or of input images improved the recognition rate. In recent years, more systematic studies have again shown the effectiveness of the approach. Input images are perturbed by various transformations, such as morphological operations (dilation/erosion) and geometrical transformations (rotation, slanting, perspective, shrinking, and expanding). In Ha and Bunke's work [44], handwritten numerals were transformed in twelve ways and recognized using the framework of classifier combination. Their approach recognized difficult, eccentric handwriting better than classical classifiers such as k-NN and neural networks. Incidentally, blurring is also an image transformation, but it has not been applied in the context of perturbation; the blurring used in character feature extraction is not the kind of 'slight transformation' meant here.

The perturbation approach has also been successfully applied to Japanese postal address recognition. Our test of the approach achieved about 10--15 percentage point improvements in recognition rates on average. When we did not set limits on recognition time and repeated more perturbation operations, including rotational transformation, re-binarization, and some other parametric modifications in sequence, we found that 53% of rejected images were correctly recognized, with a 12% error rate.
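The perturb-and-retry scheme described above follows a loop of this general shape. This is a simplified sketch with toy stand-ins for images, transformations, and the recognizer; it is not the production implementation:

```python
def recognize_with_perturbation(image, recognize, perturbations):
    """Principle P5: try plain recognition first; on rejection (None),
    apply slight transformations in sequence and retry each time."""
    result = recognize(image)
    if result is not None:
        return result, "none"
    for name, transform in perturbations:
        result = recognize(transform(image))
        if result is not None:           # a rejection turned into a read
            return result, name
    return None, "rejected"

# Toy stand-ins: 'images' are ints, the recognizer accepts even values
# only, and each perturbation nudges the value slightly.
perturbations = [("dilate", lambda x: x + 1), ("erode", lambda x: x - 1)]
recognize = lambda x: f"class-{x}" if x % 2 == 0 else None
print(recognize_with_perturbation(3, recognize, perturbations))  # -> ('class-4', 'dilate')
```

The discontinuity the text describes is modeled by the parity check: a tiny nudge moves the input across the accept/reject boundary.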

Fig. 12. Lexicon-driven handwritten postal address recognition [19].
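The lexicon-driven search of Fig. 12, with the lexicon held in a TRIE and the search controlled by a beam, can be sketched roughly as follows. The per-position character hypotheses, the toy lexicon, and the log-probability scoring are illustrative assumptions, not the algorithm of [19]:

```python
import math

def build_trie(lexicon):
    root = {}
    for phrase in lexicon:
        node = root
        for ch in phrase:
            node = node.setdefault(ch, {})
        node["$"] = {}                      # end-of-phrase marker
    return root

def lexicon_driven_search(candidates, lexicon, beam_width=3):
    """Match per-position character hypotheses (char, score pairs)
    against a TRIE of legal phrases, keeping only the best `beam_width`
    partial paths at each step; paths leaving the lexicon are pruned."""
    beam = [(0.0, "", build_trie(lexicon))]
    for position in candidates:
        extended = []
        for score, prefix, node in beam:
            for ch, p in position:
                if ch in node:              # lexicon-directed pruning
                    extended.append((score + math.log(p), prefix + ch, node[ch]))
        beam = sorted(extended, key=lambda t: t[0], reverse=True)[:beam_width]
    complete = [(s, w) for s, w, node in beam if "$" in node]
    return max(complete)[1] if complete else None

lexicon = ["tokyo", "osaka", "kyoto"]
candidates = [[("k", 0.5), ("t", 0.4)], [("y", 0.9)], [("o", 0.8)],
              [("t", 0.7), ("k", 0.2)], [("o", 0.9)]]
print(lexicon_driven_search(candidates, lexicon))  # -> kyoto
```

Note how "t" survives the first position as a classifier hypothesis but dies at the second, because no lexicon entry continues "t" with "y"; the linguistic knowledge resolves the classification ambiguity.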

Although the result of these perturbation experiments was attractive, reducing the additional errors is a necessary step before this approach can be used. One possible way to pursue this is to apply the combination scheme as Ha and Bunke did [44]: instead of taking the first recognition result after a series of rejections, multiple perturbations may be applied simultaneously, yielding one result by voting, for example. In the light of ever-increasing computing power, this approach seems very promising. It should be noted here that perturbation is effective not only for character classification but also for layout analysis, line extraction, character segmentation, and other intermediate decisions.

3.6. Robustness implementation

The design principles described in the previous subsections concern the structure and algorithms of a recognition system, but classifiers and various parameters also have to be carefully and simultaneously trained and adjusted [40]. The same is true even for specific problem-solving modules. Though minor, many problems emerge during the development phases. Robustness implementation, therefore, is a difficult task for researchers and engineers. The following are important keys to an efficient and effective development process.

• Live samples from users' sites
• Robustness measurement using many 'bags' of test samples
• Acceleration data sets
• Sample-by-sample cause analysis

If possible, it is highly desirable to gather samples from the users' sites. We call such real samples live samples. However, live samples should not be mixed into a single sample set when they are collected in multiple sessions. It is important to choose the right occasions to capture samples because sample characteristics vary depending on operational modes and seasonal tendencies. Without mixing the collections, we have kept samples in many different 'bags'. Recognition rates (or recognition accuracy) may be measured for each of the bags (or data sets), as shown in Fig. 13. A trick in the graph is that the data set numbers are rearranged so that the recognition rates are in decreasing order. Arranging the graph this way enables observation of the profiles of recognition rates, where a steeper slope means that the recognition system is less robust. In addition, if recognition performance for a data set is very low, we can re-examine that data set, which is small in size, in detail to identify the cause of the problem (i.e., the low recognition rate).

Fig. 13. Profiles of recognition rates for bulk, printed, and handwritten mail [40].

Acceleration data sets are collections of samples that have been rejected or erroneously recognized by a given version of the recognizer. Every sample in the data sets may be given a unique identifier by which the samples can be subjected to sample-by-sample cause analysis and, more importantly, by which improvements can be traced throughout the development process. If names and problem codes can be assigned to problematic situations, the non-straightforward progress resulting from the remedying processes can be managed more appropriately.
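The 'bags' measurement described in Section 3.6 reduces to computing one recognition rate per bag and sorting the rates in decreasing order, as in the profiles of Fig. 13. A minimal sketch with made-up bag names:

```python
def robustness_profile(bags):
    """Compute one recognition rate per bag and sort the bags in
    decreasing order of rate, as in the Fig. 13 profiles; a steep
    drop-off flags a non-robust recognizer and points at the small
    data sets worth re-examining sample by sample."""
    rates = {name: sum(ok) / len(ok) for name, ok in bags.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

# Each bag holds per-sample outcomes (True = correctly recognized);
# the bag names are purely illustrative.
bags = {
    "site-A-winter": [True, True, True, False],
    "site-B-summer": [True, False, False, False],
    "site-A-summer": [True, True, False, False],
}
for name, rate in robustness_profile(bags):
    print(f"{name}: {rate:.2f}")
```

Keeping the collections separate, rather than pooling them, is exactly what makes this per-bag profile (and hence the slope diagnostic) possible.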
Fig. 15. Digital pen using Anoto functionality (communication unit, battery, memory, pressure sensor, processor, ink, and camera; the camera views a 6 by 6 dot area on a grid with 0.3 mm pitch).

4. Future prospects

A 40--50 year overview of OCR history and an overview of the current market may give rise to the view that the technology has almost matured. However, it is clear that the technology is still in the midst of development and is far inferior to human cognition. From the viewpoint that the technology is mature, it seems that the current state is the long-tail part of the market (or of applications). According to this view, the 'head' part of the market has a small number

of applications having huge amounts of documents to read: business form reading, bank check reading, and postal address reading. These have been investment-effective due to sufficiently heavy demand; the return on investment has almost always been assured. Of course, technological advances have elongated the head part towards the tail, but the remaining tail is very long. The three application areas considered parts of the head also have tail parts: there are a lot of business forms, checks, and mailpieces that are very difficult to read, and more advanced recognition techniques are undoubtedly needed. For example, small and medium-sized enterprises (SMEs) in Japan are still using paper forms to do bank transactions and paper income forms to report to local government. The number of transactions carried out by each such company is not very large, and there is not much incentive for them to innovate. Banks that receive different forms from such companies therefore want to use more intelligent, versatile OCRs. The long-tail phenomenon applies to postal address recognition as well. The questions are whether the demand side can foresee the return on the investment in proposed new products and systems, and whether the scientists and engineers can convince them of the return, while the technical problems are piecewise and diverse. These are typical long-tail questions.

In talking about the future from a different angle, there is the question of chicken and egg, or need and seed, which is difficult to answer in general. From the industry's viewpoint, it seems more important to think of needs, or at least latent needs, and the future needs seem to be subjective, at least for now. The well-recognized unfilled needs of today include: (1) office document archives for e-Government, (2) handwriting for the human interface of mobile devices, (3) text in videos for video search, and (4) books and historical documents for global search. There are also two other applications: (5) text-in-the-scene for information capture, and (6) handwriting document management for knowledge workers.

Unknown scripts and unknown languages are a big handicap for travelers in foreign countries making quick decisions on the road, in shops, at the airport, etc. A mobile device with a digital camera, i.e., an Information Capturing Camera [45], may be an aid in such a situation (Fig. 14). With a higher-performance microprocessor, text in the scene can be recognized. The technical challenges for this technology include color image processing, geometric perspective normalization, text segmentation, adaptive thresholding, unknown script recognition, language translation, and so on. Every mobile phone in Japan is equipped with a digital camera, and their microprocessors are becoming more powerful. Some digital cameras now have intelligent functionality to locate faces in the images to be taken. The question, then, is why text recognition is so difficult. Some mobile phones in Japan can now recognize over 4000 Kanji characters [36]. What seems an interesting challenge is a dynamic recognition capability, which ensures high recognition performance by repeatedly recognizing multiple shots of camera images without the user's conscious operation. Users may try various angles and positions, aiming at a target of recognition. This can be considered interactive perturbation.

Fig. 14. Mobile phone capturing text for dictionary lookup.

Another attractive area is the digital pen and handwritten document management. The act of handwriting is being reconsidered based on its importance in education and knowledge work contexts. The act of writing helps people read, write, and memorize, and we may integrate these acts into information systems by using today's digital pens, which can capture handwritten annotations and memos in a very natural way. The Anoto® functionality is one such advanced technique; it digitally captures handwriting stroke data and other related data (Fig. 15). There are research groups that are using such digital pens to create more intelligent information management systems [46--49]. Their goal is to seamlessly manage documents with digital ink. A group advocating 'Information Just-in-Time' (iJIT) is developing a pilot system for researchers that supports their note-taking and hybrid document management [49]. Their handwritten research notebooks can always be kept compatible with their digital counterparts in computers. By doing so, they can easily share information in the group even when they are located remotely. Another feature of the system is that users can print any digital document in such a way that the printed document is sensitive to a digital pen (Fig. 16); in other words, the content of a digital document is printed overlaid with Anoto dots. The users can therefore mark and write annotations on those printouts, and the handwriting strokes are captured and synchronized with the corresponding document already existing in the computer. The value of this kind of system is that a digital document in the computer comes to have the same annotations as its physical counterpart, meaning that users can throw away the paper documents at any time without any loss of information. This concept enables users to work equally well in the digital world and in the real world. It is an attempt to go beyond the myth of the paperless office [50].
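A digital pen of the kind described above emits time-stamped, pressure-tagged position samples decoded from the printed dot pattern. Grouping such samples into strokes might look like the following sketch; the field names and the pen-up gap threshold are illustrative assumptions, not Anoto's actual data format:

```python
from dataclasses import dataclass

@dataclass
class PenSample:
    x: float        # page position decoded from the printed dot pattern
    y: float
    pressure: int   # reading from the pen's pressure sensor
    t_ms: int       # capture timestamp in milliseconds

def group_into_strokes(samples, pen_up_gap_ms=150):
    """Split a time-ordered list of samples into strokes: a pause longer
    than `pen_up_gap_ms` between consecutive samples starts a new stroke."""
    strokes, current = [], []
    for s in samples:
        if current and s.t_ms - current[-1].t_ms > pen_up_gap_ms:
            strokes.append(current)
            current = []
        current.append(s)
    if current:
        strokes.append(current)
    return strokes

samples = [PenSample(0.0, 0.0, 90, 0), PenSample(0.3, 0.1, 95, 20),
           PenSample(5.0, 5.0, 80, 400), PenSample(5.2, 5.1, 85, 420)]
print(len(group_into_strokes(samples)))  # -> 2
```

Once strokes are grouped this way, synchronizing them with the digital counterpart of a printed page reduces to attaching each stroke to the document and page identified by the dot pattern.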
Fig. 16. iJIT system that manages hybrid documents containing handwritten annotations.

When such a use of digital pens becomes common practice, it will be a natural demand to ask for capabilities of handwritten character recognition, handwritten query processing, and more intelligent knowledge management. Making the effort to create information systems that require recognition technology is a path we may pursue. We hope that more advanced information systems will require more advanced recognition technology.

5. Conclusion

Vision and fundamental technologies are both key to the future of our technical community. Vision takes the form of forecast applications with new value propositions. For investment to be made in new technology, such new propositions need to be attractive to many people, or at least to some innovative people. This is a top-down approach to innovation. Fundamental technologies may start innovation from the bottom as well. Here, the technologies we are discussing have two parts: one is the technology that supports our community from the bottom; the other is our own technology, i.e., character and document recognition. For the first part, we have seen the impacts of advanced semiconductor devices, high-performance computers, and more advanced software development tools, which have supported the advances in recognition technology. They have not only enabled more advanced OCR systems on the surface, but have also invited and promoted more academia into this community, which has also contributed to the advances in recognition technology. We would like to see this kind of virtuous cycle continue forever.

Acknowledgments

The author is grateful to the members of his research team in Hitachi who worked on development of the postal address recognition system: H. Sako, K. Marukawa, M. Koga, H. Ogata, H. Shinjo, K. Nakashima, H. Ikeda, T. Kagehiro, R. Mine, N. Furukawa, and T. Takahashi. He is also grateful to Dr. C.-L. Liu at the Institute of Automation of the Chinese Academy of Sciences, Beijing, and Prof. Y. Shima at Meisei University, Tokyo, for the work they did at our laboratory. The author also thanks Prof. G. Nagy of Rensselaer Polytechnic Institute for his valuable discussions and comments on this manuscript. Thanks also go to Dr. U. Miletzki of Siemens ElectroCom for providing information regarding their historical work.

References

[1] H. Fujisawa, A view on the past and future of character and document recognition, in: Proceedings of the Seventh ICDAR, Curitiba, Brazil, September 2007, pp. 3--7.
[2] The United States Postal Service: An American History 1775--2002, Government Relations, United States Postal Service, 2003.
[3] H. Genchi, K. Mori, S. Watanabe, S. Katsuragi, Recognition of handwritten numeral characters for automatic letter sorting, Proc. IEEE 56 (1968) 1292--1301.
[4] W. Schaaf, G. Ohling, et al., Recognizing the Essentials, Siemens ElectroCom, Konstanz, 1997.
[5] http://www.industry.siemens.com/postal-automation/usa/.
[6] K. Yamamoto, H. Yamada, T. Saito, I. Sakaga, Recognition of handprinted characters in the first level of JIS Chinese characters, in: Proceedings of the Eighth ICPR, 1986, pp. 570--572.
[7] J.R. Ullmann, Pattern Recognition Techniques, Butterworths, London, 1973.
[8] C.Y. Suen, M. Berthod, S. Mori, Automatic recognition of handprinted characters---the state of the art, Proc. IEEE 68 (4) (1980) 469--487.
[9] S. Mori, C.Y. Suen, K. Yamamoto, Historical review of OCR research and development, Proc. IEEE 80 (7) (1992) 1029--1058.
[10] G. Nagy, At the frontiers of OCR, Proc. IEEE 80 (7) (1992) 1093--1100.
[11] J.C. Simon, Off-line cursive word recognition, Proc. IEEE 80 (7) (1992) 1150--1161.
[12] F. Kimura, K. Takashina, S. Tsuruoka, Y. Miyake, Modified quadratic discriminant functions and the application to Chinese character recognition, IEEE Trans. PAMI 9 (1) (1987) 149--153.
[13] C.Y. Suen, C. Nadal, T.A. Mai, R. Legault, L. Lam, Recognition of totally unconstrained handwritten numerals based on the concept of multiple experts, in: Proceedings of the First IWFHR, Montreal, Canada, 1990, pp. 131--143.
[14] L. Xu, A. Krzyzak, C.Y. Suen, Methods of combining multiple classifiers and their applications to handwriting recognition, IEEE Trans. SMC 22 (3) (1992) 418--435.
[15] T.K. Ho, J.J. Hull, S.N. Srihari, Decision combination in multiple classifier systems, IEEE Trans. PAMI 16 (1) (1994) 66--75.
[16] F. Kimura, M. Sridhar, Z. Chen, Improvements of a lexicon-directed algorithm for recognition of unconstrained handwritten words, in: Proceedings of the Second ICDAR, Tsukuba, Japan, October 1993, pp. 18--22.
[17] C.H. Chen, Lexicon-driven word recognition, in: Proceedings of the Third ICDAR, Montreal, Canada, August 1995, pp. 919--922.
[18] M. Koga, R. Mine, H. Sako, H. Fujisawa, Lexical search approach for character-string recognition, in: Proceedings of the Third DAS, Nagano, Japan, November 1998, pp. 237--251.
[19] C.-L. Liu, M. Koga, H. Fujisawa, Lexicon-driven segmentation and recognition of handwritten character strings for Japanese address reading, IEEE Trans. PAMI 24 (11) (2002) 1425--1437.
[20] M. Aizermann, E. Braverman, L. Rozonoer, Theoretical foundations of the potential function method in pattern recognition learning, Automat. Remote Control 25 (1964) 821--837.
[21] U. Miletzki, Schürmann polynomials---roots and offsprings, in: Proceedings of the Eighth IWFHR, 2002, pp. 3--10.
[22] M. Cheriet, N. Kharma, C.-L. Liu, C.Y. Suen, Character Recognition Systems---A Guide for Students and Practitioners, John Wiley & Sons, Inc., Hoboken, NJ, 2007.
[23] R. Casey, G. Nagy, Recognition of printed Chinese characters, IEEE Trans. Electron. Comput. EC-15 (1) (1966) 91--101.
[24] S. Yamamoto, A. Nakajima, K. Nakata, Chinese character recognition by hierarchical pattern matching, in: Proceedings of the First IJCPR, Washington, DC, 1973, pp. 183--194.
[25] H. Fujisawa, Y. Nakano, Y. Kitazume, M. Yasuda, Development of a Kanji OCR: an optical Chinese character reader, in: Proceedings of the Fourth IJCPR, Kyoto, November 1978, pp. 815--820.
[26] G. Nagy, Chinese character recognition: a twenty-five-year retrospective, in: Proceedings of the Ninth ICPR, 1988, pp. 163--167.
[27] M. Yasuda, H. Fujisawa, An improvement of correlation method for character recognition, Systems, Computers, Controls 10 (2), Scripta Publishing Co., 1979, pp. 29--38.
[28] H. Fujisawa, C.-L. Liu, Directional pattern matching for character recognition revisited, in: Proceedings of the Seventh ICDAR, Edinburgh, August 2003, pp. 794--798.
[29] H. Fujisawa, O. Kunisaki, Method of pattern recognition, Japanese Patent 1,520,768, granted in 1989, filed in 1979.
[30] D.H. Hubel, T.N. Wiesel, Functional architecture of macaque monkey visual cortex, Proc. R. Soc. London Ser. B 198 (1977) 1--59.
[31] J. Tsukumo, H. Tanaka, Classification of handprinted Chinese characters using non-linear normalization and correlation methods, in: Proceedings of the Ninth ICPR, Rome, Italy, 1988, pp. 168--171.
[32] C.-L. Liu, Normalization-cooperated gradient feature extraction for handwritten character recognition, IEEE Trans. PAMI 29 (6) (2007) 1465--1469.
[33] C.-L. Liu, Handwritten Chinese character recognition: effects of shape normalization and feature extraction, in: Proceedings of the Summit on Arabic and Chinese Handwriting, College Park, September 2006, pp. 23--27.
[34] A.K. Jain, R.P.W. Duin, J. Mao, Statistical pattern recognition: a review, IEEE Trans. PAMI 22 (1) (2000) 4--37.
[35] C.-L. Liu, R. Mine, M. Koga, Building compact classifier for large character set recognition using discriminative feature extraction, in: Proceedings of the Eighth ICDAR, Seoul, Korea, 2005, pp. 846--850.
[36] M. Koga, R. Mine, T. Kameyama, T. Takahashi, M. Yamazaki, T. Yamaguchi, Camera-based Kanji OCR for mobile-phones: practical issues, in: Proceedings of the Eighth ICDAR, Seoul, Korea, 2005, pp. 635--639.
[37] H. Fujisawa, Y. Nakano, K. Kurino, Segmentation methods for character recognition: from segmentation to document structure analysis, Proc. IEEE 80 (7) (1992) 1079--1092.
[38] K. Marukawa, M. Koga, Y. Shima, H. Fujisawa, An error correction algorithm for handwritten Chinese character address recognition, in: Proceedings of the First ICDAR, Saint-Malo, September 1991, pp. 916--924.
[39] H. Fujisawa, How to deal with uncertainty and variability: experience and solutions, in: Proceedings of the Summit on Arabic and Chinese Handwriting, College Park, September 2006, pp. 29--39.
[40] H. Fujisawa, Robustness design of industrial strength recognition systems, in: B.B. Chaudhuri (Ed.), Digital Document Processing: Major Directions and Recent Advances, Springer, London, 2007, pp. 185--212.
[41] T. Kagehiro, H. Fujisawa, Multiple hypotheses document analysis, in: S. Marinai, H. Fujisawa (Eds.), Studies in Computational Intelligence, vol. 90, Springer, Berlin, Heidelberg, 2008, pp. 277--303.
[42] T. Kagehiro, M. Koga, H. Sako, H. Fujisawa, Segmentation of handwritten Kanji numerals integrating peripheral information by Bayesian rule, in: Proceedings of the IAPR MVA'98, Chiba, Japan, November 1998, pp. 439--442.
[43] H.A. Simon, The Sciences of the Artificial, third ed., The MIT Press, Cambridge, MA, 1998, pp. 87--88.
[44] T.M. Ha, H. Bunke, Off-line, handwritten numeral recognition by perturbation method, IEEE Trans. PAMI 19 (5) (1997) 535--539.
[45] H. Fujisawa, H. Sako, Y. Okada, S.-W. Lee, Information capturing camera and developmental issues, in: Proceedings of the Fifth ICDAR, Bangalore, September 1999, pp. 205--208.
[46] F. Guimbretière, Paper augmented digital documents, in: Proceedings of the ACM Symposium on User Interface Software and Technology, UIST 2003, Vancouver, Canada, 2003, pp. 51--60.
[47] C. Liao, F. Guimbretière, PapierCraft: a command system for interactive paper, in: Proceedings of the ACM Symposium on User Interface Software and Technology, UIST 2005, Seattle, USA, 2005, pp. 241--244.
[48] R. Yeh, C. Liao, S. Klemmer, F. Guimbretière, B. Lee, B. Kakaradov, J. Stamberger, A. Paepcke, ButterflyNet: a mobile capture and access system for field biology research, in: Proceedings of the International Conference on Computer--Human Interaction, CHI 2006, Montreal, Canada, 2006, pp. 571--580.
[49] H. Ikeda, K. Konishi, N. Furukawa, iJITinOffice: desktop environment enabling integration of paper and electronic documents, in: Proceedings of the ACM Symposium on User Interface Software and Technology, UIST 2006, Montreux, Switzerland, October 2006.
[50] A.J. Sellen, R.H. Harper, The Myth of the Paperless Office, The MIT Press, Cambridge, MA, 2001.

About the Author---HIROMICHI FUJISAWA is a Corporate Chief Scientist at the Research and Development Group of Hitachi, Ltd., and is an
advisor to the Global Standardization Office of Hitachi. He joined its Central Research Laboratory in 1974. Since then, he has engaged in
research and development for handwritten character recognition, document understanding, and document retrieval, as applied to business
OCR systems, mail sorting machines, e-Government systems, etc. He was a visiting scholar at Carnegie Mellon University's Computer Science
Department in 1980--1981 and at Stanford University's Computer Science Department in 2005--2006. He is a Fellow of the IEEE, Fellow of the
International Association for Pattern Recognition (IAPR), and Fellow of the Institute for Electronics, Information and Communication Engineers,
Japan (IEICE). He is also a recipient of the IAPR/ICDAR's Outstanding Achievements Award in 2007 for contributions to industrial document
analysis and to the ICDAR community. Fujisawa received the Dr. Eng. degree in Electrical Engineering from Waseda University in 1975.
