
Natural Language Processing (COSC 6405)

Lecture 08: Related Fields

Department of Computer Science,


Addis Ababa University

Yaregal Assabie

2018/19—Sem I
Modes of Language Representation
Written and Spoken Languages
Speech Recognition
Conversion to Machine-Editable Text
Optical Character Recognition

Written and Spoken Languages

• There are two modes used to represent languages.


♦ Written Language

Examples: machine-printed, handwritten, and machine-editable text

♦ Spoken Language

Department of Computer Science, Addis Ababa University Lecture 08: Related Fields 2/42

Conversion to Machine-Editable Text

• Most of the NLP applications and tasks discussed so far assume that the language is
represented as machine-editable text.
• Optical Character Recognition (OCR) systems convert non-editable text into its
equivalent machine-editable text.
• Speech Recognition (SR) systems convert spoken language into its equivalent
machine-editable text.
• Both OCR and SR systems merge interdisciplinary technologies from Signal Processing,
Pattern Recognition, Natural Language Processing, and Linguistics into a unified framework.

Modes of Language Representation Parameters of SR Systems
Speech Recognition General Architecture
Optical Character Recognition Components of SR Systems

Parameters of SR Systems

• The following parameters are considered in the development of Speech Recognition
systems.
♦ Speaking Mode
ƒ Isolated
ƒ Continuous
♦ Speaking Style
ƒ Read Speech
ƒ Spontaneous Speech
♦ Enrollment
ƒ Speaker-Dependent
ƒ Speaker-Independent
♦ Vocabulary
ƒ Small (<20 words)
ƒ Large (>20,000 words)


General Architecture

Acoustic Signal → Feature Extraction → Decoding → Text

Decoding draws on the Language Model and on the Acoustic Model + Lexical Model.


Components of SR Systems: Acoustic Signal

• Acoustic signals represent the waveform of a given utterance.
♦ Can be captured using a sound recorder and stored as a wave file.

Example utterance: “አበበ በሶ በላ” (Amharic: “Abebe ate beso”)
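As a minimal sketch of what "stored as a wave file" means in practice, the following uses Python's standard `wave` module to write and read back a waveform. The sine tone stands in for recorded speech, and the 16 kHz mono 16-bit settings and the function names are assumptions made for illustration only.

```python
import math
import os
import tempfile
import wave

SAMPLE_RATE = 16000  # samples per second (assumed; a common rate for speech)

def write_tone(path, freq_hz=440.0, seconds=0.5):
    """Write a mono 16-bit sine tone as a stand-in for a recorded utterance."""
    n = int(SAMPLE_RATE * seconds)
    samples = [int(16000 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
               for i in range(n)]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 2 bytes = 16 bits per sample
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(b"".join(s.to_bytes(2, "little", signed=True) for s in samples))

def read_waveform(path):
    """Return (sample_rate, list of integer amplitudes) for a mono 16-bit file."""
    with wave.open(path, "rb") as wf:
        raw = wf.readframes(wf.getnframes())
        rate = wf.getframerate()
    samples = [int.from_bytes(raw[i:i + 2], "little", signed=True)
               for i in range(0, len(raw), 2)]
    return rate, samples

demo_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
write_tone(demo_path)
rate, samples = read_waveform(demo_path)
print(rate, len(samples))
```

The list of amplitudes returned by `read_waveform` is the acoustic signal that the later stages operate on.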


Components of SR Systems: Feature Extraction

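• Feature Extraction is the process of transforming the input acoustic signal into a
set of features.
• Mel-Frequency Cepstral Coefficients (MFCCs) are commonly used as features in speech
recognition systems.
♦ A total of 39 features are typically extracted per frame: 12 cepstral coefficients
plus energy, together with their first-order (delta) and second-order (delta-delta)
derivatives.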

Wave file → Feature Extraction → Feature Vectors
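One common way to arrive at 39 features per frame is to start from 13 static coefficients and append their first and second time derivatives. The sketch below illustrates only that stacking step; it assumes the 13 static values per frame are already computed, and it uses a simple central difference for the derivatives, whereas real toolkits usually apply a regression over a longer window. The function names are hypothetical.

```python
def deltas(frames):
    """Central differences between neighbouring frames (edge frames are clamped)."""
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, last)]
        out.append([(b - a) / 2.0 for a, b in zip(prev_frame, next_frame)])
    return out

def stack_39(static):
    """Append delta and delta-delta features to 13 static features per frame."""
    d1 = deltas(static)   # first-order derivatives
    d2 = deltas(d1)       # second-order derivatives
    return [s + a + b for s, a, b in zip(static, d1, d2)]

# Five toy frames of 13 made-up static coefficients each.
static = [[float(t + k) for k in range(13)] for t in range(5)]
features = stack_39(static)
print(len(features), len(features[0]))
```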


Components of SR Systems: Language Modeling

• The notion of Language Modeling in Speech Recognition is similar to that of Statistical
Machine Translation [Lecture 08].
• The N-gram Language Model is commonly used for Speech Recognition as well.
• We want to compute:
♦ the probability of a sequence
P(W) = P(w1, w2, w3, ..., wn)

♦ the probability of a word given some previous words
P(w5 | w1, w2, w3, w4)

• Further reading:
Language Modeling in Statistical Machine Translation [Lecture 08].
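Both quantities can be illustrated with a bigram model, which approximates the probability of each word by conditioning on the previous word only. This is a minimal sketch with unsmoothed maximum-likelihood estimates; the three-sentence corpus and the boundary markers `BOS`/`EOS` are made up for illustration.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams, with sentence boundary markers."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["BOS"] + sentence.split() + ["EOS"]
        histories.update(tokens[:-1])             # every token except EOS is a history
        bigrams.update(zip(tokens, tokens[1:]))
    return histories, bigrams

def sentence_prob(sentence, histories, bigrams):
    """P(W) as a product of P(w_i | w_{i-1}); unseen bigrams give probability 0."""
    tokens = ["BOS"] + sentence.split() + ["EOS"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if histories[prev] == 0:
            return 0.0
        p *= bigrams[(prev, word)] / histories[prev]
    return p

corpus = ["i want food", "i want tea", "you want food"]
histories, bigrams = train_bigram(corpus)
print(sentence_prob("i want food", histories, bigrams))
```

A practical model would add smoothing so that unseen word pairs do not force the whole sentence probability to zero.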


Components of SR Systems: Acoustic Modeling

• The goal of the probabilistic noisy channel architecture for speech recognition can be
summarized as follows:
“What is the most likely sentence out of all sentences in the language L given some acoustic input O?”
• We can treat the acoustic input O as a sequence of individual “symbols” or
“observations” (for example by slicing up the input every 10 milliseconds, and
representing each slice by floating-point values of the energy or frequencies of that
slice).
O = o1,o2,o3, . . . ,ot
• Similarly, we treat a sentence as if it were composed of a string of words:
W = w1,w2,w3, . . . ,wn

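• The probabilistic implementation of our intuition above, then, can be expressed as
follows:

$$\hat{W} = \arg\max_{W \in L} P(W \mid O)$$

• By Bayes' rule, P(W | O) = P(O | W) P(W) / P(O). Since P(O) is the same for every
candidate sentence W, it can be dropped from the maximization.
• The Noisy Channel Equation for Speech Recognition:

$$\hat{W} = \arg\max_{W \in L} P(O \mid W) \, P(W)$$

where P(O | W) is the observation likelihood and P(W) is the language model probability.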
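The maximization can be sketched over a tiny, explicitly listed set of candidate sentences. The probabilities below are invented for illustration; in a real recognizer they would come from the acoustic model and the language model, and the search would not enumerate sentences one by one. Log-probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow.

```python
import math

def best_sentence(candidates):
    """candidates maps sentence -> (P(O|W), P(W)); return the argmax of the product."""
    def log_score(item):
        likelihood, prior = item[1]
        return math.log(likelihood) + math.log(prior)
    return max(candidates.items(), key=log_score)[0]

# Hypothetical scores for one acoustic input O.
candidates = {
    "recognize speech": (0.0010, 0.020),     # slightly worse acoustic fit, likely sentence
    "wreck a nice beach": (0.0012, 0.001),   # slightly better acoustic fit, unlikely sentence
}
print(best_sentence(candidates))
```

Here the language model prior outweighs the small acoustic advantage of the second candidate, which is the purpose of combining the two terms.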


Components of SR Systems: Acoustic Modeling

• The acoustic model needs to have the following:


♦ Observation likelihoods
♦ Pronunciation lexicon (lexical model)
ƒ The HMM structure for each word, built by hand
♦ Transition probabilities


Components of SR Systems: Acoustic Modeling

Selecting Appropriate Units


• What is the best base unit for a continuous speech recognizer?
♦ Possible units: Phrase, word, syllable, phoneme, allophone, subphone
• Requirements
♦ Accurate
ƒ Can be recognized with high accuracy.
♦ Trainable
ƒ Can be well trained with the given size of the training data.
♦ Generalizable
ƒ Words not in the training data should be modeled with high precision.


Components of SR Systems: Acoustic Modeling

Comparison of Different Units


• Phrase
♦ Pros: Captures coarticulation for a whole phrase.
♦ Cons: Very large number; only common phrases might be trainable.
• Word
♦ Pros: Intra-word, but not inter-word coarticulation is captured.
♦ Cons: Very large number; large vocabulary training unrealistic.
• Syllable
♦ Pros: Close tying with prosody (stress, rhythm).
♦ Cons: Coarticulation at endpoints not captured; Large number.
• Phone
♦ Pros: Low number (around 50).
♦ Cons: Very sensitive to coarticulation.
• Context-dependent phone (triphone, diphone)
♦ Pros: Captures coarticulation from adjacent phones.
♦ Cons: High number of triphones (with about 50 phones, 50³ = 125,000).
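To make the triphone idea concrete, the sketch below labels each phone in a sequence with its left and right neighbours, using the `left-phone+right` notation found in toolkits such as HTK. The function name and the example phone sequence are illustrative assumptions.

```python
def to_triphones(phones):
    """Label each phone with its left and right context as 'l-p+r'."""
    labels = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        labels.append(left + phone + right)
    return labels

print(to_triphones(["b", "a", "t"]))

# Upper bound on distinct triphones for an inventory of about 50 phones.
print(50 ** 3)
```

In practice many of the 125,000 possible triphones never occur in the training data, which is why such systems typically share parameters among similar contexts.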


Components of SR Systems: Decoding

• Given the language model and acoustic model (along with the lexical model), a decoder
searches for the best sequence of words for the input speech.
• The Viterbi algorithm is widely used as a decoder in Speech Recognition systems.
• The Hidden Markov Model Toolkit (HTK) is a widely used toolkit for implementing
HMM-based Speech Recognition systems.
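The Viterbi algorithm finds the most probable hidden state sequence of an HMM for a given observation sequence. The following is a minimal sketch on a toy two-state model; the state names, observation symbols, and probabilities are made up and are far smaller than anything used in a real recognizer, which would also work with log-probabilities.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely state sequence."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max((best[-1][r][0] * trans_p[r][s], r) for r in states)
            column[s] = (prob * emit_p[s][obs], prev)
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return prob, path

states = ["s1", "s2"]
start_p = {"s1": 0.6, "s2": 0.4}
trans_p = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit_p = {"s1": {"a": 0.5, "b": 0.4, "c": 0.1},
          "s2": {"a": 0.1, "b": 0.3, "c": 0.6}}
print(viterbi(["a", "b", "c"], states, start_p, trans_p, emit_p))
```

For the observation sequence a, b, c the best path is s1, s1, s2 with probability 0.01512.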

Types of OCR Systems
Modes of Language Representation
General Architecture
Speech Recognition
Processes of OCR Systems
Optical Character Recognition
Approaches to Recognition

Types of OCR Systems

• Optical Character Recognition (OCR) is a process that involves reading text from paper
in the form of an image and converting the image into a standard encoding scheme
representing the text, e.g. ASCII or Unicode.
• The idea of OCR came into existence when G. Tauschek obtained a patent on a ‘Reading
Machine’ in Germany in 1929.
♦ However, the modern history of OCR started with the advent of computers.
• In the early years of Latin OCR, some standard fonts were developed to ease
recognition.
♦ Among the standard OCR fonts are OCR-A and OCR-B, which are widely used
in passports, bank checks, serial tracking labels, credit card imprints, cash
registers, license plates and postal mail.

OCR-A

OCR-B


Types of OCR Systems

• The input text can be machine-printed or hand-written.


• Historically, the term OCR has been used for all types of input texts.
♦ However, in recent times, the term intelligent character recognition (ICR) is
often used for handwriting recognition.
♦ If recognition is performed at word level, the handwriting recognition is called
intelligent word recognition (IWR).
• When the text to be recognized is already available in some media such as paper, it
means that the recognition is done after the writing process has been completed.
♦ The text is digitized using a scanner and stored in image format.
♦ In such cases, the recognition process is called offline recognition.
• The recent explosion in the use of handheld digital devices has brought the need to
automatically recognize characters whilst the user is writing.
♦ The text is captured and stored, e.g. in UNIPEN format.
♦ This method of recognition is called online recognition.


Types of OCR Systems

Type of input text     Recognition method   Recognition technology   Recognition complexity
Machine printed        Offline              OCR                      Easy
Offline handwritten    Offline              ICR                      Difficult
Online handwritten     Online               ICR                      Easy


General Architecture

Text Image → Preprocessing → Segmentation → Feature Extraction → Classification →
Post-Processing → Editable Text

Optional component: Language Model


Processes of OCR Systems: Preprocessing

• The preprocessing stage aims to produce data from which recognition systems can more
easily produce accurate results.
• It includes image enhancement, noise removal, skew and slant correction, and size
normalization and thinning.
• Image Enhancement
♦ Used to improve the quality of degraded documents; degradation is typically
observed in ancient documents.
♦ The enhancement can be done by filling in some part of the missing data or by
adjusting the intensity of images.
• Noise Removal
♦ Noise is commonly present in ancient documents, low quality papers, or poor
printing and writing conditions.
♦ Noisy documents are improved by using smoothing operations which replace
each pixel with some function of the pixel’s neighborhood.
♦ Morphological operations such as dilation and erosion can be used for noise
removal.


Processes of OCR Systems: Preprocessing

♦ Smoothing with a Gaussian function is one of the most commonly used methods for
noise removal because it smooths isotropically.
♦ A 2-dimensional (2D) Gaussian function is defined as:

$$g(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$$

0.004 0.015 0.026 0.015 0.004
0.015 0.059 0.095 0.059 0.015
0.026 0.095 0.150 0.095 0.026
0.015 0.059 0.095 0.059 0.015
0.004 0.015 0.026 0.015 0.004

Discrete Approximation to 2D Gaussian Function with σ = 1.0

Figure: Graphical Representation of 2D Gaussian Function
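A discrete kernel of this kind can be obtained by sampling the 2D Gaussian on a grid and normalizing the samples so they sum to 1. The sketch below does that and applies the kernel to a small grayscale image; the function names and the border handling (clamping to the nearest edge pixel) are assumptions. The sampled values differ slightly from the table above, whose entries appear to match the common integer approximation scaled by 1/273 (for example 41/273 ≈ 0.150).

```python
import math

def gaussian_kernel(size=5, sigma=1.0):
    """Sample g(x, y) on a size x size grid centred at 0 and normalize to sum 1."""
    half = size // 2
    norm = 2 * math.pi * sigma * sigma
    kernel = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / norm
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def smooth(image, kernel):
    """Replace each pixel by the kernel-weighted average of its neighbourhood."""
    h, w, half = len(image), len(image[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i, krow in enumerate(kernel):
                for j, k in enumerate(krow):
                    rr = min(max(r + i - half, 0), h - 1)   # clamp at the border
                    cc = min(max(c + j - half, 0), w - 1)
                    acc += k * image[rr][cc]
            out[r][c] = acc
    return out

kernel = gaussian_kernel()
noisy = [[0.0] * 5 for _ in range(5)]
noisy[2][2] = 1.0                      # a single isolated noise pixel
print(round(smooth(noisy, kernel)[2][2], 3))
```

The isolated noise pixel is spread over its neighbourhood, so its peak value drops to the kernel's centre weight.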


Processes of OCR Systems: Preprocessing

• Skew and Slant Correction
♦ Skew is a measure of how far a document deviates from its expected orientation
during the scanning process.
♦ It is a global feature of a document and can be detected by projection profiles,
correlation methods, etc.
♦ The skew of a document is corrected by analyzing the direction and
alignment of the text in the image.
♦ The purpose of skew correction is to align the paper document with the coordinate
system of the scanner.
♦ Slant refers to the local direction of text and is a characteristic feature of
handwriting.
♦ In slant correction, the characters in the text are brought to a normal (upright)
position.
• Size Normalization and Thinning
♦ Used to reduce or standardize the feature space representing characters.
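Skew detection by projection profiles can be sketched as follows: for each candidate angle, rotate the ink pixels back by that angle, count the pixels falling in each row, and keep the angle whose row histogram is most sharply peaked. This is a simplified illustration rather than a production method; the function names, the sum-of-squares sharpness score, the 1-degree search grid, and the sign convention (positive skew means y grows with x) are all assumptions. The synthetic "text lines" use real-valued coordinates for simplicity.

```python
import math

def profile_score(pixels, angle_deg):
    """Sum of squared row counts after rotating the ink pixels by -angle."""
    a = math.radians(angle_deg)
    rows = {}
    for x, y in pixels:
        r = round(y * math.cos(a) - x * math.sin(a))   # row index after de-rotation
        rows[r] = rows.get(r, 0) + 1
    return sum(n * n for n in rows.values())

def estimate_skew(pixels, max_angle=10, step=1):
    """Return the candidate angle (degrees) with the most peaked projection profile."""
    angles = range(-max_angle, max_angle + 1, step)
    return max(angles, key=lambda ang: profile_score(pixels, ang))

# Three synthetic text lines, each drawn as a thin line skewed by 5 degrees.
theta = math.radians(5)
pixels = [(x, y0 + x * math.tan(theta)) for y0 in (0, 20, 40) for x in range(200)]
print(estimate_skew(pixels))
```

Once the angle is known, the image can be rotated by its negative to align the text lines with the horizontal axis.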


Processes of OCR Systems: Segmentation

• Segmentation refers to all procedures in which observed patterns in the image are
segregated into units of sub-patterns such as graphical objects, tables, text lines,
words, and characters.
• Handwriting recognition systems usually have difficulty segmenting unconstrained text
into individual characters.
♦ In this regard, recognition systems follow one of two paradigms:
segmentation-based and segmentation-free.
♦ Segmentation-based approaches assume that the would-be characters are
extracted for further processing such as feature description or recognition.
ƒ This assumption is feasible for machine-printed documents but is hard to
satisfy for handwritten text.
♦ Thus, most handwriting recognition systems are designed based on a
segmentation-free paradigm.
ƒ Here, words are considered to be the inputs for the system, and for this
reason the segmentation-free technique is also known as the holistic
approach.


Processes of OCR Systems: Segmentation

Handwritten Amharic Document Image


Processes of OCR Systems: Segmentation

Character Segmentation in Handwritten Amharic Document Image [From EthioReader]


Processes of OCR Systems: Segmentation

Text Line Detection in Skewed Handwritten Amharic Document Images [From EthioReader]


Processes of OCR Systems: Feature Extraction

• Feature extraction involves the measurement or computation of the most relevant
information from the given raw data.
• Features can be extracted in two ways:
♦ from the structures of patterns in the raw data; or
♦ by applying some transformations on the raw data and then extracting the
features from the patterns.
• A good feature design requires that features should be invariant to various distortions
of the patterns.
• The design of discriminating features is an important factor in the success of any
pattern recognition algorithm in classification.
• The choice of discriminating features mostly depends on the nature of character
structures and writing styles.
• The most commonly used structural features for character recognition include strokes,
loops, corners, contours, curves, intersection points, end points of lines, etc.
• The aggregate shape of words can also be used as a feature in the holistic
approach.
• In addition, online handwriting recognition uses the directional features generated as a
result of pen-tip movements.


Processes of OCR Systems: Feature Extraction

• On the other hand, features can be computed by using image transformations.
• The number of features, known as dimensionality, has implications for the
complexity of classification.
• As the dimensionality of the features increases linearly, the required number of
training samples increases exponentially.
♦ This phenomenon is known as the curse of dimensionality.
♦ Thus, dimensionality reduction is an important component in feature
extraction.
• Extracted features are not equally useful for classification.
♦ A limited yet salient feature set both improves the recognition results and
reduces the complexity of classification.
♦ The process of choosing limited yet good features that lead to efficient
classification is called feature selection.
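The exponential growth can be seen with simple arithmetic. Suppose, purely as an illustrative assumption, that each feature is quantized into 10 intervals and that at least one training sample is wanted in every cell of the resulting grid. The number of cells is then 10 raised to the number of features.

```python
def grid_cells(bins_per_feature, dimensions):
    """Number of cells when every feature axis is split into the same number of bins."""
    return bins_per_feature ** dimensions

for d in (1, 2, 3, 10):
    print(d, grid_cells(10, d))
```

Going from 3 to 10 features raises the count from a thousand cells to ten billion, which is why reducing the number of features matters.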


Processes of OCR Systems: Feature Extraction

Low Level Feature Extraction in Offline Recognition

Image of the Ethiopic character “ም” scanned from a noisy document


Processes of OCR Systems: Feature Extraction

Low Level Feature Extraction in Offline Recognition: Gradient Fields

Gradient field image of the Ethiopic character “ም” scanned from a noisy document

• The gradient field is a low-level feature describing the change in gray level with direction.
♦ Calculated by taking the difference in value of neighboring pixels, producing a
vector for each pixel.
ƒ Can be computed by convolving the image with a Gaussian and
derivatives of Gaussian operators.
♦ The gradient of pixels is expressed in the range of [0..360] degrees, where
pixels with directions of zero are represented by the red color.
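A gradient field can be sketched by taking differences of neighbouring pixel values and converting each resulting vector to a magnitude and a direction in [0, 360) degrees. The example below uses plain central differences instead of derivatives of Gaussian operators to keep it short; that simplification, the border clamping, and the function name are assumptions.

```python
import math

def gradient_field(image):
    """Per-pixel (magnitude, direction in degrees within [0, 360))."""
    h, w = len(image), len(image[0])
    field = []
    for r in range(h):
        row = []
        for c in range(w):
            # Central differences; indices are clamped at the image border.
            gx = (image[r][min(c + 1, w - 1)] - image[r][max(c - 1, 0)]) / 2.0
            gy = (image[min(r + 1, h - 1)][c] - image[max(r - 1, 0)][c]) / 2.0
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            row.append((math.hypot(gx, gy), angle))
        field.append(row)
    return field

# Gray level increases from left to right, so the gradient points along +x (0 degrees).
ramp = [[float(c) for c in range(5)] for _ in range(5)]
print(gradient_field(ramp)[2][2])
```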


Processes of OCR Systems: Feature Extraction

Low Level Feature Extraction in Offline Recognition: Direction Fields

Direction field image of the Ethiopic character “ም” scanned from a noisy document

• Direction field represents the ideal local direction of pixels characterized by the fact
that the gray value remains constant in one direction (along the direction of lines), and
only changes in the orthogonal direction.
♦ Can be computed by convolving the image with a Gaussian and derivatives of
Gaussian operators, and then by pixel-wise complex squaring.
♦ The direction of pixels is represented in double angle and expressed in the
range of [0..180] degrees, where pixels with directions of zero are represented
by the red color.
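The effect of complex squaring can be sketched for a single gradient vector. Treating the gradient (gx, gy) as a complex number and squaring it doubles its angle, so a gradient and its negation, which belong to the same line, map to the same value. Halving the doubled angle gives an orientation in [0, 180) degrees. This sketch returns the orientation of the gradient itself; the line runs orthogonal to it. The function name and the use of a raw gradient vector instead of Gaussian derivative responses are assumptions.

```python
import cmath
import math

def orientation(gx, gy):
    """Return (double angle in [0, 360), orientation in [0, 180)) in degrees."""
    z = complex(gx, gy)
    squared = z * z                                   # squaring doubles the angle
    double_angle = math.degrees(cmath.phase(squared)) % 360.0
    return double_angle, double_angle / 2.0

print(orientation(1.0, 0.0))
print(orientation(-1.0, 0.0))   # opposite gradient, same orientation
print(orientation(1.0, 1.0))
```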


Processes of OCR Systems: Feature Extraction

Low Level Feature Extraction in Online Recognition

Handwritten Ethiopic Character “ጬ” Captured Online


Processes of OCR Systems: Feature Extraction

Low Level Feature Extraction in Online Recognition: Gradient Fields

Time Parameterized Gradient Field for Online Handwritten “ጬ”


Processes of OCR Systems: Feature Extraction

Low Level Feature Extraction in Online Recognition: Direction Fields

Time Parameterized Direction Field (Double Angle Representation) for Online Handwritten “ጬ”


Processes of OCR Systems: Feature Extraction

Low Level Feature Extraction in Online Recognition: Direction Fields

Time Parameterized Direction Field (Normal Angle Representation) for Online Handwritten “ጬ”


Processes of OCR Systems: Feature Extraction

High Level Feature Extraction in Online Recognition: Structural Features

Structural Feature Extraction from Direction Field (Normal Angle Representation)


Processes of OCR Systems: Feature Extraction

High Level Feature Extraction in Offline Recognition: Structural Features

Structural Feature Extraction from Direction Fields


Processes of OCR Systems: Classification

• The primary goal of any recognition system is to classify unknown data into a set of
known categories.
♦ The basic idea is to take the extracted features and determine which label
(class) the input should be given, with minimal error.
♦ The classes in text recognition systems can be characters in a script or words in
a lexicon.
• Classification is the final stage in recognition systems in which a decision is made on
the recognition of a given input.
• The result of decision made by the system can be:
♦ Correct Classification
ƒ A given input is assigned by the system to its correct class.
♦ Misclassification
ƒ A given input is assigned by the system to a wrong class.
♦ Rejection
ƒ Occurs when the system cannot match the input data with one of the
known classes.
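The three outcomes can be illustrated with a nearest-prototype classifier that rejects an input when no known class is close enough. This is a sketch under stated assumptions: the Euclidean distance, the rejection threshold, the two-dimensional prototypes, and the function names are all hypothetical choices for illustration.

```python
import math

def classify(features, prototypes, reject_distance):
    """Return the label of the nearest prototype, or None when the input is rejected."""
    label, dist = min(((lbl, math.dist(features, proto))
                       for lbl, proto in prototypes.items()),
                      key=lambda pair: pair[1])
    return label if dist <= reject_distance else None

def outcome(predicted, truth):
    """Name the decision outcome for one input."""
    if predicted is None:
        return "rejection"
    return "correct classification" if predicted == truth else "misclassification"

prototypes = {"A": [0.0, 0.0], "B": [10.0, 0.0]}
for features, truth in [([1.0, 0.0], "A"), ([8.0, 0.0], "A"), ([5.0, 0.0], "B")]:
    print(features, outcome(classify(features, prototypes, 3.0), truth))
```

Raising the rejection threshold trades rejections for more misclassifications, and lowering it does the opposite.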


Approaches to Recognition

• The field of pattern recognition, of which character recognition is a sub-field, has seen
much progress since its beginnings.
• A large number of different approaches have been proposed to solve pattern
recognition problems.
• However, most of them are grouped into one of the following four important
recognition techniques:
♦ Template matching
♦ Structural and syntactic
♦ Statistical
♦ Artificial neural network
• Although each approach has strengths for particular problems, no single approach has
been found to be optimal for all pattern recognition problems.
• Each of these recognition techniques has its own advantages and limitations, and
hybrid systems draw upon the synergy of two or more techniques.
• Hybrid methods aim at combining the advantages of different paradigms within a single
system.


Approaches to Recognition: Template Matching

• Template matching is one of the simplest and earliest approaches to character
recognition, where the character to be recognized is matched against a
database of stored templates of characters.
♦ Template matching assumes very small intra-class variability.
♦ Templates of characters are usually represented by features such as image
pixels, samples, curves, or directional properties of pixels.
• Recognition is made by measuring the correlation between the unknown input and the
stored templates.
• Template matching is effective for recognizing standardized machine-printed characters.
♦ Its applicability to handwriting recognition or general-purpose machine-printed
character recognition is limited since it needs a stored template of all variants.
• A few improvements have been made on the original rigid template matching technique.
♦ As it becomes difficult to include various types of sample templates, a
representation known as deformable templates has been used.
♦ Deformable templates provide a simple and compact representation of various
sample characters or words.
♦ Thus, a generic deformation model can be used to model a large set of classes
using a few examples from each class.
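Rigid template matching by correlation can be sketched as follows. The input image is compared with each stored template using the normalized cross-correlation of their pixel values, and the best-scoring template gives the label. The 3x3 binary templates and the function names are made up for illustration; real templates would be much larger.

```python
import math

def correlation(a, b):
    """Normalized cross-correlation of two equally sized images (lists of rows)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def match(image, templates):
    """Return the label of the template that correlates best with the image."""
    return max(templates, key=lambda label: correlation(image, templates[label]))

templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "-": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}
noisy_i = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]   # an "I" with one extra ink pixel
print(match(noisy_i, templates))
```

A small amount of noise is tolerated, but a shifted or differently shaped variant of the same character would need its own template, which is the limitation noted above.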


Approaches to Recognition: Structural and Syntactic Approach

• Syntactic and structural techniques utilize structural features and syntactic rules to
recognize patterns (characters).
♦ They are used for recognition of complex patterns which are represented in
terms of the interrelationships between simple sub-patterns called primitives.
♦ A large number of complex patterns can be described by a small number of
primitives and their spatial relationships.
♦ This provides a description of how a given character is constructed from the
given set of primitives.
• Recognition is made by parsing the sub-patterns according to a predefined rule and
grammar, and the recognition accuracy depends on the successful extraction of
primitives and their relationships.
• The choice of primitives is application dependent and relies on the general
understanding of the language, the script as well as the technical and mathematical
model building.
• The relationships of primitive structural features are represented by means of symbolic
data structures such as strings, trees, and graphs.
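As a toy illustration of the string representation, suppose each character is described by the sequence of stroke primitives it is built from, and each class is defined by a rule over such sequences. The sketch below encodes rules as regular expressions, which is the simplest kind of grammar. The primitives (`d` for a downward stroke segment, `r` for a rightward one) and the two rules are hypothetical and chosen only to show the mechanism.

```python
import re

# Hypothetical rules: one or more primitives of one kind followed by the other.
GRAMMAR = {
    "L": re.compile(r"d+r+"),   # downward strokes, then rightward strokes
    "7": re.compile(r"r+d+"),   # rightward strokes, then downward strokes
}

def parse(primitives):
    """Return the first class whose rule accepts the whole primitive string, else None."""
    for label, rule in GRAMMAR.items():
        if rule.fullmatch(primitives):
            return label
    return None

print(parse("dddrr"), parse("rrddd"), parse("drdr"))
```

The accuracy of such a recognizer depends on extracting the primitive string correctly, as stated above; an input whose primitives fit no rule is left unrecognized.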


Approaches to Recognition: Statistical Approach

• The statistical approach is based on statistical characterizations of patterns, assuming
that the patterns are generated by a probabilistic system.
♦ Each pattern is represented in terms of d features and is viewed as a point in
d-dimensional space.
♦ For effective representation of the patterns, the features of each pattern class
should occupy disjoint regions in the d-dimensional feature space.
• The statistical approach is the most intensively studied technique; it represents each
pattern in terms of features or measurements.
♦ Many character and word recognition systems make use of this technique.
• In statistical approach, the recognition system involves two operations:
♦ Training (learning)
ƒ The appropriate features representing the input patterns are extracted
and the classifier is trained to partition the feature space.
♦ Classification (testing)
ƒ Features are extracted from the unknown input and the trained
classifier assigns the input to one of the pattern classes under
consideration based on the measured features.
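The two operations can be sketched with one of the simplest statistical classifiers, the nearest-mean classifier. Training computes the mean feature vector of each class, which implicitly partitions the feature space, and classification assigns an unknown input to the class with the closest mean. The two-dimensional feature vectors and the function names are illustrative assumptions.

```python
import math
from collections import defaultdict

def train(samples):
    """samples: list of (feature_vector, label). Returns the mean vector of each class."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {label: [sum(column) / len(vectors) for column in zip(*vectors)]
            for label, vectors in grouped.items()}

def classify(features, means):
    """Assign the input to the class whose mean is nearest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(features, means[label]))

training = [([0.0, 0.0], "a"), ([0.0, 2.0], "a"),
            ([10.0, 10.0], "b"), ([12.0, 10.0], "b")]
means = train(training)
print(means)
print(classify([1.0, 1.0], means), classify([9.0, 9.0], means))
```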


Approaches to Recognition: Neural Network Approach

• Artificial neural networks (ANNs) are pattern recognition techniques inspired by
neuronal operations in biological systems.
♦ Although established in the 1940s, ANNs have been widely applied in
the field of pattern recognition only since the 1980s.
• An ANN consists of a large number of highly interconnected processing elements called
neurons, which are organized into three kinds of layers:
♦ Input Layer
ƒ Takes data of the unknown pattern
♦ Hidden Layer
ƒ Contains many of the neurons in various interconnected structures
hidden from the outside view.
♦ Output Layer
ƒ Provides an interface for generating the recognition result.
• ANNs are known to be effective for handwritten character recognition.
• Samples, pixels, or features can be used as inputs for the neural network system.
• Like statistical classification methods, neural networks require training on samples,
from which they learn how to classify new samples.
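The three-layer structure can be sketched as a forward pass through one hidden layer and one output layer. To keep the example verifiable, the weights below are set by hand so that the network computes the XOR of two binary inputs; they are not learned, and a real recognizer would obtain its weights by training on labeled samples. All names and numbers are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron followed by a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer -> output layer."""
    return layer(layer(inputs, hidden_w, hidden_b), out_w, out_b)

# Hand-set weights: hidden neuron 1 acts like OR, hidden neuron 2 like AND,
# and the output fires when OR is on but AND is off (that is, XOR).
hidden_w, hidden_b = [[20.0, 20.0], [20.0, 20.0]], [-10.0, -30.0]
out_w, out_b = [[20.0, -20.0]], [-10.0]

for pair in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(pair, round(forward(pair, hidden_w, hidden_b, out_w, out_b)[0]))
```

In a character recognizer the inputs would be pixel values or extracted features, and the output layer would have one neuron per character class.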

TOC: Course Syllabus

Previous: Applications of NLP

Current: Related Fields


Next:
