
Chapter 8

Applications of NLP – Part II

Department of Computer Science


School of Computing
Dire Dawa Institute of Technology
Dire Dawa University

Tessfu Geteye (PhD)

2020/2021, Semester II
Outline

 Speech Recognition
 Speech Recognition Processes
 Types of Automatic Speech Recognition
 Difficulties with ASR
 Speech Recognition Approaches
 Speech Recognition Performance Evaluation
 NLP in Speech Recognition
 Optical Character Recognition

Automatic Speech Recognition

 Human-machine and human-human interactions are being significantly improved using different Human Language Technologies (HLTs), including:

 Text-based HLTs: Text Summarization, Machine Translation, and others.

 Speech-based HLTs: Automatic Speech Recognition (ASR), Speech Synthesis, Speech Translation, and others.

 Speech-based HLTs are more convenient in terms of communication efficiency, physical restrictions, and accuracy.

 ASR systems are used for transcribing speech sequences into the corresponding textual representation:

        Speech → [ASR] → Text


Automatic Speech Recognition

 Building blocks of ASR systems:


Automatic Speech Recognition

 ASR is defined as finding the most likely word sequence given the acoustic observations:

        Ŵ = argmax_W P(W | O) = argmax_W P(O | W) · P(W)

 Acoustic and Lexical models, which compute the likelihood P(O | W).

 Language model, which computes the prior probability P(W).

 W - sequence of words

 O - acoustic observation sequence


Automatic Speech Recognition


 The major building blocks of ASR system are:

 Acoustic Model

 Language Model

 Lexical Model

 Decoder
 Acoustic Model:

 It is used in ASR to represent the relationship between an audio signal and the
phonemes or other linguistic units that make up speech.

 The model is learned from a set of audio recordings and their corresponding
transcripts.

 It typically deals with the raw audio waveforms of human speech, predicting which phoneme each waveform segment corresponds to, typically at the character or subword level.

 It defines the probability that a basic sound unit, or phoneme, has been uttered.

 It represents the relationship between the speech signal and the linguistic or
acoustic units in the language.

 It can be developed via different approaches.
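As a hedged illustration of the acoustic front-end such a model consumes, the sketch below extracts MFCC features with the librosa library (an assumed dependency; "speech.wav" is a hypothetical recording) and treats the acoustic model as a per-frame phoneme scorer with a placeholder scoring function.

```python
# A minimal acoustic front-end sketch, assuming librosa is installed and
# "speech.wav" is a hypothetical recording.
import librosa
import numpy as np

# Load the waveform and compute 13 MFCC coefficients per 25 ms frame.
signal, sr = librosa.load("speech.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13,
                            n_fft=400, hop_length=160)   # shape: (13, n_frames)

# An acoustic model then scores P(phoneme | frame) for each frame.
# A placeholder uniform model stands in for a trained GMM or DNN here.
phonemes = ["a", "b", "e", "sil"]
def acoustic_scores(frame: np.ndarray) -> dict:
    """Return a (placeholder) probability for each phoneme given one frame."""
    return {p: 1.0 / len(phonemes) for p in phonemes}

scores = [acoustic_scores(frame) for frame in mfcc.T]
print(len(scores), "frames scored")
```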


Automatic Speech Recognition

 Language model:

 It defines the probability of the occurrence of a word or a word sequence.

 It provides context to distinguish between words and phrases that sound phonetically similar.

 It can be developed via a statistical approach (n-gram) or a neural network approach.

 Lexical Model:

 It is also called the Vocabulary or Lexicon Model.

 It contains information on how words are formed from phoneme sequences.

 It contains a list of words with their possible pronunciations in the language, as in the example below (sketched in code afterwards).

 Example: አበበ = አ ብ ኧ ብ ኧ or አበበ = አ ብኧ ብኧ
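Both the pronunciation lexicon and an n-gram language model can be represented with very simple data structures. The sketch below is a minimal illustration, not a production format: the lexicon entry mirrors the Amharic example above, and the bigram training data is invented for demonstration.

```python
# A minimal sketch of a pronunciation lexicon and a bigram language model.
from collections import defaultdict

# Lexicon: each word maps to one or more possible phoneme sequences.
lexicon = {
    "አበበ": [["አ", "ብ", "ኧ", "ብ", "ኧ"], ["አ", "ብኧ", "ብኧ"]],
}

bigram_counts = defaultdict(lambda: defaultdict(int))
unigram_counts = defaultdict(int)

def train(sentences):
    """Count unigrams and bigrams from a list of tokenized sentences."""
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            unigram_counts[w1] += 1
            bigram_counts[w1][w2] += 1

def bigram_prob(w1, w2):
    """P(w2 | w1) with add-one smoothing over the observed vocabulary."""
    vocab_size = len(unigram_counts) + 1
    return (bigram_counts[w1][w2] + 1) / (unigram_counts[w1] + vocab_size)

train([["አበበ", "በላ"]])            # toy training data
print(bigram_prob("<s>", "አበበ"))
```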


Automatic Speech Recognition

 Decoder:

 It combines acoustic, language, and lexical models given the feature vector
sequence and the hypothesized word sequence, and outputs the word sequence
with the highest score as the recognition result.
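To make this scoring step concrete, the following minimal sketch picks the candidate word sequence W that maximizes P(O|W)·P(W) in the log domain. All the candidate hypotheses and their scores are toy numbers, not the output of any real acoustic or language model.

```python
# Toy acoustic likelihoods log P(O | W) and language-model priors log P(W)
# for three hypothetical candidate transcriptions of one utterance.
candidates = {
    "recognize speech":   {"log_p_o_given_w": -12.0, "log_p_w": -3.2},
    "wreck a nice beach": {"log_p_o_given_w": -11.5, "log_p_w": -7.9},
    "recognise peach":    {"log_p_o_given_w": -13.1, "log_p_w": -6.4},
}

def decode(cands):
    """Return the hypothesis with the highest combined log score."""
    return max(cands, key=lambda w: cands[w]["log_p_o_given_w"] + cands[w]["log_p_w"])

print(decode(candidates))   # -> "recognize speech"
```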


Types of Automatic Speech Recognition


Types of ASR based on speech

 Isolated speech

 An isolated word recognition system recognizes single utterances, i.e., one word at a time.

 It is suitable for situations where the user is required to give only one-word responses or commands, but it is very unnatural for multiple-word input.

 Connected words

 A connected word system is similar to an isolated word system, but it allows separate utterances to be "run together" with a minimal pause between them. An utterance is the vocalization of a word or words that represents a single meaning to the computer.

 Continuous speech

 A continuous speech recognition system allows users to speak almost naturally, while the computer determines the content.

 Basically, it is computer dictation: adjacent words run together without pauses or any other division between them. Continuous speech recognition systems are difficult to develop.


Types of ASR based on speech

 Spontaneous speech

 A spontaneous speech recognition system recognizes natural speech.

 Spontaneous speech is natural speech produced on the spot, without prior planning.

 An ASR system for spontaneous speech must handle a variety of natural speech features, such as words being run together; spontaneous speech may also include mispronunciations, false starts, and non-words.

 Highly Conversational Speech


Types of ASR based on Size of Vocabulary

 The vocabulary size of an ASR system can affect:

 The complexity, processing requirements, and recognition rate of the ASR system.

 Based on vocabulary size, ASR systems are classified as:

 Small vocabulary - 1 to 1,000 words

 Medium vocabulary - 1,001 to 10,000 words

 Large vocabulary - 10,001 to 100,000 words

 Very large vocabulary - more than 100,000 words

 Unlimited vocabulary - contains all potential words of the language.


Types of ASR based on Speaker model

 Speaker Dependent Models

 Speaker dependent systems are developed for a particular type of speaker.

 They are generally more accurate for that particular speaker, but can be less accurate for other types of speakers.

 These systems are usually cheaper, easier to develop, and more accurate.

 However, these systems are not as flexible as speaker independent systems.

 Speaker Independent Models

 A speaker independent system can recognize a variety of speakers without any prior training.

 A speaker independent system is developed to operate for any type of speaker.

 Its drawback is that it limits the number of words in a vocabulary.

 Implementation of Speaker Independent system is the most difficult.

 It is expensive and its accuracy is lower than speaker dependent systems.


Difficulties with ASR

 Spoken language is not equal to written language


 Noise
 Body Language
 Channel Variability
 Speaker Variability

 Speaking Style

 Speaker Sex

 Dialects


ASR Approaches


ASR Approaches: Acoustic Phonetic Approach

 It is also called rule-based approach.


ASR Approaches: Acoustic Phonetic Approach

 Use knowledge of phonetics and linguistics to guide the search process.

 Usually, some rules are defined expressing everything (anything) that might help to decode:

 Phonetics, phonology, phonotactics

 Syntax

 Pragmatics

 The typical approach is based on a "blackboard" architecture:

 At each decision point, lay out the possibilities.

 Apply rules to determine which sequences are permitted.

 Performance is poor due to:

 Difficulty of expressing the rules

 Difficulty of making the rules interact

 Difficulty of knowing how to improve the system


ASR Approaches: Pattern Recognition Approach


ASR Approaches: Pattern Recognition Approach


 Feature measurement: Filter Bank, LPC, DFT, ...

 Pattern training: Creation of a reference pattern derived from an averaging technique.

 Pattern classification: Compare speech patterns with a local distance measure and a
global time alignment procedure (DTW).

 Decision logic: similarity scores are used to decide which is the best reference pattern.
 The pattern recognition approach has two steps, namely training of speech patterns and recognition of patterns by way of a pattern classifier.

 The pattern recognition approach can be template based or stochastic/statistical.

 This approach includes many techniques, such as:
 Dynamic Time Warping (DTW)

 Vector Quantization (VQ)

 Support Vector Machine (SVM)

 Polynomial Classifier

 Hidden Markov Model (HMM)


ASR Approaches: Pattern Recognition Approach: Template Matching

 A collection of prototypical speech patterns is stored as reference patterns characterizing the dictionary of candidate words.

 Recognition is then performed by matching an unknown spoken word against each of the reference templates and choosing the category of the best-matching pattern.

 Templates are configured for all the words in the vocabulary.

 The test pattern, T, and the reference patterns, {R1, ..., Rv}, are represented by sequences of feature measurements.

 Pattern similarity is determined by aligning the test pattern, T, with a reference pattern, Rv, with distortion D(T, Rv).

 The decision rule chooses the reference pattern, R*, with the smallest alignment distortion D(T, R*).

 Dynamic time warping (DTW) is used to compute the best possible alignment warp between T and Rv, and the associated distortion D(T, Rv).
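As a hedged illustration of this alignment step, the sketch below computes a basic DTW distortion between two feature sequences using the standard dynamic-programming recurrence; real systems typically add path constraints and operate on MFCC vectors, and the toy templates here are random stand-ins.

```python
import numpy as np

def dtw_distance(T: np.ndarray, R: np.ndarray) -> float:
    """Dynamic time warping distortion D(T, R) between two feature sequences.

    T has shape (n, d), R has shape (m, d); the local cost is Euclidean distance.
    """
    n, m = len(T), len(R)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(T[i - 1] - R[j - 1])
            # Best of insertion, deletion, or match from the three neighbors.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

# Toy usage: choose the reference template with the smallest distortion.
test = np.random.rand(20, 13)                     # hypothetical test pattern T
templates = {"yes": np.random.rand(18, 13),       # hypothetical references Rv
             "no": np.random.rand(22, 13)}
best = min(templates, key=lambda w: dtw_distance(test, templates[w]))
print("Recognized word:", best)
```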


ASR Approaches: Pattern Recognition Approach: Statistics-based Approach

 It can be seen as an extension of the template-based approach, using more powerful mathematical and statistical tools.

 It is sometimes seen as an "anti-linguistic" approach.

 Fred Jelinek (IBM, 1988): "Every time I fire a linguist my system improves."

 Collect a large corpus of transcribed speech recordings.

 Train the computer to learn the correspondences ("machine learning").

 At run time, apply statistical processes to search through the space of all possible solutions, and pick the statistically most likely one.
 This approach includes:
 HMM

 SVM

 DTW and Bayesian Classification approaches


 The most popular stochastic approach nowadays is hidden Markov modelling (HMM).

 An HMM is characterized by a finite-state Markov model and a set of output distributions.

 The transition parameters of the Markov model capture temporal variability, while the parameters of the output distributions capture spectral variability.


ASR Approaches: Pattern Recognition Approach: Statistics-based Approach

HMM based Speech Recognition Architecture


ASR Approaches: Pattern Recognition Approach: Statistics-based Approach

 ASR systems developed using the statistical approach are commonly built as a hybrid GMM-HMM.

 This approach is also called the conventional statistical approach.

 It has been the widely used approach for ASR for more than four decades.

 In the GMM-HMM approach:

 GMM - is used for modeling the spectral features of the speech signal, and for
estimating the emission probabilities (observation likelihoods) of the HMM states
via the expectation-maximization algorithm.

 HMM - is used for modeling the temporal features of the speech signal with respect
to the linguistic units of the language, for computing the probabilities of
observation sequences using the forward algorithm, and for finding the optimal
sequences of HMM states using the Viterbi algorithm.
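The sketch below illustrates the Viterbi recurrence used to find the most likely HMM state sequence. The two-state, two-symbol model and all its probabilities are invented for demonstration, not taken from a trained system.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state sequence for a discrete-observation HMM.

    obs: list of observation indices; pi: initial probabilities (S,);
    A: state transition matrix (S, S); B: emission matrix (S, V).
    """
    S, T = len(pi), len(obs)
    delta = np.zeros((T, S))            # best path probability ending in state s at time t
    psi = np.zeros((T, S), dtype=int)   # backpointers to the best predecessor state
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for s in range(S):
            scores = delta[t - 1] * A[:, s]
            psi[t, s] = np.argmax(scores)
            delta[t, s] = scores[psi[t, s]] * B[s, obs[t]]
    # Backtrack from the best final state.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Toy two-state, two-symbol HMM (all numbers are illustrative).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 1, 1], pi, A, B))
```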


ASR Approaches: Pattern Recognition Approach: Statistics-based Approach

 Limitations of this approach:

 GMM is unable to model the temporal characteristics of speech.

 GMM does not model high-dimensional speech features well.

 GMM is statistically inefficient for modeling data that lie on or near a nonlinear manifold in the data space.

 HMM has relatively poor discrimination power.

 HMM ignores long-term dependencies. These limitations make HMM inaccurate but simple to implement.


ASR Approaches: Pattern Recognition Approach: Neural Network Approach

 The ANN approach attempts to mechanize the recognition procedure according to the way a person applies intelligence in visualizing, analyzing, and finally making a decision on the measured acoustic features.

 The ANN approach is a hybrid of the acoustic phonetic approach, the pattern recognition approach, and a feature-extractor approach.

 The various techniques in the Artificial Neural Network approach include:


 Time Delay Neural Network (TDNNs)
 Multi-layer Perceptron (MLP)
 Radial basis Functions (RBF)
 Recurrent Neural Network (RNN)
 Self-Organizing Map (SOM)


ASR Approaches: Artificial Intelligence: Deep Learning Approach

 Deep neural network based acoustic modeling techniques reduce the limitations of GMM-HMM ASR systems.

 Deep neural networks such as feed-forward and recurrent networks are applied:

 For acoustic modeling in ASR, as a feature extractor for a GMM-HMM system

 For replacing the GMM to develop hybrid neural network-HMM systems

 For developing ASR in an end-to-end approach

 Feed Forward DL Networks:

 In these networks, the information always travels in one direction (from the input layer to the output layer via the hidden layers) and never goes backward; a minimal sketch follows the list below.

 Those networks include:
 DNN

 CNN

 TDNN networks.
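Below is a minimal, hypothetical PyTorch sketch of a feed-forward (DNN) acoustic model that maps a spliced window of acoustic features to per-frame phoneme posteriors. The layer sizes, class count, and use of PyTorch are assumptions for illustration, not the prescription of any particular toolkit.

```python
import torch
import torch.nn as nn

class FeedForwardAcousticModel(nn.Module):
    """DNN mapping a spliced feature window to phoneme-class log posteriors."""

    def __init__(self, feat_dim=440, hidden_dim=1024, num_phonemes=42):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_phonemes),
        )

    def forward(self, x):
        # x: (batch, feat_dim) -> log posteriors over phoneme classes
        return torch.log_softmax(self.net(x), dim=-1)

model = FeedForwardAcousticModel()
frames = torch.randn(8, 440)          # a batch of 8 hypothetical feature windows
print(model(frames).shape)            # torch.Size([8, 42])
```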


ASR Approaches: Artificial Intelligence: Deep Learning Approach

 Recurrent DL Networks

 RNNs are cyclic networks with self-connections: the outputs from previous time steps are used as inputs to the current time step.

 These networks capture a dynamic history of information about the input feature sequences and are less influenced by temporal distortion.

 Unlike feed-forward DL networks, RNNs can take a long sequence of input features and generate a long sequence of output values.

 Consequently, these networks are better at modeling long-term dependencies among frames of input features.

 The common RNNs are:


 conventional RNN
 LSTM
 GRU


ASR Approaches: Artificial Intelligence: Deep Learning Approach

 Data Sharing DL Networks

 Data sharing DL networks are vital for minimizing the overfitting problems of unilingual feed-forward networks and RNNs in low-resource ASR.

 These networks include multitask, multilingual, and weight-transfer learning techniques.

 Multitask learning is used to improve the overall performance of a learning task by jointly learning multiple associated tasks.
 This helps to transfer knowledge between or among tasks if the tasks
are associated with each other and share an internal representation
by joint learning.
 Example: Train ASR system for Amharic and Chaha languages.


ASR Approaches: Artificial Intelligence: Deep Learning Approach

 Multilingual learning is a special type of MTL in which multiple languages are trained jointly without specifying primary and ancillary languages.

 All languages have the same impact on the training of multilingual DL models.

 It is not mandatory that the languages are related to each other, and thus this technique allows training the DL models using several training corpora from multiple languages.

 Training can be done in two ways: with shared phone sets or with shared hidden layers among languages.
 Example: Train ASR of multiple languages jointly


ASR Approaches: Artificial Intelligence: Deep Learning Approach

 Weight-transfer technique: considers two major language classes, source and target.

 Source languages are typically high-resource languages with sufficient training corpora for training DL models, while the target language is usually a low-resource language whose limited training corpus is insufficient to train DL models.

 This technique allows transferring the weights from DL models trained on the source languages to train the target language.

 The hidden layers of the source DL models are trained using either unilingual or multilingual training corpora; the source output layer is then discarded and replaced with a new target-language output layer, whose node weights and biases are randomly initialized.

 Finally, either all the hidden layers are frozen and only the added output layer is trained, or all the hidden layers and the added output layer are retrained using a small training dataset of the target language; a minimal sketch follows.

 This technique is important for developing ASR systems for languages that have very limited training datasets, no known phone sets, and no well-defined orthographic systems.
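As a hedged sketch of the weight-transfer idea, the PyTorch code below reuses the hidden layers of a hypothetical source-language model, replaces the output layer for the target language, and optionally freezes the hidden layers. All names, layer sizes, and output-unit counts are illustrative assumptions.

```python
import torch.nn as nn

def build_target_model(source_model: nn.Sequential,
                       num_target_units: int,
                       freeze_hidden: bool = True) -> nn.Sequential:
    """Transfer hidden layers from a source-language model to a target language.

    The source output layer is discarded and replaced by a freshly initialized
    layer sized for the target language's output units.
    """
    hidden_layers = list(source_model.children())[:-1]    # drop source output layer
    if freeze_hidden:
        for layer in hidden_layers:
            for p in layer.parameters():
                p.requires_grad = False                    # train only the new layer
    hidden_dim = list(source_model.children())[-1].in_features
    new_output = nn.Linear(hidden_dim, num_target_units)  # randomly initialized
    return nn.Sequential(*hidden_layers, new_output)

# Hypothetical source model trained on a high-resource language.
source = nn.Sequential(nn.Linear(440, 1024), nn.ReLU(),
                       nn.Linear(1024, 1024), nn.ReLU(),
                       nn.Linear(1024, 3000))              # source output units
target = build_target_model(source, num_target_units=1500)
print(target)
```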


ASR Performance Evaluation

 The performance of speech recognition is specified in terms of accuracy and speed.

 Accuracy is measured by the Word Error Rate (WER).

 Speed is measured by the Real Time Factor (RTF).

 Word Error Rate (WER)

 It is a common metric of speech recognition performance. Because the recognized word sequence can have a different length from the reference word sequence, the two sequences are first aligned, and WER is then computed as:

        WER = (S + D + I) / N

 Where S - is the number of substitutions
       D - is the number of deletions
       I - is the number of insertions
       N - is the number of words in the reference.
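The sketch below computes WER via the standard edit-distance (Levenshtein) alignment between the reference and the hypothesis. It is a minimal illustration, not the evaluation script of any specific toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N computed via Levenshtein alignment over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edit operations to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                       # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                       # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # 1/6 ≈ 0.167
```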


ASR Performance Evaluation

 Sometimes the word recognition rate (WRR = 1 - WER) is used instead of WER when describing the performance of speech recognition.

 Speed

 It is measured by the real time factor.

 If it takes time T to process an input of duration D, then the real time factor is defined by:

        RTF = T / D

 RTF ≤ 1 implies real-time processing.
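A minimal sketch of the RTF computation, using invented timing numbers:

```python
def real_time_factor(processing_time_s: float, audio_duration_s: float) -> float:
    """RTF = T / D; values at or below 1 indicate real-time processing."""
    return processing_time_s / audio_duration_s

# Hypothetical example: 10 s of audio processed in 4.5 s.
rtf = real_time_factor(processing_time_s=4.5, audio_duration_s=10.0)
print(rtf, "-> real time" if rtf <= 1 else "-> slower than real time")   # 0.45
```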


NLP in Speech Recognition

 NLP concepts which are very fundamental for ASR include:

 Word pronunciation – considering homonymy (homophones) – [acoustic and lexical models]

 Language syntax – [language model]

Outline (Optical Character Recognition)

 OCR Definition
 OCR Phases
 OCR Approaches
 OCR Performance Evaluation
 NLP in OCR

Optical Character Recognition

 Handwriting recognition is classified into two types:

 Off-line Handwritten Recognition

 On-line Handwritten recognition

 Off-line handwriting recognition:

 It involves automatic conversion of text in an image into letter codes which are
usable within computer and text-processing applications.

 It is more difficult, as different people have different handwriting styles.

 On-line character recognition:

 It deals with a data stream which comes from a transducer while the user is
writing.

 The typical hardware used to collect the data is a digitizing tablet, which is electromagnetic or pressure sensitive.

 When the user writes on the tablet, the successive movements of the pen are transformed into a series of electronic signals that are stored and analyzed by the computer.


Optical Character Recognition

 Optical Character Recognition (OCR) is a field of research in pattern recognition, artificial intelligence, machine vision, and signal processing.

 OCR usually refers to an off-line character recognition process, meaning that the system scans and recognizes static images of the characters.

 It refers to the mechanical or electronic translation of images of handwritten characters or printed text into machine-encoded text.

 It is used to convert handwritten, typed, or scanned text, or text inside images, into machine-readable text.


OCR Phases

 OCR has the following major phases:

 Digitization

 Preprocessing

 Segmentation

 Feature extraction

 Classification and Recognition

 Post processing


OCR Phases

 General Architecture of OCR


OCR Phases

 Digitization

 It is the process of converting a paper-based document (handwritten, typed, or printed text, or text inside images) into electronic format using a scanner or camera, producing an image file.

 Preprocessing

 The input image can be preprocessed before segmentation.

 The main preprocessing tasks are binarization and size normalization.

 Binarization converts gray-scale or color images into binary images, reducing storage space and increasing processing speed.

 Size normalization scales characters to a standard size, reducing size variation; a minimal sketch of both steps follows.
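A minimal sketch of these two preprocessing steps, assuming the OpenCV (cv2) library is available and "page.png" is a hypothetical scanned document image:

```python
# A minimal preprocessing sketch assuming OpenCV (cv2) is installed and
# "page.png" is a hypothetical scanned document image.
import cv2

# Digitized input: load the scan as a gray-scale image.
gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)

# Binarization: Otsu's method picks a threshold that separates ink from paper.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Size normalization: scale a segmented character image to a fixed template size.
def normalize_character(char_img, size=(32, 32)):
    """Resize a character image to the template size used by the classifier."""
    return cv2.resize(char_img, size, interpolation=cv2.INTER_AREA)

normalized = normalize_character(binary)   # applied to the whole page here for brevity
print(binary.shape, normalized.shape)
```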


OCR Phases

 Segmentation:

 The position of the character in the image is found, and the size of the image is normalized to the template size.

 Segmentation can be external or internal.

 External segmentation is the isolation of larger writing units, such as paragraphs, sentences, or words.

 In internal segmentation, an image of a sequence of characters is decomposed into sub-images of individual characters.
 Feature extraction:

 Features of individual character are extracted.

 The performance of a character recognition system depends on the features that are extracted. The extracted features should allow classification of a character in a unique way.

 For example: diagonal features, intersection and open end points features,
transition features, zoning features, directional features, parabola curve fitting–
based features, and power curve fitting–based features in order to find the
feature set for a given character.


OCR Phases

 Recognition and Classification:

 The unknown symbol images (their extracted features) are compared with predefined stored samples in order to identify their type; this determines the region of feature space in which an unknown pattern falls.

 Post-processing:

 It is the final stage in an OCR system, and one of the most important.

 It checks the text produced by the previous stage and corrects it to make sure it is free of errors.


OCR - Recognition and Classification Techniques

 The commonly used OCR approaches are:
 Optimum statistical classifiers: Includes

 Support Vector Machines (SVM)

 Principal Component Analysis (PCA),

 Kernel Principal Component Analysis (KPCA) and others

 SVMs are a group of supervised learning methods that can be applied to classification. In a classification task, the data is usually divided into training and testing sets. The aim of SVM is to produce a model that predicts the target values of the test data. Different types of SVM kernel functions are: linear, polynomial, Gaussian Radial Basis Function (RBF), and sigmoid. (A minimal sketch follows this list.)

 Neural Networks / Deep learning

 Neural network based OCR makes the recognition process more application-aware by feeding the extracted features (or raw character images) into a neural network.

 A common deep learning approach that is effective for OCR is the CNN model.
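The sketch below illustrates SVM-based character classification with scikit-learn (an assumed dependency), using the library's bundled digits dataset as a stand-in for segmented character images; the kernel and hyperparameter choices are illustrative.

```python
# A minimal sketch of SVM character classification using scikit-learn.
# The bundled digits dataset stands in for segmented character images.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

digits = load_digits()                                  # 8x8 character images
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# An RBF kernel is used here; 'linear', 'poly', or 'sigmoid' are also possible.
clf = SVC(kernel="rbf", gamma=0.001, C=10.0)
clf.fit(X_train, y_train)

print("Recognition accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```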


OCR Performance Evaluation

 Recognition rate

 The proportion of correctly classified characters.

 Rejection rate

 The proportion of characters which the system was unable to recognize.

 Rejected characters can be flagged by the OCR system, and are therefore easily
retraceable for manual correction.

 Error rate

 The proportion of characters erroneously classified.

 Misclassified characters go undetected by the system, and manual inspection of the recognized text is necessary to detect and correct these errors.
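A minimal sketch of these three rates, computed from hypothetical per-character OCR outcomes (the outcome labels are invented for illustration):

```python
def ocr_rates(results):
    """Compute recognition, rejection, and error rates.

    `results` is a list of per-character outcomes using the hypothetical
    labels 'correct', 'rejected', or 'error'.
    """
    total = len(results)
    return {
        "recognition_rate": results.count("correct") / total,
        "rejection_rate":   results.count("rejected") / total,
        "error_rate":       results.count("error") / total,
    }

# Toy example: 100 characters, 94 correct, 4 rejected, 2 misclassified.
outcomes = ["correct"] * 94 + ["rejected"] * 4 + ["error"] * 2
print(ocr_rates(outcomes))   # {'recognition_rate': 0.94, 'rejection_rate': 0.04, 'error_rate': 0.02}
```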



TOC: Course Syllabus

Previous: Approaches to NLP

Current: Applications of NLP-Part-II


Next: End of NLP Course
