Dr. Mallikarjun Hangarge
– A script is a set of symbols and rules used to express or
convey information in graphic form.
4/8/2018 2
– A script is independent of language.
– Different languages may use the same script.
– For example, Sanskrit, Marathi, and Hindi use the Devanagari script.
Scripts      Languages                          Regions
Devanagari   Hindi, Sanskrit, Marathi, Nepali   North India
Gujarati     Gujarati                           North India
Gurumukhi    Punjabi                            North India
Bangla       Bengali, Assamese                  North India
Oriya        Oriya                              North India
Telugu       Telugu                             South India
Kannada      Kannada                            South India
Tamil        Tamil                              South India
Malayalam    Malayalam                          South India
Urdu         Urdu                               North India
Roman        English                            India
Script Identification
[Figure: Input Image, with script labels Devanagari and Roman]
Indian Script Character Shape Properties
[Figure: character shape samples for Malayalam, Kannada, and Tamil]
• The primary aim of the proposed system is to identify the
script of a word.
[Figure: system block diagram with the stages Input Document,
Binarization, Skew Detection, Segmentation, and Output]
Input Image Binarization: Otsu’s Method
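Otsu's method picks the gray level that best separates the histogram into two classes by maximizing the between-class variance. The following is a minimal sketch of that idea, not the implementation used in this work; the function names `otsu_threshold` and `binarize`, the flat list of 8-bit pixel values, and the choice to map brighter pixels to 1 are all assumptions made for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes between-class variance.

    `pixels` is assumed to be a flat sequence of integers in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low = 0    # number of pixels at or below the candidate threshold
    sum_low = 0  # sum of gray levels at or below the candidate threshold
    for t in range(levels):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0:
            continue  # no pixels in the lower class yet
        if w_high == 0:
            break     # every pixel is already in the lower class
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map pixels above the threshold to 1 and the rest to 0.

    Which class counts as ink depends on image polarity (an assumption here).
    """
    return [1 if p > threshold else 0 for p in pixels]
```

For dark ink on light paper, the ink pixels are the ones mapped to 0 in this sketch; invert the comparison if the opposite convention is needed.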
D1 = 0.2690  0.6865  -0.1663

 0.9511  -0.1382   0.4253
-0.8784  -0.2854   0
-0.5878   0        0

STD  = 0.8042  0.5238  0.4183
Mean = 0.5821
1DCT
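A minimal sketch of a one-dimensional DCT, assuming "1DCT" refers to the 1-D discrete cosine transform. The slides do not state which variant or normalization is used, so the orthonormal DCT-II chosen here is an assumption, and the function name `dct_1d` is hypothetical.

```python
import math


def dct_1d(signal):
    """Orthonormal DCT-II of a sequence of numbers (assumed variant)."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        # Project the signal onto the k-th cosine basis function.
        s = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(signal)
        )
        # Scaling that makes the transform orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs
```

With this normalization, a constant signal puts all of its energy into the first coefficient, which is why low-order coefficients tend to capture the coarse shape of a profile.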
4/8/2018 ICECIT_2012@SRIT,ANATAPUR,A.P 12
Classification
• LDA: It preserves class-discriminating information to
a high extent while reducing the dimensionality of the
feature space. It also maximizes separability
between the classes by maximizing the ratio of
between-class variance to within-class variance.
• KNN: To put the performance of LDA in context,
another traditional classifier, K-NN, is used.
K-NN stores the training data X, then finds
the minimum distance D between the training samples X
and the test pattern Y using a distance measure.
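The two ideas above can be sketched as follows. This is an illustration under stated assumptions, not the classifiers used in this work: `fisher_ratio` computes the between-class to within-class variance ratio for a single scalar feature and two classes (the quantity LDA maximizes, not a full LDA), and `knn_predict` assumes Euclidean distance, which the slides do not confirm. All function names are hypothetical.

```python
import math
from collections import Counter


def fisher_ratio(class_a, class_b):
    """Between-class over within-class variance for one scalar feature.

    Assumes both classes are non-empty and not both constant.
    """
    mean_a = sum(class_a) / len(class_a)
    mean_b = sum(class_b) / len(class_b)
    var_a = sum((v - mean_a) ** 2 for v in class_a) / len(class_a)
    var_b = sum((v - mean_b) ** 2 for v in class_b) / len(class_b)
    return (mean_a - mean_b) ** 2 / (var_a + var_b)


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(train_x, train_y, query, k=1):
    """Label `query` by majority vote among its k nearest training samples."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

A larger Fisher ratio means the two classes are farther apart relative to their spread, which is the sense in which LDA "maximizes separability".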
Experiments
• There is no publicly available dataset of Indic scripts at
present. Therefore, a dataset of 9000 handwritten
words in six scripts was created, including Roman (R),
Devanagari (D), Kannada (K), Telugu (TE), and Tamil (TA).
Each script is written by a different set of 20 writers,
and each writer has written 75 words
(6 scripts × 20 writers × 75 words = 9000 words).
• The writers were asked to write the text provided to
them on A4-size paper. These papers were digitized
with a scanner at a resolution of 300 dpi.
Evaluation Protocol
• To evaluate the performance of the method, K-fold
cross validation (CV) is used rather than a
traditional dichotomous split of the data. In K-fold CV,
the original sample for every dataset is randomly
partitioned into K sub-samples. Of the K sub-samples,
a single sub-sample is used for validation,
and the remaining K − 1 sub-samples are used for
training.
• This process is then repeated K times, with each
of the K sub-samples used exactly once for validation.
Finally, the K results are averaged to produce a
single value. In our tests, we use K = 10.
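The protocol above can be sketched as follows. This is a minimal illustration, not the evaluation code used in this work; the function names, the fixed random seed, and the `train_and_score` callback (any function that trains on the first two arguments and returns a score on the last two) are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly partition sample indices into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train_and_score, k=10, seed=0):
    """Average the score over k folds, each held out exactly once."""
    folds = k_fold_indices(len(samples), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        score = train_and_score(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in held_out],
            [labels[i] for i in held_out],
        )
        scores.append(score)
    return sum(scores) / len(scores)
```

With K = 10, each round trains on nine tenths of the data and validates on the remaining tenth.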
BI-SCRIPT IDENTIFICATION RESULTS IN % WITH LDA (LOWER
TRIANGLE RESULTS ARE FROM DDI AND UPPER TRIANGLE ARE
FROM D-DCT).
BI-SCRIPT IDENTIFICATION RESULTS IN % WITH KNN (LOWER
TRIANGLE RESULTS ARE FROM DDI AND UPPER TRIANGLE ARE
FROM D-DCT).
TRI-SCRIPT IDENTIFICATION (IN %) WITH LDA.
MULTI-SCRIPT IDENTIFICATION (IN %) WITH LDA.
Horizontal features of Indian Roman
and Kannada script
Horizontal features of IAM Roman and Kannada
Script
C-DCT Versus D-DCT
Observations
• A native writer of a specific script carries over their
native writing style when writing non-native scripts. This is
experimentally validated with Indian scripts and the
Roman script.