Bhabatosh Chanda
Indian Statistical Institute
All content following this page was uploaded by Pulak Purkait on 31 March 2015.
The features are extracted from each cell of a word to capture the arrangement of strokes in the word. In this investigation we tried different cell sizes; a cell of size 18 × 18 gives reasonably good results, so cells of this dimension are used throughout the investigation.

3.1.1 Directional Opening

The pre-processed image is opened in four directions using a 'line' structuring element whose size is twice the average width of the strokes, and the normalized mass of the opened image in each cell is taken as a feature. Let I be the pre-processed word image and [S^1, S^2, S^3, S^4] be the four 'line' structuring elements in the four directions (horizontal, vertical and the two diagonal directions). Let N be the number of cells into which a particular word has been divided and m_j^i be the mass of the word in the j-th cell after opening I with the structuring element S^i. The normalized mass M_j^i of that word in the j-th cell after opening with S^i is given as

\[ M_j^i = \frac{m_j^i}{\max_{j=1,\dots,N} m_j^i}. \]

Thus for a word with N cells, we have a directional opening feature vector of length 4N.

3.1.2 Directional Closing

Instead of the opening described above, the pre-processed word image is closed with the same 'line' structuring elements, and the normalized mass of the closed image in each cell is taken as a feature. Normalization is done in the same way. Thus for a word with N cells, the directional closing feature vector is also of length 4N.

3.1.3 Directional Erosion

Another morphological feature is obtained from the pre-processed skeleton image X. The skeleton image is eroded by the two-pixel structuring elements shown in Figure 3(a)-(d) to get the number of co-occurrences of skeletal pixels in four different directions. Here the structuring elements are S^1 = {(0, 0), (1, 1)}, S^2 = {(1, 0), (0, 1)}, S^3 = {(0, 0), (1, 0)} and S^4 = {(0, 0), (0, 1)}. Each image X̂_i = X Θ S^i, i = 1, ..., 4, contains the pixels at which a particular directional co-occurrence on the skeleton occurs, where Θ is the morphological erosion operation. We then divide each image X̂_i into N cells and make a normalized histogram of directional co-occurrences (s_j^i, j = 1, ..., N) of the portion of the skeleton in each cell. In this case also we get a feature vector of length 4N.

Figure 3. Four structuring elements (a)-(d), with the first element as center, used to find the images containing the portion of the skeleton in the respective direction.

3.1.4 k-curvature features

The slope of a chain code at any point is a multiple of 45°, and the slope of a crack code is a multiple of 90°. In order to capture finer variation in slope, we smooth over, say, k such chain code steps to define the k-curvature.

Here the left and right k-slopes at a point P on a curve are defined as the slopes of the lines joining P to the points k steps away along the curve on each side of P, and the k-curvature of P as the difference between its left and right k-slopes. This definition assumes that the curve from P to the point k steps away can be approximated by a straight line segment. This assumption is better satisfied if k is small. On the other hand, for small k the slope may be influenced by small perturbations due to noise. So the value of k is selected as a compromise between these two conflicting characteristics of the slope measure.

We calculate the k-slopes at each point of the morphological skeleton of the object as the angle with respect to the horizontal axis, and hence the k-curvature as the smaller angle (between 0° and 180°) between the left and right segments at P.

We take k = 4 and observe that the value of the k-curvature at most pixels lies between 90° and 180°, with a bias towards 180°. This suggests using finer bins above 90° at the time of quantization. So we quantize the interval [0°, 180°] into five unequal bins as follows: 0° ≤ θ < 90°, 90° ≤ θ < 120°, 120° ≤ θ < 140°, 140° ≤ θ < 160° and 160° ≤ θ ≤ 180°. We make a histogram on the basis of this distribution. Now there are four possibilities, shown in Figure 4(a)-(d), which differ in character shape but whose histograms of k-curvature are nearly identical.

So we need to distinguish among the histograms of the curve segments shown in Figure 4. The first two, i.e. (a) and (b), and the remaining two, i.e. (c) and (d), can be separated by the difference of the row number and column number of the end points of the curve segments. Then they may be further distinguished by checking the location of the mid-point with respect to the straight line joining the end points.
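The directional opening feature of Section 3.1.1 can be sketched in plain Python as follows. This is only an illustrative sketch, not the authors' implementation: it assumes a binary image stored as a list of rows of 0/1 values, takes "mass" to mean the number of foreground pixels in a cell, treats pixels outside the image as background, and leaves the structuring element length (`se_length`) as a parameter to be set from the stroke width. All function names are hypothetical.

```python
# Illustrative sketch of the directional opening feature (assumptions:
# binary image as list of lists of 0/1, mass = foreground pixel count).

# Horizontal, vertical and the two diagonal directions as (d_row, d_col).
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def line_se(length, direction):
    """Offsets of a 'line' structuring element anchored at its first pixel."""
    dr, dc = direction
    return [(t * dr, t * dc) for t in range(length)]


def erode(img, se):
    """Keep a pixel only if every offset of se lands on foreground."""
    h, w = len(img), len(img[0])

    def fg(r, c):
        return 0 <= r < h and 0 <= c < w and img[r][c] == 1

    return [[1 if all(fg(r + dr, c + dc) for dr, dc in se) else 0
             for c in range(w)] for r in range(h)]


def dilate(img, se):
    """Stamp a copy of se on every foreground pixel (clipped at the border)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c]:
                for dr, dc in se:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        out[rr][cc] = 1
    return out


def opening(img, se):
    """Morphological opening: erosion followed by dilation with the same se."""
    return dilate(erode(img, se), se)


def cell_masses(img, cell):
    """Foreground pixel count of each cell, cells scanned row by row."""
    h, w = len(img), len(img[0])
    return [sum(img[r][c]
                for r in range(r0, min(r0 + cell, h))
                for c in range(c0, min(c0 + cell, w)))
            for r0 in range(0, h, cell) for c0 in range(0, w, cell)]


def directional_opening_features(img, se_length, cell=18):
    """Concatenate the normalized cell masses M_j^i over the four directions."""
    features = []
    for direction in DIRECTIONS:
        masses = cell_masses(opening(img, line_se(se_length, direction)), cell)
        peak = max(masses)
        # Normalize by the largest cell mass; an empty opening gives all zeros.
        features.extend(m / peak if peak else 0.0 for m in masses)
    return features
```

For an image divided into N cells, the returned list has 4N entries, matching the feature length stated in the text. A directional closing variant would apply the two operations in the opposite order.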
Table 3. Inter-writer variation of the character ’pra’ by different writers (a)-(f).

Figure 4. Four possibilities (a)-(d) where the histograms of k-curvature are identical.
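The k-curvature histogram of Section 3.1.4 can be sketched as below. The sketch assumes that the skeleton has already been traced into an ordered list of (row, column) pixels, a step the text does not describe and that is not shown here. It measures the k-curvature at P as the angle in [0°, 180°] between the segments joining P to the points k steps away on either side, and uses the five unequal bins given in the text. The function names are hypothetical.

```python
import math

# Upper edges of the first four bins; the fifth bin is [160, 180] degrees.
BIN_EDGES = (90.0, 120.0, 140.0, 160.0)


def k_curvature(points, k=4):
    """Angle (degrees, 0..180) between left and right k-segments at each point.

    points is an ordered list of (row, col) pixels along one curve; the first
    and last k points are skipped because they lack a neighbour k steps away.
    """
    angles = []
    for t in range(k, len(points) - k):
        r0, c0 = points[t]
        rl, cl = points[t - k]
        rr, cr = points[t + k]
        lx, ly = cl - c0, rl - r0   # vector from P to the left neighbour
        rx, ry = cr - c0, rr - r0   # vector from P to the right neighbour
        dot = lx * rx + ly * ry
        cross = lx * ry - ly * rx
        angles.append(math.degrees(math.atan2(abs(cross), dot)))
    return angles


def quantize(theta):
    """Index of the unequal bin (0..4) that contains the angle theta."""
    for index, edge in enumerate(BIN_EDGES):
        if theta < edge:
            return index
    return len(BIN_EDGES)


def curvature_histogram(points, k=4):
    """Normalized five-bin histogram of the k-curvature along the curve."""
    hist = [0] * (len(BIN_EDGES) + 1)
    for angle in k_curvature(points, k):
        hist[quantize(angle)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * len(hist)
```

A perfectly straight run of pixels puts all of its mass in the last bin (160° to 180°), which is consistent with the bias towards 180° reported in the text.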
Writer identification is a one-to-many comparison between a query document and the database documents associated with different writers. Features from a document are compared to the features from the documents of all the writers in the database, and the inference is made on the basis of these comparisons. For writer identification we have followed a ‘leave-one-out’ strategy. Out of the 5 documents from each writer, one document is kept out for testing and the remaining 4 documents are used as training data. All combinations of leaving one document out were tried, and the average accuracy is reported in the results and discussion section.

For writer identification, nearest neighbor is an obvious choice since the number of writers (classes) may vary. We use Euclidean distance as the similarity measure between different words. The writer of the query or test document is identified if the distance to the nearest neighbor is less than a predefined threshold.

3.3 Combining word-level performance for a document

Forensic document examiners, while examining documents for writer identification, look for the individuality of different words and make a decision based on the combined performance (individuality) of different words in a document. Following a similar strategy, to link a writer to a document we also combine the performances of different words in a document.

Let p_ij be the probability that a word w_j of a particular document A was written by the writer i. The probability that a document A having words w_j, j = 1, 2, ..., k, was written by writer i can be given as

From Table 1, one can see that the directional opening feature set outperforms the other feature sets on most words. A maximum accuracy of 82.72% is obtained for the word with word index 6. For that particular word (word index 6), every feature set gives its best performance.

Table 1. Identification results on individual words (accuracy in %)

Word index  Directional opening  Directional closing  Directional erosion  k-curvature
1           75.00                67.47                69.09                61.82
2           68.18                65.45                46.36                63.64
3           78.18                76.36                53.63                62.72
4           67.27                69.09                60.00                57.27
5           60.90                60.90                52.72                72.73
6           82.72                82.72                72.78                72.73
7           69.09                64.55                60.90                68.18
8           72.73                68.18                67.27                65.45
9           72.73                70.00                44.54                56.36
10          70.00                60.90                53.63                57.27

From the table, it is also evident that the peculiarities of a word are more important than its length. The longest words, with word indices 5 and 2, could not perform well in comparison to the words with word indices 6 and 1, which have more peculiar characters (the first character, i.e. ‘pra’, of these words) with lots of variation across writers. Table 3 shows the inter-writer variation of the character ’pra’ by six different writers. The variations shown in Table 3 support the higher accuracy of the words with word indices 6 and 1.

Table 2 depicts the performance of nearest neighbor when different numbers of words are combined to take a decision on a document according to Section 3.4. Each of the four feature sets could link more than 98% of the documents with their respective writers when ten words are taken into account. The directional opening feature is the highest performer among the four feature sets and could link all the documents with their respective writers. Merely 5 words in combination give an accuracy of more than 90% with all four feature sets.

5 Conclusion and Future works

In the present study we have proposed four sets of features, namely directional opening, directional closing, directional erosion and k-curvature features, for writer recognition in handwritten Telugu documents. After noise removal, 10 words are segmented from each document. Segmented words of similar content are size-normalized and divided into a number of cells of fixed size. All the feature sets are extracted from 10 different words of each document in the data set. The proposed features give encouraging results at both the word and the document level, with directional opening outperforming the other feature sets.

As future work, we will continue our investigation to assess the usefulness of the proposed features on a larger database.

Acknowledgements

We would like to acknowledge Venkatappaiah Kurapati for the collection and preparation of the digital Telugu handwritten documents.