2538 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 13, NO. 10, OCTOBER 2018

Online Writer Identification With Sparse Coding-Based Descriptors

Vivek Venugopal and Suresh Sundaram

Abstract— This paper proposes a system to identify the authorship of online handwritten documents. We represent the trace of handwriting of a writer with descriptors that are derived from a set of dictionary atoms obtained in a sparse coding framework. The descriptor for each dictionary atom encodes the error incurred when that atom alone is used for reconstruction. The use of sparse representation offers flexibility in describing each of the segmented handwritten sub-strokes of a writer as a combination of more than one atom or prototype. The descriptor is constructed by considering the attributes obtained from sets of histograms extracted at a sub-stroke level. In addition, an entropy-based analysis for the bin size to be used for obtaining the feature sets is proposed. The writer descriptor is evaluated on the paragraphs and individual text lines of two publicly available English databases (the IAM and IBM-UB1) and a Chinese database, CASIA. We empirically show that the results obtained are promising when compared with previous works.

Index Terms— Online writer identification, sub-strokes, histogram based features, sparse coefficients, writer descriptor, IAM online handwriting database, IBM-UB 1 database, CASIA database.

Manuscript received August 21, 2017; revised December 29, 2017, February 25, 2018, and March 12, 2018; accepted March 16, 2018. Date of publication April 4, 2018; date of current version May 9, 2018. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Julian Fierrez. (Corresponding author: Suresh Sundaram.) The authors are with the Department of Electronics and Electrical Engineering, IIT Guwahati, Guwahati 781039, India (e-mail: v.venugopal@iitg.ernet.in; sureshsundaram@iitg.ernet.in). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIFS.2018.2823276

I. INTRODUCTION

RESEARCH works on writer identification focus on the task of assigning the authorship of a piece of handwritten document by contrasting it against templates of handwritten samples saved in a database [1]. In the literature, these come under the purview of behavioral biometrics. Past explorations in this area have led to the development of two types of systems, namely online and offline.

In an online writer identification system, the handwritten input is captured by an electronic stylus that makes contact with the screen of a tablet. The tip of the stylus records the dynamic information of the trace, such as the (x, y) coordinates and the time stamp. In general, the input to such a system is a document comprising a set of strokes, each of which in turn contains a sequence of points. Offline writer identification, on the other hand, regards the handwritten data as an image and considers techniques from image processing for subsequent analysis. Such systems fall into two categories, namely textural based and allograph based methods. Methods in the former group utilize a global description of the handwritten image data for writer identification, such as the computation of ink width [2] and attributes like Local Binary Patterns (LBP) [3], co-occurrence features [4] and edge structure codes [5], to name a few. Conversely, the allograph based techniques consider feature descriptions computed from the letter parts (allographs). While establishing the authorship of a test document, these entail the use of a dictionary / vocabulary that is pre-learnt during training [6]–[8]. In the context of the present paper on online writer identification, the allographs relate to the sub-strokes that are segmented from the trace of the handwritten input.1

The research ideology of this paper falls in line with the allograph-based methods. Without loss of generality, works from such approaches in the literature suggest generating the vocabulary via a grouping of the sub-strokes by a hard clustering algorithm such as k-means. Such a scheme allows each feature vector of a segmented sub-stroke (from the input data of a writer) to be associated with, or represented by, exactly one code-vector in the vocabulary. However, it may often not be possible to represent a sub-stroke with only one code-vector. To resolve this issue, we adopt a sparse framework for constructing the dictionary. This provides the flexibility to describe each of the segmented sub-strokes of a writer as a linear combination of more than one atom / prototype. In recent times, sparse representation has been applied very successfully in various applications of pattern analysis, such as face recognition [9], [10] and signature verification [11], to name a few.

1 The textural based methods rely on the bitmap image and hence are not applicable to online writer identification systems. The previous works from the literature relate primarily to the use of allograph based methods, and these are discussed in Section II.

A. Focus of the Present Work

This paper focuses on proposing a methodology that encodes the sub-strokes of the online handwritten trace with descriptors derived from the set of dictionary atoms. We attempt to capture the similarity of the attributes2 of each individual sub-stroke to the corresponding values in the subset of atoms that contribute with a non-zero sparse coding coefficient. Keeping this in perspective, we derive from each of the atoms descriptors that incorporate the similarity scores of the attributes in a feature vector. The scores, as we shall see in Section VI, are in a way indicative of the reconstruction error obtained while employing a particular atom alone for reconstruction. The idea of scoring individual feature attributes separately with regard to a dictionary atom is achieved by utilizing the corresponding non-zero sparse coefficient.

2 We refer to the feature values in a feature vector as 'attributes'. The attributes computed for each sub-stroke of the online trace are stacked to form a feature vector.

Prior attempts on sparse coding primarily rely on the accumulation of the coefficients over the set of dictionary atoms for generating the descriptor [6], [12]. Unlike these works, the present proposal takes a step forward in exploring additional information from the sparse coefficients for writer identification. Their utilization in separately scoring each of the individual attributes in a feature vector helps in capturing the description of the writer at a finer level compared to the works [6], [12]. This in turn leads to improvements in the performance of the writer identification system.

In addition to the above contribution, we consider in Section IV several sets of histogram based attributes / features at the sub-stroke level for constructing the dictionary atoms and subsequently the descriptors.3 We provide a more thorough and comprehensive analysis of the same by addressing the issue of the selection of the bin size to be used and its relation to the writer identification rate. More specifically, we propose in Section IV-A an entropy based strategy for determining the appropriate bin size to be selected for the generation of these histograms. Considering that the descriptors of the sparse coding framework are constructed from the resulting histogram feature sets, the selection of the bin size using our method ensures a good discrimination between the writers. This is also reflected in the experimental results, where we show a dependence between the bin size chosen by the entropy based analysis and the corresponding writer identification performance.

3 These features were introduced for the first time by us in the paper [12] published in ICFHR-2016.

The efficacy of our proposal is demonstrated with extensive experimentation on three publicly available online handwritten databases, namely IAM, IBM-UB1 and CASIA. Apart from the one-versus-all SVM, we also consider the Exemplar-SVM framework discussed in [8] for writer identification.

We summarize the discussion of this sub-section by reiterating the contributions of the paper.

1) We provide a more thorough analysis of the histogram based features introduced in [12]. A strategy based on entropy is proposed in subsection IV-A for determining the appropriate bin size to be selected for the histograms.
2) We present a proposal for describing the handwritten content by a set of descriptors derived from the set of dictionary atoms in Section VI.
3) Lastly, an extensive evaluation of the proposal is presented on two English databases (IAM and IBM-UB1) and a Chinese database (CASIA), with promising results over several state-of-the-art methods.

To the best of our knowledge, neither the entropy based analysis for bin size selection of the histogram based features nor the derivation of the descriptor from the dictionary atoms has been attempted in the literature on online writer identification.

B. A Block Schematic of Our Proposal

Fig. 1. Block diagram of the proposed text independent online writer identification methodology. The test data refer to paragraphs or text lines written by the enrolled writers.

Figure 1 presents the block diagram encapsulating the details of our proposal. The spatio-temporal sequence consisting of the spatial coordinates and the time duration from a document is initially passed through the pre-processing block. The output of this module is then used to segment the strokes into smaller fragments called sub-strokes, following which histogram based feature sets parameterized by a bin size are extracted for each of them. The atoms of a pre-learnt dictionary are then used in conjunction with the attributes of the sub-stroke feature vectors to generate a descriptor that encapsulates the writer specific characteristics of the document. Finally, a Support Vector Machine (SVM) classifier is utilized to establish the authorship of the handwritten content.

The organization of this paper is as follows: We begin by reviewing the allograph based methods from the literature of online writer identification in Section II. Following this, we outline in the subsequent four sections the steps of our proposed writer identification. The pre-processing methods together with the sub-stroke generation process are presented in Section III. The histogram based feature sets, along with an entropy based analysis for the appropriate bin size selection, are described in Section IV. An overview of the sparse coding
framework for online writer identification with appropriately defined notations is then provided in Section V. In Section VI, the sub-stroke feature vectors from a document, together with their corresponding sparse-coded coefficients, are utilized in deriving the writer descriptor.

With regard to the experiments, the set of databases is discussed in Section VII with their respective training and testing protocols. The efficacy of our proposal is demonstrated in Section VIII, followed by a comparison with prior works in Section IX. Lastly, in Section X, we conclude the paper.

II. OVERVIEW OF ALLOGRAPH BASED APPROACHES IN ONLINE WRITER IDENTIFICATION

Over the last decade, there has been considerable research in the area of online writer identification under the category of allograph-based approaches. One of the prominent works is that of the Gaussian Mixture Model-Universal Background Model (GMM-UBM) system proposed in [13] and [14]. The idea is to first construct the UBM from the training data corresponding to a set of enrolled writers with the Expectation Maximization (EM) algorithm. Thereafter, an adaptation step is considered, wherein individual GMMs are obtained for each writer by using only his / her training data. During the testing phase, the document is assigned to the writer whose corresponding GMM provides the highest log-likelihood score.

The development of online writer identification systems has also been motivated from the domain of information retrieval. The works in [15]–[17] consider scoring the distribution of the allographs in the handwritten data with a measure motivated from the term frequency-inverse document frequency (tf-idf) framework. In addition, the notion of Latent Dirichlet Allocation from the area of topic models has been explored in [18]–[20]. The underlying assumption made in these works is that a document can be modelled as a distribution of finite writing styles shared among writers, which in turn can be viewed as a random mixture of text independent features.

Apart from the preceding ideas, a study on the utility of subtractive clustering to generate the prototype patterns for online writer identification was conducted in [21]. In particular, two different strategies were put forth for deciding the identity of the writer. The first one makes use of a scoring inspired by the tf-idf approach, while the second utilizes a voting scheme based on the nearest prototype. Following the success of the sparse representation approach in offline writer identification [6], a study was conducted to demonstrate its effectiveness for online handwritten documents in [12]. In this work, the identification was achieved by employing the sparse coefficients in a tf-idf framework. An investigation on the viability of deriving a writer description from a pre-learnt codebook generated with the k-means algorithm was presented in [22]. In particular, a comparative study was conducted with the Vector of Locally Aggregated Descriptors [23] for identifying the authorship of online handwritten documents.

Methods that make use of shape primitives have also been attempted in the literature. Namboodiri and Gupta [24] encode the size, shape and style of a specified set of shape primitives4 to deduce the identity of the writer. The work in [25] characterises each stroke in a document by utilizing the histogram of dynamic information such as pressure, azimuth, altitude and velocity. The idea of shape based and temporal sequence codes has also been attempted in the literature [26]. At this juncture, we also make a mention of the hierarchical approach to writer identification adopted in [27]. In the first stage, the statistics of the orientation and the pressure information in a shape primitive (captured as a probability distribution) are employed to reduce the list of probable writers. In the second stage, the curvature of the shape primitives of the selected writers is quantized to 18 bins. Thereafter, the mean and variance of four dynamic features (pressure, azimuth, altitude and velocity) in each bin are employed for identification.

In a recent study, Chaabouni et al. [28] attempted to model the sub-strokes using multi-fractals for Arabic writer identification. The sub-strokes were clustered into five groups based on the frequently occurring patterns in Arabic. In the work [29], an approach based on the combination of Dynamic Time Warping (DTW) and Support Vector Machines (SVM) was proposed for ascertaining the authorship of texts in Arabic. An exploration of the efficacy of Beta-Elliptical models in representing the spatial and velocity profiles of a sub-stroke was suggested in [30]–[32] for writer identification.

We conclude with a mention of deep learning based approaches, which have recently captured the interest of the online writer identification research community. A Convolutional Neural Network based approach was presented in [33], where the authors employ a drop segment method (influenced by the drop-out training strategy of [34]) to improve the generalization capability. Lastly, a Recurrent Neural Network model with a bi-directional Long Short Term Memory based approach has also been proposed in the work [35].

4 In this paper, the terms shape primitives, sub-strokes and allographs are used interchangeably.
5 This issue was observed with regard to the IAM Online handwriting database, which will be used for the experimentation in this work.

III. PREPROCESSING AND SUB-STROKE GENERATION

The following two preprocessing steps are employed in this study.
• Isolated point removal: In order to eliminate isolated points, we consider a neighbourhood Np around a sample point (say p) in the same stroke. Thereafter, we compute the Euclidean distances between p and all the other points in Np. If all the distances are greater than a threshold t, then p is regarded as an isolated point and subsequently eliminated from the online handwritten trace. The threshold is computed empirically for each document by considering the mean μ and standard deviation σ of the Euclidean distances calculated between consecutive sample points of a stroke. For our work, the value of t is set to (μ + 3σ).
• Stroke merge: Due to issues in the capture of the online data, we at times encountered that a handwritten trace (which can be represented as a single stroke) is broken down into more than one stroke.5 In such cases, we consider
the time difference between the last point of a stroke and the first point of the succeeding stroke. If the obtained value lies within a threshold, the corresponding strokes are merged into a single stroke.

Fig. 2. Illustration of the segmentation approach - (a) Input online handwritten trace. (b) Sub-strokes obtained by locating the local maxima in a stroke. (c) Sub-strokes generated by considering a fixed number of sample points. Note that we use different colours to distinguish between subsequent sub-strokes.

Once the preceding pre-processing steps are performed, each stroke is divided into fragments called sub-strokes, which are in a way indicative of the basic structural units present in the document. The segmentation procedure utilized in this work considers the generation of sub-strokes via two strategies, as outlined below. In the first method, we rely on locating the local maxima in a stroke. The strokes are then split at these points, and individual sub-strokes are obtained thereof. In the second strategy, we consider sub-strokes of a fixed length by splitting the stroke into smaller segments containing a specific number of points. The successive segments are allowed to have an overlap that is governed by a stride factor. For our work, we have considered a sub-stroke to contain 30 points with a stride of five points. The generation of sub-strokes using the two segmentation strategies may be regarded as a data augmentation process that facilitates the learning of the sparse representation framework on their corresponding feature vectors, discussed in the next section.

As an illustration, we depict in Figure 2 (a) an online trace of a word, and the sub-strokes obtained using the two strategies in sub-figures (b) and (c) respectively. For better clarity, different colours are employed to distinguish between subsequent sub-strokes.

IV. FEATURE EXTRACTION

Subsequent to generating the sub-strokes, we extract features that encode their shape information. Such a representation provides a fixed feature vector dimension irrespective of the length of the sub-stroke. Inspired by the success of the Histogram of Oriented Gradients (HOG) [36] in the area of image processing, we propose sets of histogram based features / attributes for each sub-stroke as follows.

Consider a sub-stroke S of N_S points denoted by {p_i}_{i=1}^{N_S}, where p_i = (x_i, y_i) consists of the spatial x and y coordinates of the i-th point. Let p_c = (x_c, y_c) represent the coordinates of its centroid. We calculate θ_{p_i}, the angle of the line segment p_i − p_c with the horizontal, and the Euclidean distance d_{p_i} between p_i and p_c as follows:

\theta_{p_i} = \tan^{-1}\left(\frac{y_i - y_c}{x_i - x_c}\right), \quad d_{p_i} = \sqrt{(x_i - x_c)^2 + (y_i - y_c)^2}    (1)

Once the (θ_{p_i}, d_{p_i}) pairs are evaluated for each point of the sub-stroke, we generate a histogram comprising B bins. The value of B decides the extent to which the angular values are quantized. To start with, the votes corresponding to the bins are initialised to zero. For each p_i, we increment the vote of the bin into which θ_{p_i} falls with a magnitude d_{p_i}. In this manner, the accumulation of the votes across the bins is done for all the N_S points of S under consideration. The resulting operation gives us a B dimensional vector, which is then normalized by its Euclidean norm to give h_p.

Apart from the above, we generate two additional sets of histograms that take into account the information of the vicinity around the sample points of a sub-stroke. Accordingly, for each point p_i in S, we employ a set of 2r + 1 points {p_{i−r}, ..., p_i, ..., p_{i+r}} in its neighbourhood N_{p_i} and compute the following:

\Delta_i = \frac{1}{2r+1} \sum_{j=-r}^{r} j \, p_{i+j} \quad \forall i \in [r+1, N_S - r]
a_i = \frac{1}{2r+1} \sum_{j=-r}^{r} j \, \Delta_{i+j} \quad \forall i \in [2r+1, N_S - 2r]    (2)

The value of r determines the number of sample points in N_{p_i} used for obtaining the above values at p_i. In our work, the value of r was set to two. This choice is reasonable, considering that we do not lose more than four points at the start and end of each stroke, when compared to a higher value of r.

Both Δ_i and a_i, corresponding to point p_i, encode the change in the spatial coordinates and the rate of that change, respectively. It may be mentioned that the form of Equation 2 bears semblance to the 'delta' and 'acceleration' coefficients from the area of speech processing [37]. With regard to our work, these consist of two components, represented by Δ_i = (Δ_{ix}, Δ_{iy}) and a_i = (a_{ix}, a_{iy}), corresponding to the spatial x and y coordinates respectively. Accordingly, we refer to them as the delta and acceleration vectors. By noting that the vectors encode the relative spatial information, we utilize them to compute their orientation and magnitude as follows:

\theta_{\Delta_i} = \tan^{-1}\left(\frac{\Delta_{iy}}{\Delta_{ix}}\right), \quad d_{\Delta_i} = \sqrt{\Delta_{ix}^2 + \Delta_{iy}^2}    (3)
\theta_{a_i} = \tan^{-1}\left(\frac{a_{iy}}{a_{ix}}\right), \quad d_{a_i} = \sqrt{a_{ix}^2 + a_{iy}^2}    (4)
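The per-point quantities of Equations (1)–(4) and the magnitude-weighted voting described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the uniform binning of [−π, π) and the default B = 8 are our own assumptions, and the sketch assumes a sub-stroke with more than 4r points.

```python
import numpy as np

def orientation_histogram(angles, magnitudes, B):
    """Accumulate magnitude-weighted votes into B angular bins over
    [-pi, pi), then L2-normalise (the construction used for h_p)."""
    h = np.zeros(B)
    idx = np.floor((np.asarray(angles) + np.pi) / (2 * np.pi) * B).astype(int)
    idx = np.clip(idx, 0, B - 1)          # angle == pi falls into the last bin
    for b, m in zip(idx, np.asarray(magnitudes)):
        h[b] += m
    n = np.linalg.norm(h)
    return h / n if n > 0 else h

def substroke_features(points, B=8, r=2):
    """Sketch of Equations (1)-(4): centroid-anchored angles/distances,
    plus delta and acceleration vectors over a (2r+1)-point window.
    Assumes len(points) > 4r."""
    p = np.asarray(points, dtype=float)                    # shape (N_S, 2)
    c = p.mean(axis=0)                                     # centroid p_c
    theta_p = np.arctan2(p[:, 1] - c[1], p[:, 0] - c[0])   # Eq. (1)
    d_p = np.linalg.norm(p - c, axis=1)
    h_p = orientation_histogram(theta_p, d_p, B)

    j = np.arange(-r, r + 1)
    # delta vectors, Eq. (2): (1/(2r+1)) * sum_j j * p_{i+j}
    delta = np.array([(j[:, None] * p[i - r:i + r + 1]).sum(axis=0) / (2 * r + 1)
                      for i in range(r, len(p) - r)])
    # acceleration vectors: the same filter applied to the deltas
    acc = np.array([(j[:, None] * delta[i - r:i + r + 1]).sum(axis=0) / (2 * r + 1)
                    for i in range(r, len(delta) - r)])

    h_d = orientation_histogram(np.arctan2(delta[:, 1], delta[:, 0]),
                                np.linalg.norm(delta, axis=1), B)   # Eq. (3)
    h_a = orientation_histogram(np.arctan2(acc[:, 1], acc[:, 0]),
                                np.linalg.norm(acc, axis=1), B)     # Eq. (4)
    return np.concatenate([h_p, h_d, h_a])                 # dimension D = 3B
```

With B = 8 the resulting feature vector has dimension 24, and each of its three B-dimensional parts is unit-norm whenever the corresponding magnitudes are non-zero.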
Fig. 3. A toy-example for the computation of H_B - (a) Illustration of the first level of k1 = 20 clusters obtained from the feature vectors of the three writers, with their prototypes and Voronoi cells highlighted. (b) Depiction of the three sub-clusters for a particular Voronoi cell, shaded in yellow color in sub-figure (a). In addition, we also present the computation of the probability values {p_{ik}^j}_{j=1}^{3} for the sub-cluster marked with an arrow.

Analogous to the generation of h_p, a voting procedure for the B bins of the histogram is applied on the (θ_{Δ_i}, d_{Δ_i}) and (θ_{a_i}, d_{a_i}) pairs to obtain h_Δ and h_a respectively. Thereafter, the vectors h_p, h_Δ and h_a are concatenated to provide the feature vector description f_S for S:

f_S = [h_p, h_Δ, h_a]    (5)

It can be noticed that the dimension D of f_S is 3 × B. For a document consisting of n_T sub-strokes, we denote the set of feature vectors as F, defined as:

F = {f_1, f_2, ..., f_{n_T}} = {f_j}_{j=1}^{n_T}    (6)

where f_j is the 3 × B dimensional feature vector for the j-th sub-stroke in the document.

A. Entropy Based Selection of Bin Size B

From the preceding discussion, it is evident that the dimension of the feature representation derived from the segmented sub-strokes relies on the number of bins B used for generating the set of histograms. In addition, from the work-flow of Figure 1, it can be seen that these are utilized subsequently to derive the writer descriptors. Hence, it becomes imperative to select an appropriate value of B, so that an increased discrimination of descriptors between writers can be achieved. As a step towards making this choice, we propose a scheme based on analyzing the values of entropy obtained subsequent to a two level clustering scheme, as detailed below.

In the first level, the feature vectors corresponding to the sub-strokes are pooled among the set of paragraphs of the W writers and partitioned into k1 clusters. The resulting prototypes relate on an average to the shape characteristics of the frequently occurring sub-strokes. In the second level, we partition the feature vectors assigned to each of the k1 clusters separately into W clusters, corresponding to the number of writers enrolled for training. For the sake of clarity, we refer to each of the W clusters at the second level as a 'sub-cluster'. Thus, in total, we obtain k1 × W sub-clusters at the second level.

The fraction of feature vectors corresponding to each of the W writers in a given sub-cluster can now be used to compute the entropy measure as:

H_{ik} = - \sum_{j=1}^{W} p_{ik}^{j} \log_2 (p_{ik}^{j}), \quad 1 \le i \le k_1, \ 1 \le k \le W    (7)

From the preceding definition, the entropy values corresponding to the W sub-clusters generated from the i-th cluster can be written as {H_{ik}}_{k=1}^{W}. Clearly, the calculation of a specific H_{ik} in Equation 7 involves obtaining the probability values for the W writers, namely {p_{ik}^{j}}_{j=1}^{W}.

To motivate the use of the entropy measure for the selection of the bin size B, consider the scenario wherein the feature vectors assigned to the k-th sub-cluster encapsulate the complete information from only the n-th writer, 1 ≤ n ≤ W. Assuming that the sub-clusters have been generated from the i-th cluster, we can infer that the value of p_{ik}^{j} will be one for j = n and zero otherwise. Accordingly, from Equation 7, we see that the entropy value H_{ik} will achieve a value of zero.

As an extension of the above, when each of the k1 × W sub-clusters generated in the second level comprises feature vectors corresponding to only one writer, the average entropy measure defined by

H = \frac{1}{k_1 \times W} \sum_{i=1}^{k_1} \sum_{k=1}^{W} H_{ik}    (8)

will again be zero. We can therefore infer that a lower entropy indicates a better separability of the feature vectors across the writers. Lastly, by recalling that the feature sets of the segmented sub-strokes are themselves parameterized with a bin size B, we explicitly refer to the average entropy value obtained in Equation 8 as H_B. The choice of B (denoted as B*) is then made as follows:

B^{*} = \arg\min_{B} H_B    (9)

For a better understanding of the preceding discussion, we provide a toy example illustration in Figure 3 with three writers enrolled to the identification system. The feature vectors of the writers are depicted with different symbols.
In sub-figure (a), we depict the k1 clusters obtained at the first level with their corresponding Voronoi cells. In this example, the value of k1 used is 20. Thereafter, the feature vectors in a given Voronoi cell are used to generate W = 3 sub-clusters at the second level. The value of W corresponds to the number of writers enrolled in the system. The sub-clusters of one such Voronoi cell, highlighted with a yellow color in Figure 3(a), are shown in sub-figure (b). The entropies corresponding to the distribution of the writers in each of the three sub-clusters are computed separately and then accumulated. This process of accumulation is carried out across all the sub-clusters in the feature space, thus leading to the value of the average entropy H_B (refer to Equation 8). For the sake of completeness, in sub-figure 3(b), we also present the probability values corresponding to the writers of a sub-cluster (indicated with an arrow).

V. SPARSE CODING: AN OVERVIEW

The sparse framework aims at efficiently determining an over-complete set of basis vectors, in a way such that each feature vector can be approximated closely as a linear combination of them. The term 'sparse' comes from the fact that most of the coefficients obtained while representing a feature vector are close to zero. In our work, we consider that a document with n_T feature vectors {f_j}_{j=1}^{n_T} and an over-complete dictionary D of K atoms [φ_1, φ_2, ..., φ_K] are available to us. Any feature vector f_j of a segmented sub-stroke can then be closely represented / approximated as a linear combination of the atoms φ_i, 1 ≤ i ≤ K, with a set of coefficients α_j = [α_{j1} α_{j2} ... α_{jK}]^T as follows:

f_j \approx \sum_{i=1}^{K} \alpha_{ji} \phi_i    (10)

Since K is higher than D (the dimension of f_j), there are infinitely many solutions for Equation 10. In order to obtain a unique set of coefficients, we impose a sparsity constraint that ensures that only a few of the K coefficients in α_j are non-zero. For a pre-learnt dictionary D of atoms, the values of these coefficients can be obtained by solving the following optimization formulation:

\min_{\alpha_j} \left\| f_j - \sum_{i=1}^{K} \alpha_{ji} \phi_i \right\|_2^2 + \lambda \left\| \alpha_j \right\|_1    (11)

The factor λ is a Lagrange multiplier, which controls the influence that the two terms have on each other. With regard to the estimation / learning of the dictionary D in the training phase, we use the algorithm of [38], implemented in the SPArse Modeling Software (SPAMS) optimization toolbox.

VI. SPARSE CODING BASED WRITER DESCRIPTION

Prior to deriving the descriptor, the feature vectors {f_j}_{j=1}^{n_T} extracted from the n_T sub-strokes, together with their respective sparse coefficient vectors {α_j}_{j=1}^{n_T}, are assumed to be known. Let the subset of feature vectors in {f_j}_{j=1}^{n_T} with a non-zero sparse coefficient α_{ji} for the i-th dictionary atom be represented as {f_{pi}}_{p=1}^{n_i}. Without loss of generality, we have n_i ≤ n_T. We define two scores S^{+}_{pi}(d) and S^{-}_{pi}(d) for the d-th attribute of the p-th feature vector f_{pi} having a non-zero sparse coding coefficient α_{pi} for the i-th dictionary atom φ_i:

S^{+}_{pi}(d) = \begin{cases} \dfrac{1}{1 + |f_{pi}(d) - \alpha_{pi} \times \phi_i(d)|}, & f_{pi}(d) \ge \alpha_{pi} \times \phi_i(d) \\ 0, & \text{otherwise} \end{cases}
S^{-}_{pi}(d) = \begin{cases} \dfrac{1}{1 + |f_{pi}(d) - \alpha_{pi} \times \phi_i(d)|}, & f_{pi}(d) < \alpha_{pi} \times \phi_i(d) \\ 0, & \text{otherwise} \end{cases}
1 \le i \le K, \quad 1 \le p \le n_i    (12)

Here, φ_i(d) is the value of the d-th attribute (1 ≤ d ≤ D) in the atom φ_i. Equation 12, in a sense, attempts to capture the error obtained along the d-th attribute while considering a single dictionary atom for reconstruction. If we define this error as e_{pi}(d) = f_{pi}(d) − α_{pi} × φ_i(d), then, for an attribute f_{pi}(d) for which e_{pi}(d) ≥ 0, we assign it the score S^{+}_{pi}(d). Likewise, when e_{pi}(d) < 0, the attribute is given the score S^{-}_{pi}(d). In particular, we observe that the more proximal f_{pi}(d) is to α_{pi} × φ_i(d), the higher is the value of the score S^{-}_{pi}(d) or S^{+}_{pi}(d) (as applicable).

The scores obtained for the d-th feature attribute of {f_{pi}}_{p=1}^{n_i} are subsequently accumulated and normalized with respect to the values obtained from the entire document to obtain \tilde{S}_i^{+}(d) and \tilde{S}_i^{-}(d):

\tilde{S}_i^{+}(d) = \frac{\sum_{p=1}^{n_i} S^{+}_{pi}(d)}{\sum_{j=1}^{K} \sum_{p=1}^{n_j} S^{+}_{pj}(d)}, \quad \tilde{S}_i^{-}(d) = \frac{\sum_{p=1}^{n_i} S^{-}_{pi}(d)}{\sum_{j=1}^{K} \sum_{p=1}^{n_j} S^{-}_{pj}(d)}, \quad 1 \le i \le K, \ 1 \le d \le D    (13)

The scores for each of the D feature attributes are stacked to form the descriptors S^{+}_i and S^{-}_i as shown:

S^{+}_i = \left[ \tilde{S}_i^{+}(1) \cdots \tilde{S}_i^{+}(d) \cdots \tilde{S}_i^{+}(D) \right]^T
S^{-}_i = \left[ \tilde{S}_i^{-}(1) \cdots \tilde{S}_i^{-}(d) \cdots \tilde{S}_i^{-}(D) \right]^T    (14)

To ensure the compactness of the descriptor for the i-th dictionary atom, we consider the L_2 norm representation of the S^{+}_i and S^{-}_i vectors:

S_i = \left[ \| S^{+}_i \|_2, \ \| S^{-}_i \|_2 \right]^T    (15)

The final descriptor is obtained by concatenating the descriptors across all the K dictionary atoms:6

S = \left[ S_1 \ S_2 \cdots S_K \right]^T    (16)

6 We also explored deriving descriptors by scoring the attributes without consideration of the sign of the reconstruction error. However, the present strategy of using the separate scores S^{+}_i and S^{-}_i resulted in improved writer identification rates. The details are presented in sub-section VIII-F.
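The sparse codes α_j required by the descriptor (Equation 11) can be obtained with any ℓ1 solver; the paper uses the SPAMS toolbox [38]. As a hedged stand-in, the snippet below solves the same lasso-style objective with plain ISTA (iterative soft-thresholding); the function name, parameter values and the squared-norm form of the data term are our own assumptions.

```python
import numpy as np

def sparse_code(f, Phi, lam=0.1, iters=1000):
    """ISTA sketch for Eq. (11): min_a ||f - Phi a||_2^2 + lam * ||a||_1.
    Phi is the D x K dictionary with atoms as columns (a stand-in for
    the SPAMS solver used in the paper)."""
    Phi = np.asarray(Phi, dtype=float)
    f = np.asarray(f, dtype=float)
    L = np.linalg.norm(Phi, 2) ** 2        # squared spectral norm of Phi
    a = np.zeros(Phi.shape[1])
    for _ in range(iters):
        g = Phi.T @ (Phi @ a - f)          # (half the) gradient of the quadratic term
        z = a - g / L                      # gradient step with step size 1/L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / (2 * L), 0.0)  # soft threshold
    return a
```

When f lies exactly on one atom, the recovered code concentrates on that atom, with a small shrinkage controlled by lam.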
Algorithm 1: Proposed Writer Descriptor Generation

• Input:
  – A set of histogram based feature vectors {f_j}_{j=1}^{n_T} extracted from a document with n_T sub-strokes.
  – Dictionary with K atoms {φ_i}_{i=1}^{K}.
• Output:
  – Writer descriptor S.
• Algorithm:
  Compute:
    ◦ α_j : the sparse coding coefficient vector for each feature vector f_j.
    ◦ n_i : the number of feature vectors having a non-zero sparse coding coefficient for the i-th atom φ_i.

  % Descriptor Computation:
  for every dictionary atom in {φ_i}_{i=1}^{K} do
    for every feature vector having a non-zero sparse-coding coefficient for φ_i do
      Compute the S⁺_{pi}(d) and S⁻_{pi}(d) scores across the D attributes
    end for
  end for

  % Normalization Step:
  for every dictionary atom in {φ_i}_{i=1}^{K} do
    Calculate the S̃⁺_i(d) and S̃⁻_i(d) scores across the D attributes.
  end for

  Stack the S̃⁺_i(d) and S̃⁻_i(d) scores across the D attributes to obtain the S⁺_i and S⁻_i vectors respectively.
  Evaluate the L2 norm of the S⁺_i and S⁻_i vectors to get S_i.
  Concatenate {S_i}_{i=1}^{K} to represent the writer descriptor S.

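The steps of Algorithm 1 can be sketched in code. This is an illustrative simplification, not the authors' implementation: the per-attribute scores are assumed to take the 1/(1 + |error|) form of Equation 21, split here by the sign of the reconstruction error.

```python
import numpy as np

def writer_descriptor(F, A, Phi):
    """Sketch of Algorithm 1.
    F:   (n_T, D) sub-stroke feature vectors
    A:   (n_T, K) sparse coefficient vectors
    Phi: (K, D)   dictionary atoms
    The per-attribute scores assume the 1/(1 + |error|) form of
    Equation 21, split by the sign of the reconstruction error."""
    K, D = Phi.shape
    S_pos = np.zeros((K, D))
    S_neg = np.zeros((K, D))
    for i in range(K):
        for p in np.nonzero(A[:, i])[0]:          # vectors that use atom i
            err = F[p] - A[p, i] * Phi[i]         # per-attribute reconstruction error
            score = 1.0 / (1.0 + np.abs(err))
            S_pos[i] += np.where(err >= 0, score, 0.0)
            S_neg[i] += np.where(err < 0, score, 0.0)
    # Normalisation across atoms (Equation 13); guard attributes with zero mass
    S_pos /= np.maximum(S_pos.sum(axis=0, keepdims=True), 1e-12)
    S_neg /= np.maximum(S_neg.sum(axis=0, keepdims=True), 1e-12)
    # Compact descriptor: [||S_i^+||_2, ||S_i^-||_2] per atom (Equations 15 and 16)
    return np.stack([np.linalg.norm(S_pos, axis=1),
                     np.linalg.norm(S_neg, axis=1)], axis=1).ravel()
```

The returned vector has dimension 2 × K; an atom used by no sub-stroke simply contributes a zero block.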
We see that the dimension of S is 2 × K. In the event of a dictionary atom φ_i not having any point in the document with a non-zero sparse coding coefficient, its descriptor S_i is set to the zero vector. To summarize our strategy, the pseudo-code of our proposed descriptor is shown in Algorithm 1.

A. Discussion

From Equation 15, it can be observed that we considered stacking the L2 norms of the S⁺_i and S⁻_i vectors for constructing S_i, rather than the entire vectors themselves. The rationale behind this step is to construct a descriptor that is compact in terms of feature dimension. If we were to use the entire S⁺_i and S⁻_i vectors for generating S_i, our final descriptor would have a dimension that is larger by a factor of D. To verify that the L2-norm operation maintains the discrimination between the writers, we conduct an experiment that determines the ratio of the average inter-writer to intra-writer descriptor distance, as follows.

Let {S_{w1,i}}_{i=1}^{M_{w1}} denote the final descriptors for the M_{w1} documents written by writer w1. The average pair-wise intra-writer City-block distance between these descriptors is denoted as d1(w1):

    d_1(w_1) = \frac{1}{\binom{M_{w_1}}{2}} \sum_{i=1}^{M_{w_1}} \sum_{\substack{j=1 \\ j \ne i}}^{M_{w_1}} \| \mathbf{S}_{w_1 i} - \mathbf{S}_{w_1 j} \|_1        (17)

As a first step to computing the inter-writer descriptor distance, we consider the average of the City-block distance obtained between the descriptors of two different writers w1, w2 as follows:

    d_2(w_1, w_2) = \frac{1}{M_{w_1} \times M_{w_2}} \sum_{i=1}^{M_{w_1}} \sum_{j=1}^{M_{w_2}} \| \mathbf{S}_{w_1 i} - \mathbf{S}_{w_2 j} \|_1        (18)

Subsequent to the above calculations between each pair of writers, the average inter-writer City-block distance for writer w1 can be written as:

    d_2(w_1) = \frac{1}{W - 1} \sum_{\substack{m=1 \\ m \ne w_1}}^{W} d_2(w_1, m)        (19)

The ratio R of the average inter-writer to the intra-writer distance is then defined by:

    R = \frac{\sum_{i=1}^{W} d_2(i)}{\sum_{i=1}^{W} d_1(i)}        (20)

The value of R reflects the degree of separation of the writer descriptors in the feature space. In Table I, we present the results on the data of the writers from the databases used in this experimentation, namely the IAM, IBM-UB1 and CASIA (Footnote 7). For this set-up, the number of bins for the histogram based features and the dictionary size were set to 10 and 400 respectively.

Footnote 7: A detailed description of the databases is provided in Section VII-A.
TABLE I: ILLUSTRATION OF THE R RATIO AT PARAGRAPH AND TEXT LINE LEVEL ON THE IAM, IBM-UB1 AND CASIA DATABASES. THE FULL DESCRIPTOR CORRESPONDS TO USING THE ENTIRE VECTORS OF S⁺_i AND S⁻_i ACROSS ALL THE DICTIONARY ATOMS FOR REPRESENTATION. THE COMPACT DESCRIPTOR, ON THE OTHER HAND, CONSIDERS THE L2 NORM

The choice behind these values is based on the discussion in subsection VIII-B.

We considered two possible representations of the descriptors, as outlined below:
• We stack the entire S⁺_i and S⁻_i vectors obtained across all the K dictionary atoms, resulting in a 2 × D × K dimensional descriptor. For the sake of brevity, we refer to this as the 'full descriptor'.
• We stack the L2 norms of the S⁺_i and S⁻_i vectors obtained across all the K dictionary atoms. We refer to this as the 'compact descriptor'.

With the above two representations, we obtain comparable values of the ratio R across all the databases at paragraph and text-line levels. This in turn suggests that either of the descriptors may be used for writer identification. Nonetheless, employing the L2 norm reduces the dimension by a factor of D relative to the full descriptor.

VII. EXPERIMENTAL SET-UP

In this section, we present an outline of the three databases used, together with their enrollment and evaluation protocols.

A. Database Description

The IAM Online Handwriting English database [39]: comprises online handwriting data obtained on a white-board from 217 writers. Each writer contributed up to eight paragraphs of 50 words, with the (x, y) spatial coordinates and time stamps of the pen trace being recorded.

The IBM-UB1 English database [40]: comprises samples of 43 writers acquired with the IBM CrossPad. For each topic chosen by a specific writer, two documents are generated: 'summary' and 'query' text. The summary text contains a one to two page essay on a particular topic, while the query data captures the essence of the summary text in approximately 25 words. The spatial x and y coordinates are recorded for each handwritten document in this database.

The CASIA Chinese database [41]: consists of texts provided by the Institute of Automation, Chinese Academy of Sciences. The online data employed here are contributed by 187 writers, with each writer providing three Chinese pages.

B. Training and Testing Protocol

While evaluating the performance of the proposal, two protocols are considered with regards to the training and testing of the handwritten documents. These are as follows.

• For the experiments presented in subsections VIII-A to VIII-F of this paper, the following protocol is suggested for the IAM database. Four random paragraphs written by each writer are selected for training, while the remaining are reserved for the evaluation of our proposal. The protocol advocated in [19] is used for the IBM-UB1 handwriting data-set, with 80% of the randomly selected summary document paragraphs from each writer used for enrolment and the remaining for testing. For the CASIA Chinese database, two random pages of text per user are selected for enrolment; the performance of our algorithm is evaluated on the remaining one page [33], [35].
  For each database, the atoms of the dictionary are learnt from the enrolled data corresponding to a randomly chosen 25% of the users. The Support Vector Machine (SVM) is used to establish the identity of the writer of the test document. In particular, a one-versus-all SVM with the Radial Basis Function (RBF) kernel is considered. The optimal values of the parameters C and γ are obtained via the grid search technique.
• In the second protocol, for each of the databases we divide the handwritten samples into disjoint training and test sets, so that there is no overlap of writers between them. The training set corresponds to the data contributed by a randomly chosen 25% of the users; the test set comprises the samples of the remaining 75% of writers. Moreover, in place of the traditional one-versus-all SVM, the Exemplar-SVM is used for writer identification [8]. The results of our methodology using this set-up are presented in Sub-section VIII-G.

We consider two types of training strategies for our experimentation, namely paragraph and text-line level. In the former, the proposed descriptor is derived separately for each paragraph, while for the latter we divide each paragraph into individual text lines and construct the descriptor. While evaluating the performance of our strategy, we report the average writer identification rate obtained over ten trials. A trial comprises the following steps: selection of documents for enrolment (using one of the above protocols), dictionary learning, construction of the descriptor, and classification.

VIII. RESULTS AND DISCUSSION

In this section, we present several experiments that reflect the various aspects of the proposal for establishing the authorship of online handwritten documents.

A. Analysis of Average Entropy Values H_B With Bin Size B

We begin by investigating the bin size B that can be utilized for the histogram based feature sets while ensuring discrimination between the writers. As part of the experimental set-up, the number of clusters k1 used at the first level is varied from 25 to 100 in steps of 25. As discussed in Section IV-A, each of these clusters is further split into W sub-clusters at the second level for the computation of the average entropy value H_B. In our case, the values of W used are 217, 43 and 187, corresponding to the number of writers in the IAM, IBM-UB1 and CASIA databases respectively.
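As a hedged illustration of this analysis (the exact two-level clustering pipeline of Section IV-A is not reproduced here), an average entropy of this kind can be computed from cluster assignments and writer labels as follows:

```python
import numpy as np

def average_entropy(cluster_ids, writer_ids):
    """Illustrative sketch: average, over clusters, of the entropy of the
    writer-label distribution inside each cluster.  Lower values indicate
    clusters dominated by few writers, i.e. better discrimination."""
    entropies = []
    for c in np.unique(cluster_ids):
        labels = writer_ids[cluster_ids == c]
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        entropies.append(float(-(p * np.log2(p)).sum()))
    return float(np.mean(entropies))
```

Re-extracting the histogram features for each candidate bin size B and recomputing this quantity would mimic the kind of sweep summarized in Table II.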
TABLE II: VARIATION OF THE AVERAGE ENTROPY VALUES H_B FOR VARYING NUMBER OF BINS B ON THE IAM, IBM-UB1 AND CASIA DATABASES. THE VALUE OF k1 DENOTES THE NUMBER OF CLUSTERS AT THE FIRST LEVEL. FOR MORE DETAILS, PLEASE REFER TO THE TEXT

TABLE III: COMPARISON OF AVERAGE WRITER IDENTIFICATION RATES (IN %) AT THE PARAGRAPH AND TEXT LINE LEVEL FOR THE PROPOSED DESCRIPTOR WITH VARYING NUMBER OF BINS B AND DICTIONARY SIZE K ON THE IAM DATABASE. THE BEST AVERAGE IDENTIFICATION RATES ARE MARKED IN BOLD. (a) PARAGRAPH LEVEL. (b) TEXT LINE LEVEL

The average entropy H_B is recorded for bin sizes B ranging from 2 to 16 in steps of 2 in Table II. We observe that its value initially decreases with increasing values of B. Subsequent to reaching a minimum, it begins to increase again for higher numbers of bins. The lowest value of H_B is obtained with a bin size of 10 across all the databases.

As the number of bins increases, it is likely that for large values a perturbation to one or more of the orientations θpi, θi, θai can lead to a change in the bin to which the votes are assigned. This results in relatively higher variation between the attributes of the histogram feature vectors corresponding to the sub-strokes of the same writer, thereby leading to the increase in the average entropy value. On the other hand, the increased H_B values at small numbers of bins can be explained by the generated feature vectors not being able to adequately capture the nuances that discriminate the feature vectors belonging to different writers.

B. Performance With Varying Bins B and Dictionary Size K

In this subsection, we evaluate the performance of the proposed writer identification system as the number of bins B for the histogram based features and the number of atoms K in the dictionary are varied. We vary the value of B from 2 to 16 in steps of 2 and the number of atoms from 50 to 500 in steps of 50. Tables III (a) and III (b) report the results of our strategy for the IAM database. We observe that the proposed writer descriptor provides the best average identification rates of 99.45% and 90.28% at paragraph and text line level respectively, for a dictionary size K = 400 and bin size B = 10. With regards to the IBM-UB1 database (shown in Tables IV (a) and IV (b)), we achieve best average writer identification rates of 97.21% and 83.49% at paragraph and text line level respectively, for the same value of (B, K). Lastly, for the CASIA database, presented in Tables V (a) and V (b), the best average paragraph and text-line writer identification rates are 95.58% and 80.85% respectively.

It is interesting to note that the bin size B leading to the best average writer identification rate coincides with the one providing the lowest average entropy measure H_B in Table II on all three databases. This validates the need to select an appropriate bin size for the extraction of the histogram feature sets, thereby leading to better discrimination of the sparse descriptors between the writers.

With regards to the trend in identification rates across all the databases, for a fixed dictionary size, as we vary the number of

TABLE IV: COMPARISON OF AVERAGE WRITER IDENTIFICATION RATES (IN %) AT THE PARAGRAPH AND TEXT LINE LEVEL FOR THE PROPOSED DESCRIPTOR WITH VARYING NUMBER OF BINS B AND DICTIONARY SIZE K ON THE IBM-UB1 DATABASE. THE BEST AVERAGE IDENTIFICATION RATE IS MARKED IN BOLD. (a) PARAGRAPH LEVEL. (b) TEXT LINE LEVEL

TABLE V: COMPARISON OF AVERAGE WRITER IDENTIFICATION RATES (IN %) AT THE PARAGRAPH AND TEXT LINE LEVEL FOR THE PROPOSED DESCRIPTOR WITH VARYING NUMBER OF BINS B AND DICTIONARY SIZE K ON THE CASIA DATABASE. THE BEST AVERAGE IDENTIFICATION RATE IS MARKED IN BOLD. (a) PARAGRAPH LEVEL. (b) TEXT LINE LEVEL

bins, the average identification rate increases. After reaching a maximum value, it begins to decrease for larger values of B. Moreover, it may be observed that the performance on the IAM, IBM-UB1 and CASIA databases is inversely related to the values of H_B reported in Table II. In other words, lower identification rates for a dictionary size K correspond to higher values of H_B and vice-versa, thereby suggesting a strong dependence.

Furthermore, it can also be noticed that for a particular value of the bin size B, the average writer identification rate increases with dictionary size and then becomes comparable for values of K ≥ 400. A small dictionary size leads to a low average identification rate due to the limited prototypes available for differentiating the sparse coding descriptors of the different writers.

Considering that the best performance on all the data-sets was obtained with writer descriptors constructed from histogram features parameterized with B = 10, we use this bin size for the experiments described in subsections VIII-C, VIII-E, VIII-F and VIII-G.

C. Influence of the Segmentation Strategy

As mentioned in Section III, two strategies were described for the generation of sub-strokes. In this experiment, we consider the performance of our proposal on each of them separately. For the sake of simplicity, we denote the method of splitting the online trace at the local maxima in a stroke as 'SEG-1'. The second strategy, 'SEG-2', corresponds to generating sub-strokes with a fixed number of points (chosen as 30 in this work).

Table VI presents the best average writer identification results across all the databases at paragraph and text-line levels, with the bin and dictionary sizes specified. From the entries of the third row, it may be inferred that combining the sub-strokes from both strategies SEG-1 and SEG-2 leads to an improvement of the writer identification system.

D. Influence of Histogram Feature Sets on Writer Description

In the preceding experiments, we evaluated the efficacy of the writer descriptor constructed from the histogram based feature sets (hp, h and ha) extracted from the sub-strokes (discussed in Section IV).
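To make the construction concrete, a sub-stroke feature vector of this kind might be assembled as sketched below. This is illustrative only: the three orientation attributes are placeholders for the quantities defined in Section IV, and the equal-width binning over [0, 2π) is an assumption for the example.

```python
import numpy as np

def substroke_features(theta_1, theta_2, theta_3, B=10):
    """Illustrative sketch: build a sub-stroke feature vector by
    histogramming three orientation attributes into B bins each and
    concatenating, giving a feature vector of dimension D = 3 * B."""
    hists = []
    for theta in (theta_1, theta_2, theta_3):
        h, _ = np.histogram(np.mod(theta, 2.0 * np.pi),
                            bins=B, range=(0.0, 2.0 * np.pi))
        hists.append(h / max(h.sum(), 1))   # normalise each histogram
    return np.concatenate(hists)
```

With B = 10 this yields a 30-dimensional vector per sub-stroke; sweeping B changes the granularity of the description, which is exactly the trade-off studied in the entropy analysis above.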
TABLE VI: PERFORMANCE COMPARISON OF OUR PROPOSAL FOR THE DIFFERENT SEGMENTATION STRATEGIES. THE NUMBERS MENTIONED ARE THE IDENTIFICATION RATES (IN %)

TABLE VII: PERFORMANCE COMPARISON OF THE PROPOSED WRITER IDENTIFICATION SYSTEM (IN %) WITH DIFFERENT COMBINATIONS OF HISTOGRAM FEATURE SETS

We now validate the influence of the different choices of feature sets for writer description in Table VII. The best writer identification accuracies are reported with the number of dictionary atoms K and bin size B specified. The bin size for each combination of histogram sets corresponds to the one minimizing the average entropy value H_B (defined in sub-section IV-A). From the entries of the last row, we see that augmenting the feature sets h and ha (which capture the vicinity information) improves the performance of the identification system beyond that provided by hp alone.

TABLE VIII: COMPARISON OF AVERAGE WRITER IDENTIFICATION RATES (IN %) AT THE PARAGRAPH LEVEL FOR THE DESCRIPTOR OBTAINED VIA k-MEANS AND THROUGH SPARSE CODING WITH VARYING DICTIONARY SIZES K. THE BEST AVERAGE IDENTIFICATION RATES ARE MARKED IN BOLD FOR EACH DATABASE

E. Impact of the Sparse Framework for Writer Description

As mentioned in Section I, the sparse coding framework offers flexibility in the representation of the segmented sub-strokes over hard clustering algorithms such as k-means. To demonstrate this, we investigate its influence on writer description relative to the k-means algorithm for varying dictionary sizes / numbers of prototypes. The average writer identification rates obtained at the paragraph level for the IAM, IBM-UB1 and CASIA databases are shown in Table VIII. The sparse-based descriptors are derived from the histogram feature set combination (hp, h and ha) extracted from the segmented sub-strokes of the handwritten data. It may be noted that across the different dictionary sizes considered, they provide better performance than those derived via k-means for all three databases.

With regards to the IAM database, the descriptors from the codebook generated with k-means attain a best performance of 96.81% for K = 450. In contrast, the sparse coding descriptors achieve an average identification rate of 99.45% for K = 400, which corresponds to a performance improvement of 2.73%. For the IBM-UB1 database, a best average writer identification rate of 97.21% is achieved with the descriptors derived via sparse coding for a dictionary size K = 400, an improvement of 3.59% over those obtained from k-means. For the CASIA database, the best result of 95.58% is obtained
Fig. 4. Variation of average writer identification rates (in %) at the text-line level for the proposed descriptor obtained via k-means (shown in blue) and
through sparse coding (in red) on the IAM (sub-figure (a)), IBM-UB1 (sub-figure (b)) and CASIA database (sub-figure (c)).

using the sparse framework, again with K = 400. This value outperforms the 88.43% achieved by k-means.

Moving further, an experimental comparison was also performed at the text line level across all the databases. The average writer identification rates (abbreviated as IR, for the sake of simplicity) for varying dictionary sizes are presented in Figure 4. From the best average writer identification results presented in the sub-figures, the descriptors from the sparse coding framework provide improvements of 19.02%, 6.98% and 23.21% for the IAM, IBM-UB1 and CASIA databases respectively.

F. Performance Variation With a Variant of Writer Descriptor

Recall that in the derivation of the writer descriptor in Section VI, we separately score each attribute of the segmented sub-stroke by taking into consideration whether the reconstruction error is positive or negative. For the sake of completeness, we also explored a variant of the descriptor, wherein for a dictionary atom, attributes are scored irrespective of the sign of the reconstruction error. Mathematically speaking, the score of the d-th attribute (1 ≤ d ≤ D) of the p-th feature vector f_pi having a non-zero sparse coding coefficient α_pi for the i-th dictionary atom φ_i is given as:

    S_{pi}(d) = \frac{1}{1 + | f_{pi}(d) - \alpha_{pi} \times \phi_i(d) |}, \quad 1 \le i \le K,\; 1 \le p \le n_i        (21)

Subsequent to obtaining the scores, the descriptor is computed by following steps similar to those described from Equation 13 to Equation 15. However, instead of obtaining two separate vectors S⁺_i and S⁻_i for the i-th dictionary atom φ_i, we get a single vector of dimension D in this variant. The final descriptor then comprises the L2 norms of these vectors, computed across all the K dictionary atoms, thus leading to a writer descriptor of dimension K. The histogram feature set combination (hp, h and ha) extracted from the segmented sub-strokes is used for this experiment.

The evaluation of the above descriptor at the paragraph and text line levels is presented in Table IX. For the sake of brevity, we refer to it as the 'K-dim' descriptor, while the one discussed in Section VI is abbreviated as '2 × K-dim'. The best average writer identification rate along with the dictionary size K is mentioned in the table. From the entries, we can infer that across the data-sets, the '2 × K-dim' descriptor provides improved results at paragraph and text-line levels when compared to its variant 'K-dim'. This, we believe, may be due to the increased discrimination achieved by scoring the attributes of the segmented sub-stroke feature vectors based on both the value and the sign of the reconstruction error. Needless to say, the higher performance comes at the trade-off of increasing the dimension by a factor of nearly two (450 versus 800).

G. Performance Evaluation Using Exemplar-SVM on a Disjoint Train and Test Dataset

In this subsection, we evaluate the efficacy of our proposal using a different training and test protocol. We divide the handwritten samples of a database into disjoint training and test sets, so that there is no overlap of writers between them. The training set corresponds to the data contributed by 25% of the users, randomly chosen in each trial. The test set comprises the samples of the remaining 75% of writers. Moreover, for establishing the authorship of a document, we explore the utility of the Exemplar-SVM framework (Footnote 8) recently suggested by Christlein et al. [8].

Footnote 8: In the traditional SVM formulation, given a new handwriting sample from an unseen writer, all models need to be retrained. This issue is alleviated by the use of the Exemplar-SVM.

The best average writer identification results are presented in Table X for the different sets of histogram features. The corresponding values of bin size and dictionary size (B, K) are also mentioned for each combination. Apart from this, we also evaluated our proposal with the different segmentation strategies using the Exemplar-SVM. The results of this experiment are given in Table XI.

IX. DISCUSSION OF PRIOR WORKS

In this section, we see how the performance of the proposed writer identification system fares against previous explorations that use the IAM, IBM-UB1 and CASIA databases (refer to Table XII). Since our method relies on hand-crafted features, we begin the discussion by enumerating all prior systems from the literature in this paradigm. Unless specifically mentioned as a table-note, the entries of the table are the writer identification rates (in %) as reported in the respective references. It may
TABLE IX: PERFORMANCE COMPARISON WITH A VARIANT OF THE WRITER DESCRIPTOR ON THE THREE DATABASES. THE NUMBERS MENTIONED ARE THE IDENTIFICATION RATES (IN %)

TABLE X: PERFORMANCE COMPARISON OF THE PROPOSED WRITER IDENTIFICATION SYSTEM (IN %) WITH DIFFERENT COMBINATIONS OF HISTOGRAM FEATURE SETS. THE TRAINING AND TESTING PROTOCOL OF SUBSECTION VIII-G IS USED

TABLE XI: PERFORMANCE COMPARISON OF OUR PROPOSAL FOR THE DIFFERENT SEGMENTATION STRATEGIES USING THE TRAINING AND TESTING PROTOCOL OF SUBSECTION VIII-G. THE NUMBERS MENTIONED ARE THE IDENTIFICATION RATES (IN %)

also be noted that the systems outlined can differ with regards to the features as well as the classifier and enrolment strategies.

To the best of our knowledge, the writer identification systems proposed in [12], [13], [19], [21], and [22] utilize the IAM database for experimentation. Schlapbach et al. [13] use a GMM based framework and present identification rates of 98.56% and 88.96% at paragraph and text line level. The concept of Latent Dirichlet Allocation utilized in [19] and the subtractive clustering based approach of [21] lead to paragraph-level writer identification rates of 93.39% and 96.3% respectively. The incorporation of the tf-idf framework into the sparse representation framework in [12] achieved a paragraph level identification rate of 98.94% and a text line level accuracy of 83.30%. An attempt at writer description from a pre-generated code-book using the k-means algorithm resulted in a performance of 97.81% and 80.61% at paragraph and text-line level [22]. In addition, we wish to state that the results of 98.54% and 82.18% reported for the work [6] correspond to our implementation with the online handwritten data (Footnote 9), using the feature vectors of the segmented sub-strokes described in Section IV. On the whole, the proposed methodology attains higher average writer identification rates of 99.45% and 90.28% at paragraph and text-line level using the traditional SVM. The Exemplar-SVM also presents competitive results of 99.40% and 89.88%.

Footnote 9: Kumar et al. [6] evaluated their proposal on off-line handwritten data.

To the best of our knowledge, only two studies to date have used the IBM-UB1 database [19], [22]. These works provide average writer identification accuracies of 89.47% and 94.37% at the paragraph level. In contrast to these, and to our implementation of the scheme of [6], the proposed methodology of sparse coding based descriptors does remarkably better with both the traditional and the Exemplar-SVM.

With respect to the CASIA Chinese database, we see that Li et al. [27] mention a Top-1 writer identification rate of 80%
TABLE XII: SURVEY OF ONLINE WRITER IDENTIFICATION SYSTEMS ON THE IAM, IBM-UB1 AND CASIA DATABASES. UNLESS SPECIFICALLY MENTIONED AS A TABLE-NOTE, THE NUMBERS ARE THE WRITER IDENTIFICATION RATES (IN %) AS REPORTED IN THE RESPECTIVE REFERENCES. MOREOVER, ENTRIES IN THE TABLE MARKED WITH THE (-) SYMBOL INDICATE THAT THE RESPECTIVE AUTHORS HAVE NOT REPORTED RESULTS FOR THEM

at paragraph-level for the GMM-UBM framework [13]. Moreover, the hierarchical shape primitive approach by the same authors reports an accuracy of 91.20%. Our proposal outperforms not only these methods but also the implementation of the sparse representation framework [6] with our histogram-based features.

It may thus be inferred that amongst the several state-of-the-art methods presented in Table XII with hand-crafted features, our algorithm achieves a better performance across all the databases.

Lastly, there have been recent developments in online writer identification with deep learning techniques. These methods learn the feature representation from the handwritten data, thus alleviating the need for hand-crafted features. In particular, a Convolutional Neural Network based architecture on the CASIA Chinese database provides a writer identification rate of 95.72% [33]. This is almost on par with the best result of 95.58% obtained with our proposal at paragraph-level. The most recent work on the same database (published in April 2017), however, uses Recurrent Neural Networks to achieve a performance of up to 99.5% [35].

Based on the preceding discussion, we see that our method is quite promising for identifying the authorship of online handwritten text documents. The obtained results are better than several state-of-the-art works.

X. CONCLUSION

In this work, we presented an online writer identification system employing a sparse framework. Our contributions are enumerated as follows: (i) Proposal of descriptors derived from a set of dictionary atoms; the descriptors for each dictionary atom encode the error incurred when using it alone for reconstruction. (ii) Utilization of histogram based feature vectors extracted at a sub-stroke level, together with their corresponding sparse coefficients, for deriving the sparse coded descriptors. (iii) Proposal of an entropy based analysis for selecting the appropriate bin size for obtaining the features, so as to ensure discrimination between the writer descriptors. (iv) Evaluation of the strategy on a number of databases, with results demonstrated to be on par with or better than several prior works.

We conclude this paper with a brief discussion of how our proposal can be extended to the offline writer identification setting. The road-map would be to first segment out the allographs or letter parts and then describe each patch thereof with features such as SIFT or RootSIFT [8]. The extracted features can be utilized to construct the over-complete dictionary and subsequently the writer descriptor, as outlined in Section VI. At this juncture, it may be worth mentioning that, in place of hand-crafted SIFT based features,
one can also attempt to describe the allographs with convolutional neural network (CNN) activation features, as done in [42] and [43].

REFERENCES

[1] A. K. Jain, A. Ross, and S. Prabhakar, "An introduction to biometric recognition," IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 1, pp. 4–20, Jan. 2004.
[2] A. A. Brink, J. Smit, M. L. Bulacu, and L. R. B. Schomaker, "Writer identification using directional ink-trace width measurements," Pattern Recognit., vol. 45, no. 1, pp. 162–171, 2012.
[3] D. Bertolini, L. S. Oliveira, E. Justino, and R. Sabourin, "Texture-based descriptors for writer identification and verification," Expert Syst. Appl., vol. 40, no. 6, pp. 2069–2080, 2013.
[4] S. He and L. Schomaker, "Co-occurrence features for writer identification," in Proc. 15th Int. Conf. Frontiers Handwriting Recognit. (ICFHR), Shenzhen, China, Oct. 2016, pp. 78–83.
[5] J. Wen, B. Fang, J. Chen, Y. Tang, and H. Chen, "Fragmented edge structure coding for Chinese writer identification," Neurocomputing, vol. 86, pp. 45–51, Jun. 2012.
[6] R. Kumar, B. Chanda, and J. Sharma, "A novel sparse model based forensic writer identification," Pattern Recognit. Lett., vol. 35, pp. 105–112, Jan. 2014.
[7] X. Wu, Y. Tang, and W. Bu, "Offline text-independent writer identification based on scale invariant feature transform," IEEE Trans. Inf. Forensics Security, vol. 9, no. 3, pp. 526–536, Mar. 2014.
[8] V. Christlein, D. Bernecker, F. Hönig, A. Maier, and E. Angelopoulou, "Writer identification using GMM supervectors and exemplar-SVMs," Pattern Recognit., vol. 63, pp. 258–267, Mar. 2016.
[9] J. Wang, C. Lu, M. Wang, P. Li, S. Yan, and X. Hu, "Robust face recognition via adaptive sparse representation," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2368–2378, Dec. 2014.
[10] Y. Gao, J. Ma, and A. L. Yuille, "Semi-supervised sparse representation based classification for face recognition with insufficient labeled samples," IEEE Trans. Image Process., vol. 26, no. 5, pp. 2545–2560, May 2017.
[11] Y. Liu, Z. Yang, and L. Yang, "Online signature verification based on DCT and sparse representation," IEEE Trans. Cybern., vol. 45, no. 11, pp. 2498–2511, Dec. 2014.
[12] I. Dwivedi, S. Gupta, V. Venugopal, and S. Sundaram, "Online writer
[23] H. Jégou, F. Perronnin, M. Douze, J. Sánchez, P. Pérez, and C. Schmid, "Aggregating local image descriptors into compact codes," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 9, pp. 1704–1716, Sep. 2012.
[24] A. Namboodiri and S. Gupta, "Text independent writer identification from online handwriting," in Proc. 10th Int. Workshop Frontiers Handwriting Recognit., Oct. 2006, pp. 23–26.
[25] Z. Sun, B. Li, and T. Tan, "Online text-independent writer identification based on stroke's probability distribution function," in Proc. Int. Conf. Biometrics, Aug. 2007, pp. 201–210.
[26] B. Li and T. Tan, "Online text-independent writer identification based on temporal sequence and shape codes," in Proc. IEEE 10th Int. Conf. Document Anal. Recognit. (ICDAR), Jul. 2009, pp. 931–935.
[27] B. Li, Z. Sun, and T. Tan, "Hierarchical shape primitive features for online text-independent writer identification," in Proc. IEEE 10th Int. Conf. Document Anal. Recognit. (ICDAR), Jul. 2009, pp. 986–990.
[28] A. Chaabouni, H. Boubaker, M. Kherallah, A. M. Alimi, and H. El Abed, "Multi-fractal modeling for on-line text-independent writer identification," in Proc. Int. Conf. Document Anal. Recognit., Sep. 2011, pp. 623–627.
[29] M. Gargouri, S. Kanoun, and J.-M. Ogier, "Text-independent writer identification on online Arabic handwriting," in Proc. IEEE 10th Int. Conf. Document Anal. Recognit. (ICDAR), Aug. 2013, pp. 428–432.
[30] T. Dhieb, W. Ouarda, H. Boubaker, M. B. Halima, and A. M. Alimi, "Online Arabic writer identification based on beta-elliptic model," in Proc. Int. Conf. Intell. Syst. Design Appl. (ISDA), Dec. 2015, pp. 74–79.
[31] T. Dhieb, W. Ouarda, H. Boubaker, and A. M. Alimi, "Beta-elliptic model for online writer identification from online Arabic handwriting," J. Inf. Assurance Security, vol. 11, no. 5, pp. 263–272, 2016.
[32] W. Ouarda, T. Dhieb, H. Boubaker, and A. M. Alimi, "Deep neural network for online writer identification using beta-elliptic model," in Proc. Int. Joint Conf. Neural Netw., Jul. 2016, pp. 1863–1870.
[33] W. Yang, L. Jin, and M. Liu, "DeepWriterID: An end-to-end online text-independent writer identification system," IEEE Intell. Syst., vol. 31, no. 2, pp. 45–53, Mar. 2016.
[34] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[35] X.-Y. Zhang, G.-S. Xie, C.-L. Liu, and Y. Bengio, "End-to-end
identification using sparse coding and histogram based descriptors,” online writer identification with recurrent neural network,” IEEE Trans.
in Proc. 15th Int. Conf. Frontiers Handwriting Recognit., Oct. 2016, Human-Mach. Syst., vol. 47, no. 2, pp. 285–292, Apr. 2016.
pp. 572–577. [36] N. Dalal and B. Triggs, “Histograms of oriented gradients for human
[13] A. Schlapbach, M. Liwicki, and H. Bunke, “A writer identification detection,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern
system for on-line whiteboard data,” Pattern Recognit., vol. 41, no. 7, Recognit., vol. 1. Jun. 2005, pp. 886–893.
pp. 2381–2397, 2008. [37] S. Furui, “Speaker-independent isolated word recognition based on
[14] M. Liwicki, A. Schlapbach, and H. Bunke, “Fusing asynchronous emphasized spectral dynamics,” in Proc. IEEE Int. Conf. Acoust.,
feature streams for on-line writer identification,” in Proc. 9th Int. Conf. Speech, Signal Process. (ICASSP), vol. 11. Apr. 1986, pp. 1991–1994.
Document Anal. Recognit. (ICDAR), Sep. 2007, pp. 103–107. [38] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online learning for
[15] G. X. Tan, C. Viard-Gaudin, and A. C. Kot, “A stochastic nearest matrix factorization and sparse coding,” J. Mach. Learn. Res., vol. 11,
neighbor character prototype approach for online writer identification,” pp. 19–60, Mar. 2010.
in Proc. 19th Int. Conf. Pattern Recognit., Dec. 2008, pp. 1–4. [39] M. Liwicki and H. Bunke, “IAM-OnDB—An on-line English sen-
[16] G. Tan, C. Viard-Gaudin, and A. Kot, “Online writer identification using tence database acquired from handwritten text on a whiteboard,”
fuzzy c-means clustering of character prototypes,” in Proc. ICFHR, in Proc. 8th Int. Conf. Document Anal. Recognit., Sep. 2005,
Aug. 2008, pp. 475–480. pp. 956–961.
[17] C. Viard-Gaudin, G. X. Tan, and A. C. Kot, “Individuality of alphabet [40] A. Shivram, C. Ramaiah, S. Setlur, and V. Govindaraju, “IBM_UB_1:
knowledge in online writer identification,” Int. J. Document Anal. A dual mode unconstrained English handwriting dataset,” in Proc. 10th
Recognit., vol. 13, no. 2, pp. 147–157, 2010. Int. Conf. Document Anal. Recognit. (ICDAR), Aug. 2013, pp. 13–17.
[18] A. Shivram, C. Ramaiah, U. Porwal, and V. Govindaraju, “Modeling [41] The Institute of Automation, Chinese Academy of Sciences (CASIA).
writing styles for online writer identification: A hierarchical Bayesian (2007). (CASIA) Handwriting Database—Biometrics Ideal Test.
approach,” in Proc. Int. Conf. Frontiers Handwriting Recognit. (ICFHR), [Online]. Available: http://biometrics.idealtest.org/
Sep. 2012, pp. 387–392. [42] V. Christlein, D. Bernecker, A. K. Maier, and E. Angelopoulou,
[19] A. Shivram, C. Ramaiah, and V. Govindaraju, “A hierarchical Bayesian “Offline writer identification using convolutional neural network acti-
approach to online writer identification,” IET Biometrics, vol. 2, no. 4, vation features,” in Proc. German Conf. Pattern Recognit., Nov. 2015,
pp. 191–198, 2013. pp. 540–552.
[20] C. Ramaiah, A. Shivram, and V. Govindaraju, “Data sufficiency for [43] Y. Tang and X. Wu, “Text-independent writer identification via CNN
online writer identification: A comparative study of writer-style space features and joint Bayesian,” in Proc. 15th Int. Conf. Frontiers Hand-
vs feature space models,” in Proc. Int. Conf. Pattern Recognit. (ICPR), writing Recognit., (ICFHR), Shenzhen, China, Oct. 2016, pp. 566–571.
Aug. 2014, pp. 3121–3125.
[21] G. Singh and S. Sundaram, “A subtractive clustering scheme for text-
independent online writer identification,” in Proc. 13th Int. Conf. Doc- Vivek Venugopal, photograph and biography not available at the time of
ument Anal. Recognit. (ICDAR), Aug. 2015, pp. 311–315. publication.
[22] V. Venugopal and S. Sundaram, “An online writer identification system
using regression-based feature normalization and codebook descriptors,” Suresh Sundaram, photograph and biography not available at the time of
Expert Syst. Appl., vol. 72, pp. 196–206, Apr. 2017. publication.