A Supervised Algorithm With A New Differentiated-Weighting Scheme For Identifying The Author of A Handwritten Text

Pattern Recognition Letters 32 (2011) 1139–1144
Edith C. Herrera-Luna, Edgardo M. Felipe-Riveron ⇑, Salvador Godoy-Calderon
Artificial Intelligence Laboratory, Center for Computing Research, National Polytechnic Institute, Juan de Dios Batiz and Miguel Othon de Mendizabal, P.O. 07738,
Gustavo A Madero, Mexico
Article info

Article history:
Received 15 December 2009
Communicated by L. Heutte

Keywords:
Author identification
Handwritten text
Differentiated-weighting
Supervised pattern recognition
ALVOT algorithm

Abstract

In this paper a new approach is presented for tackling the problem of identifying the author of a handwritten text. This problem is solved with a simple, yet powerful, modification of the so-called ALVOT family of supervised classification algorithms with a novel differentiated-weighting scheme. Compared to other previously published approaches, the proposed method significantly reduces the number and complexity of the text-features to be extracted from the text. Also, the specific combination of line-level and word-level features used introduces an eclectic paradigm between texture-related and structure-related approaches.

© 2011 Elsevier B.V. All rights reserved.
⇑ Corresponding author. Fax: +52 55 57296000/56607.
E-mail addresses: edith.hluna@gmail.com (E.C. Herrera-Luna), edgardo@cic.ipn.mx (E.M. Felipe-Riveron), sgodoyc@cic.ipn.mx (S. Godoy-Calderon).
0167-8655/$ - see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.patrec.2011.03.002

1. Introduction

Within the scope of computer-aided analysis of handwritten text, two main areas are generally identified: Optical Character Recognition (OCR) and Writing Identification and Verification (WIV). OCR has been widely studied and consists of recognizing characters within a digital image in such a way that the writing may be interpreted word by word. The WIV area recognizes the author of a handwritten text, verifies and relates the authors to different documents, finds possible changes, etc.

This paper focuses on the identification of authors of handwritten texts, required in forensic control situations as well as by criminalistic and criminology security institutions.

Studies regarding handwriting identification and verification through computer methods date back to the 1950s and continue until the present time, with the most reliable recent results reported in (Bensefia et al., 2005; Pecharromán-Balbás, 2007; Srihari, 1993, 2001; Pervouchine and Leedham, 2007; Said et al., 2000).

Nonetheless, the wide range of writing implements and supports used in those studies has given rise to several procedures to characterize handwriting, which differ in their way of capturing data, their level of dependence on the semantics of the written text and the methodology used to obtain the main features that correspond to each text.

In general, two types of characteristics (referred to as features) can be identified in handwriting (Srihari et al., 2002): the so-called conventional features, extracted by a graphology/graphoscopy expert, and the computer-related features, automatically obtained from digital images of the text in order to eliminate subjectivity at the time of their extraction. When a specific device is used for this purpose, the text features are extracted at the very moment the writing is carried out and they are called dynamic on-line features. On the other hand, if these features are extracted from a previously scanned digital image, they are called static off-line features. In some WIV approaches it is necessary to know the semantic contents of the words that make up the text (text-dependent methods), and in others, the method does not depend on semantic contents (text-independent methods) (Pecharromán-Balbás, 2007).

Computer-related features extracted from the text may be texture-related or structure-related (Bensefia et al., 2005). When extracting texture features, the document is seen as an image and not as text, but when structural characteristics are extracted, a description similar to that given by a graphology expert is sought because the distinctive characteristic properties of the author's writing are desired.

This paper introduces a new supervised algorithm that is independent of the handwritten text to be identified. The method is based on off-line static features. The working methodology followed in this research poses and exemplifies the possibility of identifying handwritten texts by using a new intermediate approach between texture and structural features. The exact methodology followed is shown in Fig. 1. The first three stages correspond to activities specifically designed to put together a supervision
sample, that is, to undertake the traditional pre-processing, feature extraction and classification tasks of every pattern recognition system. On the other hand, the last stage presents a non-classic treatment of the classification task.

Fig. 1. General methodology to identify the author of a handwritten text.

In the following sections some specific data regarding each of the stages of this methodology are briefly described.

2. Background

The analysis of handwritten text for the purpose of verification and author identification is a tool that allows researchers to study text from many different points of view according to the number of authors, the type and quantity of characteristics extracted from the texts, the best classification algorithms, and so on. The analysis of handwritten text is at the core of graphoscopic analysis techniques. The criteria set for feature extraction and the specific processing of resulting data patterns, proposed in this paper, provide a general framework that allows researchers to study handwritten text from many different perspectives.

Graphoscopic analysis plays an important role in a number of practical problems, including, but not limited to, forensic and criminalistic processing of evidence, validation of legal documents, historiography, psychological profiling, etc.

Recently published papers include Srihari (1993) and Srihari et al. (2002), where the identification efficiency is considerably reduced due to the high number of authors in the supervision sample. The algorithms proposed in (Said et al., 2000) analyze the text image texture with Gabor filters and gray-scale co-occurrence matrices (GSCM) using the weighted Euclidean distance and the K-NN classifier (Cover and Hart, 1967) to identify 40 different authors. Zois and Anastassopoulos (2000) propose the use of horizontal projections and morphological operators together with the texture features of English and Greek words, using a multilayer perceptron and a Bayesian classifier. Bensefia et al. (2005) split words into graphemes, and at the same time combine the local features extracted from regions of text. Finally they employ a commonly used model for information retrieval, known as the vector space model, in the task of identifying the author of the text. Pervouchine and Leedham (2007) present an algorithm that extracts structural features from characters and a grapheme, by means of a genetic algorithm to look for optimal features after extracting 31 features from each selected word and using a neural network classifier. In the doctoral thesis of Pecharromán-Balbás (2007) textural characteristics are extracted to be used within probability functions, and the K-NN classifier with Euclidean and Hamming distances is used to identify and verify the author. Hertel and Bunke (2003) extract a set of structural characteristics from text lines, as well as a set of fractal-based characteristics, resulting in an efficiency above 90% with a K-NN classifier. In (Plamondon and Srihari, 2000) a study can be found about handwritten text recognition, taking into account methodologies for working on-line as well as off-line. Zimmermann and Bunke (2002) describe the IAM database in a general way and also analyze the way in which the included images were segmented. Both papers include references that detail recent research work on handwritten text analysis.

3. Image pre-processing

For the purpose of this research an ad hoc sample database was created, composed, amongst other things, of texts written by 50 adult fellows and friends. The texts were written in English and Spanish, with contents selected arbitrarily from books, with the only condition that they have an interesting meaning. Images of handwritten text were scanned in color at 300 dpi with a conventional scanner. To collect the sample, each writer copies two paragraphs, one in English and the other in Spanish, always using print-type letters, without overwriting letters, on separate lines, with the same black ink pen and on white paper of the same type.

Image processing includes page alignment; the binary image is obtained by thresholding the more detailed and less noisy green plane of the text color image by the Otsu method (1979) and also the Khashman and Sekeroglu (2007) method. Once images have been thresholded, a geodesic reconstruction is obtained by erosion using the Otsu-processed image as a marker (Fig. 2).

Fig. 2. (a) Original Spanish-text image in color, (b) Otsu thresholding over the green plane, (c) Khashman–Sekeroglu thresholding and (d) final noise-free binary image.

4. Feature extraction

Both line-level and word-level features are extracted from text images (see Table 1). At the line level, the space percentage used by the left margin, the right margin, the separation between
subsequent lines, the general direction of the writing and the inter-word space are considered (see Fig. 3).

Table 1
Features representing a line of text.

Level                 Description
Line-level features   Space percentage occupied by left and right margins
                      Current line/(previous line, next line) ratios
                      Average inter-word space
                      General direction of writing
Word-level features   Average (upper-zone/middle-zone) and (lower-zone/middle-zone) ratios
                      Average inclination of words
                      Number of words in the line

Fig. 3. Features extracted on the line-level.

Features extracted at the word level include the proportions of the middle zone of the writing compared to that of the upper and lower zones, word inclination as well as the presence of crest and/or axis in all words (see Fig. 4).

Fig. 4. Features extracted on the word-level.

Each text line is represented by a 22-tuple (pattern) composed of word-level and line-level features. Word-level features are extracted for words with and without upper-zone, as well as for words with and without lower-zone. All features result in positive-real or integer numbers, except for the average inclination of words, which takes values in the interval [−1, 1] of real numbers.

5. Recognition process

Once the text features have been extracted, and an adequate supervision sample has been put together, it only remains to solve a supervised pattern classification problem. The text whose author is to be recognized is pre-processed by the exact procedures described above and several line and paragraph patterns are formed. Using slight conceptual and operative modifications of the classic ALVOT classification algorithm, each of those patterns is independently classified and a final decision is issued about the authorship of the text. The next sections describe the classification process in detail.

5.1. Comparison between patterns

For direct comparison between patterns a function (f) that compares by difference, with a weighted syntactic-distance of the two patterns, is used, that is, the arithmetic average of the pair-wise difference between the corresponding values that make up two data tuples. Let $\{x_1, x_2, \ldots, x_r\}$ be a set of r descriptive features or characteristics that form the patterns (in this case r = 22), so that:

$$ f(o_i, o_j) = \sum_{s=1}^{r} P_r \left[ Cc_s(o_i, o_j) \right] \tag{1} $$

where $o_i$, $o_j$ are the objects (lines of text) whose patterns will be compared; $Cc_s(o_i, o_j)$ is a comparison criterion for the values of feature s; and $P_r$ is a real value in the [0, 1] interval, assigned as weight for feature s.

The comparison criterion for values within the same feature may be the same for all the features, or different expressions can be used for comparing values from each feature. In the results reported below, experiments made with both modalities are included. As an example, in the case in which all features are compared with the same criterion, the following expression is used:

$$ Cc_s(o_i, o_j) = 1 - \frac{\left| x_s(o_i) - x_s(o_j) \right|}{\left| \max\{x_s\} - \min\{x_s\} \right|} \tag{2} $$

where $|x_s(o_i) - x_s(o_j)|$ is the absolute value of the difference between the values of feature $x_s$ adopted by the objects $o_i$ and $o_j$; $\max\{x_s\}$ and $\min\{x_s\}$ are the maximum and minimum values, respectively, observed in feature $x_s$.

5.2. Calculating the weighting factors

One of the most relevant elements for the recognition process is the measurement of the relative importance of each feature to be considered. In order to measure this importance there is a range of algorithms, the most notable being Principal Component Analysis (PCA) as well as several criteria within Testor Theory (Ruiz-Shulcloper and Abidi, 2002; Lazo-Cortés et al., 2001; Ruiz-Shulcloper et al., 1999; Santiesteban-Alganza and Pons-Porrata, 2003). Although the methodology shown in this paper is independent of the algorithm used for calculating feature-relevance, the specific way of doing so during the experiments reported herein is described below:

1. Select a set of m representative patterns from each class. These ideal objects from each class, called holotypes (defined as the patterns whose average similarity to all other patterns in the same class is optimal), are selected to make up a reduced supervision sample to guide the classification process.
2. For each holotype, calculate its average feature-by-feature similarity with respect to the rest of the patterns in the same class.
3. For each class and for each holotype, calculate its average feature-by-feature similarity with respect to all other non-holotype patterns in the other classes.
4. With the result of step 3, calculate a normalized weight for each feature and specific to each class.

Finding and using the specific weighting for each feature constitutes the most original element in the proposed algorithm. When experimenting with large text samples, extracting the same features mentioned in Table 1, and comparing patterns using the similarity function f shown in (1), the great similarity between patterns representing the handwritings of different authors becomes evident. This similarity complicates the recognition process considerably. In order to be able to solve the problem, a differentiated-weighting scheme was used. This scheme allows the association of a different weight to each feature, depending on the class to which a new pattern is to be compared. The differentiated-weighting scheme allows the comparison of writing patterns focusing on the distinctive aspects of each author's handwriting. Given the fact that each author is modeled as a class when all the text lines are written by the same author, differentiated-weighting results in a more efficient classification of new patterns and therefore, also in the correct identification of the author of the text.

A traditional non-differentiated weighting scheme assigns a weight value to each pattern's feature (each characteristic), so that this weighting can be considered while calculating similarity or difference between patterns with Eq. (1). On the contrary, a differentiated-weighting scheme assigns a different feature weight-value depending on the class being tested. This mechanism provides enough flexibility to accurately discriminate patterns belonging to classes where the same sub-set of features is relevant but with a different proportion in each case. When referring to the differentiated-weighting scheme one should fill a k × r matrix (k classes by r pattern features), which stores the specific feature weight-value to be considered when testing membership to each class in the supervision sample.

5.3. Classification algorithm

ALVOT is a well-known family of supervised classification algorithms (Ruiz-Shulcloper et al., 1986; Zhuravliov Yu and Nikiforov, 1971; Zhuravliov Yu, 1978). All algorithms within this family share two main properties: they calculate the similarity of two patterns by means of several partial comparisons and they synthesize all the learned information about the pattern to be classified in one single quantity for each class (a vote). After doing so, each instance of this algorithm family applies a different solution rule to issue the final decision about the class to which the pattern is assigned, the membership degree of the pattern to that class and, of course, how to handle voting ties.

Using a simple, yet powerful modification to the ALVOT general classification technique, the task of identifying the author of a handwritten text was satisfactorily resolved. First the general identification process will be explained and later a brief reflection about the modifications required will be carried out.

The author identification process is based on the classification of the patterns representing each one of the lines of text under study. An ALVOT algorithm based on holotypes is used to classify each pattern. Each pattern is independently classified and then a general solution rule is applied, which, based on the classification of all the lines that make up the text, issues a final decision regarding the authorship of the text.

This final decision may be one of the authors included in the supervision sample or an unknown author not included in the sample, if certain similarity thresholds previously established are not met. The specific procedure is as follows:

1. For each and every pattern in the supervision sample, determine its representativeness in the class to which it belongs. The representativeness T of each pattern is calculated as follows:

$$ T_{C_j}(o_i) = \frac{\sum_{o_s \in C_j,\; o_s \neq o_i} f(o_i, o_s)}{|C_j| - 1} \tag{3} $$

2. Select the m most representative patterns (holotypes) in each class and make up a reduced supervision sample with them.
3. By using Eq. (1) compare each pattern to be classified with all the patterns of the reduced supervision sample formed in step 2.
4. Calculate the average similarities of each pattern to be classified with all the representative patterns of each class, as a vote function, in the reduced supervision sample.
5. (Solution rule) Allocate the pattern to be classified to the class in which the vote function has the highest value (that is, the class in which the representative objects have the highest average similarity with the pattern to be classified).
6. If it is deemed to be convenient, add the recently identified patterns to the supervision sample and recalculate the objects that are representative of each class.

The classification of a text is a collective decision based on the lines contained in it. It proceeds as follows:

1. Create a pattern for each line in the text with a minimum length of n words and classify it using the above procedure.
2. (Solution rule) The whole text is labeled as the class which contains the majority of its line-patterns, allowing a maximum of k classes. If a text is classified in more than k classes the label unknown author is assigned.

After reviewing the above process it seems worthwhile to review the specific ways in which the proposed algorithm constitutes a novel modification to the classic ALVOT family of supervised classification algorithms. As all algorithms in this family, our proposal combines the information of several partial comparisons to determine the global similarity between two patterns, but the specific way in which each text object to be recognized is represented by several text-line patterns enforces a drastically different way of calculating the global similarity. This multiple-pattern representation of each text, along with the collective decision criteria used by the solution rule, establishes a new and not previously explored facet of this type of classification algorithms. The resulting impact on the precision and general efficiency of the identification process can be seen in the next section.

6. Experimental results

In order to undertake the experiments reported herein, a database with three equal handwritten texts and 50 different signatures is used. Each text contains from 5 to 9 lines, giving a total of close to 600 lines. Three supervision and control samples were built with an 80/20 ratio. In the first group the supervision sample contains the most representative patterns in each class. In the second group the texts are randomly divided between the supervision sample and the control sample. The supervision sample from the third group contains the least representative patterns in each class. In each one of these experiments, the objects from the control sample are classified taking the supervision sample objects as a reference. Three class-representative patterns were selected from each class within the supervision sample, according to the previously described procedure.

First, in order to demonstrate the relevance of the weighting scheme, percentages of efficiency per group are shown when classifying the texts without the use of any weighting for the descriptive features (see Table 2). Three rules are proposed to evaluate the efficiency of the text classification.

1. (Rule 1) The author of a text is correctly identified (1 point), if and only if it is classified into the corresponding correct class.
2. (Rule 2) Identification is correct if and only if the text is classified in fewer than q classes and one of them is the correct one.
3. (Rule 3) Same as Rule 2 but assigning just a proportional hit instead of a complete hit.

Table 2
Results of applying the weighting-less similarity functions.

Experiment  Group  Lines classified (%)  Texts classified (%)
                                         Rule 1   Rule 2   Rule 3
1           1      42.28                 50.00    61.11    55.56
2           2      52.76                 66.67    83.33    75.00
3           3      46.46                 55.56    66.67    61.11

Results shown by this type of experiment are notably low: under the strictest rule (Rule 1) the author of the text is correctly identified in only 50.00% to 66.67% of the cases. In order to correct this situation the differentiated-weighting scheme for each class, for each descriptive feature, is added. In the second series of experiments three ideal objects from each class are selected to form a reduced supervision sample containing only the selected holotypes of each class, and the classification is made with respect to it. Results obtained in these experiments are shown in Table 3.

Table 3
Results when the classification uses different weightings and similarity criteria.

Experiment  Group  Lines classified (%)  Texts classified (%)
                                         Rule 1   Rule 2   Rule 3
4           1      60.16                 61.11    72.22    65.62
5           1      61.79                 66.67    72.22    69.44
6           2      67.72                 88.89    88.89    88.89
7           2      64.57                 83.33    94.44    88.89
8           3      65.35                 88.89    88.89    88.89
9           3      62.99                 83.33    83.33    83.33

Two aspects are worth considering: first, the correct classification percentage is increased at the text level, reaching levels higher than 80% in several cases. Second, although the classification at the line level was substantially modified, the impact of the differentiated weighting becomes evident in the results at the text level. A third set of experiments was carried out, taking advantage of the best weighting scheme resulting from these experiments and considering the centroid of each class as the holotype pattern. This pattern is formed by the arithmetic mean of all the values in each feature. Surprisingly enough, this raised the efficiency of the algorithm even more for the classification of lines and, to a lesser degree, for the classification of texts under the most stringent rule (Rule 1) (see Table 4).

Table 4
Results of classification by using the centroids of the classes.

Experiment  Group  Lines classified (%)  Texts classified (%)
                                         Rule 1   Rule 2   Rule 3
10          1      73.17                 83.33    94.44    88.89
11          2      70.08                 88.89    88.89    88.89
12          3      72.44                 88.89    94.44    91.67

Lastly, our proposed methodology and algorithms ought to be compared with those previously obtained by other authors. Unfortunately this comparison cannot be a precise, objective one, because the drastically different types of sample databases, the distinct pre-processing techniques and the diverse properties of the classification algorithms used by other authors make it impossible to compare their results on a fair basis. Nevertheless, we think that the reader can benefit from a non-strict comparison that shows all the context details.

Table 5 shows results obtained from this research compared with those previously obtained by other authors, taken from Bensefia et al. (2005). The comparison is summarized and some details are added on the type of methodology applied.

Table 5
Comparison of the efficiency and testing conditions for identifying the author of a handwritten text.

Publication                      Number of writers  Sample size                           Lexicon dependency  Performance (%)
Said et al. (2000)               40                 Few lines of handwritten text         No                  95.0
Zois and Anastassopoulos (2000)  50                 Forty-five examples of the same word  Yes                 92.48
Marti et al. (2001)              20                 Five examples of the same text        Yes                 90.00
Bensefia et al. (2005)           88                 Paragraphs/3–4 words                  No                  93.0/90.0
Our methodology                  30                 Three examples of the same text       No                  Rule 1: 88.89
                                                                                                              Rule 2: 94.44
                                                                                                              Rule 3: 91.67

7. Conclusions

A supervised identification algorithm based on the selection of holotypes and with a differentiated-weighting scheme is used to
identify the author of a handwritten text by means of the individual classification of the lines that are part of the text. The descriptive features or characteristics used to make up the patterns representing such lines include features extracted at the word level as well as at the line level.

When all the lines that form the text to be identified have been classified, a rule for the final collective decision regarding the author of such text is applied.

The largest efficiency percentages regarding identification are achieved in the experiments in which centroids are used as holotypes for each class.

The class differentiated-weighting scheme makes an important difference in the final results. This is due to the great similarity that the handwritings of several authors sometimes exhibit. Thanks to the weighting scheme we were able to achieve a more precise characterization of the relevance of each of the descriptive features considered for each one of the authors. A relevant aspect of the weighting process which was observed during experimentation is that the weighting of the features should never decrease to zero. In these cases the deterioration of the results is evident.

Compared with previously published works, the best result of the reported set of experiments (94.44% under Rule 2) exceeds most of the results previously reported (see Table 5), with the added advantage that the algorithm proposed herein is not dependent on the semantic content of the text.

Ongoing research suggests that the automatic calculation of the weightings could be simplified and made more precise, and that similarity thresholds could be determined that more accurately identify an author not considered in the supervision sample. The implementation of these improvements may be extremely useful for the identification of authors of handwritten texts, mainly in forensic control situations as well as in security institutions.

Acknowledgements

The authors thank the Academic Secretary, COFAA, Postgraduate and Research Secretary, and Center for Computing Research of the National Polytechnic Institute, CONACyT and SNI, for their economic support to carry out this work.

References

Bensefia, A., Paquet, T., Heutte, L., 2005. A writer identification and verification system. Pattern Recognition Lett. 26, 2080–2092.
Cover, T.M., Hart, P.E., 1967. Nearest neighbor pattern classification. IEEE Trans. Inform. Theory IT-13 (1), 21–27.
Hertel, C., Bunke, H., 2003. A set of novel features for writer identification. In: Proc. 4th Internat. Conf. on Audio and Video-based Biometric Person Authentication, pp. 679–687.
Khashman, A., Sekeroglu, B., 2007. A novel thresholding method for text separation and document enhancement. In: 11th Panhellenic Conference on Informatics 2007, vol. B, Greece, pp. 324–330.
Lazo-Cortés, M., Ruiz-Shulcloper, J., Alba-Cabrera, E., 2001. An overview of the evolution of the concept of testor. Pattern Recognition 34, 753–762.
Marti, U.V., Messerli, R., Bunke, H., 2001. Writer identification using text line based features. In: Proc. ICDAR'01, pp. 101–105.
Otsu, N., 1979. A threshold selection method from gray-level histograms. IEEE Trans. Systems Man Cybernet. 9, 62–66.
Pecharromán-Balbás, S., 2007. Reconocimiento de escritor independiente de texto basado en características de textura. Doctoral thesis, Escuela Politécnica Superior, Universidad Autónoma de Madrid.
Pervouchine, V., Leedham, G., 2007. Extraction and analysis of forensic document examiner features used for writer identification. Pattern Recognition 40 (3), 1004–1013.
Plamondon, R., Srihari, S.N., 2000. On-line and off-line handwriting recognition: A comprehensive survey. IEEE Trans. Pattern Anal. Machine Intell. 22, 63–84.
Ruiz-Shulcloper, J., Abidi, M., 2002. Logical combinatorial pattern recognition: A review. In: Pandalai, S.G. (Ed.), Recent Research Developments in Pattern Recognition. Transworld Research Network, Trivandrum, Kerala, India, pp. 133–176.
Ruiz-Shulcloper, J., Ponce de León-Sentí, E., López-Reyes, N., García-Mesa, N., Barros-Barreras, M.A., Cobos-Monzón, E., 1986. ALVOT, Sistema de programas de algoritmos de votación para la clasificación. Rev. Cien. Mat. VII (1), 41–60.
Ruiz-Shulcloper, J., Guzmán-Arenas, A., Martínez-Trinidad, J.F., 1999. Enfoque Lógico Combinatorio al Reconocimiento de Patrones. I. Selección de Variables y Clasificación Supervisada. Instituto Politécnico Nacional, Colección de Ciencias de la Computación, México, D.F.
Said, H., Tan, T., Baker, K., 2000. Personal identification based on handwriting. Pattern Recognition 33 (1), 149–160.
Santiesteban-Alganza, Y., Pons-Porrata, A., 2003. LEX: Un nuevo algoritmo para el cálculo de los testores típicos. Revista Ciencias Matemáticas 21 (1), 85–95. Dpto. de Computación, Universidad de Oriente, Cuba.
Srihari, S.N., 1993. Recognition of handwritten and machine-printed text for postal address interpretation. Pattern Recognition Lett. 14 (4), 291–302.
Srihari, S.N., 2001. Handwriting identification: Research to study validity of individuality of handwriting and develop computer-assisted procedures for comparing handwriting. Tech. Rep. CEDAR-TR-01-1, Center of Excellence for Document Analysis and Recognition, University of Buffalo, USA.
Srihari, S.N., Cha, Sung-Hyuk, Arora, Hina, Lee, Sangjik, 2002. Individuality of handwriting. J. Forensic Sci. 47 (4). Paper ID JFS2001227-474.
Zhuravliov Yu, I., 1978. Acerca del enfoque algebraico para la solución de problemas de reconocimiento o clasificación. Rev. Probl. Kibernet. 33, 5–68.
Zhuravliov Yu, I., Nikiforov, V.V., 1971. Algoritmos de reconocimiento basados en el cálculo de las evaluaciones. Kivernetika 3, 1–11 (in Russian).
Zimmermann, M., Bunke, H., 2002. Automatic segmentation of the IAM off-line handwritten English text database. In: 16th Internat. Conf. on Pattern Recognition, Canada, vol. 4, pp. 35–39.
Zois, E., Anastassopoulos, V., 2000. Morphological waveform coding for writer identification. Pattern Recognition 33 (3), 385–398.
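
As an illustration of Sections 5.1 to 5.3, the following is a minimal Python sketch of the comparison criterion of Eq. (2), the weighted similarity of Eq. (1), the representativeness of Eq. (3), holotype selection, the per-line vote and the collective text decision. It is not the authors' implementation. All function names, the toy three-feature patterns and the weight values are hypothetical. The sketch also assumes that representativeness is computed with each class's own weight row, that a feature with a constant value compares as identical, and it omits the similarity thresholds mentioned in Section 5.3.

```python
from collections import Counter
from statistics import mean


def comparison_criterion(a, b, lo, hi):
    """Eq. (2): 1 - |a - b| / |max - min| for a single feature."""
    span = abs(hi - lo)
    if span == 0:  # assumption: a constant feature compares as identical
        return 1.0
    return 1.0 - abs(a - b) / span


def similarity(p, q, weights, ranges):
    """Eq. (1): weighted sum of the per-feature comparison criteria."""
    return sum(
        w * comparison_criterion(a, b, lo, hi)
        for a, b, w, (lo, hi) in zip(p, q, weights, ranges)
    )


def representativeness(i, patterns, weights, ranges):
    """Eq. (3): mean similarity of pattern i to the other patterns of its class."""
    return mean(
        similarity(patterns[i], other, weights, ranges)
        for s, other in enumerate(patterns)
        if s != i
    )


def select_holotypes(patterns, weights, ranges, m):
    """Keep the m most representative patterns (the class needs at least 2 patterns)."""
    order = sorted(
        range(len(patterns)),
        key=lambda i: representativeness(i, patterns, weights, ranges),
        reverse=True,
    )
    return [patterns[i] for i in order[:m]]


def classify_line(pattern, holotypes, class_weights, ranges):
    """Vote per class: mean similarity to its holotypes, using that class's own weights."""
    votes = {
        label: mean(similarity(pattern, h, class_weights[label], ranges) for h in hs)
        for label, hs in holotypes.items()
    }
    return max(votes, key=votes.get)


def classify_text(lines, holotypes, class_weights, ranges, k):
    """Majority label over the line patterns, or 'unknown author' if more than k classes appear."""
    counts = Counter(classify_line(p, holotypes, class_weights, ranges) for p in lines)
    if len(counts) > k:
        return "unknown author"
    return counts.most_common(1)[0][0]


# Toy supervision sample: two authors, three features per line (the paper uses 22).
sample = {
    "A": [[0.10, 0.20, 5.0], [0.12, 0.22, 6.0], [0.11, 0.19, 5.0]],
    "B": [[0.40, 0.50, 9.0], [0.42, 0.52, 10.0], [0.41, 0.49, 9.0]],
}
# One weight row per class, i.e. the k x r matrix of Section 5.2 (made-up values).
class_weights = {"A": [0.5, 0.3, 0.2], "B": [0.2, 0.3, 0.5]}

all_patterns = [p for ps in sample.values() for p in ps]
ranges = [(min(col), max(col)) for col in zip(*all_patterns)]
holotypes = {
    label: select_holotypes(ps, class_weights[label], ranges, m=2)
    for label, ps in sample.items()
}

text_by_a = [[0.11, 0.21, 5.0], [0.13, 0.20, 6.0]]
print(classify_text(text_by_a, holotypes, class_weights, ranges, k=2))
```

On this toy sample the call labels the text as author A, because both of its lines are far closer to the holotypes of class A than to those of class B. Keeping one weight row per class is what makes the scheme differentiated: the same line pattern is scored with a different feature emphasis depending on the class it is being tested against.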
