xy p
i
p
i
+1
p
i
+2
p
i
+3
p
i
+4
l
i
l
i
+1
l
i
+2
l
i
+3
φ
i
θ
i
+1
hw
Figure 1.
Features extracted from the on-line handwriting.
presented and discussed in Section 4, while Section 5 draws some conclusions and gives an outlook forfuture work.
2. Features
To acquire the handwritten data, the eBeam
1
interface is used. It outputs a sequence of (
x,y
)-coordinatesrepresenting the location of the tip of the pen together with a time stamp for each location. A series of normalization operations are applied before feature extraction. The operations intend to improve thequality of the features without removing writer specific, i.e. class specific information. The recorded on-line data contain noisy points and gaps within strokes, which are caused by loss of sampling data duringacquisition. To recover from artifacts of this kind, two preprocessing steps are applied to the data (Liwickiand Bunke, 2005
a
). The cleaned text is then automatically divided into lines using a simple heuristic rule.The next step is to divide each text line into sub-parts which can then be normalized independentlyof each other. For each sub-part the skew angle is corrected to horizontally align the text. Next each sub-part is divided into three regions: the upper area, which contains the ascenders of the letters; the medianarea with the corpus of the letters; and the lower area containing the descenders of the letters. These threeareas are normalized to predefined heights. Finally, the width of each sub-part is normalized. The numberof characters is estimated as a fraction of the number of strokes crossing the horizontal line between thebase line and the corpus line. The text is then horizontally scaled according to this value.The feature set used in this experiment contains on-line features as well as features extracted froman off-line representation of the on-line data. (In the remainder of this section, the number in roundbrackets behind the name of a feature indicates the number of individual features.) For a given stroke
s
consisting of points
p
1
to
p
n
, the following 18 on-line features for each consecutive pair of points (
p
i
,
p
i
+1
)are computed (see Fig. 1 for an illustration):
speed (1)
;
writing direction (2)
;
curvature (2)
;
normalized
x
- and
y
-coordinate (2)
;
speed in
x
- and
y
-direction (2)
;
overall acceleration (1)
;
acceleration in
x
- and
y
-direction (2)
;
log curvature radius (1)
, which is the length of the circle which best approximates thecurvature at the point
p
i
(Richiardi, Ketabdar and Drygajlo, 2005);
vicinity aspect (1)
, i.e., the aspectof the trajectory in the vicinity of the point
p
i
;
vicinity curliness (1)
, i.e., the deviation from a straightline in the vicinity of the point
p
i
;
vicinity linearity (1)
, i.e., the average square distance between everypoint in the vicinity and the straight line linking the first and the last point in the vicinity; and
vicinity slope (2)
, i.e., the cosine and the sine of the angle of the straight line from the first to the last vicinitypoint (Jaeger et al., 2001).The off-line features are computed using a two-dimensional matrix representing an off-line versionof the data. The matrix is obtained by projecting the on-line strokes on the two-dimensional plane.The following features are used:
ascenders/descenders (2)
, i.e., the number of points above/below thecorpus line whose
x
-coordinates are in the vicinity of the point and which have a minimal distance tothe corpus/base line and
context map (9)
, i.e., the two-dimensional vicinity of the point is divided intothree regions for each dimension. The number of black points in each region is taken as a feature value.Overall the feature set consists of 29 features (Liwicki et al., 2006).
3. Classifiers
Two approaches to address the two-class classification problem are taken. The first is a
discrimina-tive approach
where a Support Vector Machine (SVM) directly attempts to maximize the discriminability
1
eBeam System by Luidia, Inc. - www.e-Beam.com
Leave a Comment
Dr. Fred Bell from his book "Death of Ignorance" New Age Science. Nov 7 1980