
Extending Graphs with Additional Edges for Keyword Spotting

Master's thesis at the Faculty of Science


of the University of Bern

submitted by

Konstantin Niedermann

2019

Thesis supervisor

Prof. Dr. Kaspar Riesen


Abstract

The present thesis proposes extensions to graphs for a Keyword Spotting (KWS) task in handwritten historical documents and explores their impact on the Mean Average Precision (MAP). In particular, parts of a recently published graph-based KWS framework that studies the performance of Graph Edit Distance (GED) heuristics are adopted and extended by adding further preprocessing steps after the keypoint graph extraction from single word images of historical documents. Since the heuristics used in the framework focus on the near environment of nodes for node-to-node matchings, they leave out a lot of information about the general structure of the word graph. With the extensions proposed in this thesis, we aim to add edges to each node that convey knowledge about the graph as a whole. By including different versions of additional edges in the extracted graphs and extending the cost function of the GED heuristic called Hausdorff Edit Distance (HED), improvements of up to 1.6% (absolute) in MAP can be achieved on the benchmark databases.
With these results, we show that extensions tailored to specific KWS tasks can improve the performance of GED heuristics without increasing the runtime complexity, even though some additional comparisons are made. This is especially interesting because, in recent decades, many libraries have started to preserve their historical treasures by digital means. As a result, large amounts of handwritten historical manuscripts have been made digitally available to a broader audience. However, the accessibility of such documents is limited. This is due to the fact that automatic full transcriptions are generally not feasible for ancient manuscripts, owing to large variations in the handwriting and noise in the documents. By exploiting and enhancing the possibilities of GED heuristics, we facilitate the application of such processes to digital historical manuscripts, which helps to close the gap between the availability and the accessibility of these documents.

Contents

1 Introduction 1
1.1 Broader Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Keyword Spotting (KWS) . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Graph Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5.1 Alvermann Konzilsprotokolle (AK) . . . . . . . . . . . . . . . 6
1.5.2 Botany (BOT) . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 Graphs and Graph Matching 9


2.1 Introduction to Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.1 Formal introduction of Graphs . . . . . . . . . . . . . . . . . . 11
2.2 Document and Image Preprocessing . . . . . . . . . . . . . . . . . . . 11
2.2.1 Document Preprocessing . . . . . . . . . . . . . . . . . . . . . 12
2.2.2 Graph Extraction . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Graph Matchings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3.1 Exact and Inexact Graph Matching . . . . . . . . . . . . . . . . 15
2.3.2 Graph Edit Distance (GED) . . . . . . . . . . . . . . . . . . . 17
2.3.3 Bipartite Graph Edit Distance (BP) . . . . . . . . . . . . . . . 18
2.3.4 Hausdorff Edit Distance (HED) . . . . . . . . . . . . . . . . . 20
2.4 Distance Evaluation and Evaluation Metric . . . . . . . . . . . . . . . 21
2.4.1 Cost Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4.2 Evaluation Metric . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.5 Reference Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3 Extended Graphs and Adapted Graph Matching 27


3.1 Graph Extensions and Cost Functions . . . . . . . . . . . . . . . . . 29
3.1.1 Extension 1 - One Edge to the Furthest Node . . . . . . . . . . 30
3.1.2 Extension 2 - One Edge to the Furthest Node - New Cost Function 32


3.1.3 Extension 3 - Three Edges to the Furthest Nodes . . . . . . . . . 34
3.1.4 Extension 4 - One Edge to the Furthest Node in Each Quadrant . . 36
3.1.5 Further Graph Extensions . . . . . . . . . . . . . . . . . . . . 39

4 Experimental Evaluation 41
4.1 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2 Validation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2.1 Extension 1 - One edge to the furthest node . . . . . . . . . . . 44
4.2.2 Extension 2 - One edge to the furthest node - new cost function 46
4.2.3 Extension 3 - Three edges to the furthest nodes . . . . . . . . . 49
4.2.4 Extension 4 - One edge to the furthest node in each quadrant . . 52
4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

5 Conclusion and Future Work 57


1
Introduction

1.1 Broader Perspective

Pattern Recognition (PR) is a scientific discipline in computer science that deals with the automated recognition of patterns and regularities in data and with deriving actions from them [1]. PR has been crucial for the survival of the human race: over millions of years, we have developed highly sophisticated neural and cognitive systems for such tasks. In computer science, PR mimics this task of recognition and decision making. Undeniably, humans are true experts at PR in daily life, as we have been trained and optimized since the early ages of the human race. However, we have reached a point where pattern recognition algorithms achieve better results than human experts, thanks to the exponential rise of Machine Learning (ML) and Artificial Intelligence (AI) techniques [2, 3].

Handwriting Recognition (HWR) is one application in PR and is understood as the task


of transforming a language represented in its spatial form of graphical marks into its
symbolic representation. Handwriting data is converted to digital form either by scanning
the writing on paper or by writing with a special pen on an electronic surface. These two
approaches are distinguished as offline HWR where the image of the word is converted
into gray level pixels using a scanner, and online HWR, where the x- and y-coordinates
of the pen tip are recorded as a function of time. Offline HWR is generally regarded as
the more difficult task of the two [4].

In the last decades, automatic HWR has been applied in many different domains, such as address interpretation [5], bank check reading [6], and the recognition of handwritten mathematical expressions [7] or music notes [8], to name just a few. HWR has also found widespread application to handwritten historical manuscripts [9, 10]. Handwritten historical documents are eminent witnesses of invaluable importance that cover knowledge and events of history. However, these important sources of knowledge are endangered by increasing effects of degradation caused by external conditions, such as exposure to light and humidity, as well as the natural decay of paper. Therefore, many ancient manuscripts have been digitally preserved in recent years. Through digitization, ancient historical documents can not only be preserved from further degradation but also be made available to a broader audience. However, the accessibility of these documents is limited, since automatic full transcriptions are often not feasible or imperfect in the case of ancient manuscripts due to large variations in handwriting and noisy, degraded documents [11].

1.2 Keyword Spotting (KWS)

Keyword Spotting (KWS) has been proposed as a flexible and more error-tolerant
alternative to full transcriptions of handwritten historical documents [12, 13]. Originally
proposed for speech recognition [14] and nowadays used by personal digital assistants
such as Alexa or Siri for so-called hot word detection, KWS has been adapted to printed
[12] and eventually for handwritten documents [13]. KWS is a problem that deals with
retrieving all instances of a given query word in a certain speech- or written document,
making these documents accessible for searching and browsing. It can be separated into
two categories, viz. template-based and learning-based approaches. Template-based
KWS uses template images of query words for finding new instances for the given
input. This procedure has the advantage that no knowledge of the underlying language
is needed and that template images are easy to obtain. However, at least one template
image is needed per query word. Since they rely on few template images, these systems usually do not generalize well to different writing styles. As a
template-based approach, Dynamic Time Warping (DTW) has been extensively studied
to match template images with segmented word images based on a sliding window [15].

Learning-based KWS, on the other hand, uses supervised learning. These systems therefore require a lot of labeled training data and are more expensive than the former approach. Learning-based approaches have to be trained a priori for the actual classification task, making these applications more time-consuming and costly, but also more powerful.
When applied to data under difficult conditions (i.e. with a lot of variances in terms
of style or degradation) learning-based approaches usually deliver better results than
template-based approaches. Hidden Markov Models (HMMs) [16] and Recurrent Neural
Networks (RNNs) [17] have been proposed for modeling handwritten documents as
a learning-based approach. However, in this thesis, we focus on the more flexible
template-based approach.

In the field of KWS, we further distinguish two tasks, viz. offline and online KWS. In offline KWS, only spatial information from the scanned handwriting document is used. In online KWS, temporal information about the writing process is additionally captured,
which may help to solve the task of KWS. The focus of this thesis lies on spotting
keywords in historical documents which belongs to offline KWS.

A further distinction in the field of KWS is between Query-by-Example and Query-by-String (QbE/QbS), i.e., whether the query is given as a word image example or as a character string. In this thesis, we focus on QbE, given scans of historical documents.

Finally, the formalism used to represent the underlying image data serves as a fourth distinctive feature in KWS. We distinguish between statistical and structural representation formalisms for modeling patterns such as handwriting. Statistical pattern recognition represents each entity by a feature vector of predefined length. A wide variety of comparison methods based on linear algebra can then be used on these vectors to distinguish and classify entities. This can be done very efficiently, since the representations are simple vectors. However, the fixed structure and size of the feature vector make statistical pattern recognition a poor fit for entities that differ in size and complexity. In these cases, structural representations are better suited. Graphs, for example, can model binary relations between objects and may adapt their structure and size to the complexity of the underlying data. Thus, structural approaches result in a more powerful and flexible representation than statistical approaches. However, the lack of useful mathematical structure in the graph domain, together with complexity problems, challenges real-world applications of the structural approach. In this thesis, we will focus on template-based algorithms that use structural graph representations for query-by-example keyword spotting in handwritten historical documents.

1.3 Graph Matching

Since KWS refers to the task of retrieving all instances of a given query, we need some
distance or similarity measure for graphs. Graph Edit Distance (GED) is considered one
of the most flexible graph matching models available [18]. It measures the minimum
cost needed to transform one graph into another by a set of different edit operations
and corresponding edit costs. Originally, edit distance was proposed for strings, where it is known as the Levenshtein distance [19]; nowadays, it is also widely used to compute similarities of graphs [20, 21]. Although GED allows a certain error tolerance and is therefore well suited to handle small variations in the handwriting, its applicability
is limited to graphs of a rather small size due to its computational complexity [22]. In
recent years, fast suboptimal graph matching algorithms have been proposed to overcome
this restricted applicability. Bipartite Graph Edit Distance (BP) [23] and, later, Hausdorff Edit Distance (HED) [24] are GED heuristics that reduce the complexity to cubic and quadratic time, respectively. In this thesis, we will focus on the faster HED as our distance measure. In chapter 2, we will summarize the concepts of graphs and graph
matchings in more detail.
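To give an intuition for the quadratic-time behaviour of HED, the following is a minimal, node-only sketch of the Hausdorff matching idea. It ignores edges entirely, and the function name and the constant deletion/insertion cost tau are illustrative, not taken from the cited framework:

```python
import math

def hed_node_only(nodes1, nodes2, tau=1.0):
    """Node-only sketch of the Hausdorff Edit Distance idea.

    nodes1, nodes2: lists of (x, y) node coordinates.
    tau: constant node deletion/insertion cost (illustrative value).
    Substitution costs are halved because each matching direction
    contributes one half; every node independently picks its cheapest
    counterpart, giving O(n * m) runtime overall.
    """
    def sub(u, v):
        return math.dist(u, v) / 2.0

    d1 = sum(min(tau, min((sub(u, v) for v in nodes2), default=tau))
             for u in nodes1)
    d2 = sum(min(tau, min((sub(u, v) for u in nodes1), default=tau))
             for v in nodes2)
    return d1 + d2

g1 = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
g2 = [(0.1, 0.0), (1.0, 0.1)]
print(hed_node_only(g1, g1))  # identical graphs: 0.0
print(hed_node_only(g1, g2))
```

Because every node is matched independently, the result is not a valid edit path but a lower bound of the exact GED, which is exactly what makes HED faster than BP.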

1.4 Motivation

In this thesis, we pick up a framework from a recent PhD thesis that extended and compared approximate graph distance computations for the problem of matching handwritten word graphs [25]. However, we restrict ourselves to the fast GED approximation
HED, applied to keypoint graphs extracted from words of two well-known historical documents, Alvermann Konzilsprotokolle (AK) [26] and Botany (BOT) [15], which are introduced in section 1.5. With the objective to outperform the results achieved in [25], we
extend the graphs and edit distance cost function. While the underlying graphs are based
on extractions from word images where keypoints of a word image are used for nodes
and pen strokes between keypoints are used for edges, the extensions proposed in this
thesis only focus on the edges. These edges are added in a further preprocessing step,
after the keypoint graph extraction from the word images. Since the heuristics used in
the framework only take the near environment of nodes into account for node-to-node
matchings, they leave out a lot of information about the general structure of the word
graph. With the extensions proposed in this thesis, we aim to add edges to each node
that convey knowledge about the graph as a whole. By extending the underlying graphs with additional edges that carry meta-information about the relation of two nodes, and by
involving them in the edit distance computation of HED, we expect an improvement in the KWS classification task.

1.5 Datasets

In the last decades, a wide range of handwritten historical manuscripts has been digitized and made publicly available. These documents cover many different languages, alphabets, and time periods, and thus build an important pillar of our cultural heritage. To make these documents available for searching and browsing, different Keyword Spotting (KWS) approaches have been proposed [12, 13].

In this thesis, we focus on two different handwritten ancient manuscripts, Alvermann


Konzilsprotokolle (AK) and Botany (BOT) which we will introduce in the current section.
Example pages of both datasets can be seen in Figure 1.1. Both datasets, AK and BOT
are based on a recent KWS benchmark competition [26] and thus have been used in
recent research [25]. The variations with respect to style and size are rather low in both
cases. However, when manually inspecting the documents, small deviations in style are easily noticeable. This is because the manuscripts in each set have been created by different, though few, writers. Note that the exact number of writers for both sets is not publicly available.

1.5.1 Alvermann Konzilsprotokolle (AK)

Documents from AK date from the 18th century and are handwritten minutes of formal meetings of the University of Greifswald. The whole dataset consists of about 18 000 pages of minutes created in the period of 1794 to 1797. The notes are written in German, contain only minor signs of degradation, and show rather low variations in the writing styles. The subset of this large

(a) Example page from AK (b) Example page from BOT

Figure 1.1: Example pages from the databases

dataset proposed by the benchmark competition [26] and also used in this thesis consists
of 30 pages with about 5 500 words.

1.5.2 Botany (BOT)

The BOT dataset contains different botanical records made by the government in British
India in the period of 1800 to 1850. The records are written in English and contain
certain signs of degradation like fading, smearing, and holes. Furthermore, some of the
pages contain vertical and horizontal ruling lines that can be regarded as an additional
challenge for Keyword Spotting (KWS) (see Figure 1.1b). The variations in the writing style, with respect to scaling and intraword variations, are noticeably larger than in the previously introduced AK dataset. In this thesis, we work with a subset of BOT of 30
pages with about 5000 words.

1.6 Outline

The remainder of this thesis is organized as follows. In chapter 2, we review the concept of graphs and graph matching. After a formal introduction to graphs and a review of the document preprocessing steps, Bipartite Graph Edit Distance (BP) and Hausdorff Edit
Distance (HED) are introduced as inexact graph matching heuristics for Graph Edit
Distance (GED). Furthermore, the distance evaluation and metrics are explained. Then,
in chapter 3, we introduce the extensions to graphs and cost functions that allow im-
provements in accuracy. Finally, in chapter 4, the experimental setup is explained and
validation results are presented.
2
Graphs and Graph Matching

In chapter 1, we have introduced Keyword Spotting (KWS) as an application of pattern


recognition. Now we want to elaborate on how to apply offline structural KWS to handwritten documents from the datasets introduced in section 1.5. To understand
this process, a brief introduction of the concept of graphs (section 2.1) is needed.

Since the datasets consist of ancient historical documents, we additionally have to pay
attention to age-related distortions in the documents. These problems have to be handled
within a preprocessing step and are discussed in the second section 2.2 of this chapter.
Furthermore, we explain which steps are needed to get from images of documents to
keypoint graphs.


In section 2.3, we introduce graph matching in general, present state-of-the-art graph matching methods for keypoint graphs, and discuss the runtime of these algorithms. To measure the similarity (or dissimilarity) of two given graphs and compute a corresponding score, these methods map substructures of one graph onto substructures of the other graph.

In the fourth section 2.4 of this chapter, we explain our evaluation metric. This includes the calculation of retrieval indices from the results of the graph matching, as well as the generation of an ordered list of proposed similar words based on the retrieval results for a given keyword.

In the last section 2.5 of this chapter, we present a recently published work that uses the
introduced procedure and whose results we aim to enhance in this thesis.

2.1 Introduction to Graphs

Pattern recognition applications are based on either statistical (i.e. vectorial) or structural data structures (i.e. strings, trees, or graphs). Hence, KWS approaches also make use of either a statistical or a structural representation of handwriting, as reviewed in section 1.2.
Graphs, in contrast to feature vectors, are able to represent both entities and binary
relationships that might exist between these entities. Moreover, graphs can adapt their
size and complexity to the size and complexity of the actual pattern that is to be modelled
[22]. Due to their representational power and flexibility, graphs have found widespread
application in pattern recognition. The present section formally introduces the concept
of graphs.

2.1.1 Formal introduction of Graphs

Definition. (Graph) A graph g is defined as a four-tuple g = (V, E, µ, ν) where

• V is a finite set of nodes,

• E is a finite set of edges,

• µ : V → LV is a node labelling function, and

• ν : E → LE is an edge labelling function.

Graphs can be distinguished, inter alia, into labelled and unlabelled graphs. In the former case, both nodes and edges can carry labels. These include numerical (i.e. L = {1, 2, 3, ...}), vectorial (i.e. L = Rn), or symbolic labels (i.e. L = {α, β, γ, ...}). In the latter case we assume empty label alphabets, i.e. LV = LE = ∅.

In the present thesis we will mainly encounter labelled graphs where µ(v) ∈ R2 represents the location of a node v ∈ V in a 2D plane, and LE = {α, β} where

    ν(e) = { α, if e originates from the graph extraction of a word
           { β, if e originates from a graph extension (see section 3.1)

labels an edge e ∈ E according to its origin.
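The definition above can be mirrored directly in code. The following minimal sketch (the class and attribute names are ours, not from the referenced framework) stores coordinate labels for nodes and the origin labels α/β for edges:

```python
# Minimal sketch of the four-tuple g = (V, E, mu, nu):
# nodes carry 2D coordinate labels, edges carry their origin label.
EXTRACTED = "alpha"   # edge stems from the keypoint graph extraction
EXTENSION = "beta"    # edge was added by a graph extension (section 3.1)

class Graph:
    def __init__(self):
        self.V = []    # finite set of nodes
        self.E = []    # finite set of edges as (u, v) pairs
        self.mu = {}   # node labelling function: v -> (x, y)
        self.nu = {}   # edge labelling function: (u, v) -> origin label

    def add_node(self, v, x, y):
        self.V.append(v)
        self.mu[v] = (x, y)

    def add_edge(self, u, v, origin=EXTRACTED):
        self.E.append((u, v))
        self.nu[(u, v)] = origin

g = Graph()
g.add_node(0, 12.0, 7.5)
g.add_node(1, 15.0, 9.0)
g.add_edge(0, 1)                    # alpha: from the graph extraction
g.add_edge(1, 0, origin=EXTENSION)  # beta: from a graph extension
print(len(g.V), len(g.E))  # 2 2
```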

2.2 Document and Image Preprocessing

We have seen in section 1.2, that graphs can be used for the problem of structural KWS.
The abilities to adapt their structure and size as well as modeling binary relations between
objects make graphs a powerful tool for the structural KWS task. In this section we
explicate how to process ancient handwritten documents, to extract word-based graphs.

In a first step, we describe the applied preprocessing steps that are needed to reduce
variation in the handwriting and age-related noise of scanned documents. Then we
elaborate on how the preprocessed scans are used to extract so-called keypoint-based
graphs that have been introduced for handwriting recognition in [27].

2.2.1 Document Preprocessing

The data used in this thesis (see section 1.5) is preprocessed according to [25]. In this section, we want to touch upon the individual parts of this process. However, we will leave out the discussion of parameter settings for the preprocessing steps, since no changes compared to the baseline were made. For further information, we refer to [25].

Figure 2.1: Preprocessing steps



The ultimate goal of the preprocessing step is to make document images feasible for
keypoint-based graph extraction. Since we start with scans of whole historical documents
and require single binarised, skeletonized word images for the graph extraction, this
process requires a number of separate steps as shown in Figure 2.1.

Historical handwritten documents usually show signs of degradation such as fading, smearing, and holes (Figure 2.2). The preprocessing steps also aim to reduce undesirable variations in the handwriting due to different writers. In our particular case, though, variations in writing style are minimized by graph normalizations in a subsequent step.

(a) Fading example from BOT (b) Ink smearing example from AK

Figure 2.2: Examples of distortions

Note that the raw data provided by the ICFHR2016 benchmark database [26] comes segmented into single word images; therefore, a segmentation step is omitted in the following overview. Besides that, the process follows well-known techniques in document analysis as proposed in [28] and [29] and can be divided into four steps:

1. In a first step, edges of words are locally enhanced by means of Difference of


Gaussians (DoG) in order to address the issue of noisy background [28].

2. Next, filtered document images are binarized by a global threshold to clearly


distinguish between foreground (ink) and background (paper) of the handwriting
document.

3. In certain cases, the forced alignment segmentation provided by the ICFHR2016
benchmark is imperfect [26]. Therefore, additional filtering is necessary to remove
undesired word parts. In particular, small connected components are detected and
removed by means of different morphological operators based on the method proposed in
[30].

4. Finally, binarized and normalized word images are skeletonized by a 3x3 thinning
operator [31].
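As a rough illustration of steps 1 and 2, the following pure-Python sketch runs a Difference of Gaussians filter followed by a global threshold over a toy grayscale image. The σ values, the threshold, and the clamped border handling are illustrative choices, not the parameters used in [25]:

```python
import math

def gaussian_kernel(sigma, radius):
    # Sampled, normalized 1D Gaussian.
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def blur(img, sigma):
    # Separable Gaussian blur; borders are handled by clamping.
    radius = max(1, int(3 * sigma))
    k = gaussian_kernel(sigma, radius)
    h, w = len(img), len(img[0])
    tmp = [[sum(k[r + radius] * img[y][min(max(x + r, 0), w - 1)]
                for r in range(-radius, radius + 1))
            for x in range(w)] for y in range(h)]
    return [[sum(k[r + radius] * tmp[min(max(y + r, 0), h - 1)][x]
                 for r in range(-radius, radius + 1))
             for x in range(w)] for y in range(h)]

def dog_binarize(img, sigma1=1.0, sigma2=2.0, threshold=0.05):
    # Step 1: Difference of Gaussians enhances stroke edges;
    # step 2: a global threshold separates ink from background.
    b1, b2 = blur(img, sigma1), blur(img, sigma2)
    return [[1 if (p1 - p2) > threshold else 0
             for p1, p2 in zip(r1, r2)]
            for r1, r2 in zip(b1, b2)]

# Toy 'scan': a vertical two-pixel stroke (ink = 1.0) on background 0.0.
img = [[1.0 if 3 <= x <= 4 else 0.0 for x in range(8)] for _ in range(8)]
binary = dog_binarize(img)
print(binary[4])  # the stroke columns survive the threshold
```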

2.2.2 Graph Extraction

After the document preprocessing, we end up with skeletonised word images. In this section, we explain the keypoint graph extraction process we use, as proposed in [25].
These keypoint graphs are extracted based on the detection of characteristic points,
so-called keypoints, including endpoints, intersections, and corner points of circular
structures. Keypoints can be seen as the minimal set of points needed to preserve the inherent topological characteristics of a word image. The process of deriving a keypoint graph is visualized in Figure 2.3 based on the example of a skeletonized image of the letter 'e'.

In a first step, seen in Figure 2.3a, each endpoint and each junction point of the skeleton image is detected. These nodes are labeled with their coordinates (x, y) ∈ R2 and added to the word graph. For further keypoint extraction, the junction points are removed from the skeletonized image, resulting in connected subcomponents (Figure 2.3b). Next, intermediate points are detected in the subcomponents, again labeled with their coordinates, and stored in the word graph (Figure 2.3c). Finally, edges are added between

(a) (b) (c) (d)

Figure 2.3: Keypoint graph representation for the letter 'e'. Subfigures (a) to (d) show the steps from a skeletonised word image (a) to the resulting keypoint graph (d).1

the detected keypoints (and labeled with α, defining their graph extraction origin). In
Figure 2.3d the assembled graph is shown with added edges.
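The detection of endpoints and junction points in the first step can be sketched by counting skeleton neighbours. For simplicity, the toy example below uses 4-connectivity on a small grid; the actual extraction works on real skeleton images and additionally considers corner points of circular structures:

```python
# Endpoint: exactly one skeleton neighbour; junction: three or more.
def neighbours4(skel, y, x):
    h, w = len(skel), len(skel[0])
    return sum(skel[y + dy][x + dx]
               for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
               if 0 <= y + dy < h and 0 <= x + dx < w)

def keypoints(skel):
    pts = {"end": [], "junction": []}
    for y, row in enumerate(skel):
        for x, on in enumerate(row):
            if not on:
                continue
            n = neighbours4(skel, y, x)
            if n == 1:
                pts["end"].append((x, y))      # node label: coordinates
            elif n >= 3:
                pts["junction"].append((x, y))
    return pts

# A cross-shaped toy skeleton: four endpoints, one junction.
skel = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
print(keypoints(skel))
```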

2.3 Graph Matchings

2.3.1 Exact and Inexact Graph Matching

To measure the similarity (or dissimilarity) between two graphs, a specific graph matching algorithm is conventionally used. In general, graph matching aims at mapping substructures of a first graph g1 to substructures of a second graph g2. From the resulting mapping, a similarity (or dissimilarity) score can be derived that describes the similitude of the two graphs. Basically, graphs can be matched by means of either exact or inexact graph
matching approaches. Exact graph matching aims to find correspondences between parts
or subparts of graphs which are identical in terms of their labels and their structure. In
contrast to that, inexact graph matching allows matching pairs of graphs that have no
common parts by endowing a certain error tolerance with respect to both structure and
labeling.

In this thesis, we are working with keypoint graphs derived from historical handwriting
documents. As shown in Figure 2.4, these graphs show many intraword variations
1 Source of Figure: [25]

coming either from writing style or from document degradation. Therefore, it is virtually impossible to derive identical substructures in terms of labels (coordinates) from graphs of two handwritten instances of the same word. This basically makes exact graph matching approaches unusable. As a consequence, the present thesis focuses on inexact graph matching.

(a) Segmented word images

(b) Preprocessed word images

(c) Keypoint graph representations

Figure 2.4: Intraword variations

As previously noted, inexact graph matching relaxes these constraints and thus makes it possible to match two nodes which differ in structure (edges) or label (coordinates). These matchings of nodes or edges are divided into three possible matching operations, viz. substitutions, deletions, and insertions. Furthermore, such a graph matching approach needs to define a cost for each of the three matching operations. The ultimate goal of inexact graph matching is then to find the minimal sum of costs over all node and edge operations. However, the minimization of the overall cost of a graph matching is known to be NP-complete. Therefore, several fast approximations have been proposed in the last decades [23, 24, 32, 33].

In the following sections, Graph Edit Distance (GED), as well as two of the most popular heuristics, Bipartite Graph Edit Distance (BP) and Hausdorff Edit Distance (HED), are presented.

2.3.2 Graph Edit Distance (GED)

One of the most flexible methods for measuring the dissimilarity of graphs is the edit
distance [20]. The key idea of Graph Edit Distance (GED) is to define the dissimilarity,
or distance, of two given graphs by the minimum amount of distortion that is needed to
transform one graph into another. GED is known to be very flexible since it can handle
arbitrary graphs and any type of node and edge labels. Furthermore, by defining costs
for edit operations, the concept of edit distance can be tailored to specific applications.
However, the major drawback of GED is its computational complexity that restricts its
applicability to graphs of rather small size [22]. In fact, GED belongs to the class of
NP-complete problems [34]. To make use of the powerful idea of GED, we focus on
suboptimal approaches for GED, viz. Bipartite Graph Edit Distance (BP) and Hausdorff
Edit Distance (HED), with cubic and quadratic complexity. These approaches are
presented in the next sections.

Originally, the concept of edit distance has been proposed for strings [19] and later has
been adapted for trees [35] and eventually graphs [36]. Similar to string edit distance,
the basic idea of GED is to transform a source graph g1 = (V1 , E1 , µ1 , ν1 ) into a target
graph g2 = (V2 , E2 , µ2 , ν2 ) using a set of edit operations applicable for both nodes and
edges. The six typical edit operations are substitutions, deletions, and insertions of both
nodes and edges.

An edit operation is formally defined as (xi → yj). We speak of a node edit operation if xi ∈ (V1 ∪ {ε}) and yj ∈ (V2 ∪ {ε}), and of an edge edit operation if xi ∈ (E1 ∪ {ε}) and yj ∈ (E2 ∪ {ε}), where ε denotes the empty node or the empty edge, respectively. Furthermore, an edit operation is called

• insertion if xi = ε,

• deletion if yj = ε, or

• substitution if xi ≠ ε and yj ≠ ε.

An edit path λ is defined as a set of edit operations {e1, ..., ek}. It is called a complete edit path, denoted λ(g1, g2), if it transforms g1 into g2 [22]. Every edit operation e is assigned a cost c(e). This cost function has to be defined individually depending on the application. In general, the edit cost c(e) should reflect the degree of deformation caused by e: strong modifications of the graph by an edit operation e should lead to high edit costs c(e), and smaller modifications to smaller costs [22].
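For keypoint graphs with coordinate labels, a typical application-specific cost model charges the Euclidean distance for node substitutions and a constant τ for deletions and insertions. The sketch below is illustrative only; the value of τ is arbitrary here, whereas in practice it would be a parameter tuned on a validation set:

```python
import math

TAU = 4.0   # illustrative constant deletion/insertion cost
EPS = None  # stands for the empty "epsilon" node

def node_cost(u, v, tau=TAU):
    """Cost c(e) of a node edit operation (u -> v).

    A strong deformation (large coordinate shift) yields a high
    substitution cost, a small one a low cost, as required above.
    """
    if u is EPS and v is EPS:
        raise ValueError("at least one side must be a real node")
    if u is EPS:              # insertion (eps -> v)
        return tau
    if v is EPS:              # deletion (u -> eps)
        return tau
    return math.dist(u, v)   # substitution (u -> v)

print(node_cost((0.0, 0.0), (3.0, 4.0)))  # substitution: 5.0
print(node_cost((0.0, 0.0), EPS))         # deletion: 4.0
```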

The Graph Edit Distance (GED) dλmin(g1, g2) is defined by

    dλmin(g1, g2) = min_{λ ∈ Υ(g1, g2)} Σ_{ei ∈ λ} c(ei),   (2.1)

where Υ(g1, g2) denotes the set of all possible complete edit paths transforming g1 into g2, and λmin refers to the edit path with minimal cost found in Υ(g1, g2).

Since the exact computation (finding all λ ∈ Υ and λmin among all edit paths λi ∈ Υ) is not feasible for large graphs, several fast, suboptimal algorithms for GED have been proposed in recent years [23, 24]. In the following sections, two renowned
heuristics are summarized, viz. Bipartite Graph Edit Distance (BP) [23] and Hausdorff
Edit Distance (HED) [24].

2.3.3 Bipartite Graph Edit Distance (BP)

In this subsection, we review a suboptimal algorithm with cubic time complexity that is able to return an upper bound for Graph Edit Distance (GED), termed Bipartite Graph Edit Distance (BP) [23]. Basically, BP solves a Linear Sum Assignment Problem (LSAP)
for the assignment of local structures (nodes and adjacent edges) of the graphs that are
compared. These assignments can be used to deduce a set of node and edge edit
operations that form a valid edit path ψ ∈ Υ(g1 , g2 ). The sum of costs of this not
necessarily optimal edit path gives us an upper bound on the exact distance [37]

dψBP (g1 , g2 ) ≥ dλmin (g1 , g2 ) . (2.2)


2.3. GRAPH MATCHINGS 19

BP proposes the use of a cost matrix C to formulate the graph edit distance problem as
an instance of an LSAP. Based on the node sets V1 = {u1 , ..., un } and V2 = {v1 , ..., vm }
of the two compared graphs g1 and g2 , the cost matrix C is defined by

$$
C = \begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1m} & c_{1\epsilon} & \infty & \cdots & \infty \\
c_{21} & c_{22} & \cdots & c_{2m} & \infty & c_{2\epsilon} & \ddots & \vdots \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \ddots & \infty \\
c_{n1} & c_{n2} & \cdots & c_{nm} & \infty & \cdots & \infty & c_{n\epsilon} \\
c_{\epsilon 1} & \infty & \cdots & \infty & 0 & 0 & \cdots & 0 \\
\infty & c_{\epsilon 2} & \ddots & \vdots & 0 & 0 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \infty & \vdots & \ddots & \ddots & 0 \\
\infty & \cdots & \infty & c_{\epsilon m} & 0 & \cdots & 0 & 0
\end{pmatrix} ,
$$

where cij denotes the cost of a node substitution (ui → vj ), ciε denotes the cost of
a node deletion (ui → ε), and cεj denotes the cost of a node insertion (ε → vj ). Finally,
to each entry cij ∈ C, which so far only takes node operations into account, the implied
minimum sum of edge edit operation costs is added. Formally, for every entry cij in C
we solve an LSAP on the in- and outgoing edges of nodes ui and vj and add the resulting
cost to cij , resulting in substitution costs c∗ij that consider the node substitution cost
as well as the inferred minimum edge operation costs. Furthermore, to each entry ciε ,
which denotes the cost of a node deletion, the cost of deleting all edges adjacent to
ui is added, and to each entry cεj , which denotes the cost of a node insertion, the
cost of inserting all edges adjacent to vj is added. This forms a new cost matrix
C∗ = (c∗ij ) [22].

From C∗ , a minimum cost permutation (ϕ∗1 , ..., ϕ∗n+m ) can be derived, using an LSAP
solving algorithm (for instance Munkres algorithm [38] in [39]). This permutation
corresponds to a bijective node assignment, and thus, to a valid and complete (yet not

necessarily optimal) edit path

ψ = ((u1 → vϕ1 ), (u2 → vϕ2 ), ..., (um+n → vϕm+n )) ∈ Υ(g1 , g2 ) .

The sum of costs of the node edit operations plus the sum of costs of the implied edge
edit operations leads to an upper bound on the GED. Formally,

$$ BP(g_1, g_2) = d_{\psi_{BP}}(g_1, g_2) = \underbrace{\sum_{i=1}^{n+m} c_{i\varphi_i}}_{\substack{\text{node edit operation costs}\\ \text{from } C \text{ captured in } \psi}} \; + \; \underbrace{\sum_{i=1}^{n+m} \sum_{j=1}^{n+m} c(a_{ij} \rightarrow b_{\varphi_i \varphi_j})}_{\substack{\text{edge edit operation costs}\\ \text{implied by } \psi}} \; . $$

See [22] for a proof of the upper-bound property and more details on the BP algorithm.
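To illustrate the structure of C and the resulting assignment, the following minimal sketch builds the square cost matrix from node substitution costs and a constant deletion/insertion cost. It considers node costs only (the edge-implied costs added in C∗ are omitted), and solves the LSAP by brute force over permutations instead of the Munkres algorithm; all names are hypothetical.

```python
from itertools import permutations

INF = float("inf")

def build_cost_matrix(sub_costs, tau_v):
    """Build the (n+m)x(n+m) BP cost matrix from node substitution
    costs (an n x m nested list) and a constant deletion/insertion
    cost tau_v.  Edge-implied costs (C*) are omitted in this sketch."""
    n, m = len(sub_costs), len(sub_costs[0])
    size = n + m
    C = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:          # substitution block
                C[i][j] = sub_costs[i][j]
            elif i < n and j >= m:       # deletion block (diagonal tau_v)
                C[i][j] = tau_v if j - m == i else INF
            elif i >= n and j < m:       # insertion block (diagonal tau_v)
                C[i][j] = tau_v if i - n == j else INF
            # epsilon-to-epsilon block stays 0

    return C

def bp_upper_bound(C):
    """Minimum-cost permutation by brute force (fine for toy sizes; a
    real implementation would use the Munkres algorithm in O(n^3))."""
    size = len(C)
    return min(sum(C[i][p[i]] for i in range(size))
               for p in permutations(range(size)))
```

For a 2-node graph matched against a 1-node graph with substitution costs 3 and 10 and τv = 4, the cheapest edit path substitutes the first node and deletes the second (cost 7), rather than performing the expensive substitution.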

Although BP uses a good graph matching approach that reduces the exponential time
complexity of GED to cubic time complexity, it is still not fast enough for many applications.
Therefore, we summarize an even faster approach in the next subsection 2.3.4 that reduces
the complexity to quadratic in terms of the number of nodes.

2.3.4 Hausdorff Edit Distance (HED)

In this subsection, we summarize the concept of a lower-bound approximation for Graph
Edit Distance (GED) with quadratic time complexity, the Hausdorff Edit Distance (HED)
[24].

The fact that HED matches nodes and their local structure from a local point of view
(independently of other node assignments) makes this approximation even faster than
the previously introduced Bipartite Graph Edit Distance (BP).

HED is based on the Hausdorff distance proposed in [40], but the distance measure used
in HED is slightly modified due to the Hausdorff distance's sensitivity to outliers [24].
This is especially important in the case of Keyword Spotting (KWS), where outliers
appear relatively often due to the variability of handwriting and certain preprocessing
steps.

Formally, for two graphs g1 = (V1 , E1 , µ1 , ν1 ) and g2 = (V2 , E2 , µ2 , ν2 ), HED is defined
by [24] as

$$ HED(g_1, g_2) = \sum_{u \in V_1} \min_{v \in V_2 \cup \{\epsilon\}} c^*(u, v) \; + \; \sum_{v \in V_2} \min_{u \in V_1 \cup \{\epsilon\}} c^*(u, v) . \qquad (2.3) $$

The modified cost function c∗ includes costs of adjacent edges and is defined as

$$
c^*(u, v) =
\begin{cases}
c(u \rightarrow \epsilon) + \sum\limits_{p \in P} \dfrac{c(p \rightarrow \epsilon)}{2} & \text{for node deletions} \\[2ex]
c(\epsilon \rightarrow v) + \sum\limits_{q \in Q} \dfrac{c(\epsilon \rightarrow q)}{2} & \text{for node insertions} \\[2ex]
\dfrac{c(u \rightarrow v) + \dfrac{HED(P, Q)}{2}}{2} & \text{for node substitutions,}
\end{cases}
\qquad (2.4)
$$

where P = {p1 , ..., pn } is the set of edges adjacent to u, and Q = {q1 , ..., qm } is the set
of edges adjacent to v. Since HED does not enforce bidirectional substitutions, costs for
assignments V1 → V2 ∪ {ε}, as well as V2 → V1 ∪ {ε}, have to be considered in Equation 2.3.
This leads to the fact that each node (of both graphs) is potentially substituted twice, and
thus the substitution cost has to be divided by 2 in Equation 2.4. Node deletions and
insertions, on the other hand, only appear in one of the summation terms, and thus, their
full cost is taken into account in Equation 2.4.
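To make the bidirectional summation concrete, the following sketch implements Equations 2.3 and 2.4 for a simplified setting with (x, y)-labeled nodes and unlabeled edges, in which case the inner edge matching HED(P, Q) collapses to τe · | |P| − |Q| |. The graph representation (coordinate lists plus degree lists) and all names are illustrative assumptions, not the framework's actual data structures.

```python
import math

def hed(nodes1, deg1, nodes2, deg2, tau_v, tau_e):
    """Hausdorff Edit Distance sketch (Eqs. 2.3/2.4) for graphs with
    (x, y)-labeled nodes and unlabeled edges.  nodesX: list of (x, y)
    tuples; degX: list of node degrees (|P| resp. |Q|)."""

    def sub(u, v):
        # node substitution: Euclidean label distance plus half of the
        # implied edge-matching cost; the whole term is halved because
        # each node may be matched from both directions (Eq. 2.4)
        (x1, y1), (x2, y2) = nodes1[u], nodes2[v]
        c_nodes = math.hypot(x1 - x2, y1 - y2)
        c_edges = tau_e * abs(deg1[u] - deg2[v]) / 2.0
        return (c_nodes + c_edges) / 2.0

    def removal(deg):
        # node deletion/insertion including half of its adjacent edges
        return tau_v + deg * tau_e / 2.0

    total = 0.0
    for u in range(len(nodes1)):   # assignments V1 -> V2 u {eps}
        total += min([removal(deg1[u])] +
                     [sub(u, v) for v in range(len(nodes2))])
    for v in range(len(nodes2)):   # assignments V2 -> V1 u {eps}
        total += min([removal(deg2[v])] +
                     [sub(u, v) for u in range(len(nodes1))])
    return total
```

Two identical graphs yield distance 0, while matching a one-node graph against an empty graph costs exactly one node deletion τv, illustrating the lower-bound behaviour of the per-node minima.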

2.4 Distance Evaluation and Evaluation Metric

In this section, we propose an adaptation of Graph Edit Distance (GED) with respect to
the present Keyword Spotting (KWS) scenario. First, we review a flexible cost model
that allows handling variations in different handwriting graphs. Then we show how to
derive a local retrieval index for the evaluation of GEDs.

2.4.1 Cost Model

The cost model we use is based on the model in [25], where constant costs are introduced
for node and edge deletions and insertions:

c(u → ε) = c(ε → v) = τv ∈ R+ , u ∈ V1 , v ∈ V2

c(p → ε) = c(ε → q) = τe ∈ R+ , p ∈ E1 , q ∈ E2

The node substitution cost, however, is supposed to reflect the dissimilarity of the associ-
ated label attributes, which in our application are (x, y)-coordinates. We use a weighted
Euclidean distance on these positional attributes to compute the substitution cost. For-
mally, the node substitution cost c(u → v) with µ1 (u) = (xi , yi ) and µ2 (v) = (xj , yj ) is
defined by

$$ c(u \rightarrow v) = \sqrt{\beta (\sigma_x (x_i - x_j))^2 + (1 - \beta)(\sigma_y (y_i - y_j))^2} , $$

where β ∈ [0, 1] denotes a parameter to weight the importance of the x- and y-coordinate
of a node, while σx and σy denote the standard deviation of all node coordinates in
the respective direction in the current query graph. The idea behind β is that the larger
the deviation in x- or y-direction in a dataset or handwriting, the more important that
direction might be. Furthermore, [25] introduces a weighting parameter α ∈ [0, 1]
that controls the relative importance of node and edge operations. Therefore, the cost of
any node operation is multiplied by α, while every edge operation cost is multiplied by (1 − α).

In [25], the parameters (τv , τe , α, β) are optimised based on a set of five constants for node
and edge deletion/insertion costs (τv , τe ∈ {1, 4, 8, 16, 32}) and a set of five constants
for the weighting parameters α, β ∈ {0.1, 0.3, 0.5, 0.7, 0.9}. Thus the evaluation took a
total of 5 × 5 × 5 × 5 = 625 parameter settings into account.

In this thesis, we focus on Hausdorff Edit Distance (HED) using keypoint graphs of
instances from the two datasets Alvermann Konzilsprotokolle (AK) and Botany (BOT).
The parameter settings for this setup, evaluated in [25], are shown in Table 2.1.

Database   τv   τe   α     β
AK         16    1   0.3   0.1
BOT         8    4   0.3   0.1

Table 2.1: Optimised parameter settings for keypoint graphs and HED [25].

2.4.2 Evaluation Metric

For evaluating graph distances between query- and target graphs and spotting keywords
we build retrieval indices that are optimized for a local threshold scenario. Local
thresholds are used to optimize retrieval indices for every query word independently.

In this thesis we will focus on Keyword Spotting (KWS) based on Query-by-Example


(QbE) (see section 1.2) using sets of labeled preprocessed instances of two datasets.
For spotting keywords, a given set of query word graphs (Q = {q1 , ..., q|Q| }) that
are labeled instances of the target word, and a set of document word graphs (G =
{g1 , ..., g|G| }) are used. For measuring the Graph Edit Distance (GED), we compute the
HED (subsection 2.3.4) for all instances of the query set qi ∈ Q with all instances of the
target set gj ∈ G and normalise the distance by the cost of the maximum-cost edit path
between qi and gj , i.e. the cost of the edit path that results from deleting all nodes and
edges of qi and inserting all nodes and edges of gj . The smallest normalised distance of
any query graph qi to the target document graph gj is used to derive the retrieval index
for measuring the similarity of Q to gj .

Formally, the maximum cost between qi and gj ,

$$ d_{max}(q_i, g_j) = (|V_{q_i}| + |V_{g_j}|)\,\tau_v + (|E_{q_i}| + |E_{g_j}|)\,\tau_e , $$

is used to compute the normalised distance

$$ \hat{d}_{HED}(q_i, g_j) = \frac{d_{HED}(q_i, g_j)}{d_{max}(q_i, g_j)} , $$

and the corresponding local retrieval index

$$ r(Q, g_j) = - \min_{q_i \in Q} \hat{d}_{HED}(q_i, g_j) . $$

From recall and precision, we compute the Average Precision (AP), i.e. the area under
the Recall-Precision curve of a keyword query. Then, the Mean Average Precision (MAP)
is computed, which is the mean over the APs of the individual keyword queries. For
deriving the AP and the MAP from the calculated retrieval index r, we use the
trec_eval software2 .

2.5 Reference Results

A recently published thesis [25] has introduced a novel graph-based KWS framework
for historical handwritten manuscripts, on which the framework presented in this chapter
is fundamentally based. Inter alia, the proposed method was applied to the databases
Alvermann Konzilsprotokolle (AK) and Botany (BOT) (see section 1.5) using the GED
heuristic HED (see subsection 2.3.4). The results presented in Table 2.2 date from the
referenced work and are used as the reference values of the present thesis that we aim to
surpass.

2 http://trec.nist.gov/trec_eval

        AK      BOT
MAP     79.72   51.69

Table 2.2: MAP results for the KWS framework presented in [25] for AK and BOT.

2.6 Summary

In this chapter, we introduced a graph-based Keyword Spotting (KWS) framework. The
framework can be roughly divided into four process steps, viz. image preprocessing,
graph extraction, graph matching, and distance evaluation.

Before discussing the image preprocessing steps, this chapter gives a brief formal intro-
duction to graphs. Then the substeps of the image preprocessing are discussed to get
from images of handwritten historical manuscripts to keypoint graphs. These include
Difference of Gaussians (DoG), binarisation, morphological filtering, and skeletonization
of the provided images. Then an algorithm for keypoint graph extraction is presented.
After giving a brief summary of exact and inexact graph matching, the virtues of the
latter when dealing with graphs with small differences in structure or labels are discussed.
Furthermore, the concept of Graph Edit Distance (GED) is presented and two suboptimal
GED approaches introduced, viz. Bipartite Graph Edit Distance (BP) and Hausdorff
Edit Distance (HED). Next, we show how the resulting GEDs can be transformed into
a retrieval index using local thresholds. From this retrieval index, the Mean Average
Precision (MAP) can be calculated and compared with the reference values of a recently
published work [25].
3 Extended Graphs and Adapted Graph Matching

The Keyword Spotting (KWS) framework proposed in [25] has shown the beneficial
use of heuristics for Graph Edit Distance (GED). Using Hausdorff Edit Distance (HED)
as a distance measure for spotting keywords in historical documents has brought the
problem from exponential down to quadratic time complexity [24]. By making small
additions to the graphs in terms of additional edges and extending the cost function, we
aim to further improve the KWS accuracy achieved in [25]. In this chapter, we present
the proposed graph extensions that are applied to the keypoint graphs extracted from
word images. Furthermore, we propose cost functions for the extensions.

HED (introduced in subsection 2.3.4 and proposed in [24]) is a fast approximation for


GED. The fact that it matches nodes of graphs and their local structure from a local
point of view and independently of other node assignments results in quadratic time
complexity. However, comparing nodes only locally leaves out a lot of information
about the word graph as a whole.

The proposed extensions in this chapter aim to narrow this gap and to add to each node
some information about a further (global) view. The motivation behind this adaptation is
to make a matching of (locally) similar nodes more expensive if their views of far-away
nodes in the matched graphs are significantly different. Therefore, additional preprocessing
steps are built into each scenario, adding additional edges, and the edit distance is slightly
adjusted without losing the benefit of HED's fast graph edit distance approximation.
Formally, we add a new cost cAE (u, v), measuring the dissimilarity of the additional
edges, to the substitution cost of the function c∗ introduced in Equation 2.4. Note that
cAE (u, v) only influences node substitution costs, to find better-suited node-to-node
matchings, while it has no impact on the fixed costs for node deletions and insertions
proposed by the reference framework.



$$
c^*(u, v) =
\begin{cases}
c(u \rightarrow \epsilon) + \sum\limits_{p \in P} \dfrac{c(p \rightarrow \epsilon)}{2} & \text{for node deletions} \\[2ex]
c(\epsilon \rightarrow v) + \sum\limits_{q \in Q} \dfrac{c(\epsilon \rightarrow q)}{2} & \text{for node insertions} \\[2ex]
\dfrac{\gamma \left( c(u \rightarrow v) + \dfrac{HED(P, Q)}{2} \right) + (1 - \gamma)\, c_{AE}(u, v)}{2} & \text{for node substitutions}
\end{cases}
\qquad (3.1)
$$

The parameter γ ∈ [0, 1] determines the weighting between the former HED node
distance and the newly added distance measure that is defined individually for each
version of the proposed graph extensions in the following sections. γ is optimised
for each database individually, on the same training set as the parameters (τv , τe , α, β)
introduced in subsection 2.4.1.
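As a minimal sketch of the substitution branch of Equation 3.1 (assuming the individual cost terms are already computed; the function name is hypothetical), the γ-weighting can be written as:

```python
def substitution_cost(c_node, hed_pq, c_ae, gamma):
    """Extended node substitution cost of Eq. 3.1: blends the original
    HED node term (node label cost plus half the edge matching cost
    HED(P, Q)) with the additional-edge dissimilarity c_AE.
    gamma = 1 recovers the baseline HED substitution cost of Eq. 2.4."""
    baseline = c_node + hed_pq / 2.0
    return (gamma * baseline + (1.0 - gamma) * c_ae) / 2.0
```

Setting γ = 1 reproduces the reference framework exactly, which is why the baseline result reappears as one point of the γ parameter sweep described in chapter 4.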

3.1 Graph Extensions and Cost Functions

As a goal for the graph extensions, we want to incorporate some meta information about
the graph as a whole into every single node. The experiments are based on the assumption
that if two graphs are similar, the meta information proposed in the extensions is also
similar.

In all proposed extensions, we add edges to the graph between two nodes that are far
apart. The distance between two nodes is calculated using the Euclidean distance metric
with equal weighting for both coordinate directions. Note that the newly added edges are
labeled differently and are not taken into account for the former HED computation of
the edit distance of two nodes, which only considers edges labeled with α (see
subsection 2.2.2).

The additionally added edges are exclusively used to determine the distance cAE (u, v)
as part of the new cost function for node comparison in Equation 3.1. Furthermore,
these new edges are directed edges, meaning they have a defined start and end node,
where the start node is the node from which the furthest node is searched, and the
end node is the node that is found. This ordering information for directed edges is
crucial for the new cost cAE (u, v) for node substitutions. With this setup, the comparison
of directed edges of two nodes is carried out identically for each pair of nodes.

In the following sections, the approaches are introduced that have been evaluated in the
present thesis.

3.1.1 Extension 1 - One Edge to the Furthest Node

Extension Preprocessing

After the preprocessing steps that have been applied to the word images described
in section 2.2, we have a graph for each image. For each of these graphs, a further
preprocessing step is carried out as described in algorithm 1.

Algorithm 1: Preprocessing Ext 1

foreach graph g(V, E, µ, ν) do
    E+ ← ∅
    foreach node u ∈ V do
        find v ∈ V where d(u, v) is maximal
        create new directed edge e = (u, v)
        E+ ← E+ ∪ {e}
    end
    E ← E ∪ E+
end

In each graph g(V, E, µ, ν), we add a set of edges E + to E with |E + | = |V |. For each
node u ∈ V of g, we look for the node v ∈ V where the distance measure d(u, v) is
maximal and add the directed edge e = (u, v) (directed from u to v) to E + . For the
distance d(u, v), we use an unweighted Euclidean distance measure.
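Algorithm 1 can be sketched in a few lines, assuming each graph is given as a list of node coordinates (the function name and representation are illustrative):

```python
import math

def add_furthest_edges(coords):
    """Extension 1 preprocessing: for every node u, add one directed
    edge to its Euclidean-furthest node v.  coords: list of (x, y)
    tuples, one per node; returns the new directed edges E+ as (u, v)
    index pairs."""
    new_edges = []
    for u, (ux, uy) in enumerate(coords):
        # pick the node v maximising d(u, v); u itself has distance 0
        # and is therefore never chosen in a graph with > 1 node
        v = max(range(len(coords)),
                key=lambda k: math.hypot(coords[k][0] - ux,
                                         coords[k][1] - uy))
        new_edges.append((u, v))
    return new_edges
```

On the toy graph used later in this chapter (nodes at (0, 0), (1, 0) and (2, 2)), both lower nodes point to the top node, which in turn points back to the origin.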

In Figure 3.1, a graph representation of an instance of the word Extract of the BOT dataset
is shown. The nodes and edges extracted from the word image by the preprocessing
steps (see section 2.2) are shown as red dots and black lines between dots. The newly
added directed edges e ∈ E + are shown as blue lines. While Figure 3.1a demonstrates
one new edge between two nodes in the original graph as an example (from node 98 to
node 5), Figure 3.1b shows a mesh of all the newly added edges. It can be seen that these
newly added edges (compared to the former edges) no longer carry information about the


Figure 3.1: Graph representation of an instance of the word Extract of the BOT dataset
with nodes (red dots) and word edges with label α (black lines). In (a) a new edge from
node 98 to node 5 (blue line) is indicated, while in (b) all new edges of the graph (blue
lines) are plotted.

strokes of the word drawings, but meta information about the word in general. Therefore,
they have to be handled strictly separately in the graph distance measure, as shown in
Equation 3.1.

Extension Cost Function

For each node u ∈ V there exists exactly one outgoing directed edge e ∈ E whose
destination is some node x ∈ V . For measuring the dissimilarity of the additional
edges of two nodes u, v ∈ V , we compare their unique directed edges eu , ev ∈ E
with each other, where x is the destination node of eu and y is the destination node
of ev . These edges eu , ev can be seen as vectors iu , iv with origins u, v and destinations
x, y, written in polar notation as iu = ⟨ru , Θu ⟩ and iv = ⟨rv , Θv ⟩.

The cost cAE (u, v) for measuring the dissimilarity of the additional edges, introduced in
Equation 3.1, is then calculated as follows:

$$ c_{AE}(u, v) = \left( 1 - \left( \frac{\cos(\Theta_u - \Theta_v)}{2} + 0.5 \right) \right) (r_u + r_v) $$

Consider the following toy graph as an example for the cAE computation.

• A set of nodes V = {v1 , v2 , v3 }


with coordinates µ(v1 ) = (0, 0), µ(v2 ) = (1, 0) and µ(v3 ) = (2, 2)

• A set of additional directed edges E + = {e1 , e2 } where e1 is an edge from v1 to


v3 and e2 is an edge from v2 to v3 .

Figure 3.2: Example graph setup, with nodes v1 and v2 at the bottom, node v3 at the top,
and the additional directed edges e1 (from v1 to v3 ) and e2 (from v2 to v3 ).

Then we observe a cost cAE (v1 , v2 ) of the additional edges for a node substitution
(v1 → v2 ). The vector from v1 = (0, 0) to v3 = (2, 2) has length √8 and angle 45°, and
the vector from v2 = (1, 0) to v3 = (2, 2) has length √5 and angle arctan 2 ≈ 63.43°,
which gives

$$ c_{AE}(v_1, v_2) = \left( 1 - \left( \frac{\cos(45° - 63.43°)}{2} + 0.5 \right) \right) \left( \sqrt{8} + \sqrt{5} \right) \approx 0.130 . $$

3.1.2 Extension 2 - One Edge to the Furthest Node - New Cost Function

Extension Preprocessing

In this extension, the same preprocessing steps are applied as in the previous subsec-
tion 3.1.1.

Extension Cost Function

The cost function differs slightly from the previous one. While it also makes use of the
polar vectors at u and v (iu = ⟨ru , Θu ⟩ and iv = ⟨rv , Θv ⟩), the distance cAE (u, v) is now
based on the product of the vector lengths instead of their sum:

$$ c_{AE}(u, v) = \left( 1 - \left( \frac{\cos(\Theta_u - \Theta_v)}{2} + 0.5 \right) \right) (r_u \cdot r_v) $$

Consider the same toy graph as before (see Figure 3.2) as an example for the cAE
computation.

Then we observe a cost cAE (v1 , v2 ) of the additional edges for a node substitution
(v1 → v2 ) of

$$ c_{AE}(v_1, v_2) = \left( 1 - \left( \frac{\cos(45° - 63.43°)}{2} + 0.5 \right) \right) \left( \sqrt{8} \cdot \sqrt{5} \right) \approx 0.162 . $$

In general, this new cAE computation results in a larger variance in distances. Therefore,
the cAE of nodes with similar new edges differs more clearly from the cAE of nodes
with dissimilar new edges. From now on, we therefore use this new calculation of cAE
for all remaining extensions.
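Both cost variants can be checked numerically. In the sketch below (plain Python, hypothetical names), cAE is computed directly from the toy graph's coordinates; the vector from v2 = (1, 0) to v3 = (2, 2) has length √5 and angle arctan 2 ≈ 63.4°, yielding roughly 0.130 for the sum variant (Extension 1) and 0.162 for the product variant (Extension 2).

```python
import math

def c_ae(origin_u, origin_v, target, combine):
    """Additional-edge cost for Extensions 1 and 2.  Both directed
    edges point to `target`; `combine` joins the two vector lengths:
    sum for Extension 1, product for Extension 2."""
    ru = math.hypot(target[0] - origin_u[0], target[1] - origin_u[1])
    rv = math.hypot(target[0] - origin_v[0], target[1] - origin_v[1])
    theta_u = math.atan2(target[1] - origin_u[1], target[0] - origin_u[0])
    theta_v = math.atan2(target[1] - origin_v[1], target[0] - origin_v[0])
    # angular dissimilarity in [0, 1]: 0 for parallel, 1 for opposite
    angular = 1.0 - (math.cos(theta_u - theta_v) / 2.0 + 0.5)
    return angular * combine(ru, rv)

v1, v2, v3 = (0.0, 0.0), (1.0, 0.0), (2.0, 2.0)
ext1 = c_ae(v1, v2, v3, lambda a, b: a + b)   # sum variant,     ~0.130
ext2 = c_ae(v1, v2, v3, lambda a, b: a * b)   # product variant, ~0.162
```

The product variant amplifies the same angular dissimilarity more strongly whenever both vectors are long, which is exactly the larger variance the text describes.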

3.1.3 Extension 3 - Three Edges to the Furthest Nodes

Extension Preprocessing

In this approach, the number of added edges is increased. Instead of adding one edge to
the furthest node, we add three edges to the three furthest nodes for each node in a graph.
Like in the prior approaches, this is done after the preprocessing steps that have been
applied to the word images described in section 2.2. In algorithm 2, the process of this
extension is presented formally.

Algorithm 2: Preprocessing Ext 3

foreach graph g(V, E, µ, ν) do
    E+ ← ∅
    foreach node u ∈ V do
        find S ⊂ V, |S| = 3, where d(u, v) ≥ d(u, k) for all v ∈ S and k ∈ V \ S
        foreach v ∈ S do
            create new directed edge e = (u, v)
            E+ ← E+ ∪ {e}
        end
    end
    E ← E ∪ E+
end

In each graph g(V, E, µ, ν), we add a set of edges E + to E with |E + | = 3|V |. For each
node u ∈ V of g, we look for a subset of nodes S ⊂ V, |S| = 3, where the distance
d(u, v) for any v ∈ S is larger than the distance d(u, k) for any k ∈ V \ S. Then
we add a directed edge e = (u, v) to E + for each v ∈ S. For the distance d(u, v), we use
an unweighted Euclidean distance measure.
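This preprocessing step can be sketched with heapq.nlargest selecting the three furthest nodes (the representation and names are illustrative; ties between equidistant nodes are broken arbitrarily):

```python
import heapq
import math

def add_three_furthest_edges(coords):
    """Extension 3 preprocessing: for every node u, add directed edges
    to its three furthest nodes (Euclidean distance).  coords: list of
    (x, y) tuples; returns the new edges E+ as (u, v) index pairs,
    furthest node first."""
    new_edges = []
    for u, (ux, uy) in enumerate(coords):
        others = [k for k in range(len(coords)) if k != u]
        furthest = heapq.nlargest(
            3, others,
            key=lambda k: math.hypot(coords[k][0] - ux,
                                     coords[k][1] - uy))
        new_edges.extend((u, v) for v in furthest)
    return new_edges
```

For a graph with |V| ≥ 4 this produces exactly 3|V| new edges, as stated above.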

In Figure 3.3 a graph representation of an instance of the word Extract of the BOT dataset
is shown. The nodes and edges extracted from the word image by the preprocessing steps
(see section 2.2) are shown as red dots and black lines between dots. The newly added
directed edges e ∈ E + are shown as blue lines. While Figure 3.3a demonstrates the new


Figure 3.3: Graph representation of an instance of the word Extract of the BOT dataset
with nodes (red dots) and word edges with label α (black lines). In (a) the new edges
from node 265 to nodes 307, 308, and 309 (blue lines) are indicated, while in (b) all new
edges of the graph (blue lines) are plotted.

directed edges generated for node number 265 to its three furthest nodes (307, 308 and
309) as an example, Figure 3.3b shows a mesh of all the newly added edges. It can be
seen that the mesh is denser, including three times as many edges as Figure 3.1b.
However, the shape of the mesh does not change significantly.

These newly added edges (compared to the former edges) no longer carry information
about the strokes of the word drawings, but meta information about the word in general.
Therefore, they have to be handled strictly separately in the graph distance measure, as
shown in Equation 3.1.

Extension Cost Function

For each node u ∈ V there exist three edges e1 , e2 , e3 ∈ E whose destination nodes
are x1 , x2 , x3 ∈ V . We again make use of the idea of viewing the newly generated
directed edges as vectors, where u is the origin and xi the destination of the vector.
The set Iu of a node u ∈ V consists of the three vectors i1u , i2u , i3u that have been
generated by this extension preprocessing step. Comparing a set of vectors to another
set of vectors can be computationally complex. Therefore, for measuring the dissimilarity
of the additional edges of two nodes u, v ∈ V , we make use of a basic idea of HED to
reduce computational complexity and, in return, dispense with a valid edit path: we allow
multiple matchings of a source vector to a target vector (like HED allows for nodes).
Since we now have to map one set of vectors onto another, we calculate the arithmetic
mean of the optimal mappings in both directions.

Formally, cAE (u, v) is defined as

$$ c_{AE}(u, v) = \frac{1}{2} \sum_{i \in I_u} \min_{j \in I_v} c^*_{AE}(i \rightarrow j) \; + \; \frac{1}{2} \sum_{j \in I_v} \min_{i \in I_u} c^*_{AE}(j \rightarrow i) , $$

where

$$ c^*_{AE}(i \rightarrow j) = \left( 1 - \left( \frac{\cos(\Theta_i - \Theta_j)}{2} + 0.5 \right) \right) (r_i \cdot r_j) $$

defines the vector substitution cost with i = ⟨ri , Θi ⟩ and j = ⟨rj , Θj ⟩. Note that, unlike
in the HED computation introduced in subsection 2.3.4, we do not allow vector
deletions/insertions.
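The bidirectional vector matching can be sketched as follows (polar vectors as (r, Θ) tuples; function names are illustrative):

```python
import math

def vector_cost(i, j):
    """Vector substitution cost c*_AE for polar vectors (r, theta)."""
    (ri, ti), (rj, tj) = i, j
    return (1.0 - (math.cos(ti - tj) / 2.0 + 0.5)) * (ri * rj)

def c_ae_sets(I_u, I_v):
    """Hausdorff-style matching of two vector sets (Extension 3): each
    vector is matched to its cheapest counterpart in the other set and
    the two directional sums are averaged; no deletions/insertions are
    allowed, so multiple vectors may map onto the same counterpart."""
    forward = sum(min(vector_cost(i, j) for j in I_v) for i in I_u)
    backward = sum(min(vector_cost(i, j) for i in I_u) for j in I_v)
    return 0.5 * forward + 0.5 * backward
```

Matching a vector set against itself costs 0, and two orthogonal unit vectors cost 0.5, the midpoint of the angular dissimilarity scale.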

3.1.4 Extension 4 - One Edge to the Furthest Node in Each Quadrant

Although adding more edges to the graph gives the new distance more weight, there is
often not much difference between the vector to the furthest node and the vector to the
third-furthest node (see the example in Figure 3.3a). In this approach, the variance of the
generated edges is increased, while the general idea remains the same.

Extension Preprocessing

In this approach, we add four directed edges to each node in a graph. Like in the prior
approaches, this is done after the preprocessing steps that have been applied to the word
images described in section 2.2. In algorithm 3, the process of this extension is presented
formally.

Algorithm 3: Preprocessing Ext 4

foreach graph g(V, E, µ, ν) do
    E+ ← ∅
    foreach node u ∈ V do
        foreach quadrant Qi do
            find v ∈ Qi where d(u, v) is maximal
            create new directed edge e = (u, v)
            E+ ← E+ ∪ {e}
        end
    end
    E ← E ∪ E+
end

The graphs provided by the word image preprocessing are normalised. This means that
the nodes of a graph are spread over the four quadrants of the word graph's coordinate
system. In this extension, we make use of this property and handle each quadrant
separately. Let Q1 ∪ Q2 ∪ Q3 ∪ Q4 = V be the disjoint assignment of each node of V to
its associated quadrant Qi . Note that Qi ∩ Qj = ∅ for any i ≠ j.

In each graph g(V, E, µ, ν), we add a set of edges E + to E with |E + | = 4|V |. For each
node u ∈ V of g and each quadrant Qi , we look for the node v ∈ Qi where the distance
measure d(u, v) is maximal and add a directed edge e = (u, v) to E + . For the distance
d(u, v), we use an unweighted Euclidean distance measure.
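A sketch of the quadrant-wise search, assuming normalised coordinates so that quadrants are taken relative to the origin. The treatment of boundary nodes and of a node that is alone in its own quadrant is a choice of this sketch, not specified by the thesis; all names are illustrative.

```python
import math

def add_quadrant_edges(coords):
    """Extension 4 preprocessing: for every node, add one directed
    edge to the furthest node in each quadrant of the (normalised)
    graph.  Empty quadrants contribute no edge; a node alone in its
    own quadrant yields a self-loop in this sketch."""
    def quadrant(p):
        x, y = p
        if x >= 0 and y >= 0:
            return 0
        if x < 0 and y >= 0:
            return 1
        if x < 0 and y < 0:
            return 2
        return 3

    buckets = {q: [] for q in range(4)}   # node indices per quadrant
    for k, p in enumerate(coords):
        buckets[quadrant(p)].append(k)

    new_edges = []
    for u, (ux, uy) in enumerate(coords):
        for q in range(4):
            if not buckets[q]:
                continue                  # no node in this quadrant
            v = max(buckets[q],
                    key=lambda k: math.hypot(coords[k][0] - ux,
                                             coords[k][1] - uy))
            new_edges.append((u, v))
    return new_edges
```

With one node per quadrant, every node receives four outgoing edges, giving the |E+| = 4|V| stated above.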

In Figure 3.4, a graph representation of an instance of the word Extract of the BOT dataset
is shown. The nodes and edges extracted from the word image by the preprocessing steps
(see section 2.2) are shown as red dots and black lines between dots. While the green
axes only serve as a visual help to distinguish the four search spaces, the newly added
directed edges e ∈ E + are shown as blue lines. Figure 3.4a demonstrates the new edges
generated for node number 298 to the four furthest nodes in each quadrant (3, 31, 89
and 192) as an example. Figure 3.4b shows a mesh of all the newly added edges.


Figure 3.4: Graph representation of an instance of the word Extract of the BOT dataset
with nodes (red dots) and word edges with label α (black lines). In (a) the new edges
from node 298 to nodes 3, 31, 89 and 192 (blue lines) are indicated. Note that the axes
(green lines) serve as visual help to distinguish the different search spaces (quadrants).
In (b) all new edges of the graph (blue lines) are plotted.

Compared to prior extensions, it can be seen that the mesh is even denser, including
more edges than all previously introduced approaches. Furthermore, it is recognisable by
eye that the variance in the directions of the new edges is increased. Again, these newly
added edges no longer carry information about the strokes of the word drawings, but
meta information about the word in general. Therefore, they have to be handled strictly
separately in the graph distance measure, as shown in Equation 3.1.

Extension Cost Function

The cost function makes use of the polar vector notation of the edges at each node. For
each node v ∈ V , iv,Qj = ⟨rv,Qj , Θv,Qj ⟩ is the polar vector notation of the directed edge
e ∈ E where v is the origin and x ∈ Qj is the destination of the vector. The cost
cAE (u, v) for measuring the dissimilarity of the additional edges, introduced in
Equation 3.1, is then calculated as the sum of the dissimilarities of the vectors for each
quadrant Qj . Formally,

$$ c_{AE}(u, v) = \sum_{Q_j} \left( 1 - \left( \frac{\cos(\Theta_{u,Q_j} - \Theta_{v,Q_j})}{2} + 0.5 \right) \right) \left( r_{u,Q_j} \cdot r_{v,Q_j} \right) . $$
3.1.5 Further Graph Extensions

Within the framework of this thesis, further extensions have been evaluated, some of
which are shortly introduced in this section. None of these further graph extensions
revealed results worth presenting; therefore, their results are omitted in chapter 4. The
reasons for their unsuitability have different roots. Some extensions revealed
computational complexity issues that could not be solved within the scope of this master's
thesis but may be considered for future research. Two approaches turned out to have no
beneficial impact on the KWS task.

Extension - One Edge to the Furthest Node - Simpler Cost Function

One of the first attempts at measuring the dissimilarity of graphs with the extension
proposed in subsection 3.1.1 made exclusive use of the direction in which the edge points.
It turned out that measuring only the cosine of the angle between two directions was not
beneficial to the KWS task (or the benefit was too small to be detected).

Extension - Edges to Neighbours

Some elaborated extensions created new edges between nodes that are connected within
a few hops. However, all attempts to create edges that carried meta information about
the near environment of the target node turned out unsuccessful. On the one hand,
the information carried by the near environment is already captured within the baseline
HED computation, where the number of neighbours is compared for every node matching.
Therefore, the horizon for near-environment meta information would have to be expanded.
On the other hand, we face computational issues when expanding the search area too much.
By expanding the search space to consider meta information of nodes within an
increased hop count, the number of added edges grows uncontrolled. Furthermore, a
cost function for edges that differ in hop counts becomes more difficult to find.
4 Experimental Evaluation

For the experimental evaluation we consider two manuscripts of a recent Keyword


Spotting (KWS) benchmark competition [26], viz. Alvermann Konzilsprotokolle (AK)
and Botany (BOT) (see section 1.5 for further details on these documents).

In the first section of this chapter, section 4.1, the experimental setup is explained. Then,
in section 4.2, the results of the experiments with the proposed extensions (section 3.1)
on the datasets are presented. In the last section 4.3 of this chapter, the presented results
are summarised.


4.1 Experiment Setup

In this experimental evaluation, we compare different graph extensions and extended
cost functions for keypoint graphs with each other by means of the baseline KWS system
using Hausdorff Edit Distance (HED) on the two manuscripts AK and BOT.

Figure 4.1: Instances of selected keywords that are used for optimisation, from (a) AK
and (b) BOT.

The experimental evaluation is carried out in two steps. First, the weighting parameter γ
is optimised on ten manually selected keywords with different word lengths. Remember
that γ ∈ [0, 1] denotes how much influence the baseline has on the total edit distance
(i.e. γ = 1 reproduces the baseline result). We define a validation set that consists of all,
or at least 10, random instances per selected keyword and a maximum of 900 additional
random words (in total 1,000 words). Instances of the selected keywords are shown
exemplarily in Figure 4.1. This procedure is identical to the baseline framework proposed
in [25], with an equal selection of keywords and their instances. Second, the optimised
systems are evaluated on the same training and test sets as used in [26] and [25].

In Table 4.1, the number of keywords as well as the sizes of the training and test sets
for both datasets can be found. Note that the chosen keywords are based on standard
benchmark datasets, and thus, they do not necessarily represent their potential users
(e.g. theologians, botanists, etc.) very well. However, for reasons of comparability to
other systems, we follow this practice in our evaluation.

Databases                          Keywords   Train   Test
Alvermann Konzilsprotokolle (AK)        200    1849   3734
Botany (BOT)                            150    1684   3380

Table 4.1: Number of keywords and number of word images in
the training and test sets of AK and BOT [25].
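The retrieval quality throughout this chapter is measured by the MAP over all keyword queries. A minimal sketch of this metric, assuming each query yields a ranked list of Boolean relevance flags:

```python
def average_precision(relevance):
    """Average precision of one ranked retrieval result.

    `relevance` is the ranked list of Boolean flags indicating whether the
    word at that rank is an instance of the query keyword.
    """
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at each relevant rank
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision(results_per_query):
    """Mean of the average precisions over all keyword queries."""
    return sum(average_precision(r) for r in results_per_query) / len(results_per_query)

# Two toy queries: a perfect ranking and one with a miss at rank 1.
map_score = mean_average_precision([[True, True, False], [False, True]])
```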

Note that for the best results with the adapted cost functions proposed in section 3.1, not
only an optimisation of γ is needed, but also a recalculation of the optimisation of the
parameters τv , τe , α and β is required. However, an evaluation of all combinations of
parameter settings would go beyond the scope of this thesis. Therefore, we leave the
settings of the first four parameters fixed to the settings evaluated in [25] (see Table 2.1)
and optimise only the last parameter γ. The idea behind this decision is that the evaluation
of γ is sufficient to prove a beneficial impact of the extensions and their new distance.
In other words, if we find some setting for γ that improves the results achieved by the
baseline (Table 2.2), we show that the extension is beneficial, without quantifying how
much benefit the task of Keyword Spotting (KWS) may ultimately get from these extensions.
Still, to reduce the risk of missing local extrema when searching for the optimum γ setting,
the set of constants that are evaluated for the parameter setting is considerably extended
(compared to the evaluated settings of τv , τe , α and β). We optimise the new parameter
γ ∈ [0, 1] in steps of 0.05, producing 20 optimisation results. After manually inspecting
the outcome, some parameter refinement for γ is done for each extension and dataset
individually, which is explained in more detail in the subsections of the next section 4.2.
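This coarse-to-fine search for γ can be sketched as follows. The function `evaluate_map` is a hypothetical stand-in for running the KWS system with a given γ on the validation set and returning the MAP:

```python
def optimise_gamma(evaluate_map, coarse_step=0.05, fine_step=0.01, window=0.05):
    """Coarse grid search over gamma in [0, 1], then local refinement."""
    coarse = [round(i * coarse_step, 2) for i in range(int(1 / coarse_step) + 1)]
    best = max(coarse, key=evaluate_map)
    # Refine around the coarse optimum with a finer step size.
    lo, hi = max(0.0, best - window), min(1.0, best + window)
    n = int(round((hi - lo) / fine_step))
    fine = [round(lo + i * fine_step, 2) for i in range(n + 1)]
    return max(fine, key=evaluate_map)

# Toy objective with a maximum at gamma = 0.72:
g = optimise_gamma(lambda gamma: -(gamma - 0.72) ** 2)
```

In the thesis the refinement window and step size are chosen manually per extension and dataset rather than with a fixed rule as above.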

4.2 Validation Results

4.2.1 Extension 1 - One edge to the furthest node

Results on AK

Inspecting the results for the 20 proposed values for γ (γ ∈ [0, 1] in steps of 0.05) on
the AK dataset, the best results for the Mean Average Precision (MAP) for extension 1
on the train set surpass the baseline by only a small margin. From γ = 0 to γ = 1, the
values for MAP keep improving, reaching the maximum at γ = 0.95 before dropping to
the reference value of the baseline at γ = 1. Expecting a further improvement of the
calculated MAP, the values tested in the promising area γ ∈ [0.9, 1] were refined in
steps of 0.01.

The results of all values for γ can be seen in Figure 4.2a, while Figure 4.2b shows the
results in the most promising area only.
[Figure: MAP (accuracy) plotted against γ. (a) Validation results for γ ∈ [0, 1]; (b) validation results for γ ∈ [0.8, 1]]

Figure 4.2: Impact of γ on validation results of Extension 1 on AK

In Table 4.2, the best performing value for γ is shown, as well as its results on the
validation set and test set. Besides that, the result of the baseline of [25] is shown on
both the validation and test set. The value γ = 1 in the baseline row is implied by the
non-use of the new extension. We can observe an increase in MAP on the validation set
by an absolute 1.3% for the best γ setting. However, the benefit of using extension 1 on
AK cannot be confirmed by the test results. From the absolute improvement of only 0.1%,
a positive impact of the extension on the AK manuscripts cannot be affirmed.

                     γ      MAP on val. set   MAP on test set
Extension 1          0.95   0.7048            0.7984
Reference system     1      0.6916            0.7975

Table 4.2: Validation and test results for extension 1 on AK
compared to the baseline

Results on BOT

When inspecting the results for γ (γ ∈ [0, 1] in steps of 0.05) on the BOT manuscripts, a
large range of settings for γ can be determined that have a beneficial impact on the MAP
on the validation set. From γ = 0 to γ = 1, the values for MAP keep improving, surpassing
the baseline at γ = 0.35 and reaching the maximum at γ = 0.75. However, the maximum
value for MAP at γ = 0.75 is not far beyond the reference value at γ = 1. Expecting a
further improvement of the calculated MAP, the values tested in the promising area
γ ∈ [0.6, 0.9] were refined in steps of 0.02. However, the refinement did not reveal any
further enhancement. The results of all values for γ can be seen in Figure 4.3a, while
Figure 4.3b shows the results in the most promising area only.

In Table 4.3, the best performing value for γ on BOT is shown, as well as its results
on the validation set and test set. Furthermore, Table 4.3 also presents the results of the
baseline of [25] on both the validation and test set. The value γ = 1 in the baseline
row is implied by the non-use of the new extension. We can observe an increase in
MAP on the validation set by an absolute 1.3% for the best γ setting.

[Figure: MAP (accuracy) plotted against γ. (a) Validation results for γ ∈ [0, 1]; (b) validation results for γ ∈ [0.35, 1]]

Figure 4.3: Impact of γ on validation results of Extension 1 on BOT

Furthermore, the benefit of using extension 1 on BOT can be confirmed by the test results.
An improvement of 1.1% (absolute) shows a positive impact of the extension on the BOT
manuscripts.

                     γ      MAP on val. set   MAP on test set
Extension 1          0.75   0.4577            0.5262
Reference system     1      0.4446            0.5149

Table 4.3: Validation and test results for extension 1 on BOT
compared to the baseline

4.2.2 Extension 2 - One edge to the furthest node - new cost function

Results on AK

The results for the evaluation of the 20 proposed values for γ (γ ∈ [0, 1] in steps of 0.05) of
extension 2 on the AK dataset surpass the baseline by only a small margin. From γ = 0
to γ = 1, the values for MAP keep improving, reaching the maximum at γ = 0.97 before
dropping to the reference value of the baseline at γ = 1. As in the prior extension 1,
we refined the parameters for γ in the promising area [0.9, 1] in steps of 0.01 in order
to get closer to the best setting for γ, given the other fixed parameters.

The results of all values for γ can be seen in Figure 4.4a, while Figure 4.4b shows the
results in the most promising area only.

[Figure: MAP (accuracy) plotted against γ. (a) Validation results for γ ∈ [0, 1]; (b) validation results for γ ∈ [0.9, 1]]

Figure 4.4: Impact of γ on validation results of Extension 2 on AK

In Table 4.4, the best performing value for γ with its results on the validation and test
set is compared to the results of the baseline of [25] on both sets. The value γ = 1
in the baseline row is implied by the non-use of the new extension. We can observe an
increase in MAP on the validation set by an absolute 1.3% for the best γ setting.
However, the benefit of using extension 2 on AK cannot be confirmed by the test
results. From the absolute improvement of only 0.2%, a positive impact of the extension
on the AK manuscripts cannot be affirmed. These results and analyses agree with the
conclusion of extension 1. Although we used a different cost function for the new edge
distances, we obtained very similar results, with a slight shift of the optimum value
towards γ = 1 in extension 2.

                     γ      MAP on val. set   MAP on test set
Extension 2          0.97   0.7044            0.7990
Reference system     1      0.6916            0.7975

Table 4.4: Validation and test results for extension 2 on AK
compared to the baseline

Results on BOT

When inspecting the results for γ (γ ∈ [0, 1] in steps of 0.05) on the BOT manuscripts,
we also observe a similar distribution of results as for the first extension. A large range
of settings for γ can be determined that have a beneficial impact on the MAP on the
validation set. From γ = 0 to γ = 1, the values for MAP keep improving, surpassing the
baseline at γ = 0.55 and reaching the maximum at γ = 0.76. Once again, the evaluated
values for γ are refined in steps of 0.02 in the promising area γ ∈ [0.6, 0.9]. Thanks to
the refinement, the new maximum at γ = 0.76 was discovered. However, the improvement
of the MAP at the new value is marginal.

[Figure: MAP (accuracy) plotted against γ. (a) Validation results for γ ∈ [0, 1]; (b) validation results for γ ∈ [0.5, 1]]

Figure 4.5: Impact of γ on validation results of Extension 2 on BOT



The results of all values for γ can be seen in Figure 4.5a, while Figure 4.5b shows the
results in the most promising area only.

In Table 4.5, the best performing value for γ on BOT is shown, as well as its results
on the validation set and test set. Furthermore, Table 4.5 also presents the results of the
baseline of [25] on both the validation and test set. The value γ = 1 in the baseline
row is implied by the non-use of the new extension. We can observe an increase in
MAP on the validation set by an absolute 1.9% for the best γ setting.
Furthermore, the benefit of using extension 2 on BOT can be confirmed by the test results.
An improvement of 1.1% (absolute) shows a positive impact of the extension on the BOT
manuscripts.

Thus, also for BOT, the results and analyses are largely consistent with the conclusion
of extension 1. We obtained very similar results, with a slight shift of the optimum value
towards γ = 1 in extension 2, meaning a small reduction of the influence of the additional
extension distance on the overall computed distance.

                     γ      MAP on val. set   MAP on test set
Extension 2          0.76   0.4638            0.5263
Reference system     1      0.4446            0.5149

Table 4.5: Validation and test results for extension 2 on BOT
compared to the baseline

4.2.3 Extension 3 - Three edges to the furthest nodes

Results on AK

The results for the evaluation of the 20 proposed values for γ (γ ∈ [0, 1] in steps of 0.05) of
extension 3 on the AK dataset can be inspected in Figure 4.6a. Again, the best result for
the MAP for extension 3 on the train set surpasses the baseline by only a small margin.
From γ = 0 to γ = 1, the values for MAP keep improving, reaching the maximum at
γ = 0.97 before dropping to the reference value of the baseline at γ = 1. Note the
further refinement of the parameters for γ in the promising area [0.9, 1] in steps of 0.01.
The results of the refinement values can be seen in more detail in Figure 4.6b.

[Figure: MAP (accuracy) plotted against γ. (a) Validation results for γ ∈ [0, 1]; (b) validation results for γ ∈ [0.9, 1]]

Figure 4.6: Impact of γ on validation results of Extension 3 on AK

In Table 4.6, the best performing value for γ and its results on the validation and test set
are compared to the results of the baseline of [25] on both sets.

                     γ      MAP on val. set   MAP on test set
Extension 3          0.99   0.7009            0.7977
Reference system     1      0.6916            0.7975

Table 4.6: Validation and test results for extension 3 on AK
compared to the baseline

The value γ = 1 in the baseline row is implied by the non-use of the new extension.
We can observe an increase in MAP on the validation set by an absolute 0.9% for the
best γ setting. However, once more, we do not see the same benefit from the extension
when inspecting the test results on AK. There, the absolute improvement is close to 0.
Therefore, we have to reject extension 3 as an improvement of the MAP on the AK
manuscripts.

Results on BOT

When inspecting the results for γ (γ ∈ [0, 1] in steps of 0.05) on the BOT manuscripts,
we observe a similar distribution of results as for the previous extensions. From γ = 0
to γ = 1, the values for MAP keep improving, surpassing the baseline at γ = 0.74 and
reaching the maximum at γ = 0.84. Although it is smaller than in the previous extensions,
the range of settings for γ that have a beneficial impact on the MAP on the validation set
remains large (γ ∈ [0.74, 1)). The evaluated values for γ are refined in steps of 0.02 in
the promising area γ ∈ [0.7, 0.9], which starts just before the point where the baseline is
surpassed. Thanks to the refinement, a new maximum at γ = 0.84 was discovered.
However, the improvement of the MAP at the new value γ = 0.84 compared to the
maximum before the refinement at γ = 0.75 is marginal.

The results of all values for γ can be seen in Figure 4.7a, while Figure 4.7b shows the
results in the most promising area only.

[Figure: MAP (accuracy) plotted against γ. (a) Validation results for γ ∈ [0, 1]; (b) validation results for γ ∈ [0.7, 1]]

Figure 4.7: Impact of γ on validation results of Extension 3 on BOT



In the summary of the results of extension 3 on the BOT manuscripts in Table 4.7, we can
observe an increase in MAP on the validation set by an absolute 1.3% for the best γ
setting. Furthermore, the benefit of using extension 3 on BOT can be confirmed by the
test results. An improvement of 1.6% (absolute) shows a positive impact of the extension
on the BOT manuscripts.

                     γ      MAP on val. set   MAP on test set
Extension 3          0.84   0.4580            0.5309
Reference system     1      0.4446            0.5149

Table 4.7: Validation and test results for extension 3 on BOT
compared to the baseline

4.2.4 Extension 4 - One edge to the furthest node in each quadrant

Results on AK

[Figure: MAP (accuracy) plotted against γ. (a) Validation results for γ ∈ [0, 1]; (b) validation results for γ ∈ [0.5, 1]]

Figure 4.8: Impact of γ on validation results of Extension 4 on AK



The results for the evaluation of the 20 proposed values for γ (γ ∈ [0, 1] in steps of 0.05) of
extension 4 on the AK dataset can be inspected in Figure 4.8a. From γ = 0 to γ = 1,
the values for MAP reach the maximum at γ = 0.72. They plateau at a high value of
about 0.705 for a large range of settings before dropping to the reference value of the
baseline at γ = 1. Even the parameter refinement for γ in the promising area [0.6, 1] in
steps of 0.02 did not reveal any significant maxima. The results of the refinement values
can be seen in more detail in Figure 4.8b.

In Table 4.8, the best performing value for γ and its results on the validation and test
set are compared to the results of the baseline of [25] on both sets. The value γ = 1
in the baseline row is implied by the non-use of the new extension. We can observe an
increase in MAP on the validation set by an absolute 1.4% for the best γ setting.
However, when inspecting the test results on AK, we do not see the same beneficial
impact of γ and the new distance: the absolute improvement in MAP remains at 0.6%.

                     γ      MAP on val. set   MAP on test set
Extension 4          0.72   0.7056            0.8030
Reference system     1      0.6916            0.7975

Table 4.8: Validation and test results for extension 4 on AK
compared to the baseline

Results on BOT

The results for γ (γ ∈ [0, 1] in steps of 0.05) on the BOT manuscripts show a similar
distribution as in the extensions before. From γ = 0 to γ = 1, the values for MAP keep
improving, surpassing the baseline at γ = 0.5 and reaching the maximum at γ = 0.7.
After reaching the maximum, the results for MAP decrease with growing γ. The range
of settings for γ that have a beneficial impact on the MAP on the validation set is the
largest among all evaluated extensions (γ ∈ [0.5, 1)). The evaluated values for γ are
refined in steps of 0.02 for γ ∈ [0.6, 0.9]. However, no further improvement of the MAP
could be found within the newly evaluated settings for γ.

The results of all values for γ can be seen in Figure 4.9a, while Figure 4.9b shows the
results in the most promising area only.

[Figure: MAP (accuracy) plotted against γ. (a) Validation results for γ ∈ [0, 1]; (b) validation results for γ ∈ [0.45, 1]]

Figure 4.9: Impact of γ on validation results of Extension 4 on BOT

In the summary of the results of extension 4 on the BOT manuscripts in Table 4.9, we can
observe an increase in MAP on the validation set by an absolute 1.2% for the best γ
setting. Although the test results do not show an improvement quite as high as on the
train set, the benefit of using extension 4 on BOT can be confirmed by the test results
with 0.7% (absolute). Thus, a positive impact on the BOT manuscripts can be attested
to extension 4.

                     γ      MAP on val. set   MAP on test set
Extension 4          0.7    0.4564            0.5221
Reference system     1      0.4446            0.5149

Table 4.9: Validation and test results for extension 4 on BOT
compared to the baseline

4.3 Summary

In this section, we want to bring together the validation results of all extensions and
compare their beneficial impact on the two datasets AK and BOT.

In Table 4.10, the results of the baseline as well as the results of the four proposed
extensions on the AK dataset are summarised. Furthermore, the absolute and relative
gains in MAP compared to the baseline are listed. We can see a relatively small absolute
gain for all extensions. However, extension 4 provides by far the best result for AK
with an absolute gain of 0.55% in MAP. Comparing the relative gain, extension 4 reaches
0.68%. The separation into quadrants and the comparison of the furthest nodes for each
quadrant individually seem to have a positive impact on the KWS task. However, merely
measuring the substitution distance of an edge to the furthest node(s) while ignoring the
individual quadrants (extensions 1, 2, and 3) cannot be confirmed as a beneficial addition
to the distance measure of the baseline.
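As a reminder of the mechanism behind extension 4, its additional edges can be sketched as follows. This is an illustrative reimplementation, not the thesis code; in particular, the assignment of boundary cases (dx = 0 or dy = 0) to a quadrant is an assumption here:

```python
def furthest_node_per_quadrant(nodes):
    """For each keypoint, find the furthest keypoint in each of the four
    quadrants of the local coordinate system centred on that keypoint.

    `nodes` is a list of (x, y) coordinates; the result maps each node
    index to a dict quadrant -> index of the furthest node in it.
    """
    def quadrant(dx, dy):
        # Boundary convention (>= 0) is an assumption of this sketch.
        return (dx >= 0, dy >= 0)

    extra_edges = {}
    for i, (xi, yi) in enumerate(nodes):
        best = {}
        for j, (xj, yj) in enumerate(nodes):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            dist = (dx * dx + dy * dy) ** 0.5
            q = quadrant(dx, dy)
            if q not in best or dist > best[q][0]:
                best[q] = (dist, j)
        extra_edges[i] = {q: j for q, (d, j) in best.items()}
    return extra_edges

edges = furthest_node_per_quadrant([(0, 0), (2, 1), (-3, 2), (1, -4)])
```

Extensions 1 to 3, in contrast, pick the furthest node(s) over all quadrants combined.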

                   γ      MAP for γ   absolute ±   relative ±
Reference system   1      0.7975      0            0
Extension 1        0.95   0.7984      0.09 %       0.11 %
Extension 2        0.97   0.7990      0.15 %       0.19 %
Extension 3        0.99   0.7977      0.02 %       0.03 %
Extension 4        0.72   0.8030      0.55 %       0.68 %

Table 4.10: Test results of all extensions on AK compared to the baseline.
With ± we indicate the percental gain or loss in the MAP when compared to
the baseline. The best result per column is shown in bold face.

The gains in MAP of extensions 1 to 3 are very small and most likely noise. This
assessment is supported by the γ values that originate from the parameter validation.
The γ values of extensions 1, 2, and 3 are all close to 1, meaning the new distance has
almost no influence on the computed Graph Edit Distance (GED) for the matching of two
word graphs. In extension 4, however, we observe a significantly smaller value of γ = 0.72,
supporting the statement of a positive impact of the extension on the MAP.

In Table 4.11, a summary of the results of the four extensions on the BOT dataset is given.
We can observe greater improvements than on AK for all four extensions. Extension 3
provides by far the best result for BOT with an absolute gain of 1.6% (or a relative gain
of 3.01%) in MAP. However, extension 4, the best performing extension on AK, shows
the smallest absolute and relative improvement on this dataset. This allows us to conclude
that, on the one hand, measuring the substitution distance of an edge to the furthest
node(s) while ignoring the individual quadrants is better suited for the BOT dataset, and
that, on the other hand, the distinction between edges to the furthest node in each quadrant
does not achieve the same improvements in MAP here.

                   γ      MAP for γ   absolute ±   relative ±
Reference system   1      0.5149      0            0
Extension 1        0.75   0.5262      1.13 %       2.15 %
Extension 2        0.76   0.5263      1.14 %       2.17 %
Extension 3        0.84   0.5309      1.60 %       3.01 %
Extension 4        0.7    0.5221      0.72 %       1.38 %

Table 4.11: Test results of all extensions on BOT compared to the baseline.
With ± we indicate the percental gain or loss in the MAP when compared to
the baseline. The best result per column is shown in bold face.

When comparing the results on the two datasets, we have to consider the large discrepancy
between the MAP results of the baseline. The relatively small MAP values of the baseline
on the BOT dataset generally leave more room for improvements by the extensions, while
the better performance of the baseline on AK limits the room for improvement. In fact,
when comparing the relative improvements in terms of error correction, the performance
of extension 4 on AK (2.79%) gets close to the performance of extension 3 on BOT (3.41%).
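The three ways of reporting improvement used above can be reproduced directly from the MAP values. Note that the last metric is computed here from the rounded published figures, so it may deviate slightly from the percentages quoted in the text:

```python
def gains(baseline_map, extension_map):
    """Absolute gain, relative gain, and relative error reduction (all in %)."""
    absolute = (extension_map - baseline_map) * 100
    relative = (extension_map - baseline_map) / baseline_map * 100
    # Error correction: improvement relative to the remaining error 1 - MAP.
    error_reduction = (extension_map - baseline_map) / (1 - baseline_map) * 100
    return absolute, relative, error_reduction

# Extension 4 on AK (baseline MAP 0.7975, extension MAP 0.8030):
abs_ak, rel_ak, err_ak = gains(0.7975, 0.8030)
```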
5 Conclusion and Future Work

In the last decades, many handwritten historical documents have been digitised and made
publicly available. However, the accessibility of these documents is often limited, and
thus, a certain gap between availability and accessibility can be observed. One reason for
this might be the rather large variations (in style, size, and signs of degradation) of
handwritten ancient documents, which often make an automatic full transcription infeasible.

To bridge this gap, Keyword Spotting (KWS) has been proposed as a flexible and more
error-tolerant alternative to full transcriptions. KWS allows to retrieve arbitrary keywords
in a given document and therefore increases the accessibility of these documents. While
many KWS approaches are based on a statistical representation of certain characteristics
of the handwriting, we observe only few and rather limited approaches that use a structural
representation of the document's content by means of strings, trees, or graphs. This is
rather surprising, as graphs offer some inherent advantages when compared with feature
vectors. In particular, graphs are able to adapt their size and structure to the complexity
of the underlying pattern. Moreover, graphs are able to represent binary relationships
among the subparts of the pattern. Both of these characteristics are highly beneficial for
representing handwriting. As a result, we see a large potential for graph-based KWS.
However, graph-based KWS is currently rarely used in real-world applications. This
is due to the large computational power that is required, which comes with the
complexity of the task of comparing graphs with each other.

For comparing graphs, Graph Edit Distance (GED) has been proposed. GED is a flexible
and powerful paradigm for inexact graph matching that basically measures the minimum
amount of distortion needed to transform one graph into another, given a set of edit
operations. However, GED entails the issue of time complexity, since its runtime is
exponential with respect to the number of nodes compared. To overcome this limitation,
a number of fast suboptimal algorithms for GED with cubic (Bipartite Graph Edit
Distance (BP)) or quadratic time complexity (Hausdorff Edit Distance (HED)) have been
proposed in recent years. Many variations of these two heuristics have been compared
in terms of performance and runtime on different datasets in a recently published thesis
[25].
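The quadratic complexity of HED stems from its Hausdorff-like structure: every node of one graph is only compared against every node of the other (plus deletion and insertion), without enforcing a bijective assignment. A simplified node-only sketch in the spirit of [24] follows; the constant node deletion/insertion cost tau_v and the Euclidean substitution cost are illustrative assumptions, and the full method also accounts for edge costs:

```python
import math

def hed_nodes(nodes1, nodes2, tau_v=1.0):
    """Simplified Hausdorff Edit Distance over node coordinates only.

    Each node is matched to the cheapest option on the other side:
    deletion/insertion (cost tau_v) or substitution (half the Euclidean
    distance, since each substitution is counted from both directions).
    Runtime is O(|nodes1| * |nodes2|).
    """
    def directed(src, dst):
        total = 0.0
        for u in src:
            best = tau_v  # deleting (or inserting) u
            for v in dst:
                best = min(best, math.dist(u, v) / 2)
            total += best
        return total

    return directed(nodes1, nodes2) + directed(nodes2, nodes1)

d = hed_nodes([(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0)])
```

Because no bijection is enforced, the result is a lower bound of the exact GED rather than the GED itself.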

From the mentioned thesis, we pick up the experimental setup for keypoint graphs
and HED, one of the most promising approaches for KWS in historical manuscripts.
Furthermore, we take over two datasets on which the performance of the approaches is
tested and which were part of a recent KWS benchmark competition [26], viz. Alvermann
Konzilsprotokolle (AK) and Botany (BOT). We adopt the graph preprocessing steps and
extend the resulting keypoint graphs with new edges. In addition to the adapted graphs,
the cost function of the edit operations for the GED is adjusted to be able to cope with
the new edges. After matching a query graph with a number of document graphs by
computing the HED for each pair of graphs, the resulting set of graph dissimilarities is
used as a retrieval index. In the best possible case, this retrieval index lists all
instances of a keyword as its top results.
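Sketched in code, the retrieval step amounts to ranking all word graphs of a document by their dissimilarity to the query; `graph_distance` is a hypothetical stand-in for the HED computation:

```python
def retrieval_index(query, word_graphs, graph_distance):
    """Rank word graphs by increasing dissimilarity to the query graph.

    Returns a list of (word_id, distance) pairs; in the ideal case all
    instances of the query keyword appear at the top of this list.
    """
    scored = [(word_id, graph_distance(query, g)) for word_id, g in word_graphs]
    return sorted(scored, key=lambda pair: pair[1])

# Toy example with integers standing in for graphs and |a - b| as distance:
index = retrieval_index(5, [("w1", 9), ("w2", 5), ("w3", 6)], lambda a, b: abs(a - b))
```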

As the baseline, we adopt the Mean Average Precision (MAP) results of the KWS
framework based on HED on AK and BOT of [25]. We compare the results of the
baseline to four approaches that all extend the former keypoint graph representations
with additional edges to existing nodes in the graph and expand the corresponding cost
function. The experimental evaluation of the present thesis investigates the benefits of
these extensions by evaluating a weighting parameter for the baseline and the newly
added distance. Note that the former parameters in the cost functions were optimized
without respect to the altered cost functions and therefore do not necessarily provide
optimum results. However, the core aspect of this thesis is to show that improvement of
the KWS task of the baseline in terms of MAP is possible by adding edges to the initial
keypoint graphs and not to find a global optimum parameter setting.

In the experiments of the present thesis, we present MAP results for four graph extensions.
Although showing a rather small improvement compared to the baseline, the experiments
allow the conclusion that supporting the baseline framework with further preprocessed
graphs is beneficial for the KWS task.

To support the indication of a beneficial impact of the proposed extensions, further
experiments are needed. On the one hand, global optimization of the weighting parameter
γ is likely to reveal new maxima in terms of the MAP result. This optimization requires
large resources in terms of implementation, computation, and evaluation that go beyond
the scope of this thesis, but that could lead to better results and a stronger indication of
the beneficial impact of the extensions. On the other hand, considering the deletion and
insertion of the newly added edges with penalty costs could reveal further improvements.
Allowing these operations also for the new edges makes the extensions more flexible and
brings the experiments closer to the core idea of error-tolerance in KWS and GED, which
is what makes them so powerful and adaptive for handwriting recognition in the first place.

In this thesis, we have shown that information about the structure of a graph, integrated
into its components (nodes), can be beneficial for the KWS task. Although the extensions
evaluated in this thesis make only a small contribution to the improvement of GED
heuristics, they seem to be going in the right direction.
Bibliography

[1] C. M. Bishop, Pattern recognition and machine learning. Springer, 2006.

[2] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing
human-level performance on imagenet classification,” in Proceedings of the IEEE
international conference on computer vision, 2015, pp. 1026–1034.

[3] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun,


“Dermatologist-level classification of skin cancer with deep neural networks,” Na-
ture, vol. 542, no. 7639, p. 115, 2017.

[4] J. Edwards, Y. W. Teh, R. Bock, M. Maire, G. Vesom, and D. A. Forsyth, “Making


latin manuscripts searchable using gHMM’s,” in Advances in Neural Information
Processing Systems, 2005, pp. 385–392.

[5] S. N. Srihari, “Recognition of handwritten and machine-printed text for postal


address interpretation,” Pattern recognition letters, vol. 14, no. 4, pp. 291–302,
1993.

[6] B. Horst and I. Sebastiano, Automatic bankcheck processing. World Scientific,


1997, vol. 28.

[7] A.-M. Awal, H. Mouchere, and C. Viard-Gaudin, “Towards handwritten mathemati-


cal expression recognition,” in 2009 10th International Conference on Document
Analysis and Recognition. IEEE, 2009, pp. 1046–1050.

[8] A. Baro, P. Riba, and A. Fornés, “Towards the recognition of compound music notes
in handwritten music scores,” in 2016 15th International Conference on Frontiers
in Handwriting Recognition (ICFHR). IEEE, 2016, pp. 465–470.

[9] A. Fischer, “Handwriting recognition in historical documents,” Ph.D. dissertation,


2012.

[10] A. Antonacopoulos and A. C. Downton, “Special issue on the analysis of historical


documents,” International Journal on Document Analysis and Recognition, vol. 9,
no. 2, pp. 75–77, 2007.

[11] A. Fischer, A. Keller, V. Frinken, and H. Bunke, “HMM-based word spotting


in handwritten documents using subword models,” in 2010 20th International
Conference on Pattern Recognition. IEEE, 2010, pp. 3416–3419.

[12] S.-S. Kuo and O. E. Agazzi, “Keyword spotting in poorly printed documents using
pseudo 2-d hidden markov models,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 16, no. 8, pp. 842–848, 1994.

[13] R. Manmatha, C. Han, and E. M. Riseman, “Word spotting: A new approach to


indexing handwriting,” in Proceedings CVPR IEEE Computer Society Conference
on Computer Vision and Pattern Recognition. IEEE, 1996, pp. 631–637.

[14] R. C. Rose and D. B. Paul, “A hidden markov model based keyword recognition
system,” in International Conference on Acoustics, Speech, and Signal Processing.
IEEE, 1990, pp. 129–132.

[15] T. M. Rath and R. Manmatha, “Word image matching using dynamic time warping,”
in 2003 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, 2003. Proceedings., vol. 2, June 2003, pp. II–II.

[16] J. A. Rodriguez-Serrano and F. Perronnin, “Handwritten word-spotting using


hidden markov models and universal vocabularies,” Pattern Recognition,
vol. 42, no. 9, pp. 2106 – 2116, 2009. [Online]. Available: http:
//www.sciencedirect.com/science/article/pii/S0031320309000673

[17] V. Frinken, A. Fischer, R. Manmatha, and H. Bunke, “A novel word spotting method
based on recurrent neural networks,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 34, no. 2, pp. 211–224, Feb 2012.

[18] D. Conte, P. Foggia, C. Sansone, and M. Vento, “Thirty years of graph matching
in pattern recognition,” International journal of pattern recognition and artificial
intelligence, vol. 18, no. 03, pp. 265–298, 2004.

[19] V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and


reversals,” in Soviet physics doklady, vol. 10, no. 8, 1966, pp. 707–710.

[20] H. Bunke and G. Allermann, “Inexact graph matching for structural pattern recog-
nition,” Pattern Recognition Letters, vol. 1, no. 4, pp. 245–253, 1983.

[21] M. Eshera and K.-S. Fu, “A graph distance measure for image analysis,” IEEE
transactions on systems, man, and cybernetics, no. 3, pp. 398–408, 1984.

[22] K. Riesen, “Structural pattern recognition with graph edit distance,” Advances in
computer vision and pattern recognition, Cham, 2015.

[23] K. Riesen, M. Neuhaus, and H. Bunke, “Bipartite graph matching for computing
the edit distance of graphs,” in International Workshop on Graph-Based Represen-
tations in Pattern Recognition. Springer, 2007, pp. 1–12.

[24] A. Fischer, C. Y. Suen, V. Frinken, K. Riesen, and H. Bunke, “Approximation of


graph edit distance based on hausdorff matching,” Pattern Recognition, vol. 48,
no. 2, pp. 331–343, 2015.

[25] M. Stauffer, “Graph-based keyword spotting in handwritten historical documents,”


In print.

[26] I. Pratikakis, K. Zagoris, B. Gatos, J. Puigcerver, A. H. Toselli, and E. Vidal,


“ICFHR2016 Handwritten Keyword Spotting Competition (H-KWS 2016),” in 2016
15th International Conference on Frontiers in Handwriting Recognition (ICFHR),
Oct 2016, pp. 613–618.

[27] A. Fischer, K. Riesen, and H. Bunke, “Graph similarity features for HMM-based
handwriting recognition in historical documents,” in 2010 12th International Con-
ference on Frontiers in Handwriting Recognition, Nov 2010, pp. 253–258.

[28] A. Fischer, E. Indermühle, H. Bunke, G. Viehhauser, and M. Stolz, “Ground truth


creation for handwriting recognition in historical documents,” in Proceedings
of the 9th IAPR International Workshop on Document Analysis Systems, ser.
DAS ’10. New York, NY, USA: ACM, 2010, pp. 3–10. [Online]. Available:
http://doi.acm.org/10.1145/1815330.1815331

[29] U. V. Marti and H. Bunke, “Using a statistical language model to improve the
performance of an HMM-based cursive handwriting recognition system,” in
Hidden Markov Models, pp. 65–90. [Online]. Available: https:
//www.worldscientific.com/doi/abs/10.1142/9789812797605_0004

[30] R. van den Boomgaard and R. van Balen, “Methods for fast morphological
image transforms using bitmapped binary images,” CVGIP: Graphical Models
and Image Processing, vol. 54, no. 3, pp. 252–258, 1992. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/1049965292900553

[31] Z. Guo and R. W. Hall, “Parallel thinning with two-subiteration algorithms,”
Commun. ACM, vol. 32, no. 3, pp. 359–373, Mar. 1989. [Online]. Available:
http://doi.acm.org/10.1145/62065.62074

[32] M. C. Boeres, C. C. Ribeiro, and I. Bloch, “A randomized heuristic for scene
recognition by graph matching,” in International Workshop on Experimental and
Efficient Algorithms. Springer, 2004, pp. 100–113.

[33] M. Neuhaus, K. Riesen, and H. Bunke, “Fast suboptimal algorithms for the compu-
tation of graph edit distance,” in Joint IAPR International Workshops on Statistical
Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern
Recognition (SSPR). Springer, 2006, pp. 163–172.

[34] K. Zhang and T. Jiang, “Some MAX SNP-hard results concerning unordered labeled
trees,” Information Processing Letters, vol. 49, no. 5, pp. 249–254, 1994.

[35] S. M. Selkow, “The tree-to-tree editing problem,” Information Processing Letters,
vol. 6, no. 6, pp. 184–186, 1977.

[36] W.-H. Tsai and K.-S. Fu, “Error-correcting isomorphisms of attributed relational
graphs for pattern analysis,” IEEE Transactions on Systems, Man, and Cybernetics,
vol. 9, no. 12, pp. 757–768, 1979.

[37] K. Riesen, A. Fischer, and H. Bunke, “Estimating graph edit distance using lower
and upper bounds of bipartite approximations,” International Journal of Pattern
Recognition and Artificial Intelligence, vol. 29, no. 02, p. 1550011, 2015.

[38] J. Munkres, “Algorithms for the assignment and transportation problems,” Journal
of the Society for Industrial and Applied Mathematics, vol. 5, no. 1, pp. 32–38, 1957.

[39] K. Riesen and H. Bunke, “Approximate graph edit distance computation by means
of bipartite graph matching,” Image and Vision Computing, vol. 27, no. 7, pp.
950–959, 2009.

[40] D. P. Huttenlocher, W. J. Rucklidge, and G. A. Klanderman, “Comparing images
using the Hausdorff distance under translation,” in Proceedings 1992 IEEE Com-
puter Society Conference on Computer Vision and Pattern Recognition. IEEE,
1992, pp. 654–656.

List of Acronyms

PR Pattern Recognition

ML Machine Learning

AI Artificial Intelligence

HWR Handwriting Recognition

DTW Dynamic Time Warping


HMM Hidden Markov Model

RNN Recurrent Neural Network

KWS Keyword Spotting

AK Alvermann Konzilsprotokolle

BOT Botany

DoG Difference of Gaussians

GED Graph Edit Distance

BP Bipartite Graph Edit Distance

HED Hausdorff Edit Distance

LSAP Linear Sum Assignment Problem

AP Average Precision

MAP Mean Average Precision
