
THE KLUWER INTERNATIONAL SERIES

IN ENGINEERING AND COMPUTER SCIENCE


ONTOLOGY
LEARNING FOR THE
SEMANTIC WEB

by

Alexander Maedche
University of Karlsruhe, Germany

SPRINGER SCIENCE+BUSINESS MEDIA, LLC


Library of Congress Cataloging-in-Publication Data

Maedche, Alexander D.
Ontology learning for the semantic Web / by Alexander D. Maedche.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-4613-5307-2 ISBN 978-1-4615-0925-7 (eBook)
DOI 10.1007/978-1-4615-0925-7
1. Web site development. 2. Metadata. 3. Ontology. 4. Artificial intelligence. I. Title.

TK5105.888 .M33 2002


005.2'76--dc21
2001058188

Copyright © 2002 by Springer Science+Business Media New York


Originally published by Kluwer Academic Publishers in 2002
Softcover reprint of the hardcover 1st edition 2002
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system or transmitted in any form or by any means, mechanical, photo-copying, recording, or
otherwise, without the prior written permission of the publisher.

Printed on acid-free paper.


Contents

List of Figures ix
List of Tables xiii
Preface xv
Acknowledgements xviii
Foreword by R. Studer xix

Part I Fundamentals
1. INTRODUCTION 3
1 Motivation & Problem Description 3
2 Research Questions 4
3 Reader's Guide 6

2. ONTOLOGY - DEFINITION & OVERVIEW 11


1 Ontologies for Communication - A Layered Approach 15
2 Development & Application of Ontologies 21
3 Conclusion 25
3. LAYERED ONTOLOGY ENGINEERING 29
1 Ontology Engineering Framework 30
2 Layered Representation 34
3 Conclusion 49
3.1 Further Topics in Ontology Engineering 50
3.2 Ontology Learning for Ontology Engineering 51

Part II Ontology Learning for the Semantic Web


4. ONTOLOGY LEARNING FRAMEWORK 59
1 A Taxonomy of Relevant Data for Ontology Learning 60
2 An Architecture for Ontology Learning 66
2.1 Overview of the Architecture Components 66
2.2 Ontology Engineering Workbench ONTOEDIT 68
2.3 Data Import & Processing Component 70
2.4 Algorithm Library 71
2.5 Graphical User Interface & Management Component 72
3 Phases of Ontology Learning 73
3.1 Import & Reuse 74
3.2 Extract 75
3.3 Prune 76
3.4 Refine 77
4 Conclusion 78
5. DATA IMPORT & PROCESSING 81
1 Importing & Processing Existing Ontologies 83
1.1 Ontology Wrapper & Import 84
1.2 FCA-MERGE - Bottom-Up Ontology Merging 85
2 Collecting, Importing & Processing Documents 95
2.1 Ontology-focused Document Crawling 95
2.2 Shallow Text Processing using SMES 97
2.3 Semi-Structured Document Wrapper 105
2.4 Transforming Data into Relational Structures 107
3 Conclusion 112
3.1 Language Processing for Ontology Learning 112
3.2 Ontology Learning from Web Documents 113
3.3 (Multi-)Relational Data 114
6. ONTOLOGY LEARNING ALGORITHMS 117
1 Algorithms for Ontology Extraction 118
1.1 Lexical Entry Extraction 118
1.2 Taxonomy Extraction 122
1.3 Non-Taxonomic Relation Extraction 130
2 Algorithms for Ontology Maintenance 140
2.1 Ontology Pruning 140
2.2 Ontology Refinement 142

3 Conclusion 144
3.1 Multi-Strategy Learning 145
3.2 Taxonomic vs. Non-Taxonomic Relations 145
3.3 A Note on Learning Axioms - AO 146

Part III Implementation & Evaluation


7. THE TEXT-TO-ONTO ENVIRONMENT 151
1 Component-based Architecture 153
2 Ontology Engineering Environment ONTOEDIT 154
3 Components for Ontology Learning 163
4 Conclusion 168
8. EVALUATION 171
1 The Evaluation Approach 172
2 Ontology Comparison Measures 173
2.1 Precision and Recall 174
2.2 Lexical Comparison Level Measures 175
2.3 Conceptual Comparison Level Measures 177
3 Human Performance Evaluation 183
3.1 Ontology Engineering Evaluation Study 184
3.2 Human Evaluation - Precision and Recall 185
3.3 Human Evaluation - Lexical Comparison Level 187
3.4 Human Evaluation - Conceptual Comparison Level 188
4 Ontology Learning Performance Evaluation 190
4.1 The Evaluation Setting 191
4.2 Evaluation of Lexical Entry Extraction 191
4.3 Evaluation of Concept Hierarchy Extraction 193
4.4 Evaluation of Non-Taxonomic Relation Extraction 194
5 Conclusion 196
5.1 Application-oriented Evaluation 197
5.2 Standard Datasets for Evaluation 198

Part IV Related Work & Outlook


9. RELATED WORK 203
1 Related Work on Ontology Engineering 204
2 Related Work on Frameworks of KA & ML 209
3 Related Work on Data Import & Processing 212
4 Related Work on Algorithms 214
5 Related Work on Evaluation 219
10. CONCLUSION & OUTLOOK 223
1 Contributions 223
2 Insights into Ontology Learning 224
3 Unanswered Questions 225
4 Future Research 226
References 228
Index 242
List of Figures

1.1 Reading this Book 8


2.1 The Meaning Triangle 14
2.2 Ontologies for Communication 15
2.3 Example: Instantiated Ontology Structure 19
2.4 Different Kinds of Ontologies 22
2.5 Relational Metadata on the Semantic Web 24
3.1 Layered Ontology Engineering 31
3.2 Representation Layers 35
3.3 An RDF Example 36
3.4 An RDF-Schema Example 38
3.5 An Example for the RDF-Schema Serialization Syntax 40
3.6 XML Serialization of RDF Instances 40
3.7 OntoEdit Representation Vocabulary 41
3.8 A Concrete Representation of the Lexicon 42
3.9 OIL Extensions of RDF(S) 42
3.10 Extending RDF(S) using Semantic Patterns 46
4.1 Taxonomy of Relevant Data for Ontology Learning 60
4.2 Architecture for Ontology Learning 67
4.3 OntoEdit Screenshot 69
4.4 Ontology Learning Cycle 73
5.1 Import and Processing Modules 82
5.2 WordNet and GermaNet Example 86
5.3 Ontology Merging Method 88
5.4 Two Example Contexts K1 and K2 90
5.5 The Pruned Concept Lattice 91

5.6 Natural Language Processing System Architecture 98


5.7 Example SMES Output - Morphological Component 101
5.8 Dependency Grammar Description 102
5.9 Example SMES Output - Underspecified Dependency
Structure (abbreviated) 103
5.10 Example for a Heuristic Concept Relation 105
5.11 Example Normalized Dictionary Entry in RDF 106
5.12 Concept Matrix Generation View 111
6.1 Hierarchy Clustering with Labeling 127
6.2 Example Pattern for Dictionary Descriptions 129
6.3 Dictionary-based Extracted Concept Hierarchy 130
6.4 An Example Concept Taxonomy as Background Knowl-
edge for Non-Taxonomic Relation Extraction 136
6.5 Hierarchical Order on Extracted Non-Taxonomic Relations 138
7.1 TEXT-TO-ONTO Components 153
7.2 TEXT-TO-ONTO Ontology Learning Environment 154
7.3 OntoEdit's View for Lexical Layer Definition 155
7.4 View for Modeling Concepts and Taxonomic Relations 156
7.5 Views for Modeling Non-Taxonomic Relations 157
7.6 View for Modeling Inverse Relations 158
7.7 View for Modeling Disjoint Concepts 159
7.8 ONTOEDIT's Knowledge Base View 160
7.9 F-Logic Axiom Engineering 161
7.10 View for Querying the SiLRI F-Logic Inference Engine 161
7.11 View for Data Selection and Processing 163
7.12 Graphical Interface for Pattern Engineering 164
7.13 Non-Taxonomic Relation Extraction Algorithm View 165
7.14 Result Presentation View 167
7.15 Graph-based Visualization 167
8.1 Levels for Evaluating Ontology Learning 173
8.2 Introduction of Precision and Recall 174
8.3 Example for Computing SC 178
8.4 Two Example Ontologies O1, O2 179
8.5 Example for Computing DC and CM 181
8.6 Two Example Ontologies O1, O2 183
8.7 Measuring Human Modeling Performance 183
8.8 Precision and Recall for Lexical Entry Modeling 185

8.9 Precision and Recall for Concept Hierarchy Modeling 186


8.10 Precision and Recall for Non-Taxonomic Relation Modeling 187
8.11 Measuring Ontology Learning Performance 190
8.12 Precision and Recall for Lexical Entry Extraction 192
8.13 Precision and Recall for Taxonomic Relations Discovery 193
8.14 TO of Discovered Taxonomic Relations 194
8.15 Precision and Recall of Non-Taxonomic Relation Discovery 196
9.1 Taxonomy of Related Work 205
List of Tables

3.1 Mapping of O and KB to F-Logic 48


5.1 Building an Ontology Wrapper for GermaNet 85
5.2 Example Lexical Entry-Lexical Entry Relation 108
5.3 Example Concept/Lexical Entry-Concept Relation 110
5.4 Example Document-Concept Relation 111
5.5 Example Concept-Transaction Relation 112
5.6 Document Structure Profile 114
6.1 Example Matrix r ee 126
6.2 Examples for Linguistically Related Pairs of Concepts 136
6.3 Examples of Discovered Non-Taxonomic Relations 137
6.4 Example matrix r de 143
8.1 Basic Statistics - Phase I / Phase II / Phase III 185
8.2 Precision and Recall for Non-Taxonomic Relation Modeling 186
8.3 SM(CCi, CCj), SM(CRi, CRj) for Phase I-Ontologies. 187
8.4 Typical String Matches 188
8.5 TO(Oi, Oj), RO(Oi, Oj) for Phase I-Ontologies. 189
8.6 TO(Oi, Oj), RO(Oi, Oj) for Phase II-Ontologies. 189
8.7 RO(Oi, Oj) for Phase III-Ontologies. 189
8.8 Number of Proposed Lexical Entries 192
8.9 Evaluation Results for Non-Taxonomic Relation Extraction 195
9.1 Example Categorization 216
Preface

The Web in its current form is an impressive success with a growing number
of users and information sources. However, the growing complexity of the Web
is not reflected in the current state of Web technology. The heavy burden of
accessing, extracting, interpreting and maintaining information is left to the human user.
Tim Berners-Lee, the inventor of the WWW, coined the vision of a Semantic
Web in which background knowledge on the meaning of Web resources is stored
through the use of machine-processable (meta-)data. The Semantic Web should
bring structure to the content of Web pages, being an extension of the current
Web, in which information is given a well-defined meaning. Thus, the Semantic
Web will be able to support automated services based on these descriptions of
semantics. These descriptions are seen as a key factor to finding a way out of
the growing problems of traversing the expanding Web space, where most Web
resources can currently only be found through syntactic matches (e.g., keyword
search).
Ontologies have been shown to be the right answer to these structuring and mod-
eling problems by providing a formal conceptualization of a particular domain
that is shared by a group of people. Thus, in the context of the Semantic Web,
ontologies describe domain theories for the explicit representation of the seman-
tics of the data. The Semantic Web relies heavily on these formal ontologies that
structure underlying data enabling comprehensive and transportable machine
understanding. Though ontology engineering tools have matured over the last
decade, the manual building of ontologies still remains a tedious, cumbersome
task which can easily result in a knowledge acquisition bottleneck. The suc-
cess of the Semantic Web strongly depends on the proliferation of ontologies,
which requires that the engineering of ontologies be completed quickly and
easily. When using ontologies as a basis for Semantic Web applications, one
has to face exactly this issue and in particular questions about development
time, difficulty, confidence and the maintenance of ontologies. Thus, what one
ends up with is similar to what knowledge engineers have dealt with over the
last two decades when elaborating methodologies for knowledge acquisition
or workbenches for defining knowledge bases. A method which has proven to
be extremely beneficial for the knowledge acquisition task is the integration of
knowledge acquisition with machine learning techniques.
This book is based on the idea of applying knowledge discovery to multiple
data sources to support the task of developing and maintaining ontologies.
The notion of Ontology Learning aims at the integration of a multitude of
disciplines in order to facilitate the construction of ontologies, in particular
machine learning. Ontology Learning greatly facilitates the construction of
ontologies by the ontology engineer. The vision of Ontology Learning that
is proposed here includes a number of complementary disciplines that feed
on different types of unstructured and semi-structured data in order to support
a semi-automatic ontology engineering process. Because the fully automatic
acquisition of knowledge by machines remains in the distant future, the overall
process is considered to be semi-automatic with human intervention. It relies
on the "balanced cooperative modeling" paradigm, describing a coordinated
interaction between human modeler and learning algorithm for the construction
of ontologies for the Semantic Web. This objective in mind, an approach that
combines ontology engineering with machine learning is described, feeding on
the resources that we nowadays find on the Web.
This book is split into four parts: In the first part the basics on the history of
ontologies, as well as their engineering and embedding into applications for the
Semantic Web are systematically introduced. This portion of the book includes
a formal definition of what an ontology is and a collection of ontology-based
application examples in the Semantic Web. Subsequently, a layered ontology
engineering framework is introduced. The framework uses a layered repre-
sentation of ontologies based on W3C standards such as RDF(S) and its cur-
rent extensions being created by the knowledge engineering and representation
community. The second part establishes a generic framework for Ontology
Learning for the Semantic Web. It discusses a wide range of different types of
existing data on the current Web relevant to Ontology Learning. The Ontology
Learning framework proceeds through ontology import, extraction, pruning and
refinement and gives the ontology engineer a wealth of coordinated tools for on-
tology engineering. Besides the general framework and architecture, a number
of techniques for importing, processing and learning from existing data are in-
troduced, such as HTML documents and dictionaries. The third part of the book
describes the implementation and evaluation of the proposed ontology learning
framework. First, it describes the developed ontology engineering workbench,
ONTOEDIT, supporting manual engineering and the maintenance of ontolo-
gies based on the fundamentals introduced in the first part of the book. Second,
the ontology learning environment TEXT-TO-ONTO implements the ontology
learning framework as shown in the second part of the book. An important
aspect of applying ontology learning techniques deals with the question of how
to measure the quality of the application of these techniques. Therefore, the
third part of this book introduces a new approach and measures for evaluating
ontology learning based on the well-known idea of having gold standards as
evaluation references. The fourth part of this book provides a detailed overview
of existing work that emphasizes topics of interest with similarities to the task
of ontology learning. It analyzes a multitude of disciplines (ranging from in-
formation retrieval, information extraction and machine learning to databases).
The book concludes with a summary of contributions and insights gained. Fi-
nally, a vision of the future and a discussion of future challenges in regard to
the Semantic Web is delineated.

ALEXANDER MAEDCHE
Acknowledgements

Writing a book is a complex project in which many people are involved. I thank
all the people who supported me in my research and especially in writing this book. I
appreciate very much the important roles that my colleagues Michael Erdmann,
Siegfried Handschuh, Andreas Hotho, Gerd Stumme, Nenad Stojanovic, Ljil-
jana Stojanovic, York Sure, and Raphael Volz played. I thank all my students
that supported me in my work by doing implementation and evaluation work.
Very special thanks to Raphael Volz, now one of my colleagues, who did heavy
implementation work in his master's thesis. Stefan Decker, the Semantic Web
initiator at our research group in Karlsruhe, always and at any time was open
for useful comments. Special thanks to Steffen Staab for giving me the first
ideas on Ontology Learning for the Semantic Web. He always was open for
crazy discussions producing new ideas. I thank Rudi Studer, my advisor and
leader of the research group. He supported me in making great experiences
during my time at Karlsruhe. His way of leading me and the overall research
group created a prolific research environment. Thanks to Jörg-Uwe Kietz, who
provided useful input and comments to my work on ontology learning. Without
all of them, this work would not have been possible.
I thank my parents, who financed and supported my long stay at the university.
Mostly, however, I must thank my friend and wife, Ellen, who always accepted
when I was saying that there will come better times with less work. Thanks to
all of you for being there.

Alexander Maedche
Karlsruhe, Germany
Foreword

The success of the Web today can be explained to a large extent by its sim-
plicity, i.e. the low level of technical know-how that is needed to put information
into the Web and to access Web information by browsing and keyword-based
search. However, the volume of information that is nowadays available on the
Web makes the limits of the current Web drastically obvious for its users: find-
ing relevant information among millions of Web pages becomes more and more
a heavy burden, and more than once it becomes impossible.
The development of the Semantic Web is a promising path towards trans-
forming the Web into a semantically grounded information space that makes
information accessible in a semantic way. It is a common understanding that
machine-processable metadata that come with a semantic foundation as pro-
vided by ontologies, establish the technological basis for such a semantic pro-
cessing of Web information.
All experience in practical settings shows that the engineering of ontologies
is a crucial bottleneck when setting up Semantic Web applications. Further-
more, in fast changing market environments outdated ontologies mean outdated
applications. As a consequence, the systematic management of the evolution
of ontologies is a bottleneck as well.
Rather recently, these challenges gave rise to a new research area: "Ontol-
ogy Learning". Ontology Learning aims at developing methods and tools that
reduce the manual effort for engineering and managing ontologies. Ontology
Learning is an inherently interdisciplinary area bringing together methods from
ontology engineering, knowledge representation, machine learning, computa-
tional linguistics and information extraction. Nowadays, there is no chance to
fully automate these learning processes. Therefore all approaches assume some
cooperation between humans and machines, i.e. they provide semi-automatic
means for ontology engineering and evolution.
This book describes a comprehensive framework for Ontology Learning.
This framework addresses for the first time the specific aspects of Ontology
Learning that arise in the context of the Semantic Web, e.g. the heterogeneity
of the Web sources and the layered representation of Web-based ontologies.
Ontology Learning relies on a tight integration of shallow linguistic process-
ing with ontology representation. Therefore, the Ontology Learning framework
defines a new notion of ontology that establishes precisely defined links between
a linguistic layer, an ontology, and an associated knowledge base that populates
the ontology. This integration paves the way for transforming lexical entries
and linguistic associations into conceptual entries of the ontology and related
conceptual relations.
The framework exploits a process-oriented view for Ontology Learning that
distinguishes between the phases Import, Extract, Prune, and Refine. Thus,
Ontology Learning is decomposed into subtasks that address specific aspects
and can therefore be solved with methods that are tailored to these subtask-specific
challenges. Given the heterogeneity of the sources that are available in the Web
context as well as the diversity of the different ontology learning tasks, it is obvi-
ous that no single learning approach can meet all these different requirements.
Therefore, the framework defines a system architecture that supports multi-
strategy learning, i.e. the results of different learning methods are combined
in order to achieve sufficiently good learning results. Thus, the framework is
open for adding new learning algorithms that may improve the learning results.
The description of the framework elaborates different learning subtasks, espe-
cially the import of ontologies (including ontology integration), the extraction
of ontologies from semi-structured sources, the learning of non-taxonomic rela-
tions, and the pruning of ontologies. As such, a broad collection of techniques
is integrated into the Ontology Learning framework. A considerable part of
the framework has been implemented in the ontology engineering framework
OntoEdit and the learning environment Text-To-Onto.
When learning ontologies an immediate question arises: what is the qual-
ity of the learning results? This is a rather tough problem since there do not
exist obvious quality standards. The ontology learning framework addresses
this problem by introducing a collection of measures for comparing ontologies
to each other. First evaluations indicate that the manual engineering and the
learning of ontologies supplement each other in a nice way and thus open the
way for further elaboration of how to arrange the cooperation between human
and machine for ontology learning.
The ontology learning framework as described in this book is a promis-
ing step in further developing the field of ontology learning. By identifying
clearly defined subtasks, further learning methods may be developed that en-
hance the learning results for respective subtasks. The framework is part of
the development and implementation of the Karlsruhe Ontology and Semantic
Web infrastructure that provides an overall architecture for managing and ap-
plying ontologies in the context of the Semantic Web. Thus ontology learning
is tightly integrated with other aspects of the Semantic Web, such as the semi-
automatic generation of metadata, the alignment of ontologies, or inferring new
facts from given metadata and ontologies.

Ontology learning is a rather young, yet very promising research field. The
transfer of its research results into scalable products will be an important step
towards making the Semantic Web happen.

R. Studer, University of Karlsruhe


I

FUNDAMENTALS
Chapter 1

INTRODUCTION

Semantic Web - a web of data that can be processed directly or indirectly by machines.
-(Berners-Lee, 1999)

1. Motivation & Problem Description


The Web in its current form is an impressive success with a growing number
of users and information sources. However, the growing complexity of the Web
is not reflected in the current state of Web technology. The heavy burden of
accessing, extracting, interpreting and maintaining information is left to the
human user. Tim Berners-Lee, the inventor of the WWW, coined the vision
of a Semantic Web in which background knowledge on the meaning of Web
resources is stored through the use of machine-processable (meta-)data. The
Semantic Web should bring structure to the content of Web pages, being an
extension of the current Web, in which information is given a well-defined
meaning. Thus, the Semantic Web will be able to support automated services
based on these descriptions of semantics. These descriptions are seen as a key
factor to finding a way out of the growing problems of traversing the expanding
Web space, where most Web resources can currently only be found through
syntactic matches (e.g., keyword search).
Ontologies have been shown to be the right answer to these problems by providing
a formal conceptualization of a particular domain that is shared by a group of
people. Thus, in the context of the Semantic Web, ontologies describe domain
theories for the explicit representation of the semantics of the data. The Seman-
tic Web relies heavily on these formal ontologies that structure underlying data
enabling comprehensive and transportable machine understanding. Though on-
tology engineering tools have matured over the last decade, the manual building
of ontologies still remains a tedious, cumbersome task which can easily result in
a knowledge acquisition bottleneck. The success of the Semantic Web strongly
depends on the proliferation of ontologies, which requires that the engineer-
ing of ontologies be completed quickly and easily. When using ontologies as
a basis for Semantic Web applications, one has to face exactly this issue and
in particular questions about development time, difficulty, confidence and the
maintenance of ontologies. Thus, what one ends up with is similar to what
knowledge engineers have dealt with over the last two decades when elabo-
rating methodologies for knowledge acquisition or workbenches for defining
knowledge bases. A method which has proven to be extremely beneficial for
the knowledge acquisition task is the integration of knowledge acquisition with
machine learning techniques (see the seminal work described in (Skuce et al.,
1985; Reimer, 1990; Szpakowicz, 1990; Buntine and Stirling, 1991; Morik
et al., 1993a; Nedellec and Causse, 1992; Webb, 1996)).

This book is based on the idea of applying knowledge discovery techniques
to multiple data sources to support the task of developing and maintaining
ontologies. The notion of Ontology Learning aims at the integration of a mul-
titude of disciplines in order to facilitate the construction of ontologies, in
particular machine learning and techniques from multivariate statistics. Ontol-
ogy Learning greatly facilitates the construction of ontologies by the ontology
engineer. The vision of Ontology Learning that we propose here includes a
number of complementary disciplines that feed on different types of unstruc-
tured and semi-structured data in order to support a semi-automatic ontology
engineering process. Because the fully automatic acquisition of knowledge
by machines remains in the distant future, the overall process is considered
to be semi-automatic with human intervention. It relies on the "balanced co-
operative modeling" paradigm, describing a coordinated interaction between
human modeler and learning algorithm for the construction of ontologies for
the Semantic Web. This objective in mind, an approach that combines ontology
engineering with machine learning is described, feeding on the resources that
we nowadays find on the Web. 1

2. Research Questions
The extraction and maintenance of ontologies for the Semantic Web opens
a bundle of research questions. In the following, the most important ones that
will be approached in the work described in this book are sketched. It has been
mentioned that the idea of using machine learning for knowledge acquisition
is not a new one. However, the question of how to combine machine learning
and knowledge acquisition is still unsolved and difficult to answer in general.
Thus, the following question has to be asked:
• How to support cooperation between the ontology engineer and the machine
learning algorithm? How can the complexity of the ontology engineering
task be reduced by using machine learning support?
Thus, the question of how to integrate the ontology engineer in the process of
applying ontology learning to achieve a tight integration between human beings
and machines (or the specific algorithm in general) is investigated and possible
solutions are described in this book.

Additionally, if one wants to connect ontologies with existing data sources,
one has to define layers between the given input data (e.g. text) and the target
ontology. This holds especially true for the connection between conceptual
knowledge in the form of ontologies and natural language:
• How to relate existing ontology definitions with natural language? How to
represent such ontologies in the Semantic Web?
The learning of ontologies for the Semantic Web is still lacking a generic
methodological and architectural framework. The subsequent question will be
approached in this work:
• Which core components are required for learning ontologies? How do these
components interact and what phases exist within ontology learning?
The Web is a distributed heterogeneous collaborative multimedia informa-
tion system. It bundles different formats, from free text, to mixed HTML
documents and to semi-structured documents such as dictionaries. This reveals
the following questions:
• What kind of knowledge can be acquired from which types of data? What
data and import processing is required to extract and maintain ontologies
from the available Web data?
The problem of feature engineering is well known from machine learning
research. It was first studied on structured representations of data (e.g. relational
databases). The question becomes more difficult looking at natural language
text:
• How to represent the information transported in texts, so that ontology
learning algorithms can be executed on it? What techniques and algorithms
have to be applied for an efficient support of the difficult ontology learning
task?
A large number of algorithms for extracting knowledge from data are avail-
able from the machine learning and the statistics community. However, the
following question has to be asked:
• How to apply existing algorithms for ontology learning? Where are adap-
tations required (e.g. for the usage of background knowledge)?
Evaluating ontology engineering, or more generally knowledge engineering, is
not as well researched. It lacks generic methods and concrete evaluation and
comparison measures (compared to recall/precision in information retrieval)
that may be applied for evaluating ontology learning:

• How can the ontology learning results be evaluated? How can two given
ontologies be formally compared?

Knowledge acquisition in the form of manual ontology engineering is in general
a task that lacks a formal evaluation. The question of

• How do humans perform in ontology engineering compared to ontology
learning techniques?

will be approached and results will be provided in the evaluation chapter of
this book.

3. Reader's Guide
Every chapter is preceded by a brief introductory paragraph which explains
how the work presented in the section fits into the overall structure of the book^2.
The book is divided into four main parts, Fundamentals (I), Ontology Learning
for the Semantic Web (II), Implementation & Evaluation (III) and Related Work
& Outlook (IV). These four main parts are organized as follows:

Part I - Fundamentals.

• Chapter 2 introduces the origin of ontologies, a formal definition of an
ontology structure and a knowledge base structure for supporting the com-
munication between humans and machines. Additionally, it introduces dif-
ferent types of ontologies and describes typical ontology-based applications
for the Semantic Web.

• Chapter 3 provides a comprehensive framework for layered ontology en-
gineering for the Semantic Web. It introduces an incremental approach
for ontology engineering that is based on the ontology and knowledge struc-
tures defined in chapter 2. Techniques for the representation of ontologies
based on Web standards (e.g., XML, RDF(S)) and the definition of a formal
semantics for these structures are also given in this chapter. Concluding
this chapter, several further research issues that are relevant for Web ontol-
ogy engineering are listed. Additionally, requirements for semi-automatic
ontology engineering support by ontology learning are defined.

Part II - Ontology Learning for the Semantic Web.


• Chapter 4 outlines the framework and architecture for ontology learning
for the Semantic Web and begins the second part of this book. First, we
describe relevant data for ontology learning on the current WWW, such
as HTML documents, dictionaries, existing schemata and organize it in
a taxonomy of relevant data. Data specific features and requirements are
briefly described. Subsequently, the main components required for ontology
learning are introduced and embedded in an architecture. On top of this
architecture the four main phases of a cyclic process for semi-automatic
ontology engineering are introduced.
• Chapter 5 describes the techniques developed for data import & process-
ing. This includes mechanisms for importing and processing of data rele-
vant for ontology learning as defined in chapter 4. The first part includes a
new method for ontology merging, FCA-MERGE, and a mechanism called an
"ontology wrapper". The second part focuses on (semi-structured) natural
language documents, gives an overview of linguistic techniques for syntac-
tically annotating texts and deals with the "transformation" into an algorithm-
specific representation.
• Chapter 6 gives a detailed description of the algorithms that have been de-
veloped for extracting and maintaining conceptual structures based on given
legacy or application data. This chapter is separated into two main parts:
First, it describes different algorithms for extracting ontologies. The results
of the different algorithms are combined using a simple multi-strategy ap-
proach. Second, it explains algorithms for maintaining ontologies, including
means for the pruning and the refinement of ontological structures.

Part III - Implementation & Evaluation.


• Chapter 7 considers the implementation & evaluation part of this book. It
explains the functionality of the implemented tool environment for ontology
engineering and learning called TEXT-TO-ONTO. It gives a number of
screenshots and examples of the running system.
• Chapter 8 introduces the approach for evaluating ontology learning tech-
niques and the overall evaluation framework. The approach is mainly based
on a number of developed measures for comparing a set of ontologies. Based
on the evaluation approach and the measures developed, a comparison study
with human ontology engineers, with results and their interpretations is de-
scribed. Additionally, a comprehensive evaluation of the algorithms based
on the gold standard approach is provided. Finally, the ontology learning
evaluation results are compared with the human modeling results, indicating
that ontology learning compares quite well to human modeling, in the sense
that they complement each other.

[Figure 1.1. Reading this Book]

Part IV - Related Work & Outlook.

• Chapter 9 deals with the difficult task of giving an overview of related work
on ontology learning. Until now ontology learning has not existed. How-
ever, much work in a number of disciplines, like computational linguistics,
information retrieval, machine learning, databases, software engineering has
researched and applied techniques for solving part of the overall problem of
ontology learning for the Semantic Web. This chapter gives an overview of
related work from a number of different communities.

• Chapter 10 concludes with a short summary of the methodological and
technical results and sketches ideas for further research. It explains the
main contributions of the work described in this book and lists a number of
insights gained doing this research. Additionally, unsolved questions and
further research issues are defined.
Figure 1.1 gives a graphical overview on how to read the book. Readers that
are acquainted with ontology engineering for the Semantic Web may skip chap-
ter 3 and go directly to chapter 4, which introduces the overall ontology learning
framework. Readers only interested in the implemented tool environment for
ontology learning may skip the technical chapters 5 and 6 on data import & pro-
cessing and ontology learning algorithms. Chapter 8 contains a detailed study
on ontology evaluation and comparison. Readers interested in related work and
getting an overview on ontology learning should read chapter 9. Chapter 10
concludes and lists issues that have been left open in this book.

Notes
1 The reader may note that "Ontology Learning" has become a hot research
topic in the last two years. The relevance of research on ontology learn-
ing is obvious. Two successful ontology learning workshops (Staab et al.,
2000c; Maedche et al., 2001b) have been organized in 2000 and 2001. Ap-
proximately 10% of accepted papers at classical knowledge acquisition
conferences (such as EKAW and K-CAP) deal with ontology learning.
2 Relevant publications of the author with respect to the specific chapter are
cited in the introductory paragraph.
Chapter 2

ONTOLOGY - DEFINITION & OVERVIEW

A sign, a representamen, is something which stands to somebody for something in some
respect or capacity.
-(Peirce, 1885)

"Ontology" is a philosophical discipline, a branch of philosophy that deals


with the nature and the organization of being. The term "Ontology" has been
introduced by Aristotle in Metaphysics, IV, 1. In the context of research on
"Ontology", philosophers try to answer questions "what being is?" and "what
are the features common to all beings?".
According to (Guarino, 1998) we consider the distinction between "On-
tology" (with the capital "O"), as in the statement "Ontology is a fascinating
discipline", and "ontology" (with the lowercase "o"), as in the expression "Aris-
totle's ontology". The former reading of the term ontology refers to a particu-
lar philosophical discipline; the latter has different senses assumed by the
philosophical community and the computer science community. In the philo-
sophical sense we may refer to an ontology as a particular system of categories
accounting for a certain vision of the world. Such a system does not depend on
a particular language from the philosophical point of view (see (Guarino, 1998)).
In recent years ontologies have become a topic of interest in computer science
(e.g. see the introductory articles and definitions in (Gruber, 1993b; Noy and
Hafner, 1997; Maedche et al., 2001d)). In its most prevalent use in computer
science, an ontology refers to an engineering artifact, constituted by a specific
vocabulary used to describe a certain reality, plus a set of explicit assumptions
regarding the intended meaning of the vocabulary. Usually a form of first-order
logic theory is used to represent these assumptions; the vocabulary appears as unary
and binary predicates, called concepts and relations, respectively.

This chapter starts with a motivation why computer science requires the
concept of ontology. Subsequently, the basic origin of ontology in philosophy
and its relation to the meaning triangle known from semiotics is intro-
duced. Based on these foundations it will be shown what an agreement by an
ontology implies for human-human communication, human-machine commu-
nication and machine-machine communication. The underlying idea of the
meaning triangle will be adopted and related to the current view on ontologies
and their application in computer science leading to a layered ontology struc-
ture. A formal introduction of the layered ontology structure describing its core
elements and their interaction will be given in Definition 2.1. In addition to the
layered ontology structure a knowledge base structure that may be instantiated
on top of the ontology will be defined. These definitions will be used throughout
the book.
Subsequently, we will briefly sketch different classification schemes for on-
tologies. One has to stress that for ontologies to be cost-effectively
deployed, one requires a clear understanding of the various ways ontologies are
being used and developed (Jasper and Uschold, 1999). Finally, before we con-
clude this chapter, several application areas and applications that heavily build
on the conceptual background knowledge defined through domain-specific on-
tologies will be introduced.

The need for ontologies in computer science. In general humans use their
language to communicate and to create models of the world. Natural languages
are not suitable for building models in computer science, because they are too
ambiguous. Therefore, so-called formal languages are used to specify models
of the world. One may consider mathematics as such a formal language. Frege
(1848-1925) researched the formal foundations of mathematics as a formal
language. In his work he tried to separate between language and the propositions
that are applied for human reasoning. In his Begriffsschrift (Frege, 1922) he
described one of the best-known formal languages, namely first-order logic
(FOL). Citing Wittgenstein, it can be said that
In practice, language is always more or less vague, so what we assert is never quite precise.
Thus, logic has two problems to deal with in regard to symbolism: (1) the conditions
for sense rather than nonsense in combination of symbols and (2) the conditions for
uniqueness of meaning or reference in symbols or combinations of symbols." A logically
perfect language has rules of syntax which prevent nonsense, and has single symbols
which always have a definite and unique meaning.
-(Wittgenstein, 1922)

Looking at the citation above one can see that the first point is well researched
in computer science: Formal languages are formulated as term substitution
systems based on the work of Thue and Chomsky (Chomsky, 1965). A given
finite set of signs (alphabet) and a finite set of production rules produces an
infinite set of expressions or sentences that define the language. The second
point described in Wittgenstein's citation is not as well researched. Producing
a syntactically correct language does not mean that one has captured the meaning
and sense of the sentences of a given language.
In the following the claim is made that ontologies are a means to bridge the "semantic
gap" existing between the actual syntactic representation of information and
its conceptualization. Sharing or reusing knowledge across systems
becomes difficult, as different systems use different terms for describing infor-
mation. What an ontology is, from the point of view of its origin, will be introduced.
Based on this introduction the original view will be combined with its current use in computer
science. This combination will lead to a formal definition of
an ontology that will be the basis for the overall book.

Ontology - Its origin. It has already been introduced that research in "On-
tology" has its origin in philosophy. It is a philosophical discipline, a branch
of philosophy that deals with the nature and the organization of being. How
thoughts, words and things relate to one another has been a recurrent topic in
philosophy and language from as early as Plato to the modern era (Campbell et al.,
1998). Plato dealt with the question of the proper naming of things. In his opin-
ion, the use of names in an "optimal world" would be to ensure that a particular
expression will make everybody think of one and only one thing. However,
he was doubtful that perfect names could ever be given, because things are
continually changing.
Aristotle's work went beyond the question of names and was interested in
definitions. His notion of definition was not simply the meaning of a word. A
definition was meant to clearly explain what a thing is by being a statement of
the "essence" of the entity. Therefore, he believed that to say what something
is always requires one to say why something is. An Aristotelian definition is given
by specifying the genus and differentia of individuals, and then using logical
arguments to categorize those individuals based upon their definitions. By
identifying common definitional properties of similar individuals, the definition
explains why they are members of the same kind.
Aristotle's foundation was not sufficient: He ignored the unavoidable lim-
itations of communicating meaning via language and the ambiguities created
by implicit exchange of different "senses" of meaning. Understanding such
ambiguities was a topic considered by Gottlob Frege (1848-1925). Frege intro-
duced a distinction of two types of meaning: the concept and the referent. The
graphical interpretation of this distinction is commonly referred to as the mean-
ing triangle (cf. Figure 2.1) and was introduced by (Ogden and Richards,
1923). The meaning triangle defines the interaction between symbols or words,
thoughts and things of the real world.
Figure 2.1. The Meaning Triangle

The diagram illustrates that although symbols cannot completely capture
the essence of a reference (or concept) or of a referent (or thing), there is a
correspondence among them. The relationship between a word and a thing is
indirect. The link can only be completed when an interpreter processes the
word, which invokes a corresponding concept and then links that concept to a
thing in the world.
The reader may note that the one-to-one relationship between each pair of
members in the triangle hides complexity. We live in a world where referential
complexities lead to difficulties in communication: multiple terms may refer to
the same thing, a single term may refer ambiguously to more than one thing.
A classic example has been given by Frege using the names "Morning Star"
and "Evening Star". Both expressions refer indirectly to the same physical
object, the planet Venus. However, there was a time when people were not
aware of the correspondence between the physical objects implied by the two
terms. In a sense one can say that "Morning Star", "Evening Star" and "Venus"
are equivalent (they have the same meaning in the sense that all refer to the same
planet). One can also say that "Morning Star", "Evening Star" and "Venus" are
not equivalent (in the sense that there is more information connoted by these names
than simply the physical objects to which they refer, such as when the entity can
be observed or the past experiences of the observer). Imagine someone who
has seen the sunrise with the "Morning Star" and another individual who has
never seen the sunrise but knows that the "Morning Star" refers to the second
planet. Concluding this example, all these thoughts are linked to the planet
Venus. However, the concepts invoked by words depend upon our individual
background. For a more detailed explanation of this example the interested
reader is referred to (Campbell et al., 1998).
1. Ontologies for Communication - A Layered Approach


The definition of the ontology structure is based on the meaning triangle
(Ogden and Richards, 1923) that defines the interaction between symbols (or
words), thoughts and things of the world. As already mentioned the meaning
triangle illustrates the fact that although words cannot completely capture
essence of a reference (= thought or concept) or of a referent (= thing), there is
a correspondence among them. The relationship between a word and a thing
is indirect. The link can only be completed when an interpreter processes the
word, which invokes a corresponding concept and then links that concept to a
thing in the world.
An ontology is a logical theory constituted by a vocabulary and a logical lan-
guage. In a domain of interest it formalizes signs describing things in the world,
allowing a mapping from signs to things as exact as possible. A knowledge
base may be defined on top of the ontology describing particular circumstances.
Further details on the separation between ontologies and knowledge bases are
given in the following.

[Figure: three layers of communication; symbols and syntactic structures; thoughts and semantic structures; things in the real world of a specific domain, e.g. animals]

Figure 2.2. Ontologies for Communication

Figure 2.2 depicts the overall setting for communication between human and
machine agents. Three layers are distinguished:

• First, one can consider things that exist in the real world, including in this
example human and machine agents, cars, and animals.

• Secondly, one can consider symbols and syntactic structures that are ex-
changed.
• Thirdly, models with their specific associated thoughts and semantic struc-
tures are analyzed.
First consider the left side of Figure 2.2 without assuming a commitment to
a given ontology. Two human agents HA1 and HA2 exchange a specific sign
(e.g. a word like "Jaguar"). Given their own internal model each of them will
associate the sign to their own concept (or thought) referring to possibly two
completely different existing things in the world, e.g. the animal vs. the car.
The same holds for machine agents: They may exchange statements based on a
common syntax, however, they may have different formal models with differing
interpretations.
Consider the scenario that both human agents commit to a specific ontology
that deals with a specific domain (e.g. animals). The chance they both refer
to the same thing in the world increases considerably. The same is true for
the machine agents MA1 and MA2: They have actual knowledge and they
use the ontology to have a common semantic basis. When agent MA1 uses
the term "Jaguar", the other agent MA2 may use the ontology just mentioned
as background knowledge and rule out incorrect references, e.g. ones that let
"Jaguar" stand for the car. Human and machine agents use their concepts and
their inference processes, respectively, in order to narrow down the choice of
referents (e.g., because animals do not have wheels, but cars do).
Subsequently, our notion of ontology is defined. However, in contrast to
most other research about ontology languages it is not the purpose to invent a
new logical language or to redescribe an old one. Rather it is a way of modeling
and structuring the elements contained in an ontology that inherently considers
the special role of signs (mostly strings in current ontology-based systems)
and references. Especially for this work, signs play a major role for ontology
learning for the Semantic Web, because the existing, available Web data that serves
as the starting point for ontology learning is considered as written signs.
An important aspect is that there exists the conflict that ontologies are for
human and machine agents, but logical theories are mostly for mathematicians
and inference engines. Formal semantics for ontologies is a sine qua non.
However, in addition to the benefits of logical rigor, users and developers of an
ontology-based system profit from ontology structures that explain the possible
misunderstandings and comprise a direct mapping to natural language.
For instance, one might specify that the sign "Jaguar" refers to the union of the
set of all animals that are jaguars and the set of all cars that are jaguars. Alterna-
tively, one may describe that "Jaguar" is a sign that may either refer to a concept
ANIMAL-JAGUAR or to a concept CAR-JAGUAR. The second way is preferred.
In conjunction with appropriate GUI modules one may avoid presentations
of 'funny symbols' to the user like "animal-jaguar", while avoiding 'funny
inference' that may arise from artificial concepts like (ANIMAL-JAGUAR ∪
CAR-JAGUAR).
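
To make the preferred modeling concrete, the following minimal sketch (hypothetical Python, not from the book; the dictionary and function names are illustrative) keeps "Jaguar" as one sign referring to two distinct concepts instead of one artificial union concept:

```python
# Preferred modeling: one sign, several distinct concepts.
# Concept identifiers follow the example in the text; everything
# else is an illustrative assumption.
sign_references = {
    "Jaguar": {"ANIMAL-JAGUAR", "CAR-JAGUAR"},
}

def possible_referents(sign):
    """Return all concepts a sign may refer to; ambiguity stays explicit."""
    return sign_references.get(sign, set())

print(possible_referents("Jaguar"))  # {'ANIMAL-JAGUAR', 'CAR-JAGUAR'}
```

Disambiguation is then left to context, e.g. to background knowledge that rules out the car reading, rather than being hidden inside the concept definitions.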
The Ontology Structure O. Within the scope of the research described in this
book a definition of an ontology structure, describing its core "elements", their
interaction and a mapping to formal semantics of these elements is required.
Starting the research, several existing definitions such as the widely accepted
OKBC model^1 (see (Chaudhri et al., 1998)) or the ISO standard (see (ISO 704,
1987)) of principles and methods of terminology^2 were analyzed.
The existing definitions did not fulfill the requirements of a comprehensive
ontology structure for the work described here. Based on the human point
of view that meaning is established through communication, e.g. through ex-
changing words or, more generally, signs, existing definitions have been extended^3.
The reader may note that the approach of ontology learning dealt with in
this work considers the "discovery of semantics" that is implicitly contained
in existing data that has been generated by humans through exchanging signs.
Therefore, the focus is on the interaction between natural language and formal se-
mantics. The ontology structure O introduced here extends existing
ontology definitions such as OKBC and ISO-704 with an "explicit sign level"
and introduces a new model of ontologies. Looking into existing literature the
research area of semiotics, as the study of signs and the ways in which sign
systems convey (and are used to convey) meaning (Eco, 1981; Euzenat, 2000),
has to be mentioned. Semiotics is a good starting point of the required ontology
structure. In semiotics (also called the theory of signs) one distinguishes three
interlinked parts according to Peirce (see (Peirce, 1885)):

• Syntax deals with the study of relationships between signs.


• Semantics analyzes the relationship between signs and the things in the real
world they denote.
• Pragmatics goes beyond syntax and semantics and researches how signs
are used for particular purposes. Thus, it analyzes the relationships between
signs and specific agents.

The different levels are not separated. A linkage between the different levels
is introduced by particular connection relations (cf. the relationship with the
meaning triangle). For instance, syntax and semantics are connected through
a reference relation that links a sign with a set of statements. We consider
ontologies as models that are used to communicate meaning between machines
and human beings. Based on the semiotic view of ontologies described above
an ontology structure as given in the following Definition 2.1 is defined.
DEFINITION 2.1 An ontology structure is a 5-tuple
O := {C, R, H^C, rel, A^O}, consisting of

• two disjoint sets C and R whose elements are called concepts and relations,
respectively.

• a concept hierarchy H^C: H^C is a directed relation H^C ⊆ C × C which
is called concept hierarchy or taxonomy. H^C(C1, C2) means that C1 is a
subconcept of C2.

• a function rel: R → C × C that relates concepts non-taxonomically.
The function dom: R → C with dom(R) := Π1(rel(R)) gives the domain
of R, and range: R → C with range(R) := Π2(rel(R)) gives its range. For
rel(R) = (C1, C2) one may also write R(C1, C2).

• a set of ontology axioms A^O, expressed in an appropriate logical language,
e.g. first-order logic.

The model introduced above constitutes a core structure that will be conse-
quently used in this book. It is quite straightforward, well-agreed and may be
easily mapped onto existing ontology representation languages.
What is missing so far is an explicit representation of a lexical level that is
typically restricted to ontologies for natural language applications^5 - in spite
of its general usefulness. Therefore, a lexicon for the ontology structure O is
defined as follows:

DEFINITION 2.2 A lexicon for the ontology structure
O := {C, R, H^C, rel, A^O} is a 4-tuple L := {L^C, L^R, F, G} consisting of

• two sets L^C and L^R, whose elements are called lexical entries for concepts
and relations, respectively.

• two relations F ⊆ L^C × C and G ⊆ L^R × R called references for concepts
and relations, respectively. Based on F, let for L ∈ L^C,

F(L) = {C ∈ C | (L, C) ∈ F}

and for C ∈ C,

F^-1(C) = {L ∈ L^C | (L, C) ∈ F}.

G and G^-1 are defined analogously.

In general, one lexical entry may refer to several concepts or relations and one
concept or relation may be referred to by several lexical entries. An ontology
structure with lexicon is a pair (O, L), where O is an ontology structure and L
is a lexicon.
An Example. Let us consider a short example of an instantiated ontology structure. Assume C := {x1, x2, x3} and R := {x4}. The hierarchical relation H^C(x2, x1) and the non-taxonomic relation x4(x2, x3) are defined. The lexicon is given as L^C = {"Person", "Employee", "Organization"} and L^R = {"works at organization"}. The functions F and G map the lexical entries to the concepts and relations of the ontology. They are applied as follows: F("Person") = x1, F("Employee") = x2, F("Organization") = x3 and G("works at organization") = x4. Figure 2.3 depicts this small example graphically.
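To make the use of these structures more tangible, the following is a minimal Python sketch (purely illustrative and not part of the original, set-theoretic formalization; all names are chosen for this example) that encodes the example ontology structure and its lexicon:

# A minimal sketch of the ontology structure O := (C, R, H^C, rel, A^O)
# and the lexicon L := (L^C, L^R, F, G), instantiated as in the example above.
C = {"x1", "x2", "x3"}                 # concepts
R = {"x4"}                             # relations
H_C = {("x2", "x1")}                   # H^C(x2, x1): x2 is a subconcept of x1
rel = {"x4": ("x2", "x3")}             # rel(x4) = (x2, x3)
A_O = set()                            # ontology axioms (left empty here)

def dom(r):                            # dom(R) := Pi_1(rel(R))
    return rel[r][0]

def rng(r):                            # range(R) := Pi_2(rel(R))
    return rel[r][1]

# References from lexical entries to concepts (F) and to relations (G).
F = {("Person", "x1"), ("Employee", "x2"), ("Organization", "x3")}
G = {("works at organization", "x4")}

def F_of(entry):                       # F(L) = {C | (L, C) in F}
    return {c for (l, c) in F if l == entry}

def F_inv(concept):                    # F^-1(C) = {L | (L, C) in F}
    return {l for (l, c) in F if c == concept}

assert F_of("Person") == {"x1"}
assert dom("x4") == "x2" and rng("x4") == "x3"

Note that F and G are modeled as relations (sets of pairs) rather than functions, so that one lexical entry may refer to several concepts and vice versa, as stated above.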


Figure 2.3. Example: Instantiated Ontology Structure

Up till now the semantics of the primitives introduced in the ontology struc-
ture 0 have not been elaborated. In the next chapter it will be shown how
this ontology structure may be realized in different, concrete representation
languages (e.g. the well-understood logical framework F-Logic (Kifer et al.,
1995) or the W3C standard resource description framework RDF(S)). The se-
mantic pattern approach as described in (Staab and Maedche, 2000; Staab et al.,
2001b; Staab et al., 2001a) will be used for mapping the ontology structure into
different representation languages. Semantic patterns are used for communica-
tion between Semantic Web developers on the one hand, but also for mapping
and reuse to different target languages on the other hand, thus bridging between
different representations and different ways of modeling knowledge. In developing the approach of semantic patterns, the wheel is not reinvented from scratch; instead, insights from software engineering and knowledge representation research are picked up and integrated for use in the Semantic Web.
Further details on semantic patterns are provided in the subsequent chapter 3.

The Knowledge Base Structure KB. It was already introduced that a knowledge base6 may be defined and instantiated using an ontology structure. The

general distinction between ontology and knowledge base is that one tries to
capture the conceptual structures of a domain of interest in the ontology, while
the knowledge base aims to specify a given concrete state. Thus, the ontology
is (mostly) constituted by intensional logical definitions, while the knowledge
base comprises (mostly) the extensional parts. The ontology is mostly devel-
oped during the set up (and maintenance) of an ontology-based system, while
the facts in the knowledge base may be constantly changing. These distinctions
("general" vs. "specific", "intensional" vs. "extensional", "set up" vs. "con-
tinuous change") indicate that for purposes of development, maintenance and
good design of an information system it is reasonable to distinguish between
ontology and knowledge base. Based on this idea the knowledge base structure KB may be defined as follows:

DEFINITION 2.3 A knowledge base structure is a 4-tuple KB := (O, I, inst, instr), that consists of

• an ontology O := (C, R, H^C, rel, A^O)

• a set I whose elements are called instances.

• a function inst : C → 2^I called concept instantiation. For I ∈ inst(C) one may also write C(I).

• a function instr : R → 2^(I×I) called relation instantiation. For (I1, I2) ∈ instr(R) one may also write R(I1, I2).

Again, one may also define a lexicon for a given knowledge base structure KB.

DEFINITION 2.4 A lexicon for the knowledge base structure KB := (O, I, inst, instr) is a tuple L^KB := (L^I, J) consisting of

• a set L^I whose elements are called lexical entries for instances.

• a relation J ⊆ L^I × I called reference for instances. Based on J, let, for L ∈ L^I,

J(L) = {I ∈ I | (L, I) ∈ J}

and, for I ∈ I,

J^-1(I) = {L ∈ L^I | (L, I) ∈ J}.

In general, one lexical entry may refer to several instances and one instance
may be referred to by several lexical entries. A knowledge base structure with lexicon is a pair (KB, L^KB), where KB is a knowledge base structure and L^KB is a lexicon.

The concrete realization and representation of a knowledge base in a specific


Semantic Web-conform language will be provided in the next chapter. Although
ontology and knowledge base are defined separately, the reader should note that none of these distinctions draws a sharp borderline. Rather, in some cases it depends on the domain, the view of the modeler, and the experience of the modeler whether one decides to put particular entities and the relations between them into the ontology or into the knowledge base. An overall
picture of the interaction between ontology and the knowledge base is given in
the next chapter in Figure 3.1.

An Example. The following is a short example of an instantiated knowledge base structure. Assume the given ontology from the example above. The instance set is given as follows: I := {i1, i2}. The concept instantiations are defined by x2(i1) and x3(i2), and the relation instantiation is defined by x4(i1, i2). The lexicon is given as L^I = {"person:ama", "organization:unika"}, and the mapping from lexical entries (or signs) to the instances is defined by J("person:ama") = i1 and J("organization:unika") = i2.

In this section the ontology and knowledge base structure have been introduced. Both will be used throughout the book, providing a background for the ontology engineering approach and layered representation that is introduced in chapter 3. The ontology learning framework (see chapter 4) with its associated data import & processing techniques and algorithms also relies on these definitions.

2. Development & Application of Ontologies


In this section a short overview on aspects for developing and applying
ontologies is given. First, different classifications of ontologies that separate
ontologies into different classes are introduced. Classifications are important:
Specific types of ontologies may be used in different types of applications. In
general one has to stress that, for ontologies to be cost-effectively deployed,
one requires a clear understanding of the various ways ontologies are used (cf.
(Jasper and Uschold, 1999)).

Classification of Ontologies. The ontology and knowledge base definitions


introduced above provide structures for instantiating ontologies and knowledge
bases. These generic structures provide enough space for developing and in-
stantiating different types of concrete ontologies. As already mentioned the
possibility of using a lexical level opens a wide range of applications of ontolo-
gies in the area of natural language processing. Different classification systems
for ontologies have been developed (van Heijst, 1995; Guarino, 1998; Jasper
and Uschold, 1999). A classification system that uses the subject of conceptualization as its main criterion has been introduced by Guarino (Guarino, 1998).


He suggests developing different kinds of ontologies according to their level of
generality as shown in Figure 2.4.

Figure 2.4. Different Kinds of Ontologies and Their Relationship

The following different kinds of ontologies may be distinguished as follows:

• Top-Level ontologies describe very general concepts like space, time, event,
which are independent of a particular problem or domain. It seems rea-
sonable to have unified top-level ontologies for large communities of users.
Recently, these kinds of ontologies have been also introduced under the
name "foundational ontologies".

• Domain ontologies describe the vocabulary related to a generic domain by


specializing the concepts introduced in the top-level ontology.

• Task ontologies describe the vocabulary related to a generic task or activity


by specializing the top-level ontologies.

• Application ontologies are the most specific ontologies. Concepts in ap-


plication ontologies often correspond to roles played by domain entities
while performing a certain activity.

Another classification system introduced by (van Heijst, 1995; van Heijst et al., 1997) uses as its criterion the amount and type of structure of the conceptualization and distinguishes along these dimensions: terminological ontologies, information ontologies and knowledge modeling ontologies.
An ontology learning-focused analysis of different types of ontologies and associated ontology learning techniques has been given by (Omelayenko, 2001). In his work he distinguishes natural language ontologies (NLO) and domain ontologies. Additionally, the learning of ontology instances as a third dimension is considered.

It is important to emphasize that in this work the focus is set on application and domain ontologies that may be used in different kinds of ontology-based applications, as given in the following examples.

Examples of Ontology-based Applications. In the following, some application areas that heavily rely on the conceptual background knowledge defined
through domain-specific ontologies are sketched. The term ontology was intro-
duced as a means for establishing communication between agents and machines.
The aspect of correct interpretation of signs or more specific words is important
in many areas of computer science, e.g. in

• Business process modeling / Task support: (Staab and Schnurr, 2000)


• Digital libraries: (Adam and Yesha, 1996; Chen, 1999; Papazoglou et al., 1995)

• Multi-Database Systems: (Papazoglou et al., 2000)

• Information Retrieval using WordNet: (Fellbaum, 1998)

• Information integration: (Wiederhold, 1992)

• Intelligent agents: (Nwana, 1995)


• Machine Learning / Data & Text Mining: (Hotho et al., 2001a; Feldman and Dagan, 1995; Craven et al., 1999)

• Human-Computer Interfaces (HCI): (Kesseler, 1995; Lamping et al., 1995)


The following are more elaborated examples of four main application areas of ontologies, namely the Semantic Web, natural language understanding, knowledge management, and e-business.

Semantic Web. The development of the World Wide Web is about to mature
from a technical platform that allows for the transportation of information from
Web sources to humans (albeit in many syntactic formats) to the communication
of knowledge from Web sources to machines (Berners-Lee et aI., 2001). The
Semantic Web should be able to support automated services based on formal
descriptions of semantics. Semantics is seen as a key factor in finding a way out of the growing problems of traversing the expanding web space, where
currently most web resources can only be found by syntactic matches (e.g.,
keyword search).
The Semantic Web relies heavily on formal ontologies that structure under-
lying data for the purpose of comprehensive and transportable machine under-
standing. They properly define the meaning of data and metadata (cf. e.g. (Staab et al., 2000a; Decker et al., 2000b)). In general one may consider the
Semantic Web more as a vision than a concrete application.

Figure 2.5. Relational Metadata on the Semantic Web

Figure 2.5 illustrates the use of the terms "ontology" and "relational metadata". It depicts some part of the SWRC7 (Semantic Web research community) ontology. Furthermore it shows two homepages, viz. pages about Siegfried and Steffen (http://www.aifb.uni-karlsruhe.de/WBS/sha and http://www.aifb.uni-karlsruhe.de/WBS/sst, respectively) with annotations given in an XML serialization of RDF facts. For the two persons there are instances denoted by corresponding URIs (person_sha and person_sst). The SWRC:NAME of person_sha is "Siegfried Handschuh". In addition, there is an instantiated relationship COOPERATEWITH between the two persons: thus, they cooperate.
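To illustrate what such relational metadata looks like programmatically, the following sketch uses the rdflib Python library; the concrete namespace URI, property names and instance URIs are chosen for illustration and are not taken from the original annotation files:

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

# Illustrative namespace; the SWRC ontology itself is referenced in note 7.
SWRC = Namespace("http://ontobroker.semanticweb.org/ontos/swrc#")

g = Graph()
person_sha = URIRef("http://www.aifb.uni-karlsruhe.de/WBS/sha#person_sha")
person_sst = URIRef("http://www.aifb.uni-karlsruhe.de/WBS/sst#person_sst")

g.add((person_sha, RDF.type, SWRC.Person))           # an instance of an SWRC concept
g.add((person_sha, SWRC.name, Literal("Siegfried Handschuh")))
g.add((person_sha, SWRC.cooperateWith, person_sst))  # relational metadata

print(g.serialize(format="xml"))                     # an XML serialization of the RDF facts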

There are several examples of prototypical Semantic Web applications, e.g.


described in (Staab and Maedche, 2001; Maedche et al., 2001c), introducing
knowledge portals for different application scenarios. Up-to-date information
on the status of concrete Semantic Web applications is available via the Semantic
Web Community Portal SemanticWeb.Org8 .

Natural Language Understanding. The task of comprehensively under-


standing natural language requires an integration of many knowledge sources.
Domain knowledge in the form of ontologies is essential for deep understanding
of texts. This includes the verification of conceptual constraints, the selection
of plausible readings and the assurance of consistent coreference. An approach
that uses domain knowledge in the form of ontologies for extracting knowl-
edge bases from text is pursued in (Hahn and Romacker, 2000). In (Dahlgren, 1995) the application of ontologies for machine translation is described. The
important role of ontologies for information extraction from free text has been

identified early, e.g. by (Hobbs, 1993), and is now applied to real-world data (Staab et al., 1999).

Knowledge Management. Knowledge management deals with acquiring, maintaining and accessing the knowledge of an organization. The technologies of the Semantic Web build the foundation for moving from a document-oriented view of knowledge management to a view oriented towards knowledge pieces, where knowledge pieces are interconnected in a flexible way.9 Intelligent push services, the integration of knowledge management and business processes, as well as concepts and methods for supporting the vision of ubiquitous knowledge, are urgently needed. Ontologies are the key means to achieving this functionality. They are used to annotate unstructured information with semantic information, to integrate information and to generate user-specific views that make knowledge access easier. Applications of ontologies in knowledge management are described in (Sure et al., 2000; Angele et al., 2000; Staab et al., 2001c; Abecker et al., 1998).

E-Business. The automation of transactions in the area of business-to-business commerce requires formal descriptions of products beyond syntactic exchange formats. A common understanding of the terms and their interpretation has to be captured in the form of ontologies, allowing interoperability10 and means for intelligent information integration. The need for standardized vocabularies has become obvious recently, e.g. considering the large number of document type definitions (DTDs) being defined for specific domains (e.g. for the human resource domain11 or the paper industry12). However, these vocabularies do not go beyond syntactic descriptions, but they provide a good starting point for the definition of an ontology. A comprehensive overview on applying ontologies in E-Commerce and their relationships to existing standards is given in (Fensel, 2001).

3. Conclusion
In this chapter an introduction to, and a persuasion of, why computer science requires the "concept" of ontology has been provided. Subsequently, the roots of ontology in philosophy and its core paradigm based on the meaning triangle have been introduced. It has been shown how ontologies may support communication between humans and machines. Therefore, the idea that underlies the meaning triangle has been combined with a "semiotics view" on ontologies, leading to an ontology and knowledge base structure. A formal introduction of these structures, describing their core elements and their interaction relevant for this book, has been given. As mentioned earlier, the layered structures will
accompany the reader through the whole book. The next chapter introduces

a layered framework for engineering concrete instantiations of the structures


given in this chapter.

Notes
1 The OKBC model provides formal semantics for a large set of ontology
primitives enabling machine-processable semantics of information.
2 An interesting aspect is that ISO-704 is divided into three major sections:
concepts, definitions and terms. Concepts are seen as units of thought, which conforms to our view. The idea of a definition is that it fixes the concept in the
proper position of the system. Terms represent natural language representa-
tions of concepts. ISO-704 recommends that one concept should ideally be
represented by one natural language term.
3 Recently this form of meaning-establishing communication has been coined with the term "emergent semantics".
4 In this generic definition one does not distinguish between relations and attributes.
5 The distinction of lexical entry and concept is similar to the distinction of
word form and synset used in WordNet (Fellbaum, 1998). WordNet has been
conceived as a mixed linguistic / psychological model about how people
associate words with their meaning.
6 In the Semantic Web the knowledge base is given by the relational metadata
that is defined for the different information sources.
7 http://ontobroker.semanticweb.org/ontos/swrc.html
8 http://www.semanticweb.org
9 The EU-funded projects OnToKnowledge, http://www.ontoknowledge.org, and OntoLogging, http://www.ontologging.com, pursue exactly this goal.
10 http://www.ontology.org
11 http://www.hr-xml.org/channels/home.htm
12 http://www.papinet.org/
Chapter 3

LAYERED ONTOLOGY ENGINEERING

Denn eben wo Begriffe fehlen, da stellt ein Wort zur rechten Zeit sich ein.
[For just where concepts are lacking, a word turns up at the right time.]
-(Goethe, Faust I)

In recent years the development of ontologies has been moving from the
realm of Artificial Intelligence laboratories to the desktop of domain experts and
knowledge officers. Ontologies have become common to the World Wide Web,
ranging from very lightweight topic hierarchies, such as large categorizations of web sites (see Yahoo!1) and product hierarchies (like the UNSPSC standard2) for use in B2C and B2B applications (see (Fensel, 2001)), to heavyweight
ontologies, as described in (Staab and Maedche, 2001) for setting up knowledge
portals or in developing natural language understanding systems as described
by (Hahn and Romacker, 2000).
The development of the Semantic Web as a Meta-Web based on the World
Wide Web will require a large number of domain-specific ontologies. New
methods supporting this ontology engineering task are required, mainly because
of two facts: First, not only trained ontology engineers will develop ontologies,
also domain experts will model and maintain ontologies. Second, the fact that
the ontologies will be developed for the Semantic Web used by humans and ma-
chines in different applications, requires the establishing of new engineering
paradigms. A couple of methodologies (Uschold and Gruninger, 1996; Guarino and Welty, 2000; Gomez-Perez, 1996), representation languages (MacGregor, 1991; Schmolze and Woods, 1992; Kifer et al., 1995; Decker et al., 2000a) and ontology modeling tools (Swartout et al., 1996; Fikes et al., 1997; Grosso et al.,
1999) have been developed that allow for the development, representation and
engineering of ontologies. In fact, these languages and tools have matured con-
siderably over the last few years. Nevertheless, the approaches cited above have

not been developed with the application "Semantic Web" in mind. Thus, the
work described in this chapter builds on and extends existing work supporting
ontology representation and engineering for the Semantic Web. In this chapter
an ontology engineering framework for the Semantic Web 3 is presented. It is
based on the layered ontology structure 0 and the knowledge base structure
KB introduced in the Definitions 2.1 and 2.3. The framework supports the de-
velopment of ontologies for a wide range of Semantic Web applications, three
examples for ontologies that have been developed on the basis of this frame-
work and the applications in which they have been used are also given in this
chapter. The ontology engineering framework described in this chapter forms
the basis for the concrete implementation of the ontology engineering system
ONTOEDIT that is described in further detail in chapter 7.
This chapter starts with section 1, where the overall ontology engineering
framework supporting the incremental and cyclic development of ontologies
is explained. The interaction between the ontology and its lexicon and the
ontology and the knowledge base is introduced. As mentioned above, it will be
shown how different types of ontologies may be developed using this framework
by giving three real-world modeling examples and application scenarios.
The basis for the underlying representation within the ontology engineer-
ing framework is the Resource Description Framework (RDF)4, a data model
developed by the WWW consortium W3C 5 as a foundation for the Semantic
Web. Different layers within RDF(S)6 will be distinguished and it will be shown
how specific representation languages, domain-specific ontologies and associ-
ated knowledge bases may be represented using the different RDF(S) layers.
Additionally, it will be discussed how mappings of the specific representation
language primitives (or representation vocabularies) into a logical layer to pro-
vide formal semantics are established. Finally, the semantic pattern approach
is introduced and it is shown how the ontology and knowledge base structure
may be mapped into a logical language and processed by an inference engine.
Concluding this chapter, open questions regarding the ontology engineering
task will be discussed, including aspects on cooperative and multi-user engi-
neering, ontology federation and evolving ontologies and knowledge bases.
Finally, several requirements for a framework for ontology learning from an
engineering point of view will be summarized and defined. Additionally, the
concluding part of this chapter will prepare the subsequent chapter 4, where the
general framework for ontology learning for the Semantic Web is introduced.

1. Ontology Engineering Framework


As mentioned earlier the ontology engineering framework is mainly based
on the fundamental ontology structure 0 and the knowledge base structure
KB introduced in the Definitions 2.1 and 2.3. In the following the approach
for incrementally developing ontologies within an ontology engineering cycle

is shortly introduced. The reader may note that work on a comprehensive


methodology for setting-up ontology-based systems is going on, including the
engineering aspects introduced above (see (Staab et al., 2000a; Staab et al.,
2001c; Maedche et al., 2001c) for a detailed overview on this methodology).
In general, using the introduced ontology and knowledge base structures one
may approach the following problems encountered within concrete ontology
engineering projects:

• First, formal semantics for ontologies is an important building block in


ontology engineering. However, to communicate ontologies between ma-
chines and human beings one has to consider the special role of lexical
references as introduced in our definitions.

• Second, though the ontology (see Definition 2.1) and the knowledge base
structure (see Definition 2.3) were defined separately, in reality there exists
no strict separation between the ontology and the knowledge base.

In the following the elements contained in the ontology and knowledge base
structure are embedded into a comprehensive picture.

Figure 3.1. Layered Ontology Engineering

Figure 3.1 depicts the overall layered and cyclic ontology engineering frame-
work. On the left side the ontology primitives as defined in Definition 2.1 and
on the right side the knowledge base primitives according to Definition 2.3 are
depicted.
In the bottom layer, one finds lexical entries representing signs for concepts (L^C), relations (L^R) and instances (L^I). The second layer comprises on the left side the set of concepts C referenced by L^C, the set of relations R referenced by L^R, the concept taxonomy defined by statements such as H^C(C1, C2) and non-taxonomic relations defined by statements such as R(C1, C2). On the right side the middle layer comprises the set of instances I referenced by L^I, the set of concept instantiations (defined by statements such as C(I)) and relation instantiations (defined by statements such as R(I1, I2)). The set of ontology axioms A^O is defined on top of the two existing layers. The reader may recognize the connection between the ontology and the knowledge base: instances are defined as members of concepts via concept instantiation, and relation instantiations refer to the set of relations R defined in the ontology.
The overall incremental model consists of two main interactions: The first,
vertical interaction represents the dependency between the layers, the second,
horizontal interaction represents the dependency or overlap between ontology
and knowledge base. An example dependency for the first point may be the
requirement of the definition of a concept while defining a domain-specific
axiom. An example for the second point may be the definition of "default"
instances, such as specific instances of the concept CITY in the tourism domain.
Typically, the ontology engineer starts with a collection of lexical entries.
These lexical entries may be given by interviews or protocols as typical in knowl-
edge acquisition. Based on this collection of lexical entries the formalization
process starts: On the one hand, she defines concepts, conceptual relations and
a set of ontology axioms. On the other hand, the ontology engineer typically
develops the ontology with a specific application domain in mind. Therefore,
knowledge base entries may already be defined during ontology engineering.
These entries may be defined not only for debugging the ontology, but also to
constrain possible entries within the application (e.g. in a human resource sce-
nario the knowledge base may contain instance objects describing employees).
An implementation of the ontology engineering framework introduced in this
section has been realized by the ontology engineering environment ONTO EDIT,
which is described in the "system" chapter 7. Currently ongoing work on a comprehensive methodology extends ONTOEDIT with a component called ONTOKICK, which supports the kick-off phase in ontology engineering (see (Boyens, 2001)).

Example Ontology Engineering Scenarios. The layered approach offers


ways for modeling different types of ontologies. In the following three differ-
ent ontology modeling scenarios that illustrate our approach are sketched:

As a first scenario, the SEAL-II framework (Hotho et al., 2001b) for devel-
oping knowledge portals and its instantiation by the HR-TopicBroker (Kuehn,
2001) application is considered. The application builds a company- and domain-
specific portal for human resource topics. Thus, it supports the allocation of
relevant topics (in the form of web pages) by using a focused web crawler,
it presents the crawled documents along the concept hierarchy and allows for
the joint definition of a simple knowledge base according to the given "topic

ontology" (e.g. it allows to define the contact information of people that are
experts for specific topics).
Using the layered approach, lexical entries representing human resource topics (e.g. "working hours model", "E-Learning") are first collected. The set of concepts is derived by generalizing and formalizing the lexical entries, and a mapping between concepts and lexical entries is established. The application is based on a set of concepts that are organized in the taxonomy H^C. In the next step, potential non-taxonomic relations between concepts have been explored and discussed with the users (e.g. leading to relations such as the relation CONTACT between TOPIC and PERSON).

Developing an ontology O' for the TOPICBROKER application requires a set of concepts Ci ∈ C, a set of lexical entries Li ∈ L^C (|C| = |L^C|) and a bijective mapping F. The concepts C are embedded in the taxonomy H^C. Additionally, a set of non-taxonomic relations R in the form of simple attributes has been modeled to allow for a joint definition of a knowledge base as described above. All in all, 45 concepts, 76 lexical entries and five non-taxonomic relations have been modeled in the ontology. Ontology O' may be considered a light-weight ontology.

The second scenario of ontology engineering has been carried out in the
GETESS 7 application (see (Staab et al., 1999)). The application targets the
development of domain-specific search engines that support natural language
queries, and natural language processing of Web documents (e.g. generating
abstracts) using ontological background knowledge. The application is based
on an ontology consisting of a taxonomy of concepts and relations between
them. Bilingual (English / German) word stems (lexical entries that are used by the natural language processing component) and external representations (lexical entries that appear in the user interfaces) are mapped to the concepts and the relations. Using the layered approach, interfaces to the ontology could be provided both for the natural language processing component and for the user interface component. Developing an ontology O'' for the GETESS application requires a set of concepts Ci ∈ C, a set of relations Rj ∈ R, lexical entries Lk ∈ L and the mapping functions F, G, respectively. The concepts C are embedded in the taxonomy H^C. All in all, in the tourism domain of the GETESS project approx. 1120 concepts, 3200 lexical entries and 300 non-taxonomic relations have been modeled in the ontology.

The third task of ontology development has been carried out in the devel-
opment of the ontology-based human resource management systems Proper
and OntoProper (see (Sure et al., 2000)), which use means from decision theory to allow for compensatory skill matching. Additionally, these methods are combined with intelligent means for inferencing of skill data. For the latter an ontology provides background knowledge, i.e. conceptual structures and rules, which supplement the skill database with ground and inferred facts from secondary information, such as project documents. These supplement facts reduce maintenance efforts, since much secondary information is gathered in the organizational memory through common working tasks. The system uses a heavyweight ontology O''' consisting of a set of concepts, the taxonomy of concepts H^C, non-taxonomic relations R and domain-specific axioms A^O. The ontology is realized using the Frame-Logic representation language introduced by (Kifer et al., 1995). Additionally, default instances I of people and corresponding relations taken from a human resource legacy database have been predefined in the knowledge base. A detailed explanation of this "heavy-weight" ontology is given in (Sure et al., 2000).

These three examples show that different types of ontologies used by different applications may be developed using the ontology and knowledge base structures with the underlying layered and cyclic ontology engineering framework. Thus, the special role of signs is considered (e.g. allowing the definition of different types of lexicons) and the interaction between ontology and knowledge base is supported (e.g. allowing the definition of predefined instances).

2. An Architecture for Layered Ontology and Knowledge Base Representation
This section focuses on the question of how to represent instantiated ontology
and knowledge base structures for the Semantic Web for machine processing.
Following the proposal of (Decker and Melnik, 2000) for a layered approach to
interoperability of information models, a layered language architecture for the
ontology engineering framework is presented.
Figure 3.2 depicts the layered language architecture. The lowest layer, the
syntax layer provides a syntactic representation using the Extensible Markup
Language (XML)8. Using XML one guarantees a uniform, syntactic represen-
tation of the ontology and the knowledge base. The next layer, the data layer, is
based on the Resource Description Framework (RDF) (see (Lassila and Swick, 1999)) that provides a simple data model. On the data layer of the Semantic Web
a generic mechanism for expressing machine readable and processable data is
required. The Resource Description Framework (RDF) integrates a variety of
applications from library catalogs and world-wide directories to syndication
and aggregation of news, software, and content to personal collections of mu-
sic, photos, and events using XML as an interchange syntax. RDF is considered
as this foundation for representing and processing (meta-)data. It provides a
language for expressing factual statements, a simple data model and a standard
syntax according to the syntax layer.

Figure 3.2. Representation Layers (from bottom to top: a syntax layer based on XML; a data layer providing the simple RDF data model; a representation vocabulary layer with a core representation vocabulary based on RDF-Schema plus vocabularies such as OIL, DAML+OIL and F-Logic primitives; and a logical and operationalization layer with translations, e.g. into KIF, description logics or F-Logic, providing formal semantics and operationalizations)

The middle layer, the representation vocabulary layer, provides different


"language primitive vocabularies" for representing ontologies. We consider
the RDF-Schema (Brickley and Guha, 2000) vocabulary as the least common
denominator on which representation languages may agree and build9. The
RDF-Schema specification (Brickley and Guha, 2000) contains a simple vo-
cabulary very similar to semantic nets. In the representation vocabulary layer
one may define additional vocabularies with respect to specific knowledge rep-
resentation languages and concrete operationalizations (e.g. the ONTOEDIT
ontology engineering environment and its vocabulary). An important aspect
is that there will be in the Semantic Web a number of different representation
vocabularies (e.g., OIL10, DAML+OIL11, DRDF(S)12) that may be used within
the different applications.
The logical and operationalization layer provides a formal semantics, rea-
soning mechanisms and concrete operationalizations for the representation vo-
cabularies introduced in the middle layer. Thus, in the logical and interpretation
layer the representation vocabulary contained in the vocabulary layer is mapped
onto concrete operationalizations of the vocabulary. The realization of the log-
ical layer and an approach for mapping formal semantics to a given vocabulary
is also introduced in this section.

3.2.1 RDF-based Data Layer

In the following a very short introduction to RDF (Lassila and Swick, 1999) is provided. The interested reader is referred to the excellent RDF Tutorial

Figure 3.3. An RDF Example: (a) two resources, http://www.foo.com/W.Smith and http://www.foo.com/S.Smith, each with firstName and lastName properties ("William Smith" and "Susan Smith"), related via a marriedWith property; (b) the marriedWith statement carrying a confirmedBy property pointing to http://www.vatican.va/holy_father; (c) the same statement explicitly reified via rdf:subject, rdf:predicate and rdf:object.

by (Decker et al., 2000c), the recently published book by (Hjelm, 2001) and
the more technical W3C specification of (Lassila and Swick, 1999).
RDF is an abstract data model that defines relationships between entities (called resources in RDF) in a similar fashion to semantic nets. Statements
in RDF describe resources, that can be web pages or surrogates for real world
objects like publications, pieces of art, persons, or institutions. In a graphical
representation of an RDF statement, the source of the relationship is called the
subject, the labeled arc is the predicate (also called property) and the relation-
ship's destination is the object.
The RDF data model distinguishes between resources, which are objects
represented by URIs, and literals which are just strings. Resources may be
related to each other or to literal (i.e. atomic) values via properties. Such a
relationship represents a statement that itself may be considered a resource, i.e.
reification is directly built into the RDF data model. Thus, it is possible to make
statements about statements. These basic notions can be easily depicted in a
graphical notation that resembles semantic nets. To illustrate the possibilities
of pure RDF the following statements are expressed in RDF and depicted in
Figure 3.3: 13

• Firstly, in part (a) of Figure 3.3 two resources are defined, each carrying
a FIRSTNAME and a LASTNAME property with literal values, identifying the
resources as William and Susan Smith, respectively. These two resources
come with a URI as their unique global identifier. They are related via the
property MARRIEDWITH, which expresses that William is married with Susan.

• Part (b) of the illustration shows a convenient shortcut for expressing more
complex statements, i.e. reifying a statement and defining a property for the
new resource. The example denotes that the marriage between William and
Susan has been confirmed by the resource representing the Holy Father in
Rome.

• The RDF data model offers the predefined resource rdf:Statement and the predefined properties rdf:subject, rdf:predicate, and rdf:object to reify a statement as a resource. The actual model for the example (b) is depicted in part (c) of Figure 3.3. Note that the reified statement makes no claims about the truth value of what is reified; i.e., if one wants to express that William and Susan are married and that this marriage has been confirmed by the pope, then the actual data model must contain the union of part (a) and part (c) of the example illustration.
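The following sketch builds the statement from part (a) and its reification from part (c) using the rdflib Python library (the property namespace is invented for illustration), so that both the plain triple and the statement about the statement are contained in the model:

from rdflib import BNode, Graph, Namespace, URIRef
from rdflib.namespace import RDF

APP = Namespace("http://www.foo.com/schema#")   # illustrative property namespace
william = URIRef("http://www.foo.com/W.Smith")
susan = URIRef("http://www.foo.com/S.Smith")
pope = URIRef("http://www.vatican.va/holy_father")

g = Graph()
g.add((william, APP.marriedWith, susan))        # part (a): the plain statement

stmt = BNode()                                  # part (c): the statement as a resource
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, william))
g.add((stmt, RDF.predicate, APP.marriedWith))
g.add((stmt, RDF.object, susan))
g.add((stmt, APP.confirmedBy, pope))            # a statement about the statement

Note that, as explained above, the reified statement alone would not assert the marriage itself; the plain triple from part (a) must be added as well.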

In addition to these core primitives, RDF defines three types of containers to represent collections of resources or literals: one distinguishes (i) bags, which are unordered lists, (ii) sequences, which are ordered lists, and (iii) alternatives, which are lists from which a property can use only one value (see (Decker et al., 2000c)).

3.2.2 Representation Vocabulary Layer


It was introduced that the representation vocabulary layer provides different
"language primitive vocabularies". It was also mentioned that one may consider
the RDF-Schema (Brickley and Guha, 2000) vocabulary as a core least common
denominator on which representation languages may agree and build.
In the following subsection RDF-Schema, which provides means for representing core ontologies, is introduced. In the approach described here, it is built on
RDF and RDF-Schema for representing the core of the ontology and knowledge
base structure. Additionally, it is shown how RDF(S) may be extended with
additional vocabularies, e.g. the vocabulary required for representing the lexical
layer contained in 0 and KB.

RDF-Schema. RDF-Schema is an RDF application that introduces an exten-


sible type system to RDF. Thus, e.g. it provides means to define concept (or
class) hierarchies, and domain and range restrictions for properties. Illustrated
here is how an ontology can be modeled in RDF(S) by presenting a sample

Figure 3.4. An RDF-Schema Example (legend: subClassOf = rdfs:subClassOf, domain = rdfs:domain, range = rdfs:range, instanceOf = rdf:type; the schema layer defines appl:Person with its subclasses appl:Man and appl:Woman as well as the property appl:marriedWith, which is instantiated on the data layer between http://www.foo.com/W.Smith and http://www.foo.com/S.Smith)

ontology (see Figure 3.4) in the abstract data model. RDF-Schema offers a
distinguished representation vocabulary defined on top of RDF to allow the
modeling of object models. The most relevant RDF-Schema primitives are
given in the following list:

• The most general class is rdfs:Resource. It has two subclasses, namely rdfs:Class14 and rdf:Property (see Figure 3.4)15. When specifying a domain-specific schema for RDF(S), the classes and properties defined in this schema will become instances of these two resources.

• The resource rdfs:Class denotes the set of all classes in an object-oriented sense. That means that classes like appl:Person or appl:Organisation are instances of the meta-class rdfs:Class.

• The same holds true for properties, i.e. each property defined in an application-specific RDF-Schema is an instance of rdf:Property, e.g. appl:marriedWith.

• RDF-Schema defines the special property rdfs:subClassOf that defines the subclass relationship between classes. Since rdfs:subClassOf is transitive, definitions are inherited by the more specific classes from the more general classes. Resources that are instances of a class are automatically instances of all superclasses of this class. In RDF-Schema it is prohibited that any class is an rdfs:subClassOf of itself or of one of its subclasses.

• Similar to rdfs:subClassOf, which defines a hierarchy of classes, another special type of relation rdfs:subPropertyOf defines a hierarchy of properties (e.g. one may express that FATHEROF is an rdfs:subPropertyOf of PARENTOF).
• RDF-Schema allows the definition of domain and range restrictions associated with properties. For instance, these restrictions allow the definition that persons and only persons may be MARRIEDWITH, and only with other persons.

As depicted in the middle layer of Figure 3.4 the domain-specific classes appl:Person, appl:Man, and appl:Woman are defined as instances of rdfs:Class. In the same way domain-specific properties are defined as instances of rdf:Property, i.e. APPL:MARRIEDWITH, APPL:FIRSTNAME, and APPL:LASTNAME.
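As a hedged sketch (using the rdflib Python library; the namespace string mirrors the one used in Figure 3.5), the same schema may be built in triple form, which makes the role of the RDF-Schema primitives explicit:

from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

APP = Namespace("example-ontology.rdfs#")         # namespace as used in Figure 3.5
g = Graph()

for cls in (APP.Person, APP.Man, APP.Woman, APP.Organization):
    g.add((cls, RDF.type, RDFS.Class))            # classes are instances of rdfs:Class

g.add((APP.Man, RDFS.subClassOf, APP.Person))     # taxonomy via rdfs:subClassOf
g.add((APP.Woman, RDFS.subClassOf, APP.Person))

g.add((APP.marriedWith, RDF.type, RDF.Property))  # properties are rdf:Property instances
g.add((APP.marriedWith, RDFS.domain, APP.Person)) # domain restriction
g.add((APP.marriedWith, RDFS.range, APP.Person))  # range restriction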

The underlying syntax. The interchange of data represented in RDF must be


facilitated through a concrete serialization syntax. XML is an obvious choice,
and the RDF specification uses it as one possible syntactic realization of the RDF
data model. In the following the abbreviated XML syntax that is used for repre-
senting RDF(S) models is introduced. The paradigms of XML that are important in the context of this work (e.g. namespaces) are briefly explained. Several examples are given for representing both new representation vocabularies and concrete ontologies and knowledge bases using the representation primitives. The reader may note that one important aspect for the use of RDF in the WWW, and thus in the Semantic Web in general, is the way RDF models may be represented and exchanged.

The use of XML Namespaces in RDF(S). The XML namespace mechanism (Bray et al., 1999) plays a crucial role in the layered representation architecture for the development of RDF(S) and its applications. It distinguishes between different modeling layers (see Figures 3.4 and 3.10) for reusing and integrating existing schemata and applications. At the time being, there exist a number of canonical namespaces, e.g. for RDF16 and for RDF-Schema17.
In the future more specific representation vocabularies (e.g. the ONTOEDIT
and OIL, DAML+OIL representation vocabulary presented subsequently) for
representation primitives and concrete ontologies will be available through
namespaces (see Figure 3.6).
An example for the XML syntax of RDF(S) is given in Figure 3.5. RDF code usually starts and ends with <rdf:RDF> and </rdf:RDF> tags. An RDF description using the abbreviated XML syntax starts with a typing identifier (such as <rdfs:Class> or <app:Man>). Then an ID can be defined, usually with the rdf:about or rdf:ID XML attribute, to enable references to the defined

<rdf:RDF
  xmlns:rdf  = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs = "http://www.w3.org/2000/01/rdf-schema#"
  xmlns      = "example-ontology.rdfs#">

  <rdfs:Class rdf:ID="Person"/>

  <rdfs:Class rdf:ID="Organization"/>

  <rdfs:Class rdf:ID="Man">
    <rdfs:subClassOf rdf:resource="#Person"/>
  </rdfs:Class>

  <rdfs:Class rdf:ID="Woman">
    <rdfs:subClassOf rdf:resource="#Person"/>
  </rdfs:Class>

  <rdf:Property rdf:ID="firstName">
    <rdfs:domain rdf:resource="#Person"/>
    <rdfs:range rdf:resource="http://www.w3.org/TR/xmlschema-2/#string"/>
  </rdf:Property>

  <rdf:Property rdf:ID="lastName">
    <rdfs:domain rdf:resource="#Person"/>
    <rdfs:range rdf:resource="http://www.w3.org/TR/xmlschema-2/#string"/>
  </rdf:Property>

  <rdf:Property rdf:ID="marriedWith">
    <rdfs:domain rdf:resource="#Person"/>
    <rdfs:range rdf:resource="#Person"/>
  </rdf:Property>
</rdf:RDF>

Figure 3.5. An Example for the RDF-Schema Serialization Syntax

resource. The next level of nested tags gives properties of the resource denoted
by the ID. Figure 3.5 gives an example for the RDF schema serialization syn-
tax. The example represents in XML the classes and property types defined
in Figure 3.4. Additionally, domains and ranges of the properties are defined using the RDF constraint properties rdfs:domain and rdfs:range.
<rdf:RDF
  xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:app = "example-ontology.rdfs#">

  <app:Man rdf:about="http://www.foo.com/W.Smith">
    <app:firstName>William</app:firstName>
    <app:lastName>Smith</app:lastName>
    <app:marriedWith rdf:resource="http://www.foo.com/S.Smith"/>
  </app:Man>

  <app:Woman rdf:about="http://www.foo.com/S.Smith"/>
</rdf:RDF>

Figure 3.6. XML Serialization of RDF instances, their literals, and relations between them.

Figure 3.6 gives an example for RDF instances defined according to the ontology given in Figure 3.5. The application-specific classes <app:Man> and <app:Woman> are instantiated by the two identifiers http://www.foo.com/W.Smith and http://www.foo.com/S.Smith. The instance http://www.foo.com/W.Smith has two properties APP:FIRSTNAME and APP:LASTNAME with literal values and the property APP:MARRIEDWITH to another resource, namely the instance http://www.foo.com/S.Smith.

Extension of RDF-Schema with Representation Primitives. As depicted in Figure 3.2, additional layers may be defined on top of RDF(S). A short example for adding the lexical layer used in the ontology engineering framework to RDF(S) is given below.
<rdf:RDF
  xmlns:rdf  = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs = "http://www.w3.org/2000/01/rdf-schema#"
  xmlns      = "http://ontoserver.aifb.uni-karlsruhe.de/schema/oevoc.rdfs#">

  <rdfs:Class rdf:ID="LexicalEntry"/>

  <rdfs:Class rdf:ID="Language"/>

  <rdfs:Class rdf:ID="DE">
    <rdfs:subClassOf rdf:resource="#Language"/>
  </rdfs:Class>

  <rdf:Property rdf:ID="references">
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
    <rdfs:domain rdf:resource="#LexicalEntry"/>
  </rdf:Property>

  <rdf:Property rdf:ID="language">
    <rdfs:range rdf:resource="#Language"/>
    <rdfs:domain rdf:resource="#LexicalEntry"/>
  </rdf:Property>

  <rdf:Property rdf:ID="value">
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
    <rdfs:domain rdf:resource="#LexicalEntry"/>
  </rdf:Property>
</rdf:RDF>

Figure 3.7. OntoEdit Representation Vocabulary

To allow the instantiation of a lexical layer, one defines new representation primitives in a new namespace18. Here, only a small part of the representation vocabulary, as given in Figure 3.7, is described. The definition of a representation vocabulary works by analogy to the definition of a concrete ontology as given in Figure 3.5. One may store the representation vocabulary in a predefined namespace. For instance, in the example above we define the class LexicalEntry and the property value. Additionally, we define a restriction of the property value allowing the definition that instances of LexicalEntry may be assigned a literal value. Figure 3.8 depicts an excerpt of a concrete instantiated ontology using the lexicon representation vocabulary introduced above. The "lexicon representation vocabulary" is imported using the already explained namespace mechanisms. We use the vocabulary by referring to the particular classes, e.g. <o:LexicalEntry>. In this small example we represent the definition of the lexical entry L = "Organisation" (in German) for the concept ORGANIZATION.
<rdf:RDF
  xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns     = "example-ontology.rdfs#">

  <rdf:Description rdf:ID="A189">
    <rdf:type rdf:resource="http://ontoserver.aifb.uni-karlsruhe.de/schema/oevoc.rdfs#LexicalEntry"/>
    <references rdf:resource="#Organization"/>
    <value>Organisation</value>
    <language rdf:resource="http://ontoserver.aifb.uni-karlsruhe.de/schema/oevoc.rdfs#DE"/>
  </rdf:Description>
</rdf:RDF>

Figure 3.8. A Concrete Representation of the Lexicon

An example for defining and representing the OIL description logics vocabulary on top of RDF-Schema has been given in (Decker et al., 2000a; Fensel et al., 2000). Figure 3.9 depicts the derived OIL extensions, using as many RDF-Schema constructs as possible. The usage of RDF-Schema primitives as a basis for the OIL representation vocabulary has the advantage that agents that do not understand OIL may at least understand the classes, properties and subclass relationships.
Figure 3.9. OIL Extensions of RDF(S) (a hierarchy of OIL primitives, e.g. oil:Expression with oil:ClassExpression, oil:BooleanExpression (oil:And, oil:Or, oil:Not) and oil:PropertyRestriction (oil:CardinalityRestriction, oil:ValueType, oil:HasValue), as well as property types such as oil:TransitiveProperty, oil:FunctionalProperty and oil:SymmetricProperty, defined on top of rdfs:Resource, rdfs:Class, rdfs:ConstraintResource and rdf:Property)

3.2.3 Logical Layer - Mapping to Formal Semantics


Recently several steps have been made towards a logic-based formalization of RDF(S) and a clarification of its semantics19. RDF(S) is only a starting point

towards more comprehensive ontology representation languages with richer formal semantics. The following is a short overview of existing mapping approaches: One of the first works on processing RDF triples with logic has been described in (Decker et al., 1998). The "Simple Logic-based RDF Interpreter (SiLRI)" transforms an RDF triple syntactically into a fact of F-Logic (Kifer et al., 1995) and applies sound and complete inference mechanisms to it. Along the same lines, an approach for specifying the semantics of RDF(S) based on the F-Logic semantics has been proposed by (Wei, 1999). In her approach the RDF(S) primitives are directly mapped onto corresponding F-Logic elements, e.g. a statement given through the triple (Frank, worksWith, Ole) is directly mapped to Frank[worksWith ->> Ole] (in analogy to (Decker et al., 1998)). Additionally, the intended semantics of the RDF-Schema vocabulary is formally represented by F-Logic rules.
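As a toy illustration of this purely syntactic mapping (a sketch, not the actual SiLRI implementation), an RDF triple may be rendered as an F-Logic fact as follows:

# Render an RDF triple (subject, predicate, object) as an F-Logic fact,
# in analogy to the mapping of (Frank, worksWith, Ole) described above.
def triple_to_flogic(subject, predicate, obj):
    return f"{subject}[{predicate} ->> {obj}]."

print(triple_to_flogic("Frank", "worksWith", "Ole"))
# prints: Frank[worksWith ->> Ole].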
Conen & Klapsing (Conen and Klapsing, 2000) capture the intended semantics of RDF(S) in first-order logic. In their work they design a logic-based formulation of RDF concepts and constraints in the RDF spirit of simplicity, universality, and extensibility. They represent their formalization as DATALOG rules that may be processed by SiLRI.
In Figure 3.9 the OIL extensions of RDF-Schema have been introduced. The Ontology Inference Layer OIL is a proposal for a web-based representation and inference layer for ontologies. It combines the widely used modeling primitives of frame-based languages with the formal semantics and reasoning services provided by description logics. A model-theoretic specification of the meaning of OIL constructs is also provided20.
The DARPA Agent Markup Language (DAML) is based on a major, well-funded initiative21, aimed at joining the many ongoing Semantic Web efforts and focused on bringing ontologies onto the Web. An axiomatization of RDF, RDF-Schema, and the DARPA-DAML specific representation vocabulary DAML+OIL22, by specifying a mapping of a set of descriptions in any of these languages into a logical theory expressed in first-order predicate calculus, has been provided by (Fikes and McGuiness, 2001). Their basic claim is that the logical theory produced by the mapping specified therein is logically equivalent to the intended meaning. Providing a means of translating RDF, RDF-Schema, and DAML+OIL descriptions into a first-order theory not only specifies the intended meaning of the descriptions, but also produces a representation of the descriptions from which inferences can automatically be made using traditional automatic theorem provers and problem solvers. The mapping into predicate calculus consists of a simple rule for translating RDF statements into first-order relational sentences and a set of first-order logic axioms that restrict the allowable interpretations of the non-logical symbols (i.e., relations, functions, and constants) in each language. Since RDF-Schema and DAML+OIL23 are both vocabularies of non-logical symbols added to RDF, the translation

of RDF statements is sufficient for translating RDF-Schema and DAML+OIL


as well. The axioms are written in the Knowledge Interchange Format (KIF)24, which is a proposed ANSI standard. The axioms use standard first-order logic
constructs plus KIF-specific relations and functions dealing with lists. Lists
as objects in the domain of discourse are needed in order to axiomatize RDF
containers and the DAML+OIL properties dealing with cardinality.
The attention in the work described in this book is on the extraction and maintenance of primitives according to O, that is, a subset of the RDF-Schema vocabulary (with the exception of A^O). In the following it is shown how the ontology and knowledge base structure introduced in the last section may be mapped onto a concrete representation language (in our case F-Logic) using semantic patterns.

Semantic Patterns. RDF-Schema was introduced as the basic vocabulary on


the representation layer of our approach (see Figure 3.2). While RDF-Schema
certainly goes an important step into the direction of the "Semantic Web", it
only provides a very lightweight, and thus extremely restricted vocabulary for
representing ontologies. Therefore, a number of proposals for languages and
language extensions on top of RDF-Schema with an associated logical layer
are currently under development (see (Decker et aI., 2000a; Corby et aI., 2000),
which describe some of them). Given the large variety of logics in use in many
systems nowadays and given experiences from knowledge representation and
reasoning 25 that have shown the necessity of this multitude of languages. The
variety of these proposals gives only a first impression of the Babel of languages
which will come up in the Semantic Web.
Therefore, a new approach for engineering machine-processable knowledge
in a way such that it is reusable across different Semantic Web languages and
across different styles of modeling has been developed. First, it builds on
RDF(S) and, second, it is based on so-called semantic patterns (Staab et al., 2001a) that capture the intended semantic entailments. Semantic patterns are used for communication between Semantic Web developers on the one hand, and for mapping and reuse to different target languages on the other, thus bridging between different representations and different ways of modeling knowledge. In developing the semantic patterns, it is not intended to reinvent the wheel from scratch; rather, insights from software engineering and knowledge representation research are picked up and integrated for use in the Semantic Web. In general, according to Figure 3.2, one may consider semantic patterns as a connecting or mediation mechanism between the representation layer and the logical layer.

The core idea. A rough outline of how semantic patterns may be developed
and used is given in the following. For a comprehensive introduction the inter-
Layered Ontology Engineering 45

ested reader is referred to (Staab et al., 2001a). The work on semantic patterns
has been motivated first by axiom schemata (Gruber, 1993a). While axiom
schemata already go into the direction of abstracting from formal model char-
acteristics (see (Staab et al., 2001b)), by definition they are developed for one
language only. Hence, one part of the high-level idea was to allow for (an
open list of) new epistemological primitives (see (Brachman, 1979)) that can
be instantiated in different representation languages for modeling particular
semantic entailments and which are, thus, similar to named axiom schemata
working in one language.
However, one needs a more flexible paradigm better suited to apply to a larger
range of representation languages and able to abstract more from particular
formal models. As described above, the general problem does not allow one
to come up with a completely formal and ubiquitously translatable specification
of semantics. Hence, the other part of the high-level idea is to require extra
efforts from Semantic Web developers. To support them in their efforts, it
appeared to be a prerequisite that they could communicate more efficiently
about these new epistemological primitives - similar to the way that software
engineers talk about recurring software designs.
Design patterns have been conceived for object-oriented software develop-
ment to provide (i) a common design vocabulary, (ii) a documentation and
learning aid, and (iii) support for reorganizing software. Similar to the naming
and cataloguing of algorithms and data structures by computer scientists,
design patterns are used by software engineers to communicate, document and
explore design alternatives by using a common design vocabulary or a design
pattern catalog. This way, they also decrease the complexity of developing
and understanding of software systems. Additionally, design patterns offer
solutions to common problems, help a novice act more like an expert, and
facilitate the reverse-engineering of existing systems.
Though bridging between formal representations seems to be a formal task
only, very often quite the contrary becomes true, when not everything, but
only relevant aspects of knowledge can or need to be captured, when not all
inferences, but only certain strains of semantic entailments can or need to be
transferred. The development of new semantic primitives should therefore
comprise not only the formal definition of translations into target languages,
but also informal explanations. Hence, a semantic pattern does not only consist
of new epistemological primitives; like design patterns, it also serves as a means
for communication, cataloguing, reverse-engineering, and problem-
Web techniques.
Figure 3.10 summarizes our approach for modeling axiom specifications in
RDF(S) in an overall picture. It depicts the core of the RDF(S) definitions
and our extension for some example semantic patterns (i.e. our ontology meta
layer).
[Figure 3.10 depicts the core RDF(S) definitions together with the ontology
meta layer extension for example semantic patterns. A schema layer defines
subClassOf (rdfs:subClassOf) and application relations such as appl:marriedWith;
the data layer instantiates them, e.g. the resource http://www.foo.com/W.Smith
is related via appl:marriedWith to http://www.foo.com/S.Smith.]

Figure 3.10. Extending RDF(S) using Semantic Patterns

A simple ontology, especially a set of application-specific relationships, is
defined in terms of our extension to RDF(S). In the following we will briefly
introduce how the semantic pattern primitives are defined and give some con-
crete instantiation examples.

Example. Inverse axioms like the one given in (3.1) ensure consistency, e.g., in
a knowledge warehouse (it is impossible to have a project with a participant
that does not work at that project) and free the user from providing redundant
information (e.g., a person works at a project, and therefore the project has the
person as a participant).
Considering F-Logic (Kifer et al., 1995) as the targeted ontology represen-
tation language one may define an "inverse axiom" as follows:

(1) FORALL X, Y  X:Person[WORKSIN ->> Y] <- Y:Project[PARTICIPANT ->> X].

The "ProjectParticipation axiom" describes the information that if a project
has a person as participant, then the person works at the project. One may
capture this specific axiom using a semantic pattern for inverse relations, e.g.:
(2) INVERSEREL(WORKSIN, PROJECT, PARTICIPANT, PERSON)

The definition of the axiom in the form of a semantic pattern has the advantage
that it captures the axiom in a very generic way that is independent of the
concrete representation language. The instantiated semantic pattern INVERSEREL
may then easily be translated into F-Logic.
The next example concerns composition of relations. For instance, if a first
person is FATHEROF a second person, who is MARRIEDWITH a third person,
then one may assert that the first person is the FATHERINLAWOF the third person.
The definition of such an axiom may be denoted in F-Logic as follows:

(3) FORALL X, Y, Z  X[FATHERINLAWOF ->> Z] <- X[FATHEROF ->> Y] AND Y[MARRIEDWITH ->> Z].

Again, different inferencing systems may require completely different realizations
of such an implication. Therefore, one may capture this specific axiom using a
semantic pattern for relation composition, e.g.:

(4) COMPOSITION(FATHERINLAWOF, FATHEROF, MARRIEDWITH)

The relation composition semantic pattern may be instantiated in RDF(S) as
follows:

(5) <o:Composition rdf:ID="FatherInLawComp">
      <o:composee rdf:resource="fatherInLawOf"/>
      <o:firstComponent rdf:resource="fatherOf"/>
      <o:secondComponent rdf:resource="marriedWith"/>
    </o:Composition>

A straightforward transformation of the semantic pattern into F-Logic may
be realized as follows:

(6) FORALL R, Q, S, X, Y, Z  X[S ->> Z] <- COMPOSITION(S, R, Q) AND X[R ->> Y] AND Y[Q ->> Z].
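To make the translation mechanism concrete, the following minimal sketch shows
how instantiated semantic patterns such as (2) and (4) might be turned into
F-Logic rule strings. The Python rendering and the function names are
illustrative assumptions for presentation purposes, not part of the actual system;
only the pattern semantics follow examples (1)-(6) above.

    # Minimal sketch: translating instantiated semantic patterns into
    # F-Logic rules, mirroring examples (1)-(6) above. Function names
    # are illustrative assumptions, not part of the actual system.

    def inverse_rel(r1, c1, r2, c2):
        # INVERSEREL(r1, c1, r2, c2), cf. (2): an instance X of c2 is
        # related via r1 to Y whenever Y, an instance of c1, is related
        # via r2 to X.
        return (f"FORALL X,Y X:{c2}[{r1} ->> Y] <- "
                f"Y:{c1}[{r2} ->> X].")

    def composition(composee, first, second):
        # COMPOSITION(composee, first, second), cf. (4) and (6).
        return (f"FORALL X,Y,Z X[{composee} ->> Z] <- "
                f"X[{first} ->> Y] AND Y[{second} ->> Z].")

    print(inverse_rel("WORKSIN", "PROJECT", "PARTICIPANT", "PERSON"))
    print(composition("FATHERINLAWOF", "FATHEROF", "MARRIEDWITH"))

The first call reproduces axiom (1), the second axiom (3); other target
languages would simply plug in different translation functions for the same
pattern instances.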

The Ontology O and Knowledge Base KB structure. As mentioned earlier,
the ontology and knowledge base structure is considered as a set of semantic
patterns that may easily be transformed into a concrete representation language
using the mechanism introduced above. Following this strategy, the steps from
the ontology and knowledge base structure to axiom meaning (based on F-Logic
semantics as given in (Kifer et al., 1995)) are now summarized in Table 3.1.
We restrict this mapping to the four basic primitives. The reader may note
that the primitive AO may easily be extended towards standard axiom primitives
such as INVERSE or COMPOSITION as introduced above. Thus, in the future one
may develop libraries of semantic patterns that are driven by the requirements
of Semantic Web developers and specific application domains.

Primitive           F-Logic representation    Translation axiom
H_C(C1, C2)         subConcOf(c1,c2)          FORALL c1,c2  c1 :: c2 <- subConcOf(c1,c2).
R(C1, C2)           rel(r,c1,c2)              FORALL r,c1,c2  c1[r =>> c2] <- rel(r,c1,c2).
inst(I1, C1)        instOf(i1,c1)             FORALL i1,c1  i1 : c1 <- instOf(i1,c1).
instr(R, I1, I2)    instrel(r,i1,i2)          FORALL r,i1,i2  i1[r ->> i2] <- instrel(r,i1,i2).

Table 3.1. Mapping of O and KB to F-Logic
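As an illustration of Table 3.1, the following minimal sketch serializes the four
basic primitives of an ontology and knowledge base structure into the F-Logic
facts of the table; the translation axioms then give these facts their meaning.
Only the predicate spellings stem from the table, everything else (the Python
rendering, the data structures, the example entities) is an assumption.

    # Minimal sketch: emitting the F-Logic facts of Table 3.1 from the
    # four basic primitives of O and KB. Only the predicate names stem
    # from the table; the data structures are illustrative.

    def to_flogic_facts(subconcepts, relations, instances, inst_relations):
        facts = []
        for c1, c2 in subconcepts:            # H_C(C1, C2)
            facts.append(f"subConcOf({c1},{c2}).")
        for r, c1, c2 in relations:           # R(C1, C2), relation name r
            facts.append(f"rel({r},{c1},{c2}).")
        for i1, c1 in instances:              # inst(I1, C1)
            facts.append(f"instOf({i1},{c1}).")
        for r, i1, i2 in inst_relations:      # instr(R, I1, I2)
            facts.append(f"instrel({r},{i1},{i2}).")
        return "\n".join(facts)

    print(to_flogic_facts([("Student", "Person")],
                          [("worksIn", "Person", "Project")],
                          [("WSmith", "Person")],
                          [("marriedWith", "WSmith", "SSmith")]))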

Pattern Libraries. With the engineering of ontologies on the Web new ideas
will come up about what type of inferencing should be supported and, hence,
made interchangeable between representation systems. Since this development
is in its infancy right now, a number of semantic patterns that seem widely
applicable have been collected:
• Gruber's Frame Ontology (Gruber, 1993a) includes a set of over 60 prim-
itives, some of which are found in core RDF(S), e.g. rdf:type, and some
of which are more sophisticated, e.g. symmetry of relations or composition
(database joins).
• Medical knowledge processing often relies on the engineering of part-whole
reasoning schemes that appear or do not appear when considering the fol-
lowing examples: (i), the appendix is part of the intestine. Therefore, an
appendix perforation is an intestinal perforation. And, (ii), the appendix is
part of the intestine, but an inflammation of the appendix (appendicitis) is
not an inflammation of the intestine (enteritis). In (Staab et al., 2000b) it is
described how to represent structures that allow for expressing (for (i)) and
preventing (for (ii)) such entailments.
• Inheritance with exceptions is a semantic pattern that is very often useful.
  Its application is tractable, even efficient, and the reasoning part has been
  described, e.g., in (Morgenstern, 1998). The core idea is that one considers
  the inheritance of properties, allows for the non-inheritance of certain prop-
  erties, and uses a particular, unambiguous strategy for resolving conflicts
  between paths of inheriting and non-inheriting a particular property. A sim-
  ple example is that a PATIENT's treatment may be covered by medical
  insurance, a NON-COMPLIANT PATIENT's treatment may not be covered, but a
  NON-COMPLIANT, MENTALLY DISTURBED PATIENT's treatment will be
  paid by the insurance company. Hence, coverage of treatment is typically
  inherited, e.g. by almost all subclasses of PATIENT, but not by ones like
  NON-COMPLIANT PATIENTS (a minimal sketch is given after this list).

  Note that often there is no translation into particular target languages for
  this pattern. For instance, it can be achieved in Prolog or F-Logic, but not
  in the standard description logics systems.

• A number of patterns may be derived from object-oriented or description
  logics systems; e.g., local range restrictions are often useful. A simple
  example is that the parentOf a HUMAN is restricted to HUMAN, the parentOf
  a FOX is restricted to FOX, while the range restriction of parentOf may be
  ANIMAL in general.
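The announced sketch of the inheritance-with-exceptions pattern follows. It
assumes, for illustration only, the conflict resolution strategy "the most specific
assertion on the subclass path wins"; the text above does not commit to one
concrete strategy, and the Python rendering is of course not one of the target
languages discussed here.

    # Minimal sketch of inheritance with exceptions for the PATIENT
    # example. Assumed strategy: walk up the subclass path; the most
    # specific locally asserted value wins.

    SUPER = {  # subclass -> superclass
        "NonCompliantMentallyDisturbedPatient": "NonCompliantPatient",
        "NonCompliantPatient": "Patient",
    }
    COVERAGE = {  # locally asserted values of 'treatment covered'
        "Patient": True,
        "NonCompliantPatient": False,
        "NonCompliantMentallyDisturbedPatient": True,
    }

    def treatment_covered(cls):
        while cls is not None:
            if cls in COVERAGE:           # most specific assertion found
                return COVERAGE[cls]
            cls = SUPER.get(cls)          # otherwise inherit from above
        return None                       # property asserted nowhere

    print(treatment_covered("NonCompliantPatient"))                   # False
    print(treatment_covered("NonCompliantMentallyDisturbedPatient"))  # True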

A more complete elaboration of these and other patterns is under development
and will grow with further applications that realize the vision of the Semantic
Web.

3. Conclusions & Requirements for Ontology Learning


In this chapter the approach for layered ontology engineering has been in-
troduced. The approach pursues an application-centered, cyclic and incremental
view on ontology engineering that is based on the definition of the ontology and
knowledge base structure introduced in chapter 2. A comprehensive language
layer architecture based on RDF(S) was defined, allowing the definition of rep-
resentation vocabularies, concrete ontologies and the corresponding knowledge
bases that are created on top. Different mapping approaches for defining formal
semantics for a given representation vocabulary have been introduced, and the
semantics of the ontology structure and knowledge base structure have been
clarified.
As already mentioned, layered ontology engineering for the Semantic Web
may be extended further to axioms and rules and more complex representations.
The approach of semantic patterns, which may be used for communication
between Semantic Web developers and for mapping and reuse between different
representations, has been introduced. Work in this area will have to be pursued
in the future. In particular, one should investigate how software engineering
methodologies for modeling and code generation from an evolving library
of semantic patterns can be brought to bear within our layered engineering
approach and the ontology engineering environment ONTOEDIT.
In this concluding part, topics that have not been considered so far in this
chapter, including cooperative & multi-user ontology engineering, federated
ontologies and knowledge bases, and evolving ontologies, will be enumerated.
Finally, requirements for an ontology learning framework extending the ontol-
ogy engineering approach with semi-automatic means will be defined.

3.1 Further Topics in Ontology Engineering


Ontology Engineering has a long tradition in knowledge engineering and
acquisition research. However, there are still a number of issues that have to be
addressed in the near future to make the idea of the Semantic Web based on
ontologies realistic and successful. The following are three main areas that are
being investigated for ontology engineering for the Semantic Web:

Cooperative & Multi-user Ontology Engineering. Ontologies represent a
particular agreement on a conceptualization of the world. They have been moti-
vated by research done in knowledge sharing and distributed AI. Therefore,
the process of developing an ontology has always been considered as a coop-
erative process involving more than one human subject. Developing a shared
vocabulary is an important aspect, especially for its "acceptance". Thus, if more
and more users are involved in the development and standardization process,
one may reach a better coverage of a specific domain of interest.
Cooperative ontology engineering based on client-server environments is not
well researched due to the complexity of the task. First steps towards coopera-
tive ontology engineering have been described in (Fikes et al., 1997). However,
supporting consistent ontologies with a formal semantics in a multi-user engi-
neering setting requires more comprehensive concepts based on transaction
management, similar to database theory, but more complex. Recently, a
commercial tool supporting multi-user ontology engineering has been presented
by VerticalNet; a demonstrator is available for download on the Web 26.

Multiple and Federated Ontologies & Knowledge Bases. It has been shown
that ontology-based community websites work well (Staab et al., 2000a) with
a central community ontology. However, a problem with this approach may be
the global categorization schema: in the form of the central ontology it may
not be understood and accepted by all community members in the same way
(see (Lacher and Groh, 2001)). In real-life communities, community members
would like to keep their own perspective and view on the community repository.
For representing such specific views one needs a distributed ontology setting,
where explicit knowledge is exchanged via ontology mappings. In (Lacher
and Groh, 2001) an approach for automatically performing ontology mappings
by analyzing document categorizations done by users has been described. The
approach is very similar to the approach for ontology merging described in
the next chapter, building on an extensional description of concepts and deriv-
ing mappings or mergings by comparing extensional descriptions. Supporting
ontology-based information integration using automatic means in the form of
ontology clustering has been presented in (Visser and Tamma, 1999).
First ideas for engineering federations of ontologies are provided in (Stumme
and Maedche, 2001a). The approach builds on foundations of multi-database
systems, or more specifically federated database systems (Sheth and Larsen, 1990).
Relevant work has also been presented in the area of multiple thesaurus data-
bases (see (Kramer et al., 1997)). In their paper the authors have presented an archi-
tecture for integrating distributed and heterogeneous thesaurus databases.
Although there has been quite a lot of work in this and related areas, there are
still a number of open issues with respect to the interoperability between on-
tologies and knowledge bases, e.g. the management of mappings and mergings,
the identification of identity in the knowledge base, etc.

Evolving Ontologies & Knowledge Bases. Managing ontologies and asso-
ciated knowledge bases in distributed environments such as the Web is still an
open problem. Some initial experiments and problems have been discussed in
(Heflin and Hendler, 2000). In their paper the authors discuss the features of SHOE 27
that address ontology versioning, the effects of ontology revision on SHOE
web pages, and methods for implementing ontology integration using SHOE's
extension and version mechanism. Their main contribution, however, is the dis-
cussion and analysis of problems associated with managing ontologies in a
dynamic, distributed, and heterogeneous environment such as the Web. This
work is an initial, important step into this direction, but requires more extensive
research in the near future.
The problem of ontology maintenance is addressed in chapter 6, where dif-
ferent strategies for ontology pruning and refining based on concrete web data
following a bottom-up approach are discussed.

3.2 Ontology Learning for Ontology Engineering


In this chapter the approach for layered ontology engineering has been intro-
duced. The generic methodologies for building ontologies, representation
languages, and tools for ontology engineering have become mature over
the last decade. However, the manual acquisition and modeling of ontologies
still remains a tedious, cumbersome task resulting in a knowledge acquisition
bottleneck.
As mentioned earlier, exactly this issue had to be faced in concrete applications
developed at our institute (e.g., in developing knowledge portals according to
the frameworks SEAL (Maedche et al., 2001c) and SEAL-II (Hotho et al.,
2001b)), where domain-specific application ontologies had to be developed. In
particular we were given questions like:
• How fast can you develop an ontology? How much time does it take to build
an ontology for the target application?

• Is it difficult to build an ontology? How can the human expert's domain
  knowledge be made explicit in the form of an ontology?

• How do you know the ontology has a good coverage for a given domain of
interest?

• How does the ontology develop over time? Are there any mechanisms that
support the problem of knowledge maintenance?
In fact, these problems of time, difficulty, confidence, and maintenance were
similar to what knowledge engineers have dealt with over the last two decades
when they elaborated on methodologies for knowledge engineering or work-
benches for defining ontologies and knowledge bases.
A generic method that proved extremely beneficial for the knowledge engi-
neering task was the integration of knowledge acquisition with machine learning
techniques (see the seminal work described in (Skuce et al., 1985; Reimer, 1990;
Szpakowicz, 1990; Buntine and Stirling, 1991; Morik et al., 1993a; Nedellec
and Causse, 1992; Webb, 1996)). Current work on applying machine learn-
ing for knowledge acquisition and engineering has mainly restricted its attention
to structured knowledge, normally in the form of instances, given in the form of
more or less complex knowledge bases.
The World Wide Web offers a wide range of different data (see the data taxon-
omy depicted in Figure 4.1). Thus, this work targets the integration
of a multitude of disciplines, in particular machine learning, in order to
facilitate the construction of ontologies. Because the fully automatic acquisition
of knowledge by machines remains in the distant future, the process of ontology
learning is considered as semi-automatic with human intervention, adopting the
paradigm of balanced cooperative modeling (Morik, 1993) for the construction
of ontologies for the Semantic Web. With this objective in mind, an architec-
ture that combines knowledge acquisition with machine learning, feeding on
the resources that we nowadays find on the syntactic Web, has been built. The
layered ontology engineering approach introduced in this section builds the ba-
sis for human intervention in the ontology engineering process. Thus, thinking
about how to integrate ontology learning for supporting manual engineering,
a number of requirements had to be fulfilled. In the following, the requirements
that make up a generic framework supporting the idea of Ontology Learning
are defined:

• Following the balanced cooperative modeling paradigm, "each step may
  be done by the user or by an algorithm" (see (Morik, 1993)). Ontology
  Learning is considered as a "plug-in" for the overall ontology engineering
  process that may be switched on or off. Algorithms should be available for
  all relevant elements contained in O as defined in Definition 2.1.

• The framework for ontology learning should define relevant phases in which
  one would like to support the ontology engineer. Ontology learning should
  be embedded in a process-oriented view.

• The framework should offer ways for user interaction. The user should
easily access and import data. Data may be transformed and directly passed
to an ontology learning algorithm.
• The user should be guided in finding relevant data for ontology learning.
  The access to and processing of this data should be managed by the system. A
  combination of different data sources should be supported to compensate for
  different aspects of knowledge and different data quality.

• Reuse of existing ontologies should be supported, which means that mecha-
  nisms for integrating and accessing existing ontologies are required. Addi-
  tionally, if more than one ontology for a given target domain is available,
  the framework has to offer means to combine or merge ontologies from
  overlapping domains.
• Existing models or existing ontology structures, e.g. a concept taxonomy
  H_C, should be used as background knowledge within the algorithms. Thus,
  an incremental extension and improvement of a given knowledge model
  should be supported.
• There is no single best algorithm supporting the extraction of a specific
knowledge structure. Therefore, the framework should allow for the usage
of different algorithms trying to solve the same problem. Results generated
by these algorithms should be combined.
• The framework should provide adequate techniques for result presentation
  and combination of extracted results from different data. Especially the
  important question of how to constrain the number of discovered proposals
  for engineering the ontology should be approached.
This is a non-exhaustive "wish-list" for integrating ontology learning with
ontology engineering. The next chapter will present the general framework
supporting semi-automatic means for ontology extraction and maintenance.

Notes
1 http://www.yahoo.com
2 http://www.unspsc.org
3 Part of the work described in this chapter has been published in (Maedche
  et al., 2000; Staab and Maedche, 2000; Erdmann et al., 2000).
4 http://www.w3c.org/RDF
5 http://www.w3c.org
6 "RDF(S)" is used to refer to the combined technologies of RDF and RDF-
  Schema.
7 German Text Exploitation and Search System - http://www.getess.de
8 http://www.w3.org/XML/
9 As depicted in Figure 3.2, DAML+OIL and OIL do not build on all RDF-
  Schema primitives and exclude some of them, e.g. reification. There is
  still an ongoing discussion on the primitives that may be contained in future
  versions of RDF-Schema.
10 http://www.ontoknowledge.org/oil/
11 http://www.daml.org/2001/03/daml+oil-index
12 http://www-sop.inria.fr/acacia/drdfs/
13 Resources are represented by shaded ovals, literal values by rectangles, and
   properties by directed, labeled arcs.
14 The reader may note that the term "class" used in RDF(S) is used as a synonym
   of the term "concept", as given in the ontology structure definition.
15 The reader may note that only a small part of RDF(S) is depicted in the
   RDF/RDFS layer of the figure. Furthermore, the relation APPL:MARRIEDWITH
   in the data layer is identical to the resource APPL:MARRIEDWITH in the
   schema layer.
16 see http://www.w3.org/1999/02/22-rdf-syntax-ns
17 see http://www.w3.org/2000/01/rdf-schema
18 The namespace is accessible at
   http://ontoserver.aifb.uni-karlsruhe.de/schema/oevoc.rdfs
19 The reader may note that the discussion of the semantics of RDF(S) is still
   going on, e.g. see the W3C RDF-Logic mailing list, archive available at
   http://lists.w3.org/Archives/Public/www-rdf-logic/.
20 http://www.ontoknowledge.org/oil/downl/semantics.pdf
21 http://www.daml.org
22 DAML+OIL is a joint initiative of the DARPA-DAML project and EU-
   funded projects.
23 see http://www.daml.org/2001/03/daml+oil-index.html
24 see http://logic.stanford.edu/kif/kif.html
25 Various applications require different types of languages and reasoning sys-
   tems, ranging from description logics systems (e.g., for data warehouse qual-
   ity (Franconi and Ng, 2000)) over tractable non-monotonic reasoning
   systems (e.g., non-monotonic inheritance for an insurance help desk (Morgen-
   stern, 1998)) to systems that include temporal reasoning.
26 http://www.ontobuilder.com/index.html
27 http://www.cs.umd.edu/projects/plus/SHOE/
Chapter 4

A FRAMEWORK FOR ONTOLOGY LEARNING

This chapter introduces a comprehensive ontology learning framework for
the Semantic Web 1. The framework is based on the foundations and defini-
tions of the ontology and knowledge base structure and the layered ontology
engineering paradigm introduced in the last two chapters. Thus, it provides
components, interaction between components, and an overall process for apply-
ing ontology learning to real-world data for the extraction and maintenance of
ontologies. The framework described here is considered to be the backbone
of the entire book.
This chapter is separated into the following three main parts: In the first part,
different kinds of data that are considered to be relevant input for ontology
learning are introduced. A classification of the relevant data is presented in
the form of a taxonomy used to introduce the different types of data and the
relations between them. The specific characteristics of the different types of
data will also be introduced. In the second part of this chapter, the following
four core components which have been identified as a basic requirement for
ontology learning will be explained:
• the Ontology Engineering & Management Environment,
• the Data Import & Processing Component,
• the Algorithm Library Component and
• the Graphical User Interface and Management Component.
These four components will be embedded in a comprehensive architecture.
Special emphasis will be placed upon the analysis of the interaction of these
components, which will be described in detail in the second part of this chap-
ter. The subsequent chapter 5 on data import & processing and chapter 6 on
ontology learning algorithms will put life into these core components. The
comprehensive architecture introduced here has been implemented in the on-
tology learning and engineering environment TEXT-TO-ONTO (Maedche and
Volz, 2001) that will be presented in chapter 7.
In the third part of this chapter a process-oriented view on ontology learn-
ing will be introduced. This process-oriented view embeds four main phases,
namely import, extract, prune and refine into an ontology learning cycle.
Based on these four phases, an overall picture of how to apply ontology learning
for extraction and maintenance of ontologies will be provided. One important
aspect in the Semantic Web, besides building up an ontology by importing
existing knowledge structures and extracting new structures, is to consider the
"maintenance phases" necessary for the pruning of an ontology and for the
refining of the ontology supported by ontology learning techniques.

1. A Taxonomy of Relevant Data for Ontology Learning


In this section an overview of different types of relevant data for ontology
learning available on the Web will be given. A classification of the relevant data
in the form of a taxonomy will be used for introducing the different types of
data and the relations between them. Subsequently, the specific characteristics
of the different types of data in the taxonomy will be introduced. The reader
may note that each of the different types of data described require specific
import, processing and transformation techniques before an ontology learning
algorithm can be applied. The import and processing techniques required for
the different types of data will be discussed in chapter 5.

[Figure 4.1 depicts a taxonomy rooted in "data", branching into ontologies
(e.g. linguistic ontologies such as WordNet), schemata (relational and ER
models, OO models, DTDs, XML-Schema), instances, semi-structured data
(e.g. dictionaries), and natural language documents (pure NL text and NL
documents with semi-structured information).]

Figure 4.1. Taxonomy of Relevant Data for Ontology Learning

Figure 4.1 provides an overview and a classification of the relevant data for
ontology learning in the form of a taxonomy. Five main types of relevant data
are distinguished: ontologies, schemata, instances, web documents and semi-
structured data. These five generic types are further refined into more specific
kinds of data and are explained in further detail in the following section.
Data in the form of ontologies. The first type of data considered is the left
branch of the taxonomy labeled "ontologies". Ontologies may also be con-
sidered to be a specific kind of data. As introduced in the ontology overview
chapter, different kinds of ontologies are available, e.g., linguistic ontologies
(such as WordNet (Fellbaum, 1998) and its German counterpart GermaNet
(Hamp and Feldweg, 1997) discussed in Section 1), thesauri (Wersig, 1985) or
domain-specific web ontologies 2. The topic of reusing already existing on-
tologies was discussed long before large libraries of ontologies were
available (see (Pirlein, 1995)). This has changed and is becoming increasingly
important: The Semantic Web is being built on top of domain-specific schemata
in the form of ontologies. More and more ontologies are and will become
available (e.g., by using the XML namespaces mechanism), and the quick
adaptation of an ontology from one domain to another or the extension of a
given ontology will become critical.
As mentioned above, in the context of this book reusable ontology structures
are considered to be available in different kinds of ontologies. Examples are the
large linguistic lexical-semantic nets, e.g., WordNet or its German counterpart
GermaNet, and domain-specific ontologies, e.g., switching from a tourism on-
tology to a finance ontology as done in the GETESS project (Staab et al., 1999).
One may also consider thesauri 3 (Wersig, 1985) to be lightweight ontologies
that may be reused to derive more comprehensive ontologies 4. The interested
reader is referred to (Amann and Fundulaki, 1999), where an interesting ap-
proach for the re-engineering of thesauri to ontologies has been described. In
their work an ontology represented in RDF(S) has been derived by exploiting
existing domain-specific ontologies and thesauri.
Further elaboration on how to import existing ontologies and similar struc-
tures will be given in the next chapter.

Data in the form of schemata. Information systems typically depend on a
(semantic) data model (e.g., given as an entity-relationship model or an object-
oriented model). These "modeling paradigms and mechanisms" have been
developed to conceptually represent an application-specific part of the world.
It seems obvious that the engineering and modeling effort contained in this kind
of schemata may be reused for ontology learning for the Semantic Web. A short
overview is given as to what kind of schemata can be considered as relevant
for ontology learning. However, this type of data is not considered further in
this book because the focus is on semi-structured and free text documents on
the Web. It is important to mention that the ontology learning architecture that
is presented in the next chapter is generic, so that a module for importing and
reverse engineering existing schemata may be easily plugged in.
• Database Schemata: One of the classic techniques for modeling domains
within information systems is the entity-relationship model introduced
by (Chen, 1976). Typically, relational database schemata are generated
from entity-relationship models, e.g., as described in (Lang and Locke-
mann, 1995). In the last few years, it has become obvious that models with
more comprehensive semantics are required to build information systems
that have grown in complexity. Therefore, the problem of re-engineering
existing structures into more complex structures (such as OO database mod-
els) has surfaced. In this context the field of database reverse engineering
(see (Mueller et al., 2000; Tari et al., 1998; Fong, 1997; Ramanathan and
Hodges, 1997) for an overview) has developed a wide range of techniques
for re-engineering relational database models and extracting the relevant
indicators of semantic structures (a comprehensive overview is given in
(Klettke, 1998)).

In the area of intelligent information integration (Wiederhold, 1992) the
problem of re-engineering existing schemata and mapping them to a medi-
ation service has been researched over a long period of time. An interesting
approach for learning so-called source descriptions for data integration has
been described in (Doan et al., 2000). A source description contains a
source schema that describes the content of the source and a mapping be-
tween the corresponding elements of the source schema and the mediated
schema. Manually constructing these mappings is both labor-intensive and
error-prone, and has proven to be a major bottleneck when deploying large-
scale data integration systems in practice. Thus, in their paper (Doan et al.,
2000) the authors report on initial work towards automatically learning
mappings between source schemata and the mediated schema. Along the
same lines, mechanisms for capturing the semantics of a database schema
in order to combine autonomous systems in the form of federated database
systems have been researched (cf. (Sheth and Larsen, 1990)).

• Web Schemata: Recently a number of so-called web schemata have been
  proposed on the Web. Semi-structured data is characterized by the lack of
  any fixed and rigid schema, although typically the data has some implicit
  structure (Abiteboul et al., 1999). Legacy data in the form of XML document
  type definitions (DTDs) 5 and XML-Schema 6 may also be classified as
  this kind of data 7. The critical problem of re-engineering these kinds of
  structures is not as well researched and has been addressed by relatively
  few scientists. Recently, some work on the extraction of semantic models
  from this more or less structural type of schemata has been published. In (Ide
  et al., 1997; Welty and Ide, 1999) an approach to derive an ontology from
  a given DTD is described. In their work, the authors experiment with the
  representation of a DTD and associated documents (i.e., documents which
  conform to the DTD) using the LOOM knowledge representation system 8.
Thus, their main target is to provide more sophisticated query and retrieval
for documents than current systems provide.

Data in the form of instances. Instances are considered to be objects that
are extensionally defined (see, e.g., Definition 2.3), based on a given
ontology according to the knowledge base structure KB. Thus, collections
of instances contained in data or knowledge bases represent an extensional
description of domain concepts. Therefore, they represent relevant data for
inductively extracting intensional descriptions for a given domain.
Learning from instances is known under the names concept for-
mation or conceptual clustering. Seminal work has been described in (Fisher
et al., 1991), which clusters similar objects and classifies object descriptions.
An approach for deriving concept descriptions from instances has been pre-
sented in (Kietz and Morik, 1994). A nice overview of concept formation
approaches in machine learning is given in (Wrobel, 1994).
Recently, several interesting approaches under the name of "A-Box-Mining"
have been published (see (Schlobach, 2000)). Initial research on learning
from RDF instance data has been described in (Delteil et al., 2001). The authors
present a method for learning concept hierarchies by systematically generating
the most specific generalization of all possible sets of resources.

Data in the form of semi-structured data. As mentioned earlier, semi-
structured data is characterized by the lack of any fixed or rigid schema, although
typically the data has some implicit structure (Abiteboul et al., 1999; Buneman
et al., 1997). While the lack of a fixed schema makes semi-structured data
fairly easy to produce and an attractive format, presenting and querying such
data is considerably more difficult. Thus, a critical problem is discovering the
structure implicit in semi-structured data and subsequently recasting the raw
data in terms of this structure.
In (Nestorov et al., 1997) a notion of a type hierarchy for semi-structured
data and an algorithm for deriving the type hierarchy and rules for assigning
types to the data elements are introduced. The algorithm is motivated by the
well-known data mining technique of association rules, but is adapted to the
task of generating a so-called counting lattice. An approach for automatically
clustering large amounts of semi-structured data has been described in (Pernelle
et al., 2001). The authors introduce two languages and algorithms for the
generation of a lattice of classes covering the whole set of data and for the
refinement of parts of the lattice. In (Wang and Liu, 1997) a new framework
and algorithm for schema discovery in semi-structured data is described and
applied to the Internet Movie Database. The schema approach DataGuides for
semi-structured data has been introduced by (Goldman and Widom, 1997) for
the OEM data model. DataGuides are dynamically generated from a given
data graph and represent a compact structural description of the given set of
semi-structured data.
The interested reader can refer to the comprehensive overview on the con-
nection between ontologies and semi-structured data (with specific focus on
XML) presented in (Erdmann, 2001).

Data in the form of natural language text. Natural language documents are
data that are freely available in large amounts on the Web. This kind of data
is considered to be the most important source of data for ontology learning
for the Semantic Web. Therefore, the applicability of the techniques laid out
in this book can be guaranteed, because great amounts of this kind of data are
available for all domains of interest. In general, the following distinction can
be made:
• Pure natural language text: Natural language text exhibits morphological,
syntactic, semantic, pragmatic and conceptual constraints that interact in
order to convey a particular meaning to the reader. Thus, the text transports
information to the reader and the reader embeds this information into his
background knowledge.
Linguistic constraints found in the language serve as an important input for
ontology learning. Real-world data on the web requires specific techniques
for recognizing and analyzing these constraints. Shallow text processing
and the underlying techniques that are applied in the overall framework are
introduced in chapter 5.
• Natural language documents enriched with semi-structured informa-
tion: As mentioned above when referring to the success of new standards
for document publishing on the web, there exists a proliferation of semi-
structured data on the web. Formal descriptions of semi-structured data are
available freely and widely, e.g., HTML data adds more or less expressive
semantic information to documents. Documents on the WWW typically
combine natural language text with semi-structured information, e.g., the
information contained in tables, lists, etc. Dealing with this kind of infor-
mation is a challenge and requires specific techniques.
One specific kind of a semi-structured document is found in online dictio-
naries on the web 9 . In (Litkowski, 1978) it is argued that the definitions
contained in dictionaries hold a great deal of information about the seman-
tic characteristics that should be attached to a lexeme. Dictionaries serve as
stable resources of domain knowledge that provide a good starting point for
a core ontology by supporting subsequent ontology learning from pure text.

Conclusion. In this section, a taxonomy of relevant data for ontology learning,
as depicted in Figure 4.1, was introduced. As mentioned earlier, in the follow-
ing chapters attention will only be paid to two specific kinds of data, namely
ontologies and natural language documents. However, the architecture that is
presented in the following section is independent of specific input data.

2. An Architecture for Ontology Learning


The purpose of this section is to introduce the ontology learning architec-
ture and its relevant components for the Semantic Web. In the last chapter, the
requirements for an ontology learning framework were collected and defined;
they can be summarized as follows: First, following the balanced co-
operative modeling approach introduced in (Morik, 1993), ontology learning
should be considered to be a plug-in for an ontology engineering & management
environment. Second, extensive support for data discovery, import, processing
and suitable transformations should be provided 10. Third, ontology learning al-
gorithms should produce interpretable results; providing the ontology engineer
with redundant results should be avoided. Fourth, existing ontology structures,
which offer background knowledge, should be accessible for all components.
Fifth, the inclusion of user support in the form of graphical interfaces for ap-
plying ontology learning techniques is an important functionality that should
be provided.

2.1 Overview of the Architecture Components


Based on these considerations and requirements, four core components have
been identified and embedded into a coherent generic architecture for ontology
learning. The following are the four main components:
• Ontology Engineering & Management Environment ONTOEDIT: ONTOEDIT
  offers comprehensive ontology management capabilities and a
  graphical user interface to support the ontology engineering process, which
  is manually performed by the ontology engineer. The underlying paradigms
  of ONTOEDIT, which are based on the view of layered ontology engineer-
  ing, were already introduced in chapter 3. A detailed overview of
  ONTOEDIT is given in chapter 7.

• Data Import & Processing Component: This component contains a wide
  range of techniques for discovering, importing, analyzing and transforming
  relevant input data. An important sub-component is a natural language
  processing system as depicted in the upper left part of Figure 4.2. The
  underlying algorithms and strategies implemented in this component are
  introduced in the next chapter. The general task of the data import and
  processing component is to generate a set of pre-processed data as input for
  the algorithm library component.
• Algorithm Library Component: This component acts as the algorithmic
  backbone of the framework. A number of algorithms are provided for the
  extraction and maintenance of the elements contained in the ontology struc-
  ture O. In order to be able to combine the extraction results of different
  learning algorithms, it is necessary to standardize the output in a common
  way. Therefore a common result structure (according to O) for all learn-
  ing methods is provided. If several extraction algorithms obtain the same
  results, they are combined and presented to the user only once.
• Graphical User Interface and Management Component: The ontology
engineer uses this component to interact with the ontology learning compo-
nents for data import & processing and for the algorithm library. Compre-
hensive user interfaces are provided to the ontology engineer to help select
relevant data, apply processing and transformation techniques or start a spe-
cific extraction mechanism. Data processing can also be triggered by the
selection of an ontology learning algorithm that requires a specific repre-
sentation. Results are merged using the result set structure and presented
to the ontology engineer with different views of the ontology structures. A
collection of user interfaces is provided as screenshots in chapter 7.

Figure 4.2. Architecture for Ontology Learning

The overall architecture and the interaction between the four components
introduced above is graphically depicted in Figure 4.2. The reader may note
that the "functional components" (the data import & processing and the algo-
rithm library component) are hidden by the "interface components" (the graphical
user interface and management component and ONTOEDIT). Thus, the ontology
engineer only interacts with the comprehensive user interfaces contained in
the graphical user interface and management component and the ONTOEDIT
ontology engineering environment. Using these components, the ontology en-
gineer may trigger the mechanisms available within the functional ontology
learning components (e.g. one may start document indexing and pre-processing
mechanisms, execute an extraction algorithm or check the consistency of the
instantiated ontology structure using an external inference engine accessible by
ONTOEDIT).
One important aspect of the architecture is that all components access the on-
tology which the ontology engineer is currently working on. The components
use the available background knowledge in the ontology, e.g., for extracting rel-
evant ontology elements, for merging results, for generalization and for result
presentation. The natural language processing module is directly connected to
the ontology model via the lexicon: It uses this connection to map lexical entries
to their corresponding concepts and to relate concepts with semantic relations.
The algorithm library accesses a set of pre-processed data in an algorithm-
specific form, possibly including existing concepts in the available ontology.
The algorithms available in the algorithm library use the conceptual knowledge
contained in the ontology and the corresponding inferences for generating more
comprehensive results. The result set component compares existing structures
in the ontology with structures extracted from the pre-processed data. If several
extraction algorithms obtain equivalent results these results are combined and
presented to the user only once.
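The connection between the natural language processing module and the
ontology model via the lexicon can be pictured as a simple mapping from
lexical entries to concepts. The following is a minimal sketch under assumed
names and example entries from a tourism-like domain; it is not the actual
interface of the system.

    # Minimal sketch: the lexicon maps lexical entries onto concepts of
    # the ontology model. Entries and concept names are illustrative.

    LEXICON = {
        "hotel": {"Accommodation"},
        "inn": {"Accommodation"},
        "city": {"Location"},
    }

    def concepts_for(tokens):
        # Return the concepts referenced by the lexical entries in a text.
        hits = {}
        for token in tokens:
            for concept in LEXICON.get(token.lower(), ()):
                hits.setdefault(concept, []).append(token)
        return hits

    print(concepts_for(["The", "hotel", "lies", "in", "the", "city"]))
    # {'Accommodation': ['hotel'], 'Location': ['city']}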
In the following section further details are given about the components and
their specific task in the overall architecture is explained.

2.2 Ontology Engineering Workbench ONTOEDIT

The layered ontology engineering approach was introduced in chapter 3 of
this book. Different layers are distinguished for modeling, as well as for repre-
senting ontologies. As already mentioned, the ontology engineering workbench
ONTOEDIT is the running implementation of the layered ontology engineering
approach 11. Thus, it acts as the manual modeling and engineering backbone
for the overall framework. ONTOEDIT offers comprehensive ontology man-
agement capabilities, and a graphical user interface to support the ontology
engineering process, which is manually performed by the ontology engineer.
A screenshot of ONTOEDIT is given in Figure 4.3. As one can see, different
views are offered to the user, targeting the conceptual level rather than a particu-
lar representation language. On the left side of Figure 4.3, the concept hierarchy
H_C is displayed. In the middle of Figure 4.3, a view of non-taxonomic relations
with specific ranges R with respect to a selected domain concept D in the con-
cept hierarchy is depicted. On the right side of Figure 4.3, the set of all defined
relations R is shown, independently of the specific domain/range concepts. In
the lower right part of Figure 4.3, the view for defining ontological axioms AO
using the semantic pattern approach introduced in the last chapter is depicted.
In the example, locally restricted inverse relations are defined; e.g., it is stated
that if an organization develops a specific product, the inverse relation that the
specific product is developed by the organization holds. Semantic Web appli-
cations profit from the definition of inverse relations, because new facts may
automatically be generated by applying inference rules that are automatically
generated by ONTOEDIT.

Figure 4.3. Screenshot of the ONTOEDIT Ontology Engineering Environment

One important aspect of ONTOEDIT is that it accounts for the layered language
approach: ontologies in different representation languages (e.g., representation
vocabularies like OIL 12, DAML+OIL 13, F-Logic or OntoEdit-specific language
primitives) may be developed. New representational requirements (e.g., new
modeling primitives) can easily be defined on account of the layered approach:
New ONTOEDIT-specific ontology representation primitives are represented
in the OntoEdit namespace 14.
An important advantage of the overall ontology learning architecture is that
ontologies in different representation languages may be accessed. On the other
hand, ontology elements that are extracted (e.g., a concept taxonomy H_C) may
be generated in different representation languages. At the moment ONTOEDIT
accesses the SILRI 15 F-Logic based inference engine directly. Therefore, exe-
cutable representations of the ontology for constraint checking and application
debugging can be generated and queried. A detailed explanation of the available
modeling views supported by ONTOEDIT is given in chapter 7.

All the techniques and mechanisms introduced above support the manual
generation of high-quality web ontologies. However, it was already mentioned
that there is a large conceptual gap between the ontology engineering tool and
input (and often legacy) data such as Web documents, which ultimately deter-
mine the target ontology. The next three components support the extension of
the manual ontology engineering environment ONTOEDIT to a comprehensive
framework for extracting and maintaining ontologies for the Semantic Web.

2.3 Data Import & Processing Component


The data import & processing component contains a wide range of techniques
for discovering, importing, analyzing and transforming relevant input data.
As already mentioned in the section on relevant data in this chapter, attention is
only paid to natural language documents (e.g., free text, HTML, dictionaries)
and to different kinds of ontologies. An important sub-component of the data
import & processing component is a natural language processing system as
depicted in the upper left part of Figure 4.2. In general the data import &
processing component generates either a set of preprocessed data as input for
the algorithm library component or an instantiation of the internal ontology
model. The underlying algorithms and strategies of this component will be
introduced in the next chapter. Below, the sub-components contained in the
data import & processing component are briefly introduced:
• Ontology Wrapper and Importer: This sub-component offers a generic
method for ontology import and a number of concrete representation lan-
guage-specific "ontology wrappers" for importing ontologies (e.g., Word-
Net, GermaNet) into the internal ontology structure.
• Ontology Merging via FCA-MERGE: If two or more ontologies are
  available for import, one has to merge the input ontologies into one common
  ontology. FCA-MERGE follows a bottom-up approach which offers a
  structural description of the merging process (Stumme and Maedche, 2001b).
  A detailed description of FCA-MERGE is given in the next chapter.
• Ontology-based Focused Crawler: This is a sub-component that uses
background knowledge in the form of ontologies to focus document search
in the Web. Thus, it supports focusing the collection of relevant input data
for ontology learning (e.g., web documents).
• Natural Language Processing System: For processing free natural language
  text, the system introduced in this book accesses a natural language processing
  system. For this work SMES (Saarbrücken Message Extraction System),
  a shallow text processor for German (cf. (Neumann et al., 1997)), has been
  used. SMES comprises a tokenizer based on regular expressions, a lexi-
  cal analysis component including various word lexicons, a morphological
  analysis module, a named entity recognizer, a part-of-speech tagger and a
  chunk parser.
• Document Wrapper: For processing dictionaries and other kinds of semi-
structured documents with explicit structuring, this component offers spe-
cific processing techniques for wrapping the data into an algorithm-specific
target relational representation.
• Transformation Module: This module takes a set of linguistically an-
  notated documents as input (generated by the natural language processing
  system) and transforms them into an algorithm-specific relational represen-
  tation (a minimal sketch follows this list).
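The announced sketch of the transformation module's task follows. It shows
one conceivable algorithm-specific relational representation, namely concept
co-occurrence tuples per sentence; the format, names and example labels are
assumptions for illustration, not the module's actual interface.

    # Minimal sketch: transforming linguistically annotated documents
    # into a relational representation, here concept co-occurrence
    # tuples per sentence - one possible algorithm-specific format.

    from itertools import combinations

    def to_relational(annotated_sentences):
        # annotated_sentences: one list of concept labels per sentence
        tuples = set()
        for concepts in annotated_sentences:
            for c1, c2 in combinations(sorted(set(concepts)), 2):
                tuples.add((c1, c2))
        return tuples

    sentences = [["Accommodation", "Location"], ["Accommodation", "Price"]]
    print(to_relational(sentences))
    # {('Accommodation', 'Location'), ('Accommodation', 'Price')}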

As depicted in Figure 4.2, the final result of the data import & processing
module is either an instantiated ontology structure (e.g. through importing or
merging) or "blocks" of algorithm-specific preprocessed data (depicted in the
lower left part of Figure 4.2).

2.4 Algorithm Library


It has already been mentioned that ontology learning is considered to be a
plug-in for an ontology engineering environment. Following this paradigm,
a broad set of interactions is provided so that the engineer may start with
primitive methods first. These methods require very little or even no background
knowledge, but they may also be restricted to returning only simple hints, like
lexical entry frequencies. While the knowledge model matures during the semi-
automatic learning process, the engineer may turn towards more advanced and
more knowledge-intensive algorithms, such as our mechanisms for discovering
concept hierarchies or non-taxonomic relations as found in chapter 6. Two
different types of algorithms that are contained in the library are distinguished:
• First, algorithms for ontology extraction are established. That means al-
gorithms for extracting lexical entries as concept indicators, a taxonomic
order of concepts and a set of non-taxonomic relations between concepts
with domain and range restrictions, as well as lexical entries for relations.
• Second, algorithms for ontology maintenance are developed. That means
algorithms for ontology reduction or pruning and algorithms for ontology
evolution or refinement.
In general, an ontology may be extracted using various algorithms working on
different kinds of pre-processed input data. While specific algorithms may vary
from one type of input to the next, there is also considerable overlap concerning
the underlying learning approaches. Hence, it is possible to reuse algorithms
from the library for extracting different parts of the ontology structure.
As mentioned above, we use a result combination approach 16. Thus, each al-
gorithm that is plugged into the library generates normalized results that adhere
to the ontology structure O and may be combined to form a coherent ontology.
Within the result set, existing structures in the ontology are compared with the
structures extracted from the pre-processed data. If several extraction algo-
rithms obtain the same results, these results are combined and only presented
once to the user.
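A minimal sketch of this result combination step is given below; the triple
format and the names are assumptions for illustration only. Each proposed
ontology element is keyed by its type and content, so that equal proposals
from several algorithms collapse into a single entry with the supporting
algorithms attached.

    # Minimal sketch: combining normalized results from several
    # extraction algorithms so that equal proposals are presented to
    # the user only once. The triple format is an assumption.

    def combine(results):
        # results: iterable of (element_type, element, algorithm) triples
        combined = {}
        for element_type, element, algorithm in results:
            combined.setdefault((element_type, element), set()).add(algorithm)
        return combined

    proposals = [
        ("concept", "Hotel", "frequency"),
        ("concept", "Hotel", "pattern-based"),
        ("taxonomic", ("Hotel", "Accommodation"), "clustering"),
    ]
    print(combine(proposals))
    # the two 'Hotel' proposals are merged into one entry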
A detailed introduction of the algorithms is given in chapter 6 of this book.
Chapter 8 describes the evaluation of the proposed ontology learning algorithms.

2.5 Graphical User Interface & Management Component


As depicted in Figure 4.2, the ontology engineer only interacts with ONTOEDIT
and the graphical user interface and management component. As
mentioned above, the idea is that this component hides the technical complex-
ity inherently contained in the ontology learning framework. The ontology
learning graphical user interface and management component fulfills several
functions:
• It supports the ontology engineer in selecting relevant data (e.g. different
processes to discover relevant data, to clean and to index relevant data may
be started).
• It offers different interfaces to parameterize and start ontology learning al-
  gorithms. This includes interfaces for selecting relevant parts of the ontology
  that should, for example, be extended by the extraction of non-taxonomic re-
  lations between concepts. Additionally, it allows domain-specific extraction
  patterns to be defined (i.e., the pattern-based extraction techniques described
  in chapter 6).
• It provides comprehensive result set views, where results may be browsed,
  sorted, selected and added to the target ontology. Example screen-
  shots of these views can be found in chapter 7. Additionally, it offers
  mechanisms for visualizing the extracted results based on different graph-
  visualization algorithms. An important aspect to be mentioned here is
  that in general there is a tight connection between the result set views and
  the manual ontology engineering interfaces.
Examples of interfaces contained in the ontology learning GUI and manage-
ment component are provided as screenshots in chapter 7.
In this section of chapter 4 the overall architecture for ontology learning with its four main components was introduced. In the next section a process-oriented view will be introduced and the components will be embedded in the four phases of ontology learning.

3. Phases of Ontology Learning - Import, Extract, Prune & Refine
In the introduction to this chapter, it was mentioned that the ontology learning framework introduced in this book provides a process-oriented view based on four main phases: import, extract, prune and refine. Each of these phases accesses ontology-learning-relevant data and uses the components introduced in the last section. The overall ontology learning cycle is depicted graphically in Figure 4.4. The ontology learning cycle is considered to be similar to a bootstrapping mechanism: according to (Jones et al., 1999), bootstrapping initializes an ontology learning algorithm "with seed information. It then iterates, applying the learning algorithm to calculate labels for the unlabeled data, and incorporating some of these labels into the training input for the learner".

Figure 4.4. Ontology Learning Cycle

The four main phases are introduced as follows:


• If available, one may start by importing and reusing existing ontologies. For instance, (Peterson et al., 1998) describe how the ontological structures contained in Cyc17 were used in order to facilitate the construction
of a domain-specific ontology. The approach described in this book uses
lexical-semantic nets as input for ontology learning. In the next chapter, the
manner in which the lexical-semantic ontologies WordNet and GermaNet
(cf. (Hamp and Feldweg, 1997)) have been imported into the ontology
learning cycle is described.
• Second, in the ontology extraction phase major parts of the target ontology
are modeled with learning support. The extraction phase may profit from the
imported ontologies given in the form of background knowledge. However,
extraction algorithms may also work without a given core ontology. The
algorithms supporting this phase are introduced in chapter 6.
• Third, this rough outline of the target ontology needs to be pruned in order
to better adjust and prepare the ontology for its prime purpose. Imported
ontologies typically do not fit together exactly with a specific application.
Pruning or the elimination of ontological structures helps keep the ontology
at the right size for a given application. The pruning algorithm is introduced
in chapter 6.
• Fourth, ontology refinement profits from the given domain ontology, but
completes the ontology at a fine level of granularity (also in contrast to
extraction).
One may run through this cycle, e.g., to include new domains into the con-
structed ontology or to maintain and update its scope. The mechanisms
supporting ontology refinement are explained in further detail in chapter 6.
Additionally, the prime target application serves as a measure for the valida-
tion of the resulting ontology. Thus, an important aspect of the application is
that legacy and actual application data may serve as input for ontology learn-
ing. It is important to note that each phase of the cycle can be executed without
dependency on another phase. Therefore, each process step in the cycle may be
skipped, e.g. an ontology may be imported and subsequently pruned directly
using application specific data.
In the following section a more detailed overview of the four core processes
of ontology learning based on (Maedche and Staab, 2001c) is provided. In
this detailed overview, the four phases will be partially mapped to concrete
techniques that are described in chapters 5 and 6.

3.1 Import & Reuse


Given the experience in the fields of tourism (Staab et aI., 1999; Maedche
and Staab, 2000c), telecommunication (Maedche and Staab, 2000b), insurance
(Kietz et al., 2000a) and finance, it is to be expected that there are some kinds
of ontology structures available for almost any commercially significant do-
main. Thus, mechanisms and strategies are required to import & reuse these
structures. The import & reuse process may be roughly separated into two
parts:

• In the first part, relevant ontologies have to be selected and importing strategies have to be defined, e.g., an ontology wrapper supporting the transformation from one representation language to another has to be provided.

• In the second part of the import & reuse step, imported conceptual structures need to be merged. This should form the basis for the subsequent ontology learning phases of extracting, pruning and refining.

While the general research issue of merging and aligning is still an open problem, recent proposals (e.g., (Noy and Musen, 2000)) have shown how to improve the manual process of merging/aligning. Existing methods for merging/aligning mostly rely on matching heuristics to propose merging concepts and similar knowledge-base operations. As mentioned earlier, the FCA-MERGE method is used for generating a common ontology that serves as input for learning. This mechanism follows an application data-oriented, bottom-up approach based on formal concept analysis (Ganter and Wille, 1999). A detailed introduction and examples of this approach are given in the next chapter of this book.

3.2 Extract
In the ontology extraction phase of the ontology learning cycle, major parts
(i.e., the complete ontology or large chunks reflecting a new sub domain of
the ontology) are modeled with learning support to exploit the various types of
relevant data. During this process, ontology learning techniques partially rely
on given ontology parts. Thus, encountering an iterative growing model during
previous revisions to the ontology learning cycle may result in subsequent ones.
More sophisticated algorithms may work best with the structures in combination
with the more straightforward ones, which where previously applied.
An example for this iterative, bootstrapping model is instantiated with the
non-taxonomic relation extraction technique described in chapter 6. Relations
between concepts may be extracted based on a given set of concepts C and an associated lexicon 𝓛. However, if a taxonomic order H_C of these concepts
is given, more comprehensive, generalized relations between concepts may be
extracted. Thus, the existing background knowledge allows for the generaliza-
tion of the results. The algorithms for extracting the ontological structures are
described in detail in chapter 6.

3.3 Prune
A common theme for modeling in various disciplines is the balance between the completeness and the incompleteness of the domain model. It is a widely held belief that targeting completeness of the domain model is practically unmanageable and computationally intractable. Targeting the incomplete model, on the other hand, is overly limiting with regard to expressiveness. Hence, striving for a balance between these two models is one of the greatest challenges to successful ontology learning. The import & reuse of ontologies, as well as the extraction of ontologies, can result in imperfect (often too large and comprehensive) models. Therefore, pruning the ontology to diminish its size is of great importance. Thus, there are at least two dimensions to the problem of pruning that need to be examined:
• First, one needs to clarify how the pruning of particular parts of the ontol-
ogy (e.g., the removal of a concept or a relation) affects the ontology as a
whole. For instance, (Peterson et al., 1998) describe strategies that leave
the user with a coherent ontology (i.e. no dangling or broken links). A sim-
ilar strategy has been described by (Swartout et al., 1996) where ontology
pruning is considered to be the task of "intelligent" deletion of ontological
structures. The ontology engineering environment ONTOEDIT employs a
specific technique for the elimination of concepts & relations: If a user wants
to eliminate a specific concept, all the ontological structures concerned are
computed and proposed to the user. Finally, the user decides if the specific
concept or the relation should be eliminated.
• Second, one may consider strategies for proposing ontology items that
should be either kept or pruned. Several mechanisms for generating propos-
als from application data have been investigated (see (Kietz et al., 2000b)
and chapter 6, subsection 2.1). These proposals follow an application data-
driven approach: Concepts and relations that are never instantiated in a set of
relevant documents are not considered to be necessary and their elimination
is proposed.
Given a set of application-specific documents, there are several strategies for
pruning the ontology:
• First, one may count lexical entry frequencies. Concepts referring to lexical entries that seldom appear are deleted from the ontology. Lexical entries may also be deleted or mapped to more generic concepts that remain in the ontology. Thereby, one may substitute simple lexical entry frequencies with more sophisticated information retrieval measures, such as term frequency / inverse document frequency (tfidf), which seems to offer a more balanced estimation of the importance of a lexical entry and its corresponding concept. Additionally, using the concept hierarchy H_C, one may propagate the frequencies via their concepts through the taxonomy. This method has the advantage that top-level concepts which serve as structuring concepts are not removed from the ontology (a sketch of this strategy is given below).
• Second, a more sophisticated pruning technique compares the lexical en-
try frequencies of a domain specific document collection (e.g., reports about
electronic products) with those of a generic reference document collection
(e.g., general newspaper reports). Thus, one may avoid having very inter-
esting, but rare domain-specific lexical entries and their concepts pruned
from the ontology.
A detailed description of the pruning algorithms is given in the second part
of chapter 6.

3.4 Refine
Refining plays a role similar to extracting. The difference between these two phases and when to use each cannot be laid down as a clear-cut distinction.
While extracting serves mostly the cooperative modeling of the overall ontology
(or at least very significant chunks of it), the refinement phase is about fine
tuning the target ontology and the support of its evolving nature. In principle,
the same algorithms may be used for extraction as for refinement. However,
during refinement one must consider the existing ontology in detail and the
existing connections to the ontology, while extraction works more often than
not practically from scratch.
A prototypical approach to refinement (though not to extraction!) has been presented by (Hahn and Schnattinger, 1998). They have introduced a methodology for automating the maintenance of domain-specific taxonomies. An ontology is incrementally updated as new concepts are acquired from text. The acquisition process is centered around the linguistic and conceptual "quality" of various forms of evidence underlying the generation and refinement of concept hypotheses. In particular, semantic conflicts and analogous semantic structures from the ontology are considered in order to determine the quality of a particular proposal. Thus, an existing ontology is extended with new lexical entries for 𝓛, new concepts for C and new taxonomic relations for H_C.
The approach to ontology refinement described in this book is mainly based
on the assumption that unknown lexical entries share similar "conceptual be-
haviour" with respect to already known lexical entries (lexical entries that are
assigned to concepts via F). In addition to this assumption, frequency distri-
butions are compared using different statistical similarity measures.
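The following minimal sketch illustrates this assumption (the names and the choice of cosine similarity are illustrative; the actual measures are discussed in chapter 6): the context frequency distribution of an unknown lexical entry is compared with those of known lexical entries, and the concept assigned via F to the most similar known entry is proposed:

import math
from collections import Counter

def cosine(p, q):
    # cosine similarity between two frequency distributions (Counters)
    dot = sum(p[w] * q[w] for w in p if w in q)
    norm = math.sqrt(sum(v * v for v in p.values())) * \
           math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def propose_concept(unknown_ctx, known_ctx, f_map):
    # unknown_ctx: Counter of context words around the unknown lexical entry
    # known_ctx:   known lexical entry -> Counter of its context words
    # f_map:       mapping F, known lexical entry -> concept
    best = max(known_ctx, key=lambda e: cosine(unknown_ctx, known_ctx[e]))
    return f_map[best], cosine(unknown_ctx, known_ctx[best])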
The refinement phase may also use data that comes from a concrete Semantic
Web application, e.g., log files of user queries or generic user data. Adapting
and refining the ontology with respect to user requirements plays a major role
in the acceptance of the application and its further development. This basic
idea is also called "Semantic Web Mining" and will be briefly introduced in
the concluding chapter of this book.

4. Conclusion
This chapter defined and explained the overall ontology learning framework
that underlies the work presented in this book. The framework is based on the
foundations and definitions of ontologies and knowledge bases and the layered
ontology engineering paradigm introduced in the first part of this book.
In the first part of this chapter, relevant data for ontology learning was defined, providing a basis to roughly distinguish between ontologies, schemata, instances, web documents and semi-structured data. On account of the complexity of the overall topic of ontology learning, attention was restricted to data in the form of natural language text and existing ontologies. Data in the form of instances, web and database schemata and semi-structured data has been purposely neglected. The reader may note that the generic architecture introduced
here is mainly abstracted from the concrete data and offers core components to
support ontology learning. Thus, the purpose of the second part of this chapter
was to give an overview of the generic architecture for learning ontologies for
the Semantic Web and its relevant components.
Moreover, ontology learning was embedded in a process-oriented cycle con-
sisting of the core phases import, extract, prune and refine. These four main
phases reflect the fact that ontology learning can be applied to ontology building
on the one hand and to ontology maintenance on the other.

Chapters 5 and 6, which follow and belong to this part of the book, will instantiate the ontology learning components of the architecture and further elaborate on data import & processing, as well as on ontology learning
algorithms which support extracting and maintaining ontologies. Chapters 7
and 8 will describe the implementation of the architecture in an actual system
and its evaluation.

Notes
1 Parts of this chapter have been published in (Maedche and Staab, 2001c; Maedche and Staab, 2001a).
2 An example of a large library of ontologies is the collection developed in the DAML project, which is available at http://www.daml.org/ontologies/
3 A large number of domain-specific thesauri are available online at http://www.thesaurus.com
4 Classification schemes in the form of thesauri are standardized in ISO 2788, see "Documentation - Guidelines for the establishment and development of monolingual thesauri. International Organization for Standardization, 11 1986, Ref. No. ISO 2788-1986".
5 http://www.w3.org/XML/1998/06/xmlspec-report
6 http://www.w3.org/XML/Schema
7 http://www.w3.org/XML/
8 The system is available for download at http://www.isi.edu/isd/LOOM/.
9 A directory of domain specific dictionaries is accessible at
http://www.dictionary.com.
10 Similar to the area of knowledge discovery in databases (KDD), the preprocessing task is considered difficult and time-intensive.
11 Examples and a detailed description of the comprehensive modeling environment ONTOEDIT are given in section 2.
12 http://www.ontoknowledge.org/oil
13 http://www.daml.org/daml+oil
14 The namespace is available online at http://ontoserver.aifb.uni-karlsruhe.de/ontorep.
15 An introduction to the main-memory deductive, object-oriented database system is given in (Decker et al., 1998). The engine is available for download at http://www.ontoprise.de/download/. Its successor, called TRIPLE, is currently under development, see http://www.dfki.uni-kl.de/frodo/triple/index.html.
16 The approach is motivated by the multi-strategy learning idea.
17 Cyc is a large common-sense ontology, cf. http://www.cyc.com.
Chapter 5

DATA IMPORT & PROCESSING

"W3 is a distributed heterogeneous collaborative multimedia information system."

-(Tim Berners-Lee, 1993)

As mentioned in the introduction, the Semantic Web can be seen as a Meta-Web that is built on the existing WWW. Ontologies are the key ingredients
supporting the generation of the Semantic Web by describing vocabularies that
may be instantiated, e.g. for generating metadata about an existing web docu-
ment.
In the last chapter it was shown which components are contained in our architecture for learning ontologies for the Semantic Web and how they interact. If one considers ontology learning as a re-engineering process, legacy data has to be dealt with in order to derive a new ontology. In the setting described here, a large number of different legacy and application data sources is typically given as the starting point for ontology learning. Thus, both legacy and application data are considered as sources from which ontological elements can be extracted.
In this chapter it will be shown how available data on the existing Web may be explored, accessed, processed and transformed into a relational, algorithm-specific representation1. Figure 5.1 gives a graphical overview of this chapter. The figure follows the relevant data taxonomy depicted in Figure 4.1. The reader may note that this chapter only deals with two specific kinds of relevant input data: ontologies and natural language documents. As
mentioned earlier existing ontologies serve as an important source of knowledge
for deriving new ontologies. Existing ontologies are translated and merged to
be accessed in the framework. The final result of this processing method is an
ontology according to the ontology structure O. On the document side different
import and processing techniques are offered to the ontology engineer. Relevant
documents may be discovered and collected by the ontology-focused crawling approach. Applying this processing technique results in a domain-specific
"learning corpus". The natural language processing module transforms the
given natural language documents in linguistically annotated documents (in
dependency of the actually existing background knowledge contained in the
lexicon and the ontology). The document wrapper is a specific module for
generating a relational representation of given semi-structured documents in the
form of dictionaries. Finally, the transformation module uses the linguistically
annotated documents to generate a specific relational representation that serves
as input to the ontology learning algorithms.

Figure 5.1. Import and Processing Modules (ontology wrapper, ontology merging, crawler, document wrapper, NLP system, transformation)

This chapter is structured as follows. Section 1 of this chapter focuses on techniques for ontology import and processing. Two aspects are dis-
tinguished: First, if an ontology is given in a specific representation language
the ontology has to be wrapped to the ontology structure O in order to access
it within the ontology learning framework. Second, if two or more ontologies
are imported a mechanism for merging these ontologies has to be provided to
the ontology engineer.
In section 2 of this chapter the mechanisms for discovering, accessing, ana-
lyzing and transforming documents within our ontology learning framework
are introduced. First, the mechanism for ontology-focused document crawling
and indexing from the Web, where a relevant set of document data is "compiled"
by applying a semi-supervised algorithm, is introduced. As mentioned earlier
in the last chapter, one central sub-component for data import & processing is a
natural language processing (NLP) system. The architecture of the system and
the underlying techniques for shallow text processing are introduced in sub-
section 2.2. In particular, extensions of the system supporting the interaction
between shallow linguistic processing and the ontology are described. Domain-
specific dictionaries are considered as a stable source of knowledge for deriving
ontologies. Our approach of document wrapping described in subsection 2.3
allows a fast import and normalized representation of a given dictionary. Subse-
quently, the normalized dictionary is given directly to the learning component.
Finally, one important issue of ontology learning or machine learning in gen-
eral is to find the right representation for the application of a given algorithm.
In subsection 2.4 it is formally defined what type of relational structures are
generated from the linguistically normalized data. The concluding section 3
summarizes the content of this chapter and defines a list of further work that
has not been approached here.

1. Importing & Processing Existing Ontologies


In the near future more and more ontologies will be available (by using
the XML namespaces mechanism) and the fast adaptation from one domain
to another or the extension of a given ontology becomes critical. As men-
tioned earlier, one may distinguish different sorts of ontologies: for instance
large linguistic lexical-semantic nets, e.g., WordNet or its German counterpart
GermaNet, or domain and application ontologies (e.g., a tourism ontology de-
veloped for a specific application, like GETESS, the German Text Exploitation
and Search System). These different sorts of ontologies have different com-
plex underlying representation languages, from very simple (e.g. the WordNet
database format, cf. (Fellbaum, 1998» to formal representations (e.g. domain
ontologies given in the already mentioned OIL2 or F-Logic representation lan-
guages). If one wants to import and process these different sorts of existing
ontologies, one typically has to carry out the following two steps:
• First, one has to transform the given ontology into a representation that may
be used within the ontology learning framework.
• Second, if more than one ontology is available, the given ontologies have to
be merged into one common ontology.
In general for ontology import and processing we restrict our attention to
the core elements contained in O without considering axioms A_O. As depicted
in Figure 5.1, two modules are offered for importing existing ontologies: an
ontology wrapper and the ontology merging approach FCA-MERGE. They
are presented in the following two subsections.

1.1 Ontology Wrapper & Import


There exists a large number of representation languages for ontologies. Typically, these languages vary in specific aspects: as already mentioned above, some include only an implicit semantics, while others are a proper subset of first-order logic with a clearly defined meaning of the primitives.
Even before the wide-spread use of the Web, there have been efforts to find
one representation level for all languages (cf., KIF (Ginsberg, 1991; Gene-
sereth, 1998)) and to automatically translate between different languages (cf.,
OntoLingua (Gruber, 1993a)). Both approaches heavily suffered from the fact
that the meaning of representations, i.e. their semantic entailments, could not
be adequately represented in a single lingua franca. The semantic patterns ap-
proach, which partially deals with this problem during the engineering of an ontology, has been briefly introduced in chapter 3. However, in using ontologies in concrete
applications the following situation has been experienced: if a specific domain or application ontology is represented in a specific representation language, one has to write a wrapper that maps the ontology-specific representation to the ontology structure O. The meaning of the ontology structure is externally defined through a specific mapping. It is not claimed that every kind of logical expression may be imported; however, the ontology wrapper approach works well for importing the basic elements contained in O.

In the following, a short example is given of how the lexical-semantic nets WordNet and GermaNet have been imported into the framework.

Importing WordNet/GermaNet. WordNet (Fellbaum, 1998) and its German counterpart GermaNet (Hamp and Feldweg, 1997) are lexical-semantic nets which integrate ontological information with lexical semantics within and across word
classes. WordNet is an on-line lexical reference system whose design is in-
spired by current psycholinguistic theories of human lexical memory. English
nouns, verbs, adjectives, and adverbs are organized into synonym sets, each
representing one underlying lexical concept. Different types of relations link
the synonym sets.
Both lexical-semantic nets are useful resources for ontology learning. Both WordNet and GermaNet have been transformed into an instantiated ontology structure O of the ontology learning framework. To define a suitable ontology wrapper for WordNet and GermaNet, the first step is to examine the contained primitives to define a mapping to the ontology structure or to extend the structure with a specific namespace (such as for lexical entries). The following is a list of relevant ontological primitives contained in WordNet/GermaNet:

• SynSet: a synonym set; a set of words that are interchangeable in some context.
• Hypernym: the generic term used to designate a whole class of specific instances. Y is a hypernym of X if X is a (kind of) Y.

• Hyponym: the specific term used to designate a member of a class. X is a hyponym of Y if X is a (kind of) Y.

• Holonym: the name of the whole of which the meronym names a part. Y is a holonym of X if X is a part of Y.

• Meronym: the name of a constituent part of, the substance of, or a member of something. X is a meronym of Y if X is a part of Y.

• Antonym: a pair of words between which there is an associative bond built up by co-occurrences. In adjective clusters, direct antonyms appear only in head synsets.

There exist more primitives in WordNet, like reason, link and pertainym. However, on account of their lexical motivation it was decided not to integrate them. Table 5.1 lists the mappings that have been performed from the GermaNet and WordNet lexical-semantic ontologies to the ontology structure O.

WordNet / GermaNet | Ontology O | Comment
SynSet | C | A synset corresponds to a concept C; the words contained in the synset are stored in the lexicon 𝓛 and mapped to the specific concept C
Hypernym, Hyponym | H_C | Hypernym relations were evaluated between two synsets and directly mapped to H_C
Meronym, Holonym | R | Meronym relations are named "has-part", holonym relations are named "part-of"
Antonym | R | Antonym relations are named "opposite-of"

Table 5.1. Building an Ontology Wrapper for GermaNet
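As an illustration of such a wrapper, the sketch below uses the NLTK interface to WordNet (which is not the tool used in the book) to instantiate the mappings of Table 5.1: synsets become concepts, lemma names populate the lexicon 𝓛 together with the mapping F, hypernym links populate H_C, and meronym/holonym links become named relations:

from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def wrap_wordnet(max_synsets=1000):
    # build a small ontology structure (L, C, F, H_C, R) from WordNet nouns
    lexicon, concepts, f_map = set(), set(), {}
    taxonomy, relations = set(), set()
    for syn in list(wn.all_synsets('n'))[:max_synsets]:
        c = syn.name()                       # synset -> concept in C
        concepts.add(c)
        for lemma in syn.lemma_names():      # words -> lexicon L, mapping F
            lexicon.add(lemma)
            f_map.setdefault(lemma, set()).add(c)
        for hyper in syn.hypernyms():        # hypernymy -> taxonomy H_C
            taxonomy.add((c, hyper.name()))
        for part in syn.part_meronyms():     # meronymy -> "has-part"
            relations.add(('has-part', c, part.name()))
        for whole in syn.part_holonyms():    # holonymy -> "part-of"
            relations.add(('part-of', c, whole.name()))
    return lexicon, concepts, f_map, taxonomy, relations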

Figure 5.2 depicts an example excerpt from H_C extracted from WordNet on the left side and GermaNet on the right side, visualized with the ONTOEDIT concept hierarchy view. The conversion of GermaNet results in an ontology consisting of approx. 20,000 concepts and 2 relations (meronym, holonym) with 2,713 domain and range restrictions. The taxonomy H_C has an average depth of 7.19 and a maximum depth of 18.

Figure 5.2. WordNet and GermaNet Example (visualized by ONTOEDIT)

1.2 FCA-MERGE - Bottom-Up Ontology Merging

The ontology wrapper approach works well if there is only one ontology available for a given domain. Nevertheless, if two or more ontologies are to be imported, e.g. from two domains that complement each other, an approach for merging two ontologies into one target ontology is required. The process of
ontology merging takes as input two (or more) source ontologies and returns a
merged ontology based on the given source ontologies.
A new method, called FCA-MERGE, has been developed for merging ontologies following a bottom-up approach which offers a structural description of the merging process3. The method is guided by an extensional description of the set of concepts C of the two given source ontologies that are to be merged. The extensional description of concepts is derived by using the natural language processing core system4. Extensional descriptions are stored in so-called contexts that serve as input to a theory called formal concept analysis (cf. (Ganter and Wille, 1999)). Formal concept analysis derives a lattice of concepts as a structural result of FCA-MERGE. The result is then explored and transformed into the merged ontology with human interaction.

1.2.1 A Short Introduction into Formal Concept Analysis


We recall the basics of Formal Concept Analysis (FCA) as far as they are needed for describing FCA-MERGE. A more extensive overview is given in (Ganter and Wille, 1999). To allow a mathematical description of concepts as being composed of extensions and intensions, FCA starts with a formal context defined as a triple K := (G, M, I), where G is a set of objects, M is a set of attributes, and I is a binary relation between G and M (i.e. I ⊆ G × M). (g, m) ∈ I is read "object g has attribute m".

DEFINITION 5.1 (FORMAL CONTEXT) For A ⊆ G, we define A′ := {m ∈ M | ∀g ∈ A: (g, m) ∈ I} and, for B ⊆ M, we define B′ := {g ∈ G | ∀m ∈ B: (g, m) ∈ I}.
A formal concept of a formal context (G, M, I) is a pair (A, B) with A ⊆ G, B ⊆ M, A′ = B, and B′ = A. The sets A and B are called the extent and the intent of the formal concept (A, B), respectively. The subconcept-superconcept relation is formalized by

(A₁, B₁) ≤ (A₂, B₂) :⇔ A₁ ⊆ A₂ (⇔ B₂ ⊆ B₁).

The set of all formal concepts of a context K together with the partial order ≤ is always a complete lattice,5 called the concept lattice of K and denoted by 𝔅(K).

A possible confusion might arise from the double use of the word "concept" in FCA and in Definition 2.1. This comes from the fact that FCA and Definition 2.1 are two models for the notion of "concept" which arose independently. In order to distinguish both notions, the FCA concepts will always be referred to as "formal concepts". The concepts in ontologies are referred to as "concepts" or as "ontology concepts".
There is no direct counterpart of formal concepts in Definition 2.1. Concepts as defined in Definition 2.1 are best compared to FCA attributes, as both can be considered as unary predicates on the set of objects.
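To make the definitions concrete, the following naive sketch computes all formal concepts of a small context, exploiting the fact that every intent is an intersection of object intents (the TITANIC algorithm used later in this section is far more efficient, but the result is the same):

def formal_concepts(context):
    # context: dict mapping each object g to its attribute set g'
    attributes = set().union(*context.values()) if context else set()
    # every intent is an intersection of object intents; start with M
    # (the intent of the empty object set) and intersect incrementally
    intents = {frozenset(attributes)}
    for attrs in context.values():
        intents |= {frozenset(attrs) & intent for intent in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(g for g, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return concepts

# a toy context in the spirit of the tourism example used below
K = {'doc1': {'Root', 'Hotel', 'Accommodation'},
     'doc5': {'Root', 'Event', 'Concert'},
     'doc7': {'Root', 'Hotel'}}
for extent, intent in sorted(formal_concepts(K), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))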

1.2.2 The FCA-MERGE Method


The overall process of merging two ontologies using FCA-MERGE consists
of three steps, namely
1 Extraction of extensional concept descriptions and computation of two formal contexts K₁ and K₂,
2 Application of the FCA-MERGE core algorithm that derives a common
context and computes a concept lattice, and
3 The generation of the final merged ontology based on the concept lattice.
Figure 5.3 depicts an overview of the overall method for ontology merging.
The method takes as input data the two ontologies and a set D of natural
language documents. The documents have to be relevant to both ontologies, so
that documents are described by the concepts contained in the ontology. The
documents may be taken from the target application which requires the final
merged ontology.
Figure 5.3. Ontology Merging Method (the documents D are linguistically processed into two formal contexts; the FCA-MERGE core computes the pruned concept lattice 𝔅ₚ(K), which is explored to yield the new ontology)

From the documents in D, concept descriptions as described in subsection 2.2 are derived. From these concept descriptions a formal context is derived, indicating which ontology concepts appear in which documents6. The extrac-
tion of this information from documents is necessary because there are usually
no instances which are already classified by both ontologies. However, if this
situation is given, one can skip the first step and use the classification of the
instances directly as input for the two formal contexts.
The second step of our ontology merging approach comprises the FCA-
MERGE core algorithm. The core algorithm merges the two contexts and
computes a concept lattice from the merged context using FCA techniques
(cf. (Stumme et al., 2000)). More precisely, it computes a pruned concept
lattice (as defined below) which has the same degree of detail as the two source
ontologies.
The extraction of references to concepts and the FCA-MERGE core algo-
rithm are fully automatic. The final step of deriving the merged ontology from
the concept lattice requires human interaction. Based on the pruned concept
lattice and the sets of relation names R₁ and R₂, the ontology engineer cre-
ates the concepts and relations of the target ontology. Graphical means of the
ontology engineering environment ONTOEDIT for supporting this process are
offered. For obtaining good results, a few assumptions have to be met by the
input data:

• First, the documents have to be relevant to each of the source ontologies. A document from which no instance is extracted for each source ontology can be neglected for the task. This fact reflects the application-driven approach pursued by FCA-MERGE.
• Second, the documents have to cover all concepts from the source ontolo-
gies. Concepts which are not covered have to be treated manually after the
merging procedure (or the set of documents has to be expanded).

• And last but not least, the documents must separate the concepts well enough.
If two concepts which are considered as different always appear in the same
documents, FCA-MERGE will map them to the same concept in the target
ontology (unless this decision is overruled by the knowledge engineer).
When this situation appears too often, the knowledge engineer might want
to add more documents which further separate the concepts.
In the following we introduce the three core steps for deriving a common ontology based on a set of input ontologies using FCA-MERGE. For purposes of explanation, a small example based on two simple ontologies from the tourism domain is used. The method has been empirically evaluated in a larger scenario, using two tourism ontologies, each containing approx. 300 concepts.

1.2.3 Extraction of Extensional Concept Descriptions


The aim of this first step is to generate, for each ontology Oᵢ, i ∈ {1, 2}, a formal context Kᵢ := (Gᵢ, Mᵢ, Iᵢ). The set of documents D is taken as object set (Gᵢ := D), and the set of concepts is taken as attribute set (Mᵢ := Cᵢ). While these sets come for free, the difficult step is generating the binary relation Iᵢ. The relation (g, m) ∈ Iᵢ shall hold whenever document g contains an instance of m.
The computation uses linguistic techniques as described in subsection 2.2. For
instance, the lexical entry "Stuttgart" is associated with the concept CITY. If the concept CITY is contained in ontology O₁ and document g contains the lexical entry "Stuttgart", then the relation (g, CITY) ∈ I₁ holds. Another more complex example is given by the expression "Hotel Schwarzer Adler", which is associated with the concept HOTEL. If the concept HOTEL is contained in ontology O₁ and document g contains the expression "Hotel Schwarzer Adler", then the relation (g, HOTEL) ∈ I₁ holds.
Finally, reflexivity7 and transitivity of the H_C relation are compiled into the formal context, i.e. (g, m) ∈ I and H_C(m, n) implies (g, n) ∈ I. This means that if (g, HOTEL) ∈ I₁ holds and H_C(HOTEL, ACCOMMODATION), then the document also describes an instance of the concept ACCOMMODATION: (g, ACCOMMODATION) ∈ I₁.
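A small sketch of this context construction (with hypothetical names; the actual extraction uses the SMES machinery of subsection 2.2): lexical entries found in a document are mapped to concepts via F, and every hit is closed under the taxonomy H_C:

def build_context(documents, f_map, parents):
    # documents: document id -> set of lexical entries found in it
    # f_map:     mapping F, lexical entry -> set of concepts
    # parents:   taxonomy H_C, concept -> set of direct super-concepts
    def ancestors(c):
        out, stack = {c}, list(parents.get(c, ()))
        while stack:
            p = stack.pop()
            if p not in out:
                out.add(p)
                stack.extend(parents.get(p, ()))
        return out

    context = {}
    for doc, entries in documents.items():
        hits = set()
        for entry in entries:
            for concept in f_map.get(entry, ()):
                hits |= ancestors(concept)   # closure under H_C
        context[doc] = hits
    return context

docs = {'g': {'Stuttgart', 'Hotel Schwarzer Adler'}}
F = {'Stuttgart': {'City'}, 'Hotel Schwarzer Adler': {'Hotel'}}
H = {'Hotel': {'Accommodation'}, 'Accommodation': {'Root'}, 'City': {'Root'}}
print(build_context(docs, F, H))  # g -> {City, Hotel, Accommodation, Root}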
Figure 5.4 depicts the contexts K₁ and K₂ that have been generated from 14 documents for two small example ontologies taken from the tourism domain (e.g., document doc5 contains instances of the concepts EVENT, CONCERT, and ROOT of ontology O₁, and MUSICAL and ROOT of ontology O₂). All other documents contain some information on hotels, as they contain instances of the concept HOTEL both in ontology O₁ and in ontology O₂.
Figure 5.4. Two Example Contexts K₁ and K₂ (cross tables indicating, for the 14 documents doc1-doc14, which concepts of O₁ and O₂ each document contains instances of)

1.2.4 FCA-MERGE Core Algorithm


The second step takes as input the two formal contexts K₁ and K₂ which were
generated in the last step, and returns a pruned concept lattice, which will be
used as input in the next step.
First the two formal contexts are merged into a new formal context K, from which the pruned concept lattice will be derived. Before merging the two formal contexts, the attribute sets have to be disambiguated, since C₁ and C₂ may contain the same concepts: let M̃ᵢ := {(m, i) | m ∈ Mᵢ}, for i ∈ {1, 2}. The indexation of concepts allows the possibility that the same concept exists in both ontologies, but is treated differently. For instance, a CAMPGROUND may be considered as an ACCOMMODATION in the first ontology, but not in the second one. Then the merged formal context is obtained by K := (G, M, I) with G := D, M := M̃₁ ∪ M̃₂, and (g, (m, i)) ∈ I :⇔ (g, m) ∈ Iᵢ.
The whole concept lattice of K is not computed, as it would provide too many overly specific concepts. The computation is restricted to those formal concepts which are above at least one formal concept generated by an (ontology) concept of the source ontologies. This assures remaining within the range of specificity of the source ontologies. More precisely, the pruned concept lattice is given by 𝔅ₚ(K) := {(A, B) ∈ 𝔅(K) | ∃m ∈ M: ({m}′, {m}″) ≤ (A, B)}. For the example, the pruned concept lattice is shown in Figure 5.5. It consists of six formal concepts. Two formal concepts of the total concept lattice are pruned since they are too specific compared to the two source ontologies.
Figure 5.5. The Pruned Concept Lattice (node labels include HOTEL_1, HOTEL_2, ACCOMMODATION_2, EVENT_1, CONCERT_1 and MUSICAL_2)

The computation of the pruned concept lattice is done with the algorithm TITANIC8. However, for the specific task described here, it is modified and adapted to allow the pruning of the derived concept lattice. Compared to other algorithms for computing concept lattices, TITANIC has, for this purpose, the advantage that it computes the formal concepts via their so-called key sets (or minimal generators). A key set is a minimal description of a formal concept. We refer the reader to (Stumme et al., 2000) where a detailed introduction of the algorithm is given. In this application of the algorithm, key sets serve two purposes. First, they indicate if the generated formal concept gives rise to a new concept in the target ontology or not. A concept is new if and only if it has no key sets of cardinality one. Second, the key sets of cardinality two or more can be used as generic names for new concepts and they indicate the arity of new relations.

1.2.5 Lattice Exploration


While the previous steps (instance extraction, context derivation, context
merging, and TITANIC) are fully automatic, the derivation of the merged ontol-
ogy from the concept lattice requires human interaction, since it heavily relies
on background knowledge of the domain expert.
The result from the last step is a pruned concept lattice. From it the target ontology has to be derived. Each of the formal concepts of the pruned concept lattice is a candidate for a concept, a relation, or a new subsumption in the target ontology. There are a number of views which may be used to focus on the most relevant parts of the pruned concept lattice. These views are discussed after the description of the general strategy, which follows now. Of course, most of the technical details are hidden from the user.
The documents are not needed for the generation of the target ontology. Therefore, the attention is restricted to the intents of the formal concepts, which are sets of (ontology) concepts of the source ontologies. For each formal concept of the pruned concept lattice, the related key sets are analyzed. For each formal concept, the following cases can be distinguished:
1 It has exactly one key set of cardinality 1.
2 It has two or more key sets of cardinality 1.
3 It has no key sets of cardinality 0 or 1.
4 It has the empty set as key set.9
The generation of the target ontology starts with all concepts being in one of the
two first situations. The first case is the easiest: The formal concept is generated
by exactly one ontology concept from one of the source ontologies. It can be
included in the target ontology without interaction of the knowledge engineer.
In the example, these are the two formal concepts labeled by VACATION_1 and by EVENT_1.
In the second case, two or more concepts of the source ontologies generate the same formal concept. This indicates that the concepts should be merged into one concept in the target ontology. The user is asked which of the names to retain. In the example, this is the case for two formal concepts: the key sets {CONCERT_1} and {MUSICAL_2} generate the same formal concept, and are thus suggested to be merged. The key sets {HOTEL_1}, {HOTEL_2}, and {ACCOMMODATION_2} also generate the same formal concept.10 The latter
case is interesting, since it includes two concepts of the same ontology. This
means the set of documents does not provide enough details to separate these
two concepts. Either the knowledge engineer decides to merge the concepts (for
instance because he observes that the distinction is of no importance in the target
application), or he adds them as separate concepts to the target ontology. If there
are too many suggestions to merge concepts which should be distinguished, this
is an indication that the set of documents was not large enough. In such a case,
the user might want to re-launch FCA-MERGE with a larger set of documents.
When all formal concepts in the first two cases are dealt with, then all concepts
from the source ontologies are included in the target ontology. Now, all relations
from the two source ontologies are copied into the target ontology. Possible
conflicts and duplicates have to be resolved by the ontology engineer.
In the next step, all formal concepts covered by the third case are dealt with.
They are all generated by at least two concepts from the source ontologies, and
are candidates for new ontology concepts or relations in the target ontology.
The decision whether to add a concept or a relation to the target ontology (or
to discard the suggestion) is a modeling decision, and is left to the user. The
key sets provide suggestions either for the name of the new concept, or for the
concepts which should be linked with the new relation. Only those key sets
with minimal cardinality are considered, as they provide the shortest names for
new concepts and minimal arities for new relations, respectively.

Example. For instance, the formal concept in the middle of Figure 5.5 has {HOTEL_2, EVENT_1}, {HOTEL_1, EVENT_1}, and {ACCOMMODATION_2, EVENT_1} as key sets. The user can now decide if she wants to create a new concept with the default name HOTELEVENT (which is unlikely in this situation), or to create a new relation with arity (HOTEL, EVENT), e.g., the relation ORGANIZESEVENT.

There is exactly one formal concept in the fourth case (as the empty set is
always a key set). This formal concept gives rise to a new largest concept in
the target ontology, the ROOT concept. It is up to the knowledge engineer to
accept or to reject this concept. Many ontology tools require the existence of
such a largest concept. In the example, this is the formal concept labeled by
ROOT_1 and ROOT_2.
Finally, the taxonomic order on the concepts of the target ontology can be derived automatically from the pruned concept lattice: if the concepts c₁ and c₂ are derived from the formal concepts (A₁, B₁) and (A₂, B₂), respectively, then H_C(c₁, c₂) holds if and only if B₁ ⊇ B₂ (or if the user explicitly modeled it based on a key set of cardinality 2).
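This derivation rule is straightforward to state in code. The sketch below (assuming each target concept carries the intent of the formal concept it was derived from; the names are illustrative) emits the direct H_C links by intent inclusion:

def derive_taxonomy(concepts):
    # concepts: target concept name -> intent (set of source ontology concepts)
    pairs = set()
    for c1, b1 in concepts.items():
        for c2, b2 in concepts.items():
            if c1 != c2 and b1 > b2:   # B1 proper superset of B2 => H_C(c1, c2)
                pairs.add((c1, c2))
    # keep only direct links: drop (a, c) if (a, b) and (b, c) are both present
    return {(a, c) for (a, c) in pairs
            if not any((a, b) in pairs and (b, c) in pairs for b in concepts)}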

1.2.6 Views on the Pruned Concept Lattice


In order to support the knowledge engineer in the different steps, there are a number of views for focusing her attention on the significant parts of the pruned concept lattice. Two views support the handling of the second case (in which different ontology concepts generate the same formal concept). The first is a list of all pairs (m₁, m₂) ∈ C₁ × C₂ with {m₁}′ = {m₂}′. It indicates which concepts from the different source ontologies should be merged. In the small example, this list contains for instance the pair (CONCERT_1, MUSICAL_2). In the larger scenario, pairs like (ZOO_1, TIERPARK_2) and (ZOO_1, TIERGARTEN_2) are listed. It was decided to merge ZOO [engl.: zoo] and TIERPARK [zoo], but not ZOO and TIERGARTEN [zoological garden].
The second view returns, for ontology Oᵢ with i ∈ {1, 2}, the list of pairs (mᵢ, nᵢ) ∈ Cᵢ × Cᵢ with {mᵢ}′ = {nᵢ}′. It helps checking which concepts out of a single ontology might be subject to merging. The user might either
conclude that some of these concept pairs can be merged because their dif-
ferentiation is not necessary in the target application; or he might decide that
the set of documents must be extended because it does not differentiate the
concepts enough. In the small example, the list for O₁ contains only the pair (HOTEL_1, ACCOMMODATION_1). In the larger scenario that has been carried out, additional interesting pairs like (RÄUMLICHES, GEBIET) and (AUTO, FORTBEWEGUNGSMITTEL) have been identified. For the target application, RÄUMLICHES [spatial thing] and GEBIET [region] have been merged, but not AUTO [car] and FORTBEWEGUNGSMITTEL [means of travel].
The number of suggestions provided for the third situation can be quite high.
There are three views which present only the most significant formal concepts.
These views can also be combined:
• First, one can fix an upper bound for the cardinality of the key sets. The
lower the bound is, the fewer new concepts are presented. A typical value is
2, which allows the retention of all concepts from the two source ontologies
(as they are generated by key sets of cardinality 1), and to discover new
binary relations between concepts from the different source ontologies, but
no relations of higher arity. If one is interested in having exactly the old
concepts and relations in the target ontology, and no suggestions for new
concepts and relations, then the upper bound for the key set size is set to 1.
• Second, one can fix a minimum support. This prunes all formal concepts where the cardinality of the extent is too low (compared to the overall number of documents). The default is no pruning, i.e., a minimum support of 0 %. It is also possible to fix different minimum supports for different cardinalities of the key sets. The typical case is to set the minimum support to 0 % for key sets of cardinality 1, and to a higher percentage for key sets of higher cardinality. This way all concepts from the source ontologies are retained, and new concepts and relations are generated only if they have a certain (statistical) significance.
• Third, one can consider only those key sets of cardinality 2 in which the two concepts come from one ontology each. This way, only those formal concepts are presented which give rise to concepts or relations linking the two source ontologies. This restriction is useful whenever the quality of each source ontology per se is known to be high, i.e., when there is no need to extend each of the source ontologies alone.
In the small example, there are no key sets with cardinality 3 or higher. The three key sets with cardinality 2 (as given above) all have a support of 11/14 ≈ 78.6 %. In the larger application, 2 has been fixed as upper bound
for the cardinality of the key sets. Key sets like (TELEFON_1 [telephone], ÖFFENTLICHE_EINRICHTUNG_2 [public institution]) (support = 24.5 %), (UNTERKUNFT_1 [accommodation], FORTBEWEGUNGSMITTEL_2 [means of travel]) (1.7 %), (SCHLOSS_1 [castle], BAUWERK_2 [building]) (2.1 %), and (ZIMMER_1 [room], BIBLIOTHEK_2 [library]) (2.1 %) have been obtained. The first gives rise to a new concept TELEFONZELLE [public phone], the second to a new binary relation HATVERKEHRSANBINDUNG [hasPublicTransportConnection], the third to a new subconcept relation H_C(SCHLOSS, BAUWERK), and the fourth was discarded as meaningless.

2. Collecting, Importing & Processing Documents

In the last section, mechanisms for reusing, importing and merging different sorts of ontologies in our ontology learning framework were introduced. This section will focus on collecting, importing, processing and transforming web documents. As pointed out at the beginning of this chapter (cf. Figure 5.1), the following four processing techniques for documents will be further elaborated on, namely
1 Collecting relevant documents from the Web using an ontology-focused
crawler, a mechanism that supports the compilation of a representative
corpus D for ontology learning.
2 Shallow processing of documents using natural language processing tech-
niques, a comprehensive core NLP and information extraction system for
the German language.
3 Using a document wrapper for transforming semi-structured documents (e.g., domain-specific dictionaries) into a standardized, relational representation. Semi-structured documents serve as stable resources for ontology learning.
4 Transforming the linguistically and partially semantically annotated documents into a relational representation for the ontology learning algorithms presented in the next chapter.
The reader may note that the application of processing techniques 1 and 3 is optional: the crawler module can only be usefully applied if a core ontology is already available, and the document wrapper is only useful if some kind of semi-structured document is available. Processing methods 2 and 4 are required for every ontology learning scenario in the framework and represent core preprocessing techniques.

2.1 Ontology-focused Document Crawling


The task of extracting or maintaining domain-specific ontologies from the
web often starts with a given core ontology that is to be extended and adapted.
Experience has shown that different selection strategies for the learning corpus heavily influence the final target ontology. All in all, the reader may note that for ontology learning from web documents, "intelligent support" for the definition of a representative learning corpus D is required. Having this target in mind, an ontology-focused document crawler has been developed.
In general, a crawler is a program that retrieves Web pages, commonly used by a search engine (Pinkerton, 1994) or a Web cache. Roughly, a crawler starts off with the URL for an initial page P₀. It retrieves P₀, extracts any URLs in it, and adds them to a queue of URLs to be scanned. Then the crawler gets URLs from the queue (in some order), and repeats the process. Every page that is scanned is given to a client that saves the pages, creates an index for the pages, or summarizes or analyzes the content of the pages. With the rapid growth of the world-wide web, new challenges for general-purpose crawlers arise (cf. recent work done by (Chakrabarti et al., 1999)).
The crawler builds on the general crawling mechanism described above and extends it by using ontological background knowledge to focus the search in the web space. Thereby, it supports the configuration of a learning corpus D. It takes as input a user-given set A of seed documents (in the form of URLs), a core ontology O, a maximum depth level d_max to crawl and a minimal document relevance value r_min. The resulting output of the crawling process is a focused learning corpus D.

The algorithm. The crawler downloads each document contained in the set
A of start documents. Each document is analyzed using the same extraction
mechanism as used in FCA-MERGE. Based on the results of the extraction mechanism, a relevancy measure r(d) is computed for each document. In its
current implementation this relevancy measure is equal to the overall number
of concepts referenced in the document, defined as follows:

DEFINITION 5.2 (DOCUMENT RELEVANCE r(d)) Let L_d := {L ∈ 𝓛 | L ∈ d} and C_d := {C ∈ C | ∃L ∈ L_d : (L, C) ∈ F}. The document relevance value for a document d ∈ D is given by

r(d) := |C_d|    (5.2)

If the relevance r(d) exceeds the user-defined threshold r_min, the specific document is added to the learning corpus D11. All hyperlinks starting from a document d are recursively analyzed. If the crawling process for a given d does not automatically stop, the crawling process is additionally restricted by a maximum depth level d_max for a given start document d. A detailed description of the focused crawling approach and its evaluation are provided in (Ehrig, 2001).
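A minimal sketch of this crawling loop (the helper functions fetch and extract_links are assumed to be supplied by the surrounding framework; the real implementation is described in (Ehrig, 2001)): documents whose relevance r(d) reaches r_min are added to the corpus, and their hyperlinks are followed up to depth d_max:

from collections import deque

def relevance(text, f_map):
    # r(d) = |C_d|: number of distinct concepts referenced in the document
    return len({c for entry, concepts in f_map.items()
                if entry in text for c in concepts})

def focused_crawl(seeds, f_map, r_min, d_max, fetch, extract_links):
    # seeds: set A of start URLs; f_map: lexical entry -> set of concepts
    corpus, visited = {}, set(seeds)
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        text = fetch(url)
        if relevance(text, f_map) >= r_min:
            corpus[url] = text
            if depth < d_max:
                for link in extract_links(text):
                    if link not in visited:
                        visited.add(link)
                        queue.append((link, depth + 1))
    return corpus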
2.2 Shallow Text Processing using SMES


Ontology learning focuses on the extraction of ontological structures O. To extract regularities from natural language documents and dictionaries, the documents have to be transferred into a normalized representation schema to
which learning mechanisms may be applied. Hence, mechanisms for extracting
regularities are needed: Parsers establish relations between tokens of words
or concepts. As there are many possible ways in which the words could be
connected, the parser must have constraints under which it restricts the selection
of the relations.
In theory, it might be possible to use an exhaustive and deep general text
understanding system that tries to cope with the full complexity of language,
and that builds all conceptual structures required for a knowledge-intensive
system. However, even if there were the possibility to formalize and represent
the complete grammar of a natural language, the system would still need a very
high degree of robustness and efficiency in order to be able to process a large
number of real-world texts, such as web documents. Past experiences have
convinced most people that such a system is not feasible within the next few
years.
In order to fulfill the ever increasing demands for improved processing of
free texts, natural language researchers have turned from a theoretical view of
the overall problem, which aimed at a complete solution in the distant future
towards more practical approaches that are less comprehensive. This has led
to so-called "shallow" text processing approaches (cf. e.g (Piskorski and Neu-
mann, 2000», which provide the requested robustness and efficiency. These
approaches neglect certain generic language regularities which are known to
cause complexity problems (e.g., instead of computing all possible readings
only an underspecified structure is computed) or handle them very pragmati-
cally (e.g. by restricting the depth of recursion on the basis of a corpus analysis
or by making use of heuristic rules, like "longest matching substrings"). This
engineering view on language has led to a renaissance and improvement of
well-known, efficient techniques, most notably finite state technology for pars-
ing (Mori, 1997).
We base our ontology learning framework on a general architecture for shallow text processing of German texts, namely the system SMES, the Saarbruecken Message Extraction System (Neumann et al., 1997; Piskorski and Neumann, 2000), developed at the German Research Center for Artificial Intelligence (DFKI)12. The structure and functionality are drawn from common properties found in almost all recent approaches that deal with real-world text (see also (Hobbs, 1993; Chinchor et al., 1993; Appelt et al., 1993; Grishman and Sundheim, 1996; MUC7, 1998)). The basic design criterion of such a general system is to provide a set of basic, powerful, robust, and efficient natural language components and generic linguistic knowledge sources which can easily
98 ONTOLOGY LEARNING FOR THE SEMANTIC WEB

be customized to process different domain-specific tasks in a flexible manner.


The major tasks of these core shallow text processing tools are to extract as much linguistic structure from the text as possible and to represent all extracted information as compactly as possible in one data structure (called a text chart). The task of free text processing is considered a preprocessing stage for extracting as much linguistic structure as possible. The natural language analysis is considered a step-wise process of normalization from more coarse-grained to more fine-grained information, depending on the degree of structure and the naming of structured elements (Piskorski and Neumann, 2000).
One important point that has been missing so far is a tight connection of the natural language processing system to the ontology. Therefore, the ParseTalk approach (Neuhaus and Hahn, 1996), developed at the University of Freiburg (Computational Linguistics Research Group), has been adopted. In the ParseTalk system the processing task is performed by dispatching process subtasks to actors that communicate with each other by exchanging messages, as is done in object-oriented programming. ParseTalk uses a dependency-grammar based approach for text parsing. In their approach, the conceptual system is based on description logics: the LOOM language and reasoning engine is applied as a specific, well-known description logic. An up-to-date description of the overall framework, called Syndicate, is given in (Hahn and Romacker, 2000).

[Figure 5.6 depicts four parts: a conceptual system (the ontology with domain-specific semantic knowledge, plus a domain lexicon with a domain-specific mapping of words to the conceptual system), a linguistic knowledge pool (a lexical database with more than 700,000 word forms, named entity lexica, compound & tagging rules, and finite state grammars), shallow text processing at the word level (tokenizer, lexical processor, POS tagger) and at the sentence level (named entity finder, phrase recognizer, clause recognizer), and a text chart storing the results.]

Figure 5.6. Natural Language Processing System Architecture



Figure 5.6 depicts the overall architecture of the natural language processing component. The architecture of the NLP framework may be decomposed into four main components: (1) a linguistic knowledge pool consisting of a large lexical database and finite state grammars, (2) a conceptual knowledge module with access to the ontology and the associated domain-specific lexicon according to Definition 2.1, (3) a shallow text processing engine comprising different modules for parsing at the lexical and clause level, and (4) a so-called text chart, a common data structure for navigation and storage of results.
The following subsections provide an overview of the core parsing technology, the linguistic knowledge pool, shallow text processing strategies at the lexical and the clause level, and heuristic processing strategies. A more detailed description of aspects (1), (3) and (4) is given in (Neumann et al., 1997; Piskorski and Neumann, 2000).

2.2.1 Core Technology


SMES uses finite-state devices that are time and space efficient. Finite state devices have recently been used extensively in many areas of language technology, especially in shallow parsing. The core finite-state software that comes with SMES is the DFKI FSM toolkit (Piskorski and Neumann, 2000). This toolkit consists of a library of tools for building, combining and optimizing finite-state machines, which are generalizations of weighted finite-state automata and transducers. Finite state transducers are finite automata where each transition has an output label in addition to the more familiar input label.
The second kind of mechanism crucial for efficient language processing are parametrized tree-based data structures for efficiently storing sequences of elements of any type. Unlike classical tries for storing sequences of letters, generic dynamic tries are capable of storing more complex structures.
It is important that the system can store each partial result of every processing level in order to maximize the contextual information available. The knowledge pool that maintains all partial results computed by the shallow text processor is called a text chart. Each component returns its output as feature value structures, together with their types (e.g., date token, noun (N), adjective (Adj), proper name (PN), nominal phrase (NP) or verb group (VG)) and the corresponding start and end positions of the spanned input expressions.
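The following minimal sketch illustrates how such an entry of the text chart might be represented in Python; the class and field names are hypothetical and do not stem from SMES.

    # Hypothetical representation of one text chart entry.
    from dataclasses import dataclass, field

    @dataclass
    class ChartEntry:
        type: str                 # e.g. "N", "Adj", "PN", "NP", "VG", "date"
        start: int                # start position of the spanned expression
        end: int                  # end position of the spanned expression
        features: dict = field(default_factory=dict)  # feature-value structure

    # Example: a nominal phrase spanning input positions 3..5
    np = ChartEntry(type="NP", start=3, end=5,
                    features={"HEAD": "kutsch-fahrt", "CASE": "NOM"})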

2.2.2 Linguistic Knowledge Pool


The linguistic knowledge pool contains more than 700,000 full form words (created from 120,000 stem entries), named entity lexica and compound & tagging rules. Additionally, more than 12,000 subcategorization frames describing information used for lexical analysis and chunk parsing, and specific finite state grammars are available for shallow text processing at the lexical and sentence level.

2.2.3 STP on the Lexical Level


Shallow text processing on the lexical level may be separated into the following modules: (i) text tokenizer, (ii) morphological analysis, (iii) compound analysis and (iv) a part-of-speech filter. A short description is provided for each of these core modules:

• Tokenizer: Its main task is to scan the text in order to identify boundaries of words and complex expressions like "$20.00" or "Mecklenburg-Vorpommern"¹³, and to expand abbreviations.

• Morphological Analysis: Each token identified as a potential word form is submitted to morphological analysis, including on-line recognition of compounds and hyphen coordination. Each token recognized as a valid word form is associated with the list of its possible readings, characterized by stem, inflection information and the part-of-speech category. The complete output of the morphology returns a list of so-called lexical items represented in the form of tuples.

• Compound Analysis: Each token not recognized as a valid word form is a potential compound candidate. In German, compounds are extremely frequent and, hence, their analysis into parts, e.g. "database" becoming "data" and "base", is crucial and may yield interesting relationships between concepts. Furthermore, morphological analysis returns possible readings for the words concerned.

• POS Filter: The output returned from the morphological analysis comprises the word form together with all its readings. Considering words in isolation - as is usually done in lexical analysis - each word is potentially ambiguous. In order to decrease the set of possible candidates for the following components, local and very efficient disambiguation strategies are applied such that implausible readings are filtered out. This is usually performed through part-of-speech taggers as well as through the application of case-sensitive rules¹⁴. The final task of a part-of-speech tagger is to determine the unique part-of-speech of a word in its current context using local rules (see (Brill, 1993)).

Example. A short example of a result from shallow processing at the lexical level using the morphological component is provided. Consider the following example sentence: "Wir bieten die Möglichkeiten von Kutschfahrten in Wittenbeck."¹⁵

Figure 5.7 depicts the result of processing the sentence morphologically (abbreviated where convenient).
«"Wir"
(llwir ll
«(:TENSE . :NO) (:FORM . :NO) (:PERSON . 1) (:GENDER . :NO)
(:NUMBER . :P) (:CASE . :NOM)))
. : PERSPRON))
(llbieten"
(libiet"
«(:TENSE . :PRES) (:FORM . :FIN) (:PERSON . :ANREDE)
(:GENDER . :NO) (:NUMBER . oS) (:CASE . :NO))

. :V)))
{"die"
("d-detll
«(:TENSE . :NO) (:FORM . :NO) (:PERSON . 3) (:GENDER . :M)
(:NUMBER. :P) (:CASE . :NOM))

:DEF))

)))) )

Figure 5.7. Example SMES Output - Morphological Component

2.2.4 STP on the Clause Level


SMES uses weighted finite state transducers (Neumann et al., 1997) to express phrasal and sentential patterns. The parser works on the phrasal level before it analyzes the overall sentence. Clause level processing is subdivided into three components. In the first step, named entities and phrasal fragments are recognized, e.g. general nominal expressions and verb groups or specialized expressions for time, date and named entity.

• Named Entity Finder: Processing of named entities includes the recognition of proper and company names like "Hotel Schwarzer Adler" as single, complex entities, as well as the recognition and transformation of complex time and date expressions into a canonical format, e.g. "January 1, 2000" becomes "1/1/2000". An example for the named entity recognizer is given in the following, based on an excerpt of the analysis results for the sentence: "Die Daimler Benz AG hat große Verluste gemacht."¹⁶

    ((:TYPE . :NAME-NP)
     (:SEM (:NAME . "Daimler Benz") (:COMP-FORM . "AG")))

• Clause Level Processing: The structure of potential phrasal fragments is defined using weighted finite state transducers. In the second step, the dependency-based structure (cf. (Hudson, 1990)) of the sentence fragments is analyzed using a set of specific sentence patterns. Dependency formalisms use binary relations between words only (in contrast to more common grammars like Context Free Grammar, Tree Adjoining Grammar, etc., which describe the syntax of a sentence with the help of categories). An example of a dependency grammar analysis is given in Figure 5.8, where the sentence consists of a noun phrase (the subject "Wir") and a verb phrase ("bieten"). The latter is again split into a noun phrase (the direct object) and a prepositional phrase. The noun phrase "Kutschfahrten" is again split into a noun phrase and a prepositional phrase.

[Figure 5.8 shows the dependency tree of the example sentence: the verb "bieten" governs the subject "Wir" and the object "Möglichkeiten" (with determiner "die"), which carries the prepositional attribute "von" with object "Kutschfahrten", which in turn carries the prepositional attribute "in" with object "Wittenbeck".]

Figure 5.8. Dependency Grammar Description

These patterns are also expressed by means of finite state transducers, so that each step is uniformly realized by the same underlying mechanism.

In the implementation of SMES that has been used in this work, mechanisms
for the recognition of grammatical functions (subject, object) such as depicted
in Figure 5.8 based on the dependency structures from previous steps were not
available. Recent developments for the recognition of grammatical functions
are described in (Piskorski and Neumann, 2000).

Example. Let us consider a short example of a result from shallow processing at the sentence level. Consider the following example sentence: "Der Mann sieht die Frau mit dem Fernrohr."¹⁷ The underspecified dependency structure is given in abbreviated form in Figure 5.9. In this structure, the feature :VERB collects all the information of the complex verb group which is the head of the sentence. :PPS collects all prepositional phrases and :NPS is a list of all dependent nominal phrases.

(((:PPS
   ((:SEM (:HEAD "von")
      (:COMP (:HEAD "kutsch-fahrt"))))
   ...)
  (:NPS
   ((:SEM (:HEAD "wir"))
    ...)
   ((:SEM (:HEAD "moeglichkeit") (:QUANTIFIER "d-det"))
    ...))
  (:VERB
   (:ART . :FIN)
   (:STEM . "biet")
   (:FORM . "bieten")
   (:TYPE . :VERB)
   (:TYPE . :VERB-NODE)
  )))

Figure 5.9. Example SMES Output - Underspecified Dependency Structure (abbreviated)

2.2.5 Heuristic Processing

Chunk parsing as performed by SMES returns many phrasal entities (referring to concepts) that are not related within or across sentence boundaries. This means that the approach described above would miss many conceptual relations that often occur in the corpus, but that may not be detected due to the restricted processing capabilities of SMES with respect to the complexity of natural language. For instance, SMES does not attach prepositional phrases in any way and it does not handle anaphora. In human understanding of free text, syntax, semantics, context, and/or knowledge may trigger the search for conceptual bridges between syntactic objects (cf. (Romacker et al., 1999)). For instance,
• Syntax: the syntactic dependencies in the phrase "the house of Usher" signal a conceptual dependency between the conceptual denotations corresponding to "house" and "Usher".
• Semantics: In the phrase "The baby have cried." the semantic restrictions allow one to infer the conceptual relationship between the denotations of "baby" and the "cry"ing - even though the sentence is syntactically illegitimate.
• Context: In "They are geniuses. Michael, Dieter, Markus." the resolution of "Michael being a genius, etc." may occur because of contextual cues (and ellipsis resolution) (cf. e.g., (Markert and Hahn, 1997)).
• Knowledge: In "CPU A is faster than B.", knowledge is responsible for associating the denotations of CPU A and B with a comparison of their frequency figures rather than their physical speed (which would be relevant if, e.g., they were traveling in a vehicle).
SMES constitutes the natural language processing component for signaling syntactic cues. There exists a wide range of possibilities according to which a bridge may be built. The principal variance comes from effects such as granularity, metonymy, or figurative language. For instance, one may model in the ontology that a country contains states and states contain counties. Because of the transitivity of the contains relationship, the ontology also allows the direct connection of country with county.

The approach described in the following focuses on syntactically motivated bridges. Metonymic and figurative language are ignored, because they currently constitute research topics in their own right (see (Romacker et al., 1999) for a complete survey of mediated conceptual relationships). The empirical evaluation (further described in chapter 8) has shown that a high recall of the extracted linguistic dependency relations is needed, even if it means a loss of linguistic precision. The motivation is that with a low recall of dependency relations the algorithms introduced in the next chapter would have only a small amount of data from which conceptual relations may be learned, while with less precision the learning algorithms may still sort out part of the noise. Therefore, the SMES output has been extended to include heuristic correlations as an extension of the linguistics-based dependency relations.
SMES offers a number of heuristics for building pairs of related concepts. Several heuristics determine which concepts are paired on the basis of text and document structures. The keys employed for pairing two lexical entries may be either linguistic or extra-linguistic. The following heuristics have been used within this work and the concrete system¹⁸:

• The title heuristic combines the lexical entries between the starting and
ending HTML-title tags with those from the rest of the document.

• The table heuristic combines lexical entries found in HTML tables; here the identification of table cells is used in the same manner as the identification of sentence boundaries.

• The NP-PP-heuristic couples all directly consecutive sequences of nominal and prepositional phrases. Thus, it models minimal PP-attachment.

• If no linguistic dependency in a sentence is recognized, the sentence heuristic conjoins all lexical entries of a sentence with each other.
A merger allows every suggested pairing to appear only once in the set of tuples; however, it tracks every heuristic that suggests a tuple. Therefore, if a tuple is suggested by more than one heuristic, it may be regarded as more relevant.
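The following Python sketch (an illustration, not the SMES implementation) shows the sentence heuristic and a merger that records every heuristic voting for a pair.

    # Sketch of the sentence heuristic and the merger.
    from itertools import combinations

    def sentence_heuristic(sentence_entries):
        """Conjoin all lexical entries of a sentence with each other."""
        return set(combinations(sorted(sentence_entries), 2))

    def merge(heuristic_results):
        """heuristic_results: dict mapping heuristic name -> set of pairs.
        Returns each pair once, with the set of heuristics suggesting it."""
        merged = {}
        for name, pairs in heuristic_results.items():
            for pair in pairs:
                merged.setdefault(pair, set()).add(name)
        return merged

    results = merge({
        "sentence": sentence_heuristic(
            ["moeglichkeit", "kutsch-fahrt", "wittenbeck"]),
        "title": {("kutsch-fahrt", "wittenbeck")},
    })
    # Pairs suggested by more than one heuristic may be regarded as more
    # relevant, so they are ranked by the number of supporting heuristics:
    ranked = sorted(results.items(), key=lambda kv: len(kv[1]), reverse=True)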

Example. The following example illustrates the heuristic processing techniques. Consider the already introduced example sentence: "Wir bieten die Möglichkeiten von Kutschfahrten in Wittenbeck". An interesting syntactic relation is the relation between "Kutschfahrten" and "Wittenbeck" with their associated concepts FREIZEITEINRICHTUNG and STADT.
Thereby, the grammatical dependency relation does not even hold directly between two conceptually meaningful entities. For instance, in the example above, the concepts of "Kutschfahrt" and "Wittenbeck", which appear in the ontology as FREIZEITEINRICHTUNG and STADT, respectively, are not directly connected by a dependency relation. However, the preposition "in" acts as a mediator that incurs the conceptual pairing of FREIZEITEINRICHTUNG with STADT. The resulting tuple is shown in Figure 5.10.

<tuple name='getess-output'>
  <fragment>
    <concept>Freizeiteinrichtung</concept>
    <inst>Freizeiteinrichtung_2</inst>
    <lex>kutsch-fahrt</lex>
  </fragment>
  <fragment>
    <concept>Stadt</concept>
    <inst>Stadt_5</inst>
    <lex>wittenbeck</lex>
  </fragment>
  <constraint type='PREP'>in</constraint>
</tuple>

Figure 5.10. Example for a Heuristic Concept Relation

2.3 Semi-Structured Document Wrapper


As already mentioned above, dictionaries¹⁹ are considered an important source for ontology learning. In (Litkowski, 1978) it is argued that the definitions available in dictionaries contain a great deal of information about the semantic characteristics which should be attached to a lexeme. Dictionaries serve as stable and carefully handcrafted resources of domain knowledge that give a good starting point for a core ontology supporting subsequent ontology learning from pure text.
In real-world scenarios (e.g., in our case studies carried out in the insurance and telecommunications domains) dictionaries appear in very different formats, from free text (separated by specific delimiters) over HTML documents to XML documents. Thus, a mechanism for importing these heterogeneous dictionary representations into the ontology learning framework is required.
So-called wrappers²⁰ are used within this framework. They can be seen as programs designed for extracting the content of a particular information source and delivering it in a self-describing representation. Thus, the task of a wrapper in the Web context is to convert information implicitly stored, e.g., in an HTML document into information explicitly stored in a formal data structure for further processing. In the approach described here the dictionary wrapper uses a predefined dictionary schema formalized in a specific RDF-Schema available at a specific namespace²¹. The schema contains a set of classes (e.g. termEntry) and properties (e.g. hasDefinitionText) describing a generic dictionary representation. Instances in the form of RDF that are defined according to this schema serve as input for further processing within the framework, e.g. for applying a specific preprocessing method to dictionary
definitions or for executing an ontology learning algorithm. The advantage of this approach is that there is one clearly defined input structure on which many different algorithms operate. Thus, if a dictionary serves as input for ontology learning, only a wrapper for translating the given dictionary (e.g. in a specific HTML format) to an instantiation of the pre-defined schema has to be written.
The construction of a wrapper can be done manually, or by using a semi-automatic (Sahuguet and Azavant, 1999) or automatic approach (Kushmerick et al., 1997; Ashish and Knoblock, 1997). For the simple task of transforming a given, more or less well-structured dictionary into the internal representation, the required mapping and extraction rules are defined manually. However, the reader may note that a number of tools supporting the manual or semi-automatic construction of wrappers have recently been developed.
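The following minimal sketch illustrates such a manually defined wrapper; the assumed HTML layout (term in an h1 element, definition in a p element), the function name and the returned structure are purely hypothetical, chosen only to mirror the termEntry schema described above.

    # Hypothetical manual wrapper: HTML dictionary page -> normalized entry.
    import re

    def wrap_dictionary_entry(html, entry_id):
        # Manually defined extraction rules for an assumed page layout.
        term = re.search(r"<h1>(.*?)</h1>", html, re.S).group(1).strip()
        text = re.search(r"<p>(.*?)</p>", html, re.S).group(1).strip()
        return {
            "rdf:ID": entry_id,
            "dict:hasDefinitionTerm": term,
            "dict:hasDefinitionText": text,
        }

    entry = wrap_dictionary_entry(
        "<html><h1>CD-ROM</h1><p>a compact disk that is used with a "
        "computer ...</p></html>", "T1")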

Example. This example refers to a case study that has been carried out in the finance domain within the GETESS project²². The overall task has been the generation of a finance ontology supported by means of ontology learning using different kinds of given input data. One important kind of input data was a dictionary describing finance terms by natural language definitions. The dictionary is freely available on the Web in HTML format²³. Thus, for each dictionary entry one HTML document served as input for generating our normalized, RDF-based representation using the document wrapper. An example of this representation is given in Figure 5.11.

<rdf:RDF
  xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dict = "http://ontoserver.aifb.uni-karlsruhe.de/schema/dict.rdfs#">

  <dict:termEntry rdf:ID="T1">
    <dict:hasType rdf:resource="dict#definition"/>
    <dict:hasLanguage resource=
      "http://ontoserver.aifb.uni-karlsruhe.de/schema/oevoc.rdfs#EN"/>
    <dict:hasDefinitionTerm>CD-ROM</dict:hasDefinitionTerm>
    <dict:hasDefinitionText>
      a compact disk that is used with a computer (rather than with an
      audio system); a large amount of digital information can be
      stored and accessed but it cannot be altered by the user.
    </dict:hasDefinitionText>
  </dict:termEntry>
</rdf:RDF>

Figure 5.11. Example Normalized Dictionary Entry in RDF

This normalized representation serves as input to the natural language processing module described in the previous section, which analyzes and linguistically processes the term definitions. Subsequently, the preprocessed definition is directly given to the algorithm library. The mechanisms for extracting ontology elements such as $\mathcal{L}^{\mathcal{C}}$, $\mathcal{C}$, and $\mathcal{H}^{\mathcal{C}}$ from dictionaries are described in chapter 6, subsections 1.1.2 and 1.2.2.

2.4 Transforming Data into Relational Structures


The extraction and maintenance algorithms proposed and used in this book typically work on relational or propositional data. Each of the ontology learning algorithms proposed in the next chapter requires its own specific input format. Hence, in this subsection the transformation of linguistically annotated data into a format that can be processed by the algorithms is formally defined. The classical relational database model described in (Abiteboul et al., 1994) is used as a basis:
DEFINITION 5.3 (RELATION²⁴) Let $X$ be a finite, non-empty set of attributes.
• To each $A \in X$ a non-empty set $dom(A)$ is assigned, called the domain of $A$. Let $dom(X) := \bigcup_{A \in X} dom(A)$.
• A tuple over $X$ is a function $\mu : X \to dom(X)$, where $\mu(A) \in dom(A)$ for all $A \in X$. $Tup(X)$ is the set of all tuples over $X$.
• A relation over $X$ is a finite set $r \subseteq Tup(X)$.

Based on this definition, transformations from linguistically preprocessed documents to the specific relations that are relevant in the context of this work, serving as input for ontology learning algorithms, are defined. Let $\mathcal{D}$ be the set of linguistically annotated documents $d \in \mathcal{D}$. As mentioned earlier in this chapter, documents may be annotated at the different processing levels available in our NLP component (e.g. tokenization level, morphological level, POS level, heuristic pairs level).

In general one may apply ontology learning techniques on the lexical level, if
no ontological background knowledge is available, or at the conceptual level, if
some kind of ontological background knowledge is available. In the following
a number of relevant input relations on both levels is provided.

2.4.1 Relations on the Lexical Level


Considering the lexical level is important if no pre-knowledge about the structure of the target ontology is available. Three relations that are available within the ontology learning framework are presented: the lexical entry-lexical entry relation, the document-lexical entry relation and the lexical entry transaction relation.

Lexical entry-lexical entry relation. The first relation presented here is restricted to the lexical level, without having available a set of concepts $\mathcal{C}$ and a corresponding mapping $\mathcal{F}$ from lexical entries to concepts. The lexical entry-lexical entry relation $r_{ll}$ describes the relationship among a given set of lexical entries. It is defined as follows:

DEFINITION 5.4 (LEXICAL ENTRY-LEXICAL ENTRY RELATION $r_{ll}$) A lexical entry-lexical entry relation is a function $f_{ll} : \mathcal{L}^{\mathcal{C}} \times \mathcal{L}^{\mathcal{C}} \to \mathbb{R}_0^+$.

The general idea behind this representation is that, based on a preprocessed document set $\mathcal{D}$, a relationship between the occurring lexical entries is described. The relationship between two concrete lexical entries $L_1, L_2$ is expressed by a value $v := f_{ll}(L_1, L_2)$, $v \in \mathbb{R}_0^+$. The resulting values of applying the function $f_{ll}$ may be based on different computation strategies. The easiest case is just using a co-occurrence model: if two lexical entries co-occur in a linguistic or heuristic pattern (e.g. as presented in subsection 2.2), their co-occurrence frequency is increased. More complex models such as given in (Manning and Schuetze, 1999), e.g. based on the mutual information of two lexical entries, may also be applied. The similarity measures that are used within our approach are described in the next chapter.
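A minimal sketch of the simple co-occurrence strategy (an illustration, not the system's implementation) may look as follows; the input pairs are assumed to stem from the linguistic and heuristic processing described above.

    # Sketch of f_ll via co-occurrence counting.
    from collections import Counter

    def lexical_entry_relation(patterns):
        """patterns: iterable of (L1, L2) pairs from linguistic or
        heuristic patterns; returns symmetric co-occurrence counts."""
        f_ll = Counter()
        for l1, l2 in patterns:
            f_ll[tuple(sorted((l1, l2)))] += 1
        return f_ll

    f_ll = lexical_entry_relation([
        ("hotel", "accomodation"), ("accomodation", "hotel"),
        ("hotel", "event"),
    ])
    # f_ll[("accomodation", "hotel")] == 2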
According to (Abiteboul et al., 1994) one may define $X := \{id\} \cup \mathcal{L}^{\mathcal{C}}$, $dom(id) := \mathcal{L}^{\mathcal{C}}$, $dom(L) := \mathbb{R}$ for all $L \in \mathcal{L}^{\mathcal{C}}$. Hence, in the database, $f_{ll}$ is represented as the relation $r_{ll} := \{\mu_L : X \to dom(X) \mid L \in \mathcal{L}^{\mathcal{C}}, \mu_L(id) = L, \mu_L(L') = f_{ll}(L, L')$ for $L' \in \mathcal{L}^{\mathcal{C}}\}$.
Example. An example for a lexical entry-lexical entry relation $r_{ll}$ is given in the following: Let $\mathcal{L}' =$ {"accomodation", "event", "hotel"}. Using $f_{ll}$ based on a given similarity measure (e.g., cosine, Kullback-Leibler divergence, etc.) may result in the database as follows: $X =$ {ID, "accomodation", "event", "hotel"}, $dom(ID) =$ {"accomodation", "event", "hotel"}, $dom($"accomodation"$) = dom($"event"$) = dom($"hotel"$) = \mathbb{R}$.

ID "accomodation" "hotel" "event"


"accomodation" 1 0.8 0.3
"hotel" 0.8 1 0.4
"event" 0.3 0.4 1

Table 5.2. Example Lexical Entry-Lexical Entry Relation

Table 5.2 depicts an example lexical entry-lexical entry relation. The reader
may note that for example the lexical entries "accomodation" and "hotel" show a
similar behaviour with respect to the lexical entries "accomodation", "hotel", and
"event".

Another relation on the lexical level is the document-lexical entry relation $r_{dl}$. This relation has been motivated by the idea of having a "document space" according to the vector-space model researched by the information retrieval community (Salton, 1988; Sparck-Jones and Willett, 1997):

DEFINITION 5.5 (DOCUMENT-LEXICAL ENTRY RELATION $r_{dl}$) A document-lexical entry relation is a function $f_{dl} : \mathcal{D} \times \mathcal{L}^{\mathcal{C}} \to \mathbb{N}_0$.
The underlying idea of the document-lexical entry relation $r_{dl}$ is that the set of documents $\mathcal{D}$ is put into the context of the lexical entries occurring in these documents. The function $f_{dl}$ counts the number of times a lexical entry occurs in a document $d \in \mathcal{D}$. In our framework the document-lexical entry relation serves as input for the mechanisms, presented in the next chapter, for extracting lexical entries that are relevant indicators for concepts.
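A minimal sketch of computing $f_{dl}$ from an already preprocessed corpus (illustrative only; the corpus is assumed to be given as lists of lexical entries per document) may look as follows.

    # Sketch of f_dl: counting lexical entry occurrences per document.
    from collections import Counter

    def document_lexical_entry_relation(corpus):
        """corpus: dict mapping document id -> list of lexical entries."""
        return {d: Counter(entries) for d, entries in corpus.items()}

    r_dl = document_lexical_entry_relation({
        "doc1.html": ["hotel", "hotel", "event"],
        "doc2.html": ["event"],
    })
    # r_dl["doc1.html"]["hotel"] == 2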

Another possibility for representing data is a "transaction view". One may represent lexical entries that occur in a linguistic or heuristic relation as described earlier as follows:

DEFINITION 5.6 (LEXICAL ENTRY TRANSACTION RELATION $r_{lt}$) A lexical entry transaction relation is given as $r_{lt} \subseteq \mathcal{L}^{\mathcal{C}} \times \mathcal{L}^{\mathcal{C}}$.

We mainly use the transaction view for deriving candidate conceptual re-
lations between lexical entries using the algorithm that will be introduced in
chapter 6, subsection 1.3.

2.4.2 Relations on the Concept Level


As mentioned earlier, if some kind of background knowledge is available, e.g., lexical entries referring to concepts or relations, a set of concepts $\mathcal{C}$ or relations $\mathcal{R}$ and corresponding mapping functions $\mathcal{F}, \mathcal{G}$ between them, one may derive data representations at the conceptual level as input for the ontology learning algorithms.
The first relation presented at the conceptual level has already been introduced at the lexical level. The concept-concept relation represents the concept space, viz. the relationship between concepts, and is defined as follows:

DEFINITION 5.7 (CONCEPT-CONCEPT RELATION $r_{cc}$) A concept-concept relation is a function $f_{cc} : \mathcal{C} \times \mathcal{C} \to \mathbb{R}_0^+$.
Again the co-occurrence may be computed using different models, e.g. with
respect to documents, paragraphs or other units. In general it is assumed that
concepts are similar to the extent they co-occur with the same concepts.
Often one has to combine the lexical and the concept level, e.g. for the recognition of the meaning of unknown lexical entries as done by our refinement algorithm that will be introduced in chapter 6, subsection 2.2. We define the relation $r_{cle}$ as a simple extension of the relation $r_{cc}$:

DEFINITION 5.8 (CONCEPT/LEX. ENTRY-CONCEPT RELATION $r_{cle}$) A concept/lexical entry-concept relation is a function $f_{cle} : (\mathcal{C} \cup \mathcal{L}') \times \mathcal{C} \to \mathbb{R}_0^+$.

The underlying idea of the concept/lexical entry-concept relation $r_{cle}$ is that unknown lexical entries $\mathcal{L}'$ are put into context with the existing concepts $\mathcal{C}$. Thus, unknown lexical entries may be compared with concepts (= known lexical entries).

Example. An example for a concept/lexical entry-concept relation $r_{cle}$ is given in Table 5.3. In this small example the unknown lexical entry "weekend excursion" is put into context with the concepts ACCOMODATION, HOTEL, and EVENT. If one compares the context of "weekend excursion" with the contexts of the known concepts EXCURSION and CAR, one easily recognizes the similarity between "weekend excursion" and EXCURSION.

ID                    ACCOMODATION  HOTEL  EVENT

EXCURSION             5             4      2
"weekend excursion"   4             4
CAR

Table 5.3. Example Concept/Lexical Entry-Concept Relation
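To illustrate the comparison, the following sketch computes the cosine similarity between the context of the unknown lexical entry and the contexts of the known concepts. The counts are taken from Table 5.3; the cells left empty there are assumed to be 0 purely for the purpose of this illustration.

    # Sketch: comparing context vectors of r_cle with the cosine measure.
    import math

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = (math.sqrt(sum(a * a for a in u))
                * math.sqrt(sum(b * b for b in v)))
        return dot / norm if norm else 0.0

    contexts = {                 # rows of the r_cle relation (Table 5.3)
        "EXCURSION": [5, 4, 2],
        "CAR": [0, 0, 0],        # empty cells assumed to be 0
        "weekend excursion": [4, 4, 0],
    }
    unknown = contexts["weekend excursion"]
    for concept in ("EXCURSION", "CAR"):
        print(concept, cosine(unknown, contexts[concept]))
    # EXCURSION yields the highest similarity, as expected.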

The document space based on lexical entries has been introduced in Definition 5.5. The document space may be defined analogously based on a given set of concepts $\mathcal{C}$ as follows:

DEFINITION 5.9 (DOCUMENT-CONCEPT RELATION $r_{dc}$) A document-concept relation is a function $f_{dc} : \mathcal{D} \times \mathcal{C} \to \mathbb{N}_0$.

Example. An example of a document-concept relation $r_{dc}$ is given by Table 5.4. In this small example the number of concept occurrences in the documents $d_1$ := doc1.html, $d_2$ := doc2.html, and $d_3$ := doc3.html is represented.
The reader may note the similarity between the document-concept relation and the formal context that formed the basis for FCA-MERGE as introduced in subsection 1.2. The formal context is derived from the document-concept relation by simply substituting the frequencies with the binary decision occurs (1) or does not occur (0).
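A minimal sketch of this derivation (illustrative only; the relation is assumed to be given as a mapping from document-concept pairs to frequencies) may look as follows.

    # Sketch: deriving a formal context from the document-concept relation.
    def to_formal_context(r_dc):
        """r_dc: dict mapping (document, concept) -> frequency."""
        return {pair: 1 if freq > 0 else 0 for pair, freq in r_dc.items()}

    context = to_formal_context({("doc1.html", "HOTEL"): 4,
                                 ("doc2.html", "HOTEL"): 0})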
A user interface implemented within the TEXT-TO-ONTO ontology learning environment (see chapter 7) for generating a focused document-concept relation, together with its result, is depicted in Figure 5.12.

ID          ACCOMODATION  HOTEL  EVENT

doc1.html   5             4      2
doc2.html   0             0      4
doc3.html

Table 5.4. Example Document-Concept Relation

[Screenshot: the concept matrix generation view of the TEXT-TO-ONTO environment.]

Figure 5.12. Concept Matrix Generation View

On the left side of Figure 5.12 the user may select the concepts along which the documents should be described. On the right side the resulting relation for a concrete set of documents is depicted.

The last relation defined here is the concept transaction relation $r_{ct}$. This relation has already been introduced on the lexical level. It represents binary pairs of concepts and is defined as follows:

DEFINITION 5.10 (CONCEPT TRANSACTION RELATION $r_{ct}$) A concept transaction relation is given as a binary relation $r_{ct} \subseteq \mathcal{C} \times \mathcal{C}$.

The underlying idea of this relation is that dependencies between concepts that occur together are represented. The concept transaction relation is derived by the heuristic pairing strategies introduced in subsection 2.2. It will serve as input for the mechanisms that support the extraction of non-taxonomic relations between concepts, which will be presented in chapter 6, subsection 1.3.

Example. An example for a concept transaction relation $r_{ct}$ may be given on the database as follows: $\mathcal{C} =$ {PERSON, HOTEL, AREA, EVENT, CITY}, $X =$ {ID, ITEMA, ITEMB}, $dom(ID) = \mathbb{N}$, $dom(ITEMA) = dom(ITEMB) = \mathcal{C}$.

ID  ITEMA   ITEMB
1   PERSON  HOTEL
2   AREA    EVENT
3   CITY    EVENT

Table 5.5. Example Concept-Transaction Relation

3. Conclusion
This chapter introduced the mechanisms for data import and processing developed in this book. The range of considered data was restricted to free natural language and semi-structured documents (e.g. in the form of dictionaries) and ontologies. For importing and processing existing ontologies, an ontology wrapper approach that supports importing single ontologies into the system has been introduced. If two or more ontologies are available, the data-oriented, bottom-up ontology merging method FCA-MERGE may be used to derive a common ontology. Natural language documents are linguistically annotated using shallow linguistic processing, with a mapping to the ontology if available. Semi-structured documents in the form of dictionaries are wrapped into a specific format that may be further processed.
It has been seen that the task of preprocessing data requires complex and
difficult mechanisms for transforming data into different forms with varying
complexities. The reader may note that the quality of the preprocessing step
directly influences the quality of the results that may be generated by the algo-
rithms that are provided in the subsequent chapter.
Concluding this chapter three main issues are defined that have to be ap-
proached in the future for further improving data processing and importing:
natural language processing techniques, structure-aware document processing,
and the usage of multi-relational data for ontology learning.

3.1 Language Processing for Ontology Learning


As mentioned above, in the implementation of SMES that has been used within this book, mechanisms for the recognition of grammatical functions (subject, object) were not available. In the following, a "wishlist" is given as to what kind of information a natural language processing module should deliver.

Semantic Disambiguation. As mentioned earlier, SMES contains a technique for disambiguation within part-of-speech tagging. The mapping from words to concepts also requires a disambiguation strategy, e.g. for the "Jaguar" example. The current status of the system neglects semantic disambiguation. Semantic disambiguation should be added to the system, e.g. according to the approach for statistically based semantic disambiguation described in (Manning and Schuetze, 1999).

Sentence Parsing. The mechanisms for determining grammatical functions (such as subject, direct object) for each dependency-based sentential structure (e.g. on the basis of subcategorization frames in the lexicon) have to be further improved.

Discourse Parser. In the future, a discourse parser should be implemented along the lines of (Baldwin et al., 1998). The discourse parser must detect the textual cohesion between sentences and, in particular, it must establish coreference between phrases. For instance, consider the adjacent sentences

1 "The hotel offers 45 luxury rooms." and

2 "It also has a splendid bar overlooking the ocean."

These sentences should be analyzed such that the "hotel" is linked to the "luxury rooms" as well as to the "splendid bar".

3.2 Ontology Learning from Web Documents


The mixed, heterogeneous structure of Web pages has to be reflected in the data import and processing task. As part of this work, initial experiments with structural analysis of documents have been carried out to derive strategies for processing different document types and structures.
HTML is considered the language of the current Web. In its current version²⁵ it contains a large number of different tags that mix structure, content and layout information. Analyzing and counting the usage of these tags results in a structural document profile. Each document can be described along several features, as shown in Table 5.6. Several experiments relying on the assumption that similarly structured documents require similar processing strategies have been carried out in this work. The unsupervised k-means clustering algorithm (see (Kaufman and Rousseeuw, 1990)) has been applied to determine groups of similarly structured documents; a sketch of this clustering step is given after Table 5.6. Similarity is computed by comparing the documents' features as shown in Table 5.6. Each document structure cluster may be described by a prototype structure, for which specific preprocessing strategies may be defined. If the documents of a cluster are completely identically structured, one can also write a document wrapper as described earlier. If the documents do not exhibit structural similarity, it may be useful to delete the HTML elements and ignore the structure.

DocID  URL                               HR  TD  A   DocSize

1      http://www.allinall.de/k123.html  23  23  23  23
2      http://www.allinall.de/k124.html  23  0   23  23
3      http://www.allinall.de/k125.html  23  23  23  23
4      http://www.allinall.de/k126.html  23  23  23  23

Table 5.6. Document Structure Profile
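The following sketch shows how such structural profiles may be clustered; it is illustrative only, the profile values are invented, and a library implementation of k-means could of course be used instead of the hand-written one given here.

    # Sketch: k-means clustering of structural document profiles.
    import math, random

    def kmeans(points, k, iterations=20, seed=0):
        random.seed(seed)
        centers = random.sample(points, k)
        for _ in range(iterations):
            # Assign each profile to its nearest center.
            clusters = [[] for _ in range(k)]
            for p in points:
                i = min(range(k), key=lambda j: math.dist(p, centers[j]))
                clusters[i].append(p)
            # Recompute each center as the mean of its cluster.
            centers = [
                [sum(c) / len(cl) for c in zip(*cl)] if cl else centers[i]
                for i, cl in enumerate(clusters)
            ]
        return centers, clusters

    profiles = [           # one row per document: [HR, TD, A, DocSize]
        [23, 23, 23, 230], [23, 0, 23, 230], [22, 24, 23, 228], [0, 2, 5, 40],
    ]
    centers, clusters = kmeans(profiles, k=2)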

In the future, approaches that combine structure analysis with natural language processing have to be researched. The importance of including structural features in the text extraction process has recently been recognized (e.g., relevant work in the same direction has been done by (Wang and Liu, 1998)).

3.3 (Multi-)Relational Data


Statistical and classical propositional machine learning approaches work well with the single-relation representation introduced above. However, representing the available data in one single relation has severe limitations, whereas the flexibility and expressivity of logical representations make them highly suitable for natural language analysis. Logical approaches have been successfully employed in text categorization and information extraction (see (Freitag, 1998)). Different representations have been proposed in the literature, e.g. (Esposito et al., 2000) propose a formal representation of texts using descriptors such as subj(e1,e2) or obj(e1,e2). Using this kind of formal representation, they exploit a logic representation of the grammatical structure of texts. Their aim is to investigate the feasibility of learning semantic definitions for certain kinds of sentences. Along the same lines, (Cumby and Roth, 2000) show that relational representations facilitate learning. In their work they develop an expressive relational representation language that allows the use of propositional algorithms when learning relational definitions. In general, the interaction between logical representations and ontology learning has to be further researched.

Notes

1 This task is known in machine learning under the name "preprocessing".
2 cf. http://www.ontoknowledge.org/oil
3 The work described in this subsection has been published in (Stumme and Maedche, 2001b; Stumme and Maedche, 2001a).
4 A detailed introduction to the applied techniques is given in subsection 2.2.
5 i.e., for each set of formal concepts, there is always a greatest common subconcept and a least common superconcept.
6 This vector-space model like representation is introduced in more detail in subsection 2.4.
7 The reader may note that reflexivity of $\mathcal{H}^{\mathcal{C}}$ is not included in the semantics defined in subsection 3.9. However, for the purpose of generating a formal context reflexivity is required. Therefore, the semantics of $\mathcal{H}^{\mathcal{C}}$ is modified in this part of the work.
8 A detailed introduction to the algorithm is given in (Stumme et al., 2000).
9 This implies (by the definition of key sets) that the formal concept does not have another key set.
10 {ROOT_1} and {ROOT_2} are not key sets, as each of them has a subset (namely the empty set) generating the same formal concept.
11 The reader may note that this strategy for measuring relevancy may be further refined, e.g. with normalized counts or with the inclusion of ontological background knowledge, e.g. contained in the taxonomy $\mathcal{H}^{\mathcal{C}}$.
12 http://www.dfki.de/nlt
13 Mecklenburg-Vorpommern is a region in the north east of Germany.
14 Generally, only nouns (and proper names) are written in standard German with a capitalized initial letter (e.g., "der Wagen" - the car - vs. "wir wagen" - we venture). Since typing errors are relatively rare in press releases (or similar documents), the application of case-sensitive rules is a reliable and straightforward tagging means for the German language.
15 We offer the possibilities of carriage rides in Wittenbeck [eng].
16 Daimler Benz AG has made heavy losses [eng].
17 The man sees the woman with the telescope [eng].
18 The reader may note that these heuristics are extremely domain-specific. Their application has to be carefully selected.
19 A dictionary is a reference "book" in which words are listed alphabetically and their meanings are given in the form of a natural language explanation, either in the same language or in another language, and other information about them is given.
20 In its most generic sense a wrapper is a piece of software that accompanies resources or other software for the purpose of improving convenience, compatibility, or security.
21 The namespace is available online at http://ontoserver.aifb.uni-karlsruhe.de/schema/dict.rdfs.
22 http://www.getess.de
23 http://www.boersenlexikon.de/
24 according to (Abiteboul et al., 1994)
25 http://www.w3c.org/MarkUp/


Chapter 6

ONTOLOGY LEARNING ALGORITHMS

Learning: The act, process, or experience of gaining knowledge or skill.


-(American Heritage Dictionary)

This chapter presents the ontology learning algorithms developed and used in the context of the ontology learning framework. According to the phases of the ontology learning cycle described in chapter 4, a bundle of algorithms is presented that supports ontology extraction and maintenance. An important aspect of all algorithms is that they support the idea of an "incrementally growing ontology structure". In the last chapter it has been described how existing ontologies may be imported and used within our framework as background knowledge. The reader may note that all algorithms presented also work without any given conceptual structures in the form of a baseline ontology. However, if some kind of conceptual structures is available, the algorithms profit from the existing background knowledge (e.g. in the form of already existing conceptual structures such as a concept hierarchy) for the generation of further conceptual structures building on the existing ones.
The chapter begins with section 1, which presents the mechanisms for extracting ontologies according to the ontology structure $\mathcal{O}$. The section is separated along the ontology elements, namely lexical entries referring to concepts and relations, taxonomic relations in the form of concept hierarchies, and non-taxonomic relations between pairs of concepts. For each of these elements, different algorithms are offered that operate on different kinds of input data, such as free natural language texts and structured dictionaries as introduced in the last chapter. The techniques applied within the framework may be roughly separated into (i) statistical or machine learning-oriented algorithms and (ii) pattern-matching based techniques. Within the work described here, existing
algorithms and techniques such as term extraction, statistical hierarchical clustering, and association rules have been adopted and used for the task of ontology learning for the Semantic Web. The underlying idea of the extraction approach is to use a bundle of algorithms, because each algorithm produces results of varying quality (e.g. depending on input data and parameters). Thus, by combining them one may balance the advantages and disadvantages of each technique and reach a high overall quality of results. In order to be able to combine the results from different extraction algorithms, it is necessary to standardize the output in a common way. Therefore, a common result structure for all learning methods is provided. If several learning algorithms obtain equal results, these results are combined and presented to the user only once.
As mentioned earlier, the aspect of maintenance is of highest relevance in the area of ontology engineering. Ontology learning supports the maintenance of ontologies with reference to two aspects, as given in section 2. The first aspect considers the idea of "unlearning" knowledge structures. The approach pursued in this work is called ontology pruning, supporting the detection and deletion of irrelevant concepts for a given domain of interest. The second approach may be considered under the name of ontology extension and refinement, which deals with the incremental extension of an ontology. These kinds of algorithms share many similarities with ontology extraction algorithms; thus, one may also use the extraction algorithms introduced in section 1 to extend a given ontology, because the existing ontology structures are included and used for further extraction processes.
This chapter ends by summarizing the features of the established algorithms and our overall approach. Additionally, future work in the area of developing ontology learning algorithms is listed.

1. Algorithms for Ontology Extraction


This section focuses on algorithms for ontology extraction. As mentioned earlier, all the algorithms presented here work from scratch without any predefined knowledge. However, if any kind of ontology structure is available, it is used as background knowledge for the ontology extraction task. This section is roughly separated according to the core ontology elements introduced in Definition 2.1: the lexicon $\mathcal{L}$ referring to concepts $\mathcal{C}$ and relations $\mathcal{R}$, the taxonomy $\mathcal{H}^{\mathcal{C}}$ and the set of non-taxonomic relations. Each of the subsequent subsections will provide different algorithms that operate on different kinds of data to extract the ontology elements.

1.1 Lexical Entry Extraction


In this subsection different strategies are proposed for the extraction of lexical entries indicating and referring to concepts and relations by analyzing different types of input data. Related work exists in the information retrieval and terminology communities, where one mainly talks about terms. We refer the interested reader to the recently published book by (Jacquemin, 2001), which gives a comprehensive overview of the topic of "spotting and discovering terms", and to the related work chapter contained in this book. Before the different frequency measures and algorithms for extracting lexical entries are introduced, a short clarification is provided about the difference between words and terms and their relationship to lexical entries and concepts, respectively.

Lexical entries and their relationship to words and terms. As mentioned above, one often talks about terms instead of words. To clarify the meaning of words and terms, different definitions and explanations of what "words" and "terms" are, are introduced:

• A word is an arbitrary entity that has on one side a concept and on the other a form (see (de Saussure, 1916)).
• A word is a single unit of language which has meaning and can be spoken or written (see Cambridge English Dictionary¹).
• A word is a sound or sign that, if written down, corresponds to a string of letters that has spaces or punctuation marks on either side (see (Miller, 1996)).
• A word is a syntactic atom, something that can be a member of a category (noun, verb) and that can be a product of a morphological rule (see (Bloom, 2000)).

It becomes obvious that there exist at least two opinions about what a word constitutes: on the one hand, a word is considered as something that has a meaning; on the other hand, it is considered as a pure syntactic unit. As mentioned above, in several communities one often talks about terms instead of words, e.g. as follows:

• A term is a lexical unit that refers to a specific concept in a particular subject field (see (Frantzi et al., 2000)).
• A term is a word or expression used in relation to a particular subject, such as to describe an official or technical word (see Cambridge English Dictionary).

A specific kind of terms are the so-called multi-word terms² that are especially important for identifying concepts.
cially important for identifying concepts.
Finally, the conclusion is reached that there is neither a clear definition of what a word is nor of what a term is. In general, one may say that an important aspect of terms is that they are more specialized than words of the everyday language. In (Frantzi et al., 2000) it is argued that terms are mainly noun phrases with an average length of two words. Therefore, no further distinction between terms and words will be made in the following, and the notion of a lexical entry is used as a general description of both "words" and "terms".

A Note on Extracting Concepts. The next question that has to be answered concerns the distinction between the extraction of concepts and of lexical entries. The reader may note that in general one still relies on the meaning triangle as introduced in chapter 2, dealing with the separation between words, terms or, more generally, lexical entries, concepts, and things in the world. In general, the question whether a specific lexical entry is modeled as a concept or only as a lexical entry with a mapping to an existing concept is considered a domain-specific modeling decision. Therefore, lexical entries are proposed to the ontology engineer as surfaces of potential concepts or relations. Thus, in this approach "only" semi-automatic ways for the extraction of concept indicators are offered.

1.1.1 Frequency Measures


A straightforward technique for extracting relevant lexical entries that may indicate concepts is simply counting the frequencies of lexical entries in a given set of (linguistically processed) documents, the corpus $\mathcal{D}$. In general this approach is based on the assumption that a frequent lexical entry in a set of domain-specific texts indicates a surface of a concept. Research in information retrieval has shown that there are more effective methods of term weighting than simple counting of frequencies. A standard information retrieval approach is pursued for term weighting, based on the following quantities:

• The lexical entry frequency $lef_{l,d}$ is the frequency of occurrence of lexical entry $l \in \mathcal{L}$ in a document $d \in \mathcal{D}$.

• The document frequency $df_l$ is the number of documents in the corpus $\mathcal{D}$ that $l$ occurs in.

• The corpus frequency $cf_l$ is the total number of occurrences of $l$ in the overall corpus $\mathcal{D}$.

The reader may note that in general $df_l \le cf_l$ and $\sum_{d \in \mathcal{D}} lef_{l,d} = cf_l$. The extraction of lexical entries is based on the information retrieval measure tfidf (term frequency inverted document frequency). Tfidf combines the introduced quantities as follows:

DEFINITION 6.1 Let $lef_{l,d}$ be the lexical entry frequency of the lexical entry $l$ in a document $d$. Let $df_l$ be the overall document frequency of lexical entry $l$. Then $tfidf_{l,d}$ of the lexical entry $l$ for the document $d$ is given by:

$$tfidf_{l,d} = lef_{l,d} \cdot \log\left(\frac{|\mathcal{D}|}{df_l}\right) \qquad (6.1)$$

Tfidf weighs the frequency of a lexical entry in a document with a factor that discounts its importance when it appears in almost all documents. Therefore, terms that appear too rarely or too frequently are ranked lower than terms that hold the balance. A final step has to be carried out on the computed $tfidf_{l,d}$: a list of all lexical entries contained in one of the documents from the corpus $\mathcal{D}$, without lexical entries that appear in a standard list of stopwords, is produced.³ The tfidf values for lexical entries $l$ are computed as follows:

DEFINITION 6.2

$$tfidf_l := \sum_{d \in \mathcal{D}} tfidf_{l,d}, \quad tfidf_l \in \mathbb{R}. \qquad (6.2)$$

The user may define a threshold $k \in \mathbb{R}^+$ that $tfidf_l$ has to exceed. The lexical entry approach has been evaluated in detail (e.g. for varying $k$ and different selection strategies). A detailed description of the evaluation results is given in subsection 4.2.
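A minimal sketch of computing the tfidf values according to Definitions 6.1 and 6.2 (illustrative only; the corpus is assumed to be given as stopword-filtered lists of lexical entries per document) may look as follows.

    # Sketch of Definitions 6.1 and 6.2 with a user-defined threshold k.
    import math
    from collections import Counter

    def tfidf_scores(corpus):
        """corpus: dict mapping document id -> list of lexical entries."""
        n = len(corpus)
        lef = {d: Counter(entries) for d, entries in corpus.items()}
        df = Counter(l for counts in lef.values() for l in counts)
        scores = Counter()
        for d, counts in lef.items():
            for l, freq in counts.items():
                scores[l] += freq * math.log(n / df[l])   # (6.1) summed as (6.2)
        return scores

    scores = tfidf_scores({
        "d1": ["hotel", "hotel", "event"],
        "d2": ["event", "excursion"],
    })
    k = 0.5
    candidates = [l for l, s in scores.items() if s > k]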

1.1.2 Lexical Entry Extraction via Dictionary Exploration


As already mentioned in the data import & processing chapter, dictionaries are considered relevant input data for ontology learning. Based on the arguments of (Litkowski, 1978), an approach for learning ontologies from dictionaries is followed. Litkowski argues that definitions contained in dictionaries contain a great deal of information about the semantic characteristics which should be attached to a lexical entry. Dictionaries serve as stable resources of domain knowledge that give a good starting point for a core ontology supporting subsequent ontology learning from pure natural language text.
Practical experiences have shown (e.g., the Insurance Company Case Study (Volz, 2000; Kietz et al., 2000b) and the Telecommunications Case Study (Maedche and Staab, 2000b)) that domain-specific dictionaries are often available as a starting point for ontology learning. Dictionaries are transformed into a common relational standard representation as described in chapter 5, subsection 2.3, using the document wrapper approach. Each termEntry is considered a potential lexical entry referring to a concept. Thus, the shallow processing techniques on the lexical level as introduced in chapter 5, subsection 2.2 are applied. Finally, the preprocessed lexical entries are proposed as potential candidates for concepts to the ontology engineer. Based on these extracted baseline
entries (representing dictionary anchors), pattern matching techniques that are further explained in the following are applied.

In this subsection, different strategies for extracting concept indicators in the form of lexical entries have been presented. These approaches work without any background knowledge and are restricted to the extraction of simple hints. The following two subsections consider techniques for the extraction of taxonomic and non-taxonomic relations that may also work on the lexical level. However, if the mapping from lexical entries to concepts exists, the algorithms operating on the concept level may profit from this kind of background knowledge.

1.2 Taxonomy Extraction


An important building block of ontologies is the taxonomic organization of concepts. Several methodologies for manual ontology engineering have proposed different techniques for establishing a taxonomy, e.g. top-down, middle-out, and bottom-up strategies (see the related work chapter 9). Automatic ontology learning techniques for taxonomy generation may be separated in analogy to these manual engineering techniques.
This subsection contains two complementary techniques for deriving a taxonomy of concepts from given data. The first method is based on linguistically annotated text and uses hierarchical clustering techniques, well known and established in multi-variate statistics. The second method takes as input linguistically preprocessed dictionary definitions (see chapter 5, subsection 2.3) and applies pattern matching for extracting taxonomic relationships. This method is very powerful on semi-structured data like dictionaries.

1.2.1 Hierarchical Clustering


Clustering can be defined as the process of organizing objects into groups whose members are similar in some way (see (Kaufman and Rousseeuw, 1990)). In general, there are two major styles of clustering: non-hierarchical clustering, in which every object is assigned to exactly one group, and hierarchical clustering, in which each group of size greater than one is in turn composed of smaller groups. Hierarchical clustering algorithms are preferable for detailed data analysis. They produce hierarchies of clusters, and therefore contain more information than non-hierarchical algorithms. However, they are less efficient with respect to time and space than non-hierarchical clustering⁴. (Manning and Schuetze, 1999) identify two main uses for clustering in natural language processing⁵: the first is the use of clustering for exploratory data analysis, the second is for generalization. Seminal work in this area of so-called distributional clustering of English words has been described in (Pereira et al., 1993).
Their work focuses on constructing class-based word co-occurrence models
with substantial predictive power. In the following the existing and seminal
work on applying statistical hierarchical clustering in NLP (see (Pereira et al.,
1993)) is adopted and embedded into the framework. Embedding into the ontology
learning framework in this case means relying mainly on concepts
and not on lexical entries, and establishing mechanisms that build on existing
ontological background knowledge (e.g. use a baseline, predefined taxonomy
of concepts H^C and extend it). However, the reader may note that the algorithms
presented in the following also work on lexical entries L^C without
conceptual background knowledge available.
As input for the clustering mechanisms, on the conceptual level the concept-concept
relation as introduced in Definition 5.7 and on the lexical level the
lexical entry-lexical entry relation given in Definition 5.4 is used. Both describe
a "concept" space or "lexical entry" space, respectively. Insofar, they put lexical
entries or concepts in context with other lexical entries or concepts, respectively.

Algorithm 1 Hierarchical Clustering Algorithm - Bottom-Up

Require: a set X = {x_1, ..., x_n} of objects, n as the overall number of objects,
  a function sim: ℘(X) × ℘(X) → ℝ
Ensure: the set of clusters K (the cluster hypothesis)
for i := 1 to n do
  k_i := {x_i}
end for
K := {k_1, ..., k_n}
j := n + 1
while |K| > 1 do
  (k_{n1}, k_{n2}) := argmax_{(k_u, k_v) ∈ K×K} sim(k_u, k_v)
  k_j := k_{n1} ∪ k_{n2}
  K := (K \ {k_{n1}, k_{n2}}) ∪ {k_j}
  j := j + 1
end while

Baseline Hierarchical Clustering. The tree of hierarchical clusters can be
produced either bottom-up, by starting with individual objects and grouping
the most similar ones, or top-down, whereby one starts with all the objects and
divides them into groups. Algorithm 1 (adopted from (Manning and Schuetze,
1999)) describes the bottom-up algorithm. It starts with a separate cluster for
each object. In each step, the two most similar clusters are determined and
merged into a new cluster. The algorithm terminates when one large cluster
containing all objects has been formed. The most important aspect in clustering
is the selection of an appropriate computation strategy and a similarity
measure. We will introduce a number of computation strategies (e.g. single-link,
complete-link or group-average) and similarity measures (e.g. cosine,
Kullback-Leibler) later in this subsection.
Algorithm 2 (adopted from (Manning and Schuetze, 1999)) roughly describes
the top-down algorithm. It starts out with one cluster that contains all
objects. The algorithm then selects the least coherent cluster in each iteration
and splits it. Clusters with similar objects are more coherent than clusters
with dissimilar objects. Thus, the strategies single-link, complete-link and
group-average can also serve as measures of cluster coherence (function coh)
in top-down clustering.
The reader may note that splitting a cluster (function split) is also a clustering
task (namely the task of finding two sub-clusters of a cluster). Thus, there is
a recursive need for a second clustering algorithm. Any clustering algorithm
may be used for the splitting operation, including bottom-up algorithms.

Algorithm 2 Hierarchical Clustering Algorithm - Top-Down

Require: a set X = {x_1, ..., x_n} of objects, n as the overall number of objects,
  a function coh: ℘(X) → ℝ,
  a function split: ℘(X) → ℘(X) × ℘(X)
K := {X} (= {k_1})
j := 1
while ∃ k_i ∈ K s.t. |k_i| > 1 do
  k_u := argmin_{k_v ∈ K} coh(k_v)
  (k_{j+1}, k_{j+2}) := split(k_u)
  K := (K \ {k_u}) ∪ {k_{j+1}, k_{j+2}}
  j := j + 2
end while

As mentioned earlier, an important aspect is the selection of an appropriate
computation strategy and a similarity measure. In the following the most important
ones are presented. The possible combinations of algorithms, computation
strategies and measures have been evaluated to determine the best setting for
learning a concept hierarchy H^C. The evaluation setting and the obtained results
are described in detail in chapter 8.

Computation strategies used in hierarchical clustering. This work focuses
on the three functions single-link, complete-link and group-average, which
have been shown to perform well in statistical hierarchical clustering. Their
advantages and disadvantages are briefly introduced; the interested reader is
referred to the more detailed introduction given in (Kaufman and Rousseeuw,
1990). Measuring similarity based on single linkage means that the similarity
between two clusters is the similarity of the two closest objects in the clusters.
Thus, one has to search over all pairs of objects that are from the two different
clusters and select the pair with the greatest similarity. Single-link clustering
produces clusters with local coherence. If similarity is based on complete linkage,
the similarity between two clusters is computed based on the similarity of the
two least similar members. Thus, the similarity of two clusters is the similarity
of their two most dissimilar members. Complete-link clustering has a similarity
function that focuses on global cluster quality. The last similarity function
considered is group-average. Group-average may be considered as a bit of
both single linkage and complete linkage: the criterion for merges is the
average similarity between members.
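To make the interplay of Algorithm 1 and the computation strategies concrete, the following Python sketch (not the actual TEXT-TO-ONTO implementation) realizes bottom-up clustering with pluggable single-link, complete-link and group-average strategies. The pairwise object similarity sim_obj is assumed to be supplied by the caller, e.g. the cosine measure defined below.

from itertools import combinations

def linkage(cluster_a, cluster_b, sim_obj, strategy="single"):
    # Cluster similarity derived from pairwise object similarities.
    sims = [sim_obj(a, b) for a in cluster_a for b in cluster_b]
    if strategy == "single":       # most similar pair: local coherence
        return max(sims)
    if strategy == "complete":     # least similar pair: global quality
        return min(sims)
    return sum(sims) / len(sims)   # group-average: a bit of both

def bottom_up_cluster(objects, sim_obj, strategy="single"):
    # Algorithm 1: start with singleton clusters, repeatedly merge the
    # two most similar clusters, and record the merges (the hierarchy).
    clusters = [frozenset([x]) for x in objects]
    merges = []
    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]],
                                         sim_obj, strategy))
        merged = clusters[i] | clusters[j]
        merges.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges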

Similarity Measures. As mentioned earlier, clustering requires some kind
of similarity measure that is computed between objects using the functions
described above. Different similarity measures (a good overview is given in
(Lee, 1999)) and their evaluation (Dagan et al., 1999) are available from the
statistical natural language processing community. The two most important
measures within our work, namely the cosine measure (see Definition 6.3)
and the Kullback-Leibler divergence (see Definition 6.4), are briefly introduced.
The cosine measure and the Kullback-Leibler divergence proved to be the most
important ones in the area of statistical NLP.

DEFINITION 6.3 The cosine measure or normalized correlation coefficient
between two vectors \vec{x} and \vec{y} is given by

$$\cos(\vec{x}, \vec{y}) = \frac{\sum_{x \in X, y \in Y} x y}{\sqrt{\sum_{x \in X} x^2 \sum_{y \in Y} y^2}} \qquad (6.3)$$

Using the cosine measure it is computed how well the occurrence of a specific
lexical entry correlates in \vec{x} and \vec{y}; the result is then divided by the
Euclidean lengths of the two vectors to scale for the magnitude of the individual
lengths of \vec{x} and \vec{y}.

Though the following measure is not a metric in the strict sense, it has been
quite successfully applied in statistical NLP. The Kullback-Leibler divergence
has its roots in information theory and is defined as follows:

DEFINITION 6.4 For two probability mass functions p(x), q(x) their relative
entropy is computed by

$$D(p \,\|\, q) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)} \qquad (6.4)$$
The Kullback-Leibler divergence is a measure of how different two probability
distributions (over the same event space) are. The Kullback-Leibler divergence
between p and q is the average number of bits that are wasted by encoding events
from a distribution p with a code based on a not-quite-right distribution q. The
quantity is always non-negative, and D(p||q) = 0 iff p = q. An important
aspect is that the Kullback-Leibler divergence is not defined for p(x) > 0 and
q(x) = 0. In cases where the probability distributions of objects have many zeros,
the usage of bottom-up clustering becomes nearly impossible. Thus, for using the
Kullback-Leibler divergence, top-down clustering is the more natural choice.

Example. To explain the similarity measures a small example is given in the
following. Imagine a simple concept-concept matrix as given by Table 6.1,
consisting of 5 concepts.

ID            | HOTEL | ACCOMODATION | ADDRESS | WEEKEND | TENNIS
HOTEL         |   -   |      14      |    7    |    4    |    6
ACCOMODATION  |  14   |       -      |   11    |    2    |    5
ADDRESS       |   7   |      11      |    -    |   10    |    3
WEEKEND       |   4   |       2      |   10    |    -    |    5
TENNIS        |   6   |       5      |    3    |    5    |    -

Table 6.1. Example Matrix T_{CC}

Using the cosine measure one may compute the similarity between the concepts
HOTEL and ACCOMODATION as follows. The vector of the concept
HOTEL is given by \vec{x} = (0, 14, 7, 4, 6), the vector of the concept
ACCOMODATION is given by \vec{y} = (14, 0, 11, 2, 5).

$$\cos(\vec{x}, \vec{y}) = \frac{7 \cdot 11 + 4 \cdot 2 + 6 \cdot 5}{\sqrt{101 \cdot 150}} \approx 0.93 \qquad (6.5)$$

For computing the Kullback-Leibler divergence one first has to calculate the
probability mass function for each concept from its corresponding frequencies. The
probability mass function for HOTEL is given as (0, 0.45, 0.22, 0.13, 0.19),
the probability mass function for the concept ACCOMODATION is given as
(0.44, 0, 0.34, 0.06, 0.16).
Based on these values one can compute the Kullback-Leibler divergence as
follows:

$$D(\text{HOTEL} \,\|\, \text{ACCOMODATION}) = 0.22 \cdot \frac{0.22}{0.34} + \ldots + 0.19 \cdot \frac{0.19}{0.16} \approx 0.65 \qquad (6.6)$$
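The numbers of the worked example can be reproduced in a few lines of Python. As in equation (6.5), only the vector components in which both concepts have non-zero counts enter the computation; the concept rows are taken from Table 6.1.

from math import sqrt

def cosine_shared(x, y):
    # Cosine over the components where both vectors are non-zero,
    # following the computation of equation (6.5).
    pairs = [(a, b) for a, b in zip(x, y) if a > 0 and b > 0]
    num = sum(a * b for a, b in pairs)
    den = sqrt(sum(a * a for a, _ in pairs) * sum(b * b for _, b in pairs))
    return num / den

hotel        = (0, 14, 7, 4, 6)   # HOTEL row of Table 6.1
accomodation = (14, 0, 11, 2, 5)  # ACCOMODATION row of Table 6.1
print(round(cosine_shared(hotel, accomodation), 2))  # -> 0.93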
We refer the reader to (Manning and Schuetze, 1999), where a detailed introduction
into further similarity measures between two sets X and Y is given, such as
the matching coefficient |X ∩ Y|, the Dice coefficient 2|X ∩ Y| / (|X| + |Y|),
the Jaccard or Tanimoto coefficient |X ∩ Y| / |X ∪ Y|, and the overlap coefficient
|X ∩ Y| / min(|X|, |Y|).

Hierarchical clustering including background knowledge. One problem
of hierarchical clustering approaches is that the computed clusters are not
labeled; thus, they have to be represented as conjunctions of concepts and later
labeled by the ontology engineer. In the context of this work a strategy has been
developed that builds on the result K of a hierarchical clustering computation and a
given partial taxonomy of concepts H^C. It derives as many labels as possible for
the computed clusters based on the given taxonomic background knowledge.

[Figure: on the left, the hierarchical clustering result, a tree over concepts
such as hotel, accomodation, restaurant, beer garden, city and area; on the
upper right, the existing ontological structures; on the lower right, the new
ontological structures inserted by labeling, e.g. service company :=
(restaurant, beer garden) and organization := (accomodation, service company).]

Figure 6.1. Hierarchical Clustering with Labeling

Figure 6.1 depicts an example scenario. On the left side the hierarchical
clustering result, on the upper right side the existing background knowledge,
and on the lower right side the new, manually added ontological structures
based on interpreting the clustering result are depicted. In this simple example
three nodes (ACCOMODATION, AREA, ROOT) of the clustering tree could
be labeled using the existing background knowledge. The labeling strategy
introduced above has the advantage that it narrows down the length of the presented
node names (which are typically conjunctions of lexical identifiers representing
concepts). In this small example two new concepts (SERVICE COMPANY,
ORGANIZATION) have been defined for extending the taxonomic structure of
the overall ontology.
In this subsection mechanisms for the extraction of concept taxonomies by
hierarchical clustering have been introduced. It is obvious that the different
parameter constellations (e.g. combining the strategies with different similarity
measures) allow the generation of different results and thus different proposals for
concept taxonomies by the algorithm. Therefore we have evaluated hierarchical
clustering for ontology learning. A detailed evaluation of the different
hierarchical clustering parameter constellations as described above is given in
chapter 8.
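A possible, strongly simplified realization of the labeling strategy is sketched below: a cluster node receives the name of a concept from the background taxonomy whenever its set of leaves coincides with the set of sub-concepts lexicalizing that concept. The data structures and names are illustrative assumptions, not the system's actual interfaces.

def label_clusters(cluster_leaves, taxonomy):
    # cluster_leaves: cluster node -> frozenset of leaf objects below it.
    # taxonomy: concept -> set of its (transitive) sub-concepts.
    labels = {}
    for node, leaves in cluster_leaves.items():
        for concept, subs in taxonomy.items():
            if leaves == frozenset(subs):
                labels[node] = concept   # reuse the existing concept name
    return labels

taxonomy = {"accomodation": {"hotel", "guest house"}}
cluster_leaves = {"n1": frozenset({"hotel", "guest house"}),
                  "n2": frozenset({"restaurant", "beer garden"})}
print(label_clusters(cluster_leaves, taxonomy))
# -> {'n1': 'accomodation'}; n2 remains unlabeled for the engineer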

1.2.2 Pattern-Based Dictionary Exploration

As mentioned earlier, different algorithms are offered that support the extraction
of the same ontological elements using different underlying techniques.
Thus, in our algorithm library another technique is available for extracting taxonomic
relations. This technique is based on a pattern-matching approach that
is applied to linguistically preprocessed dictionary definitions6. Pattern-based
approaches are heuristic methods using regular expressions that have been successfully
applied in the area of information extraction (see (Hobbs, 1993)). The
idea of using pattern-based approaches for the extraction of semantic structures
has been introduced by (Hearst, 1992; Morin, 1999). The underlying idea is
very simple: define a regular expression that captures re-occurring expressions
and map the results of the matching expression to a semantic structure, such as
H^C(C_1, C_2).
In the setting for ontology learning patterns work very well, due to the fact
that the output of the natural language component is regular. In general, patterns
may be used to acquire taxonomic as well as non-taxonomic relations.
However, here our attention is restricted to patterns for the acquisition of taxonomic
relations. In the approach described here the idea is to use the structured
information contained in domain-specific dictionaries as input for extracting
taxonomic relations. Thus, in the framework several heuristics are offered to
acquire taxonomic relations. An important aspect is that the descriptions of all
dictionary entries are preprocessed and normalized using our natural language
processing system introduced in chapter 5. The following heuristics give an
example of patterns that have been successfully applied to the preprocessed
dictionary definitions:

• Figure 6.2 depicts an example of the successful application of a pattern:
In the upper part of Figure 6.2 an example dictionary definition for the
so-called "Automatic Debit Transfer" is given (the example is taken from the
insurance domain). The underlying idea of this pattern is that in a definition
of a given dictionary entry (such as "Automatic Debit Transfer") typically
lexical entries referring to a superconcept are introduced (such as "Electronic
service").

Automatic Debit Transfer: Electronic service arising from a debit authorization
of the Yellow Account holder for a recipient to debit bills that
fall due directly from the account. (see also direct debit system).

Pattern
1 Dictionary term: (NP_1, NP_2, ..., NP_i, and/or NP_n)
2 for all NP_i, 1 <= i <= n: H^C("dictionary entry", NP_i)7

Result
H^C(C_2, C_1), where
C_1 = F("Electronic service") and
C_2 = F("Automatic Debit Transfer").

Figure 6.2. Example Pattern for Dictionary Descriptions

In some cases more than one lexical entry referring to a superconcept is
listed, separated by a "," or an "and/or". This pattern proved to be extremely
valuable in the aforementioned case study (Volz, 2000), where a certain
domain of the intranet of the insurance company contained dictionary
definitions.
• Another heuristic available in our framework deals with compounds, which are
very frequent in German; for example, "Arbeitslosenentschädigung" (in English
"unemployment benefits") is a compound. We have introduced that the
linguistic processor is able to decompose compounds, thus supplying the parts
of compounds. The heuristic treats the last part of a compound delivered by
the linguistic processor as a hypernym and suggests the concept retrieved
using the supplied stem as a superconcept. In this example it would mean
the relation H^C(C_1, C_2), where C_1 = F("Arbeitslosenentschädigung") and
C_2 = F("Entschädigung").
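Both heuristics can be approximated with a few regular expressions over preprocessed definition texts; a hedged Python sketch is given below. It assumes that the defining noun phrases have already been normalized to comma/"and"/"or"-separated strings and simply matches a known stem at the end of a compound; the real system operates on the output of the shallow text processing described in chapter 5.

import re

def hypernyms_from_definition(entry, definition):
    # Pattern heuristic: the leading noun phrase(s) of a dictionary
    # definition lexicalize super-concepts of the dictionary entry.
    head = re.split(r"\s+(?:arising|that|which|zum|wobei)\b", definition)[0]
    nps = re.split(r",\s*|\s+(?:and|or|und|oder)\s+", head)
    return [(entry, np.strip()) for np in nps if np.strip()]

def hypernym_from_compound(compound, known_stems):
    # Compound heuristic: the last constituent is treated as the hypernym.
    for stem in known_stems:
        if compound.lower().endswith(stem.lower()) and compound.lower() != stem.lower():
            return (compound, stem)
    return None

print(hypernyms_from_definition("Automatic Debit Transfer",
      "Electronic service arising from a debit authorization ..."))
# -> [('Automatic Debit Transfer', 'Electronic service')]
print(hypernym_from_compound("Arbeitslosenentschädigung", ["Entschädigung"]))
# -> ('Arbeitslosenentschädigung', 'Entschädigung')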
The heuristics become powerful if they are applied in a combined way: Two
examples of prototypical dictionary definitions are given on the left side of
Figure 6.3. Parts of the taxonomy H^C proposed by applying pattern extraction
on these definitions are depicted on the right side of the figure.
The implementation supporting the engineering and application of patterns
for ontology learning will be described in chapter 7. Within our approach the
heuristics introduced above are collected into so-called extraction pattern libraries.
The reader may note that specific domains require specific patterns.
However, in the case studies mentioned above it has been experienced that a
large number of the patterns could be reused (e.g. when switching from insurance to
telecommunication and finance one could reuse 90% of the insurance patterns).
[Figure: two prototypical German dictionary definitions (left): "verminderte
Arbeitsfähigkeit: Vermittlungsfähigkeit einer versicherten Person, die unter
anderem wegen Krankheit, Unfall oder Mutterschaft vorübergehend eingeschränkt
ist." and "Vermittlungsfähigkeit: Anspruchsvoraussetzung zum Bezug von
Arbeitslosenentschädigung, wobei der Versicherte bereit, in der Lage und
berechtigt sein muss, eine zumutbare Arbeit anzunehmen.", together with the
parts of the taxonomy H^C extracted from them (right).]

Figure 6.3. Concept Hierarchy Extraction from Dictionary Definitions

An empirical evaluation of applying patterns for learning taxonomies of concepts
is available in (Volz, 2000), which describes a case study in the insurance
domain. The evaluation has shown that around 30% of the extracted
taxonomic relations were correct and could be added to the ontology.

1.3 Non-Taxonomic Relation Extraction

The experiences from real-world ontology engineering have shown that major
efforts in ontology engineering must be dedicated to the definition of non-taxonomic
relations. In fact, in the concrete ontology engineering work carried
out, it appears to be the more difficult task. In general, it is less well known how
many and what type of relationships should be modeled in a particular ontology.
The approach described here for discovering non-taxonomic relations builds on
pure text, facilitating this important part of ontology engineering8. In this
subsection first a baseline algorithm for discovering non-taxonomic relations
between concepts is introduced. If a taxonomic part H^C of an ontology O is
available, the taxonomy is included as background knowledge for generalization
(see subsection 1.3.2), for supporting the user in adding so-called concept
constraints to the algorithm (see subsection 1.3.3), and for the hierarchical
presentation of extracted relations.

1.3.1 Core Learning Algorithm

The core learning algorithm is based on the idea of extracting association
rules from a given database (see (Agrawal et al., 1993)). Association rules have
been established in the area of data mining for finding interesting association
relationships among a large set of data items. Many industries have become
interested in mining association rules from their databases (e.g. for supporting
business decisions such as customer relationship management, cross-marketing
and loss-leader analysis; see (Han and Kamber, 2001) for examples). A typical
example of association rule mining is market basket analysis. This process
analyzes customer buying habits by finding associations between the different
items that customers place in their shopping baskets. The information discovered
by association rules may help to develop marketing strategies, e.g. layout
optimization in supermarkets (placing milk and bread within close proximity
may further encourage the sale of these items together within single visits to the
store). In (Agrawal et al., 1993) concrete examples of extracted associations
between items are given. The examples are based on supermarket products that
are included in a set of transactions collected from customers' purchases. One
of the classical association rules that has been extracted from these databases is
that "diapers are purchased together with beer".

The reader may now ask what these kinds of applications have in common
with the ontology learning task of extracting non-taxonomic relations between
concepts. The underlying idea is that an adopted association rule algorithm
analyzes statistical information about the linguistic output such as given in
Definition 5.10. For instance, the linguistic processing may find that the lexical
entries "hotel", "guest house", and "youth hostel" all co-occur with the lexical
entry "costs" in sentences like "Costs at the youth hostel amount to $20 per
night." several times in the texts. Thus, statistical data is derived indicating
co-occurrences of the corresponding concepts HOTEL, GUEST HOUSE, and
YOUTH HOSTEL with COSTS. In this example, the relation between HOTEL
and COSTS may be proposed to the engineer for inclusion in the ontology.

Input for Non-Taxonomic Relation Extraction. As already mentioned above,
linguistically and heuristically coupled pairs of lexical entries or concepts are used,
respectively. On the lexical level the lexical entry transaction relation is used
according to Definition 5.6. On the conceptual level the concept transaction
relation (see Definition 5.10) serves as input to the mechanism. These pairs
serve as input to the association rule algorithm that is introduced in detail in the
following.

The basic association rule algorithm is provided with a set of transactions
T := {t_i | i = 1 ... n}, where each transaction t_i consists of a set of so-called
items. If the algorithm is executed on the lexical level, the items are given as
t_i := {a_{i,j} | j = 1 ... m_i, a_{i,j} ∈ L^C} and each item a_{i,j} is from the set
of lexical entries L^C. On the other hand, executing the algorithm on the concept
level, the items are given in the form t_i := {a_{i,j} | j = 1 ... m_i, a_{i,j} ∈ C}
and each item a_{i,j} is from the set of concepts C.
The algorithm computes association rules for lexical entries X_k → Y_k
(X_k, Y_k ⊂ L^C, X_k ∩ Y_k = ∅) or for concepts X_k → Y_k (X_k, Y_k ⊂ C,
X_k ∩ Y_k = ∅) such that measures for support and confidence exceed user-defined
thresholds. Thereby, the support of a rule X_k → Y_k is the percentage of
transactions that contain X_k ∪ Y_k as a subset, and the confidence of X_k → Y_k
is defined as the percentage of transactions in which Y_k is seen when X_k appears
in a transaction:
DEFINITION 6.5 (SUPPORT OF AN ASSOCIATION RULE)

$$\mathrm{support}(X_k \rightarrow Y_k) = \frac{|\{t_i \in T \mid X_k \cup Y_k \subseteq t_i\}|}{|T|} \qquad (6.7)$$

DEFINITION 6.6 (CONFIDENCE OF AN ASSOCIATION RULE)

$$\mathrm{confidence}(X_k \rightarrow Y_k) = \frac{|\{t_i \in T \mid X_k \cup Y_k \subseteq t_i\}|}{|\{t_i \in T \mid X_k \subseteq t_i\}|} \qquad (6.8)$$
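With transactions represented as Python sets, both definitions translate directly; a minimal sketch with illustrative data:

def support(T, itemset):
    # Definition 6.5: fraction of transactions containing the item set.
    return sum(1 for t in T if itemset <= t) / len(T)

def confidence(T, X, Y):
    # Definition 6.6: fraction of X-transactions that also contain Y.
    return support(T, X | Y) / support(T, X)

T = [{"hotel", "costs"}, {"hotel", "address"}, {"youth hostel", "costs"}]
print(support(T, {"hotel"}))                # -> 0.67 (2 of 3 transactions)
print(confidence(T, {"hotel"}, {"costs"}))  # -> 0.5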

The basic association rule extraction approach may be transferred to the ontology
learning setting with only slight modifications. The major modification
that needs to be taken care of is the input transaction data fed into the learning
algorithm. A baseline approach has been chosen that considers each linguistic
pair to constitute a transaction on its own. The most appropriate choice for
compiling transactions from sets of concept pairs found by linguistic processing
may depend on the actual set of domain texts. Let's consider a particular
scenario to explain why this is the case. Let's assume a set of 100 texts, each
describing a particular hotel in detail. Each hotel has an address, but it also has an
elaborate description of the different types of public and private rooms and their
equipment, resulting in 10,000 concept pairs returned from linguistic processing.
A baseline choice for transactions considers each concept pair as a transaction.
Then support for the rule HOTEL → ADDRESS is equal to or, much more probably,
(far) less than 1%, while rules about rooms and their equipment or their style,
like ROOM → BED, might achieve ratings of several percentage points. This
means that an important relationship between HOTEL and ADDRESS might get
lost among other conceptual relationships. In contrast, if one considers complete
texts to constitute transactions, an ideal linguistic processor will lead to
more balanced support measures for HOTEL → ADDRESS and ROOM → BED
of 100% each. In other scenarios it might be useful to constitute transactions
from the results of processing paragraphs of texts, e.g. when hotel descriptions
are listed within text documents, or to combine results from several texts, e.g.
when each hotel description is spread over several web pages.
Another important aspect is allowing input transactions such as
{PERSON, PERSON}, because for the ontology learning task one may derive
associations such as PERSON → PERSON that may result in a non-taxonomic
relation like COOPERATEWITH(PERSON, PERSON). To realize the possibility of
extracting this kind of non-taxonomic relation, the concept PERSON is split
into two "artificial concepts" (like PERSON-1 and PERSON-2). The concepts
are later merged back to PERSON, and thus not presented to the user.
The general underlying idea of computing association rules may be separated
into the following two steps:

1 Compute all item sets that fulfill a user-defined minimum support s_min.
These item sets are called frequent item sets.

2 Derive from these frequent item sets all association rules that fulfill a user-defined
minimum confidence c_min.

The basic algorithm for extracting frequent item sets from the preprocessed
data is given in Algorithm 3. As mentioned earlier, either lexical entries or
concepts are considered as input to the algorithm. Thus, in general the term
"item" is used as a generalization of lexical entries and concepts within the
algorithm.

Algorithm 3 Basic Algorithm for Relation Extraction

Require: s_min
Ensure: A set of n-frequent item sets I_n
I := ∅;
n := 1;
Compute for all H ∈ H_1 the support;
I_1 := {H ∈ H_1 | support(H) ≥ s_min};
if I_1 ≠ ∅ then
  I := I ∪ I_1;
  while I_n ≠ ∅ do
    H_{n+1} := {{i_1, i_2, ..., i_{n+1}} | ∀j: 1 ≤ j ≤ n+1: ({i_1, i_2, ..., i_{n+1}} \ {i_j}) ∈ I_n};
    n := n + 1;
    Compute for all H ∈ H_n the support;
    I_n := {H ∈ H_n | support(H) ≥ s_min};
    I := I ∪ I_n;
  end while
end if

The algorithm is based on the idea that for an item set X' ⊆ X it holds that
support(X') ≥ support(X) (see the step computing H_{n+1}). Thus, it iteratively
computes frequent item sets I_1, I_2, ....
Finally, for generating association rules from the computed frequent item
sets one typically uses the following characteristic property of frequent item
sets: Let X be a frequent item set; then it holds for all X' ⊆ X

$$\mathrm{confidence}((X - X') \rightarrow X') = \frac{\mathrm{support}(X)}{\mathrm{support}(X - X')} \qquad (6.9)$$
Thus, the underlying idea is that the confidence may be computed from the
support of the frequent item sets. An algorithm for deriving association rules
from frequent item sets based on the characteristic property introduced above
is given in (Agrawal et al., 1993).
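Algorithm 3 corresponds to the classical level-wise (Apriori-style) frequent item set search. A compact Python rendering of it, as a sketch with its own local support helper:

from itertools import combinations

def frequent_itemsets(T, s_min):
    # n-candidates are kept only if all their (n-1)-subsets are frequent,
    # exploiting support(X') >= support(X) for X' a subset of X.
    def supp(itemset):
        return sum(1 for t in T if itemset <= t) / len(T)
    level = {frozenset([i]) for t in T for i in t}
    level = {c for c in level if supp(c) >= s_min}
    frequent, n = set(level), 1
    while level:
        candidates = {a | b for a in level for b in level if len(a | b) == n + 1}
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, n))}
        level = {c for c in candidates if supp(c) >= s_min}
        frequent |= level
        n += 1
    return frequent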

1.3.2 Relation Extraction including Background Knowledge

According to the ontology learning paradigm of "using what exists" as background
knowledge, the baseline algorithm introduced above has been extended
to extract non-taxonomic relations using background knowledge in the form
of a concept taxonomy H^C. The usage of product hierarchies (or item hierarchies)
for deriving association rules has already been researched. The problem
is that basic association rule algorithms restrict the items that appear in a rule
to the leaf level of a taxonomy. There are at least two reasons why rules
across different levels of a taxonomy are useful:

• Rules at lower levels may not have minimum support. Thus, if items are
organized in a taxonomy, interesting rules that would otherwise not be
found, due to the fine leaf-level granularity, can be detected.

• Taxonomies can be used to prune uninteresting or redundant rules. This is
a simple approach towards the detection of interesting rules.

The ontology learning mechanism with inclusion of background knowledge
that is proposed here is based on the algorithm for discovering generalized association
rules proposed by Srikant and Agrawal (Srikant and Agrawal, 1995). It
uses the background knowledge from the taxonomy in order to propose relations
at the appropriate level of abstraction. An important aspect is that relation extraction
with background knowledge only works on the basis of a set of concepts
and not for lexical entries.
The advantage of using background knowledge for deriving generalized associations
between concepts may be explained in a short example: For instance,
the linguistic processing may find that the lexical entries "hotel", "guest house", and
"youth hostel" all co-occur with the lexical entry "costs" in sentences like
"Costs at the youth hostel amount to $20 per night" several times in the texts. Thus,
statistical data is derived indicating co-occurrences of the concepts HOTEL,
GUESTHOUSE, and YOUTHHOSTEL with COSTS. Additionally, the taxonomic
relations H^C(HOTEL, ACCOMODATION), H^C(GUESTHOUSE,
ACCOMODATION), H^C(YOUTHHOSTEL, ACCOMODATION) hold between
these concepts. Thus, the generalized association rule algorithm determines
confidence and support measures for the relationships between these three pairs,
as well as for relationships at higher levels of abstraction, such as between
ACCOMODATION and COSTS. In the final step, the algorithm determines
the level of abstraction most suited to describe the conceptual relationships
by pruning less adequate ones. In this small example, the relation between
ACCOMODATION and COSTS is proposed to the engineer for inclusion in
the ontology. The usage of background knowledge supports the generation of
more general non-taxonomic relations that are inherited by the sub-concepts,
e.g. the relation HASCOST holds between ACCOMODATION and COSTS, but
also between the sub-concepts of ACCOMODATION as given above.

Algorithm 4 Non-Taxonomic Relation Extraction Algorithm with Background
Knowledge

I_1 := {frequent 1-item sets};
n := 2;
while I_{n-1} ≠ ∅ do
  Compute H_n (new candidates of size n generated from I_{n-1}) using Algorithm 3.
  for all transactions t ∈ T do
    Add all ancestors of each item in t to t (by querying H^C).
    Remove any duplicates in t.
    Increment the count of all candidates in H_n that are contained in t.
  end for
  I_n := all candidates in H_n with minimum support.
  n := n + 1;
end while
Answer: ∪_n I_n;

The algorithm for including background knowledge in the generation of
non-taxonomic relations is given in pseudo code in Algorithm 4. The idea
behind the algorithm is that one first extends each transaction t_i to also include
each ancestor of a particular concept a_{i,j}, i.e. t'_i := t_i ∪ {a_l | (a_{i,j}, a_l) ∈ H^C}.
Based on this "pre-compiled" data one computes confidence and support for all
possible association rules X_k → Y_k where Y_k does not contain an ancestor of
X_k, as this would be a trivially valid association. Finally, one prunes all those
association rules X_k → Y_k that are subsumed by an "ancestral" rule X'_k → Y'_k,
the item sets X'_k, Y'_k of which only contain ancestors or identical items of their
corresponding item set in X_k → Y_k. This criterion of "interest" has been used by
(Srikant and Agrawal, 1995) to define a simple interest measure for association
rules using the taxonomic background knowledge. A constant R may be defined
such that only rules whose support is more than R times the expected value or
whose confidence is more than R times the expected value are kept. An
interesting rule is then defined as follows: a rule is interesting if it has no
ancestors or if it is R-interesting with respect to its close ancestors among its
interesting ancestors.
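The core of Algorithm 4, extending each transaction by the taxonomic ancestors of its items before counting, amounts to a set union per transaction. A sketch with an illustrative ancestor table:

def ancestors(concept, h_c):
    # h_c maps a concept to its direct super-concepts (the taxonomy H^C).
    out, stack = set(), list(h_c.get(concept, ()))
    while stack:
        c = stack.pop()
        if c not in out:
            out.add(c)
            stack.extend(h_c.get(c, ()))
    return out

def extend_transactions(T, h_c):
    # t'_i := t_i plus all ancestors of its items; the set removes duplicates.
    return [t | {a for i in t for a in ancestors(i, h_c)} for t in T]

h_c = {"hotel": ["accomodation"], "guest house": ["accomodation"]}
T = [{"hotel", "costs"}, {"guest house", "costs"}]
print(extend_transactions(T, h_c))
# both transactions now also contain 'accomodation', so the generalized
# rule ACCOMODATION -> COSTS reaches support 1.0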

Example. For the purpose of illustration, an example is provided to the reader.
The example is based on actual experiments. A text corpus given by a WWW
provider for tourist information has been processed9. The corpus describes
actual objects referring to locations, accommodations, furnishings of accommodations,
administrative information, or cultural events, such as given in the
following example sentences.

L_1                 a_{i,1}        L_2        a_{i,2}
"Mecklenburgs"      AREA           "hotel"    HOTEL
"hairdresser"       HAIRDRESSER    "hotel"    HOTEL
"balconies"         BALCONY        "access"   ACCESS
"room"              ROOM           "TV"       TELEVISION

Table 6.2. Examples for Linguistically Related Pairs of Concepts

(7) a. "Mecklenburg's" sch6nstes "Hotel" liegt in Rostock. ("Mecklenburg's"


most beautiful "hotel" is located in Rostock.)
b. Ein besonderer Service fiir unsere Gaste ist der "Fris6rsalon" in un-
serem "Hotel". (A "hairdresser" in our "hotel" is a special service for
our guests.)
c. Das Hotel Mercure hat "Balkone" mit direktem "Strandzugang". (The
hotel Mercure offers "balconies" with direct "access" to the beach.)
d. AIle "Zimmer" sind mit ''TV'', Telefon, Modem und Minibar ausges-
tattet. (All "rooms" have "TV", telephone, modem and minibar.)

Processing the example sentences (7a) and (7b), the dependency relations
between the lexical entries are extracted (among others). In sentences (7c)
and (7d) the heuristic for prepositional phrase attachment and the sentence
heuristic relate pairs of lexical entries, respectively. Thus, four concept pairs,
among many others, are derived with knowledge from the lexicon.

Figure 6.4. An Example Concept Taxonomy as Background Knowledge for Non-Taxonomic Relation Extraction

The algorithm for learning generalized association rules uses the concept
hierarchy, an excerpt of which is depicted in Figure 6.4, and the concept pairs
from above (among many other concept pairs). In our actual experiments, it
discovered a large number of interesting and important non-taxonomic conceptual
relations. A few of them are listed in Table 6.3. Note that in this table we also
list two conceptual pairs, viz. (AREA, HOTEL) and (ROOM, TELEVISION),
that are not presented to the user, but that are pruned. The reason is that there
are ancestral association rules, viz. (AREA, ACCOMODATION) and (ROOM,
FURNISHING), respectively, with higher confidence and support measures.

Discovered relation                 Confidence   Support
(AREA, ACCOMODATION)                   0.38        0.04
(AREA, HOTEL) [pruned]                 0.1         0.03
(ROOM, FURNISHING)                     0.39        0.03
(ROOM, TELEVISION) [pruned]            0.29        0.02
(ACCOMODATION, ADDRESS)                0.34        0.05
(RESTAURANT, ACCOMODATION)             0.33        0.02

Table 6.3. Examples of Discovered Non-Taxonomic Relations

The mechanisms for the extraction of non-taxonomic relations using background
knowledge have been evaluated. A detailed description of the evaluation
results is given in chapter 8.

1.3.3 Concept Constraints C*

The application of association rules in practice has led to the experience that
often only a subset of the extracted association rules is of interest. In (Srikant
et al., 1997) the example is given that one may want rules that contain a specific
item, or rules that contain children of a specific item in a hierarchy. Using such
constraints has not only the advantage of focusing the result, it also drastically
reduces the execution time of the algorithm. The underlying idea is that only
relevant transactions (that means transactions that fulfill the given constraints)
are provided to the algorithm.
In this work means are offered to the ontology engineer for the focused generation
of non-taxonomic relations by defining so-called concept constraints.
The engineer may select a set of concepts and define whether their sub- and/or
super-concepts are to be added to the set of relevant concepts. The algorithm
operates, under consideration of the overall transaction set, on the restricted set
of focused transactions given through the concept constraints C*. The consideration
of the overall transaction set is important and is reflected in an updated
definition of the support. Let C* ⊆ C be the set of selected concepts. Then
T' ⊆ T is the reduced input transaction set. It holds that ∀t ∈ T' : t ⊆ C ∧ (t ∩ C* ≠ ∅).
One may still use the same association rule extraction algorithm operating on
T'; however, it has to consider the number of transactions contained in T. This
is reflected by a modified support definition as given in (Srikant et al., 1997).
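Filtering the transaction set by the concept constraints is straightforward; note that the modified support still divides by the size of the full transaction set T. A minimal sketch:

def constrained_transactions(T, c_star):
    # Keep only transactions that touch at least one focused concept.
    return [t for t in T if t & c_star]

def constrained_support(T, T_prime, itemset):
    # Counting runs over T', but the denominator remains |T|, in the
    # spirit of the modified support of (Srikant et al., 1997).
    return sum(1 for t in T_prime if itemset <= t) / len(T)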

1.3.4 Hierarchical Ordering of Extracted Relations

The use of background knowledge in the form of a taxonomy of concepts may
generate a large number of redundant and unnecessary relations. Consider the
following example: Assume that both the relation (HOTEL, ROOM), motivated by
the input data, and the generalized relation (ACCOMODATION, ROOM) are generated
and proposed by the algorithm. The user has to decide how these relation
proposals are added to the ontology. One solution could be adding the relations
HASROOM(ACCOMODATION, ROOM) and HASHOTELROOM(HOTEL, ROOM).
Additionally, the ontology engineer may define the relation between HASROOM
and HASHOTELROOM using an axiom, e.g. one may say the relation HASHOTELROOM
is a sub-relation of the relation HASROOM. To support this decision process an order
on the relations has to be defined. The ordering given through the taxonomy
H^C then allows for a hierarchical presentation of the extracted relations.
DEFINITION 6.7 (PREDECESSOR RELATION) Let (C_1, C_2) be an
extracted relation. Then (C'_1, C'_2) is a predecessor relation of (C_1, C_2) iff
C'_1 ∈ Clos(C_1, H^C) ∧ C'_2 ∈ Clos(C_2, H^C).10

[Figure: the relation extraction result (left), the relations (Accomodation, Room),
(Hotel, Room), (Accomodation, DoubleRoom) and (Hotel, DoubleRoom) in their
hierarchical order, and the new ontological structures (right):
hasRoom(Accomodation, Room), hasHotelRoom(Hotel, Room), and an axiom
subRelation(hasHotelRoom, hasRoom), given in F-Logic as:
FORALL x,y hasRoom(x,y) <- hasHotelRoom(x,y).]

Figure 6.5. Hierarchical Order on Extracted Non-Taxonomic Relations

In the example above the relation (HOTEL, DOUBLEROOM) is a predecessor
relation of (ACCOMODATION, ROOM), because HOTEL ∈ Clos(ACCOMODATION, H^C)
and DOUBLEROOM ∈ Clos(ROOM, H^C).

DEFINITION 6.8 (DIRECT PREDECESSOR RELATION) (C'_1, C'_2) is a
direct predecessor relation of (C_1, C_2) iff there exists no other relation (C''_1, C''_2)
that is a predecessor relation of (C_1, C_2) and of which (C'_1, C'_2) is a
predecessor relation.
Considering the example above, the relation (HOTEL, ROOM) is a direct predecessor
relation of (ACCOMODATION, ROOM). The notion of predecessor
and direct predecessor relations as given in Definitions 6.7 and 6.8 is used to
derive a hierarchical presentation of the extracted relations. To derive the hierarchical
presentation, for each relation the possible predecessor relations according
to the taxonomy are computed and assigned as "superrelations" to the relations.
Due to the transitivity of the generalization relation it is sufficient to compute
only the direct predecessor relations.
Figure 6.5 depicts an example of a hierarchical order on the extracted relations
(ACCOMODATION, ROOM), (HOTEL, ROOM), (ACCOMODATION,
DOUBLEROOM), (HOTEL, DOUBLEROOM) and the ontological elements
added by the ontology engineer.
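Definitions 6.7 and 6.8 translate directly into code once Clos is available. In the sketch below, Clos is taken, for the purpose of this ordering, as the downward (sub-concept) closure including the concept itself, and the concept names are those of Figure 6.5; all data structures are illustrative.

def is_predecessor(rel_a, rel_b, clos):
    # Definition 6.7: (C1', C2') is a predecessor relation of (C1, C2)
    # iff C1' is in Clos(C1) and C2' is in Clos(C2).
    return rel_a[0] in clos(rel_b[0]) and rel_a[1] in clos(rel_b[1])

def direct_predecessors(rel, extracted, clos):
    # Definition 6.8: predecessors without an intermediate predecessor.
    preds = [r for r in extracted if r != rel and is_predecessor(r, rel, clos)]
    return [p for p in preds
            if not any(q != p and is_predecessor(p, q, clos) for q in preds)]

subs = {"accomodation": {"accomodation", "hotel"},
        "room": {"room", "double room"},
        "hotel": {"hotel"}, "double room": {"double room"}}
rels = [("accomodation", "room"), ("hotel", "room"),
        ("accomodation", "double room"), ("hotel", "double room")]
print(direct_predecessors(("accomodation", "room"), rels, subs.get))
# -> [('hotel', 'room'), ('accomodation', 'double room')]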
2. Algorithms for Ontology Maintenance

In the previous sections the algorithms used for extracting ontologies from
given data have been presented. Referring to chapter 4, where the cycle of ontology
learning has been introduced, in this section strategies for data-oriented ontology
maintenance are introduced, supporting the phases of adequately adapting
and tailoring an ontology to a specific application and extending it by analyzing
application data. Two essential mechanisms are presented that are considered
necessary for ontology maintenance: The first one is a mechanism for ontology
pruning, the process of removing elements from the ontology that are not relevant
to the application domain. The second approach describes a mechanism
for ontology extension and refinement.

2.1 Ontology Pruning

Ontology pruning becomes necessary if large ontologies are imported into
the system (e.g. as described in chapter 5). As these ontologies may be only
partially relevant for the target application, a large number of concepts are not
required and may even make the performance of the application worse. In general,
ontologies have to be focused on the target application to become widely applied
and accepted. The approach of ontology pruning (see (Kietz et al., 2000a)) uses
data-oriented techniques that are based on lexical entry and concept frequencies,
according to the techniques presented in subsection 1.1 dealing with lexical
entry extraction. In general, one may consider ontology pruning as "inverse"
lexical entry extraction. The idea behind this approach is that the lexical
entries contained in a domain-specific corpus are analyzed (their occurrence
frequency is computed). Lexical entries that refer to concepts in the learned
ontology, but do not occur in the domain-specific corpus, signal that these
concepts should be eliminated from the learned ontology. In the following
two strategies for pruning are presented, namely (i) baseline pruning and (ii)
relative pruning. The first strategy mainly relies on the assumption that if a
concept frequently occurs in a corpus, it and its super-concepts (frequencies
are propagated to super-concepts) remain in the ontology. The second strategy
is more sophisticated: in order to determine domain relevance, the relative
frequency of given ontological entities is considered with respect to frequencies
obtained from a generic corpus.

2.1.1 Baseline Pruning

Baseline pruning adopts the idea of term extraction ("frequent terms tend to
lexicalize concepts") in an inverse sense: lexicalizations of concepts or relations
that are not frequent or do not even exist indicate that the concepts or
relations may be eliminated from the ontology. Thus, baseline pruning receives
a given ontology and a set of domain-relevant texts. The texts are processed
and for each concept a domain frequency is computed (via the mapping from
concrete lexicalizations of concepts contained in texts to concepts). Additionally,
frequencies are propagated to super-concepts. Thus, if a concept occurs
frequently in a domain-specific corpus, it and its super-concepts remain in the
ontology (see Algorithm 5 for a detailed description).

2.1.2 Relative Pruning

As in the lexical entry extraction task, lexical entries that are frequent in
a given corpus are considered as constituents of a given domain. But, in
contrast to the lexical entry extraction purpose, the mere frequency of
ontological entities is not sufficient. In order to determine domain relevance one
must consider the relative frequency of given ontological entities with respect
to frequencies obtained from a generic corpus. Thus, frequencies are also
determined from a second corpus that contains generic documents, as found in
reference corpora like CELEX11 or the freely available archive of the well-known
German newspaper TAZ12, which is used as a generic corpus in the experiments.

The Relative Pruning Algorithm. The underlying idea of the relative pruning
algorithm is that domain relevance is considered as the relative frequency
of given ontological entities with respect to frequencies obtained from a general
corpus. The pruning algorithm is given in pseudo code in Algorithm 5. It takes
as input an ontology O, a domain-specific corpus D, a general corpus G, a ratio
r ∈ ℝ+ and the selected frequency computation technique m.
The general idea of relative pruning is that first for all lexical entries frequency
values are computed using a measure selected by the user (e.g. lef or tfidf).
Then the frequency values obtained in the two corpora are compared. All
existing concepts and relations that are more frequent14 in the domain-specific
corpus remain in the ontology. The user can also specify whether or not concepts
that are contained neither in the domain-specific nor in the generic corpus
should be pruned from the ontology.
When pruning, the mapping information in the lexicon F has also to be
updated. In general we offer two kinds of interaction. First, the user may decide
what happens with the lexical entries. Second, we offer an automatic strategy:
In order to minimize the loss of references in F, references are migrated to the
closest super-concept of C ∈ C that remains in the ontology. If multiple super-concepts
in distinct paths remain, the stem reference is deleted, because there
is no possibility for automatically selecting the correct super-concept. If, for
example, "chair" is pruned from the ontology, its lexical reference might be
updated to "furniture", the closest super-concept of "chair" that remains in the
ontology. Using this strategy we obtain an underspecified semantics for lexical
entries (similar to the approach described in (Buitelaar, 1998)).
Algorithm 5 Relative Pruning Algorithm

Require: Ontology O, domain-specific corpus D, general corpus G, ratio r,
  selected measure m
for all C ∈ C do
  C.domainFreq := C.generalFreq := 0;
end for
Compute frequency values f_D := freq(L^C, D, m);
Compute frequency values f_G := freq(L^C, G, m);
for all L ∈ L^C do
  for all (L, C) ∈ F do
    if L ∈ D then
      C.domainFreq := C.domainFreq + f_D(L)
      for all S ∈ UClos(C, H^C)13 do
        S.domainFreq := S.domainFreq + f_D(L)
      end for
    end if
    if L ∈ G then
      C.generalFreq := C.generalFreq + f_G(L)
      for all S ∈ UClos(C, H^C) do
        S.generalFreq := S.generalFreq + f_G(L)
      end for
    end if
  end for
end for
C' := {C | C ∈ C and C.domainFreq ≥ r · C.generalFreq};
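Once the (propagated) frequency values are available, the pruning decision of Algorithm 5 reduces to a per-concept comparison. A condensed sketch with illustrative frequency maps; the propagation along UClos and the handling of concepts absent from both corpora are left to the user, as in the text:

def prune(concepts, domain_freq, general_freq, r):
    # Keep a concept if its domain frequency is at least r times its
    # frequency in the generic reference corpus.
    return {c for c in concepts
            if domain_freq.get(c, 0) >= r * general_freq.get(c, 0)}

domain_freq  = {"hotel": 120, "chair": 2, "accomodation": 180}
general_freq = {"hotel": 15, "chair": 40, "accomodation": 20}
print(prune({"hotel", "chair", "accomodation"}, domain_freq, general_freq, 2))
# -> {'hotel', 'accomodation'}; 'chair' is pruned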

2.2 Ontology Refinement

In this subsection the approach for incremental extension of an ontology with
new concepts is introduced. As mentioned earlier, the algorithms for extraction
may also be used for ontology refinement and extension. Refining plays a
similar role as extracting; the difference between them lies on a sliding scale
rather than in a clear-cut distinction. While extracting serves mostly for cooperative
modeling of the overall ontology (or at least of very significant chunks of it),
the refinement phase is about fine-tuning the target ontology and supporting
its evolving nature.
This subsection concentrates on extracting the meaning of unknown lexical
entries and on discovering indicators that a lexical entry should be defined as a
reference to a new concept. The approach is based on the assumption that
unknown lexical entries share a similar conceptual behaviour, with respect to
already known lexical entries, with the concepts they are mapped onto.
Example. A small example may illustrate the idea. Looking at the unknown
lexical entry "weekend excursion", one may say that it shares the same
behaviour as a concept like EXCURSION. Sharing conceptual behaviour may be
computed using measures of distributional similarity based on concept vectors
as introduced in subsection 1.2.1. According to Definition 5.8 introduced in
chapter 5, the data for this learning task is represented in a concept/lexical entry-concept
matrix as given in Table 6.4.

ID                     ACCOMODATION   HOTEL   EVENT
EXCURSION                    5          4       2
"weekend excursion"          4          4       1
CAR                          1

Table 6.4. Example matrix Γ_{C/E}

The counts for the dependency between a lexical entry or concept and a concept
entered in the relation are computed using the different linguistic and heuristic
indicators introduced in chapter 5.

2.2.1 The Refinement Algorithm

The underlying ideas of the refinement algorithm are that (i) important unknown
lexical entries are recognized, (ii) similar concepts are retrieved and proposed
to the user, and (iii) the user decides about the meaning of the unknown
lexical entries by an appropriate assignment. The incremental refinement algorithm
is given in pseudo code in Algorithm 6. It takes as input a concept/lexical
entry-concept matrix, an ontology, a selected similarity measure (see subsection
1.2) and two thresholds.
In the first step a so-called coupling frequency of the unknown lexical entries
L'_i ∈ U (U is the set of lexical entries whose mapping to concepts
is unknown) in Γ_{C/E} is computed. This is done by simply summing up the
overall number of co-occurrences with concepts. In our example given in Table
6.4 the coupling frequency cf of the unknown word "weekend excursion" is
cf("weekend excursion") = 9. If the coupling frequency exceeds the threshold t,
the similarity between the unknown lexical entry and all known lexical entries
corresponding to concepts C is computed. The k most similar concepts are
returned to the ontology engineer, who determines the meaning of the unknown
lexical entry either by assigning it to an existing concept or by defining
a new concept, and maps the unknown lexical entry to the concept.
A further extension of the algorithm is that, if the taxonomic knowledge
contained in H^C is available, it may be used for improving the determination of
the overall coupling frequency of unknown lexical entries and existing concepts.
Thus, one may compile the transitive closure into Γ_{C/E}.
Algorithm 6 Refinement Algorithm

Require: Matrix Γ_{C/E}, ontology O, similarity measure sim, frequency threshold
  t, concept return threshold k
begin
Compute the coupling frequency cf (the summed number of couplings to concepts)
of the unknown lexical entries L'_i ∈ U contained in Γ_{C/E}
for all L'_i ∈ U do
  if cf(L'_i) ≥ t then
    for all C ∈ C do
      Compute sim(C, L'_i).
    end for
    Return the k most similar concepts to the ontology engineer.
    Determine the meaning of the unknown lexical entry by assigning it to an
    existing concept C via F or by defining a new concept C' and an associated
    assignment in F.
  end if
end for
end

For the example this would mean that if a lexical entry or a concept is coupled
with e.g. HOTEL, it is automatically coupled with the concept ACCOMODATION,
on account of the taxonomic relation H^C(HOTEL, ACCOMODATION).
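The main loop of Algorithm 6 can be sketched in a few lines of Python; the similarity measure here is a cosine over the matrix rows of Table 6.4, and all names are illustrative:

from math import sqrt

def dict_cosine(u, v):
    keys = set(u) | set(v)
    num = sum(u.get(x, 0) * v.get(x, 0) for x in keys)
    den = sqrt(sum(a * a for a in u.values()) * sum(b * b for b in v.values()))
    return num / den

def propose_concepts(unknown, matrix, sim, t, k):
    # Check the coupling frequency against threshold t, then return the
    # k most similar known concepts as candidates for the engineer.
    if sum(matrix[unknown].values()) < t:   # coupling frequency cf
        return []
    known = [c for c in matrix if c != unknown]
    return sorted(known, key=lambda c: sim(matrix[c], matrix[unknown]),
                  reverse=True)[:k]

matrix = {"EXCURSION": {"accomodation": 5, "hotel": 4, "event": 2},
          "CAR": {"accomodation": 1},
          "weekend excursion": {"accomodation": 4, "hotel": 4, "event": 1}}
print(propose_concepts("weekend excursion", matrix, dict_cosine, t=5, k=2))
# -> ['EXCURSION', 'CAR'], with EXCURSION ranked first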

3. Conclusion

This chapter has presented a number of algorithms that have been adopted,
designed, and implemented for the ontology learning algorithm library. The
algorithms have been separated into algorithms that support ontology extraction
and algorithms that support ontology maintenance.
The algorithms build on the shallow linguistic processing and the relational
transformations introduced in chapter 5. An important aspect of all algorithms is
that they are based on the idea of an "incrementally growing ontology structure".
Thus, on the one hand the algorithms support developing an ontology from
scratch (e.g. by lexical entry extraction, hierarchical ordering of lexical entries
and the extraction of associations between lexical entries); on the other hand
the algorithms profit from existing conceptual background knowledge, e.g. in
the form of the lexical entry to concept mapping or a concept hierarchy.
The following is a list of open issues that have only been partially explored in
this work but are of highest relevance for future work. The attention is restricted
to three important points, namely multi-strategy learning, the relation separation
problem and the learning and maintenance of axioms A^O.
3.1 Multi-Strategy Learning

Ontology learning is a mechanism for semi-automatically supporting the
ontology engineer in engineering ontologies. It has been seen that a large amount
of different data may be available for ontology learning. Additionally, each
of these different kinds of data may serve as input for a number of different
algorithms. E.g., for extracting taxonomic knowledge in the form of concept
hierarchies H^C two different techniques have been proposed, the first one based
on free text, the second one based on semi-structured data in the form of dictionaries.
The extraction of non-taxonomic relations has been supported by an
adopted association rule-based algorithm. However, a large number of other
algorithms may be used for supporting this task (e.g., formal concept analysis
as introduced in chapter 5 also supports the generation of taxonomic as well as
non-taxonomic relations).
In order to be able to combine the results from different extraction algorithms
it is necessary to standardize the output in a common way. Therefore a
common result structure for all learning methods is provided (see Figure 7.14).
If several learning algorithms obtain equal results, these results are combined
and presented to the user only once. This implements a simple multi-strategy
learning approach. Thereby the complex task of ontology engineering is better
addressed, as it is possible to combine the results from different learning algorithms.
The best approach arguably takes into account many different kinds of
information, which speaks for the suitability of a multi-strategy learning approach.
Multi-strategy learning has been successfully applied in supervised learning
tasks (Michalski and Kaufmann, 1998). E.g., in (Morik and Brockhausen, 1996)
a multi-strategy architecture for learning rules in a restricted first-order logic
from very large data sets has been presented. Their motivation and underlying
idea is that one starts with a simple algorithm to detect basic structures that serve
as input to more complex algorithms and narrow down the search space. In the
future, more complex models, e.g. for weighting different ontology learning
algorithms, and more advanced combinations of them depending on the given
input data may be researched.

3.2 Taxonomic vs. Non-Taxonomic Relations

As seen in this chapter, there exist different algorithms for the extraction
of relations from different kinds of data. The problem is that often the algorithms
do not assign further semantics to the extracted relations, namely, in the
simplest case, the separation between taxonomic and non-taxonomic relations.
The separation between taxonomic and non-taxonomic relations is not a trivial
task. First steps towards an algorithm for the "intelligent" separation, based
on the idea that taxonomic relations inherit relations to their subconcepts,
have been described in (Boch, 2001). However, this work is still at an early
stage and further research in this area is required. A specifically interesting
aspect is the question how much background knowledge (e.g., in the form of a
core ontology) is required to reach good values when applying the separation
algorithm.

3.3 A Note on Learning Axioms - A^O

The notion of an ontology O has been introduced in Definition 2.1. A first
step to derive a relation hierarchy has been explained by the approach of hierarchically
ordering extracted non-taxonomic relations. However, further support
for the extraction of axioms A^O is not available in the algorithm library. In the
current system (see chapter 7), the user may model axioms manually on top of
the extracted structures using the semantic pattern mechanisms proposed in the
layered ontology engineering approach.
The definition and maintenance of axioms is an important building block of
ontology engineering. Thus, in the future mechanisms for semi-automatically
extracting axioms are required. The problem that has been identified in this
work is that algorithms for extracting axioms require well-structured relational
data, in the best case in the form of a knowledge base. This kind of relational data
is currently rare, but will grow with the success of the Semantic Web. Based on
this kind of Semantic Web data one may pursue approaches such as described
in (Morik et al., 1993b), where the idea of axiom schemata is used to exploit
knowledge bases for instantiating axiom schemata.
Notes
1 The dictionary is available online at http://dictionary.cambridge.org.
2 A comprehensive overview on multi-term recognition and extraction is given
in the EAGLES-96 report at
http://www.ilc.pi.cnr.it/EAGLES96/rep2/node38.html.
3 Available for German at
http://www.loria.fr/~bonhomme/sw/stopword.de
4 Hierarchical clustering has on average quadratic time and space complexity.
5 A comprehensive survey on applying clustering in NLP is also available in
the EAGLES report, see
http://www.ilc.pi.cnr.it/EAGLES96/rep2/node37.htm
6 Parts of this work have been published in (Kietz et al., 2000b; Maedche and
Staab, 2000b).
7 This means that all nominal phrases NP_i contained in a definition of a dictionary
entry that are followed by a comma or by an and/or may represent
lexical entries that refer to superclasses of the given dictionary entry,
respectively.
8 The work described in this subsection has been presented in (Maedche and
Staab, 2000c; Maedche and Staab, 2000a).
9 A detailed description of the text corpus and the overall application and
evaluation study is provided in section 4.
10 The function Clos(C, H^C) retrieves the set of concepts that are super- or sub-concepts
of a given C, including the transitive closure based on the taxonomy
H^C.
11 http://www.kun.nl/celex/
12 http://www.taz.de/
13 The function UClos retrieves the set of concepts that are superconcepts of
C, including transitivity, based on H^C.
14 The factor r can be provided by the user.
Chapter 7

THE TEXT-TO-ONTO ENVIRONMENT

A knowledge acquisition tool only becomes a knowledge acquisition tool once one has actually acquired knowledge with it at least once.
- (Ulrich Reimer, 2000)

Following the point of view that "the proof is in the pudding", the methodological and theoretical research results of this book have been implemented in the comprehensive TEXT-TO-ONTO ontology learning environment. TEXT-TO-ONTO is the "running result" of the research described in this book¹. TEXT-TO-ONTO is roughly separated into two main parts, reflecting the historical development of the system: In part I of this book the fundamentals of ontologies, ontology engineering and ontology-based applications were introduced. According to these fundamentals an environment for manually engineering and managing ontologies, the ONTOEDIT ontology engineering environment, has been designed.
In part II an architecture, components and concrete mechanisms for supporting ontology learning for the Semantic Web have been provided. Inspired by the idea of balanced cooperative modeling, ONTOEDIT has been extended with data import and processing techniques and with algorithms that support the extraction and maintenance of ontological structures. It soon became obvious that for using ontology learning as a "plug-in" in an ontology engineering environment like ONTOEDIT, the mechanisms and user interfaces for data import and processing and for algorithm execution are extremely important in a real-world scenario.
TEXT-TO-ONTO builds on the capabilities of a comprehensive ontology management and engineering environment and provides means for semi-automatic ontology extraction and maintenance building on different kinds of input data. This chapter describes TEXT-TO-ONTO, the ontology learning environment. The chapter is separated into three sections: First, the component-based approach (see (Brown, 2000) for an overview of component technologies) that forms the basis for the implementation of the environment is introduced in section 1. Three different types of components are distinguished, namely data structure components, processing components, and interface and user interaction components.

Second, as mentioned above, TEXT-TO-ONTO is based on an environment for manually engineering ontologies, the ONTOEDIT ontology engineering environment. In section 2 the functionality of ONTOEDIT is described. ONTOEDIT follows the layered ontology engineering approach introduced in chapter 3. Thus, it is described how lexical entries £ may be collected using the environment. Subsequently it is shown how concepts C are modeled based on these lexical entries and organized in the taxonomy H^C. An important aspect of ontology engineering is the support for the definition of non-taxonomic relations between concepts, and of axioms that are manually engineered using graphical user interfaces. Axioms are considered an important building block of ontology engineering. Although no means are offered for automatically extracting and proposing axioms to the ontology engineer, mechanisms for defining axioms based on the extracted structures using graphical means are introduced. Additionally, it is shown how instances I and relations between them may be defined using ONTOEDIT on the basis of a given ontology. It is also explained how ONTOEDIT connects to SiLRI, the F-Logic based inference engine. Examples are given of how the ontology and an associated knowledge base may be queried and how substitutions may be calculated according to the F-Logic semantics.

The main underlying idea of the overall approach is that each modeling step may be done by the user or by an ontology learning algorithm. Thus, section 3 explains the functionality of the TEXT-TO-ONTO ontology learning environment building on ONTOEDIT. The section starts by explaining the main features of the data import and processing component that may be accessed via the management component of TEXT-TO-ONTO. Subsequently, some examples are shown of how the algorithms for ontology extraction and maintenance contained in the library may be accessed by the user. An important aspect is the adequate presentation of the extraction and maintenance suggestions of an algorithm. The different views available and implemented in TEXT-TO-ONTO for result presentation are explained.
Finally, before the chapter is concluded, a list of important issues of future work for improving the ontology engineering and learning environment is provided.

1. Component-based Architecture
The main components of the TEXT-TO-ONTO environment and the interactions between them are depicted in Figure 7.1. The reader may note the relation to the ontology learning framework and its accompanying architecture introduced in chapter 4. As mentioned above, the implementation of the architecture internally distinguishes between the data structures, the processing modules and the graphical user interfaces. The reader may note that all components are connected with the ontology, knowledge base and lexicon model that play an important role in the overall environment. These three models are implementations of the formal definitions introduced in chapter 2.

Figure 7.1. TEXT-TO-ONTO Components

Three different types of components are distinguished:


• Data Structure Component: The lexicon, ontology and knowledge base model is an implementation of the ontology and knowledge base structures O and KB introduced in Definition 2.1.
• Processing Components: Processing components in TEXT-TO-ONTO are a natural language processing environment (in the concrete implementation SMES, see chapter 5, subsection 2.2), an inference engine, several import and export modules supporting different ontology representation languages, a library of algorithms, and data import and processing components.
• User Interface & Interaction Components: Interface and user interaction components are the ONTOEDIT GUI and the TEXT-TO-ONTO Management GUI, including, e.g., a document browser, several algorithm parameterization interfaces and result browsers.
The data structure follows Definition 2.1 and implements the layered engineering and representation approach. The processing components are implementations of the techniques and algorithms that have been introduced in chapters 5 and 6.
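To make this separation concrete, the following minimal Python sketch (purely illustrative; the actual system is not implemented this way, and all class names are hypothetical) indicates how the three component types share one central ontology, knowledge base and lexicon model:

class OntologyModel:
    """Data structure component: ontology O, knowledge base KB and lexicon."""
    def __init__(self):
        self.concepts, self.relations, self.lexicon = set(), {}, {}

class ProcessingComponent:
    """Base for NLP, inference, import/export and learning-algorithm modules."""
    def __init__(self, model: OntologyModel):
        self.model = model      # every processing module works on the shared model
    def run(self):
        raise NotImplementedError

class UserInterfaceComponent:
    """GUI components (e.g. the OntoEdit GUI, the Text-To-Onto management GUI)."""
    def __init__(self, model: OntologyModel):
        self.model = model      # all views render the same shared model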

Figure 7.2. TEXT-To-ONTO Ontology Learning Environment

As mentioned above, this chapter mainly focuses on the user interface and interaction components supporting learning and engineering ontologies for the Semantic Web. The overall user interface of TEXT-TO-ONTO is depicted in Figure 7.2. The reader may note that there is no strict separation between manually engineering ontologies using ONTOEDIT and using the ontology learning techniques contained in TEXT-TO-ONTO. Again, this fact reflects the paradigm of balanced cooperative modeling.

2. The Ontology Engineering Environment ONTOEDIT

ONTOEDIT is the implementation of the ontology engineering approach that has been introduced in chapter 3. A screenshot of the running environment has already been depicted in Figure 4.3. In the following, it is shown how ONTOEDIT supports the development of ontologies according to the layered ontology engineering paradigm as introduced earlier.

The Lexicons £^O, £^KB. Ontology engineering typically starts with the collection of lexical entries that refer to concepts, relations, and instances. Figure 7.3 depicts the view that supports the definition of lexical representations for the core primitives C, R, I contained in the ontology O and the knowledge base KB (following Definitions 2.1 and 2.3). The reader may note that an m:n mapping between the lexical entries and the core primitives has to be supported.

Figure 7.3. ONTOEDIT's View for Lexical Layer Definition

Various kinds of meta-information in the form of built-in and freely definable "tags" may be attached to a specific lexical entry. Built-in tags comprise basic information such as the language of a lexical entry and a comment. Freely definable tags may be generated for specific applications. In the concrete applications that used ONTOEDIT, the following tags have been defined:

• Tags referring to natural language processing, like the part-of-speech (POS) information depicted in Figure 7.3.

• Tags referring to graphical user interfaces, like the specific definition of


lexical entries that are shown in the graphical user interface of the respective
ontology-based application.

Figure 7.3 depicts parts of the bilingual (German/English) lexical representation for a tourism ontology developed in the GETESS project (Staab et al., 1999; Klettke et al., 2001). The reader may note that the lexical entries defined are already morphologically reduced to their stems with respect to the interaction with the natural language processing environment SMES.

Figure 7.4. View for Modeling Concepts and Taxonomic Relations

Concepts C, Concept Hierarchy H^C and Relations R. On top of the lexical entries £, the set of concepts and relations is defined. Figure 7.4 depicts the view for modeling concepts and their arrangement in the concept taxonomy H^C. Concepts may be defined using multiple inheritance; a specific view for inspecting multiply inherited concepts is available by pushing the "More..." button. Pushing the button opens a new window that shows all superconcepts and lexical entries of the selected concept. As depicted in Figure 7.4 the concept EMPLOYEE is edited². Multilingual documentation as well as other meta-information (e.g. whether the concept is concrete or abstract) may be attached to this specific concept. The example given in Figure 7.4 has been taken from the Semantic Web Research Community (SWRC) Ontology³.
The set of non-taxonomic relations may be defined using different views. Figure 7.5 depicts the two views available in ONTOEDIT for defining relations. On the right side of Figure 7.5 relations may be defined as first-order objects with a unique identifier, without defining a domain and range of the relation. Typically the unique identifiers of relations are not visualized; instead the lexical entries for a specific relation are used.
The view depicted in the middle of Figure 7.5 lists relations assigned to a specific domain concept. In this view, the ontology engineer may define new relations with domain and range restrictions or add domain and range restrictions to existing relations.

Figure 7.5. Views for Modeling Non-Taxonomic Relations

In Figure 7.5 a relation excerpt from the SWRC ontology that has been developed for the Semantic Web Research Community is depicted. The concept ACADEMICSTAFF has been selected in the concept hierarchy view. Relations that are marked grey are inherited from super-concepts of the concept ACADEMICSTAFF, e.g. the relation NAME has been inherited from more generic concepts (e.g. the concept PERSON). A number of relations are attached in particular to the concept ACADEMICSTAFF, e.g. the relation WORKSATPROJECT with the range PROJECT. In contrast to the conceptual definition of an ontology O, ONTOEDIT provides several pre-defined datatypes such as STRING and INTEGER that reference the datatypes contained in the XML Schema standard⁴. The ontology engineer may attach meta-information to a concrete relation, e.g. minimum and maximum cardinalities, multi-lingual documentation and further lexicalizations.

Engineering Axioms Using ONTOEDIT. An important aspect of ontology engineering is the definition of axioms to ensure semantic constraints and to generate completed views on data. As mentioned earlier, there are no ontology learning algorithms for supporting the extraction of axioms from given Web data. However, the manual engineering of axioms using our semantic pattern approach is considered here. ONTOEDIT offers graphical means for modeling axioms and supports the ontology engineer with adequate user interfaces for the definition of axioms at a conceptual level. Two small examples are given for instantiating two simple patterns, namely inverse relations and disjoint concepts. The interested reader is referred to the comprehensive description of the approach given in (Staab et al., 2001a).

Figure 7.6. View for Modeling Inverse Relations

Inverse Relations. A simple example for inverse relations has been given in subsection 3.9 as follows: Consider there exists a relation WORKS_AT holding between the concept PERSON and the concept PROJECT, and a relation HAS_PARTICIPANT between PROJECT and PERSON. Explicitly modeling that these two relations are inverses of each other has several advantages: (i) it ensures consistency in the knowledge base (e.g. it is impossible to have a project with a participant that doesn't work at that project) and (ii) it frees the user from providing redundant information. Figure 7.6 depicts the general view for modeling "relation axioms" and some concrete inverse relation axioms for the tourism domain. On the left side all relations with their corresponding domain/range restrictions are listed and provided to the user. On the right side a table for defining that two relations are inverse is provided to the user. The user may add a relation from the left pane by pushing the "+" button. Inverse relations may be defined globally, e.g. that IN_GEBIET and BIETET_UNTERKUNFT are inverse in general, or locally restricted, e.g. that IN_GEBIET(UNTERKUNFT, GEBIET) and BIETET_UNTERKUNFT(GEBIET, UNTERKUNFT) are inverses of each other. Global and local inverse relations are distinguished by the selection of specific domain restrictions, respectively.
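As an illustration of advantage (ii), the following Python sketch (hypothetical names; not the actual ONTOEDIT code) shows how, under a global inverse-relation axiom, asserting one direction of a relation instantiation makes the other direction derivable, so that the user need not enter it redundantly:

# one asserted fact and one global inverse declaration (illustrative data)
relation_instances = {("works_at", ("anna", "project_x"))}
inverse_axioms = {"works_at": "has_participant"}

def complete_inverses(instances, axioms):
    completed = set(instances)
    for rel, (a, b) in instances:
        if rel in axioms:
            completed.add((axioms[rel], (b, a)))   # derive the inverse fact
    return completed

print(complete_inverses(relation_instances, inverse_axioms))
# {('works_at', ('anna', 'project_x')),
#  ('has_participant', ('project_x', 'anna'))}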

Disjoint Concepts. Figure 7.7 depicts the general view for modeling so-called "concept axioms". Disjointness axioms are specific concept axioms that enforce consistency and quality for the ontology and the knowledge base. Using the view for modeling disjoint concepts the ontology engineer may select concepts from the left pane and explicitly model their disjointness. If more than two concepts are selected, the cartesian product of disjointness between the selected concepts is computed and the particular disjoint concepts are represented in the right pane of Figure 7.7.

Figure 7.7. View for Modeling Disjoint Concepts

The ontology engineer may check the consistency of the actual model concerning disjoint concepts by clicking on the check button.
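The following Python sketch illustrates, with concept names taken from the figure, how the pairwise disjointness axioms may be generated from a selection of concepts and checked against concept instantiations; it is a simplified stand-in for the actual check, not the implemented code:

from itertools import combinations

selected = ["FISCH", "SAEUGETIER", "VOGEL"]
disjoint_pairs = set(combinations(selected, 2))   # pairwise disjointness axioms

# an illustrative, deliberately inconsistent knowledge base
instantiations = {"willy": {"FISCH", "SAEUGETIER"}}

def check_consistency(inst, pairs):
    violations = []
    for instance, concepts in inst.items():
        for c1, c2 in pairs:
            if c1 in concepts and c2 in concepts:
                violations.append((instance, c1, c2))
    return violations

print(check_consistency(instantiations, disjoint_pairs))
# [('willy', 'FISCH', 'SAEUGETIER')]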

Defining a Knowledge Base: Instances, Concept & Relation Instantiations.

In Definition 2.3 the knowledge base structure KB := (O, I, inst, instr) was introduced, consisting of instances I, concept instantiations (given by the function inst) and relation instantiations (given by the function instr).
Figure 7.8 depicts the corresponding view supported by ONTOEDIT for defining a knowledge base on top of a modeled ontology. On the left side of Figure 7.8 the concept hierarchy view is depicted, on the right side corresponding instances with their concept instantiations are listed.
Using the instance view, the ontology engineer typically selects a concept in the concept hierarchy that should be instantiated. In this specific case, the concept PERSON has been selected for instantiation. Selecting a specific concept in the concept hierarchy triggers the window on the right upper side of Figure 7.8: All defined instances of the concept PERSON and its sub-concepts, e.g. AUFSICHTSRATCHEF, are listed. The user may now modify the instances, e.g. by adding relations between the listed instances and other defined instances. In the right lower side of Figure 7.8 the dialogue for modifying and adding relations between instances is depicted. The dialogue uses the information contained in the ontology, e.g. an instance of the concept PERSON may only instantiate relations that are defined for the concept PERSON in the ontology. In this small example, the relation HAT_NAME has been selected and instantiated,

Figure 7.8. ONTOEDIT's Knowledge Base View

referring to a literal. Another example is the relation instantiation using the relation GEHOERT_ORGANISATION_AN. This relation instantiation connects the two instances ama:person and University_of_Karlsruhe.

Access to the F-Logic-based Inference Engine. ONTOEDIT accesses two inference engines: First, the F-Logic inference engine SiLRI; second, the description logics engine FaCT (Horrocks, 1998). Here, the focus is on the F-Logic-based access to the defined ontology and knowledge base. A short introduction is given of how the F-Logic engine may be used within the overall framework.
As mentioned earlier, a mapping from the ontology O and knowledge base structure KB to F-Logic has been defined. Additionally, the definition of rules or axioms in F-Logic that are not captured in a semantic pattern is supported. Figure 7.9 depicts the F-Logic rule editor for the definition and accessing of F-Logic rules. On the left side the F-Logic translations of instantiated semantic patterns (e.g. symmetric) and freely defined rules (like the WELLNESS-TOURIST rule) that can be switched on and off are depicted.
An instantiated ontology and knowledge base structure may also be queried by the ontology engineer. Figure 7.10 depicts the view for querying the SiLRI F-Logic inference engine that may be directly accessed from ONTOEDIT. Opening the query view generates one instance of the inference engine with the corresponding ontology and, if available, the knowledge base. In the example given on the right side of Figure 7.10 the following F-Logic query is evaluated by the inference engine:

Figure 7.9. F-Logic Axiom Engineering


Figure 7.10. View for Querying the SILRI F-Logic Inference Engine

(8) FORALL z, k, e <- z : STUDENT[NAME ->> k] and z[PHONE ->> e].

Using this query it is asked for all instances of the concept STUDENT together with their names and phone numbers. The results are given as substitutions of the variables z, k, e. The left side of Figure 7.10 depicts the explanation of the evaluation results. It describes the rules that have been activated for generating the appropriate substitutions, e.g. the transitivity of subconcept relationships (see rule "2.0"):

(9) FORALL X, Y, Z sub_(X, Z) <- sub_(X, Y) and sub_(Y, Z).
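The effect of rule "2.0" may be illustrated by the following Python sketch, which computes the transitive closure of a (hypothetical) subconcept relation by fixpoint iteration; this is the closure the inference engine exploits when deriving that, e.g., an instance of a subconcept of STUDENT is also a PERSON:

sub = {("PHD_STUDENT", "STUDENT"), ("STUDENT", "PERSON")}

def transitive_closure(pairs):
    closure = set(pairs)
    changed = True
    while changed:                      # iterate until a fixpoint is reached
        changed = False
        for x, y in list(closure):
            for y2, z in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure

print(transitive_closure(sub))
# additionally contains ('PHD_STUDENT', 'PERSON')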



3. Components for Ontology Learning

In the last section, parts of the comprehensive management and graphical user interface for manually engineering ontologies, viz. the ontology engineering environment ONTOEDIT, were explained. In this section the focus is set on the components that extend ONTOEDIT towards the ontology learning environment TEXT-TO-ONTO.

Data Import & Processing Environment. As introduced earlier, the ontology engineer starts with data import & processing. Figure 7.11 depicts the document selection and processing environment. By clicking on the buttons depicted in the left upper part of Figure 7.11 the user may index a number of documents from the Web ("+Web") or from the local file environment ("+File"). If she selects the Web button, the ontology-based crawling algorithm is executed and a learning corpus is automatically collected from the Web.

Figure 7.11. View for Data Selection and Processing

The dialog shown in the upper part of Figure 7.11 allows the user to perform different operations on the indexed documents. First, if necessary, HTML elements may be eliminated and substitution rules may be applied. Second, the natural language processing component for normalizing the documents may be accessed from this dialogue. An important aspect is that the natural language processing component accesses the actual ontology and its lexicon to relate the documents to the available background knowledge. The linguistically pre-processed documents serve as input for the transformation module. An example view for generating a document-concept relation has been given in Figure 5.12. Another graphical component that is available is the document wrapper for importing semi-structured documents, in particular domain-specific dictionaries.

Algorithm Library. The algorithm library takes properly preprocessed data as input. Several examples of how algorithms may be executed, and of how information relevant for algorithm application may be defined, are given in the following.

Pattern Engineering. The regular-expression-based mechanism for extracting hierarchical and non-hierarchical relations from text has been introduced in subsection 1.2.2. Using the pattern debugger, regular expressions may be iteratively developed and refined by applying them to example texts. Existing patterns may be adapted to a specific type of text using the pattern debugger. Figure 7.12 depicts the pattern engineering view that allows the development and debugging of regular expression patterns for ontology learning. Supporting the user in engineering patterns is an important aspect for their concrete application. On the left side of Figure 7.12 a specific pattern stored in the repository of the environment has been selected and is presented to the user. Patterns have a name and a short description, belong to a pattern category, and naturally include a concrete regular expression.

Figure 7.12. Graphical Interface for Pattern Engineering

On the right side of Figure 7.12 the view for pattern engineering and debugging is depicted. A specific selected pattern may be applied to a test document, and matching elements are presented to the user. Using this view, the user may adjust a specific pattern to the available domain texts or to a specific dictionary. A detailed description of the pattern engineering approach is available in (Volz, 2000).

Lexical Entry Extraction. Lexical entry extraction is a baseline mechanism for extending the lexicon and for proposing new concepts. In the current implementation the user may select between the two extraction measures lef and tfidf and define a threshold that the computed measure for a lexical entry must exceed.
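A minimal sketch of the tfidf variant of this mechanism is given below (Python; the toy corpus and threshold are illustrative, and the actual implementation may differ in details such as normalization):

import math
from collections import Counter

corpus = [["hotel", "room", "lake"], ["hotel", "bar"], ["lake", "trail"]]
threshold = 0.3

def tfidf_scores(docs):
    df = Counter(t for doc in docs for t in set(doc))   # document frequencies
    scores = {}
    for doc in docs:
        tf = Counter(doc)
        for term, f in tf.items():
            w = (f / len(doc)) * math.log(len(docs) / df[term])
            scores[term] = max(scores.get(term, 0.0), w)
    return scores

# lexical entries whose score exceeds the threshold are proposed to the user
proposals = [t for t, s in tfidf_scores(corpus).items() if s >= threshold]
print(proposals)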

Hierarchical Clustering. Hierarchical clustering has been introduced as a mechanism for deriving concept hierarchies or for extending existing concept hierarchies. A difficulty of hierarchical clustering is that it may be used with different parameter constellations. Graphical means are provided for selecting the different possible parameters (e.g. the similarity functions and the similarity measures) and for selecting existing background knowledge that may be used by the naming mechanisms as explained in Figure 6.1.
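The following Python sketch indicates one possible parameter constellation, namely bottom-up clustering with cosine similarity and single linkage; the term vectors are illustrative stand-ins for the context features actually extracted from the corpus:

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: sum(x * x for x in w) ** 0.5
    return dot / ((norm(u) * norm(v)) or 1.0)

vectors = {"hotel": [1, 1, 0], "inn": [1, 0.8, 0], "lake": [0, 0, 1]}
clusters = [[t] for t in vectors]

while len(clusters) > 1:
    # merge the pair of clusters with the highest single-link similarity
    (i, j), sim = max(
        (((i, j), max(cosine(vectors[a], vectors[b])
                      for a in clusters[i] for b in clusters[j]))
         for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda x: x[1])
    print("merge", clusters[i], clusters[j], round(sim, 2))
    clusters[i] += clusters[j]
    del clusters[j]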

Figure 7.13. Non-Taxonomic Relation Extraction Algorithm View

Non-taxonomic Relation Extraction. In chapter 6 the different mechanisms for the extraction of non-taxonomic relations have been introduced. The technique is applicable on the lexical and the concept level. If a taxonomy of concepts H^C is available, learning uses this taxonomy as background knowledge. Figure 7.13 depicts the graphical interface for starting the mechanism for non-taxonomic relation extraction. The dialogue on the left side of Figure 7.13 offers different configurations that may be defined by the user: First, the algorithm may be constrained to specific concepts as described in chapter 6, subsection 1.3.3. Second, the user may define different support & confidence values. In chapter 8 it will be explained what a modification of these values means for the extraction results. Experience has shown that the user typically starts with high support & confidence values to explore general relations (that appear often) and then subsequently decreases the values to explore more detailed relations.
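The support & confidence scheme may be illustrated as follows (Python sketch; the "transactions" stand for sets of concepts co-occurring in a sentence, and all data and thresholds are illustrative rather than taken from the actual system):

from itertools import combinations

transactions = [{"HOTEL", "AREA"}, {"HOTEL", "AREA"}, {"HOTEL", "CITY"}]
min_support, min_confidence = 0.5, 0.7

n = len(transactions)
def support(itemset):
    return sum(itemset <= t for t in transactions) / n

proposals = []
for a, b in combinations({c for t in transactions for c in t}, 2):
    s = support({a, b})
    for x, y in ((a, b), (b, a)):
        conf = s / support({x}) if support({x}) else 0.0
        if s >= min_support and conf >= min_confidence:
            proposals.append((x, y, round(s, 2), round(conf, 2)))
print(proposals)   # e.g. proposes a relation between AREA and HOTEL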

Ontology Pruning. Ontology pruning has been introduced as "inverse" lexical entry extraction. It is distinguished between the baseline pruning and the relative pruning algorithm. In the first case the user selects a set of documents, the frequency measure (lef or tfidf), and a threshold. The algorithm then proposes concepts that may be eliminated from the ontology. In the second case the user selects a domain-specific and a generic corpus, a frequency measure, and a threshold. Again, as a result of applying this technique a set of concepts that may be pruned is proposed.
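The following Python sketch illustrates the relative pruning idea under simplifying assumptions (plain relative term frequencies instead of lef/tfidf; toy corpora): concepts that are not more frequent in the domain corpus than in the generic corpus are proposed for pruning:

from collections import Counter

domain = ["hotel", "hotel", "sauna", "thing"]
generic = ["thing", "thing", "hotel", "car"]
threshold = 1.0

def rel_freq(corpus):
    c = Counter(corpus)
    return {t: f / len(corpus) for t, f in c.items()}

d, g = rel_freq(domain), rel_freq(generic)
prune = [t for t in d if d[t] <= threshold * g.get(t, 0.0)]
print(prune)   # e.g. ['thing'] is proposed for pruning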

Ontology Refinement. Ontology refinement focuses on the extraction of important lexical entries according to their conceptual behaviour and on the definition of a mapping of these unknown lexical entries to existing concepts. The user starts the refinement algorithm by selecting a relevant corpus, a similarity measure, and a threshold (k). Then, iteratively, an unknown lexical entry with its k most similar concepts is retrieved and proposed to the user.

Result Presentation. It has been introduced that different algorithms may generate the same or similar results. Depending on the parameters, the algorithms may generate a large number of results. Hence, a suitable mechanism for presenting the extracted results is required. Different mechanisms for result presentation have been implemented in the ontology learning framework. The attention is restricted here to the result presentation component (see Figure 7.14) and the graph-based visualization (see Figure 7.15).

Result Presentation Component. The result presentation and matching component may be separated into two different views: The first view presents single entities that have been extracted (e.g. entries in £^C). The second view depicts binary relations between entities (e.g. taxonomic and non-taxonomic relations between concepts) that have been extracted. Figure 7.14 depicts these two views. On the left side of Figure 7.14 the single entity view with extracted lexical entries for concepts is depicted. On the right side of Figure 7.14 the view depicting binary relations between entities is shown.

Figure 7.14. Result Presentation View

Graph-based Visualization. The graph-based visualization mechanism is based on the spring-embedding algorithm described in (Kamada and Kawai, 1989). Figure 7.15 depicts a specific visualization of a concept network that has been generated by applying the algorithm for extracting non-taxonomic relations. The graph-based visualization has the advantage that the user may not only explore binary relations, but also "chains" of relations between concepts.

Figure 7.15. Graph-based Visualization



4. Conclusion
In this chapter parts of the implemented environment TEXT-TO-ONTO, the ontology learning environment that is the running result of the research described in this book, have been presented. Following the quote from the start of this chapter, TEXT-TO-ONTO has indeed been successfully used for "real-world knowledge acquisition" from existing Web data.
However, as usual in developing software environments, there is still a lot of further work required for improving the ontology engineering and learning environment. In the following, important possibilities for extending the environment are sketched: First, as mentioned earlier, the TEXT-TO-ONTO environment is still at an early stage with respect to providing methodological guidelines for applying ontology learning in support of ontology engineering. Thus, in the future the integration of a comprehensive methodology with support for the application of semi-automatic means is required. This holds especially for the difficult tasks of data import and processing, where experiences have to be collected and provided to the ontology engineer.
Second, at the current moment the TEXT-TO-ONTO ontology learning environment is restricted to textual data in the German language. It has already been mentioned that in general the natural language processing component may be replaced or complemented by another NLP component, e.g. for English. However, using different engines for processing natural language requires clearly defined representation layers for abstracting from language-specific issues. Using TEXT-TO-ONTO in a multi-language setting would require the development of a generic technique for layering NLP results generated by different language-specific engines.
Furthermore, a flexible plug-in architecture should be developed. Specific applications typically demand extensions or adaptions of the existing functionality of the ontology learning and engineering environment. A plug-in architecture is currently being developed to support the location and management of plug-ins that provide specific functionality. The architecture with its plug-in and service mechanisms is described in (Handschuh et al., 2001). The components are dynamically pluggable into the core Ont-O-Mat component environment⁵. The plug-in mechanism notifies each installed component when a new component is registered. Through the service mechanism each component can discover and utilize the services offered by another component (Handschuh, 2001). A service represented by a component is typically a reference to an interface. This provides, among other things, a decoupling of the service from the implementation and allows alternative implementations.
Finally, a server as an infrastructural kernel for Semantic Web development and applications is required. The ontology engineering and learning environment contains a lot of functionality that several other ontology-based applications also require. Strictly speaking, the engineering environment developed within this book is only a client that accesses a server acting as an infrastructural kernel for Semantic Web applications: The system will be realized as a component-based, extensible plug-in architecture tightly connected to several core components: an ontology and fact repository for persistent storage based on RDF as a basis for the other services, inference engines for offering reasoning services, versioning, application programmer interfaces (APIs) for ontology engineering, maintenance, migration and integration, and Semantic Web applications⁶. The challenge of this task is the provisioning of capable but open interfaces that allow the welding together of components that already exist or are just about to be provided by the research community.

Notes
1 Parts of this chapter have been published in (Maedche and Volz, 2001).
2 Concepts are defined by a unique identifier that is typically not visualized. Instead, optionally, the possibility to visualize the lexical entries for ontological elements is offered.
3 The ontology may be accessed at the namespace http://ontobroker.semanticweb.org/swrc-30-10-00.rdfs
4 See http://www.w3.org/TR/xmlschema-2/
5 http://ontobroker.semanticweb.org/annotation/ontomat/
6 Further information is available at http://kaon.semanticweb.org
Chapter 8

EVALUATION

The evaluation of ontology learning for the Semantic Web is a challenging task. No standard evaluation measures, like precision and recall in the information retrieval community or accuracy in the machine learning community, are available for judging the quality of ontology learning techniques. The unsupervised nature of ontology learning techniques makes evaluation even more difficult than in typical supervised tasks such as classification.
A comprehensive framework for ontology learning as presented in this book requires facilities for evaluating, comparing, characterizing and elaborating different ontology learning techniques. This fact is also reflected by an important recent development in the area of natural language processing (NLP), namely the use of more rigorous standards for the evaluation of NLP systems and frameworks. It is generally agreed that the ultimate demonstration of success is showing improved performance at an application task. Nevertheless, while developing systems, it is often convenient to assess components of the system on some artificial performance score. Characterizing the effects that different ontology learning techniques have on the results of learning provides methodological guidelines that help the ontology engineer select the most suitable method for a given corpus or task, and provides support for creating new ones.
Little work has been done in the area of evaluating machine learning for knowledge acquisition. In (Barker et al., 1998) the authors describe a systematic performance evaluation of the TANKA text analysis system for knowledge acquisition. Their results confirm their basic assumptions, namely, that the system learns to perform better, that knowledge acquisition is possible even from erroneous or fragmentary parses, and that the process is not too onerous for the user. An interesting approach describing an experimental evaluation of integrating machine learning with knowledge acquisition (Webb et al., 1999) is based on structured data. The approach has shown that knowledge acquisition with machine learning outperforms manual knowledge acquisition techniques.
Several experiences concerning the specific knowledge acquisition scenario of ontology engineering and learning and the application of ontologies have been collected within the projects and case studies carried out. The experiences led to two different ontology evaluation approaches: (i) application-specific evaluation and (ii) cross-comparing two ontologies using a gold standard. In the first approach ontologies are evaluated through the application in which they are employed, e.g. for information retrieval or information extraction using their standard measures¹. The second approach follows an application-neutral paradigm: To learn about the effects that different ontology learning methods lead to on a given set of data, a hand-modeled ontology is compared with ontologies that have been generated by a specific algorithm (and its specific parametrization). Here, the second approach is pursued, following an application-neutral paradigm for evaluating the ontology learning algorithms based on hand-modeled gold standards.
This chapter is organized as follows². Section 1 introduces the general evaluation approach, which refers to the foundations of chapter 3 and its resulting definition of an ontology structure O. The approach is mainly based on comparing a given ontology, or parts of it, with an extracted ontology, or parts of it, respectively. Comparison is done at two different levels, namely the lexical level and the conceptual or semantic level. Section 2 introduces a number of measures for comparing two ontologies, or parts of them, at the lexical and the conceptual level³. The measures are first applied in section 3, where a case study for evaluating human performance using the evaluation approach and measures is presented. The last section evaluates the ontology learning techniques that have been introduced previously and provides methodological guidelines for applying them in real-world scenarios. Additionally, the results that may be generated by different ontology learning algorithms are compared with human performance, leading to the result that the combination of ontology learning with human modeling capabilities delivers clearly better results than a pure human modeling approach.

1. The Evaluation Approach

Based on the definitions of chapter 2, the evaluation approach pursues a layered approach. As mentioned above, an implicit evaluation approach is used, and the similarity between a hand-modeled ontology and an ontology that has been generated by applying a particular ontology learning technique is measured. It is assumed that a high similarity between the hand-modeled ontology and the ontology learning-based acquired ontology indicates a successful application of a particular ontology learning technique. This approach is based on the idea of having a so-called gold standard (cf. (Grefenstette, 1994)) as a reference. In the evaluation approach a given human-expert-modeled ontology is considered as a gold standard that may be approximated by an ontology learning-based acquired ontology.
Figure 8.1 depicts the overall setting for comparing two given ontologies O_1 and O_2. According to the ontology engineering framework (and Figure 3.1) different kinds of measures are proposed for comparing two ontologies. In particular, it is distinguished between the following:

1 Measures for computing an exact match between two ontologies using adapted precision and recall measures (see section 2.1).
2 Measures for computing overlap or similarity at the pure lexical level (based on disjoint lexicons of the two ontologies) (see section 2.2).
3 Measures for computing similarity at the conceptual level considering overlapping lexicons, and computing similarity at the conceptual level with identical lexicons (see section 2.3).

Figure 8.1. Levels for Evaluating Ontology Learning

Comparing two ontologies (according to the definition) is not well researched. Thus, it is not possible to rely on standard measures. Additionally, the comparison of two ontologies is complex on account of the different layers considered in an ontology (lexical vs. conceptual level). For the lexical level one may only refer to similarity of form, i.e. string similarity; for the conceptual level a richer set of relations that may be exploited is available, viz. the concept hierarchy and the non-taxonomic relations. Ontology axioms A^O are not included in the evaluation approach.

2. Ontology Comparison Measures

As mentioned above, several measures for evaluating an ontology using the evaluation approach are proposed in this book. The three subsequent subsections are separated according to the three items listed above, namely precision- and recall-based measures, measures considering the lexical level, and measures considering the conceptual level.

2.1 Precision and Recall

The first measures that are considered are precision and recall as typically used in information retrieval. The general setting is that for many problems a set of targets is given (in the ontology learning task, e.g. H^C) contained within a larger collection (e.g. all possible combinations of a given set of elements). The algorithm or the system then decides on a specific selected set of elements (in the case of the information retrieval task, it selects a set of documents (= retrieval) that are relevant for a given query).

Figure 8.2. Introduction of Precision and Recall

Figure 8.2 depicts this general setting. The overall set of positive and negative documents is separated into so-called true positives tp, true negatives tn, false positives fp and false negatives fn⁴. The retrieval naturally consists of the true and false positives, which are related to the set of false negative documents. Based on these counts of the number of items given in the classical definition, a definition of precision and recall for ontology learning is adopted.
DEFINITION 8.1 (PRECISION) Precision is a measure of the proportion of selected items that the system got right:

$$\mathrm{precision} = \frac{tp}{tp + fp}$$

Adopted to the ontology learning task, precision is given as follows:

$$\mathrm{precision}_{OL} = \frac{|Comp \cap Ref|}{|Comp|}$$

Ref is the set of elements that are given in the reference ontology. Comp is the set of elements that are contained in the comparison ontology. Elements in this sense are primitives according to the ontology structure O, e.g. the concept lexical entries £^C, concepts C, the concept hierarchy, or non-taxonomic relations between concepts.

DEFINITION 8.2 (RECALL) Recall is a measure of the proportion of the target items that the system selected:

$$\mathrm{recall} = \frac{tp}{tp + fn}$$

Recall adopted to the ontology learning task is given as follows:

$$\mathrm{recall}_{OL} = \frac{|Comp \cap Ref|}{|Ref|}$$

The measures precision_OL and recall_OL are inverses of each other, i.e. precision_OL(Ref, Comp) = recall_OL(Comp, Ref) and recall_OL(Ref, Comp) = precision_OL(Comp, Ref). Hence, in the following only precision_OL is referred to, but "both directions" will be evaluated. This means, when cross-evaluating two ontologies it will be evaluated how precisely one subject agrees with another subject and vice versa. Since recall_OL is the inverse of precision_OL, this will also yield all agreement recall numbers.
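The set-based formulation may be illustrated directly (Python sketch with illustrative element sets):

ref = {"hotel", "accomodation", "area"}    # reference (gold standard) elements
comp = {"hotel", "area", "city"}           # elements of the compared ontology

def precision_ol(comp, ref):
    return len(comp & ref) / len(comp)

def recall_ol(comp, ref):
    return len(comp & ref) / len(ref)

print(precision_ol(comp, ref))                           # 2/3
print(recall_ol(comp, ref))                              # 2/3
print(precision_ol(ref, comp) == recall_ol(comp, ref))   # True: inverses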

A note on precision and recall in ontology learning. Running the experiments it has been recognized that precision and recall give good hints about how to gauge the thresholds for the algorithms. Nevertheless, these measures lack a sense for the sliding scale of adequacy prevalent in the hierarchical target structures. To evaluate the quality of elements proposed to the ontology engineer, a bonus should be given to elements that almost fit hand-coded elements. Finally, the general idea is to compare different learning schemes on this basis. For this reason, a number of new evaluation measures that reflect the distance between the automatically discovered elements and the set of hand-coded elements are presented in the following.

2.2 Lexical Comparison Level Measures

On the lexical comparison level the comparison of two ontologies is restricted to comparing their lexicons without looking at the conceptual structures defined on the level above. The measures proposed on the lexical comparison level are based on the edit distance formulated by Levenshtein (Levenshtein, 1966). The edit distance is a well-established method for weighing the difference between two strings. It measures the minimum number of token insertions, deletions, and substitutions required to transform one string into another using a dynamic programming algorithm⁵. For example, the edit distance, ed, between the two lexical entries "TopHotel" and "Top_Hotel" equals 1, ed("TopHotel", "Top_Hotel") = 1, because one insertion operation changes the string "TopHotel" into "Top_Hotel". Based on Levenshtein's edit distance a lexical similarity measure for strings is proposed, the String Matching (SM), which compares two lexical entries L_i, L_j:

DEFINITION 8.3 (STRING MATCHING (SM))

$$SM(L_i, L_j) := \max\left(0, \frac{\min(|L_i|, |L_j|) - ed(L_i, L_j)}{\min(|L_i|, |L_j|)}\right) \in [0, 1].$$

SM returns a degree of similarity between 0 and 1, where 1 stands for a perfect match and 0 for a bad match. It considers the number of changes that must be made to change one string into the other and weighs the number of these changes against the length of the shorter of the two strings. In the example above, one computes SM("TopHotel", "Top_Hotel") = 7/8.
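A sketch of SM on top of a standard dynamic-programming edit distance is given below (Python; illustrative, but it reproduces the value computed above):

def ed(a, b):
    # edit distance with insertions, deletions and substitutions
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

def sm(li, lj):
    m = min(len(li), len(lj))
    return max(0.0, (m - ed(li, lj)) / m)

print(sm("TopHotel", "Top_Hotel"))   # 0.875 = 7/8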
SM may be generalized to n-tuples of strings $\vec{l}_i := (l_i^1, \ldots, l_i^n)$, $\vec{l}_j := (l_j^1, \ldots, l_j^n)$ (e.g. for comparing pairs of lexical entries whose concepts are related by H^C). Thereby, the geometric mean is used to reflect that the similarity accumulated from the component pairs $(l_i^m, l_j^m)$ should approach 0 when one component pair is very dissimilar:

DEFINITION 8.4 (STRING MATCHING (SM) FOR n-TUPLES)

$$SM(\vec{l}_i, \vec{l}_j) := \sqrt[n]{\prod_{m=1 \ldots n} SM(l_i^m, l_j^m)} \in [0, 1].$$

In order to provide a summarizing figure for the comparison of the lexical level of two ontologies, the lexica of O_1, O_2 referring to concepts £_1^C, £_2^C or relations £_1^R, £_2^R are taken as input to compute the averaged String Matching $\overline{SM}(\mathcal{L}^{O_1}, \mathcal{L}^{O_2})$:

DEFINITION 8.5 (AVERAGED STRING MATCHING $\overline{SM}$)

$$\overline{SM}(\mathcal{L}_1, \mathcal{L}_2) := \frac{1}{|\mathcal{L}_1|} \sum_{L_i \in \mathcal{L}_1} \max_{L_j \in \mathcal{L}_2} SM(L_i, L_j).$$

$\overline{SM}(\mathcal{L}_1, \mathcal{L}_2)$ is an asymmetric measure that determines the extent to which the lexical level of an ontology £_1 (the target) is covered by that of a second ontology £_2 (the source). Obviously, $\overline{SM}(\mathcal{L}_1, \mathcal{L}_2)$ may be quite different from $\overline{SM}(\mathcal{L}_2, \mathcal{L}_1)$. E.g., when £_2 contains all the lexical entries of £_1, but also plenty of others, then $\overline{SM}(\mathcal{L}_1, \mathcal{L}_2) = 1$, but $\overline{SM}(\mathcal{L}_2, \mathcal{L}_1)$ may approach zero. $\overline{SM}$ diminishes the influence of lexical entry pseudo-differences in different ontologies, such as use vs. non-use of underscores or hyphens, use of singular vs. plural, or use of additional markup characters. Of course, SM may sometimes be deceptive, when two strings resemble each other though there is no meaningful relationship between them, e.g. "power" and "tower". In the case studies performed, however, it has been found that in spite of this added "noise" SM may be helpful for proposing good matches of lexical entries.

2.3 Conceptual Comparison Level Measures

On the conceptual level one can compare the conceptual structures of ontologies O_1, O_2, which vary over concepts C_1, C_2. In the model the conceptual structures are solely constituted by the taxonomic relations H^C_1, H^C_2 and the non-taxonomic relations R_1, R_2 with their domain and range restrictions. It has already been introduced that two scenarios are considered at the conceptual level: partially overlapping lexicons and identical lexicons. In the following a number of different measures is provided for computing conceptual similarity between two ontologies with overlapping and identical lexicons.

2.3.1 Comparing two taxonomies H^C_1, H^C_2

The starting point is to determine the extent to which two taxonomies compare from two particularly identified concepts. More precisely, it is assumed that one lexical entry L ∈ £_1^C ∩ £_2^C refers via F_1 and F_2 to two concepts C_1, C_2 from two different taxonomies H^C_1, H^C_2. The conceptual structure of C_1 (C_2) may be constituted by the conceptual cotopy (SC) of C_1 (C_2), i.e. all its super- and subconcepts. The definition of the conceptual cotopy is given as follows:

DEFINITION 8.6 (CONCEPTUAL COTOPY (SC))

$$SC(C_i, H^C) := \{C_j \in C \mid H^C(C_i, C_j) \vee H^C(C_j, C_i) \vee C_i = C_j\}.$$

The reader may note that the semantic characteristics of H^C as introduced in Definition 2.1 are used. Thus, the "transitive closure" of concept C_i is computed based on the taxonomy H^C, and the reflexive relationship of C_i to itself is added.
One may overload SC to process sets of concepts, too. The definition for a set of concepts is given as follows.

DEFINITION 8.7 (CONCEPTUAL COTOPY FOR SETS OF CONCEPTS)

$$SC(\{C_1, \ldots, C_n\}, H^C) := \bigcup_{i=1 \ldots n} SC(C_i, H^C).$$

Example. A small example is given for computing SC based on a given concept hierarchy H^C and the definition introduced above. Figure 8.3 depicts the example scenario graphically.
The conceptual cotopy SC(F({"person"}), H^C) is given by F^{-1}(SC(F({"person"}), H^C)) = {"student", "researcher", "person"}.

Figure 8.3. Example for Computing SC

As mentioned earlier, the so-called taxonomic overlap (TO) between H^C_1 and H^C_2, as seen from the concepts referred to by a lexical entry L', may be computed by following F_1^{-1} and F_2^{-1} back to the common lexicon.

DEFINITION 8.8 (TAXONOMIC OVERLAP (TO))

$$TO'(L', O_1, O_2) := \frac{|F_1^{-1}(SC(F(\{L'\}), H^C_1)) \cap F_2^{-1}(SC(F(\{L'\}), H^C_2))|}{|F_1^{-1}(SC(F(\{L'\}), H^C_1)) \cup F_2^{-1}(SC(F(\{L'\}), H^C_2))|}.$$

Averaging over all lexical entries one may thus compute a semantic similarity for two given hierarchies. In addition, however, one must consider the case where a lexical entry L'' is in £_1^C, but not in £_2^C. Then, the simplest assumption is that L'' is simply missing from £_2^C; but when comparing the two hierarchies, the optimistic taxonomic approximation is the one that searches for the maximum overlap given a fictive membership of L'' in £_2^C. This may be reached by solving the following maximization problem.

DEFINITION 8.9

$$TO''(L'', O_1, O_2) := \max_{C \in C_2} \frac{|F_1^{-1}(SC(F(\{L''\}), H^C_1)) \cap F_2^{-1}(SC(C, H^C_2))|}{|F_1^{-1}(SC(F(\{L''\}), H^C_1)) \cup F_2^{-1}(SC(C, H^C_2))|}.$$

Given these premises, the average similarity $\overline{TO}$ between two taxonomies (H^C_1, H^C_2) of two ontologies (O_1, O_2) may then be defined as:

DEFINITION 8.10

$$\overline{TO}(O_1, O_2) := \frac{1}{|\mathcal{L}_1^C|} \sum_{L \in \mathcal{L}_1^C} TO(L, O_1, O_2)$$

with

$$TO(L, O_1, O_2) := \begin{cases} TO'(L, O_1, O_2) & \text{if } L \in \mathcal{L}_2^C \\ TO''(L, O_1, O_2) & \text{if } L \notin \mathcal{L}_2^C \end{cases}$$

Example. A small example for taxonomy comparison is depicted in Figure 8.4. The taxonomic overlap TO'("hotel", H^C_1, H^C_2) is determined by F_1^{-1}(SC(F({"hotel"}), H^C_1)) = {"hotel", "accomodation"} and F_2^{-1}(SC(F({"hotel"}), H^C_2)) = {"wellness hotel", "hotel"}, resulting in TO'("hotel", H^C_1, H^C_2) = 1/3 as input to $\overline{TO}$.

Figure 8.4. Two Example Ontologies O_1, O_2

Consider the lexical entry "accomodation", which is only in £_1^C; the taxonomic overlap is computed as follows: For the lexical entry "accomodation", F_1^{-1}(SC(F({"accomodation"}), H^C_1)) = {"youth hostel", "accomodation", "hotel"} is computed. The concept referred to by "hotel" in C_2 yields the best match, resulting in F_2^{-1}(SC(F({"hotel"}))) = {"wellness hotel", "hotel"} and TO''("accomodation", H^C_1, H^C_2) = 1/4. The reader may note several properties of TO:

• First, TO is asymmetric. While TO' is a symmetric measure, TO'' is asymmetric, because depending on coverage it may be very easy to integrate one taxonomy into another one.

• Second, obviously TO becomes meaningless when £_1^C and £_2^C are disjoint. The more £_1^C and £_2^C overlap (or are made to overlap, e.g. through a syntactic merge), the better TO may focus on existing hierarchical structures and not on optimistic estimations of adding a new lexical entry to £_2^C.
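The following Python sketch computes SC and TO' for the single-inheritance case, with taxonomies encoded as child-to-parent maps mirroring the example of Figure 8.4 (an illustrative simplification of the general definitions):

def supers(c, h):
    out = set()
    while c in h:            # follow the child -> parent chain upwards
        c = h[c]; out.add(c)
    return out

def sc(concept, h):
    # conceptual cotopy: the concept, all super- and all sub-concepts
    cot = {concept} | supers(concept, h)
    cot |= {c for c in h if concept in supers(c, h)}
    return cot

def to_prime(entry, h1, h2):
    a, b = sc(entry, h1), sc(entry, h2)
    return len(a & b) / len(a | b)

h1 = {"hotel": "accomodation", "youth hostel": "accomodation"}
h2 = {"wellness hotel": "hotel"}
print(to_prime("hotel", h1, h2))   # 1/3, as in the example above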

2.3.2 Comparing Non-Taxonomic Relations

On the lexical level a relation R_1 is referred to by a lexical entry L ∈ £^R. On the conceptual level it specifies a pair R_1(C_1, C_2), C_1, C_2 ∈ C, describing the concept C_1 the relation belongs to and its range restriction C_2. Thus, on the conceptual level the computation of the similarity of two relations R_1, R_2 boils down to comparing their domain-range pairs, i.e. R_1(C_1, C_2) with R_2(C_3, C_4). The computation of the overall similarity of two sets of relations R_1 and R_2 enumerates over pairs from the two ontologies. The accuracy that two relations match, RO (relation overlap), is determined based on the geometric mean value of how similar their domain and range concepts are. The geometric mean reflects the intuition that if either the domain or the range concepts utterly fail to match, the matching accuracy converges against 0, whereas the arithmetic mean value might still turn out a value of 0.5.
The similarity between two concepts (the concept match CM) may be computed by considering their conceptual cotopy. However, measures derived from complete cotopies underestimate the place of concepts in the taxonomy. For instance, the conceptual cotopy of the concept corresponding to "hotel" in O_2 (see Figure 8.4) is identical to the conceptual cotopy of the one corresponding to "wellness hotel". Hence, for the purpose of similarity of concepts (rather than taxonomies), the upwards cotopy (UC) is defined as follows:

DEFINITION 8.11 (UPWARDS COTOPY (UC))

$$UC(C_i, H^C) := \{C_j \in C \mid H^C(C_i, C_j) \vee C_j = C_i\}.$$

Again, the semantic characteristics of H^C are utilized as introduced in Definition 2.1. However, the attention is restricted to the super-concepts of a given concept C_i and the reflexive relationship of C_i to itself. Based on the definition of the upwards cotopy (UC), the concept match (CM) is then defined in analogy to TO':

DEFINITION 8.12

$$CM(C_1, O_1, C_2, O_2) := \frac{|F_1^{-1}(UC(C_1, H^C_1)) \cap F_2^{-1}(UC(C_2, H^C_2))|}{|F_1^{-1}(UC(C_1, H^C_1)) \cup F_2^{-1}(UC(C_2, H^C_2))|}.$$

Example. A small example is given for computing UC and CM based on a given concept hierarchy H^C. Figure 8.5 depicts the example scenario graphically. The upwards cotopy UC(F({"researcher"}), H^C) is given by F^{-1}(UC(F({"researcher"}), H^C)) = {"researcher", "person"}. The upwards cotopy UC(F({"project"}), H^C) is computed by F^{-1}(UC(F({"project"}), H^C)) = {"project"}.
One may also compute the concept match CM between two given specific concepts C_1, C_2.

[Figure: concept hierarchy with PERSON above STUDENT and RESEARCHER, and PROJECT above RESEARCH PROJECT]

Figure 8.5. Example for Computing UC and CM

E.g., computing the concept match CM between F({"researcher"}) and F({"project"}) results in 0, while the concept match CM between F({"researcher"}) and F({"student"}) is given as 1/3.

Based on the upwards cotopy and the concept match one may now approach the definition of the non-taxonomic relation overlap. The definition of RO is separated into two parts: First, without considering lexical entries, RO' of two relations R_1, R_2 is defined as follows:

DEFINITION 8.13 Let d := CM(domain(R_1), O_1, domain(R_2), O_2) and r := CM(range(R_1), O_1, range(R_2), O_2).

$$RO'(R_1, O_1, R_2, O_2) := \sqrt{d \cdot r}$$

RO' is based on the geometric mean value of how closely the domain and range concepts match, as given by the concept match CM⁶. Basically, this measure reaches 100% when both concepts coincide (i.e., their distance in the taxonomy H^C is 0); it degrades to the extent to which their distance increases; however, this degradation is seen as relative to the extent of their agreement.

In order to take reference by L ∈ £_1^R, L ∈ £_2^R into account, RO'' is defined as follows. Again, analogously to the definition of TO'', one has to solve a maximization problem.

DEFINITION 8.14

$$RO''(L, O_1, O_2) := \max_{R_1 \in F_1(\{L\}), R_2 \in F_2(\{L\})} RO'(R_1, O_1, R_2, O_2).$$

An important aspect is that some of the lexical entries may only refer to relations in R_1; this is reflected by the following definition, where one maximizes over all relations contained in R_2 without considering their lexical representation.

DEFINITION 8.15

$$RO'''(L, O_1, O_2) := \max_{R_1 \in F_1(\{L\}), R_2 \in R_2} RO'(R_1, O_1, R_2, O_2).$$

Combining the two definitions introduced above, one finally obtains for L ∈ £_1^R the following measure:

DEFINITION 8.16

$$RO(L, O_1, O_2) := \begin{cases} RO''(L, O_1, O_2) & \text{if } L \in \mathcal{L}_2^R \\ RO'''(L, O_1, O_2) & \text{if } L \notin \mathcal{L}_2^R \end{cases}$$

The averaged relation matching accuracy $\overline{RO}$ is then defined by:

$$\overline{RO}(O_1, O_2) := \frac{1}{|\mathcal{L}_1^R|} \sum_{L \in \mathcal{L}_1^R} RO(L, O_1, O_2).$$

Example. Figure 8.6 depicts a simple example setting. One relation R_1 in O_1 is assumed, referenced by "located at" and specifying the domain and range corresponding to (HOTEL, AREA). In O_2, the same lexical entry may refer to R_2, with domain and range corresponding to (HOTEL, CITY). Computing CM for the concepts referred to by "hotel" in O_1 and O_2 results in 1/2. The CM between the concepts referred to by "area" in O_1 and "city" in O_2 also returns 1/2. Thus, RO' for the lexical entry "located at" results in $\sqrt{\frac{1}{2} \cdot \frac{1}{2}} = 0.5$ as input to the overall $\overline{RO}$.

The reader may note two major characteristics of RO:

• First, the values for the computation of the relation overlap depend on the agreement between (i) the lexica and (ii) the taxonomies of O_1 and O_2. In general it can be said that without reasonable agreement on the lexica and the concept taxonomy, RO may not reach high values.
• Second, RO is also asymmetric, reflecting the coverage of relations by the first and the second ontology.
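The measures UC, CM and RO' may be sketched as follows (Python; taxonomies again as child-to-parent maps, with illustrative data chosen to reproduce the example of Figure 8.6):

def uc(c, h):
    out = {c}
    while c in h:            # upwards cotopy: super-concepts plus the concept
        c = h[c]; out.add(c)
    return out

def cm(c1, h1, c2, h2):
    a, b = uc(c1, h1), uc(c2, h2)
    return len(a & b) / len(a | b)

def ro_prime(dom1, rng1, h1, dom2, rng2, h2):
    d = cm(dom1, h1, dom2, h2)
    r = cm(rng1, h1, rng2, h2)
    return (d * r) ** 0.5    # geometric mean of domain and range match

h1 = {"hotel": "accomodation"}
h2 = {"wellness hotel": "hotel", "city": "area"}
print(ro_prime("hotel", "area", h1, "hotel", "city", h2))   # 0.5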

Figure 8.6. Two Example Ontologies O_1, O_2

3. Human Performance Evaluation


In the previous sections a number of different measures that allow comparing
ontologies have been introduced as a basis for our evaluation approach. In this
section a particular case study is presented that has been carried out in a seminar
on ontology engineering at the institute AIFB, University of Karlsruhe. Two
main objectives have been pursued with the evaluation study: (i) to determine the quality of the measures and evaluate them on actual data, and (ii) to investigate how similar different ontologies are that have been modeled by different persons.

[Figure 8.7. Measuring Human Modeling Performance]

Figure 8.7 depicts the overall evaluation scenario for measuring human performance. Several students generated complete ontologies or parts thereof (O_S) from different starting points (see the detailed description in the following). One ontology (O_gold) has been defined within the GETESS project using standard knowledge acquisition techniques (e.g., questionnaires, competency questions, etc.), serving as the gold standard for our evaluation. The "basic, gold-standard" ontology consisted of 2,690 bilingual lexical entries, 1,087 concepts, 1,200 taxonomic relations H^C, and 199 non-taxonomic relations. A simple "evaluation" ontology has been extracted from the basic gold-standard ontology and was restricted to 1,200 bilingual lexical entries, 311 concepts, 322 taxonomic relations H^C, and 71 non-taxonomic relations.

3.1 Ontology Engineering Evaluation Study


The experiment was carried out with four subjects, viz. undergraduates in industrial and business engineering. The modeling expertise of the subjects was sparse. Before the actual modeling, they received 3 hours of training in ontology engineering in general and 3 hours in using the ontology engineering workbench ONTOEDIT. Our study required each of them to build ontologies in the tourism domain using their background knowledge and web pages from a WWW site^7 about touristic offers, e.g. hotels with various attractions or cultural events.
The objective was an overall cross-comparison of ontologies, but also to test
the appropriateness of single measures. To avoid error chaining, the evaluation
has been performed in three phases (resulting in 4 · 3 = 12 ontologies). Fur-
thermore, the "gold standard" ontology served as the 13th ontology. The three
phases may be characterized as follows:
• Phase I: A small top level structure was given to the subjects.^8 Based on this
top level and the available knowledge sources (the set of documents from
the tourism information provider), the subjects had to model a complete
tourism domain ontology. To keep the ontologies within comparable ranges
the students were required to model around 300 lexical entries referring
to concepts and 80 lexical entries referring to relations. Additionally, the
subjects were required to embed the concepts in the concept taxonomy and
define domain and range restrictions for the relations.
• Phase II: The second phase was geared to produce results for TO and RO,
while avoiding the uncertainties of lexical disagreement. Therefore, the
subjects were given 310 lexical entries (for concepts) from the gold standard
and the top level structure described before. Each of them had to first model the taxonomy for the concepts (the concepts referred to by the 310 lexical entries) and, second, model about 80 lexical entries referring to relations and the domain and range restrictions of the relations.
• Phase III: The last phase was carried out to control RO in the absence of "noise" from different taxonomies and lexica. There the taxonomy (from the gold standard) was given. It consisted of 310 lexical entries, L^C, and a set of 310 corresponding concepts, C, taxonomically related by H^C. The subjects had to model about 80 lexical entries referring to relations and the domain and range restrictions of the relations.

Subject            0              1              2              3              4
lexical entries    310/310/310    340/310/310    474/310/310    338/310/310    300/310/310
concepts           311/311/311    347/318/311    503/314/311    311/313/311    315/311/311
relations          71/71/71       65/89/65       62/103/77      65/70/82       39/69/65

Table 8.1. Basic Statistics (Phase I / Phase II / Phase III)

Table 8.1 depicts the basic statistics that have been computed from the modeled ontologies over the three phases. Ontology O_gold (generated by subject 0) served as the gold standard and has been modeled by an expert ontology engineer. It is obvious that this ontology delivers identical statistics for all three phases (because it had been modeled only once).

3.2 Human Evaluation - Precision and Recall


Precision and recall values have been computed for the three phases. Figure 8.8 depicts the results obtained by computing precision_OL(Ref, Comp) and recall_OL(Ref, Comp) (= precision_OL(Comp, Ref)) for the phase I ontologies, i.e. the pairwise results of comparing the lexical entries referring to concepts from the given ontologies.

[Figure 8.8. Precision and Recall for Lexical Entry Modeling (scatter plot; x-axis: precision, y-axis: recall; one point per pair of subjects)]

It is obvious that on the basis of the phase I-ontologies the values of the precision and recall evaluation for lexical entries cannot be expected to be very high. The best precision/recall value is obtained by comparing the ontology O_1 with the ontology O_2, resulting in approx. (0.28, 0.4). The overall average of the precision/recall values is approx. (0.2, 0.2). Thus, the reader may note that on the simple basis of a set of documents people do not tend to agree on common lexical entries.
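The computation behind such a precision/recall point is a plain set comparison. A minimal sketch, assuming the usual set-based definitions (the two lexica are hypothetical):

def precision(ref, comp):
    """Fraction of the computed lexical entries that also occur in the reference."""
    return len(ref & comp) / len(comp) if comp else 0.0

def recall(ref, comp):
    """Fraction of the reference that is covered; equals precision(comp, ref)."""
    return len(ref & comp) / len(ref) if ref else 0.0

# Hypothetical concept lexica of two subject ontologies.
lc1 = {"hotel", "beach", "event", "sauna"}
lc2 = {"hotel", "event", "city", "museum", "restaurant"}

print(precision(lc1, lc2), recall(lc1, lc2))  # 0.4 0.5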
As mentioned earlier, phase II is based on common lexical entries. Therefore, better precision and recall values are expected. An example is given for the elements contained in H^C. Figure 8.9 depicts the results that have been obtained. The comparison of taxonomic relations leads to acceptable values for both precision and recall.

[Figure 8.9. Precision and Recall for Concept Hierarchy Modeling (scatter plot; x-axis: precision, y-axis: recall)]

For the phase III ontologies only the computation of precision and recall values of non-taxonomic relations makes sense. The results of the pairwise comparison are given in Table 8.2; Figure 8.10 depicts the results obtained by comparing the non-taxonomic relation elements contained in the different ontologies. It can be seen that the recall values for modeling non-taxonomic relations are very low. Later, it will be seen how the proposed ontology learning algorithms perform on the same task by applying the same evaluation measures.

Subject
i\j      0             1             2             3             4
0        1, 1          0.3, 0.24     0.07, 0.03    0.14, 0.15    0.08, 0.07
1        0.24, 0.3     1, 1          0.25, 0.13    0.22, 0.3     0.07, 0.07
2        0.03, 0.07    0.13, 0.25    1, 1          0.05, 0.14    0.02, 0.04
3        0.15, 0.14    0.3, 0.22     0.14, 0.05    1, 1          0.15, 0.12
4        0.07, 0.08    0.07, 0.07    0.04, 0.02    0.12, 0.15    1, 1

Table 8.2. Precision and Recall for Non-Taxonomic Relation Modeling



[Figure 8.10. Precision and Recall for Non-Taxonomic Relation Modeling (scatter plot; x-axis: precision, y-axis: recall)]

As mentioned earlier, it has been experienced that the measures precision and recall reduce the decision criterion to match or no match and do not account for nearly perfect matches. In the following, the measures proposed for the lexical and the conceptual comparison level have been computed for comparing the ontologies.

3.3 Human Evaluation - Lexical Comparison Level


The phase I-ontologies described above have been used for general cross-
comparison, including the lexical level. The pairwise string matching (SM, see
subsection 2.2) of the five lexica referring to concepts and relations, respectively,
returned the results depicted in Table 8.3.

Subject
i\j      0             1             2             3             4
0                      0.51, 0.35    0.53, 0.21    0.46, 0.39    0.5, 0.29
1        0.43, 0.52                  0.65, 0.43    0.43, 0.53    0.39, 0.41
2        0.42, 0.24    0.54, 0.37                  0.36, 0.24    0.4, 0.2
3        0.38, 0.47    0.43, 0.45    0.38, 0.28                  0.38, 0.36
4        0.46, 0.38    0.41, 0.5     0.48, 0.16    0.43, 0.39

Table 8.3. SM(L^C_i, L^C_j), SM(L^R_i, L^R_j) for Phase I-Ontologies.

Results. The results for computing SM(L^C_1, L^C_2), matching lexical entries referring to concepts, vary between 0.38 and 0.65 with an average of 0.45. Comparing lexical entries referring to relations, SM(L^R_1, L^R_2) results in values between 0.16 and 0.53 with an average of 0.36. Several typical, though not necessarily good, pairs for which high string match values were computed are shown in Table 8.4. RelHit(L^C_1, L^C_2) ranged between 20 and 25%, i.e. this percentage of lexical entries referring to concepts matched exactly. For lexical entries referring to relations the results were much worse, viz. between 10 and 15%.

L_1                     L_2                       SM(L_1, L_2)
"Sehenswuerdigkeit"     "Sehenswürdigkeit"        0.875
[sight]                 [sight]
"Verkehrsmittel"        "Luftverkehrsmittel"      0.71
[vehicle]               [air vehicle]
"Zelt"                  "Zeit"                    0.75
[tent]                  [time]
"Anzahl_Betten"         "hat_Anzahl_Betten"       0.77
[number_beds]           [has_number_beds]

Table 8.4. Typical String Matches

Interpretation. Analyzing the figures it can be seen that human subjects have a considerably higher agreement on lexical entries referring to concepts than on ones referring to relations. Investigating the auxiliary measures we have found that SM values above 0.75 in general retrieve meaningful matches, in spite of a few pitfalls, for example when comparing "Zelt" [tent] and "Zeit" [time] (see Table 8.4).
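The SM values of Table 8.4 can be reproduced by normalizing the Levenshtein edit distance described in note 5. The sketch below assumes the normalization SM(L_1, L_2) = max(0, (min(|L_1|, |L_2|) - ed(L_1, L_2)) / min(|L_1|, |L_2|)); this is consistent with the reported matches, but it is an assumption about the exact formula of subsection 2.2.

def edit_distance(s, t):
    """Levenshtein distance via the dynamic programming matrix of note 5."""
    prev = list(range(len(t) + 1))
    for x, cs in enumerate(s, 1):
        curr = [x]
        for y, ct in enumerate(t, 1):
            curr.append(min(prev[y] + 1,                 # deletion
                            curr[y - 1] + 1,             # insertion
                            prev[y - 1] + (cs != ct)))   # substitution
        prev = curr
    return prev[-1]

def string_match(l1, l2):
    """SM in [0, 1]; 1 means the two lexical entries are identical."""
    m = min(len(l1), len(l2))
    return max(0.0, (m - edit_distance(l1, l2)) / m)

print(string_match("Zelt", "Zeit"))                          # 0.75
print(string_match("Verkehrsmittel", "Luftverkehrsmittel"))  # approx. 0.71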

3.4 Human Evaluation - Conceptual Comparison Level


On the conceptual level one may compare the conceptual structures contained in ontologies O_1, O_2 that vary in their concepts C_1, C_2. We use the ontologies of phases I, II, and III for evaluating the measures introduced in subsection 2.3.

Results. Table 8.5 presents the results that are obtained for the phase I-ontologies using the similarity measures taxonomy overlap (TO) and relation overlap (RO). The reader may note that these ontologies have been built without any previous assumptions about the lexica L_1 and L_2; thus their similarity values are well below those of the later phases, where the lexica for concepts were predefined. Table 8.6 depicts the similarity measures computed for the phase II-ontologies. Values for TO range between 0.47 and 0.87; the average TO over all 20 cross-comparisons results in 0.56. RO yields values from 0.34 to 0.82 with an average of 0.47.

Subject
i\j      0             1             2             3             4
0                      0.33, 0.35    0.31, 0.25    0.32, 0.5     0.29, 0.28
1        0.35, 0.15                  0.4, 0.41     0.34, 0.03    0.28, 0.15
2        0.28, 0.12    0.36, 0.25                  0.25, 0.04    0.24, 0.15
3        0.36, 0.4     0.31, 0.32    0.24, 0.04                  0.26, 0.03
4        0.38, 0.29    0.31, 0.21    0.32, 0.2     0.32, 0.26

Table 8.5. TO(O_i, O_j), RO(O_i, O_j) for Phase I-Ontologies.

Subject
i\j      0             1             2             3             4
0                      0.57, 0.5     0.54, 0.47    0.54, 0.48    0.59, 0.39
1        0.57, 0.44                  0.86, 0.78    0.48, 0.45    0.55, 0.35
2        0.54, 0.46    0.87, 0.82                  0.46, 0.46    0.58, 0.35
3        0.54, 0.44    0.48, 0.5     0.46, 0.47                  0.47, 0.34
4        0.58, 0.4     0.55, 0.45    0.57, 0.45    0.47, 0.35

Table 8.6. TO(O_i, O_j), RO(O_i, O_j) for Phase II-Ontologies.

Interpretation. The figures indicate that subjects tend to agree or disagree on taxonomies irrespective of the amount of material being predefined (agreement or disagreement takes place at different levels). In fact, the correlation between the TO values of the phase I- and phase II-ontologies supports this indication, because the correlation is 0.58, i.e. distinctly positive, for the ontologies with and without predefined lexica. Furthermore, one may conjecture that the comparison between TO values (in order to select the best) remains meaningful even with a restricted overlap of the lexica.

Results. Table 8.7 depicts the similarity measures computed for the phase III-ontologies; here only RO has been computed, because the taxonomy was predefined. RO ranges between 0.23 and 0.71, with the average RO over all 20 cross-comparisons achieving 0.5.

Subject
i\j     0        1        2        3        4
0                0.61     0.38     0.51     0.54
1       0.69              0.56     0.57     0.55
2       0.4      0.49              0.35     0.23
3       0.67     0.71     0.5               0.57
4       0.45     0.44     0.3      0.41

Table 8.7. RO(O_i, O_j) for Phase III-Ontologies.



Interpretation. The correlation of the RO values between phases I and II computes to 0.34, between phases I and III to 0.27, and between phases II and III to 0.16. In general, higher RO values are reached without a predefined taxonomy; this reflects the observation that subjects found it easy to use a predefined lexicon, but extremely difficult to continue modeling given a predefined taxonomy.
Overall, one may conjecture that the engineers' use of their lexicon correlates rather strongly with their semantic model and vice versa: the similarity measures for the subject 3 ontologies with the subject 4 ontologies result in very low values on the syntactic and on the conceptual level (with the exception of RO). In contrast, the subject 1 ontologies reach high similarity values with the subject 2 ontologies on all levels.

Conclusion. In this case study the proposed measures have been applied for
evaluating human ontology engineering. In the following section the same
measures will be applied to different scenarios for the application of ontology
learning algorithms.

4. Ontology Learning Performance Evaluation


In the last section human performance in ontology engineering has been
analyzed. The performance has been computed as a kind of intra-engineering
agreement. In this section human modeling is compared to the application of
ontology learning techniques.

[Figure 8.11. Measuring Ontology Learning Performance: the hand-modeled gold standard ontology O_gold is compared with ontologies O_S1, O_S2, ..., O_Sn generated by an algorithm X under varying parameters, different input data, and varying background knowledge]

Figure 8.11 depicts the overall evaluation scenario for measuring ontology learning performance. A given algorithm may generate different parts of an ontology (e.g. the lexical entry extraction algorithm may generate proposals for the ontology lexicon). Typically an ontology learning algorithm may be parameterized (e.g. with a threshold) or may be fed with different input texts. Thus, one algorithm may generate different kinds of ontologies depending on its parameters. A given gold standard ontology may be compared, using the measures, with these different ontologies generated by an algorithm. The comparison then leads to similarity values; a high similarity between the gold standard and an extracted ontology represents a successful application of a specific algorithm based on the given set of parameters.

Though human decisions in this matter should not be taken for pure gold, it is necessary to have measures that allow the comparison of different approaches and parameter settings, even when the bases of these measures depend to some extent on the quality of, and on rather arbitrary but equally plausible choices between, modeling decisions.

4.1 The Evaluation Setting


The evaluation scenario for ontology learning introduced above has been instantiated using ontologies from the tourism domain that were developed within the GETESS project (Staab et al., 1999). Again, the following two ontologies were used: The "basic, gold-standard" ontology consisted of 2,690 bilingual lexical entries, 1,087 concepts, 1,200 taxonomic relations H^C, and 199 non-taxonomic relations. A simple "evaluation" ontology has been extracted from the basic gold-standard ontology and was restricted to 1,200 bilingual lexical entries, 311 concepts, 322 taxonomic relations H^C, and 71 non-taxonomic relations.

The corpus used for the evaluation consisted of 2,234 HTML documents comprising 16 million words and HTML tags. The corpus, provided by the MANET Marketing GmbH, is available online^9. It is about the Baltic Sea, especially the federal state Mecklenburg-Vorpommern of Germany. It describes the country (facts, figures, history), the regions (inland, Baltic coast, Baltic Sea, cities), accommodations, cultural events, eating & drinking, among many others.

4.2 Evaluation of Lexical Entry Extraction


The lexical entry extraction techniques deal with the automatic generation
of relevant items and hints for potential concepts that should be modeled in a
specific domain. At this low extraction level one may only apply the measures
for deriving precision and recall curves and lexical entry comparisons. The
measures lef and tfidf as introduced in chapter 6, subsection 1.1 have been
evaluated using the evaluation setting introduced above.
The overall corpus contained 10,775 lexical entries (based on a preprocessing using shallow linguistic processing at the phrase level). The computation of lef values results in an average lef value of 17, a minimum of 1, and a maximum of 9,438. The tfidf computation delivered an average value of 44, a minimum of 0, and a maximum of 3,427.
All in all, 13 artificial and very simple ontologies (consisting only of lexical entries) have been generated using different parameter settings for the extraction algorithms. Based on the frequency distributions of lef and tfidf, the numerical conditions 5, 20, 50, 100, 500, 1000 for the lef and tfidf values have been defined. Additionally, an ontology without any condition has been generated. Table 8.8 depicts the number of proposed lexical entries with reference to the specific conditions and the extraction measures used (lef_i, tfidf_i).

Condition   > 0       > 5       > 20     > 50     > 100    > 500    > 1000
lef_i       10,775    2,790     920      427      248      56       31
tfidf_i     10,775    10,759    3,605    1,753    892      150      46

Table 8.8. Number of Proposed Lexical Entries
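A minimal sketch of this thresholding step follows. The counting is deliberately simplified: lef is taken here to be the corpus-wide frequency of a lexical entry and tfidf is computed per entry over the whole corpus, which are assumptions about the measures of chapter 6, subsection 1.1, not their exact implementation.

import math
from collections import Counter

def lexicon_proposals(docs, threshold, measure="lef"):
    """Propose lexical entries whose lef or tfidf score exceeds the condition."""
    tf = [Counter(doc) for doc in docs]                # per-document term frequencies
    corpus_freq = sum(tf, Counter())                   # lef: corpus-wide frequency (assumed)
    df = Counter(t for counts in tf for t in counts)   # document frequency
    n = len(docs)

    def score(term):
        if measure == "lef":
            return corpus_freq[term]
        return corpus_freq[term] * math.log(n / df[term])  # simple tfidf variant

    return {t for t in corpus_freq if score(t) > threshold}

# Hypothetical tokenized documents.
docs = [["hotel", "beach", "hotel"], ["beach", "museum"], ["hotel"]]
print(lexicon_proposals(docs, 2, "lef"))  # {'hotel'}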

[Figure 8.12. Precision and Recall for Lexical Entry Extraction (scatter plot; x-axis: precision, y-axis: recall; one point per threshold condition for lef and tfidf)]

Precision & Recall Evaluation. Figure 8.12 depicts the results obtained by
comparing the automatically extracted set of lexical entries. The reader may
note that the well-known trade-off between precision and recall becomes ob-
vious again in this figure. An interesting aspect of this figure is the average
tfidf measure outperforms lef for the task of lexical entry extraction for ontol-
ogy generation. Another fact has been recognized in evaluating lexical entry
Evaluation 193

extraction: Recall values are very low, even if the extraction algorithms are
executed without any condition. Thus, one may conclude that the manually
engineered GETESS ontology does not optimally reflect the lexical content of
the corpus.

4.3 Evaluation of Concept Hierarchy Extraction


The mechanisms for deriving concept hierarchies using hierarchical clustering have been introduced in chapter 6, subsection 1.2.1. As mentioned earlier, one fundamental problem in applying hierarchical clustering to ontology learning is the labeling of the super-concepts that are created by the algorithm. Therefore, a labeling mechanism using existing background knowledge has been presented. For the evaluation it has been decided to experiment with "varying" background knowledge combined with different computation strategies, viz. with background knowledge ontologies of different sizes and the three computation strategies (single link, complete link, average link); a sketch of these linkage strategies is given below. Nevertheless, the reader may note that all nodes that could not be labeled are not comparable by the proposed measures.
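The three strategies differ only in how the distance between two clusters is aggregated from the pairwise concept distances. The following generic sketch illustrates this; the distance matrix is hypothetical input, and nothing here is specific to the implementation of chapter 6.

def cluster_distance(c1, c2, dist, linkage):
    """Aggregate the pairwise distances between two clusters of concepts."""
    pairs = [dist[a][b] for a in c1 for b in c2]
    if linkage == "single":
        return min(pairs)               # closest pair
    if linkage == "complete":
        return max(pairs)               # farthest pair
    return sum(pairs) / len(pairs)      # average link

def agglomerate(concepts, dist, linkage="average"):
    """Greedy bottom-up clustering; returns the sequence of merges."""
    clusters = [frozenset([c]) for c in concepts]
    merges = []
    while len(clusters) > 1:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]],
                                                  dist, linkage))
        merges.append((clusters[i], clusters[j]))
        clusters = ([c for k, c in enumerate(clusters) if k not in (i, j)]
                    + [clusters[i] | clusters[j]])
    return merges

# Hypothetical concept-distance matrix derived from the linguistic preprocessing.
dist = {"hotel": {"hotel": 0, "inn": 1, "city": 4},
        "inn":   {"hotel": 1, "inn": 0, "city": 5},
        "city":  {"hotel": 4, "inn": 5, "city": 0}}
print(agglomerate(["hotel", "inn", "city"], dist, linkage="single"))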

[Figure 8.13. Precision and Recall for Taxonomic Relations Discovery (scatter plot; x-axis: precision, y-axis: recall; points for the small, medium, and large background knowledge ontologies combined with single, complete, and average linkage)]

To derive the required input matrix for hierarchical clustering we used our linguistic and heuristic preprocessing, which came up with approx. 51,000 linguistically related pairs of concepts using the small reference ontology. The preprocessing strategy for extracting these pairs of concepts has been described in chapter 5, subsection 2.2.5. The three "background knowledge" ontologies are distinguished as O_small (55 concepts), O_medium (110 concepts), and O_large (211 concepts). They have been manually derived from the simple "evaluation" tourism ontology. Based on these three input ontologies, the three computation strategies have been used to derive 9 ontologies that are compared with the simple "evaluation" tourism ontology introduced earlier.
Figure 8.13 depicts the results obtained by computing precision and recall for the H^C element. It is obvious that the three derived ontologies based on the large background knowledge ontology O_large result in the best precision and recall values. An interesting aspect, however, is that the computation strategies single and average linkage outperform complete linkage. Another interesting aspect is that the results based on the two background knowledge ontologies O_medium and O_small do not differ much.

[Figure 8.14. TO of Discovered Taxonomic Relations (TO values for the nine combinations of background knowledge ontology and linkage strategy; large_average scores highest)]

Figure 8.14 graphically depicts the values obtained from computing TO. The computation strategy "average link" with O_large outperforms the remaining models. Again, it can be seen that the results based on the two background knowledge ontologies O_medium and O_small do not differ much. The evaluation strategy proved to be very useful for selecting computation strategies, e.g. how clusters are computed (single link, complete link, average link). The interested reader is referred to (Boch, 2001), where further and more detailed evaluation results are provided.

4.4 Evaluation of Non-Taxonomic Relation Extraction


The non-taxonomic relation extraction algorithm deals with the automatic
generation of relevant hints for potential non-taxonomic relations (with domain
and range restrictions) between concepts that should be modeled in a specific
domain. The algorithm presented in chapter 6, subsection 1.3 mainly depends
on two thresholds, support and confidence. It is obvious that by increasing these two thresholds one restricts the number of proposed non-taxonomic relations. However, it is not obvious which combination of support and confidence delivers the best results for the ontology learning task. Therefore, the extraction mechanism with background knowledge in the form of a given concept taxonomy has been evaluated at the conceptual level using the evaluation setting introduced above.

The linguistic and heuristic preprocessing again came up with approx. 51,000 linguistically related pairs of concepts based on the simple "evaluation" ontology. The preprocessing strategy for extracting these pairs of concepts has been described in chapter 5, subsection 2.2.5. An excerpt of the evaluation that surveys the most characteristic results is given in Table 8.9: the number of discovered non-taxonomic relations D and the values of RO, recall, and precision for varying support and confidence thresholds.

                          Confidence
Support     0.01           0.1            0.2            0.4
0.0001      2429 / 0.55    865 / 0.57     485 / 0.57     238 / 0.51
            66% / 2%       31% / 3%       18% / 3%       2% / 1%
0.0005      1544 / 0.57    651 / 0.59     380 / 0.58     198 / 0.5
            59% / 3%       30% / 4%       17% / 4%       1% / 1%
0.002       889 / 0.6      426 / 0.61     245 / 0.61     131 / 0.52
            47% / 5%       27% / 6%       16% / 6%       1% / 1%
0.01        342 / 0.64     225 / 0.64     143 / 0.64     74 / 0.53
            31% / 8%       19% / 8%       14% / 8%       1% / 1%
0.04        98 / 0.67      96 / 0.67      70 / 0.65      32 / 0.51
            13% / 11%      11% / 10%      6% / 7%        0% / 0%
0.06        56 / 0.63      56 / 0.63      48 / 0.62      30 / 0.53
            6% / 9%        6% / 9%        3% / 6%        0% / 0%

Table 8.9. Overview of Evaluation Results (number of proposed non-taxonomic relations / RO; recall / precision)
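A minimal sketch of the support/confidence filtering over concept pairs follows; the transactions stand in for the linguistically related concept pairs of the preprocessing, and the counting is simplified with respect to the generalized association rules of chapter 6, subsection 1.3.

from collections import Counter
from itertools import permutations

def discover_relations(transactions, min_support, min_confidence):
    """Propose ordered concept pairs (domain, range) from co-occurring concepts."""
    n = len(transactions)
    item_count, pair_count = Counter(), Counter()
    for t in transactions:
        items = set(t)
        item_count.update(items)
        pair_count.update(permutations(items, 2))  # all ordered pairs in one unit
    proposals = []
    for (a, b), c in pair_count.items():
        support, confidence = c / n, c / item_count[a]
        if support >= min_support and confidence >= min_confidence:
            proposals.append((a, b, round(support, 2), round(confidence, 2)))
    return proposals

# Hypothetical co-occurrence units (e.g. concepts found in the same sentence).
sentences = [{"hotel", "area"}, {"hotel", "area"}, {"hotel", "city"}, {"event"}]
print(discover_relations(sentences, 0.4, 0.5))
# e.g. [('hotel', 'area', 0.5, 0.67), ('area', 'hotel', 0.5, 1.0)]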
Calculating all non-taxonomic relations using support and confidence thresholds of 0 yields 8,058 relations, scoring an RO of 0.51. As expected, both the number of discovered non-taxonomic relations D and the recall decrease with growing support and confidence thresholds. Precision increases monotonically at first, but drops off when so few relations are discovered that almost none of them is a direct hit. Higher support thresholds correspond to larger RO values. The best RO is reached using a support threshold of 0.04 and a confidence threshold of 0.01, achieving 0.67. This constellation also results in the best trade-off between recall and precision (13% and 11%). The RO value of 0.53 remains convincing even when recall and precision fall to 0% due to a lack of exactly matching non-taxonomic relations.

[Figure 8.15. Precision and Recall of Non-Taxonomic Relation Discovery (scatter plot; x-axis: precision, y-axis: recall; one point per support/confidence combination)]

Standard deviation ranges between 0.22 and 0.32 in the experiments. Given that the average RO scored well in the sixties, this means that there is a significant portion of bad guesses, but, what is more important, a large number of very good matches, too. Hence, one may infer that the approach is well-suited for integration into an interactive ontology engineering environment. The reason is that an ontology engineer does not require near-perfect discovery, but a restriction from a large number of relations, e.g. 311^2 = 96,721 (the squared number of concepts, leaving out the top concept), to a selection, e.g. a few hundred, that contains a reasonably high percentage of good recommendations.
Random Choice: Finally, the significance of the RO measure as compared to a uniform distribution over all possible, viz. 311^2, non-taxonomic relations is explored. The RO computed from this set was 0.39 and thus significantly worse than the learning results of the presented approach. The standard deviation achieved 0.17 and was thus lower than for the discovery approach; a good match by random choice is indeed very rare. One may note that though the overall mean of 0.39 is still comparatively high, there are non-taxonomic relations that score with the minimum.

5. Conclusion
In this chapter an evaluation approach for ontology learning has been intro-
duced. The approach is based on the ontology structure definition of chapter 2
and follows the layered view on ontologies distinguishing between a lexical and
a conceptual level. The underlying idea of the overall approach is to measure the similarity between a hand-modeled ontology (the gold standard ontology) and an ontology that has been generated by applying a particular ontology learning technique. It is assumed that a high similarity between the hand-modeled ontology and the ontology acquired by ontology learning indicates a successful application of the particular ontology learning technique.
The human case study has shown that one should not expect too much overlap if different people model a given domain of interest, e.g. tourism. Again this reflects the fact that ontology engineering requires cooperative support (supporting the construction of an ontology by a group of people). The evaluation of ontology learning performance helped in characterizing the effects that different ontology learning techniques have on the results of learning. It allows providing rough methodological guidelines that help the ontology engineer select the most suitable method for a given corpus or task, or support the creation of a new one. Finally, if one compares manual engineering with the automatic generation of ontology structures, one may conclude that humans are able to reach a high precision, whereas the ontology learning algorithms provide a good recall. This fact reflects the paradigm of semi-automatic ontology engineering along the lines of balanced cooperative modeling.
In the following, two short comments are provided on possible extensions and steps towards future, more elaborate evaluation techniques for ontology learning. The focus is on two aspects, namely (i) application-oriented evaluation and (ii) standard datasets for evaluation.

5.1 Application-oriented Evaluation


As mentioned in the introduction of this chapter, an application-oriented evaluation has not been pursued in this work. However, reference is made here to an application (see (Hotho et al., 2001a)) that uses ontologies acquired by ontology learning. The application has been developed on top of the ontology learning framework and significantly improved results: Text clustering typically involves clustering in a high dimensional space, which appears difficult with regard to virtually all practical settings. In addition, given a particular clustering result it is typically very hard to come up with a good explanation of why the text clusters have been constructed the way they are. Thus, the proposed approach uses background knowledge during preprocessing in order to improve the clustering results and to allow for a selection between results. The input data is preprocessed by applying an ontology-based heuristic for feature selection and feature aggregation; thus, a number of alternative text representations is constructed. Based on these representations, multiple clustering results are computed using K-Means. The results may be distinguished and explained by the corresponding selection of concepts in the ontology, and they compare favourably with a sophisticated baseline preprocessing strategy. Thus, in (Hotho et al., 2001a) text clustering has been successfully extended with ontological background knowledge and proved to be extremely useful for deriving high-quality results.
Along similar lines, the work done by (Faure and Poibeau, 2000) shows that an ontology-based information extraction system using an ontology acquired by ontology learning techniques significantly outperforms one that uses a manually modeled ontology.

5.2 Standard Datasets for Evaluation


Classical machine learning applications (e.g., algorithms for supervised classification) have been developed further in recent years towards high levels of accuracy. The development towards these high accuracy levels has only been made possible by providing standard data sets for prototypical tasks^10. The same will hold for ontology learning algorithms: to further improve and compare different algorithms, a set of standard data sets is required. When applying ontology learning to natural language text, a serious problem is the definition of multi-lingual document sets.

Notes
1 The reader may note that the application-specific evaluation in a knowledge management scenario becomes more difficult with respect to the currently unsolved problem of measuring the success of a knowledge management initiative.
2 Parts of this section have been published in (Maedche and Staab, 2001b).
3 The reader may note that measures for computing the similarity between ontologies may open a wide range of more general applications, for example in agent-based systems or for ontology merging & mapping tasks.
4 From the statistical point of view, false positives are type I errors, false negatives are type II errors.
5 The algorithm is based on a dynamic programming technique that is described in detail in (Levenshtein, 1966). The algorithm builds a matrix for the two strings that are to be compared. Each element (x,y) depends on the values of (x-1,y), (x,y-1) and (x-1,y-1). Whenever the characters for x and y are the same, the value of (x,y) will be equal to the minimum of the three values it depends on. When the characters are different, the value of (x,y) will be this minimum plus one. In order to be able to compute the (x,y) values, the program uses a virtual initial row and a virtual initial column with the values 0, 1, 2, 3, 4, ...
6 The geometric mean reflects the intuition that if either the domain or the range concepts utterly fail to match, the matching accuracy converges to 0, whereas the arithmetic mean value might still yield a value of 0.5.
7 see http://www.all-in-all.de
8 It contained four concepts referred to by THING, MATERIAL, INTANGIBLE, and SITUATION, organized in the hierarchical relationships H^C(MATERIAL, THING) and H^C(SITUATION, INTANGIBLE).
9 http://www.all-in-all.com/
10 The UCI Machine Learning Repository provides different kinds of standard data sets and is available online at http://www.ics.uci.edu/~mlearn/MLSummary.html
Part IV

RELATED WORK & OUTLOOK


Chapter 9

RELATED WORK

This chapter gives a brief overview of related work. Although ontology


learning can be regarded as a new research topic and area, it may fall back on
results that have been established in different, existing research communities.
Giving an overview of related work relevant to this book is not an easy un-
dertaking. In general there may be two ways of organizing relevant literature:
First, one may classify related work along "research communities and areas",
e.g. the following research communities deal with techniques and approaches
related to the ontology learning task:
• Natural Language Processing is the first area one may look at. Trying to build a system that understands natural language has a long tradition. Typically these systems are built on large amounts of domain knowledge. Thus, the natural language processing community started research early on in semi-automatically establishing domain knowledge. Along these lines, machine readable dictionaries have also been exploited for semantic knowledge (see (Vanderwende, 1995; Richardson, 1997; Ide and Veronis, 1995; Jannink and Wiederhold, 1999)).
Information Extraction is one application of NLP that also uses a notion of ontology to fill templates with instances. Some work has been done on constructing these templates automatically from a given set of domain texts (see (Freitag, 1998; Yangarber et al., 2000)).
• The database community has done research in the context of database reverse engineering, namely in building semantic data models based on given, existing databases (see (Mueller et al., 2000; Tari et al., 1998; Fong, 1997; Ramanathan and Hodges, 1997)). The new research area in databases, data mining, also explores methods for extracting semantic relations; e.g., the basic algorithms for discovering association rules (see (Agrawal et al., 1993; Han and Kamber, 2001)) have been investigated by the database community.
• The machine learning community has a long research tradition in learning from all kinds of data. One may distinguish between propositional and non-propositional algorithms; the latter are further researched by the inductive logic programming community (see (Muggleton, 1992)).
• Research on extracting domain knowledge from web documents has also been done in the information retrieval community, targeting better access to documents. Especially the clustering of term hierarchies has been researched (see for example (Sanderson and Croft, 1999)).
• The research area of terminology has much experience in acquiring and modeling terminologies (e.g., (Biebow and Szulman, 1999; Daille, 1996)). This work concentrates mainly on the extraction of terms from a given set of document resources.
• The knowledge engineering and acquisition community is a classic field that deals with modeling knowledge-based systems. Within the knowledge engineering community, mechanisms for semi-automatically acquiring conceptual knowledge to support knowledge acquisition have been researched for a long time, e.g. the work done by (Skuce et al., 1985; Reimer, 1990; Szpakowicz, 1990).
All of these research communities and areas have (mostly) independently
analyzed and explored methods and algorithms that may be subsumed under
the term ontology learning. The list given above has to be considered as non-
exhaustive.
As mentioned above there is a second possibility for organizing related work: one may introduce related work according to the "organization of the overall book", namely ontology engineering, data import & processing, algorithms, etc. In this chapter the second approach is followed as a way to provide the reader with a comprehensive overview of existing and related work. Figure 9.1 depicts a taxonomy of related work. Related work is mainly distinguished between work on ontology engineering, on knowledge acquisition (KA) and machine learning (ML) frameworks, on data import & processing, on algorithms, and on evaluation.
The following five sections will elaborate further on these main categories
of related work.

1. Related Work on Ontology Engineering


In this section existing and related work in the area of ontology engineering
is presented. With respect to the research described in this book the work may
be roughly separated into
[Figure 9.1. Taxonomy of Related Work]

• methodologies for ontology engineering,
• methods and tools for ontology engineering, especially ontology engineering for the Semantic Web, and
• methods and tools for ontology merging.

In the following, an overview of the most relevant existing work in these areas is provided.

Methodologies for Ontology Engineering. In the past years only a few research groups have proposed methodological approaches guiding the ontology development process. Uschold's generic suggestions were the first methodological outlines, proposed in 1995 on the basis of the experience gathered in developing the Enterprise Ontology (see (Uschold and King, 1995)). The methodological outlines may be separated into five core guidelines, namely the identification of the purpose, the building of the ontology (separated into capturing, coding and integrating), evaluation, and documentation. On the basis of the Toronto Virtual Enterprise (TOVE) project, Grueninger and Uschold described ontology development steps in (Uschold and Gruninger, 1996). At the same time METHONTOLOGY by (Gomez-Perez, 1996) appeared. In parallel, the more philosophical viewpoint on ontology has evolved towards an engineering discipline. (Guarino and Welty, 2000) demonstrate how some methodology efforts founded on analytic notions that have been drawn from philosophy can be used as formal tools of ontological analysis. A more linguistic viewpoint on ontology has been provided by Kathleen Dahlgren. She defends the choice of a linguistically-based content ontology for NLP and demonstrates that a single common-sense ontology produces plausible interpretations at all levels from parsing through reasoning (see (Dahlgren, 1995)). The explicit relationship between ontology construction and natural language has also been researched by (Bateman, 1993; Bateman, 1995). In this work he distinguishes several different classes of "ontology", each with its own characteristics and principles.
In contrast to the proposal described here, the tight interaction between the ontology and the knowledge base and their relation to the lexicon is not analyzed in depth in these approaches. The reader may note that it is not intended to provide a complete methodology within the work described here. A comprehensive methodology for setting up ontology-based systems, including and extending the aspects introduced above, has been developed at our institute^1. The approach of layered ontology engineering mainly builds a foundation for ontology learning. Thus, its main focus is on the interaction between natural language and ontologies. Ontology learning is considered as only one (of many) possible, different approaches supporting the difficult ontology engineering task.

Ontology Engineering Environments. A number of tools have been devel-


oped for ontology engineering. An outdated survey on tools for developing
and maintaining ontologies is given in (Benjamins et al., 1999). In this sur-
vey different ontology engineering tools are evaluated using two ontologies: a
simple one about people working and studying at a university, and a second
more complex one describing "university studies in the Netherlands". Their
empirical evaluation of the different tools was conducted using a framework,
which incorporates aspects of ontology building and testing, as well as coop-
eration with other users. The evaluation is conducted on three dimensions: a
first dimension evaluating the tools like normal programs (e.g. user-interface
and actions supported), the second dimension refers to ontology-related issues,
like the help on ontology building and the high-level primitives provided, the
third dimension is that of cooperation, viz. supporting ontology engineering by
several people at different locations.
A short overview is given here of two up-to-date ontology engineering environments, Protege and OilEd, that currently "compete" with ONTOEDIT.

Protege. The Protege ontology editor was developed at Stanford Medical Informatics (SMI) and has a 10-year history in the area of knowledge acquisition (Grosso et al., 1999). Protege is a tool which allows the user to construct a domain ontology, customize knowledge-acquisition forms, and enter domain knowledge. Additionally, it may be considered a platform which can be extended with graphical widgets for tables, diagrams, and animation components to access other knowledge-based systems and embedded applications. Finally, it is also a "library" which other applications can use to access and display knowledge bases.
Related Work 207

The Protege methodology, to which the tool belongs, allows system builders to construct software systems from modular components, including reusable frameworks for assembling domain models and reusable domain-independent problem-solving models that implement procedural strategies for solving tasks. The idea behind this methodology is that the ontology editor and the layout editor are supporting tools for the final generation of a knowledge acquisition tool for entering instances. In contrast to the work described here, ONTOEDIT does not focus on the automatic generation of knowledge acquisition interfaces. In general there is a trade-off between the automatic generation of user interfaces and the question of how ergonomic these user interfaces are. One example of this trade-off is the relation hierarchy: naturally, hierarchical relations may be defined by a simple template-based interface; however, users will prefer to have at least a tree-oriented user interface available for the definition of a relation hierarchy.

OilEd. OilEd has been developed at the University of Manchester. OilEd is a simple ontology editor that supports the construction of OIL-based ontologies using the FaCT description logics reasoner (Horrocks, 1998). The central component used throughout OilEd is the notion of a frame description, which consists of a collection of superclasses along with a list of slot constraints. Where OilEd differs from classical ontology engineering tools is that wherever a class name can appear, a recursively defined, anonymous frame description can be used.

Ontology Engineering for the Semantic Web. Work on engineering ontologies for the Semantic Web is still at an early stage. Several ontology languages have been proposed for the Semantic Web, such as RDF(S) and DAML+OIL. However, comprehensive engineering support for instantiating these languages is still lacking.

An interesting approach is described in (Noy et al., 2000), where the existing knowledge model of the Protege ontology editor is extended for Semantic Web languages. Along the same lines, the authors describe in (Noy et al., 2001) how Protege-2000 can be adapted for editing models in different Semantic Web languages. This is motivated by the opinion that "developers will likely create many different representation languages to embrace the heterogeneous nature" of the Web.
ONTOEDIT and its semantic pattern approach go beyond the proposed DAML+OIL primitives and allow the definition of complex structured axioms. Thus, a semantic pattern does not only comprise new epistemological primitives; like design patterns, it also serves as a means for communication, cataloguing, reverse-engineering, and problem-solving. Thus, it may contribute to a more efficient exploitation of Semantic Web techniques and support the engineering of more complex ontologies on the Semantic Web.

Ontology Merging. Several systems and frameworks for supporting the knowledge engineer in the ontology merging task have recently been proposed. The approaches mainly rely on syntactic and semantic matching heuristics which are derived from the behavior of ontology engineers when confronted with the task of merging ontologies, i.e. human behaviour is simulated. Although some of them locally use different kinds of logics for comparisons (e.g. description logics), these approaches do not offer a structural description of the global merging process.
A first approach for supporting the merging of ontologies is described in
(Hovy, 1998). There, several heuristics are described for identifying corre-
sponding concepts in different ontologies, e.g. comparing the names of two
concepts, comparing the natural language definitions of two concepts by lin-
guistic techniques, and checking the closeness of two concepts in the concept
hierarchy.
The OntoMorph system (Chalupsky, 2000) offers two kinds of mechanisms
for translating and merging ontologies: syntactic rewriting supports the trans-
lation between two different knowledge representation languages and semantic
rewriting offers means for inference-based transformations. It explicitly allows violating the preservation of semantics in exchange for a more expressive, flexible transformation mechanism.
In (McGuinness et al., 2000) the Chimaera system is described. It provides
support for the merging of ontological terms from different sources, for check-
ing the coverage and correctness of ontologies and for maintaining ontologies
over time. Chimaera supports the merging of ontologies by coalescing two
semantically identical terms from different ontologies and by identifying terms
that should be related by subsumption or disjointness relationships. Chimaera
offers a broad collection of functions, but the underlying assumptions about
structural properties of the ontologies at hand are not made explicit.
Prompt (Noy and Musen, 2000; Noy and Musen, 2001) is an algorithm for
ontology merging and alignment embedded in Protege-2000. It starts with the
identification of matching class names. Based on this initial step an iterative
approach is carried out for performing automatic updates, finding resulting
conflicts, and making suggestions to remove these conflicts. The work is im-
plemented as an extension to the Protege-2000 knowledge acquisition tool and
offers a collection of implemented operations for merging two classes and re-
lated slots.
The tools described above offer extensive merging functionality, most of
them based on syntactic and semantic matching heuristics, which are derived
from the behaviour of ontology engineers when confronted with the task of
merging ontologies. OntoMorph and Chimarea use a description logics based
approach that influences the merging process locally, e. g. checking subsump-
tion relationships between terms. None of these approaches offers a structural

description of the global merging process. FCA-MERGE can be regarded as


complementary to existing work offering a structural description of the overall
merging process with an underlying mathematical framework.

The work closest to the approach described in this book is described in (Schmitt and Saake, 1997). They apply Formal Concept Analysis to a related problem, namely database schema integration. Similar to the approach described here, a knowledge engineer has to interpret the results in order to make modeling decisions. Thus, the technique described here differs with respect to two points: there is no need for knowledge acquisition from a domain expert in the preprocessing phase, and it additionally suggests new concepts and relations for the target ontology. Nevertheless, a combination of both approaches, syntactic and semantic matching and structural descriptions based e.g. on formal concept analysis like FCA-MERGE, may be worth pursuing in future research.

2. Related Work on Knowledge Acquisition and Machine Learning Frameworks
This section deals with the analysis of existing work in the area of combining manual knowledge acquisition (KA) with machine learning (ML). Looking at existing work in the area of "knowledge acquisition and machine learning frameworks", the general question that has to be asked is the following:

"How are machine learning and knowledge acquisition related?"

It appears obvious to both scientific communities that it is necessary to establish bridges and links. Nevertheless, there is no obvious answer; most of the researchers have no clear idea of the way this integration could be achieved. Moreover, the two communities have different methodologies to deal with the same matter, i.e., knowledge. They have such cultural differences that the dialog is difficult. On the one hand, people coming from machine learning are used to building efficient and well-designed algorithms; thus, they do not understand the discussions revolving around the notion of a model. Considering the algorithms and the programs built by the knowledge acquisition people, they mainly see editors or graphic displays. On the other hand, people coming from the knowledge acquisition community think that machine learning can only be applied to trivial tasks where the knowledge representation has been previously defined, i.e., where the knowledge acquisition process has been almost completed. Even if it is schematic, this view summarizes the general attitude in these two communities. Practically, it explains why, most of the time, the attempts to integrate machine learning and knowledge acquisition depend on the community of origin: people coming from machine learning usually think one has to add some graphic environment and some editors to their algorithms, while people coming from knowledge acquisition think that one can insert machine learning algorithms, considered as black boxes, into knowledge acquisition environments and tools.
The experiences in this work have shown that a tight integration between the task of manually engineering an ontology and automatically generating concepts and conceptual relationships is required. In the following, a short introduction is given to existing work on combining KA and ML on structured data (such as the tuples in a given database), and an overview is provided of the work that tries to combine KA and ML for application to natural language texts.

KA & ML Frameworks for Structured Data. The literature contains a


number of case studies and systems demonstrating successful applications of
techniques for integrating machine learning with knowledge acquisition. One
of the first approaches to combine know ledge acquisition with machine learning
has been described in (Morik, 1990) with the BLIP system. Their work has
been continued (e.g. (Morik et aI., 1993a)) by presenting the powerful MOBAL
system. MOBAL is based on first-order representation and supports a number
of knowledge acquisition tasks using machine learning techniques (see (Kietz
and Morik, 1994) for taxonomy generation by learning from the A-Box).
In (Buntime and Stirling, 1991) the development of an expert system for
routing in manufacturing of coated steel products is described. Their approach
of interactive induction presents acquired rules to an expert, who has to validate
them and place restrictions on the final rules.
Nedellec and Causse describe two case studies using their tool APT in (Nedellec and Causse, 1992). The application domains covered are the design of loudspeakers and the evaluation of commercial loan applications. The second case study provides a comparison between knowledge acquisition with and without the integrated use of machine learning and shows that the application of machine learning results in positive effects for the refinement of the overall system.
Webb (Webb, 1996) describes a case study in which undergraduate computer science students used his Knowledge Factory to produce expert systems for an artificial medical domain. In (Webb et al., 1999) they demonstrate that the integration of machine learning and knowledge acquisition improves the accuracy of the developed knowledge base and reduces the development time. Their evaluation is also based on human subjects with minimal expertise in knowledge engineering and limited training in the use of the software. The evaluation is done using several different dimensions like expert system quality, acquisition difficulty and/or knowledge acquisition time. The overall case study focuses on a restricted part of the knowledge acquisition cycle, namely the "formulation, testing and refinement of rules once an appropriate class of model and vocabulary have been defined", and shows a significant improvement to the overall development tasks by using machine learning techniques.

KA & ML Frameworks for Text. Seminal work on integrating knowledge


acquisition and machine learning from texts has been introduced by (Skuce et aI.,
1985; Reimer, 1990; Szpakowicz, 1990), e.g. in (Reimer, 1990) an overview of
the wit system is given. The idea of the system is that it understands technical
texts and builds representations of the concepts described therein. Similar to
the ontology learning framework described, the system pursues a bootstrapping
approach where only small domain-specific world knowledge is needed by wit
to begin its operation.
In (Rousselot et al., 1996) the linguistic and knowledge engineering station STARTEX is presented, intended to help build an ontology from texts. The system consists of several modules for the extraction of terms and relations of a given domain. The modules are mainly restricted to simulating the text scanning which "a terminologist uses to analyse a corpus".
Mikheev & Finch (Mikheev and Finch, 1997) have presented their KAWB
Workbench for "Acquisition of Domain Knowledge from Natural Language".
The workbench compromises a set of computational tools for uncovering in-
ternal structure in natural language texts. The main idea behind the workbench
is the independence of the text representation and text analysis phases. At the
representation phase the text is converted from a sequence of characters to fea-
tures of interest by means of the annotation tools. At the analysis phase those
features are used by statistics gathering and inference tools for finding signifi-
cant correlations in the texts. The analysis tools are independent of particular
assumptions about the nature of the feature-set and work on the abstract level
of feature elements represented as SGML items.
In (Faure and Nedellec, 1998; Faure and Poibeau, 2000) the cooperative machine learning system ASIUM, which acquires taxonomic relations and subcategorization frames of verbs based on syntactic input, is presented. The ASIUM system hierarchically clusters nouns based on the verbs to which they are syntactically related, and vice versa. Thus, they cooperatively extend the lexicon, the set of concepts, and the concept hierarchy (L^C, C, H^C).
In (Engels et al., 2001) the commercial Corporum workbench has been described. The authors provide a description of a technical solution aimed at helping the Web to become more semantic. A specific feature of the overall Corporum workbench is the OntoExtract component, which is directed at the generation of a light-weight ontology based on linguistic analysis.
(Grefenstette, 1994) proposes mechanisms for automatic thesaurus discov-
ery. In his work methods are developed and evaluated for creating a first-draft
thesaurus from raw text. It describes natural language processing steps of to-
kenization, surface syntactic analysis, and syntactic attribute extraction. From
212 ONTOLOGY LEARNING FOR THE SEMANTIC WEB

these attributes, word and term similarity is calculated. A thesaurus is created


showing important common terms and their relation to each other: common verb-noun pairings, common expressions, and word family members. The techniques are tested on twenty different corpora, ranging from baseball newsgroups and assassination archives to the textbook itself. The results are shown to converge to a stable state as the corpus grows. In contrast to the work described here, Grefenstette is not interested in typing the relationships between terms (e.g., the difference between taxonomic and non-taxonomic relationships). Additionally, the notion of an ontology and a clear separation between terms on the one hand and concepts on the other hand are not given in his work. Nevertheless, the work described by Grefenstette shows that the combination of statistics and shallow linguistic processing techniques significantly outperforms non-linguistic-based techniques for the most important words in corpora. Thus, it conforms with the approach described here, which also relies on shallow processing as a preprocessing step.
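To make the flavor of such computations concrete, the following sketch compares words by the syntactic attributes they share. It is a minimal illustration over invented attribute sets; Grefenstette's actual method uses weighted similarity measures over attributes extracted from parsed corpora, not the plain Jaccard coefficient used here.

# Minimal sketch: word similarity from shared syntactic attributes.
# Attribute sets are invented toy data; a real system derives them
# from shallow parses of a large corpus and weights them.

def jaccard(a, b):
    """Share of attributes two words have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Each word is described by (relation, context-word) attributes, e.g.
# ("obj-of", "eat") means the word occurred as the object of "eat".
attributes = {
    "apple":  {("obj-of", "eat"), ("mod", "ripe"), ("subj-of", "fall")},
    "orange": {("obj-of", "eat"), ("mod", "ripe"), ("obj-of", "peel")},
    "car":    {("obj-of", "drive"), ("mod", "red"), ("obj-of", "park")},
}

for w1, w2 in [("apple", "orange"), ("apple", "car")]:
    print(w1, w2, jaccard(attributes[w1], attributes[w2]))
# apple/orange share two contexts (0.5); apple/car share none (0.0),
# so only the former pair would enter a first-draft thesaurus.
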
In contrast to the tools and approaches described above, the approach described in this book defines a common framework into which extraction and maintenance mechanisms may be easily plugged. In addition, a tight integration with a manual engineering system is provided, allowing semi-automatic bootstrapping of a domain ontology. An important aspect of the framework is that means are offered for evaluating the quality of the ontology learning results.

3. Related Work on Data Import & Processing


This section describes related work referring to chapter 5, which dealt with techniques for data import & processing. Again, the task of discovering, accessing, analyzing and transforming existing data for a specific goal is a very wide field. The focus here is on the most important tasks as described in chapter 5.
As mentioned in Section 2, the techniques for data import & processing may be compared with the so-called "preprocessing" phase well-known from the area of knowledge discovery in databases. In recent years it has become obvious that for applying techniques established in machine learning research to real-life applications, the task of adequate preprocessing of the available data is most important. The same also holds for the task of ontology learning. The relevant preprocessing steps for ontology learning that have been presented in chapter 5 may be subsumed under the following generic points:
• Defining and selecting task relevant data.
• Extracting "features" from the selected data.
• Transforming the features into an algorithm-appropriate representation.
In the following, existing work is discussed with respect to the points introduced above, namely focused crawling, linguistic processing, and the document wrapper.

Focused Crawling. The need for focused crawling in general has recently been recognized by several researchers. The main target of all of these approaches is to focus the search of the crawler and to enable goal-directed crawling. (Chakrabarti et al., 1999) present a generic architecture of a focused crawler. The crawler uses a set of predefined documents associated with topics in a Yahoo-like taxonomy to build a focused crawler. Two hypertext mining algorithms build the core of their approach: a classifier evaluates the relevance of a hypertext document with respect to the focus topics, and a distiller identifies hypertext nodes that are good access points to many relevant pages within a few links. The approach presented in (Diligenti et al., 2000) uses so-called context graphs as a means to model the paths leading to relevant web pages. Context graphs in their sense represent link hierarchies within which relevant web pages occur, together with the context of these pages. (Rennie and McCallum, 1999) propose a machine learning oriented approach to focused crawling. Their crawler uses reinforcement learning to learn to choose the next link such that over time a reward is maximized. A problem of their approach may be that the method requires large collections of already visited web pages.

In contrast to the focused crawlers introduced above, the crawler proposed here uses linguistic knowledge combined with the background knowledge contained in the ontology to focus the document crawling process.
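The following sketch shows the generic skeleton shared by such crawlers: a best-first loop over a priority queue of links. The relevance function is deliberately left abstract; it stands in for a trained classifier, context graphs, or, as proposed here, scoring against ontological background knowledge. fetch and extract_links are placeholder callables, and the threshold is an arbitrary assumption.

import heapq

def focused_crawl(seeds, relevance, fetch, extract_links, budget=1000):
    """Best-first crawl: expand the most promising links first."""
    frontier = [(-1.0, url) for url in seeds]   # max-heap via negated scores
    heapq.heapify(frontier)
    visited, relevant = set(), []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        page = fetch(url)
        score = relevance(page)                 # pluggable focus criterion
        if score > 0.5:                         # threshold is an assumption
            relevant.append(url)
        for link in extract_links(page):
            if link not in visited:
                # simple heuristic: a link inherits its parent's relevance
                heapq.heappush(frontier, (-score, link))
    return relevant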

Linguistic Processing and Feature Extraction. The discussion of how useful linguistic annotations are for the machine learning task is an old one. For supervised learning tasks there exist quite a number of evaluations of how document preprocessing strategies perform (e.g., (Fuernkranz et al., 1998)). There are only a few corresponding results for unsupervised learning tasks like the mechanisms applied for ontology learning.
A general message of the research cited above is that one has to be careful in how linguistic processing techniques are handled. As described here, natural language texts may be processed at different linguistic levels (from morphology to sentence parsing). Indeed, real-world experiences have shown that specific types of documents may be successfully processed using shallow linguistic processing combined with domain-specific heuristics.

Document wrapper. Semi-structured data is typically processed by so-called wrappers. The construction of a wrapper can be done manually, or by using a semi-automatic (Sahuguet and Azavant, 1999) or automatic approach (Kushmerick et al., 1997; Ashish and Knoblock, 1997). For the low-level task of transforming a given, more or less well-structured dictionary into the internal representation described here, one may easily define the required mapping and extraction rules manually. However, the reader may note that recently a number of tools (e.g., (Sahuguet and Azavant, 1999)) and approaches (see (Kushmerick et al., 1997; Ashish and Knoblock, 1997)) for supporting the manual or (semi-)automatic construction of wrappers have been developed.

4. Related Work on Algorithms


This section deals with the description of related work for an important part of this book, namely the algorithm library for extracting and maintaining ontologies as introduced in Chapter 6. As mentioned earlier, the related work is organized according to the elements for which extraction and maintenance support is provided in this book.

Lexical Entry Extraction. At the lowest level of ontology learning one typically has to deal with the task of extracting lexical entries referring to concepts and relations. A short overview is given of how this task has been approached in different research communities. Much work has been done in the area of lexical acquisition. Lexical acquisition deals with the task of acquiring syntactic and semantic classifications of unknown words. A comprehensive overview of lexical acquisition in the context of information extraction is given in (Basili and Pazienza, 1997). In this overview paper the authors identify the following methodological areas for lexical acquisition:
• Statistical induction using collocations, syntactic features, or lexemes.
• Logical induction using symbolic representations at word, phrase or sen-
tence level.
• Machine readable dictionary (MRD) and lexical knowledge base extrac-
tion including all methods that deal with some systematic sources like dic-
tionaries (like LDOCE) or general purpose lexical knowledge bases (like
WordNet).
• Quantitative machine learning referring to all other inductive methods that
are not purely statistical (e.g. neural networks).
Following these methodological areas, the attention here is mainly restricted to the first three points. In contrast to existing work, these different methodological areas are combined into a common view for ontology learning on multiple sources. In the following, a short overview is given of how these methodological areas have been approached in existing work. The terminology research community focuses on the extraction of terminologies from given data. The tool
TERMINAE, introduced by (Biebow and Szulman, 1999), supports the acquisition of a lexicon and concepts. A study and implementation of combined techniques for the automatic extraction of terminology has been presented in (Daille, 1996). The author explores a method in which co-occurrences of interest are defined in terms of surface syntactic relationships rather than the proximity of words or tags within a fixed window (e.g., an n-gram approach). In her evaluation she finds that filtering based on even shallow, a priori linguistic knowledge proves useful for the task of terminology extraction. Additionally, a number of alternative statistics are explored and compared, with the aim of identifying which of them is best suited for detecting the lexical patterns that constitute a domain-specific terminology.
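A minimal sketch of this kind of shallow, a priori linguistic filtering is given below: part-of-speech bigram patterns admit term-like candidates, which are then ranked by frequency. The tagged toy sentence and the pattern set are assumptions, and a real system would replace raw frequency by one of the association statistics discussed by Daille.

from collections import Counter

# Toy POS-tagged text; a tagger would produce this from raw input.
tagged = [("the", "DET"), ("electric", "ADJ"), ("network", "NOUN"),
          ("planning", "NOUN"), ("of", "ADP"), ("electric", "ADJ"),
          ("network", "NOUN"), ("operators", "NOUN")]

PATTERNS = {("ADJ", "NOUN"), ("NOUN", "NOUN")}  # term-like POS bigrams

# Keep only bigrams whose POS sequence matches a pattern, then count.
candidates = Counter(
    (w1, w2)
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
    if (t1, t2) in PATTERNS
)
for (w1, w2), freq in candidates.most_common():
    print(w1, w2, freq)
# "electric network" occurs twice and is ranked first; "the electric"
# never becomes a candidate because DET-ADJ matches no pattern.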

Concept Hierarchy Extraction. As mentioned earlier, (Faure and Nedellec, 1998) have presented a cooperative machine learning system called ASIUM which is able to acquire semantic knowledge from syntactic parsing. The ASIUM system is based on a conceptual and hierarchical clustering algorithm. Basic clusters are formed from head words that occur with the same verb after the same preposition. ASIUM successively aggregates clusters to form new concepts, and the hierarchies of concepts form the ontology. The ASIUM approach differs from the approach in this work in that its relation learning is restricted to taxonomic relations.
In the area of information retrieval, some work on automatically deriving a hierarchical organization of concepts from a set of documents without the use of training data or standard clustering techniques has been presented by (Sanderson and Croft, 1999). They use a subsumption criterion to organize the salient words and phrases extracted from documents hierarchically.
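A sketch of such a document co-occurrence subsumption test is given below: a term x is taken as a parent of y if x occurs in (nearly) all documents containing y, but not vice versa. The 0.8 threshold and the toy document sets are assumptions for illustration, not necessarily the exact parameters of Sanderson and Croft.

# Toy index: for each term, the set of documents it occurs in.
docs_with = {
    "animal": {1, 2, 3, 4, 5, 6},
    "dog":    {1, 2, 3},
    "cat":    {4, 5},
}

def subsumes(x, y, threshold=0.8):
    """x subsumes y if x accompanies y almost always, but not vice versa."""
    dx, dy = docs_with[x], docs_with[y]
    p_x_given_y = len(dx & dy) / len(dy)
    p_y_given_x = len(dx & dy) / len(dx)
    return p_x_given_y >= threshold and p_y_given_x < 1.0

print(subsumes("animal", "dog"))   # True: "animal" covers every "dog" document
print(subsumes("dog", "animal"))   # False: "dog" covers only half of them
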
The work of (Assadi, 1999) reports on a practical experiment in constructing a regional ontology in the field of electric network planning. He describes a clustering approach that combines linguistic and conceptual criteria. As an example he gives the pattern <NP, line>, which results in two categorizations by modifiers. The first categorization is motivated by the function_of_structure modifiers, resulting in a clustering of connection line, dispatching line and transport line (see Table 9.1). For the other concepts the background knowledge lacked adequate specifications, so that no further categorizations could be proposed.
In (Hofmann, 1999) a novel statistical latent class model is used for text mining and interactive information access. In his work the author introduces a Cluster-Abstraction Model (CAM) that is purely data-driven and utilizes context-specific word occurrence statistics. CAM extracts hierarchical relations between groups of documents as well as an abstract organization of keywords.

A proposed categorization      The other candidate terms

connection line                mountain line
dispatching line               telecommunication line
transport line                 input line

Table 9.1. Example Categorization

The idea of using lexico-syntactic patterns in the form of regular expressions for the extraction of semantic relations, in particular taxonomic relations, has been introduced by (Hearst, 1992). In this approach the text is scanned for instances of distinguished lexico-syntactic patterns that indicate a relation of interest, e.g. the taxonomic relation. Along the same lines, (Morin, 1999) uses lexico-syntactic patterns without background knowledge to acquire taxonomic knowledge. In his work he extends the work proposed by (Hearst, 1992) by using a symbolic machine learning tool to refine lexico-syntactic patterns. In this context the PROMETHEE system has been presented, which supports the semi-automatic acquisition of semantic relations and the refinement of lexico-syntactic patterns.
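The sketch below illustrates the principle with one of Hearst's well-known patterns, "NP such as NP, NP, and NP". Matching raw word strings instead of noun phrase chunks is a simplification; real implementations anchor the patterns on the output of a shallow parser.

import re

# One lexico-syntactic pattern indicating hyponymy: "X such as Y, Z, ..."
SUCH_AS = re.compile(r"(\w+) such as ([\w ,]+)")

def hearst_such_as(text):
    relations = []
    for m in SUCH_AS.finditer(text):
        hypernym = m.group(1)
        hyponyms = [t.strip()
                    for t in re.split(r",|\band\b|\bor\b", m.group(2))
                    if t.strip()]
        relations += [(h, hypernym) for h in hyponyms]
    return relations

print(hearst_such_as("He plays instruments such as guitar, piano, and violin."))
# [('guitar', 'instruments'), ('piano', 'instruments'), ('violin', 'instruments')]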

Learning from Dictionaries. One way to acquire semantic knowledge is to use existing repositories of lexical knowledge, such as dictionaries and thesauri. Several researchers have taken steps towards the extraction of useful lexical information from machine readable dictionaries. An overview article on learning semantics from dictionaries is given in (Ide and Veronis, 1995).
Microsoft's MindNet (Vanderwende, 1995; Richardson, 1997) is an ambitious project for acquiring, structuring, assessing and exploiting semantic information from natural language text, particularly structured text in the form of dictionaries. In (Richardson, 1997) the functionality of MindNet is described, including broad-coverage parsing, the extraction of different labeled semantic relations, and mechanisms for similarity computation and inference. However, it remains difficult to judge the quality of the overall approach because it lacks any formal evaluation.
(Jannink and Wiederhold, 1999) have introduced a new algorithm called ArcRank for learning from dictionaries. The algorithm is based on a model of relationships between nodes in a directed labeled graph and is used for the extraction of hierarchical relationships between words in a dictionary. The work is motivated by the goal of integrating databases whose content is similar but whose terms are different.

In contrast to the research described above, in this work the idea is pursued that the construction of semantic knowledge requires the combination of information from multiple sources (according to (Ide and Veronis, 1995)). Clearly, coupled with information from other sources (like free texts) and subjected to by-hand amelioration, the structures extracted from dictionaries are a valuable resource for building ontologies.

Non-taxonomic Relation Extraction. For purposes of natural language processing, several researchers (Basili et al., 1993; Resnik, 1993; Wiemer-Hastings et al., 1998) have researched the acquisition of verb meaning and, in particular, subcategorizations of verb frames. Resnik (Resnik, 1993) has done some of the earliest work in this category. His model is based on the distribution of predicates and their arguments in order to find selectional constraints and to reject semantically illegitimate propositions like "The number 2 is blue." His approach combines information-theoretic measures with background knowledge of a hierarchy given by the WordNet taxonomy. He is able to partially account for the appropriate level of relations within the taxonomy by trading off a marginal class probability against a conditional class probability, but he does not give any application-independent evaluation measures for his approach. He considers the question of finding appropriate levels of generalization within a taxonomy to be very intriguing and concedes that further research is required on this topic (see p. 123f in (Resnik, 1993)).
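The following sketch illustrates the spirit of Resnik's information-theoretic model with invented numbers: the selectional preference strength of a verb is the KL divergence between the class distribution of its arguments and the class prior, and each class's contribution to it is its selectional association with the verb. The distributions below are assumptions for illustration.

from math import log

p_class = {"food": 0.3, "artifact": 0.5, "abstraction": 0.2}  # prior P(c)
p_class_given_verb = {"food": 0.8, "artifact": 0.15,
                      "abstraction": 0.05}                    # P(c | "eat")

# Selectional preference strength S(eat): KL divergence of the two.
strength = sum(p * log(p / p_class[c])
               for c, p in p_class_given_verb.items())

for c, p in p_class_given_verb.items():
    association = p * log(p / p_class[c]) / strength          # A(eat, c)
    print(f"A(eat, {c}) = {association:.2f}")
# Only "food" gets a positive association; dispreferred classes come out
# negative, which is what allows implausible propositions to be rejected.
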
In (Basili et al., 1993) a technique for the acquisition of statistically significant selectional restrictions from corpora is introduced. Selectional restrictions are acquired by a two-step approach: First, statistically prevailing coarse-grained conceptual patterns are used by a linguist to identify the relevant selectional restrictions in sublanguages. Second, the semi-automatically acquired coarse selectional restrictions are used as the semantic bias of a system called Aristo-Lex for the automatic acquisition of case-based semantic lexicons.
The proposal by Byrd and Ravin (Cooper and Byrd, 1997; Byrd and Ravin, 1999) comes close to the work described here. The target is to design a document search and retrieval system termed "Lexical Navigation" which provides an interface allowing the user to expand or refine a query based on the actual content of the collection. Thus, their idea is to use a lexical network containing domain-specific vocabularies and relationships that are automatically extracted from the collection. They extract named relations when they find particular syntactic patterns, such as an appositive phrase. They derive unnamed relations from concepts that co-occur, by calculating a measure of mutual information between terms - similar to what is done here. Ultimately, it is hard to assess their approach, as their description is rather high-level and lacks concise definitions.
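The sketch below shows the kind of co-occurrence statistic involved: pointwise mutual information compares how often two terms appear together with how often they would co-occur by chance. The counts are invented, and whether this is exactly the measure Byrd and Ravin compute cannot be determined from their description.

from math import log2

N = 10_000                  # number of text windows inspected
count = {"hotel": 120, "reservation": 80, "banana": 90}
cooc = {("hotel", "reservation"): 40, ("hotel", "banana"): 1}

def pmi(x, y):
    """Pointwise mutual information between two terms."""
    p_xy = cooc[(x, y)] / N
    return log2(p_xy / ((count[x] / N) * (count[y] / N)))

print(f"PMI(hotel, reservation) = {pmi('hotel', 'reservation'):.1f}")  # about 5.4
print(f"PMI(hotel, banana)      = {pmi('hotel', 'banana'):.1f}")       # about -0.1
# A high score suggests an (unnamed) relation worth presenting to the user.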

To contrast the proposed approach with the research just cited, the reader may note that all the verb-centered approaches may miss important conceptual relations not mediated by verbs. Regarding evaluation, they have only appealed to the intuition of the reader (Byrd and Ravin, 1999; Faure and Nedellec, 1998) or used application-dependent evaluation measures.
In the area of text mining, (Feldman and Hirsh, 1996) have presented an approach for association mining in the presence of background knowledge. In this paper the system FACT for knowledge discovery from text is presented. It is keyword-oriented and offers a query-centered mechanism for extracting associations. Background knowledge is used to constrain the desired results of the query process. The evaluation of their approach is restricted to efficiency, without considering the quality of the extracted associations.
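A generic illustration of keyword association mining under a background-knowledge constraint is sketched below (it is not FACT's actual implementation): rules are kept only if support and confidence clear a threshold and the conclusion belongs to a category required by the query. All data and thresholds are invented.

from itertools import permutations

documents = [
    {"iran", "arms", "scandal"},
    {"iran", "arms", "embargo"},
    {"iran", "oil"},
    {"iraq", "oil"},
]
category = {"iran": "country", "iraq": "country", "arms": "topic",
            "oil": "topic", "scandal": "topic", "embargo": "topic"}

def rules(min_support=2, min_confidence=0.6, conclusion_category="topic"):
    keywords = {k for d in documents for k in d}
    for a, b in permutations(keywords, 2):
        support = sum(1 for d in documents if a in d and b in d)
        base = sum(1 for d in documents if a in d)
        confidence = support / base if base else 0.0
        # background knowledge constrains which conclusions are desired
        if (support >= min_support and confidence >= min_confidence
                and category[b] == conclusion_category):
            yield a, b, support, confidence

for a, b, s, c in rules():
    print(f"{a} -> {b} (support={s}, confidence={c:.2f})")
# Only "iran -> arms" survives: it holds in 2 of the 3 iran-documents.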

Machine Learning for Information Extraction. The "marriage" between information extraction and machine learning has been described in (Freitag, 1998). The underlying idea is that rather than spending weeks or months manually adapting an information extraction system to a new domain, one would like a system that can be trained on some sample documents and then be expected to do a reasonable job of extracting information from new ones. In (Yangarber et al., 2000) the authors present an automatic discovery procedure called ExDisco which identifies a set of event patterns from un-annotated text, starting from a small set of seed patterns. Their approach shows a significant performance improvement on actual extraction tasks in contrast to manually constructed systems.

Ontology Maintenance. The data-oriented mechanisms for supporting ontology maintenance have been presented in chapter 6. In the following, an overview of existing work that shares similarities with the proposed approach is given. Research on ontology maintenance is still at an early stage. As mentioned earlier, if one talks about ontology maintenance one may roughly distinguish between the refinement and improvement of a knowledge model and the pruning or deletion of structures contained in a knowledge model.

Ontology Pruning. (Peterson et al., 1998) have described strategies that leave the user with a coherent ontology (i.e. no dangling or broken links). The underlying system of their approach is called Knowledge Bus. It is a system that generates information systems (databases and programming interfaces) from application-focused subsets of the Cyc ontology2. In their approach the following four major components are distinguished: The sub-ontology extractor identifies a domain-relevant section of the ontology. The logic program generator takes the extraction and translates it into a logic program which can be evaluated by a deductive query engine3. Then the API generator takes the logic-based model and exposes it to application developers as an object model through strongly typed object-oriented APIs. Finally, the runtime system supports access to the generated databases.
A similar strategy has been described by (Swartout et al., 1996), where ontology pruning is considered as the task of "intelligent" deletion of ontological structures that leaves the user with a coherent ontology.

In contrast to the work described here, both "ontology pruning" approaches use external decision criteria (e.g., user input) to derive pruning strategies. The pruning approach described here works bottom-up by looking at domain-specific texts.

Ontology Refinement. Hahn and Schnattinger (Hahn and Schnattinger, 1998) introduced a methodology for the maintenance of domain-specific taxonomies. An ontology is incrementally updated as new concepts are acquired from real-world texts. The acquisition process is centered around the linguistic and conceptual "quality" of various forms of evidence underlying the generation and refinement of concept hypotheses.
The system Camille4 was developed as a natural language understanding system: e.g., when the parser comes across words that it does not know, Camille tries to infer whatever it can about the meaning of the unknown word (Hastings, 1994). If the unknown word is a noun, semantic constraints on slot-fillers provided by verbs give useful limitations on what the noun could mean. The meaning of a noun can be derived because constraints are associated with verbs. Learning unknown verbs is more difficult; thus, verb acquisition has been the main focus of the research on Camille. Camille was tested on several real-world domains within information extraction tasks (MUC), where the well-known scoring methods precision and recall, taken from the information retrieval community, have been calculated. For the lexical acquisition task, recall is defined as the percentage of correct hypotheses. A hypothesis was counted as correct if one of the concepts in the hypothesis matched the target concept. Precision is the total number of correct concepts divided by the number of concepts generated in all the hypotheses. Camille has achieved a recall of 42% and a precision of 19% on a set of 50 randomly-selected sentences containing 17 different verbs.
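With these definitions, the scoring reduces to a few lines; the sketch below uses invented hypotheses, where each hypothesis is a set of candidate concepts for one unknown word together with its target concept.

hypotheses = [
    ({"vehicle", "weapon"}, "vehicle"),      # (candidate concepts, target)
    ({"person"}, "organization"),
    ({"building", "facility"}, "facility"),
]

# A hypothesis is correct if the target concept is among its candidates.
correct = sum(1 for candidates, target in hypotheses if target in candidates)
generated = sum(len(candidates) for candidates, _ in hypotheses)

recall = correct / len(hypotheses)   # percentage of correct hypotheses
precision = correct / generated      # correct concepts / generated concepts
print(f"recall = {recall:.2f}, precision = {precision:.2f}")
# recall = 0.67, precision = 0.40 on this toy sample
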

5. Related Work on Evaluation


This section describes related work on techniques for the evaluation of knowledge acquisition and engineering in general5 and the semi-automatic generation of ontologies in specific. In the last decade a number of successful expert systems have been constructed using numerous knowledge engineering techniques. Having the vision of a Semantic Web in mind, more techniques for generating knowledge bases on the web are constantly being evolved. However, there is still little agreement on a range of important issues; e.g., one may ask the following questions:
• How good is a specific knowledge engineering technique A?
• Given the knowledge engineering techniques A and B, which one should be used for some specific problem in a given domain?
• What is a good knowledge acquisition tool?
• How to reduce the ontology construction/maintenance/re-use effort?
Given the current state-of-the-art in empirical methods for KA these questions cannot be answered in a definitive manner. A "good" controlled experiment must have certain features such as addressing some explicit, refutable hypothesis, being repeatable, or precisely defining the measurement techniques. The SISYPHUS experiments (see (Linster, 1992; Schreiber and Birmingham, 1996)) provided a shared framework, a prerequisite for any repeatable experiment. However, the SISYPHUS experiments had no refutable hypothesis, and defined no measures which could permit a rigorous quantitative evaluation of the different techniques.
In the area of knowledge-based systems using ontologies there is very little work describing evaluation of the systems. (Noy et al., 2000) describes an empirical evaluation of the knowledge acquisition tool Protege-2000 with the target of building domain knowledge bases. In this case study military experts are the subjects. They had no experience in knowledge acquisition or computer science in general. Evaluation criteria are defined along several dimensions, namely the knowledge-acquisition rate, the ability to find errors, the quality of knowledge entries, the error-recovery rate, the retention of skills and the subjective opinion. The results document the ability of these subjects to work on a complex knowledge-entry task and highlight the importance of an effective user interface enhancing the knowledge-acquisition process.

As mentioned earlier, an indirect evaluation technique based on the idea of having a gold standard is used in the approach described here. The evaluation of ontology learning performance using this gold standard technique helped in characterizing the effects that different ontology learning techniques have on the results of learning. It provides rough methodological guidelines to help the ontology engineer select the most suitable method for a given corpus or task and to provide support for creating a new one.

Comparing Conceptual Structures. Similarity measures for ontological structures have been widely researched, e.g., in cognitive science, databases, software engineering (Spanoudakis and Constantopoulos, 1994), and AI (e.g., (Rada et al., 1989; Agirre and Rigau, 1996; Hovy, 1998)). Though this research covers many wide areas and application possibilities, all of the research has restricted its attention to the determination of similarity of lexical entries, concepts, and relations mainly within one ontology. The nearest to the proposed comparison between two ontologies come (Bisson, 1992) and (Weinstein and Birmingham, 1999).
(Bisson, 1992) introduces several similarity measures in order to locate a new complex concept in an existing ontology by similarity rather than by logical subsumption. Bisson restricts his attention to the semantic comparison level. In contrast to the work described here, the new concept is described in terms of the existing ontology. Furthermore, he does not distinguish between taxonomic and non-taxonomic relations, thus ignoring the semantics of inheritance. (Weinstein and Birmingham, 1999) compute description compatibility in order to answer queries that are formulated with a conceptual structure that is different from the one of the information system. A comprehensive introduction to the approach of Weinstein is given in (Weinstein, 1990). In contrast to the proposed approach, their measures depend to a very large extent on a shared ontology that mediates between locally extended ontologies. Also, their algorithm seems less suited to evaluating similarities of sets of lexical entries, taxonomies, and relations.

Evaluating Ontology Learning. The first work on systematically evaluating an ontology learning technique has been introduced by (Bisson et al., 2000). In their paper the Mo'K workbench, which supports the development of clustering methods for ontology building, is described. The underlying idea is that the ontology developer is assisted in the exploratory process of defining the most suitable learning methods for a given task. Therefore, the workbench provides facilities for the evaluation, comparison, characterization and elaboration of conceptual clustering methods. Their empirical evaluation has shown that the quality of learning decreases with the generality of the corpus.
Notes
1 A detailed introduction is available online at http://www.ontoknowledge.org/downl/de115.pdf and given in (Staab et al., 2000a; Staab et al., 2001e; Maedche et al., 2001e)
2 http://www.cyc.com/
3 In their approach they use the freely available XSB system, http://xsb.sourceforge.net/
4 Contextual Acquisition Mechanism for Incremental Lexeme Learning
5 A comprehensive overview is available at http://www.cse.unsw.edu.au, a web page maintained by Tim Menzies.
Chapter 10

CONCLUSION & OUTLOOK

This book describes a new approach for semi-automatically extracting and maintaining ontologies from existing Web data towards a Semantic Web. Ontology Learning may add significant leverage to the Semantic Web, because it propels the construction of ontologies, which are needed fast and cheap as a basis for the Semantic Web. Manual ontology engineering has been considered as a starting point in this book. To address the knowledge acquisition bottleneck of defining ontologies, the manual engineering framework has been extended. In extending manual engineering, the paradigm of balanced cooperative modeling (e.g., each modeling step may be done manually or supported automatically by an algorithm) has been pursued. The comprehensive framework for Ontology Learning, crossing the boundaries of single disciplines, has touched on a number of challenges. The good news, however, is that one does not need perfect or optimal algorithmic support for cooperative modeling of ontologies. At least according to the collected experiences, "cheap" methods applied on multiple sources (e.g., data in the form of free text and dictionaries) in an integrated engineering and learning environment may yield tremendous help for the ontology engineer. This has been proven by the evaluation study comparing manual engineering with automatic ontology learning techniques. It has been shown that the low recall of humans may be compensated by automatic means using Web data-driven ontology learning techniques.
The concluding chapter is split into four sections describing the most important contributions of this book, the insights gained, the open questions, and topics for future research.

1. Contributions
The contributions made by this book fall into the following three main areas:

• It addresses the question of how to embed semi-automatic means in the ontology engineering process by providing a comprehensive framework for ontology learning. This framework has been implemented in the ontology learning environment TEXT-TO-ONTO, and applied and evaluated in real-world case studies. An important aspect is that it addresses a wide range of different input data, from existing ontologies to free natural language texts available in large amounts on the current Web.

• It presents several ontology learning techniques that have been adapted from existing work (e.g., hierarchical clustering, association rules, pattern matching) with several extensions for the actual ontology engineering task. A particular emphasis is placed on the important aspect of data import and processing from natural language texts, which heavily influences the quality of ontology learning results. Additionally, the ontology merging technique FCA-MERGE has been presented, which allows combining existing ontologies in order to reuse them within the ontology learning framework.

• It shows how to evaluate ontology learning using several different measures applied within a gold standard setting. It provides a case study for evaluating human modeling based on the evaluation framework, evaluating the proposed ontology learning techniques, and, finally, giving methodological guidelines on how to apply ontology learning.

2. Insights into Ontology Learning


In addition to the major contributions given above, this research provides several additional insights into ontology learning. This section lists the most salient ones.

Bootstrapping. Bootstrapping initializes an ontology learning algorithm with seed information; it then iterates, applying learning to calculate labels for the unlabeled data, and incorporating some of these labels into the training input for the learner. The bootstrapping approach is essential for the difficult task of ontology learning and has been shown to be very useful in the case studies that have been carried out.
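Schematically, the loop looks as follows; train, label, and confidence are placeholders for any concrete ontology learning algorithm and its scoring function, and the confidence threshold is an assumption.

def bootstrap(seed_examples, unlabeled, train, label, confidence,
              threshold=0.9, max_rounds=10):
    """Generic bootstrapping loop as described above."""
    training = list(seed_examples)            # start from seed information
    model = train(training)
    for _ in range(max_rounds):
        # label the unlabeled data and keep only confident predictions
        confident = [(x, label(model, x)) for x in unlabeled
                     if confidence(model, x) >= threshold]
        if not confident:
            break                             # nothing left to incorporate
        training += confident                 # feed labels back into training
        unlabeled = [x for x in unlabeled
                     if confidence(model, x) < threshold]
        model = train(training)               # retrain on the grown set
    return model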

Growing Knowledge Models. A basic foundation for supporting growing knowledge models is the possibility of including background knowledge (existing ontological structures) in the actual learning of new structures. Thus, if an algorithm works on pure text without having any background knowledge available, one cannot expect that correct and complex ontological structures are generated automatically.
Multiple Sources. If a new ontology is being developed, one should always check the available sources and try to combine them. For example, in the case studies carried out it was found that dictionaries serve as a stable resource for ontology learning. This holds especially true for structured information contained in a database or described in some kind of schemata (see the next section on future work).

Linguistic Knowledge. Linguistic knowledge improves the results generated by ontology learning algorithms. However, one has to handle the usage of linguistic knowledge carefully and in dependence on the given Web data. In the work described here, a good recall coming with a lower precision of linguistic annotations has been preferred.

Preprocessing. It is difficult to find the right data representation for the application of an ontology learning algorithm. A similar experience has been made in machine learning and knowledge discovery. A useful approach is to guide the user in the difficult preprocessing task, e.g., by accessing an "experience base" of successful preprocessing strategies (see also (Engels, 1999)).

No single approach is best. None of the different methods for ontology learning proposed in this book has been shown to always perform best. Thus, the combination of different methods (even if they produce the same results) following a multi-strategy approach seems promising (see the future work in the next section).

User Interfaces. To apply ontology learning in real-world settings, the importance of user interfaces should not be underestimated. Especially the difficult task of data import and processing has to be supported by user interfaces. Also, the presentation of results generated by the algorithms should provide graphical means.

3. Unanswered Questions
While a number of problems remain within the single disciplines, more challenges regarding the particular problem of Ontology Learning for the Semantic Web arise. Any book raises new questions while it answers old ones. A book like this one, the subject of which is novel and relatively unexplored, seems to raise more questions than it answers. In this section the most important open questions are identified:
• First, as mentioned earlier, we are still at an early stage with respect to providing methodological guidelines for applying ontology learning to support ontology engineering. Thus, in the future the integration of a comprehensive methodology with support for the application of semi-automatic means is required. This holds especially true for the difficult tasks of data import and processing, where experiences have to be collected and provided to the ontology engineer.
• Second, in ontology learning the attention has been restricted to the conceptual structures that are (almost) contained in RDF(S) proper. Additional semantic layers on top of RDF (e.g., future OIL or DAML+OIL with axioms, AO) will require new means for improving ontology engineering with axioms, too. Thus, one important open question is how axioms can be acquired from existing Web data.
• Third, a tight integration of techniques for the extraction of ontological structures from databases, semi-structured data, and existing instances with the techniques proposed in this book has to be established. Nevertheless, it is expected that the more available data resources are included in the ontology learning process, the better the overall performance will be.

4. Future Research
A number of future research topics and challenges have already been listed at the end of each chapter, e.g.,
• More comprehensive, multi-lingual natural language processing support, e.g. for automatically deriving lexical entries referring to non-taxonomic relations using verb-centered approaches.
• Including structural properties, e.g. HTML tags, to use the explicit content contained in HTML tables. An interesting and promising table mining approach has been introduced by (Chen et al., 2000).
• Multi-relational representations for the application of more logic-oriented machine learning techniques.
• Multi-strategy learning techniques that, e.g., use voting for combining the results generated by different algorithms.
• Comparison of the results generated by different evaluation measures and techniques.
• Development of multi-lingual standard data sets for ontology learning.
Finally, the following three tasks for the future development and application of ontology learning are considered especially important.

Learning Ontologies and Knowledge Bases in parallel. An interesting aspect that has to be further researched is the analysis of the interaction of ontology learning with semantic annotation (Erdmann et al., 2001) and ontology-based information extraction (Maedche et al., 2001a) towards the automatic generation of knowledge bases for the Semantic Web.

Ontology Learning in the Semantic Web. With the XML-based namespace mechanisms, the notion of an ontology with well-defined boundaries, e.g. only definitions that are in one file, will disappear. Rather, the Semantic Web may yield an "amoeba-like" structure regarding ontology boundaries, because ontologies refer to each other and import each other (cf. e.g. the DAML+OIL primitive import). However, it is not clear what the semantics of these structures will look like. In the light of these facts, the importance of ontology learning methods like ontology pruning and the crawling of ontologies and relational metadata will drastically increase. Furthermore, a tight integration with ontology engineering mechanisms such as modularization mechanisms and principles has to be provided.

Semantic Web Mining. It has already been mentioned that looking at the user's behaviour may indicate necessary ontology changes and updates. In the research area of Web Mining one applies data mining techniques to the web. In general, one distinguishes between web usage mining, analyzing the user behaviour; web structure mining, exploring the hyperlink structure; and web content mining, exploiting the contents of the documents in the web. A problem of the current approaches is that they operate on syntactic, often meaningless structures such as hyperlinks. In the near future, approaches that exploit the complex structures contained in the Semantic Web in combination with the analysis of user behaviour should be researched.
References

Abecker, A., Bernardi, A., Hinkelmann, K., Kühn, O., and Sintek, M. (1998). Towards a technology for organizational memories. IEEE Intelligent Systems and Their Applications, 13(3):40-48.
Abiteboul, S., Buneman, P., and Suciu, D. (1999). Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann Publishers, CA.
Abiteboul, S., Hull, R., and Vianu, V. (1994). Foundations of Databases. Addison Wesley, Massachusetts.
Adam, N. and Yesha, Y. (1996). Strategic directions in electronic commerce and digital libraries: Towards a digital agora. ACM Computing Surveys, 28(4):818-835.
Agirre, E. and Rigau, G. (1996). Word sense disambiguation using conceptual density. In Proc. of COLING-96, pages 16-22.
Agrawal, R., Imielinski, T., and Swami, A. (1993). Mining Associations between Sets of Items in Massive Databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C., May 26-28, 1993, pages 688-692. ACM Press.
Amann, B. and Fundulaki, I. (1999). Integrating Ontologies and Thesauri to Build RDF Schemas. In Proceedings of the European Conference on Digital Libraries - ECDL'1999, Paris, France, 1999, pages 234-253.
Angele, J., Schnurr, H.-P., Staab, S., and Studer, R. (2000). The times they are a-changin' - the corporate history analyzer. In Mahling, D. and Reimer, U., editors, Proceedings of the Third International Conference on Practical Aspects of Knowledge Management. Basel, Switzerland, October 30-31, 2000. http://www.research.swisslife.ch/pakm2000/.
Appelt, D., Hobbs, J., Bear, J., Israel, D., and Tyson, M. (1993). FASTUS: A finite state processor for information extraction from real world text. In IJCAI-93: Proceedings of the 13th International Joint Conference on Artificial Intelligence. Chambery, France, August 28 - September 3, 1993, pages 1172-1178, Chambery, France.
Ashish, N. and Knoblock, C. (1997). Semi-automatic wrapper generation for internet information sources. In Proceedings of the Second IFCIS International Conference on Cooperative Information Systems, Kiawah Island, South Carolina, USA, June 24-27, 1997, Sponsored by IFCIS, the Int'l Foundation on Cooperative Information Systems, pages 160-169. IEEE-CS Press.
Assadi, H. (1999). Construction of a regional ontology from text and its use within a documentary system. In N. Guarino (ed.), Formal Ontology in Information Systems, Proceedings of FOIS-98, Trento, Italy, 1999, pages 236-249.
Baldwin, B., Morton, T., Bagga, A., Baldridge, J., Chandraseker, R., Dimitriadis, A., Snyder, K., and Wolska, M. (1998). Description of the UPENN CAMP system as used for coreference. In (MUC7, 1998).
Barker, K., Delisle, S., and Szpakowicz, S. (1998). Test-driving TANKA: Evaluating a semi-automatic system of text analysis for knowledge acquisition. In Mercer, R. and Neufeld, E., editors, Advances in Artificial Intelligence. Proceedings of the 12th Biennial Conference of the Canadian Society for Computational Studies of Intelligence (AI'98). Vancouver, Canada, June 18-20, 1998, LNAI 1418, pages 60-71, Berlin. Springer.
Basili, R. and Pazienza, M. T. (1997). Lexical acquisition and information extraction. In SCIE 1997: Rome, Italy, 1997.
Basili, R., Pazienza, M. T., and Velardi, P. (1993). Acquisition of selectional patterns in a sublanguage. Machine Translation, 8(1):175-201.
Bateman, J. A. (1993). Ontology construction and natural language. In Proceedings of the International Workshop on Formal Ontology in Conceptual Analysis and Knowledge Representation, Padova, March 1993, pages 83-93.
Bateman, J. A. (1995). On the relationship between ontology construction and natural language: a socio-semiotic view. International Journal on Human-Computer Studies, 43:929-944.
Benjamins, R., Duineveld, A. J., Stoter, R., Weiden, M. R., and Kenepa, B. (1999). Wondertools? A comparative study of ontological engineering tools. In Proceedings of the 12th International Workshop on Knowledge Acquisition, Modeling and Management (KAW'99), Banff, Canada, October 1999.
Berners-Lee, T. (1999). Weaving the Web. Harper, San Francisco.
Berners-Lee, T., Hendler, J., and Lassila, O. (2001). The semantic web. Scientific American.
Biebow, B. and Szulman, S. (1999). TERMINAE: A linguistics-based tool for the building of a domain ontology. In EKAW'99 - Proceedings of the 11th European Workshop on Knowledge Acquisition, Modeling, and Management. Dagstuhl, Germany, LNCS, pages 49-66, Berlin. Springer.
Bisson, G. (1992). Learning in FOL with a similarity measure. In Proc. of AAAI-1992, pages 82-87.
Bisson, G., Nedellec, C., and Canamero, D. (2000). Designing clustering methods for ontology building: The Mo'K workbench. In (Staab et al., 2000c).
Bloom, P. (2000). How Children learn the Meanings of Words. MIT Press, Massachusetts.
Boch, T. (2001). Separating taxonomic from non-taxonomic relations discovered from text. Master's thesis, University of Karlsruhe.
Boyens, K. (2001). OntoKick - Ignition for Ontologies. Master's thesis, University of Karlsruhe.
Brachman, R. (1979). On the epistemological status of semantic networks. Associative Networks, pages 3-50.
Bray, T., Hollander, D., and Layman, A. (1999). Namespaces in XML. Technical report, W3C. W3C Recommendation. http://www.w3.org/TR/REC-xml-names.
Brickley, D. and Guha, R. (2000). Resource Description Framework (RDF) Schema Specification. Technical report, W3C. W3C Candidate Recommendation. http://www.w3.org/TR/2000/CR-RDF-schema-20000508.
Brill, E. (1993). Automatic grammar induction and parsing free text: A transformation-based approach. In ACL'93 - Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 259-265, Ohio.
Brown, A. (2000). Large-Scale Component-Based Development. Prentice Hall.
Buitelaar, P. (1998). CORELEX: Systematic Polysemy and Underspecification. PhD thesis, Brandeis University, Department of Computer Science.
Buneman, P., Davidson, S. B., Fernandez, M. F., and Suciu, D. (1997). Adding structure to unstructured data. In Afrati, F. N. and Kolaitis, P., editors, Proceedings of the 6th International Conference on Database Theory - ICDT'97, Delphi, Greece, pages 336-350. Springer.
Buntine, W. and Stirling, D. (1991). Interactive Induction. J.E. Hayes and D. Michie and E. Tyugu (Eds.), Clarendon Press, Oxford.
Byrd, R. and Ravin, Y. (1999). Identifying and extracting relations from text. In NLDB'99 - 4th International Conference on Applications of Natural Language to Information Systems.
Campbell, K., Oliver, D. E., Spackman, K., and Shortliffe, E. H. (1998). Representing Thoughts, Words, and Things in the UMLS. Technical report, SMI Stanford Medical Informatics.
Chakrabarti, S., van den Berg, M., and Dom, B. (1999). Focused crawling: a new approach to topic-specific web resource discovery. In Proceedings of WWW-8.
Chalupsky, H. (2000). OntoMorph: A translation system for symbolic knowledge. In Proc. of KR-2000, Breckenridge, CO, USA, pages 471-482.
Chaudhri, V., Farquhar, A., Fikes, R., Karp, P., and Rice, J. (1998). OKBC: A Programmatic Foundation for Knowledge Base Interoperability. In Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98), pages 600-607.
Chen, H. (1999). Semantic research for digital libraries. D-Lib Magazine, 5(10).
Chen, H.-H., Tsai, S.-C., and Tsai, J.-H. (2000). Mining Tables from Large Scale HTML Texts. In Proceedings of the 18th International Conference on Computational Linguistics, Saarbruecken, Germany, July 2000.
Chen, P. (1976). The entity-relationship model - toward a unified view of data. ACM Transactions on Database Systems, 1(1):9-36.
Chinchor, N., Hirschman, L., and Lewis, D. (1993). Evaluating message understanding systems: An analysis of the third message understanding conference (MUC-3). Computational Linguistics, 19(3).
Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT Press, Cambridge.
Christophides, V. and Plexousakis, D., editors (2000). Proceedings of the ECDL-2000 Workshop - Semantic Web: Models, Architectures and Management.
Conen, W. and Klapsing, R. (2000). A logical interpretation of RDF. RDF Interest Group Mailing List, http://nestroy.wi-inf.uni-essen.de/rdf/logical_interpretation/.
Cooper, J. W. and Byrd, R. J. (1997). Lexical navigation: Visually prompted query expansion and refinement. In Proceedings of the International Conference on Digital Libraries DL'97, pages 237-246.
Corby, O., Dieng, R., and Hebert, C. (2000). A conceptual graph model for w3c resource description framework. In ICCS 2000 - International Conference on Conceptual Structures. Darmstadt, Germany, August 2000, Lecture Notes in Artificial Intelligence LNAI-1867. Springer.
Craven, M., DiPasquo, D., Freitag, D., McCallum, A., Mitchell, T., Nigam, K., and Slattery, S. (1999). Learning to construct knowledge bases from the world wide web. Artificial Intelligence, 118(1-2):69-113.
Cumby, C. and Roth, D. (2000). Relational representations that facilitate learning. In Proc. of KR-2000, Breckenridge, Colorado, USA, 12-15 April 2000, pages 425-434.
Dagan, I., Lee, L., and Pereira, F. (1999). Similarity-based models of word cooccurrence probabilities. Machine Learning, 34(1):43-69.
Dahlgren, K. (1995). A linguistic ontology. International Journal of Human-Computer Studies, 43(5).
Daille, B. (1996). Study and Implementation of Combined Techniques for Automatic Extraction of Terminology, chapter 3, pages 49-67. Klavans J. L. and Resnik P. (ed.): The Balancing Act, Combining Symbolic and Statistical Approaches to Language, MIT Press, Cambridge Mass., London England.
de Saussure, F. (1916). Course in general linguistics. McGraw Hill, New York.
Decker, S., Brickley, D., Saarela, J., and Angele, J. (1998). A Query and Inference Service for RDF. In Proceedings of the W3C Query Language Workshop (QL-98), Boston, MA, December 3-4.
Decker, S., Fensel, D., van Harmelen, F., Horrocks, I., Melnik, S., Klein, M., and Broekstra, J. (2000a). Knowledge representation on the web. In Proceedings of the 2000 International Workshop on Description Logics (DL2000), Aachen, Germany.
Decker, S., Jannink, J., Mitra, P., Staab, S., Studer, R., and Wiederhold, G. (2000b). An information food chain for advanced applications on the WWW. In ECDL 2000 - Proceedings of the Fourth European Conference on Research and Advanced Technology for Digital Libraries. Lisbon, Portugal, September 18-20, 2000, LNCS, pages 490-493. Springer.
Decker, S. and Melnik, S. (2000). A layered approach to information modeling and interoperability on the web. In (Christophides and Plexousakis, 2000).
Decker, S., Mitra, P., and Melnik, S. (2000c). Framework for the Semantic Web - An RDF Tutorial. IEEE Internet Computing.
Delteil, A., Faron-Zucker, C., and Dieng, R. (2001). Learning ontologies from RDF annotations. In Maedche, A., Staab, S., Nedellec, C., and Hovy, E., editors, Proceedings of the IJCAI-01 Workshop on Ontology Learning OL-2001, Seattle, August 2001, Menlo Park. AAAI Press.
Diligenti, M., Coetzee, F., Lawrence, S., Giles, C. L., and Gori, M. (2000). Focused Crawling using Context Graphs. In Proceedings of the International Conference on Very Large Databases (VLDB-00), 2000, pages 527-534.
Doan, A., Domingos, P., and Levy, A. (2000). Learning source descriptions for data integration. In Proceedings of the International Workshop on The Web and Databases - WebDB-2000, pages 81-86.
Eco, U. (1981). Zeichen. Einführung in einen Begriff und seine Geschichte. Suhrkamp (edition suhrkamp), Frankfurt/M.
Ehrig, M. (2001). Ontology-based Focused Crawling of Documents and Relational Metadata. Master's thesis, University of Karlsruhe.
Engels, R. (1999). Component-Based User Guidance in Knowledge Discovery and Data Mining. PhD thesis, University of Karlsruhe.
Engels, R., Bremdal, B., and Jones, R. (2001). CORPORUM: a workbench for the Semantic Web. In Proceedings of the First Workshop on Semantic Web Mining, Freiburg, Germany, September 2001. Online available at http://semwebmine2001.aifb.uni-karlsruhe.de/online/.
Erdmann, M. (2001). Ontologien zur konzeptuellen Modellierung der Semantik von XML (in German). PhD thesis, University of Karlsruhe.
Erdmann, M., Maedche, A., Schnurr, H.-P., and Staab, S. (2001). From manual to semi-automatic semantic annotation: About ontology-based text annotation tools. ETAI - Semantic Web Journal, Linkoeping Electronic Articles, 16(1).
Erdmann, M., Maedche, A., Staab, S., and Decker, S. (2000). Ontologies in RDF(S). Technical Report 401, Institute AIFB, Karlsruhe University.
Esposito, F., Ferilli, S., Fanizzi, N., and Semeraro, G. (2000). Learning from Parsed Sentences with INTHELEX. In Proceedings of CoNLL-2000 and LLL-2000 - International Conference on Grammar Inference (ICGI-2000), to appear in: Lecture Notes in Artificial Intelligence, Springer.
Euzenat, J. (2000). Towards formal knowledge intelligibility at the semiotic level. In Proceedings of the ECAI-2000 Workshop on Applied Semiotics ASC-2000, Berlin, Germany, 2000.
Faure, D. and Nedellec, C. (1998). A corpus-based conceptual clustering method for verb frames and ontology acquisition. In LREC workshop on Adapting lexical and corpus resources to sublanguages and applications, Granada, Spain, May 1998.
Faure, D. and Poibeau, T. (2000). First experiments of using semantic knowledge learned by ASIUM for information extraction task using INTEX. In Proceedings of the ECAI'2000 Workshop Ontology Learning.
Feldman, R. and Dagan, I. (1995). Knowledge discovery in textual databases (KDT). In Proceedings of KDD-95, pages 112-117. ACM.
Feldman, R. and Hirsh, H. (1996). Mining associations in text in the presence of background knowledge. In Proceedings of the Second International Conference on Knowledge Discovery from Databases, pages 343-346.
Fellbaum, C. (1998). WordNet - An electronic lexical database. MIT Press, Cambridge, Massachusetts and London, England.
Fensel, D. (2001). Ontologies: Silver Bullet for Knowledge Management and Electronic Commerce. Springer, Berlin - Heidelberg - New York.
Fensel, D., Van Harmelen, F., Decker, S., Erdmann, M., and Klein, M. (2000). OIL in a nutshell. In Dieng, R., editor, Knowledge Acquisition, Modeling, and Management, Proceedings of the European Knowledge Acquisition Conference (EKAW-2000), Lecture Notes in Artificial Intelligence, LNAI, pages 1-16. Springer-Verlag.
Fikes, R., Farquhar, A., and Rice, J. (1997). Tools for assembling modular ontologies in Ontolingua. In Proc. of AAAI 97, pages 436-441.
Fikes, R. and McGuiness, D. (2001). An Axiomatic Semantics for RDF, RDF(S), and DAML+OIL. Technical report, Stanford University, KSL.
Fisher, D., Pazzani, M., and Langley, P. (1991). Concept Formation: Knowledge and Experience in Unsupervised Learning. Morgan Kaufmann, San Francisco.
Fong, J. (1997). Converting relational to object-oriented databases. SIGMOD Record, 26(1):53-58.
Franconi, E. and Ng, G. (2000). The i.com tool for Intelligent Conceptual Modeling. In Proceedings of the 7th International Workshop on Knowledge Representation meets Databases KRDB-2000. Berlin, pages 45-53. http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-29/.
Frantzi, K., Ananiadou, S., and Mima, H. (2000). Natural language processing for digital libraries: Automatic recognition of multi-word terms: the c-value/nc-value method. International Journal on Digital Libraries, 3(2).
Frege, G. (1922). Begriffsschrift. Lubrecht & Cramer Ltd., London.
Freitag, D. (1998). Machine Learning for Information Extraction in Information Domains. PhD thesis, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA.
Fuernkranz, J., Mitchell, T., and Riloff, E. (1998). A Case Study in Using Linguistic Phrases for Text Categorization on the WWW. In Proc. of AAAI/ICML Workshop Learning for Text Categorization, Madison, WI, 1998. AAAI Press.
Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer, Berlin - Heidelberg - New York.
Genesereth, M. R. (1998). Knowledge interchange format. Draft proposed American national standard (dpANS). NCITS.T2/98-004. http://logic.stanford.edu/kif/dpans.html, seen at Sep 7, 2000.
Ginsberg, M. (1991). Knowledge interchange format: the KIF of death. AI Magazine, 12(3):57-63.
Goldman, R. and Widom, J. (1997). DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. In Proceedings of the Conference on Very Large Databases - VLDB'1997, Athens, Greece, 1997.
Gomez-Perez, A. (1996). A framework to verify knowledge sharing technology. Expert Systems with Application, 11(4):519-529.
Grefenstette, G. (1994). Explorations in Automatic Thesaurus Discovery. PhD thesis, University of Pittsburgh.
Grishman, R. and Sundheim, B. (1996). Message Understanding Conference - 6: A Brief History.


In Proceedings of the 15th International Conference on Computational Linguistics (COL-
ING'96), pages 466-471, Kopenhagen, Denmark, Europe.
Grosso, E., Eriksson, H., Fergerson, R. w., Tu, S. w., and Musen, M. M. (1999). Knowledge
modeling at the millennium - the design and evolution of Protege-2000. In Proc. the 12th
International Workshop on Knowledge Acquisition, Modeling and Mangement (KAW'99),
Banff, Canada, October 1999.
Gruber, T. (1993a). A translation approach to portable ontology specifications. Knowledge Ac-
quisition, 5: 199-220.
Gruber, T. R. (1993b). Toward principles for the design of ontologies used for knowledge sharing.
Technical Report KSL-93-04.
Guarino, N. (1998). Formal ontology and information systems. In Proceedings of FOIS '98 -
Formal Ontology in Information Systems, Trento, Italy, 6-8 June 1998. lOS Press.
Guarino, N. and Welty, C. (2000). Identity, unity, and individuality: Towards a formal toolkit for
ontological analysis. Proceedings of ECA/-2000.
Hahn, U. and Romacker, M. (2000). Content management in the SYNDIKATE system - How
technical documents are automatically transformed to text knowledge bases. Data & Knowl-
edge Engineering, 35(2):137-159.
Hahn, U. and Schnattinger (1998). Ontology engineering via text understanding. In IFlP'98-
Proceedings of the 15th World Computer Congress, Vienna and Budapest.
Hamp, B. and Feldweg, H. (1997). Germanet - a lexical-semantic net for german. In Proceedings
of the ACL Workshop on Automatic Information Extraction and Building of Lexical Semantic
Resources for NLP Applications, Madrid, 1997.
Han, 1. and Kamber, M. (2001). Data Mining - Concepts and Techniques. Morgan Kaufmann.
Handschuh, S. (2001). Ontoplugins - a flexible component framework. Technical report, Uni-
versity of Karlsruhe.
Handschuh, S., Maedche, A., and Staab, S. (2001). CREAM - Creating relational metadata with
a component-based, ontology driven framework. In Proceedings ofthe First ACM-Conference
on Knowledge Capture, K-CAP'OI, Victoria, Canada, October, 2001.
Hastings, P. M. (1994). Automatic Acquisition of Word Meaning from Context. PhD thesis, Uni-
versity of Michigan.
Hearst, M. (1992). Automatic acquisition of hyponyms from large text corpora. In Proceedings
of the 14th International Conference on Computational Linguistics. Nantes, France.
Heflin, J. and Hendler, J. (2000). Dynamic ontologies on the web. In Proceedings of the National Conference on Artificial Intelligence - AAAI'2000, USA. AAAI Press.
Hjelm, J. (2001). Creating the Semantic Web with RDF. Wiley.
Hobbs, J. (1993). The generic information extraction system. In Proceedings of the Fifth Message Understanding Conference (MUC-5), Morgan Kaufmann, 1993.
Hofmann, T. (1999). The Cluster-Abstraction Model: Unsupervised Learning of Topic Hierarchies from Text Data. In Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99), Stockholm, Sweden, 1999, pages 682-687.
Horrocks, I. (1998). Using an expressive description logic: FaCT or fiction? In Proceedings of the Sixth International Conference on Principles of Knowledge Representation and Reasoning (KR'98), Trento, Italy, June 2-5, 1998, pages 636-649. Morgan Kaufmann.
Hotho, A., Maedche, A., and Staab, S. (2001a). Ontology-based Text Clustering. In Proceedings of the IJCAI-2001 Workshop "Text Learning: Beyond Supervision", Seattle, August 03, 2001.
Hotho, A., Maedche, A., Staab, S., and Studer, R. (2001b). SEAL-II - The soft spot between richly structured and unstructured knowledge. Journal of Universal Computer Science, 7(5).
Hovy, E. (1998). Combining and standardizing large-scale, practical ontologies for machine translation and other uses. In Proc. of the First Int. Conf. on Language Resources and Evaluation (LREC).
Hudson, R. (1990). English Word Grammar. Basil Blackwell, Oxford, UK.
Ide, N., McGraw, T., and Welty, C. (1997). Representing technical documents in the CLASSIC knowledge representation system. In Proceedings of the Tenth Workshop of the Text Encoding Initiative, November 1997.
Ide, N. and Veronis, J. (1995). Knowledge extraction from machine-readable dictionaries: An evaluation. In P. Steffens (ed.): Machine Translation and the Lexicon, Lecture Notes in Artificial Intelligence LNAI-898, pages 19-34. Springer.
ISO 704 (1987). Principles and methods of terminology. Technical report, International Standard
ISO.
Jannink, J. and Wiederhold, G. (1999). Thesaurus entry extraction from an on-line dictionary. In
Proceedings of Fusion '99, Sunnyvale CA, July 1999.
Jacquemin, C. (2001). Spotting and Discovering Terms through Natural Language Processing. MIT Press, Cambridge, Massachusetts.
Jasper, R. and Uschold, M. (1999). A Framework for Understanding and Classifying Ontology Applications. In Proc. of the 12th International Workshop on Knowledge Acquisition, Modeling and Management (KAW'99), Banff, Canada, October 1999.
Jones, R., McCallum, A., Nigam, K., and Riloff, E. (1999). Bootstrapping for Text Learning Tasks. In Working notes of the IJCAI'99 Workshop on Text Mining: Foundations, Techniques and Applications, pages 52-63. MIT Press/AAAI Press.
Kamada, T. and Kawai, S. (1989). An algorithm for drawing general undirected graphs. Information Processing Letters, 31(1):7-15.
Kaufman, L. and Rousseeuw, P. (1990). Finding Groups in Data: An Introduction to Cluster
Analysis. John Wiley.
Kesseler, M. (1995). A Schema-based Approach to HTML Authoring. In Proceedings of the 4th
International World Wide Web Conference WWW-4, Boston, 1995.
Kietz, J.-U. and Morik, K. (1994). A polynomial approach to the constructive induction of
structural knowledge. Machine Learning, 14(1):193-217.
Kietz, J.-U., Volz, R, and Maedche, A. (2000a). A method for semi-automatic ontology acqui-
sition from a corporate intranet. In EKAW-2000 Workshop "Ontologies and Text", Juan-Les-
Pins, France, October 2000.
Kietz, J.-U., Volz, R., and Maedche, A. (2000b). Semi-automatic ontology acquisition from
a corporate intranet. In International Conference on Grammar Inference (ICGI-2000), to
appear: Lecture Notes in Artificial Intelligence, LNAI.
Kifer, M., Lausen, G., and Wu, J. (1995). Logical Foundations of Object-Oriented and Frame-
Based Languages. Journal of the ACM, 42(4):741-843.
Klettke, M. (1998). Acquisition of Integrity Constraints in Databases. DISDBIS 51. infix, Sankt
Augustin, Germany. In German.
Klettke, M., Bietz, M., Bruder, I., Heuer, A., Priebe, D., Neumann, G., Becker, M., Bedersdorfer, J., Uszkoreit, H., Maedche, A., Staab, S., and Studer, R. (2001). GETESS - Ontologien, objektrelationale Datenbanken und Textanalyse als Bausteine einer semantischen Suchmaschine. Datenbank-Spektrum, 1(1):14-24.
Kramer, R., Nikolai, R., and Habeck, C. (1997). Thesaurus federations: loosely integrated thesauri for document retrieval in networks based on Internet technologies. Journal of Digital Libraries, 1(2):112-131.
Kuehn, L. (2001). Human resource topic broker (in German). Master's thesis, University of Karlsruhe.
Kushmerick, N., Weld, D., and Doorenbos, R. (1997). Wrapper Induction for Information Extraction. In IJCAI-1997 - Proceedings of the 15th International Joint Conference on Artificial Intelligence, Nagoya, Japan, August 23-29, 1997, pages 729-737, San Francisco. Morgan Kaufmann.
Lacher, M. and Groh, G. (2001). Facilitating the exchange of explicit knowledge through ontology mappings. In Proc. of the 14th International FLAIRS Conference, May 2001, pages 305-309. AAAI Press.
Lamping, J., Rao, R., and Pirolli, P. (1995). A focus+context technique based on hyperbolic geometry for visualizing large hierarchies. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems, pages 401-408.
Lang, S. and Lockemann, P. (1995). Datenbankeinsatz. Springer, Berlin.
Lassila, O. and Swick, R. (1999). Resource Description Framework (RDF) Model and Syntax Specification. Technical report, W3C. W3C Recommendation. http://www.w3.org/TR/REC-rdf-syntax.
Lee, L. (1999). Measures of distributional similarity. In Proceedings of the ACL'99, pages 25-32.
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Cybernetics and Control Theory, 10(8):707-710.
Linster, M. (1992). A review of Sisyphus 91 and 92: Models of problem-solving knowledge. In Aussenac, N., Boy, G., Gaines, B., Linster, M., Ganascia, J.-G., and Kodratoff, Y., editors, Knowledge Acquisition for Knowledge-Based Systems, pages 159-182. Springer-Verlag.
Litkowski, K. (1978). Models of the semantic structure of dictionaries. Journal of Computational Linguistics, 81:25-74.
MacGregor, R. (1991). Inside the LOOM description classifier. SIGART Bulletin, 2(3):88-92.
Maedche, A., Neumann, G., and Staab, S. (2001a). Bootstrapping an Ontology-Based Information Extraction System. Intelligent Exploration of the Web, Series "Studies in Fuzziness and Soft Computing", Springer.
Maedche, A., Schnurr, H.-P., Staab, S., and Studer, R. (2000). Representation language-neutral modeling of ontologies. In Frank, editor, Proceedings of the German Workshop "Modellierung-2000", Koblenz, Germany, April 5-7, 2000. Fölbach-Verlag.
Maedche, A. and Staab, S. (2000a). Discovering conceptual relations from text. In Proceedings of the 14th European Conference on Artificial Intelligence (ECAI-2000). IOS Press, Amsterdam.
Maedche, A. and Staab, S. (2000b). Mining ontologies from text. In Proceedings of EKAW-2000,
Springer Lecture Notes in Artificial Intelligence (LNAI-1937), Juan-Les-Pins, France, 2000.
Springer.
Maedche, A. and Staab, S. (2000c). Semi-automatic engineering of ontologies from text. In Proceedings of the 12th International Conference on Software Engineering and Knowledge Engineering (SEKE'2000), Chicago, USA, July 5-7, 2000. KSI.
Maedche, A. and Staab, S. (2001a). Learning ontologies for the semantic web. In WWW-10 Workshop on the Semantic Web, Hong Kong, 2001.
Maedche, A. and Staab, S. (2001b). On Comparing Ontologies. Technical Report 408, Institute AIFB, University of Karlsruhe.
Maedche, A. and Staab, S. (2001c). Ontology learning for the semantic web. IEEE Intelligent
Systems, 16(2).
Maedche, A., Staab, S., Nedellec, C., and Hovy, E., editors (2001b). Proceedings of the IJCAI'2001 Workshop on Ontology Learning - OL'2001.
Maedche, A., Staab, S., Stojanovic, N., Studer, R., and Sure, Y. (2001c). SEmantic PortAL - The SEAL approach. To appear in: Creating the Semantic Web, D. Fensel, J. Hendler, H. Lieberman, W. Wahlster (eds.), MIT Press, Cambridge, MA.
Maedche, A., Staab, S., and Studer, R. (2001d). Ontologien. Wirtschaftsinformatik, 43(4).
Maedche, A. and Volz, R. (2001). The Ontology Extraction & Maintenance Environment Text-To-Onto. In Proceedings of the ICDM-2001 Workshop on the Integration of Data Mining and Knowledge Management, San Jose, USA, November 2001.
Manning, C. and Schuetze, H. (1999). Foundations of Statistical Natural Language Processing.
MIT Press, Cambridge, Massachusetts.
Markert, K. and Hahn, U. (1997). On the interaction of metonymies and anaphora. In Proc. of IJCAI-97, pages 1010-1015.
McGuinness, D., Fikes, R., Rice, J., and Wilder, S. (2000). The Chimaera ontology environment. In Proc. of AAAI-2000, pages 1123-1124.
Michalski, R. and Kaufmann, K. (1998). Data mining and knowledge discovery: A review of issues and multistrategy approach. In Machine Learning and Data Mining: Methods and Applications. John Wiley, England.
Mikheev, A. and Finch, S. (1997). A workbench for finding structure in text. In Proceedings of the Conference on Applied Natural Language Processing (ANLP-97), Washington D.C., 1997.
Miller, G. (1996). The science of words. Freeman, New York.
Morgenstern, L. (1998). Inheritance comes of age: Applying nonmonotonic techniques to prob-
lems in industry. Artificial Intelligence, 103(1-2):237-271.
Mohri, M. (1997). Finite-state transducers in language and speech processing. Computational Linguistics, 23(2).
Morik, K. (1990). Integrating manual and automatic knowledge acquisition - BLIP. In Readings in Knowledge Acquisition: Current Practices and Trends. Ellis Horwood Series in Artificial Intelligence, Horwood, 1990.
Morik, K. (1993). Balanced cooperative modeling. Machine Learning, 11:217-235.
Morik, K. and Brockhausen, P. (1996). A Multistrategy Approach to Relational Knowledge Discovery in Databases. In Proceedings of the AAAI Workshop on Multistrategy Learning (MSL-96), Palo Alto, 1996.
Morik, K., Wrobel, S., Kietz, J.-U., and Emde, W. (1993a). Knowledge acquisition and machine learning: Theory, methods, and applications. Academic Press, London.
Morik, K., Wrobel, S., Kietz, J.-U., and Emde, W. (1993b). Knowledge acquisition and machine
learning: Theory, methods, and applications. Academic Press, London.
Morin, E. (1999). Automatic acquisition of semantic relations between terms from technical
corpora. In Proc. of the Fifth International Congress on Terminology and Knowledge Engi-
neering - TKE'99.
MUC7 (1998). MUC-7 - Proceedings of the 7th Message Understanding Conference.
Mueller, H. A., Jahnke, J. H., Smith, D. B., Storey, M.-A., Tilley, S. R., and Wong, K. (2000). Reverse Engineering: A Roadmap. In Proceedings of the 22nd International Conference on Software Engineering (ICSE-2000), Limerick, Ireland. Springer.
Muggleton, S. (1992). Inductive Logic Programming. Academic Press.
Nedellec, C. and Causse, K. (1992). Knowledge Refinement using Knowledge Acquisition
and Machine Learning. In Proceedings of the European Knowledge Acquisition Workshop
(EKAW-92), 1992. Springer.
Nestorov, S., Abiteboul, S., and Motwani, R. (1997). Inferring Structure in Semistructured Data. SIGMOD Record, 26(4):39-43.
Neuhaus, P. and Hahn, U. (1996). Trading off completeness for efficiency: The ParseTalk performance grammar approach to real-world text parsing. In FLAIRS-96: Proceedings of the 9th Florida Artificial Intelligence Research Symposium, Key West, Florida, May 20-22.
Neumann, G., Backofen, R., Baur, J., Becker, M., and Braun, C. (1997). An information extraction core system for real world German text processing. In ANLP'97 - Proceedings of the Conference on Applied Natural Language Processing, pages 208-215, Washington, USA.
Noy, N. and Hafner, C. (1997). The State of the Art in Ontology Design - A Survey and Comparative Review. AI Magazine, 18(3).
Noy, N. F., Fergerson, R. W., and Musen, M. (2000). The knowledge model of Protege-2000: Combining interoperability and flexibility. In Proceedings of the Conference on Knowledge Acquisition and Management (EKAW-2000), Juan-Les-Pins, France, pages 17-32.
Noy, N. F. and Musen, M. A. (2000). PROMPT: Algorithm and Tool for Automated Ontology Merging and Alignment. In Proceedings of the 17th National Conf. on Artificial Intelligence (AAAI'2000), Austin, Texas, pages 450-455. MIT Press/AAAI Press.
Noy, N. F. and Musen, M. A. (2001). Anchor-PROMPT: Using non-local context for semantic matching. In Proceedings of the IJCAI-2001 Workshop on Ontologies & Information Fusion, Seattle, August 03, 2001.
Noy, N. F., Sintek, M., Decker, S., Crubezy, M., Fergerson, R. W., and Musen, M. A. (2001). Creating semantic web contents with Protege-2000. IEEE Intelligent Systems, 16(2).
Nwana, H. S. (1995). Software agents: An overview. Knowledge Engineering Review, 11(2):205-
244.
Ogden, C. and Richards, I. (1923). The Meaning of Meaning: A Study of the Influence of Language upon Thought and of the Science of Symbolism. Routledge & Kegan Paul Ltd., London, 10th edition.
Omelayenko, B. (2001). Learning of Ontologies for the Web: the Analysis of Existent Ap-
proaches. In Proc. of the International Workshop on Web Dynamics, London, UK, 2001.
Papazoglou, M., Proper, H., and Yang, J. (2000). Landscaping the information space of large
multi-database networks. In ????
Papazoglou, M. P., Proper, H. A., and Yang, J. (1995). Knowledge navigation in networked digital
libraries. Data and Knowledge Engineering, 36(3):251-281.
Peirce, C. (1885). On the Algebra of Logic. American Journal of Mathematics.
Pereira, F., Tishby, N., and Lee, L. (1993). Distributional Clustering of English Words. In Proceedings of the ACL-93, 1993, pages 183-190.
Pernelle, N., Rousset, M. C., and Ventos, V. (2001). Automatic Construction and Refinement of a Class Hierarchy over Semistructured Data. In Maedche, A., Staab, S., Nedellec, C., and Hovy, E., editors, Proceedings of the IJCAI-01 Workshop on Ontology Learning OL-2001, Seattle, August 2001, Menlo Park. AAAI Press.
Peterson, B., Andersen, W., and Engel, J. (1998). Knowledge bus: Generating application-focused databases from large ontologies. In Proc. of KRDB 1998, Seattle, Washington, USA, pages 2.1-2.10.
Pinkerton, B. (1994). Finding What People Want: Experiences with the WebCrawler. In WWW2 - Proceedings of the 2nd International World Wide Web Conference, Chicago, USA, October 17-20, 1994.
Pirlein, T. (1995). Wiederverwendung von Common Sense Ontologien im Knowledge Engineering (in German). PhD thesis, University of Karlsruhe.
Piskorski, J. and Neumann, G. (2000). An intelligent text extraction and navigation system. In Proceedings of the 6th Conference on Computer-Assisted Information Retrieval, Paris, 2000.
Rada, R., Mili, H., Bicknell, E., and Blettner, M. (1989). Development and application of a metric
on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics, 19(1).
Ramanathan, S. and Hodges, J. (1997). Extraction of object-oriented structures from existing relational databases. SIGMOD Record, 26(1):59-64.
Reimer, U. (1990). Automatic knowledge acquisition from texts: Learning terminological knowledge via text understanding as inductive generalization. In Proceedings of the Workshop on Knowledge Acquisition and Knowledge-based Systems (KAW-90), Banff, 1990.
Rennie, J. and McCallum, A. (1999). Using Reinforcement Learning to Spider the Web Efficiently. In Proceedings of the International Conference on Machine Learning (ICML-99), 1999.
Resnik, P. S. (1993). Selection and Information: A Class-based Approach to Lexical Relationships. PhD thesis, University of Pennsylvania.
Richardson, S. D. (1997). Determining Similarity and Inferring Relations in a Lexical Knowledge Base. PhD thesis, City University of New York.
Romacker, M., Markert, K., and Hahn, U. (1999). Lean semantic interpretation. In Proc. of IJCAI-99, pages 868-875.
Rousselot, F., Barthelemy, T., de Beuvron, F., Frath, P., and Oueslati, R. (1996). Terminological competence and knowledge acquisition from texts. In Proceedings of the EKAW'1996 Workshop.
Sahuguet, A. and Azavant, F. (1999). Building light-weight wrappers for legacy web data-sources using W4F. In VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK, pages 738-741. Morgan Kaufmann.
Salton, G. (1988). Automatic Text Processing. Addison-Wesley.
Sanderson, M. and Croft, B. (1999). Deriving Concept Hierarchies from Text. In Proceedings of the International Conference on Information Retrieval - SIGIR'99, August 1999, Berkeley, CA, USA.
Schlobach, S. (2000). Assertional mining in description logics. In Proceedings of the 2000 International Workshop on Description Logics (DL2000), pages 89-97. http://SunSITE.Informatik.RWTH-Aachen.DE/Publications/CEUR-WS/Vol-33/.
Schmitt, I. and Saake, G. (1997). Merging inheritance hierarchies for database integration. In Proc. of the 3rd International Conference on Cooperative Information Systems - CoopIS'98, pages 322-331. IEEE Computer Science Press.
Schmolze, J. and Woods, W. (1992). The KL-ONE family. In F. Lehmann (ed.), Semantic Networks in Artificial Intelligence, Pergamon Press.
Schreiber, A. T. and Birmingham, W. P. (1996). The Sisyphus-VT initiative. International Journal
of Human-Computer Studies, 44(3/4).
Sheth, A. and Larson, J. (1990). Federated database systems for managing distributed, heterogeneous and autonomous databases. ACM Computing Surveys, 22(3).
Skuce, D., Matwin, S., Tauzovich, B., Oppacher, F., and Szpakowicz, S. (1985). A logic-based
knowledge source system for natural language documents. Data and Knowledge Engineering,
1:201-231.
Spanoudakis, G. and Constantopoulos, P. (1994). Similarity for analogical software reuse: A computational model. In Proc. of ECAI-1994, pages 18-22.
Sparck-Jones, K. and Willett, P., editors (1997). Readings in Information Retrieval. Morgan
Kaufmann.
Srikant, R. and Agrawal, R. (1995). Mining generalized association rules. In Proc. of VLDB'95, pages 407-419.
Srikant, R., Vu, Q., and Agrawal, R. (1997). Mining association rules with item constraints. In
Proceedings of the AAAI'97.
Staab, S., Angele, J., Decker, S., Erdmann, M., Hotho, A., Maedche, A., Studer, R., and Sure, Y. (2000a). Semantic Community Web Portals. In Proceedings of the 9th World Wide Web Conference (WWW-9), Amsterdam, Netherlands.
Staab, S., Braun, C., Düsterhöft, A., Heuer, A., Klettke, M., Melzig, S., Neumann, G., Prager, B., Pretzel, J., Schnurr, H.-P., Studer, R., Uszkoreit, H., and Wrenger, B. (1999). GETESS - searching the web exploiting German texts. In Proceedings of the 3rd Workshop on Cooperative Information Agents, LNCS-1652, Berlin. Springer.
Staab, S., Erdmann, M., and Maedche, A. (2001a). Engineering Ontologies using Semantic Patterns. In Proceedings of the IJCAI-2001 Workshop on E-Business & Intelligent Web, Seattle, August 03, 2001.
Staab, S., Erdmann, M., and Maedche, A. (2001b). From manual to semi-automatic semantic annotation: About ontology-based text annotation tools. ETAI - Semantic Web Journal, Linkoeping Electronic Articles, 16(1).
Staab, S., Erdmann, M., Maedche, A., and Decker, S. (2000b). An extensible approach for modeling ontologies in RDF(S). In (Christophides and Plexousakis, 2000).
Staab, S. and Maedche, A. (2000). Ontology engineering beyond the modeling of concepts and relations. In Benjamins, V., Gomez-Perez, A., and Guarino, N., editors, Proceedings of the ECAI-2000 Workshop on Ontologies and Problem-Solving Methods, Berlin, August 21-22, 2000.
Staab, S. and Maedche, A. (2001). Knowledge portals - ontologies at work. AI Magazine, 21(2).
Staab, S., Maedche, A., Nedellec, C., and Hastings, P., editors (2000c). Proceedings of the ECAI'2000 Workshop on Ontology Learning - OL'2000.
Staab, S. and Schnurr, H.-P. (2000). Smart task support through proactive access to organizational
memory. Journal of Knowledge-based Systems, 13(5). Special issue on AI and Knowledge
Management.
Staab, S., Schnurr, H.-P., Studer, R., and Sure, Y. (2001c). Knowledge processes and ontologies.
IEEE Intelligent Systems, 16(1).
Stumme, G. and Maedche, A. (2001a). Developing Federated Ontologies for the Semantic Web using FCA-MERGE. In Proceedings of the IJCAI-2001 Workshop on Ontologies & Information Fusion, Seattle, August 03, 2001.
Stumme, G. and Maedche, A. (2001b). FCA-Merge: Bottom-Up Merging of Ontologies. In IJCAI-2001 - Proceedings of the 17th International Joint Conference on Artificial Intelligence, Seattle, USA, August 1-6, 2001, San Francisco. Morgan Kaufmann.
Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., and Lakhal, L. (2000). Fast computation of concept lattices using data mining techniques. In Proc. KRDB'00, Berlin, 2000. CEUR Workshop Proc., http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/.
Sure, Y., Maedche, A., and Staab, S. (2000). Leveraging corporate skill knowledge - From ProPer to OntoProper. In Mahling, D. and Reimer, U., editors, Proceedings of the Third International Conference on Practical Aspects of Knowledge Management, Basel, Switzerland, October 30-31, 2000. http://www.research.swisslife.ch/pakm2000/.
Swartout, B., Patil, R., Knight, K., and Russ, T. (1996). Toward distributed use of large-scale ontologies. In Proc. of the 10th International Workshop on Knowledge Acquisition, Modeling and Management (KAW'96), Banff, Canada, November 9-14, 1996.
Szpakowicz, S. (1990). Semi-automatic acquisition of conceptual structure from technical texts.
International Journal of Man-Machine Studies, 33(4):385-397.
Tari, Z., Bukhres, O., Stokes, J., and Hammoudi, S. (1998). The Reengineering of Relational Databases based on Key and Data Correlations. In Proceedings of the 7th Conference on Database Semantics (DS-7), 7-10 October 1997, Leysin, Switzerland. Chapman & Hall.
Uschold, M. and Gruninger, M. (1996). Ontologies: Principles, methods and applications. Knowledge Engineering Review, 11(2):93-155.
Uschold, M. and King, M. (1995). Towards a Methodology for Building Ontologies. In Proceedings of the IJCAI'95 Workshop on Basic Ontological Issues in Knowledge Sharing.
van Heijst, G. (1995). The Role of Ontologies in Knowledge Engineering. PhD thesis, Universiteit van Amsterdam.
van Heijst, G., Schreiber, A. T., and Wielinga, B. J. (1997). Using explicit ontologies for KBS development. International Journal of Human-Computer Studies, 42(2):183-292.
Vanderwende, L. H. (1995). The Analysis of Noun Sequences using Semantic Information Extracted from On-Line Dictionaries. PhD thesis, Georgetown University.
Visser, P. and Tamma, V. (1999). An Experience with Ontology Clustering for Information
Integration. In Proceedings of the IJCAI'99 Workshop on Intelligent Information Integration,
Stockholm, Sweden, July 31, 1999.
Volz, R. (2000). Akquisition von Ontologien mit Text-Mining-Verfahren (in German). Master's thesis, University of Karlsruhe.
Wang, K. and Liu, H. (1997). Schema Discovery for Semi-Structured Data. In Proceedings of
the Third International Conference on Knowledge Discovery and Data Mining (KDD-97),
Newport Beach, California, USA, August 14-17, 1997. AAAI Press.
Wang, K. and Liu, H. (1998). Discovering typical structures of documents: A road map approach. In Proc. of SIGIR-98.
Webb, G. (1996). Integrating Machine Learning with Knowledge Acquisition through direct interaction with domain experts. volume 9, pages 236-252, Berlin. Springer Verlag.
Webb, G., Wells, J., and Zheng, Z. (1999). An Experimental Evaluation of Integrating Machine
Learning with Knowledge Acquisition. Machine Learning, 35(1):5-23.
Wei, F. (1999). F-logic Semantics and Implementation of Internet Metadata. Master's thesis,
University of Freiburg.
Weinstein, P. (1990). Integrating Ontological Metadata: algorithms that predict semantic compatibility. PhD thesis, Computer Science and Engineering Department, University of Michigan, USA.
Weinstein, P. and Birmingham, W. (1999). Comparing concepts in differentiated ontologies. In
Proc. of KAW-99, Banff, Canada, 1999.
Welty, C. and Ide, N. (1999). Using the right tools: enhancing retrieval from marked-up documents. Computers and the Humanities, 33(10):59-84.
Wersig, G. (1985). Thesaurus-Leitfaden. K.G. Saur Verlag, München.
Wiederhold, G. (1992). Mediators in the architecture of future information systems. IEEE Computer, 25(3):38-49.
Wiemer-Hastings, P., Graesser, A., and Wiemer-Hastings, K. (1998). Inferring the meaning of
verbs from context. In Proceedings of the Twentieth Annual Conference of the Cognitive
Science Society.
Wittgenstein, L. (1922). Tractatus Logico-Philosophicus. Routledge & Kegan Paul Ltd., London.
Wrobel, S. (1994). Concept Formation and Knowledge Revision. Kluwer, Dordrecht, Boston,
London.
Yangarber, R., Grishman, R., Tapanainen, P., and Huttunen, S. (2000). Automatic Acquisition of Domain Knowledge for Information Extraction. In Proceedings of COLING'2000, Saarbrücken, Germany, 2000.
Index

Application ontologies, 22
Architecture for Ontology Learning, 68
Association Rules, 137
Attribute, 111
Averaged relation overlap, 190
Averaged string matching, 184
Averaged taxonomic similarity, 187
Balanced Cooperative Modeling, 68
Class, 39
Classification of ontologies, 22
Clause level processing, 105
Compound Analysis, 103
Concept C, 18
Concept hierarchy HC, 18
Concept match (CM), 189
Conceptual cotopy, 185
Conceptual similarity, 185
Confidence, 139
Data for Ontology Learning, 62
Data model component, 161
Database relation, 111
Description Logics, 44
Dictionary, 134
Domain ontologies, 22
E-Business, 25
Evaluation, 179
Extensional definition, 21
F-Logic, 47
FCA-Merge, 89
Finite State Transducers, 102
First-Order Logic, 44
Focused Crawling, 99
Formal concept, 89
Formal Concept Analysis, 90
Formal context, 89
GermaNet, 86
Hierarchical Clustering, 129, 173
Information Extraction, 101
Instance I, 20
Instances, 65
Intensional definition, 21
Item constraints, 144
Knowledge base axioms AKB, 20
Knowledge base structure KB, 20, 21
Knowledge Management, 25
Lattice, 96
Levenshtein Edit Distance, 184
Lexical acquisition, 223
Lexical Analysis, 103
Lexical Entry Extraction, 173
Lexical similarity, 184
Lexicon L, 18
Maintenance, 147
Meaning triangle, 14
Merging, 89
Metadata, 36
Morphology, 103
Multi-Strategy Learning, 152
Multirelational Data, 118
Named Entity Recognition, 105
Natural Language Understanding, 25
NLP Architecture, 101
Object (in RDF), 37
OIL, 71
OntoEdit, 163
Ontology O, 18
Ontology axioms AO, 20
Ontology Comparison, 182
Ontology Engineering, 213
Ontology Extraction, 78
Ontology in Computer Science, 15
Ontology in philosophy, 13
Ontology Learning Cycle, 75
Ontology Learning Phases, 75
Ontology mapping, 52
Ontology Server, 177
Ontology structure O, 18
Ontology-based Applications, 23
ParseTalk, 102
Part-of-Speech (POS) Tagger, 104
Pattern Debugger, 172
Precision, 6
Predicate (in RDF), 37
Preprocessing, 68
Processing component, 162
Property, 39
Pruning, 78, 147, 228
RDF, 36
RDF(S) syntax, 41
RDF-Schema, 36, 38
Recall, 6
Reference function F, G, 18
Reference functions F, G, I, 20
Refinement, 80, 149
Regular Expressions, 135
Reification, 36
Relation S, 18
Relation overlap, 188
Relational Data, 110
Resource, 39
Resource Description Framework, 34
Reuse, 63
Schemata, 63
Semantic Patterns, 45
Semantic Web, 24
Semantic Web Mining, 237
Semi-structured data, 65
Semi-structured Documents, 67
Semiotics, 18
Shallow Text Processing, 100
SiLRI, 72, 168
Spring-Embedding algorithm, 175
String matching, 184
Subcategorization Frame, 102
Subject (in RDF), 37
Support, 138
SWRC Ontology, 165
Task ontologies, 22
Taxonomic overlap, 186
Text-To-Onto, 161
Tfidf, 127
Tokenizer, 103
Top-Level ontologies, 22
Tuple, 111
Upwards cotopy UC, 188
Web Documents, 66
Web Schemata, 65
WordNet, 18, 86
XML Namespaces, 40
