Copyright © 2005 by Idea Group Inc. All rights reserved. No part of this book may be
reproduced in any form or by any means, electronic or mechanical, including photocopying,
without written permission from the publisher.
Transformation of knowledge, information and data : theory and applications / Patrick van
Bommel, editor.
p. cm.
Includes bibliographical references and index.
ISBN 1-59140-527-0 (h/c) — ISBN 1-59140-528-9 (s/c) — ISBN 1-59140-529-7 (eisbn)
1. Database management. 2. Transformations (Mathematics) I. Bommel, Patrick van, 1964-
QA76.9.D3T693 2004
005.74—dc22 2004017926
British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.
All work contributed to this book is new, previously-unpublished material. The views expressed
in this book are those of the authors, but not necessarily of the publisher.
Transformation of
Knowledge, Information
and Data:
Theory and Applications
Table of Contents
Preface ............................................................................................................. vi
Chapter I
Transformation-Based Database Engineering ........................................... 1
Jean-Luc Hainaut, University of Namur, Belgium
Chapter II
Rule-Based Transformation of Graphs and the Product Type .............. 29
Renate Klempien-Hinrichs, University of Bremen, Germany
Hans-Jörg Kreowski, University of Bremen, Germany
Sabine Kuske, University of Bremen, Germany
Chapter III
From Conceptual Database Schemas to Logical Database Tuning ...... 52
Jean-Marc Petit, Université Clermont-Ferrand 2, France
Mohand-Saïd Hacid, Université Lyon 1, France
Chapter IV
Transformation Based XML Query Optimization ................................... 75
Dunren Che, Southern Illinois University, USA
Chapter V
Specifying Coherent Refactoring of Software Artefacts with
Distributed Graph Transformations ........................................................... 95
Paolo Bottoni, University of Rome “La Sapienza”, Italy
Francesco Parisi-Presicce, University of Rome “La Sapienza”, Italy
and George Mason University, USA
Gabriele Taentzer, Technical University of Berlin, Germany
Chapter VI
Declarative Transformation for Object-Oriented Models .................. 127
Keith Duddy, CRC for Enterprise Distributed Systems Technology
(DSTC), Queensland, Australia
Anna Gerber, CRC for Enterprise Distributed Systems Technology
(DSTC), Queensland, Australia
Michael Lawley, CRC for Enterprise Distributed Systems Technology
(DSTC), Queensland, Australia
Kerry Raymond, CRC for Enterprise Distributed Systems Technology
(DSTC), Queensland, Australia
Jim Steel, CRC for Enterprise Distributed Systems Technology
(DSTC), Queensland, Australia
Chapter VII
From Conceptual Models to Data Models ............................................ 148
Antonio Badia, University of Louisville, USA
Chapter VIII
An Algorithm for Transforming XML Documents Schema into
Relational Database Schema .................................................................... 171
Abad Shah, University of Engineering & Technology (UET),
Pakistan
Jacob Adeniyi, King Saud University, Saudi Arabia
Tariq Al Tuwairqi, King Saud University, Saudi Arabia
Chapter IX
Imprecise and Uncertain Engineering Information Modeling in
Databases: Models and Formal Transformations ................................ 190
Z. M. Ma, Université de Sherbrooke, Canada
Chapter X
Analysing Transformations in Performance Management .................. 217
Bernd Wondergem, LogicaCMG Consulting, The Netherlands
Norbert Vincent, LogicaCMG Consulting, The Netherlands
Chapter XI
Multimedia Conversion with the Focus on Continuous Media ......... 235
Maciej Suchomski, Friedrich-Alexander University of
Erlangen-Nuremberg, Germany
Andreas Märcz, Dresden, Germany
Klaus Meyer-Wegener, Friedrich-Alexander University of
Erlangen-Nuremberg, Germany
Chapter XII
Coherence in Data Schema Transformations: The Notion of Semantic
Change Patterns ........................................................................................ 257
Lex Wedemeijer, ABP Pensioenen, The Netherlands
Chapter XIII
Model Transformations in Designing the ASSO Methodology ......... 283
Elvira Locuratolo, ISTI, Italy
Preface
Background
Data today is in motion, going from one location to another. It increasingly
moves between systems, system components, persons, departments, and
organizations. This is essential, as it indicates that data is actually used, rather than
just stored. In order to emphasize the actual use of data, we may also speak of
information or knowledge.
When data is in motion, there is not only a change of place or position. Other
aspects are changing as well. Consider the following examples:
• The data format may change when it is transferred between systems.
This includes changes in data structure, data model, data schema, data
types, etc.
• Also, the interpretation of data may vary when it is passed on from one
person to another. Changes in interpretation are part of data semantics
rather than data structure.
• The level of detail may change in the exchange of data between depart-
ments or organizations, e.g., going from co-workers to managers or from
local authorities to the central government. In this context, we often see
changes in level of detail by the application of abstraction, aggregation,
generalization, and specialization.
• Moreover, the systems development phase of data models may vary.
This is particularly the case when implementation-independent data mod-
els are mapped to implementation-oriented models (e.g., semantic data
models are mapped to operational database specifications).
Framework
Transformation techniques have received a lot of attention in academic as well as
industrial settings. Most of these techniques suffer from one or more of the
following problems:
• Loss of data: the result of the transformation does not adequately de-
scribe the original data.
• Incomprehensibility: the effect of the transformation is not clear.
• Focus on instances: data instances are transformed, without incorpora-
tion of data types.
• Focus on types: data types are transformed, without incorporation of
data instances.
• Correctness: the transformation has no provable correctness.
Correctness
Evidently, the correctness of transformations is of vital importance. What pur-
pose would transformations have, if the nature of the result is uncertain? A
general setup for guaranteeing transformation correctness consists of three
steps.
(Figure: a general setup for transformation correctness, relating operations on the source to operations on the target through a transformation kernel and an operation transformation.)
Sequences of Transformations
Transformations may be composed or applied in sequences. Such sequences
sometimes consist of a relatively small number of steps. In more complex prob-
lem areas, however, transformation sequences will be longer and, due to the
various options in each transformation step, the outcome of the overall sequence
is not a priori known. This is particularly the case when non-deterministic (e.g.,
random or probabilistic) transformation processes are considered.
Although the outcome is not a priori known, it is often desirable to predict the
nature of the result. One way of predicting the behavior of probabilistic trans-
formation processes is through the use of Markov theory. Here the probabili-
ties of a single transformation step are summarized in a transition matrix, such
that transformation sequences can be considered by matrix multiplication.
We will illustrate the definition of a single-step matrix for two basic cases. In
the first case, consider a transformation in a solution space S where each input
x∈S has as possible output some y∈N(x), where N(x)⊆S and x∉N(x). So each
neighbor y∈N(x) can be produced from x by the application of some transfor-
mation rule. Then the probability P(x,y) for the transformation of x into some
y∈N(x) has the following property:

P(x,y) = 1/|N(x)|  for all y∈N(x)    (1)
Evidently for y∉N(x) we have P(x,y)=0. With this property it is guaranteed that
P(x,y) is a stochastic matrix, since 0 ≤ P(x,y) ≤ 1 and Σy∈S P(x,y) = 1. Note that
in the above transformation the production of all results is equally likely.
In the second case, we consider situations where the production of all results is
not equally likely. Consider a transformation in a solution space S where each
input x∈S has as possible output some y∈B(x), where B(x)⊆N(x) contains all
better neighbors of x. Then the probability P(x,y) for the transformation of x
into some y∈B(x) is given by the above mentioned formula (1). However, as a
result of accepting only improving transformations, this formula now does not
guarantee P(x,y) to be a stochastic matrix. The consequence of rejecting all
neighbours in N(x)-B(x) is that a transformation may fail. So now we have to
consider P(x,x). This probability has the following property:

P(x,x) = 1 - |B(x)|/|N(x)|
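As a concrete illustration of these two cases, the following sketch (my own toy example with a hypothetical solution space and objective function, not taken from the preface) builds both single-step transition matrices and checks that their rows sum to 1:

import numpy as np

S = [0, 1, 2, 3]                # toy solution space

def N(x):
    # Neighbors of x: states reachable by one transformation step (x - 1, x + 1).
    return [y for y in (x - 1, x + 1) if y in S]

f = lambda x: abs(x - 2)        # objective to minimize; 2 is the optimum

# Case 1: all neighbors equally likely, P(x,y) = 1/|N(x)| for y in N(x), cf. (1).
P1 = np.array([[1 / len(N(x)) if y in N(x) else 0.0 for y in S] for x in S])

# Case 2: only improving neighbors B(x) are accepted; the failure probability
# stays on the diagonal, P(x,x) = 1 - |B(x)|/|N(x)|.
P2 = np.zeros((len(S), len(S)))
for x in S:
    B = [y for y in N(x) if f(y) < f(x)]
    for y in B:
        P2[x, y] = 1 / len(N(x))
    P2[x, x] = 1 - len(B) / len(N(x))

# Both matrices are stochastic: every row sums to 1.
assert np.allclose(P1.sum(axis=1), 1) and np.allclose(P2.sum(axis=1), 1)

# A sequence of k transformation steps is analyzed by matrix multiplication:
print(np.linalg.matrix_power(P2, 10))   # distribution after 10 steps

Raising the matrix to the k-th power gives, for each input, the distribution over outcomes after k steps, which is exactly the kind of prediction referred to above.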
Fundamentals of Transformations
Section I is about fundamentals and consists of five chapters. The focus of
Chapter I is databases: Transformation-Based Database Engineering. Here
we consider the basic theory of the transformation of data schemata, where
reversibility of transformations is also considered. We describe the use of basic
transformations in the construction of more complex (higher-level) transforma-
tions. Several possibilities are recognized here, including compound transfor-
mations, and predicate-driven and model-driven transformations. Basic trans-
formations and their higher-level derivations are embedded within database (for-
ward) design processes as well as within database reverse design processes.
Most models to be transformed are defined in terms of graphs. In Chapter II
we will therefore focus on graph transformations: Rule-Based Transforma-
tion of Graphs and the Product Type. Graph transformations are based on
rules. These rules yield new graphs, produced from a given graph. In this ap-
proach, conditions are used to have more control over the transformation pro-
cess. This allows us to indicate the order of rule application. Moreover, the
result (product) of the transformation is given special attention. In particular,
the type of the product is important. This sets the context for defining the pre-
cise relation between two or more graph transformations.
Having embedded our transformations within the graph transformation context,
Chapter III proceeds with graphs for concrete cases: From Conceptual Data-
base Schemas to Logical Database Tuning. Here we present several algo-
rithms, aiming at the production of directed graphs. In databases we have sev-
eral aims in transformations, including efficiency and freedom from null values.
Note that wellformedness of the input (i.e., a conceptual model) as well as
wellformedness of the output (i.e., the database) is addressed.
It is evident that graphs have to be transformed, but what about operations on
graphs? In systems design this corresponds with query transformation and op-
timization. We apply this to markup languages in Chapter IV: Transformation
Based XML Query Optimization. After representing document type defini-
tions in terms of a graph, we consider paths in the graph and an algebra for text
search. Equivalent algebraic expressions set the context for optimization, as we
know from database theory. Here we combine the concepts from previous chap-
ters, using rule-based transformations. However, the aim of the transformation
process now is optimization.
In Chapter V, the final chapter of Section I, we consider a highly specialized
fundament in the theory behind applications: Specifying Coherent Refactoring
of Software Artefacts with Distributed Graph Transformations. Modifica-
tions in the structure of systems are recorded in terms of so-called “refactoring”.
This means that a coordinated evolution of system components becomes possible.
Elaboration of Transformation Approaches
In Section II, we consider elaborated approaches to transformation. The focus
of Chapter VI is object-oriented transformation: Declarative Transformation
for Object-Oriented Models. This is relevant not only for object-oriented data
models, but for object-oriented programming languages as well. The transfor-
mations under consideration are organized according to three styles of trans-
formation: source-driven, target-driven, and aspect-driven transformations. Al-
though source and target will be clear, the term “aspect” needs some clarifica-
tion. In aspect-driven transformations, we use semantic concepts for setting up
the transformation rule. A concrete SQL-like syntax is used, based on rule —
forall — where — make — linking statements. This also allows for the defini-
tion of patterns.
It is generally recognized that in systems analysis we should use conceptual
models, rather than implementation models. This creates the context for trans-
formations of conceptual models. In Chapter VII we deal with this: From Con-
ceptual Models to Data Models. Conceptual models are often expressed in
terms of the Entity-Relationship approach, whereas implementation models are
often expressed in terms of the relational model. Classical conceptual model
transformations thus describe the mapping from ER to relational models. Hav-
ing UML in the conceptual area and XML in the implementation area, we now
also focus on UML to XML transformations.
We proceed with this in the next chapter: An Algorithm for Transforming
XML Documents Schema into Relational Database Schema. A typical ap-
proach to the generation of a relational schema from a document definition,
starts with preprocessing the document definition and finding the root node of
the document. After generating trees and a corresponding relational schema,
we should determine functional dependencies and other integrity constraints.
During postprocessing, the resulting schema may be normalized in case this is
desirable. Note that the performance (efficiency) of such algorithms is a criti-
cal factor. The proposed approach is illustrated in a case study based on library
documents.
Transformations are often quite complex. If data is inaccurate, we have a fur-
ther complication. In Chapter IX we deal with this: Imprecise and Uncertain
Engineering Information Modeling in Databases: Models and Formal
Transformations. Uncertainty in information modeling is usually based on fuzzy
set theory.
Additional Topics
In Section III, we consider additional topics. The focus of Chapter X is the
application of transformations in a new area: Analysing Transformations in
Performance Management. The context of these transformations is an orga-
nizational model, along with a goal model. This results in a view of organiza-
tional management based on cycles of transformations. Typically, we have trans-
formations of organizational models and goal models, as well as transforma-
tions of the relationship between these models. Basic transformations are the
addition of items and detailing of components.
Next we proceed with the discussion of different media: Multimedia Conver-
sion with the Focus on Continuous Media. It is evident that the major chal-
lenge in multimedia research is the systematic treatment of continuous media.
When focusing on transformations, we enter the area of streams and convert-
ers. As in previous chapters, we again base ourselves on graphs here, for in-
stance chains of converters, yielding a graph of converters. Several qualities
are relevant here, such as quality of service, quality of data, and quality of
experience. This chapter introduces specific transformations for media-type
changers, format changers, and content changers.
The focus of Chapter XII is patterns in schema changes: Coherence in Data
Schema Transformations: The Notion of Semantic Change Patterns. Here
we consider updates of data schemata during system usage (operational
schema). When the schema is transformed into a new schema, we try to find
coherence. A catalogue of semantic changes is presented, consisting of a num-
ber of basic transformations. Several important distinctions are made, for ex-
ample, between appending an entity and superimposing an entity. Also, we have
the redirection of a reference to an owner entity, along with extension and
restriction of entity intent. The basic transformations were found during empiri-
cal studies in real-life cases.
In Chapter XIII, we conclude with the advanced approach: Model Transfor-
mations in Designing the ASSO Methodology. The context of this methodol-
ogy is ease of specifying schemata and schema evolution during system usage.
The transformations considered here particularly deal with subtyping (also called
is-a relationships). This is covered by the transformation of class hierarchies or
more general class graphs. It is evident that schema consistency is one of the
properties required. This is based on consistency of class definitions, with in-
Conclusions
This book contains theory and applications of transformations in the context of
information systems development. As data today is frequently moving between
systems, system components, persons, departments, and organizations, the need
for such transformations is evident.
When data is in motion, there is not only a change of place or position. Other
aspects are changing as well. The data format may change when it is trans-
ferred between systems, while the interpretation of data may vary when it is
passed on from one person to another. Moreover, the level of detail may change
in the exchange of data between departments or organizations, and the systems
development phase of data models may vary, e.g., when implementation-inde-
pendent data models are mapped to implementation-oriented models.
The theory presented in this book will help in the development of new innova-
tive applications. Existing applications presented in this book prove the power
of current transformation approaches. We are confident that this book contrib-
utes to the understanding, the systematic treatment and refinement, and the
education of new and existing transformations.
Further Reading
Kovacs, Gy. & van Bommel, P. (1997). From conceptual model to OO data-
base via intermediate specification. Acta Cybernetica, (13), 103-140.
Kovacs, Gy. & van Bommel, P. (1998). Conceptual modelling based design of
object-oriented databases. Information and Software Technology, 40(1), 1-14.
van Bommel, P. (1993, May). A randomised schema mutator for evolutionary
database optimisation. The Australian Computer Journal, 25(2), 61-69.
van Bommel, P. (1994). Experiences with EDO: An evolutionary database
optimizer. Data & Knowledge Engineering, 13, 243-263.
van Bommel, P. (1995, July). Database design by computer aided schema trans-
formations. Software Engineering Journal, 10(4), 125-132.
van Bommel, P., Kovacs, Gy. & Micsik, A. (1994). Transformation of database
populations and operations from the conceptual to the internal level. In-
formation Systems, 19(2), 175-191.
van Bommel, P., Lucasius, C.B. & Weide, Th.P. van der (1994). Genetic algo-
rithms for optimal logical database design. Information and Software
Technology, 36(12), 725-732.
van Bommel, P. & Weide, Th.P. van der (1992). Reducing the search space for
conceptual schema transformation. Data & Knowledge Engineering, 8,
269-292.
Acknowledgments
The editor gratefully acknowledges the help of all involved in the production of
this book. Without their support, this project could not have been satisfactorily
completed. A further special note of thanks goes also to all the staff at Idea
Group Publishing, whose contributions throughout the whole process from in-
ception of the initial idea to final publication have been invaluable.
Deep appreciation and gratitude is due to Theo van der Weide and other mem-
bers of the Department of Information Systems at the University of Nijmegen,
The Netherlands, for the discussions about transformations of information models.
Most of the authors of chapters included in this book also served as reviewers
for chapters written by other authors. Thanks go to all those who provided
constructive and comprehensive reviews. Special thanks also go to the publish-
ing team at Idea Group Publishing, in particular to Michele Rossi, Carrie
Skovrinskie, Jan Travers, and Mehdi Khosrow-Pour.
In closing, I wish to thank all of the authors for their insights and excellent
contributions to this book.
Fundamentals of Transformations
Chapter I
Transformation-Based Database Engineering
Jean-Luc Hainaut, University of Namur, Belgium
Abstract
These definitions still hold for database schemas, which are special kinds of
abstract program schemes. The concept of transformation is particularly attrac-
tive in this realm, though it has not often been made explicit (for instance, as a
user tool) in current CASE tools. A (schema) transformation is most generally
considered to be an operator by which a data structure S1 (possibly empty) is
replaced by another structure S2 (possibly empty) which may have some sort of
equivalence with S1. Some transformations change the information contents of
the source schema, particularly in schema building (adding an entity type or an
attribute) and in schema evolution (removing a constraint or extending a
relationship type). Others preserve it and will be called semantics-preserving or
reversible. Among them, we will find those which just change the nature of a
schema object, such as transforming an entity type into a relationship type or
extracting a set of attributes as an independent entity type.
Transformations that are proved to preserve the correctness of the original
specifications have been proposed in practically all the activities related to
schema engineering: schema normalization (Rauh, 1995), DBMS2 schema
translation (Hainaut, 1993b; Rosenthal, 1988), schema integration (Batini, 1992;
McBrien, 2003), schema equivalence (D’Atri, 1984; Jajodia, 1983; Kobayashi,
1986; Lien, 1982), data conversion (Navathe, 1980; Estiévenart, 2003), reverse
engineering (Bolois, 1994; Casanova, 1984; Hainaut, 1993, 1993b), schema
optimization (Hainaut, 1993b; Halpin, 1995), database interoperability (McBrien,
2003; Thiran, 2001) and others. The reader will find in Hainaut (1995) an
illustration of numerous application domains of schema transformations.
The goal of this chapter is to develop and illustrate a general framework for
database transformations in which all the processes mentioned above can be
the reference model on which the operators are built. According to its generality
and its abstraction level, this model defines the scope of the theory, which can
address a more or less wide spectrum of processes. For instance, building a
theory on the relational model will allow us to describe, and to reason on, the
transformation of relational schemas into other relational schemas. The 1NF4
normalization theory is a popular example. Another example would be a
transformational theory based on the ORM (Object-Role model) that would
provide techniques for transforming (normalizing, optimizing) conceptual schemas
into other schemas of the same abstraction level (de Troyer, 1993; Proper, 1998).
The hard challenge is to choose a unique model that can address not only intra-
model transformations, but inter-model operators, such as ORM-to-relational
conversion.
To identify such models, let us consider a set of models Γ that includes, among
others, all the operational formalisms that are of interest for a community of
practitioners, whatever the underlying paradigm, the age and the abstraction
level of these formalisms. For instance, in a large company whose information
system relies on many databases (be they based on legacy or modern technolo-
gies) that have been designed and maintained by several teams, this set is likely
to include several variants of the ERA model, UML class diagrams, several
relational models (e.g., Oracle 5 to 10 and DB2 UDB), the object-relational
model, the IDMS and IMS models and of course the standard file structure model
on which many legacy applications have been developed.
Let us also consider the transitive inclusion relation “≤” such that M ≤ M’, where
M≠M’ and M,M’ ∈ Γ, means that all the constructs of M also appear in M’.5 For
instance, if M denotes the standard relational model and M’ the object-
relational model, then M ≤ M’ holds, since each schema expressed in M is a valid
schema according to model M’.
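If one abstracts a model to the set of construct kinds it offers, this relation can be checked mechanically; a minimal sketch (with invented construct sets, purely for illustration):

# Abstracting each model to its set of construct kinds (hypothetical sets);
# "M <= M'" becomes proper set inclusion.
relational = {"table", "column", "primary key", "foreign key"}
object_relational = relational | {"user-defined type", "collection column"}

def leq(M, M_prime):
    return M != M_prime and M <= M_prime   # all constructs of M appear in M'

print(leq(relational, object_relational))  # True: relational <= object-relational
print(leq(object_relational, relational))  # False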
Now, we consider two models M0 and M* in Γ, such that:
∀M∈Γ, M≠M0: M0 ≤ M  and  ∀M∈Γ, M≠M*: M ≤ M*.
(Γ, ≤) forms a lattice of models, in which M0 denotes the bottom node and M*
the upper node.
M0, admittedly non-empty, is made up of a very small set of elementary abstract
constructs, typically nodes, edges and labels. An ERA schema S comprising an
entity type E with two attributes A1 and A2 would be represented in M0 by the
nodes n1, n2, n3 which are given the labels “E”, “A1” and “A2”, and by the edges
(n1,n2) and (n1,n3).
On the contrary, M* will include a greater variety of constructs, each of them
being a natural abstraction of one or several constructs of lower-level models.
This model should include, among others, the concepts of object type, attribute
and inter-object association, so that the contents of schema S will be represented
in M* by an object type with name “E” comprising two attributes with names “A1”
and “A2”.
Due to their high level of abstraction, models M0 and M* are good candidates to
develop a transformational theory relying on a single model. Considering the
context-dependent definition of Γ, M0 and M*, we cannot assert that these
concepts are unique. Therefore, there is no guarantee that a universal theory can
be built.
Approaches based on M0 generally define data structures as semantics-free
binary graphs on which a small set of rewriting operators are defined. The
representation of an operational model M such as ERA, relational or XML, in M0
requires some additional features such as typed nodes (object, attribute, associa-
tion and roles for instance) and edges, as well as ad hoc assembly rules that
define patterns. A transformation specific to M is also defined by a pattern, a sort
of macro-transformation, defined by a chain of M0 transformations. McBrien
(1998) is a typical example of such theories. We can call this approach
constructive or bottom-up, since we build operational models and transforma-
tions by assembling elementary building blocks.
The approaches based on M* naturally require a larger set of rewriting rules. An
operational model M is defined by specializing M*, that is, by selecting a subset
of concepts and by defining restrictive assembly rules. For instance, a relational
schema can be defined as a set of object types (tables), a set of attributes
(column), each associated with an object type (at least one attribute per object
type) and a set of uniqueness (keys) and inclusion (foreign keys) constraints.
This model does not include the concept of association. The transformations of
M are those of M* which remain meaningful. This approach can be qualified as
specialization or top-down, since an operational model and its transformational
operators are defined by specializing (i.e., selecting, renaming, restricting) M*
constructs and operators. DB-MAIN (Hainaut, 1996b) is an example of this
approach. In the next section, we describe the main aspects of its model, named GER.6
It is important to note that these levels are not part of the model. The schema of
Figure 1 illustrates some major concepts borrowed from these three levels. Such a
hybrid schema could appear in reverse engineering.
(Figure 1: a hybrid GER schema comprising the entity types PERSON, EMPLOYEE, CUSTOMER, PRODUCT, ORDER and ACCOUNT and the file PRODUCT.DAT, together with identifiers (id), references (ref), access keys (acc) and the array attribute DETAIL[1-5].)
One remarkable characteristic of wide spectrum models is that all the transfor-
mations, including inter-model ones, appear as intra-model operators. This has
highly interesting consequences. First, a transformation Σ designed for manipu-
lating schemas in an operational model M1 can be used in a model M2 as well,
provided that M2 includes the constructs on which Σ operates. For instance, most
transformations dedicated to COBOL data structure reverse engineering appear
to be valid for relational schemas as well. This strongly reduces the number of
operators. Secondly, any new model can profit from the techniques and
reasoning that have been developed for current models. For instance, designing
methods for translating conceptual schemas into object-relational structures or
into XML schemas (Estiévenart, 2003), or reverse engineering OO-databases
(Hainaut, 1997) have proved particularly easy since these new methods can be,
to a large extent, derived from standard ones.
The GER model has been given a formal semantics in terms of an extended NF2
model (Hainaut, 1989, 1996). This semantics will allow us to analyze the
properties of transformations, and particularly to precisely describe how, and
under which conditions, they propagate and preserve the information contents of
schemas.
Let us note that we have discarded the UML class model as a candidate for M*
due to its intrinsic weaknesses, including its lack of agreed-upon semantics, its
non-regularity and the absence of essential concepts. On the contrary, a
carefully defined subset of the UML model could be a realistic basis for
constructive approaches.
Transformation: Definition
The definitions that will be stated here are model-independent. In particular, they
are valid for the GER model, so that the examples will be given in the latter. Let
us denote by M the model in which the source and target schemas are expressed,
by S the schema on which the transformation is to be applied, and by S’ the schema
resulting from this application. Let us also consider sch(M), a function that returns
the set of all the valid schemas that can be expressed in model M, and inst(S), a
function that returns the set of all the instances that comply with schema S.
A transformation Σ consists of two mappings T and t (Figure 4):
• T is the structural mapping from sch(M) onto itself, that replaces source
construct C in schema S with construct C’. C’ is the target of C through T,
and is written C’ = T(C). In fact, C and C’ are classes of constructs that can
be defined by structural predicates. T is therefore defined by the minimal
precondition P that any construct C must satisfy in order to be transformed
by T, and the maximal postcondition Q that T(C) satisfies. T specifies the
rewriting rule of Σ.
• t is the instance mapping from inst(S) onto inst(S’), that states how to
produce the T(C) instance that corresponds to any instance of C. If c is an
instance of C, then c’ = t(c) is the corresponding instance of T(C). t can be
specified through any algebraic, logical or procedural expression.
(Figure 4: a transformation Σ = <T, t>; the structural mapping T sends construct C to C’ = T(C), and the instance mapping t sends each instance c of C to the corresponding instance c’ = t(c) of T(C).)
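In code, this definition can be rendered schematically as a pair of callables together with the pre- and postcondition; the following is a sketch of the definition only (my own rendering, not an implementation from the chapter):

from dataclasses import dataclass
from typing import Callable

@dataclass
class Transformation:
    T: Callable   # structural mapping: construct C -> C' = T(C)
    t: Callable   # instance mapping: instance c of C -> c' = t(c)
    P: Callable   # minimal precondition a construct must satisfy
    Q: Callable   # maximal postcondition that T(C) satisfies

def apply(sigma, C, instances):
    # Apply the structural mapping when P holds, and propagate t to instances.
    assert sigma.P(C), "precondition P not satisfied by C"
    C_prime = sigma.T(C)
    assert sigma.Q(C_prime), "postcondition Q must hold for T(C)"
    return C_prime, [sigma.t(c) for c in instances]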
Reversibility of a Transformation
Similarly, in the pure software engineering domain, Balzer (1981) introduces the
concept of correctness-preserving transformation aimed at compilable and
efficient program production.
We have discussed the concept of reversibility in a context in which some kind
of instance equivalence is preserved. However, the notion of inverse transfor-
mation is more general. Any transformation, be it semantics-preserving or not,
can be given an inverse. For instance, del-ET(et_name), which removes entity
type with name et_name from its schema, clearly is not a semantics-preserving
operation, since its mapping t has no inverse. However, it has an inverse
transformation, namely create-ET(CUSTOMER). Since only the T part is defined,
this partial inverse is called a structural inverse transformation.
Thanks to the formal semantics of the GER, a proof system has been developed
to evaluate the reversibility of a transformation. More precisely, this system
relies on a limited set of NF2 transformational operators whose reversibility has
been proven, and that can generate a large number of GER transformations.
Basically, the system includes five families of transformations, that can be
combined to form more complex operators:
• denotation, through which a new object set is defined by a derivation rule
based on existing structures,
• project-join which is a variant of the decomposition theorem,
• composition which replaces two relations by one of them and their
composition,
• nest-unnest, the typical 1NF ↔ N1NF operators (a toy rendering follows
below), and
• container, that states the equivalence between non-set containers (e.g.,
bags, lists, arrays) and sets.
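To give these families some texture, here is a toy rendering (my own data and encoding, not the chapter's NF2 formalism) of the nest-unnest pair, with a check of their mutual reversibility on one instance:

flat = [("order1", "p1"), ("order1", "p2"), ("order2", "p1")]   # a 1NF relation

def nest(rel):
    # Group the second column per value of the first one (1NF -> N1NF).
    nested = {}
    for a, b in rel:
        nested.setdefault(a, set()).add(b)
    return nested

def unnest(nested):
    # Flatten back to pairs (N1NF -> 1NF).
    return sorted((a, b) for a, bs in nested.items() for b in bs)

assert unnest(nest(flat)) == sorted(flat)   # the two operators undo each other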
Thanks to a complete set of mapping rules between the GER model and the NF2
model in which these basic transformations have been built, the latter can be
applied to operational schemas. Figure 5 shows how we have defined a
decomposition operator for normalizing relationship types from the basic project-
join transformation. It is based on a three-step process:
1. Source schema (Figure 5, top-left) is expressed in the NF2 formalism
(bottom-left):
{entities:A,B,C; R(A,B,C); A → B}
2. The project-join transformation is applied, replacing R(A,B,C) with R1(A,B)
and R2(A,C) under the constraint R1[A]=R2[A].
3. The resulting NF2 schema is expressed in the GER, leading to the target
schema (Figure 5, top-right).
(Figure 5: derivation of the relationship-type decomposition operator from the basic project-join transformation; the source relationship type R among A, B and C, with R: A → B, is decomposed into two relationship types R1 and R2.)
Since the GER ↔ NF2 mappings are symmetrically reversible and the
project-join is an SR-transformation, the ERA transformation is symmetrically
reversible as well. It can be defined as follows:
T1 = T11 ∘ T12 ∘ T13
T1' = T11' ∘ T12' ∘ T13'
We note the important constraint R1[A]=R2[A] that gives the project-join transfor-
mation the SR property, while Fagin’s theorem merely defines a reversible
operator. We observe how this constraint translates into a coexistence constraint
in the GER model that states that if an A entity is connected to a B entity, it must
be connected to at least one C entity as well, and conversely.
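The SR property of the constrained project-join can be replayed on a small instance; in this sketch (my own toy relation), R satisfies A → B, the projections satisfy R1[A] = R2[A] by construction, and the join recovers R exactly:

R = {("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a2", "b2", "c1")}

R1 = {(a, b) for (a, b, c) in R}                    # projection R[A,B]
R2 = {(a, c) for (a, b, c) in R}                    # projection R[A,C]
assert {a for a, _ in R1} == {a for a, _ in R2}     # the constraint R1[A] = R2[A]

joined = {(a, b, c) for (a, b) in R1 for (a2, c) in R2 if a == a2}
assert joined == R                                   # lossless: the join recovers R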
The reader interested in a more detailed description of this proof system is
referred to Hainaut (1996).
(Figure: the transformations Σ2 and Σ3. Σ2 transforms the relationship type r between A and B into a reference attribute B.A1 (T2) and conversely (T2'). Σ3 transforms the multivalued attribute A2 of A into an entity type EA2 (T3) and conversely (T3').)
Higher-Level Transformations
The transformations described in the section, Schema Transformation Basics,
are intrinsically atomic: one elementary operator is applied to one object instance,
and (Σ4 excluded) none can be defined by a combination of others (orthogonal-
ity). This section presents three ways through which more powerful transfor-
mations can be built.
Compound Transformations
dom(Year) = [2000..2004]
attribute Year, yielding entity type YEAR, with attribute Year. Finally, the entity
type EXPENSE is transformed into relationship type expense (Σ1-inverse).
Predicate-Driven Transformations
of roles of the relationship type falls in the range [<n1>..<n2>]. The symbol “N”
stands for infinity.
Model-Driven Transformations
(p ⇒ P) ∧ (PM’ ⇒ Q)
(Figure: the database design process, which transforms users requirements into operational code through conceptual design, logical design (yielding the logical schema) and physical design, and which also produces the users views.)
Ignoring the view design process for simplification, database design can be
modeled by (the structural part of) transformation DB-design:
code = DB-design(UR)
where code denotes the operational code and UR the users requirements.
CS = C-design(UR)
LS = L-design(CS)
PS = P-design(LS)
code = Coding(PS)
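Read operationally, the four steps compose into DB-design; the following sketch (placeholder stage functions of my own that merely tag their output, just to show the composition) mirrors the equations:

def C_design(UR): return ("CS", UR)    # users requirements -> conceptual schema
def L_design(CS): return ("LS", CS)    # conceptual schema  -> logical schema
def P_design(LS): return ("PS", LS)    # logical schema     -> physical schema
def Coding(PS):   return ("code", PS)  # physical schema    -> operational code

def DB_design(UR):
    return Coding(P_design(L_design(C_design(UR))))   # code = DB-design(UR)

print(DB_design("UR"))   # ('code', ('PS', ('LS', ('CS', 'UR'))))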
Conceptual Design
This process includes, among others, two major sub-processes, namely Basic
Analysis, through which informal or semi-formal information sources are
analyzed and their semantic contents are translated into conceptual structures,
and (Conceptual) Normalization, through which these raw structures are given
such additional qualities as readability, normality, minimality, extensibility, com-
pliance with representation standards, etc. (Batini, 1992; Blaha, 1998). This
second process is more formal than the former, and is a good candidate for
transformational modeling. The plan of Figure 15, though simplistic, can improve
the quality of many raw conceptual schemas.
Logical Design
namely ideal and empirical. The ideal design produces a logical schema that
meets two requirements only: it complies with the target logical model M and it
is semantically equivalent to the conceptual schema. According to the transfor-
mational paradigm, the logical design process is an M-driven transformation
comprising SR-operators only. The plan of Figure 13 illustrates this principle for
relational databases. Similar plans have been designed for CODASYL DBTG,
Object-relational and XML (Estiévenart, 2003) databases, among others. Em-
pirical design is closer to the semi-formal way developers actually work, relying
on experience and intuition, rather than on standardized procedures. Other
requirements such as space and time optimization often are implicitly taken into
account, making formal modeling more difficult, if not impossible. Though no
comprehensive model-driven transformations can describe such approaches,
essential fragments of empirical design based on systematic and reproducible
rules can be described by compound or predicate-driven transformations.
Coding
Quite often overlooked, this process can be less straightforward and more
complex than generally described in the literature or carried out by CASE tools.
Indeed, any DMS can cope with a limited range of structures and integrity
constraints for which its DDL provides an explicit syntax. For instance, plain
SQL2 DBMSs know about constraints such as machine value domains, unique
keys, foreign keys and mandatory columns only. If such constructs appear in a
physical schema, they can be explicitly declared in the SQL2 script. On the other
hand, all the other constraints must be either ignored or expressed in any other
way, at best through check predicates or triggers, but more frequently through
procedural sections scattered throughout the application programs. Distinguish-
ing the DDL code from the external code, the operational code can be split into
two distinct parts:
Despite this variety of translation means, the COD process typically is a two-
model transformation (in our framework, GER to DMS-DDL) that can be
automated.
(Figure: the database reverse engineering process; the DDL code (codeddl) is parsed into a physical schema, refined against the external code (codeext), cleaned into a logical schema, and finally conceptualized into a conceptual schema. Parsing, refinement and cleaning together form the extraction.)
CS = DBRE(code)
where code denotes operational code and CS the conceptual schema.
LS = Extraction(code)
CS = Conceptualization(LS)
PS = Parsing(codeddl)
PS = Refinement(PS,codeext)
LS = Cleaning(PS)
Through this analysis, we have shown that, if database design and reverse
engineering can be modeled by transformations, then database reverse engineer-
ing is, to a large extent, the inverse of database design. This induces important
consequences. In particular:
• database reverse engineering requires a deep understanding of empirical
database design methodologies, and
• the Conceptualization process can be analyzed and specified by identify-
ing the strategies and the transformations that are most popular in empirical
logical design, and by considering their inverse.
Among the operators that have been described, the transformations Σ1-inverse,
Σ2-inverse, Σ3-direct and Σ3-inverse, Σ4-direct and Σ4-inverse, Σ5-inverse, Σ6-
direct, Σ7-direct, form a sound (but unfortunately not complete) basis for
conceptualizing logical schemas. This process can be supported by predicate-
driven and model-driven transformations, but, even more than for forward
engineering, reverse engineering heavily relies on human expertise. An in-depth
description of a wide-scope reverse engineering methodology can be found in
Hainaut (2002).
References
Balzer, R. (1981). Transformational implementation: An example. IEEE TSE,
SE-7(1).
Batini, C., Ceri, S. & Navathe, S. B. (1992). Conceptual Database Design.
Benjamin/Cummings.
Blaha, M. & Premerlani, W. (1998). Object-oriented Modeling and Design
for Database Applications. Prentice Hall.
Bolois, G. & Robillard, P. (1994). Transformations in reengineering techniques.
Proceedings of the Fourth Reengineering Forum ‘Reengineering in
Practice’. Victoria, Canada.
Casanova, M. & Amaral De Sa, A. (1984). Mapping uninterpreted schemes into
entity-relationship diagrams: Two applications to conceptual schema de-
sign. IBM Journal of Research & Development, 28(1).
Endnotes
1. Computer-aided Software Engineering.
2. Database Management System.
3. Entity-relationship-attribute model. The UML class model is a variant of the ERA model.
4. 1NF, or First Normal Form, designates the class of relations defined on simple domains (which are neither relations nor powersets). By contrast, a non-1NF relation is said to be in N1NF, or NF2 for short.
5. Defining more formally what the assertion this construct of M also belongs to M’ exactly means would require a development which would be useless in this paper. Therefore, we will rely on an intuitive meaning of this relation only. For example, the concepts of field and of column will be considered the same though some slight differences exist between them. The same can be said for entity type (ERA), object class (UML), segment type (IMS), record type (standard files, CODASYL) and table (SQL2).
6. For Generic Entity-Relationship model.
7. For Data Management System, a term that encompasses file managers and DBMSs.
8. The so-called decomposition theorem of the 1NF relational theory (Fagin, 1977) is an example of reversible transformation. Grossly sketched, it states that the schema {R(A,B,C); A→→B|C} can be losslessly replaced by {R1(A,B); R2(A,C)}, since, for any instance r of R, the relation r = r[A,B]*r[A,C] holds. However, there is no reason for any arbitrary instances r1 of R1 and r2 of R2 to enjoy the inverse property r1 = (r1*r2)[A,B]. Therefore, this transformation is not symmetrically reversible. This example and some of its variants are developed in Hainaut (1996).
9. Data Definition Language: that part of a database language intended to declare the data structures of the database.
Chapter II
Rule-Based Transformation of Graphs and the Product Type
Renate Klempien-Hinrichs, University of Bremen, Germany
Hans-Jörg Kreowski, University of Bremen, Germany
Sabine Kuske, University of Bremen, Germany
Abstract
Introduction
The area of graph transformation brings together the concepts of rules and
graphs with various methods from the theory of formal languages and from the
theory of concurrency, and with a spectrum of applications (Figure 1).
Graphs are important structures in computer science and beyond to represent
complex system states, networks, and all kinds of diagrams. The application of
rules provides graphs with a dynamic dimension yielding a rich methodology of
rule-based graph transformation. The three volumes of the Handbook of
Graph Grammars and Computing by Graph Transformation give a good
overview of the state of the art in theory and practice of graph transformation
(Rozenberg, 1997; Ehrig, Engels, Kreowski & Rozenberg, 1999; Ehrig, Kreowski,
Montanari & Rozenberg, 1999).
Although one encounters quite a large number of different approaches to graph
transformation in the literature, nearly all of them contain five basic features.
• Graphs to represent complex relations among items in an intuitive but
mathematically well-understood way.
• Rules to describe possible changes and updates of graphs in a concise way.
• Rule applications to perform the possible changes and updates on graphs
explicitly as they are embodied in the rules.
• Graph class expressions to specify special classes of graphs to be used
as initial as well as terminal graphs.
• Control conditions to regulate the applications of rules such that the
inherent non-determinism of rule application can be cut down.
(Figure 1: the area of graph transformation, combining rules and graphs, and connecting theory with application.)
Graph Transformation
In this section we introduce main concepts of graph transformation like graphs,
rules, and transformation units. The concepts are illustrated with a simple
example from the area of graph theory. In the literature one can find many more
applications of graph transformation which underline the usefulness from a
practical point of view. These are, for example, applications from the area of
functional languages (Sleep, Plasmeijer & van Eekelen, 1993), visual languages
(Bardohl, Minas, Schürr & Taentzer, 1999), software engineering (Nagl, 1996),
and UML (e.g., Bottoni, Koch, Parisi-Presicce & Taentzer, 2000; Engels,
Hausmann, Heckel & Sauer, 2000; Fischer, Niere, Torunski & Zündorf, 2000;
Petriu & Sun, 2000; Engels, Heckel & Küster, 2001; Kuske, 2001; Kuske,
Gogolla, Kollmann & Kreowski, 2002).
Graphs
First of all, there is a class of graphs G, which may be directed or undirected, typed
or untyped, labeled or unlabeled, simple or multiple. Examples for graph classes
are labeled directed graphs, hypergraphs, trees, forests, finite automata, Petri
nets, etc. The choice of graphs depends on the kind of applications one has in
mind and is a matter of taste.
In this chapter, we consider directed, edge-labeled graphs with individual,
multiple edges. A graph is a construct G = (V, E, s, t, l) where V is a set of
vertices, E is a set of edges, s, t: E → V are two mappings assigning each edge
e ∈ E a source s(e) and a target t(e), and l: E → C is a mapping labeling each
edge with an element of a given label alphabet C. A graph may be represented in a graphical way
with circles as nodes and arrows as edges that connect source and target, with
the arrowhead pointing to the target. The labels are placed next to the arrows.
In the case of a loop, i.e., an edge with the same node as source and target, we
may draw a flag that is posted on its node with the label inside the box. To cover
unlabeled graphs as a special case, we assume a particular label * that is invisible
in the drawings. This means a graph G is unlabeled if l(e) = * for all e ∈ E. For
instance the graph in Figure 2 consists of six nodes, one of them with a begin-
flag, another with an end-flag, and a third one with an unlabeled flag. Moreover,
it consists of seven directed edges where some of them are labeled with p. The
p-edges form a simple path (i.e., a path without cycles) from the begin-flagged
node to the end-flagged node. If one takes the subgraph induced by the edges
of the simple path and the begin- and end-flag and removes all occurrences of
the label p, one gets the string graph (i.e., a graph that is a simple path from a
begin-flagged node to an end-flagged node) that is shown in Figure 3.
(Figure 2: a directed graph with six nodes, one with a begin-flag, one with an end-flag and one with an unlabeled flag, and seven directed edges, some of them labeled p.)
(Figure 3: the 4-string graph.)
String graphs can be used to represent natural numbers. The string graph in
Figure 3 represents the number four because it has four unlabeled edges between
its begin-flagged and its end-flagged node. Whenever a string graph represents
a natural number k in this way, we say that it is the k-string graph.
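As one possible concrete encoding (my own, not the authors'), the quintuple G = (V, E, s, t, l) and the k-string graphs translate directly into a small data structure:

from dataclasses import dataclass, field

@dataclass
class Graph:
    V: set                                   # vertices
    E: set                                   # edge identifiers (individual, multiple edges)
    s: dict = field(default_factory=dict)    # source mapping s: E -> V
    t: dict = field(default_factory=dict)    # target mapping t: E -> V
    l: dict = field(default_factory=dict)    # label mapping l: E -> C; '*' is invisible

def string_graph(k):
    # The k-string graph: a simple path of k unlabeled edges from a node with a
    # begin-flag to a node with an end-flag (flags are modeled as labeled loops).
    V = set(range(k + 1))
    E = {f"e{i}" for i in range(k)} | {"begin", "end"}
    s = {f"e{i}": i for i in range(k)} | {"begin": 0, "end": k}
    t = {f"e{i}": i + 1 for i in range(k)} | {"begin": 0, "end": k}
    l = {f"e{i}": "*" for i in range(k)} | {"begin": "begin", "end": "end"}
    return Graph(V, E, s, t, l)

g = string_graph(4)   # the 4-string graph of Figure 3, representing the number four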
To be able to transform graphs, rules are applied to graphs yielding graphs. Given
some class R of graph transformation rules, each rule r ∈ R defines a binary
relation ⇒r ⊆ G×G on graphs. If G ⇒r H, one says that G directly derives H by
applying r.
There are many possibilities to choose rules and their applications. Rule classes
may vary from the more restrictive ones, like edge replacement (Drewes,
Kreowski & Habel, 1997) or node replacement (Engelfriet & Rozenberg, 1997),
to the more general ones, like double-pushout rules (Corradini et al., 1997),
single-pushout rules (Ehrig et al., 1997), or PROGRES rules (Schürr, 1997).
In this chapter, we use rules of the form r = (L →K R) where L and R are graphs
(the left- and right-hand side of r, respectively) and K is a set of nodes shared
by L and R. In a graphical representation of r, L and R are drawn as usual, with
numbers uniquely identifying the nodes in K. Its application means to replace an
occurrence of the left-hand side L in a given graph with the right-hand side R,
keeping the common nodes in K.
For example (Figure 4), the rule move has as its left-hand side a graph consisting
of an end-flagged node 1, a node 2 with unlabeled flag, and an unlabeled edge
from node 1 to node 2. The right-hand side consists of the same two nodes where
node 1 has no flag and node 2 has an end-flag. Moreover, there is a p-labeled
edge from node 1 to node 2. The common part of the rule move consists of the
nodes 1 and 2.
The application of move labels an unlabeled edge with p if the edge connects an
end-flagged node and a node with an unlabeled flag, moves the end-flag from
the source of the edge to its target, and removes the unlabeled flag. For example,
the application of move to the graph above yields the graph shown in Figure 5.
(Figure 4: the rule move.)
(Figure 5: the graph resulting from the application of move.)
Note that this rule cannot be applied to the former graph in any other way; for
instance, its left-hand side requires the presence of an unlabeled flag.
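Continuing the encoding sketched above, the application of move can be written out explicitly; this is my own simplified matcher for this one rule, not a general rule-application engine:

def apply_move(g):
    for e in list(g.E):
        # match an unlabeled, non-loop edge from u to v ...
        if g.l[e] != "*" or g.s[e] == g.t[e]:
            continue
        u, v = g.s[e], g.t[e]
        # ... where u carries an end-flag and v an unlabeled flag
        end_flag = next((f for f in g.E if g.l[f] == "end" and g.s[f] == u), None)
        unl_flag = next((f for f in g.E
                         if g.l[f] == "*" and g.s[f] == v and g.t[f] == v), None)
        if end_flag is None or unl_flag is None:
            continue
        g.l[e] = "p"                        # label the traversed edge with p
        g.E.remove(unl_flag)                # remove the unlabeled flag on v
        for m in (g.s, g.t, g.l):
            del m[unl_flag]
        g.s[end_flag] = g.t[end_flag] = v   # move the end-flag from u to v
        return True                         # one application of move performed
    return False                            # move is not applicable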
The aim of graph class expressions is to restrict the class of graphs to which
certain rules may be applied or to filter out a subclass of all the graphs that can
be derived by a set of rules. Typically, a graph class expression may be some
logic formula describing a graph property like connectivity, or acyclicity, or the
occurrence or absence of certain labels. In this sense, every graph class
expression e specifies a set SEM(e) of graphs in G. For instance, all refers to all
directed, edge-labeled graphs, whereas empty and bool designate a class of
exactly one graph each (the empty graph EMPTY for empty, and the graph
TRUE consisting of one true-flagged node for bool). Moreover, graph specifies
all unlabeled graphs each node of which carries a unique flag (which is unlabeled,
too). Also, a particular form of the graphs may be requested; e.g., the expression
nat defines all k-string graphs.
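In the same encoding, graph class expressions become predicates; a sketch (mine) of three of the expressions just mentioned:

def sem_all(g):
    return True                                   # all directed, edge-labeled graphs

def sem_empty(g):
    return not g.V and not g.E                    # exactly the empty graph EMPTY

def sem_bool(g):
    # exactly the graph TRUE: a single node carrying a single true-flag
    return (len(g.V) == 1 and len(g.E) == 1 and
            all(g.s[e] == g.t[e] and g.l[e] == "true" for e in g.E))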
Control Conditions
case studies where transformation units are employed to model the semantics of
functional programming languages (Andries et al., 1999), UML state machines
(Kuske, 2001), and logistic processes (Klempien-Hinrichs, Knirsch & Kuske,
2002).
Transformation Units
In general, a graph transformation system may consist of a huge set of rules that
by its size alone is difficult to manage. Transformation units provide a means to
structure the transformation process. The main structuring principle of transfor-
mation units relies on the import of other transformation units or — on the
semantic level — on binary relations on graphs. The input and the output of a
transformation unit each consists of a class of graphs that is specified by a graph
class expression. The input graphs are called initial graphs and the output graphs
are called terminal graphs. A transformation unit transforms initial graphs to
terminal graphs by applying graph transformation rules and imported transforma-
tion units in a successive and sequential way. Since rule application is non-
deterministic in general, a transformation unit contains a control condition that
may regulate the graph transformation process.
A graph transformation unit is a system tu = (I, U, R, C, T) where I and T are
graph class expressions, U is a (possibly empty) set of imported graph transfor-
mation units, R is a set of rules, and C is a control condition.
To simplify technicalities, we assume that the import structure is acyclic (for a
study of cyclic imports see Kreowski, Kuske and Schürr (1997)). Initially, one
builds units of level 0 with empty import. Then units of level 1 are those that
import only units of level 0, and units of level n+1 import only units of level 0 to
level n, but at least one from level n.
In graphical representations of transformation units we omit the import compo-
nent if it is empty, the initial or terminal component if it is set to all, and the control
condition if it is equal to true.
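As an illustration, the quintuple and the import levels just described can be modeled as follows (a minimal Python sketch of ours, assuming graphs are abstracted as frozen sets of labeled edges; all names and type choices are illustrative):

from dataclasses import dataclass
from typing import Callable, List

Graph = frozenset  # a graph, abstracted as a frozen set of (source, label, target) edges

@dataclass
class TransformationUnit:
    # tu = (I, U, R, C, T); all component types are illustrative choices
    initial: Callable[[Graph], bool]             # graph class expression I
    uses: List["TransformationUnit"]             # imported units U (import structure acyclic)
    rules: List[Callable[[Graph], List[Graph]]]  # rules R: each yields all results of one application
    control: Callable[[List[str]], bool]         # control condition C over applied action sequences
    terminal: Callable[[Graph], bool]            # graph class expression T

def level(tu: TransformationUnit) -> int:
    # level 0 for empty import; otherwise one more than the maximal imported level
    return 0 if not tu.uses else 1 + max(level(u) for u in tu.uses)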
In the following, we present some examples of transformation units. We start
with very simple specifications of natural numbers and truth values because they
are auxiliary data types to be used later to model the more interesting examples
of simple paths, long simple paths, and Hamiltonian paths.
The first transformation unit nat0 (Figure 6, left side) constructs all string graphs
that represent natural numbers by starting from its initial graph, which represents
0, and transforming the n-string graph into the n+1-string graph by applying the
rule succ.
Figure 6. The transformation units nat0 (left) and nat1 (right).
The second transformation unit nat1 (Figure 6, right side) is a variant of nat0,
but now with all n-string graphs as initial graphs. Consequently, it describes
arbitrary additions to arbitrary n-string graphs by sequentially increasing the
represented numbers by 1.
The third transformation unit nat2 (Figure 7) also transforms string graphs into
string graphs. It has two rules pred and is-zero. The application of the rule pred
to the n-string graph (with n ≥ 1 since otherwise the rule cannot be applied)
converts it into the n–1-string graph. The second rule is-zero can be applied only
to the 0-string graph but does not transform it, which means that this rule can be
used as a test for 0. Moreover, the transformation unit nat2 imports nat1 so that
arbitrary additions can be performed, too. The rules of nat2 and the imported unit
nat1 can be applied in arbitrary order and arbitrarily often. Hence nat2 converts
n-string graphs into m-string graphs for natural numbers m, n. Therefore nat2
can be considered as a data type representing natural numbers with a simple set
of operations.
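The behavior of these units can be mimicked concretely in a Python sketch of ours that abstracts the n-string graph by the number n it represents; each rule returns the list of graphs reachable by a single application, the empty list meaning that the rule is not applicable:

def succ(n):      # nat0/nat1: turns the n-string graph into the (n+1)-string graph
    return [n + 1]

def pred(n):      # nat2: applicable only if n >= 1
    return [n - 1] if n >= 1 else []

def is_zero(n):   # nat2: applicable only to the 0-string graph, which it leaves unchanged
    return [0] if n == 0 else []

Applying succ, pred and is-zero in arbitrary order and arbitrarily often relates every n to every m, exactly as stated for nat2 above.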
Figure 7. The transformation unit nat2 (initial: nat; uses: nat1; rules: pred and is-zero).
Figure 8. The transformation unit bool0 (initial: empty; rule: set-true; terminal: bool).
The fourth transformation unit, bool0 = (empty, ∅, {set-true}, true, bool), is shown in Figure 8. It has a single initial graph, the empty graph EMPTY. It does not import other transformation units, and it has one rule set-true, which turns EMPTY into the graph TRUE. The control condition allows all transformations, meaning that TRUE may be added arbitrarily often to EMPTY. However, the terminal graph class expression specifies the set consisting of TRUE, which ensures that the rule set-true is applied exactly once to the initial graph.
One can consider bool0 as a unit that describes the type Boolean in its most simple form. At first sight, this may look a bit strange, but it is quite useful if one wants to specify predicates on graphs by nondeterministic graph transformation: if one succeeds in transforming an input graph into the graph TRUE, the predicate holds; otherwise it fails. In other words, if the predicate does not hold for the input graph, none of its transformations yields TRUE.
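This idiom can be sketched as a nondeterministic search (illustrative Python; the depth bound max_steps is our assumption for termination, not part of the formal semantics):

def holds(rules, graph, is_true, max_steps=10):
    # the predicate holds iff some derivation transforms `graph` into TRUE
    frontier = {graph}
    for _ in range(max_steps + 1):
        if any(is_true(g) for g in frontier):
            return True
        frontier = {h for g in frontier for rule in rules for h in rule(g)}
    return False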
The transformation unit simple-path given in Figure 9 constitutes an example of
another kind. As an initial graph, it admits all unlabeled graphs with exactly one
flag on every node. It chooses an arbitrary simple path in an initial graph by
labeling the edges of the path with p and adding a begin-flag and an end-flag to
the beginning and the end of the path, respectively. This is done with the help of
two rules start and move. The rule start turns an unlabeled flag of an arbitrary
node into two flags, respectively labeled with begin and end, and the rule move
is the same as above, i.e., it labels with p an edge from an end-flagged node to
a node with an unlabeled flag, moves the end-flag to the other node, and removes
the unlabeled flag. The control condition is a regular expression which is satisfied
if first the rule start is applied, followed by move applied arbitrarily often. The
terminal graph class expression admits all graphs, which is why it is not explicitly
shown.
Figure 9. The transformation unit simple-path (initial: graph; rules: start and move; cond: start ; move*).
Product Type
As the iterated application of rules transforms graphs into graphs yielding an
input-output relation, the natural type declaration of a graph transformation unit
tu = (I, U, R, C, T) is tu: I→T where moreover the initial and terminal graphs are
subtypes of the type of graphs that are transformed by the unit. But in many
applications one would like to have a typing that allows one to consider several
inputs and maybe even several outputs, or at least an output of a type different
from all inputs. For instance, a test whether a given graph has a simple path of
a certain length would be suitably declared by long-simple-path: graph × nat
→ bool (or something like this) asking for a graph and a non-negative integer as
inputs and a truth value as output.
Such extra flexibility in the typing of graph transformations can be provided by products of graph transformation units, together with some concepts based on these products. In more detail, we introduce the following new features:
1. The product of graph transformation units, providing tuples of graphs to be processed, in particular tuples of initial and terminal graphs, as well as tuples of rules and calls of imported units, called action tuples, that can be executed on graph tuples in parallel.
2. The embedding and projection of a product into (resp. onto) another product, which allow one to choose some components of a product as inputs or outputs and to copy some components into others.
3. The semantics of a product of graph transformation units is the product of the component semantics such that — intuitively seen — all components run independently of each other. If one wants to impose some iteration and interrelation between the components, one can use control conditions for action tuples just as for rules and imported units.
The product type generalizes the notion of pair grammars and triple grammars
as introduced by Pratt (1971) and Schürr (1994), respectively.
prod = tu1 × … × tum = ∏i=1,…,m tui
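A single synchronous step of such a product can be sketched as follows (Python, illustrative; each action abstracts a rule or a call of an imported unit as a function from a graph to the list of its one-step successors):

from itertools import product as cartesian

def parallel_step(action_tuple, graph_tuple):
    # apply an action tuple componentwise and synchronously; the whole tuple
    # step fails (empty result) as soon as one component is not applicable
    per_component = []
    for action, graph in zip(action_tuple, graph_tuple):
        outcomes = action(graph)
        if not outcomes:
            return []
        per_component.append(outcomes)
    # every combination of component outcomes is a possible successor tuple
    return [tuple(t) for t in cartesian(*per_component)]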
If not all initial graph class expressions of a product are meant as inputs, but some
of them are just of an auxiliary nature for intermediate computations or to be used
as outputs, one may choose the input types and embed their product into the
actual product that provides the graph tuples to be transformed. This is possible
whenever the auxiliary components have unique initial graphs and if every
chosen input type is a subtype of the corresponding initial graphs.
Let prod = tu1 × … × tum be a product of transformation units and let X be a set of graph class expressions that is associated with the product components by an injective mapping ass: X → {1,…,m} such that SEM(x) ⊆ SEM(Iass(x)) for all x ∈ X. Assume, moreover, for all j ∈ {1,…,m} \ ass(X) that either SEM(Ij) = {Gj} for some graph Gj or SEM(x) ⊆ SEM(Ij) for some chosen x ∈ X, which will be denoted by copy: x → j. Then we get an embedding of the product of the graphs in SEM(x) for x ∈ X into the product of initial graphs of the product prod:
embed: ∏x∈X SEM(x) → ∏j=1,…,m SEM(Ij)
components given by the copy relation. All remaining components of the product
of units are completed by the single initial graphs of these components.
As a simple example, let prod = simple-path × nat2 × bool0 and let X = {graph, nat}. Consider the initial graph class expressions graph, nat and empty of the transformation units simple-path, nat2, and bool0, respectively. Every pair (G1,G2) ∈ SEM(graph) × SEM(nat) can be embedded into SEM(graph) × SEM(nat) × SEM(empty) by choosing ass(graph) = 1 and ass(nat) = 2, i.e., we get embed((G1,G2)) = (G1,G2,EMPTY) for every pair (G1,G2) ∈ SEM(graph) × SEM(nat).
Conversely, if one wants to get rid of some component graphs, the well-known
projection may be employed. The same mechanism can be used to multiply
components, which allows one, in particular, to copy a component graph into
another component.
Let Y be a set which is associated with the product prod by ass: Y → {1, …, m}.
Then one can consider the product of the terminal graphs in SEM(Tass(y)) for all
y ∈ Y as the semantics of the association ass, i.e.:
SEM(ass) = ∏y∈Y SEM(Tass(y)).

proj: ∏i=1,…,m SEM(Ti) → SEM(ass)
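Operationally, the two mappings can be sketched as follows (Python, illustrative; components are 0-indexed here, whereas the text indexes them from 1, and EMPTY stands for the empty graph):

EMPTY = frozenset()

def embed(inputs, ass, copy, m, single_initial):
    # ass maps each input to its component, copy maps an input to a component
    # receiving a copy, and single_initial gives the unique initial graph of
    # every remaining component
    out = [None] * m
    for x, j in ass.items():
        out[j] = inputs[x]
    for x, j in copy.items():
        out[j] = inputs[x]
    for j in range(m):
        if out[j] is None:
            out[j] = single_initial[j]
    return tuple(out)

def proj(graph_tuple, positions):
    # select (and possibly duplicate) components as outputs
    return tuple(graph_tuple[j] for j in positions)

For the product simple-path × nat2 × bool0 from the example above, embed((G1, G2), ass={0: 0, 1: 1}, copy={}, m=3, single_initial={2: EMPTY}) yields (G1, G2, EMPTY).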
ini and the outj are graph class expressions. The intention is that trans relates the product of inputs SEM(in1) × … × SEM(ink) with the product of outputs SEM(out1) × … × SEM(outl). This is obtained by using a product prod of graph transformation units tu1,…,tuk+l such that SEM(ini) ⊆ SEM(Ii) for i = 1,…,k and SEM(Tj) ⊆ SEM(outj) for j = k+1,…,k+l. The first k inclusions allow one to embed the inputs into the initial graph tuples of the product prod if, for j = k+1,…,k+l, we can choose some i with copy: i → j or SEM(Ij) = {Gj} for some graph Gj. The last l inclusions allow one to project the terminal graph tuples of
prod onto outputs. Therefore, the semantic relation of trans has the proper
form, but the output tuples are totally independent of the input tuples due to the
product semantics. To overcome this problem, we generalize the notion of
control conditions in such a way that it applies not only to the control of rule
applications and calls of imported units, but also to action tuples.
A control condition regulates the use of rules and imported units formally by
intersecting the interleaving semantics with the semantic relation given by the
control condition. This is easily generalized to action tuples if one replaces the
interleaving semantics by the step semantics of the product of graph transforma-
tion units.
In concrete cases, the control condition may refer to action tuples, just as it can
refer to rules and imported units. To make this more convenient, action tuples
may get identifiers.
As an example of how the features based on the product may be used, we specify the test long-simple-path that transforms graphs and non-negative integers as inputs into truth values as output.
It is modeled on top of the product of the units simple-path, nat2 and bool0. The
typing is appropriate as graph and nat specify the initial graphs of simple-path
and nat2 respectively, and bool refers to the terminal graph of bool0.
if each NP-problem can be reduced to the semantic relation of tu1. So the graph-
transformational variants of reductions may be used to investigate the class NP
in the same way as ordinary reductions are useful. But, as many interesting
problems in NP are graph problems, graph-transformational reductions may be
quite suitable.
As an illustrating example, we specify a reduction from the Hamiltonian-path
problem HP into the unit long-simple-path. We assume that HP is a predicate
with the typing HP: graph → bool that yields TRUE for an input graph G if and
only if G has a simple path that visits all nodes. An explicit specification by graph
transformation is not needed, but it would look similar to simple-path, only
making sure that all nodes are involved. Due to the typing, the reduction must consist of a graph transformation unit of the type HP-2-lsp: graph → graph × nat that copies the input graph as output graph and computes, as second output, the number of nodes of the input graph minus one. For this purpose, the product of
units mark-all-nodes, graph and nat0 will be used. The unit graph = (graph,
∅, ∅, true, graph) takes unlabeled graphs as initial and terminal graphs and is
without import and rules such that its semantics is the identity relation on
SEM(graph), i.e., the input graph becomes the output graph. The unit mark-all-
nodes consists of unlabeled graphs as initial graphs, of one rule mark that
replaces the unlabeled flag by another flag (ok-labeled for example), and of
graphs without unlabeled flags as terminal graphs. This is an auxiliary unit whose meaning is that each derivation from an initial graph to a terminal graph has the number of nodes as its length. Hence, an action tuple that applies the rule mark in the first component allows one to count the number of nodes.
Summarizing, we get the following specification:
Note that the length of all computations is bounded by the number of nodes of the
input graph and that each computation can be prolonged until all nodes are
marked. As one always marks the first node without increasing the initial
integer 0 and as all other nodes are marked while the integer is increased by one
in each step, one ends up with the number of nodes minus one as integer output.
And the runtime of HP-2-lsp is linear. If one composes the semantic relation of
Conclusions
In this chapter, we have given an introductory survey of graph transformation
with graphs, rules, rule application, graph class expressions, and control condi-
tions as basic features. As all of the concepts are handled in a generic,
parametric way, this covers nearly all of the graph transformation approaches
one encounters in the literature (see, e.g., Rozenberg, 1997, for an overview).
Readers who are interested in seeing the full spectrum of applications of graph
transformation and its relation to the theory of concurrency are referred to the
Handbook of Graph Grammars and Computing by Graph Transformation,
Vol. 2 and 3 (Ehrig, Engels, Kreowski & Rozenberg, 1999; Ehrig, Kreowski,
Montanari & Rozenberg, 1999).
In addition, we have proposed the new concept of product types that allows one
to transform a tuple of graphs by the synchronous transformation of the
components. This is quite helpful to specify transformations with a flexible
typing, i.e., with an arbitrary sequence of input graphs and an arbitrary sequence
of output graphs. Moreover, the types of the input and output graphs need not be
subtypes of the same type of graphs anymore. As a consequence, the product
type is particularly useful if one wants to transform graph transformations into
each other. Further investigation of the product type may concern the following
aspects:
As we used graph-transformational versions of the truth values and the natural
numbers in our illustrating examples, one may like to combine graph types with
arbitrary abstract data types.
In the presented definition, we consider the product of graph transformation units. But one may like to import products into units and to use components that are again products. Whether such a composite use of products works must be investigated.
The transformation of graph transformation units is only tentatively sketched. It
must be worked out how it helps to study refinement and semantic equivalence
and other interesting relationships between graph transformation systems.
Acknowledgments
The research presented here was partially supported by the EC Research
Training Network SegraVis (Syntactic and Semantic Integration of Visual
Modelling Techniques) and the Collaborative Research Centre 637 (Autono-
mous Cooperating Logistic Processes: A Paradigm Shift and Its Limitations)
funded by the German Research Foundation (DFG).
References
Andries et al. (1999). Graph transformation for specification and programming.
Science of Computer Programming, 34(1), 1-54.
Bardohl, R., Minas, M., Schürr, A. & Taentzer, G. (1999). Application of Graph
Transformation to Visual Languages. In H. Ehrig, G. Engels, H.-J. Kreowski
& G. Rozenberg (Eds.), Handbook of Graph Grammars and Computing
by Graph Transformation, Vol. 2: Applications, Languages and Tools
(pp. 105-180). Singapore: World Scientific.
Bottoni, P., Koch, M., Parisi-Presicce, F., & Taentzer, G. (2000). Consistency
Checking and Visualization of OCL Constraints. In A. Evans, S. Kent & B.
Selic (Eds.), Proceedings of UML 2000 – The Unified Modeling
Language. Advancing the Standard, Lecture Notes in Computer Sci-
ence (Vol. 1939, pp. 294-308). Springer.
Corradini, A., Montanari, U., Rossi, F., Ehrig, H., Heckel, R. & Löwe, M. (1997).
Algebraic Approaches to Graph Transformation – Part I : Basic Concepts
and Double Pushout Approach. In G. Rozenberg (Ed.), Handbook of
Graph Grammars and Computing by Graph Transformation, Vol. 1:
Foundations (pp. 163-245). Singapore: World Scientific.
Drewes, F., Kreowski, H.-J. & Habel, A. (1997). Hyperedge Replacement
Graph Grammars. In G. Rozenberg (Ed.), Handbook of Graph Gram-
mars and Computing by Graph Transformation, Vol. 1: Foundations
(pp. 95-162). Singapore: World Scientific.
Ehrig et al. (1997). Algebraic Approaches to Graph Transformation – Part II:
Single Pushout Approach and Comparison with Double Pushout Approach.
In G. Rozenberg (Ed.), Handbook of Graph Grammars and Computing
by Graph Transformation, Vol. 1: Foundations (pp. 247-312). Singapore:
World Scientific.
Ehrig, H., Engels, G., Kreowski, H.-J. & Rozenberg, G. (Eds.) (1999). Hand-
book of Graph Grammars and Computing by Graph Transformation,
Vol. 2: Applications, Languages and Tools. Singapore: World Scientific.
Chapter III
From Conceptual Database Schemas to Logical Database Tuning
Jean-Marc Petit, Université Clermont-Ferrand 2, France
Abstract
This chapter revisits conceptual database design and focuses on the so-
called “logical database tuning”. We first recall fundamental differences
between constructor-oriented models (like extended Entity-Relationship
models) and attribute-oriented models (like the relational model). Then, we
introduce an integrated algorithm for translating ER-like conceptual
database schemas to relational database schemas. To consider the tuning
of such logical databases, we highlight two extreme cases: null-free databases and efficient — though non-redundant — databases. Finally, we point out how SQL workloads could be used a posteriori to help the designers and/or the database administrators reach a compromise
between these extreme cases. While a lot of papers and books have been
devoted for many years to database design, we hope that this chapter will
clarify the understanding of database designers when implementing their
databases and database administrators when maintaining their databases.
Introduction
Semantic data modeling is the activity of specifying the structure and the
semantics of the data to be managed within an application. Since the 1970s,
semantic data modeling has been the subject of a large body of work in several
areas, including databases, information systems, software engineering and
knowledge representation. For database design, approaches to data modeling
advocate the use of abstract formalisms, such as the popular Entity-Relationship model (Chen, 1976), for describing data, mostly based on the notion of classes or entities.
Two main families of semantic data models are addressed in the literature:
• Attribute-oriented models: Data structure is captured through the notion
of attributes, i.e., objects and relationships between objects are modeled
thanks to attributes. Most of data semantics is expressed by means of
additional constraints. The relational data model or object-oriented data
models fall into this family.
• Constructor-oriented models: Data semantics is captured through vari-
ous constructors, including attributes but also a constructor for objects and
another one for relationships between objects. A key feature of such
models is that most of data semantics is already expressed by the
constructors. Entity-Relationship models (Chen, 1976) fall into this family.
logical data models. Next, the chapter discusses strategies for logical database
tuning. The chapter then concludes.
Background
This section is intended to bring to the fore the abstracting power of constructor-
oriented data models with respect to attribute-oriented data models as done
previously by others (e.g., Hull, 1987; Navathe, 1992).
Numerous investigations have been conducted in the data modeling area in order
to seek more appropriate formalisms for accurately representing real-world
applications. These investigations have resulted in a class of data models called
semantic data models (Peckham, 1988).
Semantic data models provide constructors for explicitly representing the
semantics of the application. The constructs implement information modeling
tools called data abstractions. These abstractions enable a complex world to be
examined in terms of a simplified world that incorporates the most significant
points. Most importantly, data abstractions provide the basis for a step-wise
design methodology for databases.
Each data model has its own structuring mechanism from which to build
application schemas. In semantic data models this mechanism is in terms of
semantic structures expressed in some textual language or in graph-theoretic
terms.
In such data models, most of the data semantics has to be captured by so-called
constructors, which are high-level abstraction mechanisms. Two main families
of semantic data models were addressed:
• With attribute-oriented data models, the main constructor is related to the
notion of attributes. Basically, an attribute associates a meaningful name
in the application context with a type, the permitted values. They are used
to describe the characteristics (or properties) of objects of the real-world,
but also to describe the relationships between objects.
• With constructor-oriented data models, many constructors are available
to produce a conceptual schema. Among the main constructors, we find
again the notion of attributes, but also an explicit constructor intended to
capture the relationships between objects.
The analysis of the above cited models makes it clear that, although they address
the same issues, attribute-oriented models seem to be less expressive, or at least
less simple, than the constructor-oriented models.
Simplified notations of relational databases used in this chapter are given in this
section (for more details, see, e.g., Mannila, 1994; Levene, 1999; Ramakrishnan,
2003).
An attribute A is associated with a domain, the set of its possible values. A relation schema R is associated with a set of attributes, denoted by schema(R). A relation r is defined over a relation schema R and corresponds to a set of tuples, each tuple being an element of the Cartesian product of the attribute domains of R. A database schema R is a set of relation schemas. A relational database d (or database) over a database schema R corresponds to a set of relations, a relation of d being in one-to-one correspondence with a relation schema of R.
In this chapter, we are interested in two main types of constraints of the relational
model: functional dependencies and inclusion dependencies. The former allow us to specify constraints within a relation schema, and the latter allow us to specify constraints between two relation schemas (though possibly within one relation schema). The set of functional dependencies associated with a database
schema is denoted by F, and the set of inclusion dependencies associated with
a database schema is denoted by I. In the sequel, a relational database schema
will be denoted by a triple (R, F, I).
A set of attributes X of a relation schema R is a (minimal) key of R with respect to F if (1) F logically implies X → schema(R) and (2) no proper subset of X satisfies this property.
Let X and Y be two sets. X+Y (respectively, X-Y) stands for X∪Y (respectively,
X\Y) and we omit the brackets for sets reduced to singletons, i.e., the set {A}
is denoted by A.
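These definitions translate directly into a small attribute-closure and key test (a Python sketch of ours; FDs are encoded as (left-hand side, right-hand side) pairs of attribute collections):

from itertools import combinations

def closure(attrs, fds):
    # closure of attrs under the FDs in F
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closed and not set(rhs) <= closed:
                closed |= set(rhs)
                changed = True
    return closed

def is_key(x, schema, fds):
    # X is a (minimal) key iff F implies X -> schema(R) and no proper subset does
    x = tuple(x)
    if closure(x, fds) != set(schema):
        return False
    return all(closure(sub, fds) != set(schema)
               for size in range(len(x))
               for sub in combinations(x, size))

For instance, with F = [(("did", "pid"), ("since",)), (("did",), ("pid",))], the dependencies derived for the relation schema sponsors later in this chapter, is_key(("did",), ("did", "pid", "since"), F) returns True.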
which is neither weak nor inherited. We require the existence of a (minimal) key for each strong entity-type (note that we use this requirement here to simplify the rest of this chapter; in practice, if such a key does not exist, a surrogate key can be defined at the logical level).
Graphical notations used to express a schema are not detailed here — they are
just given in our running example.
Example: For the sake of comparison, we borrow our running example from a
widely disseminated textbook with only minor changes (Ramakrishnan, 2003).
The application deals with the management of projects by employees in a
company organized in several departments. Figure 1 shows the diagram for the
application example. More details can be found in Ramakrishnan (2003).
In this diagram, we have three strong entity-types (People, Projects,
Departments), one inherited entity-type (Employees), one weak entity-type
(Dependents) and two relationship-types (monitors, sponsors). Keys are
underlined in the diagram. The one-to-many relationship-type sponsors between
Projects and Departments has to be read as follows: A department sponsors
one project only, whereas a project could be sponsored by several departments.
Note that the relationship-type monitors associates an entity-type (Employees)
with a relationship-type (sponsors). Dependents is a weak entity-type: its
identification is defined by its local key pname and by the key of Employees,
which is itself derivable from the key associated with People, i.e., ssn.
Note that our conditions differ from those given in Mannila (1994), where for
example a weak entity-type may be involved in different paths in the id-
hierarchy. Some contributions (e.g., Mannila, 1994; Levene, 1999) consider also
inherited entity-types as weak entity-types.
A Classical Framework
From a formal point of view, the translation process involves four steps:
1. Verifying whether the conceptual schema is well formed.
2. Translating the conceptual schema into an oriented graph: doing so, some
classical misconceptions can be rectified and thus a formal treatment can
be carried out.
3. Dealing with attribute naming problems: indeed, assigning names to the
attributes in a relational database schema turns out to be a challenging task.
4. Translating the conceptual schema to a logical database schema: often, the
whole process is reduced to this step according to a set of rules.
The first step is certainly the easiest to perform. Note that some tools supporting both syntactic and semantic checking of a given conceptual schema exist (e.g., Franconi, 2000). The second step is described in the next section and could be coupled with the first step: if the graph is acyclic, then the first condition of a well-formed conceptual schema is met.
The third step is trickier to achieve. In fact, the intuition is that at the conceptual
level, attribute names are local, i.e., their scope is delimited by the entity or the
relationship-type in which they are defined. This is not true anymore with the
relational model at least from a theoretical point of view: an attribute name has
a global scope within a relational database schema and its meaning is related to
the underlying assumption, such as URSA, for Universal Relation Schema Assumption (many others do exist). In this setting, this apparently simple problem turns out to be technically difficult. In this chapter, we avoid such
problems by considering that the scope of an attribute is local, i.e., valid only
within its relation schema. More details on this issue can be found in Rosenthal
(1994) and Mannila (1994).
In the following, we will focus on steps two and four only since we believe they
are crucial for such a mapping.
Formally speaking, we obtain a multi-graph since more than one edge can be defined between two nodes. Nevertheless, we do not take advantage of the multi-graph structure; therefore, we will simply speak of graphs in the sequel.
Example: From our previous example, we derive the oriented graph depicted in
Figure 2. We can easily verify that this oriented graph is acyclic.
With such graphs at hand, we shall use in the next subsection the notion of successor of a node x in V, denoted by successor(x): the set consisting of x together with all nodes y such that (x, y) ∈ E, so that successor(x) = x holds exactly when x has no remaining outgoing edges.
Integrated Algorithm
Algorithm MapERtoR
Input: S, a well-formed conceptual schema
       N, the notation to be used, i.e., UML or Merise
Output: (R, F, I), a relational database schema
Begin
(R, F, I) := ({}, {}, {})
Build the directed graph G = (V, E) from S
while V not empty do
   X := empty
   for all x ∈ V such that successor(x) = x do
      X := X + x
      case x do:
         x is a strong entity-type:     (R, F, I) += Map_Entity(x)
         x is a relationship-type:      (R, F, I) += Map_Relationship(x)
         x is an inherited entity-type: (R, F, I) += Map_Inherited(x)
         x is a weak entity-type:       (R, F, I) += Map_Weak(x)
      end case
   end for all
   V := V − X
   E := E − {(x→y) ∈ E | x ∈ X or y ∈ X}
end while
return (R, F, I)
End
The complexity of this algorithm is clearly polynomial in the size of the input. Termination is ensured whenever the graph G is acyclic, a condition which is met if the conceptual schema S is well-formed.
Example: Continuing our example, three iterations are necessary: the first iteration gives three candidates {People, Projects, Departments}. Then these nodes are removed and we get {Employees, sponsors} during the second iteration. Finally, the third iteration retains the remaining nodes {Dependents, monitors}.
At a given iteration, there is no particular order to be taken into account to consider elements. For instance, the three elements of the first iteration can be treated in any order. The algorithm MapERtoR then calls four procedures, one for each type of main constructor: Map_Entity, Map_Relationship, Map_Inherited and Map_Weak. They are described below.
Algorithm Map_Relationship
Input: A, a relationship type between O1, …, On
Output: (R, F, I) a relational database schema
Begin
1. Let R1, …, Rn be the relation schemas created from O1, …, On respectively.
Let Ki be a key of Ri, i ∈ {1, …, n}, and K = K1 + … + Kn
3. Create a new relation schema R from A
R = R + {R}
5. schema(R) = attributes of A + K
F = F + {K → schema(R) };
7. for each Oi participating in relationship-type A do
I = I + {R[Ki] ⊆ Ri[Ki] }
9. if cardinality between Oi and A is equal to 1 then
if US notation then
11. F = F + {K- Ki → Ki }
else
13. F = F + { Ki → K- Ki }
end if
15. end if
end for
17. return (R, F, I)
End
If the cardinality between Oi and A is equal to 1 (line 9), then with the US notation the other components of K determine the ith component (line 11). Otherwise (with the Merise notation), the ith component determines all other components (line 13).
Note that in the case of a reflexive relationship-type, there exist Oi and Oj referring to the same entity-type (line 1). In that case, the attribute(s) of one of the corresponding keys (Ki or Kj) must be renamed (line 2).
Example: First recall that the conceptual model used in this chapter complies with the US notation. Now, consider the relationship-type sponsors: a new relation schema sponsors is added to R with schema(sponsors) = {did,pid,since}. Attributes did and pid are new attributes. The functional dependency sponsors: did,pid→since is added to F (line 6) and two outgoing inclusion dependencies are created from sponsors: one to Departments (sponsors[did] ⊆ Departments[did]) and the other one to Projects (sponsors[pid] ⊆ Projects[pid]) (line 8). The cardinality constraint equal to 1 between Projects and sponsors leads to the creation of the functional dependency sponsors: did→pid (line 11). From the two functional dependencies defined over sponsors, did turns out to be a key and sponsors is trivially in BCNF.
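For concreteness, the fragment of (R, F, I) obtained for sponsors can be written down as plain data (a sketch; the encoding of FDs and inclusion dependencies is ours, not the chapter's):

R = {"sponsors": {"did", "pid", "since"}}
F = [("sponsors", ("did", "pid"), ("since",)),   # did,pid -> since   (line 6)
     ("sponsors", ("did",), ("pid",))]           # did -> pid         (line 11)
I = [("sponsors", ("did",), "Departments", ("did",)),  # sponsors[did] ⊆ Departments[did]
     ("sponsors", ("pid",), "Projects", ("pid",))]     # sponsors[pid] ⊆ Projects[pid]   (line 8)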
Algorithm Map_Inherited
Input: E, an inherited entity-type derived from O1, …, On, i.e., E is-a O1, …, E is-a On
Output: (R, F, I), a relational database schema
Begin
Let R1, …, Rn be the relation schemas created from O1, …, On respectively.
Let Ki be a key of Ri, i ∈ {1, …, n}, and K = K1 + … + Kn
Create a new relation schema R from E
R := R + {R}
schema(R) := attributes of E + K
for each key K′ defined over E do
   F := F + {R: K′ → schema(R)}
end for
for each E is-a Oi do
   I := I + {R[Ki] ⊆ Ri[Ki]}
   F := F + {Ki → schema(R)}
end for
return (R, F, I)
End
Note that if we have E is-a A, E is-a B, E is-a C, A is-a D and B is-a D, then E
has two different keys, derived from D and from C.
Algorithm Map_Weak
Input: W, a weak entity-type derived from O
Output: (R, F, I), a relational database schema
Begin
Let S be the relation schema created from O
Let K be a key of S
Create a new relation schema R from W
R := R + {R}
schema(R) := attributes of W + K
Let LK be the local key of W
F := F + {LK + K → schema(R)}
I := I + {R[K] ⊆ S[K]}
return (R, F, I)
End
Some formal properties can be stated for such kinds of mappings (Markowitz,
1992; Rosenthal, 1994; Mannila, 1994). Without going into much detail, the
database schema obtained after applying the algorithm MapERtoR has the
following properties:
• Each relation schema is in BCNF.
• I is made up of key-based inclusion dependencies (their right-hand sides
are keys) and the oriented graph of I (a node corresponds to a relation
schema and an edge between R and S corresponds to an element
R[X] ⊆ S[Y] of I) is acyclic.
Then, we will show how SQL workloads can be used to tune the design of
database schemas with respect to SQL accesses performed over the database
during some periods of time.
Occurrence of null values is quite common in real-life databases and is known to be one of the major difficulties for database programmers when they have to write SQL queries. In fact, the problems raised by null values depend on the kind of attributes on which they occur:
• On duplicated attributes, i.e., attributes which enable attribute-oriented models to simulate constructor-oriented models: null values can be a nightmare when composing queries involving joins, specific RDBMS functions, etc.
• On non-duplicated attributes: null values are not very challenging for designing SQL queries. Most of the time, they correspond to information missing at the insertion time of a tuple, and such values are not used anymore to navigate
through the database schema. These attributes are descriptive only: they
are defined within a relation schema and convey part of the information or
semantics of this relation schema.
To get a “null-free” database on duplicated attributes, the logical database schema obtained from the conceptual schema with the MapERtoR algorithm has to be implemented as it is, i.e., no transformation (or degradation) has to be performed. However, the price to pay is that the length of join paths is maximized.
Indeed each functional dependency turns out to be a key or a super-key and each
inclusion dependency turns out to be a foreign key, both of them being enforced
to be not null by the RDBMS.
To be compliant with the first option, a very well-known transformation can be done: instead of creating a new relation schema for each one-to-many or one-to-one binary relationship-type, one migrates the attributes of the relationship-type (if any) and a foreign key into the entity-type (or relationship-type) that participates with the cardinality constraint equal to one.
sponsors. In that case, 99% of the tuples of Departments get null values on the duplicated attribute pid and, less importantly, on attribute since. Such kinds of problems never happen with database schemas produced by the algorithm MapERtoR.
To sum up, this kind of logical database schema is often chosen to produce physical database schemas, its main advantage being to minimize the length of join paths, and thus to be rather efficient. The often misunderstood problem of such schemas concerns the number of null values which can be generated once the database is operational. For database designers, it might not be an important issue at database design time, but it could become a nightmare for database programmers who have to devise SQL queries in the presence of null values on duplicated attributes.
Clearly, a compromise has to be reached between the two opposite goals. In the spirit of Ramakrishnan (2003), we argue that a good design cannot be obtained at database design time: too many parameters have to be taken into account at an early stage of the design, specifically those related to the application programs accessing the database.
Nevertheless, an optimal design could be defined and obtained with respect to
the database accesses as given by SQL workloads. We argue that SQL
workloads could be used to tune the database design of operational databases
since they offer a nice setting in which logical database tuning can be treated
objectively — with respect to SQL workloads — instead of subjectively — with
respect to the database designer's expertise.
SQL workloads represent a set of SQL accesses performed over the database during some periods of time. They should be representative of the database activity, either Select-From-Where SQL queries or update SQL queries (insert/delete/update). Nowadays, SQL workloads can be easily gathered from operational databases by means of advanced functions available on top of major RDBMS products: a representative workload can be generated by logging activity on the server and filtering the events we want to monitor (Agrawal, 2001).
The key idea is to tune the design with respect to three main goals: minimizing the occurrence of null values, maximizing the efficiency of cost-sensitive SQL queries performed against the database, and preserving the data integrity of the database.
Example: Assume that SQL workloads reveal that cost-sensitive SQL queries
occur in a majority of cases between Departments and Projects. In that
case, the logical database schema given in Table 1 could be transformed (or
denormalized) only with respect to the length of join paths implied by SQL
workloads, i.e., the relation schema sponsors could be merged into Depart-
ments, the rest of the logical schema remaining unchanged, up to some
transformations of the set of inclusion dependencies.
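At the schema level, the merge can be pictured as follows (Python sketch; the descriptive attributes dname and budget of Departments are assumptions made purely for illustration):

# null-free design produced by MapERtoR
departments = {"did", "dname", "budget"}
sponsors    = {"did", "pid", "since"}          # key: did

# denormalized design: sponsors merged into Departments; the join path to
# Projects gets shorter, but pid and since become nullable for departments
# that sponsor no project
departments_merged = departments | (sponsors - {"did"})
assert departments_merged == {"did", "dname", "budget", "pid", "since"}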
The idea of tuning database parameters at the physical or logical level is not new. For instance, such techniques have proven useful in many database applications, such as physical database tuning, e.g., automatic index definition (Agrawal, 2001), logical database tuning (Lopes, 2002), or materialized view selection in a data warehouse (Agrawal, 2001).
Conclusions
Relational database technology and semantic data modeling have been two major
areas of database research in recent decades. Relational database technology
is based on solid theoretical foundations, and it is understood what constitutes a
well-designed relational database schema. Semantic modeling, on the other
hand, provides a rich set of data abstraction primitives which can capture
additional semantics of the application in the database schema. Until recently,
relational database technology and semantic modeling have evolved almost
separately. There is a need for establishing and understanding connections
between semantic models and the relational model. This chapter is an attempt to
investigate this connection. We tackled this problem by restricting the class of
data dependencies to functional dependencies and inclusion dependencies. The
results of our work are directed toward the understanding of the properties of
relational translations of (extended) ER schemas.
We clarified two main steps in such a translation: (1) the order of the translation
of entity and relationship-types and (2) the translation of cardinalities for
relationship-types, whatever the convention chosen to interpret these cardinali-
ties (for example, UML class diagrams or conceptual data schemas of Merise).
These considerations are simple though very important in practice. Between the
desire to get efficient databases for end-users and the desire to get null-free
databases on duplicated attributes for database programmers, we have pointed
out how SQL workloads could be used to reach a compromise among contradic-
tory objectives.
References
Agrawal, S., Chaudhuri, S. & Narasayya, V.R. (2001). Materialized View and
Index Selection Tool for Microsoft SQL Server 2000, ACM SIGMOD
2001, California, May 21-24.
Chen, P. (1976). The Entity-Relationship Model - Toward a Unified View of
Data. ACM TODS, 1(1), 9-36.
Fahrner, C. & Vossen, G. (1995). A Survey of Database Design Transforma-
tions Based on the Entity-Relationship Model. DKE, 15, 213-250.
Franconi, E. & Ng, G. (2000). The i.com tool for Intelligent Conceptual
Modeling. Proceedings of the Seventh International Workshop on
Knowledge Representation Meets Databases (KRDB 2000), Berlin,
Germany, 2000 (pp. 45-53).
Hacid, M.S., Petit, J.M. & Toumani, F. (2001). Representing and reasoning on
database conceptual schemas. Knowledge and Information Systems,
3(1), 52-80.
Hull, R. & King, R. (1987). Semantic Database Modelling: Survey, Applications,
and Research Issues. ACM Computing Surveys, 19(3), 201-260.
Jacobson, I., Booch, G. & Rumbaugh, J.E. (1999). Excerpt from “The Unified
Software Development Process”: The Unified Process. IEEE Software,
16(3), 82-90.
Levene, M. & Loizou, G. (1999). A Guided Tour of Relational Databases and
Beyond. Springer.
Lopes, S., Petit, J.M. & Toumani, F. (2002). Discovering interesting inclusion
dependencies: Application to logical database tuning. Information Sys-
tems, 27(1), 1-19.
Mannila, H. & Räihä, K.J. (1994). The Design of Relational Databases (2nd
ed.). Addison-Wesley.
Markowitz, V. & Shoshani, A. (1992). Representing Extended Entity-Relation-
ship Structures in Relational Databases: A Modular Approach. ACM
TODS, 17(3), 423-464.
Miller, R.J., Ioannidis, Y.E. & Ramakrishnan, R. (1994). Schema equivalence
in heterogeneous systems: Bridging theory and practice. Information
Systems, 19(1), 3-31.
Moulin, P., Randon, J., Teboul, M., Savoysky, S., Spaccapietra, S. & Tardieu, H. (1976). Conceptual Model as a Data Base Design Tool. Proceedings of the IFIP Working Conference on Modelling in Data Base Management Systems. In G. M. Nijssen (Ed.), Modelling in Data Base Management Systems.
Endnote
1. Other extensions could have been integrated into our ER model, such as multi-valued attributes or composite attributes. In order to ease the presentation of the mapping, they are not taken into account in this chapter but could be integrated into our framework without any major difficulty.
Chapter IV
Transformation Based XML Query Optimization
Dunren Che, Southern Illinois University, USA
Abstract
Introduction
With the advent of the Internet and the World Wide Web (WWW), repositories of SGML (Standard Generalized Markup Language) compliant structured documents have been growing fast. XML (Extensible Markup Language), the W3C (World Wide Web Consortium) proposal for a standard, is rapidly becoming dominant for representing data on the Web and elsewhere. Therefore, commensurate management technology, including efficient query processing and optimization for XML data, is especially needed.
commonly recognized that structured documents in general and SGML/XML
documents in particular should benefit from the same type of database manage-
ment functionality as offered to traditional data. This requires the storage of the
documents within a database (which we call a structured-document database)
and management of these documents by a database management system
(DBMS). Within the context of this chapter, structured documents refer to
documents according to the SGML/XML/HTML standards (Cover, 2002).
Efficient processing and execution of declarative queries over structured-document databases is an essential issue for structured-document database management systems. This issue, however, has not been adequately studied. Structured-document query optimization is fundamentally different from classical query optimization in two aspects. First, because of the high complexity of the intrinsic data model behind XML data, the search space for query optimization is much larger, which means the efficiency of traditional optimization approaches will degrade unacceptably when applied to XML data. In other words, we have to work out a much more aggressive way of pruning the search space to achieve
acceptable performance. Second, the structure of XML documents, which can
be interrogated in an XML query and is normally implied in the DTD or XML
schema of the documents, provides opportunities for efficient semantic query
optimization, which should be effectively exploited to achieve better optimization
efficiency for XML queries.
This chapter addresses the optimization issue of structured-document queries in
a database environment. Considering the dominance that XML has already
gained, our discussion is focused on XML-compliant documents, which are more
generally referred to as XML data. The query optimization strategy we present
here is transformation-based. The optimization of a query is accomplished
through a series of equivalent transformations applied to the query. Transforma-
tion rules in our system are all derived from the equivalences that we identified
in the specific context of XML document queries. The main theme of this chapter
is XML-document specific equivalences and the transformation rules derived for
XML query optimization.
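At its core, such a transformation-based optimizer is a deterministic rewriting loop, as in the following minimal sketch of ours (Python; the rule representation and the pass bound are assumptions, not the actual interfaces of our system):

def optimize(expr, rules, max_passes=100):
    # greedily apply the first applicable rule until none fires (a fixpoint);
    # each rule maps an expression to an equivalent improved expression,
    # or returns None when it is not applicable
    for _ in range(max_passes):
        for rule in rules:
            rewritten = rule(expr)
            if rewritten is not None:
                expr = rewritten
                break
        else:
            return expr  # no rule applicable: fixpoint reached
    return expr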
The remainder of this chapter is organized as follows: the chapter first gives a
brief review of related work. Next, it provides the preliminaries needed for the
subsequent discussion of this chapter. The chapter then addresses selected
equivalences that we have identified and used in our test bed. This is followed
by a section that discusses the intended application of our equivalences, i.e.,
deterministic transformations for XML query optimization, followed by an
optimization example. The chapter then concludes with a summary and indicates future directions.
Related Work
Since SGML/XML entered the arena of database technology, a lot of work has
been done on the various aspects related to XML data management, e.g.,
structured-document data modeling (Abiteboul, 1997; Özsu, 1997; Navarro,
1997; Lee, 1997; Yan, 1994; Conse, 1994; Chaudhuri, 1996; Morishima, 1997;
Gonnet, 1992; Fernadez, 2000; Florescu, 1999; Shanmugasundaram, 1999;
Bohannon, 2002; Klettke, 2000), XML document indexing (Chan, 2002; Grust,
2002; Li, 2001; Milo, 1999), and advanced algorithms for fast query processing
(Fernandez, 1998; McHugh, 1997; Gottlob, 2002; Li, 2001; Guha, 2002; Chien,
2002; Srivastava, 2002).
As structured documents are essentially semistructured, the work done on semistructured data management (e.g., Deutsch, 1999; McHugh, 1997) actually addresses similar issues as structured-document management. Lore (McHugh, 1997, 1999), a DBMS designed for semistructured data and later migrated to XML, has a fully-implemented cost-based query optimizer that transforms a query into a logical query plan, and then explores the (exponential) space of possible physical plans looking for the one with the least estimated cost. Lore is well known for its DataGuide path index, which, together with stored statistics describing the “shape” of the database, provides the structure knowledge about the data that helps Lore’s optimizer prune its search space for a better plan. In this sense, Lore is related to our work, but we capture the structure knowledge of document data mainly from the DTDs and apply this knowledge to conduct exclusively deterministic transformations on query expressions.
In the work of Fernandez (1998), a comparable strategy for exploiting a grammar
specification for optimizing queries on semistructured data is discussed, where
effort is made to make complete use of the available grammar for expanding a
given query. Our focus is different. We identify transformations that introduce
improvements on query expressions in a very goal-oriented manner.
The work of Consens and Milo (1994) also exploits document type definition knowledge for query optimization. They replace a query algebra operator with a cheaper one whenever the DTD allows. However, the DTDs considered in their study are simpler than those of SGML/XML, and the authors do not consider different grammar constructors.
Our work is innovative in systematically addressing the query optimization issue
for structured documents through algebraic transformations. Our approach
exploits the structure knowledge implied by the DTD and other heuristics to
conduct strongly goal-driven, deterministic and thus highly efficient transforma-
tions on query expressions for optimization.
Preliminaries
In this chapter, we are interested in structured documents that follow the SGML/XML standards (Cover, 2002). Our optimization techniques perform algebraic transformations on query expressions based on the PAT algebra (Salminen, 1994). The main theme of this work is to exploit the structure knowledge about the documents, which is usually characterized by the documents’ DTD or XML schema. This structure knowledge is used to conduct profitable transformations on query expressions for optimization. In this section we first introduce a few DTD-related notions that are important for the subsequent discussion, and then the PAT algebra, which forms the basis of the query transformations in our work.
DTD-Related Notions
It is obvious that the edge and path concepts in a DTD graph are the graphical counterparts of the directly-contains/contained-in and contains/contained-in relationships among element types of the DTD. Literally, the term “a path from ET1 to ET2” is different from “a path between ET1 and ET2”; the latter does not concern the direction of the path. Path and edge are important concepts for the identification of relevant structural properties of documents for semantic query optimization, which we discuss in subsequent sections.
Notice that the above notions are defined at the type level of document components, but imply structural relationships at the instance level. When DTDs are available, we rely on them to extract the structure knowledge of documents; otherwise, we need to obtain this structure knowledge for query optimization by means of a document parser.
In addition to the above notions, further notions regarding the properties of the
structural relationships among document components are defined and used for
deriving the core transformation equivalences for query optimization.
PAT Algebra
The PAT algebra (Salminen, 1994) was designed as an algebra for searching structured documents. We adopted the PAT algebra and extended it according to the features of SGML/XML compliant documents. The PAT algebra is set-oriented, in the sense that each PAT algebraic operator and each PAT expression evaluates to a set of elements. Herein, we present a restricted version of it that serves the purpose of this chapter.
A PAT expression is generated according to the following grammar:
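A plausible reconstruction of this grammar, inferred from the operators described below (the exact original formulation may differ), is:

E ::= etn | σr(E) | σA,r(E) | (E1 ∪ E2) | (E1 ∩ E2) | (E1 – E2) | (E1 ⊂ E2) | (E1 ⊃ E2)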
“E” (as well as “E1” and “E2”) generally stands for a PAT expression, etn introduces a document element type’s name, “r” is a regular expression representing a matching condition on the textual content of the document elements, and “A” designates an attribute of the elements.
∪, ∩ and – are the standard set operators: union, intersection, and difference. The two operands of a set operator have to be type-compatible, i.e., they must return the same type of elements.
σr(E) takes a set of elements and returns those whose content matches the regular expression r, while σA,r(E) takes a set of elements and returns those whose value of attribute A matches the regular expression r. Operator ⊂ returns
all elements of the first argument that are contained in an element of the second
argument, while ⊃ returns all elements of the first argument that contain an
element of the second argument.
In the subsequent discussion, we use ext(E) and τ(E) as shorthands to denote the
element extent determined by the expression E after evaluation and the result
type of the elements in ext(E), respectively.
Following is a query example formulated according to the PAT algebra with
regard to the DTD shown in Figure 1:
Query example. Find all the paragraphs containing both “xpath” and “xlink” in
any article.
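Assuming Paragraph and Article are the element type names used in Figure 1 (an assumption, since the names come from the figure), one plausible PAT formulation of this query is:

σ“xpath”(Paragraph ⊂ Article) ∩ σ“xlink”(Paragraph ⊂ Article)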
Structured-Document Indices
Considering that the number of PAT algebraic operators and the complexity of their potential interactions are rather high, we apply certain criteria in the selection of the equivalences to restrict their number; otherwise, we would obtain an unmanageably large set of potential transformations. The criteria we observe are as follows:
• Equivalences must have the potential to imply profitable transformations.
• Equivalences must not imply transformations that further complicate or expand query expressions.
• Equivalences must not require complex conditions to be checked.
• Equivalences must not merely generate alternative expressions.
Set-Oriented Equivalences
The “subset laws” are useful for simplifying query expressions that involve a subset relationship. The proof of the above equivalences is straightforward, except for the commutativity and associativity rules. In the following, as an example, we give the proof of the commutativity law, ε6, with regard to the ⊂ operator only.
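Assuming ε6 takes the form (E1 ⊂ E2) ⊂ E3 = (E1 ⊂ E3) ⊂ E2, a sketch of the argument is:

ext((E1 ⊂ E2) ⊂ E3) = {x ∈ ext(E1) | ∃y ∈ ext(E2): x is contained in y, and ∃z ∈ ext(E3): x is contained in z} = ext((E1 ⊂ E3) ⊂ E2)

since both sides impose the same two containment conditions on the elements of ext(E1), merely in a different order.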
Although we have ruled out in the previous section the situations where complicated exploitation of the DTD knowledge is needed, there are particularly interesting situations in which the DTD structure can be used most profitably to achieve query optimization. We show a few such special cases and present the corresponding equivalences in this subsection.
First, we introduce the notions of exclusivity, obligation, and entrance locations.
In a given DTD, some types may be shared among others. For example, the element type Authors shown in Figure 1 is contained in both element type Article and in ShortPaper. But the types that are not shared, i.e., exclusively contained in another type, bear potential for query optimization.
If two element types are related by neither exclusivity nor obligation, it may be worthwhile to check whether a third element type, called an entrance location, exists that could offer opportunities for applying a potential structure index or for shortening the navigation path needed to evaluate an involved containment operation.
The correctness of this equivalence trivially follows from the entrance location definition. The equivalence corresponding to the ⊃ operation can be defined likewise.
In general, adding an additional element type to a PAT expression, as introduced by an entrance location, is detrimental to the evaluation efficiency of the query. Therefore, the leftward (right-to-left) application of these equivalences is obviously favorable, while the left-to-right transformation will only be applied under certain special conditions that ensure a final, evident improvement of the query expression, e.g., enabling the application of a structure index. We show this with the following transformation rule, which combines exclusivity and entrance location:
The correctness of this equivalence becomes evident when the omitted intermediate term, E1 ⊂ (E3 ⊂ E2), is added.
The equivalence that combines obligation and entrance location can be defined analogously, but is omitted here.
When free(E1) additionally holds, Iτ(E1)(E2) is a subset of ext(τ(E1)), and the intersection is thus redundant and can be omitted.
Application of Equivalences
We envision two types of applications of our equivalences. One typical way is to directly apply the equivalences to query expressions to generate more
alternatives for each input query expression, and then, according to a certain criterion, e.g., cost comparison, choose the cheapest one. The second way is to conduct only beneficial transformations on query expressions toward the goal of optimization, which is usually achieved by resorting to heuristics.
The strategy adopted in our work is strongly heuristics-based, as it applies only deterministic transformations to query expressions. Here, the determinism consists in the following: (1) all transformation rules are unidirectional, and each obtains a determinate improvement of its input query; (2) once a new transformation is performed, the previous candidate (i.e., the input expression to the transformation rule) is immediately discarded. The whole optimization process conducted according to this strategy is linear and deterministic; it step-wise improves an input query and leads to a unique, final, optimized alternative of the input query. Query optimization thus accomplished is highly efficient because of this determinism.
Control Strategy
In our system, all the performed transformations are required to lead to step-by-step improvements of an input query expression until a final expression is reached. Examples of significant improvements are a potential structure index being introduced, or the input query expression being completely rewritten to be more evaluation-efficient. In both cases, equivalent transformations are performed according to XML-specific semantics at the PAT algebra level. The key transformations pursued in our system are thus heuristics-based semantic transformations, which are usually conducted more efficiently by starting from a carefully chosen standard format. In our system, this standard format is achieved via a normalization step, called the normalization phase, which also carries out necessary simplifications of input query expressions. The second phase is, of course, the semantic optimization phase. During semantic transformation, in order to introduce a major improvement into an expression, element names may be substituted, and the newly introduced element names may be redundant with other parts of the expression. Therefore, a final cleaning-up or simplification phase is employed.
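To make the deterministic, three-phase control strategy concrete, the following is a minimal sketch in Java; Expression, Rule, and the rule lists are hypothetical names introduced here for illustration only, not part of the system described in this chapter.

import java.util.List;

class Expression { /* a PAT algebra expression */ }

interface Rule {
    // Returns the improved expression, or null if the rule is not applicable;
    // each applicable rule is assumed to yield a strict improvement.
    Expression apply(Expression e);
}

class DeterministicOptimizer {
    List<Rule> normalizationRules, semanticRules, simplificationRules;

    Expression optimize(Expression q) {
        q = applyAsLongAsPossible(q, normalizationRules);  // phase 1: normalization
        q = applyAsLongAsPossible(q, semanticRules);       // phase 2: semantic optimization
        q = applyAsLongAsPossible(q, simplificationRules); // phase 3: cleaning up
        return q;                                          // the unique final alternative
    }

    private Expression applyAsLongAsPossible(Expression q, List<Rule> rules) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Rule r : rules) {
                Expression next = r.apply(q);
                if (next != null) {   // the previous candidate is discarded immediately
                    q = next;
                    changed = true;
                }
            }
        }
        return q;
    }
}

Because every rule application strictly improves the current candidate and the previous one is discarded, no search tree of alternatives is ever built, which is what makes the process linear.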
In the following, we present sample transformation rules (Che, 2003), and then
show an optimization example using these rules.
Transformation rules are derived from the more general equivalences. They are unidirectional and take the form “(E1) ⇒ (E2)”. An additional precondition may be attached to some rules to determine their applicability to a specific input expression.
Transformation Rules
Normalization Rules
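As an indicative example of a normalization rule (an assumption consistent with the PAT semantics given earlier, since σr inspects only the content of the elements of its operand), selections can be pushed over containment:

σr(E1 ⊂ E2) ⇒ σr(E1) ⊂ E2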
Semantic Rules
Semantic rules are developed with one predominant goal: to enable the exploitation of structure indices during optimization. In most cases this is not readily achievable; rather, it relies on deep exploration of DTD knowledge such as obligation, exclusivity, and entrance location.
Numerous cases have been identified for introducing structure indices into a
query (Che, 2003). The simplest case is to directly use an available structure
index between the two element types involved in a query:
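A plausible form of this rule, reconstructed from the discussion of the index operator in this section and therefore indicative only:

(E1 ⊂ E2) ⇒ (E1 ∩ Iτ(E1)(E2))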
Iτ(E1)(E2) denotes a structure index operation defined between τ(E1) and τ(E2),
where the subscript τ(E1) indicates the result type of the operation.
This rule is based on the index substitution equivalence, ε15, and interpolates the index operation into a query expression.
The second case is designed to reveal the applicability of a potential structure
index that is not directly available. The corresponding rule combines the
commutativity and associativity laws into a single transformation:
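Indicatively (the concrete rule follows from the commutativity law ε6 and the associativity law; its exact form here is an assumption), such a transformation could take the shape:

((E1 ⊂ E2) ⊂ E3) ⇒ ((E1 ⊂ E3) ⊂ E2), applied when a structure index exists between τ(E1) and τ(E3).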
Simplification Rules
The third phase reapplies most of the simplification rules of Phase 1, and introduces additional rules such as R10 to simplify new subexpressions pertaining to the index operator Iτ(E1).
An Optimization Example
Transformation Example
The query retrieves all the paragraphs containing both “xpath” and “xlink” from
any article.
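Under the same assumptions as in the earlier query example (Paragraph and Article as element type names, a structure index IParagraph(Article) available, and the sample rules sketched above), an indicative walk-through is:

σ“xpath”(Paragraph ⊂ Article) ∩ σ“xlink”(Paragraph ⊂ Article)
⇒ (σ“xpath”(Paragraph) ⊂ Article) ∩ (σ“xlink”(Paragraph) ⊂ Article)   (normalization: pushing selections)
⇒ (σ“xpath”(Paragraph) ∩ σ“xlink”(Paragraph)) ⊂ Article   (normalization: factoring the common containment)
⇒ (σ“xpath”(Paragraph) ∩ σ“xlink”(Paragraph)) ∩ IParagraph(Article)   (semantic phase: index substitution)

with the third phase then simplifying any redundant subexpressions around the index operator.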
Acknowledgments
The author would like to express great appreciation to his former colleagues at Fraunhofer-IPSI (formerly known as GMD-IPSI), Germany. This continuing research at the author’s current affiliation was originally initiated at GMD-IPSI in close collaboration with Prof. Karl Aberer, Dr. Klemens Böhm, and Prof. M. Tamer Özsu (during his sabbatical visit to GMD-IPSI).
References
Abiteboul, S., Cluet, S., Christophides, V., Milo, T., Moerkotte, G. & Simeon, J.
(1997). Querying Documents in Object Databases. International Jour-
nal on Digital Libraries, 1(1), 5-19.
Bertino, E. (1994). A Survey of Indexing Techniques for Object-Oriented
Database Management Systems. In J.C. Freytag, D. Maier & G. Vossen
(Eds.), Query Processing for Advanced Database Systems. Morgan
Kaufmann Publishers, 383-418.
Bohannon, P., Freire, J., Roy, P. & Simeon, J. (2002). From XML Schema to
Relations: A Cost-Based Approach to XML Storage. Proceedings of the
18th International Conference on Data Engineering (ICDE’02), (pp.
64-73).
Böhm, K., Aberer, K., Özsu, T. & Gayer, K. (1998). Query Optimization for
Structured Documents Based on Knowledge on the Document Type
Definition. Proceedings of IEEE International Forum on Research and
Technology Advances in Digital Libraries (ADL’98), (pp. 196-205).
Chan, C. Y., Felber, P., Garofalakis, M. & Rastogi, R. (2002). Efficient Filtering
of XML Documents with XPath Expressions. Proceedings of Interna-
tional Conference on Data Engineering, (pp. 235-244).
Chan, C. Y., Garofalakis, M. N. & Rastogi, R. (2002). RE-Tree: An Efficient
Index Structure for Regular Expressions. Proceedings of VLDB 2002,
(pp. 263-274).
Chaudhuri, S. & Gravano, L. (1996). Optimizing Queries over Multimedia
Repositories. Proceedings of SIGMOD’96, (pp. 91-102).
Che, D. (2003). Implementation Issues of a Deterministic Transformation System for Structured Document Query Optimization. Proceedings of the 2003 International Database Engineering & Applications Symposium.
Che, D. et al. (2003). Query Processing and Optimization in Structured Document Database Systems. Manuscript in preparation for publication in the VLDB Journal.
Chien, S., Vagena, Z., Zhang, D., Tsotras, V.J. & Zaniolo, C. (2002). Efficient
Structural Joins on Indexed XML Documents. Proceedings of VLDB
2002, (pp. 263-274).
Consens, M. & Milo, T. (1994). Optimizing Queries on Files. Proceedings of the
1994 ACM SIGMOD International Conference on Management of
Data, (pp. 301-312).
Cover, R. (2002). Online Resource for Markup Language Technologies.
Retrieved from the WWW: http://xml.coverpages.org
Deutsch, A., Fernandez, M. & Suciu, D. (1999). Storing Semistructured Data
with STORED. Proceedings of ACM SIGMOD 1999, (pp. 431-442).
Fernandez, M. F. & Suciu, D. (1998). Optimizing Regular Path Expressions
Using Graph Schemas. Proceedings of the 14th International Confer-
ence on Data Engineering, (pp. 14-23).
Florescu, D. & Kossmann, D. (1999). Storing and Querying XML Data Using an RDBMS. IEEE Data Engineering Bulletin, 22(3), 27-34.
Gonnet, G. H., Baeza-Yates, R. A. & Snider, T. (1992). New Indices for Text: PAT Trees and PAT Arrays. In Information Retrieval: Data Structures and Algorithms. New York: Prentice Hall.
Gottlob, G., Koch, C. & Pichler, R. (2002). Efficient Algorithms for Processing
XPath Queries. Proceedings of VLDB 2002, (pp. 95-106).
Grust, T. (2002). Accelerating XPath location steps. Proceedings of SIGMOD
Conference 2002, (pp. 109-120).
Guha, S., Jagadish, H. V., Koudas, N., Srivastava, D. & Yu, T. (2002).
Approximate XML joins. Proceedings of the ACM SIGMOD Confer-
ence, (pp. 287-298).
Klettke, M. & Meyer, H. (2000). XML and Object-Relational Database Systems
- Enhancing Structural Mappings Based on Statistics. Proceedings of the
International Workshop on the Web and Databases (WebDB), (pp. 151-
170).
Lee, K., Lee, Y. K. & Berra, P. B. (1997). Management of Multi-Structured
Hypermedia Documents: A Data Model, Query Language, and Indexing
Scheme. Multimedia Tools and Applications, 4(2), 199-224.
Li, Q. & Moon, B. (2001). Indexing and Querying XML Data for Regular Path
Expressions. Proceedings of the 27th International Conference on Very
Large Databases, (pp. 361-370).
McHugh, J., Abiteboul, S., Goldman, R., Quass, D. & Widom, J. (1997). Lore:
A Database Management System for Semistructured Data. SIGMOD
Record, 26(3), 54-66.
McHugh, J. & Widom, J. (1999). Query Optimization for XML. Proceedings of the 25th International Conference on Very Large Databases, (pp. 315-326).
Milo, T. & Suciu, D. (1999). Index Structures for Path Expressions. Proceed-
ings of ICDT 1999, (pp. 277-295).
Morishima, A. & Kitagawa, H. (1997). A Data Modeling and Query Processing
Scheme for Integration of Structured Document Repositories and Rela-
tional Databases. Proceedings of the Fifth International Conference on
Database Systems for Advanced Applications (DASFAA 1997), (pp.
145-154).
Navarro, G. & Baeza-Yates, R. (1997). Proximal Nodes: A Model to Query Document Databases by Content and Structure. ACM Transactions on Information Systems, 15(4), 400-435.
Özsu, M. T., Iglinski, P., Szafron, D. & El-Medani, S. (1997). An Object-
Oriented SGML/HiTime Compliant Multimedia Database Management
System. Proceedings of Fifth ACM International Multimedia Confer-
ence (ACM Multimedia’97), (pp. 239-249).
Salminen, A. & Tompa, F. W. (1994). PAT Expressions: An Algebra for Text
Search. Acta Linguistica Hungarica, 41(1), 277-306.
Shanmugasundaram, J., Tufte, K., Zhang, C., He, G., DeWitt, D.J. & Naughton, J.F. (1999). Relational Databases for Querying XML Documents: Limitations and Opportunities. Proceedings of VLDB, (pp. 302-314).
Srivastava, D., Al-Khalifa, S., Jagadish, H. V., Koudas, N., Patel, J. M. & Wu,
Y. (2002). Structural Joins: A Primitive for Efficient XML Query Pattern
Matching. Proceedings of ICDE’02, (pp. 141-150).
Yan, T. W. & Annevelink, J. (1994). Integrating a Structured-Text Retrieval
System with an Object-Oriented Database System. Proceedings of the
20th VLDB Conference, (pp. 740-749).
Chapter V
Specifying Coherent
Refactoring of
Software Artefacts with
Distributed Graph
Transformations
Paolo Bottoni, University of Rome “La Sapienza”, Italy
Abstract
Introduction
Software is subject to change, and a piece of software may need changes for several reasons. One such reason is the introduction of new requirements that necessitate design changes. The introduction of a new requirement can be a consequence either of the iterative development process chosen for the project, which constructs the system incrementally, or of the fact that the requirement was overlooked in the initial specification and design of the system. As a simple
example, consider an application developed around a single specific algorithm.
If a new algorithm to perform the same calculations (graph layout, for example)
becomes available, it may be useful to modify the application to add the option
of using the new algorithm.
Object-oriented programming has made many changes easy to implement, often
just by adding new classes, as opposed to more traditional approaches requiring
many modifications. But adding classes may not be sufficient. Even in the simple
example above, the application must evolve by means other than class addition.
If the designer has not foreseen the possibility of alternatives for the algorithm,
the class with the original algorithm would probably need to be “split” into
algorithm-specific elements and general ones, the latter to be “moved” to a new
class that will then provide the means to choose between the two algorithms,
placed in separate components.
Another reason for wanting to modify an object-oriented program is to be able
to reuse (part of) it. As an example, consider the case of two teams developing
two class libraries independently. The two libraries may contain different classes
implementing the same basic objects (windows, lists) or the same operations to
manipulate them with different names. In order to integrate the libraries, it is best
to remove these inconsistencies, by changing one library to use the basic classes
or the operation names of the other one. Simple modifications such as the change
of an operation name are not easy to implement, as they require searches for the
procedures that can invoke them or for the other operations that they would
override with the new name.
In the rest of this introduction, we set the background for our work by introducing the refactorings used in the motivating example, reviewing some approaches to refactoring and software evolution via graph rewriting, and illustrating the motivations for the coherent refactoring of code and models. Background notions on graph transformation are then given. Next, the problem of maintaining consistency between specification and code is reformulated as the definition of suitable distributed graph transformations, and our approach is illustrated with two important refactorings. The next section discusses the principles under which one can establish correspondences between abstract representations of the code and of the model. Section 6 discusses forms of behavior preservation and sketches how formal results for graph transformation help in reasoning about it. Conclusions are then given.
Selected Refactorings
which are accessed by the extracted code and have a local scope be passed as
parameters, and that the removed code form a block, i.e., it has a single entry
point and a single exit point.
Related Work
Several tools have been developed to assist refactoring. Some are packaged as
stand-alone executables, while others integrate refactorings into a development
environment. Many tools refer directly and exclusively to a specific language, for
example C# Refactory (http://www.xtreme-simplicity.net/) for C#, or
CoreGuide6.0 (http://www.omnicore.com) for Java. Xrefactory (http://www.xref-tech.com) assists in modifying code in C and Java. All of these
provide a variety of refactorings, typically renamings and method extraction.
However, none of them mentions diagrams and the effects on other views of the
system, including documentation.
The class diagram, referred to as “the model,” is instead considered in objectiF (http://www.microtool.de/objectiF), which, in addition to supporting a variety of
languages, allows transformations of both the code and the class model, with
changes propagated automatically to both views. Other kinds of diagrams,
especially those describing behavioral aspects of the system, are not refactored.
Eclipse (http://www.eclipse.org) integrates system-wide changes of code with
several refactoring actions (such as rename, move, push down, pull up,
extract). Class diagrams are implicitly refactored, too. Finally, JRefactory
(http://jrefactory.sourceforge.net) supports 15 refactorings including: pushing
up/down, methods/fields, and extract method/interface. The only diagrams
mentioned are class diagrams which are reverse engineered from the .java
files.
Reverse engineering is present in Fujaba (Niere et al., 2001), where the user can
reconstruct the model after a chosen set of changes of the code. A more efficient
option would be to define the effects of a refactoring on the different parts of the
model. This is more easily realized on structural models, where transformations
on such diagrams are notationally equivalent to the lexical transformation on the
source code, than on behavioral specifications. Modern refactoring tools,
however, work on abstract representations of the code, rather than on the code
itself, typically in the form of an Abstract Syntax Tree (AST), following Roberts’
(1999) line.
Refactorings are also defined on model diagrams. Sunyé et al. (2001) illustrate
refactoring of statecharts, typically to extract a set of states to be part of a
composite state. Transformations of concrete diagrams are specified by pre and
post-conditions, written as OCL constraints. Metz et al. (2002) consider the
UML metamodel to propose extensions to use case models, which would allow
significant refactorings of such models and avoid improper current uses. These
papers, however, do not consider the integration with possible source code
related to these models.
Current class diagram editors do not extend changes to all other related diagrams, limiting their “automation” to the source code, with the result that direct intervention is needed to restore consistency among the possibly numerous UML diagrams representing the same subsystem. We adopt UML metamodel instances and draw a correspondence between these and the abstract syntax trees representing code. Hence, a common graph-based formalism can be used as the basis for managing the refactoring of both the code and the model in an integrated way.
Graph rewriting has been introduced as a basis for formalising refactoring in work by Mens (2000, 2001) and Mens et al. (2002). In these papers, a non-standard graph representation of code is used, so that the availability of AST representations is not exploited. Moreover, integrated refactoring of model and code by graph transformation has not been considered up to now.
refactoring supported by the CASE tool in which the user acts on the code or on
the model, the corresponding modifications on the other graph must be enforced.
After refactoring, the cycle can start again with new developments, and so on.
While refactoring tools work on both abstract and concrete representations of
code, they are usually restricted to the manipulation of structural aspects of the
model, namely class diagrams. Although this is intuitively justifiable by the stated
assumption that refactoring does not affect the behavior of systems, the
combination of refactoring with other forms of code evolution can lead to
inconsistencies between the model and the code. This could be avoided by a
careful consideration of what a refactoring involves, as shown in the following
two subsections.
Modification of Collaborations
Activity graphs are special types of state machines used for describing complex processes involving several classifiers, where the state evolution of the involved elements is modeled. Suppose that an action is performed to the effect of setting a field variable to some value, say x = 15. Hence, a state s appears in the model, indicating that an assignment has to occur at that time. If the EncapsulateVariable refactoring is subsequently applied to the variable x, the code x = 15 is replaced
by setX(15). The state in the activity diagram now becomes a CallState s’.
(Compare similar modifications in activity diagrams in Figures 2c and 3c.)
Example of Refactoring
class Audio {
    protected MusicSource ms;
    private Environment env;
    public MusicDescription preferences;
    protected MusicSource findMusicSource() { /* lookup for a music source */ }
    protected void playMusic() {
        ms = findMusicSource();
        Music toPlay = ms.provideMusic(this);
        // code to set the playing environment env
        toPlay.play(env);
    }
}

class Music {
    void play(Environment env) { /* code to play in the environment env */ }
}

class MusicSource {
    public Music provideMusic(Audio requester) {
        MusicDescription desc = requester.preferences;
        // code to retrieve music according to desc and send it back as result
    }
}

class Environment { /* fields and methods to define a playing environment */ }
With a view to the possibility of reuse, the programmer decides to protect the preferences by applying the EncapsulateVariable refactoring. After this first step, the affected code looks as follows; the changed elements are the visibility of preferences, the two new accessor methods, and the call to getPreferences. The new situation is reflected in the model diagrams of Figure 3.
class Audio {
    protected MusicSource ms;
    private Environment env;
    private MusicDescription preferences;
    protected MusicSource findMusicSource() { /* same implementation as before */ }
    protected void playMusic() { /* same implementation as before */ }
    public MusicDescription getPreferences() { return preferences; }
    public void setPreferences(MusicDescription desc) { preferences = desc; }
}

class MusicSource {
    public Music provideMusic(Audio requester) {
        MusicDescription desc = requester.getPreferences();
        // same code using desc as before
    }
}
The code above presents several possibilities for refactorings, allowing the introduction of an abstract notion of player, able to retrieve a content source, interrogate it to obtain some content, and set an environment in which to play it. Concrete players will differ in the type of source they have to retrieve and the way in which they define the environment. On the other hand, content sources must have a generic ability to accept a player and send the appropriate content to it, while the different forms of content will have specific realizations of the play
method. To this end, a first step is to extract the code for playing in an environment from playMusic to a setEnvironment method. Method playMusic is then renamed to playContent, findMusicSource is renamed to findSource, and the variable musicSource to source, while in class MusicSource, provideMusic is renamed to provideContent. Refactorings are then performed to introduce new classes and interfaces into the existing hierarchy, by creating and inserting the abstract class AbstractPlayer and the interfaces Content and ContentSource. We can now pull up methods and variables from Audio to AbstractPlayer. Finally, all return and parameter types referring to the concrete classes are changed to the newly inserted types. The reader can reconstruct the UML diagrams according to these modifications. The resulting code is outlined below.
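The following listing is a plausible reconstruction based on the refactorings just described; the exact distribution of members between Audio and AbstractPlayer, the abstract modifiers, and the method visibilities are assumptions.

abstract class AbstractPlayer {
    protected ContentSource source;        // renamed from ms and pulled up from Audio
    protected Environment env;
    private MusicDescription preferences;  // pulled up, still encapsulated

    protected abstract ContentSource findSource();  // renamed from findMusicSource
    protected abstract void setEnvironment();       // extracted from playMusic

    protected void playContent() {                  // renamed from playMusic
        source = findSource();
        Content toPlay = source.provideContent(this);
        setEnvironment();
        toPlay.play(env);
    }

    public MusicDescription getPreferences() { return preferences; }
    public void setPreferences(MusicDescription desc) { preferences = desc; }
}

interface Content {
    void play(Environment env);
}

interface ContentSource {
    Content provideContent(AbstractPlayer requester);
}

class Audio extends AbstractPlayer {
    protected ContentSource findSource() { /* lookup for a music source */ return null; }
    protected void setEnvironment() { /* code to set the playing environment env */ }
}

class Music implements Content {
    public void play(Environment env) { /* code to play in the environment env */ }
}

class MusicSource implements ContentSource {
    public Content provideContent(AbstractPlayer requester) {
        MusicDescription desc = requester.getPreferences();
        // code to retrieve content according to desc and send it back as result
        return null; // placeholder
    }
}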
Figure 2. Components of the UML model for the first version of code – (a) class diagram; (b) sequence diagram; (c) activity diagrams
[Figure not reproduced: (a) class diagram relating Audio, MusicSource, Music, and Environment; (b) sequence diagram with provideMusic(a) and play(env); (c) activity diagrams for Audio::playMusic() and Audio::provideMusic()]
Graph Transformation
Graphs are often used as abstract representations of code and diagrams, e.g.,
UML diagrams. Formally, a graph consists of a set of vertices V and a set of
edges E such that each edge e in E has a source and a target vertex s(e) and t(e)
in V, respectively. Each vertex and edge may be attributed by some data value
or object, expressed by elements of an algebra on some algebraic signature Σ.
Here, we consider typed attributed graphs. For graph manipulation, we adopt the double-pushout approach to graph transformation, DPO (Corradini et al., 1997), based on category theory.

[Figure 3 not reproduced: the UML model after the EncapsulateVariable refactoring – (a) class diagram with the private attribute preferences and the methods getPreferences and setPreferences; (b) sequence diagram including the getPreferences() call; (c) activity diagram]

Using typed graphs, structural aspects appear at two
levels: the type level (modeled by a type graph T) and the instance level (modeled
by an instance graph G). G is correctly typed if it can be mapped in a structure-
preserving manner to T, formally expressed by a graph homomorphism.
A graph rule r: L → R is a pair of T-typed instance graphs L, R such that L ∪ R is defined, i.e., graph objects occurring in both L and R have the same type and attributes and, if they have the same edges, will also have the same source and target vertices. The left-hand side L represents the modification pre-conditions, while the right-hand side R shows its effect. Vertex identity is expressed via names, while edge identity is inferred from the identity of the connected vertices. Additionally, graph rules comprise attribute computations, where left-hand sides may contain constants or variables of a set X, while right-hand sides capture the proper computations, denoted as elements of the term algebra TΣ(X).
A rule may also contain a set of negative application conditions (NACs), expressing graph parts that must not exist for the rule to be applicable. NACs are finite sets of graphs NAC = {Ni | L ⊆ Ni, i ≥ 0}, specifying a conjunction of basic conditions, and can refer to values of attributes (Fischer et al., 1999). For a rule to be applicable, none of the prohibited graph parts Ni – L present in a NAC may occur in the host graph G in a way compatible with the rule match m. A match is an injective graph homomorphism m: L ∪ R → G ∪ H such that m(L) ⊆ G and m(R) ⊆ H, i.e., the left-hand side of the rule is embedded into G and the right-hand side into H. In this chapter we use dotted lines to denote NACs; non-connected NACs denote different negative application conditions (see Figure 14 for an example). A graph transformation from a graph G to a graph H, r(m): G ⇒ H, is given by a rule r and a match m with m(L – R) = G – H and m(R – L) = H – G, i.e., precisely that part of G is deleted which is matched by graph objects of L not belonging to R and, symmetrically, that part of H is added which is matched by new graph objects in R. Operationally, the application of a graph rule is performed as follows. First, find an occurrence of L in graph G. Second, remove all the vertices and edges from G matched by L – R, making sure that the remaining structure D = G – m(L – R) is still a proper graph, i.e., that no edge is left dangling because its source or target vertex has been deleted; if an edge would dangle (for instance, if a rule deletes a class vertex while an edge to one of its features remains), the dangling condition is violated and the application of the rule at match m is not possible. Third, glue D with R – L to obtain graph H. A typed graph transformation system GTS = (T, I, R) consists of a type graph T, an initial graph I, and a finite set R of graph rules, with all left- and right-hand sides typed over T. A GTS formally defines the set of all possible graphs by Graphs(GTS) = {G | I ⇒*R G}, where G ⇒*R H means G ⇒r1(m1) H1 ⇒ ... ⇒rn(mn) Hn = H with r1, ..., rn in R and n ≥ 0. It follows from the theory that each such graph G is correctly typed.
Transformation Units
Transformation units (Kreowski et al., 1997) are a general concept to control rule application, by means of control conditions specified by expressions over rules. We use them in the context of distributed graph transformation, in which a transformation unit consists of a set of rules and a control condition C describing how the rules can be applied. Typically, C contains expressions on the sequential application of rules, as well as conditions and loops, e.g., applying a rule as long as possible. We relate rule expressions to graph rules by giving names to rules and passing parameters to them, to be matched against specific attributes of some vertex. By
this mechanism, we can restrict the application of rules to those elements which
carry an actual reference to the code to be refactored. To this end, the rules
presented in the transformation units are meant as rule schemes to be instantiated to
actual rules, assigning the parameters as values of the indicated attributes.
The abstract representations of code and UML models are given in the form of
graphs, obeying the constraints imposed by a type graph. For the code, we refer
to the JavaML definition of an abstract syntax for Java (Badros, 2000), and we
consider the type graph provided by its DTD. Indeed, any JavaML document is
structured as a tree, i.e., a special kind of graph where an XML element is
represented by a typed vertex and its attributes by vertex attributes. The graph
edges show the sub-element relation and are untyped and not attributed. We call
this graph the code graph. For UML (OMG, 2002), the abstract syntax of the
UML metamodel provides the type graph to build an abstract representation of
the diagram that we call the model graph.
As an example, Figure 4 shows the code graph for class Audio. For space
reasons, we omit the representation of fields ms and env and method
findMusicSource. Figure 5 presents the model graph for the class diagram of
Figure 2a (without dependencies). Only the important fields of model elements
are shown. Details of model elements occurring in more than one figure are
shown only in one. Vertices that would be directly connected to a class vertex
in the code graph, appear in the model graph as feature elements for which the
class is an owner. Figures 6 and 7 present the components of the model graph
for the sequence and activity diagrams of Figure 2.
The model graphs, though presented separately, are different views of one large
graph representing the whole model. Indeed, behavioral diagrams are associated
with model elements which own or contribute to the model’s behavior. As an
example, object m1:Method for playMusic appears in Figure 5 as a behavioral
Figure 4. A part of the code graph for the first version of the code of class
Audio
[Figure not reproduced: code-graph vertices java-class-file, block, local-variable (toPlay), assignment-expr, var-set/var-ref (ms), and send vertices for the messages findMusicSource and provideMusic]
[Figure 5 (model graph for the class diagram of Figure 2a) not reproduced: Class vertices for MusicSource, Music, and Environment, with AssociationEnd, Operation, Method, and Parameter vertices for provideMusic and play]
feature of class Audio, and in Figure 7 as the element whose behavior is defined by the component of the model graph for activity diagrams. Conversely, object o2:Operation appears as the specification for playMusic in Figure 5, and as the operation for a CallOperationAction object in Figure 6. In the next section, transformation units for distributed transformations are used to specify refactorings that act coherently on the code and model graphs.
[Figure 6 not reproduced: model graph component for the sequence diagram of Figure 2b, with InteractionInstanceSet, Stimulus, and activator vertices]
Figure 7. Abstract graph for the activity diagram of Figure 2c for executing playMusic
[Figure not reproduced: an ActivityGraph with a Procedure for m1:Method (playMusic) and CallOperationAction vertices c6, c7, c8 for the operations findMusicSource, provideMusic, and play, linked by ActionState, Transition, Action, and Procedure vertices]
Encapsulation of Variables
[Figure 8 not reproduced: LHS and RHS of the rule encapsulate_variable_code, which sets the field varname in class cname to private and adds the methods “set”+varname and “get”+varname, with the corresponding formal arguments, blocks, return, and var-set/var-ref elements]
Figure 9. Two NACs for the rule in Figure 8, to check that no method exists
with the same signature as the inserted setter and getter methods
[Figure not reproduced: each NAC shows class cname with an existing method whose name clashes with the getter or setter, in the latter case with a formal_argument of type t]
Rules operate locally on the components of the model graph for the diagrams above. Figure 12 shows the encapsulate_variable_model rule acting on the class diagram. Negative application conditions analogous to those for the code graphs are also used, guaranteeing a check of the overall consistency of the representations. Consequently, we also need to compute the transitive closure of the inheritance relation for model graphs (not shown). Rules encapsulate_variable_model and encapsulate_variable_code are applied in parallel along their common subrule, shown in grey.
Figure 10. Rule to replace accesses to varname in cname with calls to the
getter
[Figure not reproduced: the var-ref vertex with id = i is rewritten into the target of a call to the getter]
Figure 11. Rule to replace updates of varname in cname with calls to the
setter
Figure 12. LHS and RHS of the rule for variable encapsulation on the class
diagram component of the model graph
[Figure not reproduced: attribute varname of class cname becomes private, and Method/Operation vertices for “get”+varname and “set”+varname are added, with Parameter vertices typed by the Classifier t]
Finally, we consider the required modifications of sequence diagrams in the case of variable encapsulation. Since sequence diagrams do not show read and write actions on attributes, the encapsulation does not directly cause a refactoring. In order to maintain a consistent model, the user has to specify if and where the refactoring should be represented in this part of the model. In particular, whenever a method m is called in which the encapsulated variable is used, it is necessary to introduce a Stimulus s’ to call the relevant setter or getter method. From the ordering of subtrees in the code graph of m, one can identify the stimulus s for which s’ is the successor (or predecessor) in the new activation sequence, and pass it as a parameter to the rule. For space reasons, we omit the representation of the corresponding rule getEncVarInInteraction.
The rules in Figures 10, 11, and 13 must be applied at all possible instances of their LHS in the distributed graphs. There may be several such instances, and we want to apply a transformation in a transactional way, i.e., the overall application is possible only if corresponding parts can be coherently transformed. Hence, transformation units specify some form of control on the application. In particular, the control construct asOftenAsPossible states that a local rule must be applied in parallel on all (non-conflicting) instances of the antecedent. Contextual elements can be shared by different instances, but no overlapping is possible on elements removed or transformed by the rule. Moreover, the construct || indicates the distributed application of two or more rules.
[Figure 13 not reproduced: rule replacing, in the activity diagram component, a ReadAttributeAction on attribute varname of class cname by a CallOperationAction on the operation “get”+varname, turning the enclosing ActionState into a CallState]
The user can also decide to request a modification of interaction diagrams. In this
case, he or she has to interactively provide a value for the stimulus after or before
which to place the new call, and the transformation unit is completed by the
following construct.
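Schematically, and with indicative syntax only (the rule name getEncVarInInteraction is taken from the discussion above, while the guard and the parameter passing are assumptions), such a construct could read:

if stimulus s is provided then asOftenAsPossible( getEncVarInInteraction(s) )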
By applying the transformation unit, both code and model graphs are transformed
to reflect the existence and usage of the new methods. As an example, the graph
in Figure 14 is a subgraph of the resulting code graph (the body of playMusic is
not shown as it remained unchanged) obtained by applying the transformation
unit EncapsulateVariable, i.e., the local rule encapsulate_variable_code has been
applied once with arguments cname = “Audio” and varname = “preferences”.
Extract Method
In the words of Martin Fowler, “If you can do Extract Method, it probably means
you can go on more refactorings. It’s the sign that says, ‘I’m serious about this’.”
We present our approach to managing this refactoring, without figures due to
lack of space. A more detailed version, but with a different code representation,
is in Bottoni et al. (2003).
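As an illustration, consider again the Audio example from the previous sections; the following sketch shows the effect of Extract Method on playMusic, with the extracted method name setEnvironment taken from the earlier discussion (its visibility is an assumption).

// Before the refactoring: the environment set-up is a block inside playMusic.
protected void playMusic() {
    ms = findMusicSource();
    Music toPlay = ms.provideMusic(this);
    // code to set the playing environment env
    toPlay.play(env);
}

// After Extract Method: the block becomes a method of its own; in the code
// graph, the removed subtree is replaced by a send element with
// message = "setEnvironment" and target this.
protected void playMusic() {
    ms = findMusicSource();
    Music toPlay = ms.provideMusic(this);
    setEnvironment();
    toPlay.play(env);
}

protected void setEnvironment() {
    // code to set the playing environment env
}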
The pre-condition that the name of the new method does not already exist in the class hierarchy is checked as for variable encapsulation. In general, we can assume that the code and model graphs are complemented by all the needed gen edges. The pre-condition that the code to be extracted be a block is easily checkable on the
Figure 14. Code graph for class Audio after the EncapsulateVariable
refactoring
[Figure not reproduced: the java-class-file subtree extended with vertices for the new getter and setter methods]
code graph. Indeed, this code can be a whole subtree rooted in a block, if, switch, loop, or do-loop vertex, or a collection of contiguous subtrees of the same method vertex, composed of stmt-exprs that do not comprise any try, throw, return, continue, break, or synchronized construct, and in which no label appears.
We then need to identify all the variables to be passed to the new method. The code graph is inspected to identify all the var-set and var-ref elements where the name of the variable is neither the name of a formal-argument of the original method nor the name of a local-variable declaration present in the subtree to be moved. Additionally, if the subtree contains some local-variable vertex, we check that there are no var-set or var-ref elements for that variable in the subtrees remaining with the original method. The creation of the call for the new method is achieved by substituting the removed subtrees with a send element having the name of the new method as value of the attribute message, target this, and the list of formal-arguments derived as before. In the model, we modify the class diagram by simply showing the presence of the new method in the class, as the effects on the referred variables and the existence of a call for this method are not reflected at the structural level. For the activity diagrams, we need to identify the Action associated with a given Operation. Such an Action can be further detailed through
a collection of Actions associated with it. So, we need to identify all those vertices
which correspond to roots of the moved subtrees, detach them from the
description of the Operation, and exploit them to create the description of the
Operation associated with the new method.
For interaction diagrams, one must identify the existing instances of Stimulus occurring before and/or after the extracted code, and insert a Stimulus for a CallOperationAction, for an Operation with the name of the new Method, with equal receiver and sender. Moreover, each instance of CallOperationAction originating from the original Operation instances and related to a vertex in the extracted subtrees must now be related to an instance of Stimulus whose activator is the Stimulus for the new Operation. The existing predecessor and successor associations for the first and last such instances of Stimulus are transferred to the new Operation. These transformations must be applied as often as possible, so as to affect all the descriptions of the behavior of the refactored method. Indeed, calls to such methods can occur in different scenarios, meaning that the sequence diagrams for all such scenarios must be modified.
latter vertex to the other two, one with the role of specialization, the other a
generalization. In JavaML, a superclass vertex, with a name attribute, constitutes
a leaf of the tree rooted in the class vertex. In such a case, IG would contain only
a CLASS vertex, mapping to a class vertex in AST through µAST, and to a class
vertex in UML through µUML. The definition of the morphisms requires checking
that the superclass relation is consistently represented in the two graphs. A
similar situation occurs for the implements construct.
As concerns behavioral aspects, the method vertex in JavaML contains all the information present in the code to characterize the method, in particular its signature and its body. However, in the UML metamodel, this information is distributed across an Operation vertex, maintaining information about the signature, and a Method vertex, which simply contains the code of the method body. As regards the signature, similarly to the above, we relate Method and Operation vertices and check the agreement of the type information, without associating the type subvertices of method with the Classifier vertices describing those types in UML. This is due to the fact that a type vertex is present in JavaML every time it is necessary, whereas in a UML diagram it needs to be present only once, associated with other vertices an arbitrary number of times.
To model not only the static declaration of a method, but also its behavior through collaboration, sequence, state, or activity diagrams, we resort to action semantics as defined in OMG (2003). Here, a Method is associated with a Procedure, which has a Composition relation with an Action vertex. We put such an Action in correspondence with the stmt-elems vertex, usually a block, which is the root of the subtree for the description of the method vertex. In general, we want to put into relation semantically equivalent elements, so we will consider the different types of Action that can be associated with stmt-elems. A major difference exists, though. The JavaML file presents the stmt-elems of a block in an order which corresponds to the sequence of statements in the original code. The UML model, on the other hand, does not require an order to be specified for independent actions. Control flow actions indeed exist, such as ConditionalAction or LoopAction, and idioms such as Iteration can be expressed. However, actions not related through some chain of DataFlow objects need not be realized in any given order. If desired, though, the modeler can prescribe the existence of ControlFlow objects, defining predecessor and successor Actions.
The process of building such correspondences, i.e., of introducing elements in the interface graph and establishing the morphisms from it to the code and model graphs, can be modeled by rewriting rules. Figure 15 shows two local rules whose distributed application on the code and model graph, respectively, produces the following effect: if there are, both in the code and in the model graph, elements representing a class s which is a superclass of a class c whose representations
in the two graphs have already been put in correspondence, as witnessed by the
identifiers c1 and c1’ for the two instances, then the two representations of class
s are put in correspondence, as witnessed by the generation of the identifiers c2
and c2’.
Well-Formedness Constraints
This kind of behavior preservation is not sufficient to ensure that the resulting
graph is an acceptable code or model graph. Well-formedness constraints are
needed to rule out undesired configurations of the produced graph (instance of
the type graph). For example, we have seen the requirement that no names,
whether of variables or methods, are in conflict in any class.
Refactoring-specific constraints address the problem of unwanted side
effects. These constraints can be expressed with pre- and/or post-conditions. With
the latter, if the post-condition is not met, the transformation must be “undone”
and the previous model restored. With the former (more efficient) method,
application conditions are checked to prevent the application of a refactoring
if it would produce unwanted effects. For example, a new method m defined in class
C should not override an existing method m with the same signature in a subclass
of C, or be overridden by an existing method with the same signature defined in
a superclass of C. This constraint is needed, for example, in both sample
refactorings presented in the section, Refactoring by Graph Transformation.
Not all constraints can be expressed by simple “forbidden” graphs. More general
constraints can be defined by using propositional logic (Matz, 2002) to compose
“atomic” constraints, formed by simple forbidden graphs, and injective graph
morphisms describing the conditional existence of graph (sub)structures (Koch
& Parisi Presicce, 2002).
For example, to express the fact that no method of arity one is allowed to have
the same name and a parameter of the same type as another method in the same
class, we can write the formula NOT two_methods_with_same_signature where
the constraint graph is presented in Figure 16. This formula is satisfied only if
the model graph does not contain two methods named mnew, each having exactly one
parameter of the same type.
[Figure 16: the constraint graph two_methods_with_same_signature — a Class vertex (name = target) owning two Method vertices named mnew, each with a single Parameter (kind = #in) of the same Classifier type.]
Consistent Refactorings
Conclusions
We have presented a graph transformation-based approach to maintaining
consistency between code and model diagrams in the presence of refactorings.
The approach allows the coordinated transformation of two graphs representing
the abstract syntax, as derived from the code by a parser, and the UML model
of the software system. A correspondence is established between these two
graphs, starting from the correspondence between types of vertices in the
abstract syntax trees, as defined by the JavaML markup language, and types of
elements and associations in the UML diagrams, as defined by the UML meta-
model.
Although the approach has been demonstrated using Java and the JavaML
coding of its abstract syntax, it can be applied to any type of abstract syntax for
object-oriented languages, provided that a non-ambiguous correspondence
between the abstract syntax and the UML model components can be estab-
lished. As a consequence, an integrated tool which is able to perform refactoring
on code and model diagrams while maintaining the original correspondences
between these components is conceivable. This would require integrating the
ability of modern refactoring tools to manipulate ASTs with a more general
interpreter for transformation units. Indeed, the tool need not exploit graph
transformations in order to manipulate the tree. As all refactorings are
individually described by a transformation unit, and a tool has a finite number
of them available, it is sufficient that the tree transformation is wrapped so that
the parameters can be communicated to the other parts of a distributed
transformation. If the transformation occurs on a part of the code for which the
corresponding parts of the model have been identified, the relevant modifications
would automatically be performed.
The opposite process could also be envisaged, in which a refactoring of the model
is reflected in a modification of the corresponding code. This can be easily
performed on structural diagrams, for which we have seen that there is a close
correspondence between elements of JavaML and of the UML meta-model.
Future work will have to identify refactorings in the behavioral diagrams for
which it is possible to identify the needed transformations in the code.
Acknowledgments
Partially supported by the EC under Research and Training Network SeGraVis.
References
Badros, G. (2000). JavaML: A Markup Language for Java Source Code. 9th
Int. World Wide Web Conference. Retrieved from the WWW:
http://www.cs.washington.edu/homes/gjb/JavaML
Bottoni, P., Parisi Presicce, F. & Taentzer, G. (2003). Specifying Integrated
Refactoring with Distributed Graph Transformations. In J. L. Pfaltz, M.
Nagl, & B. Böhlen (Eds.), Applications of Graph Transformations with
Industrial Relevance. Second International Workshop, AGTIVE 2003,
LNCS 3062, Springer, pp. 220-235.
Bottoni, P., Schürr, A. & Taentzer, G. (2000). Efficient Parsing of Visual
Languages based on Critical Pair Analysis (and Contextual Layered Graph
Transformation). Proceedings of VL 2000 (pp. 59-61).
Corradini, A., Montanari, U., Rossi, F., Ehrig, H., Heckel, R. & Löwe, M. (1997).
Algebraic approaches to graph transformation part I: Basic concepts and
double pushout approach. In G. Rozenberg (Ed.), Handbook of Graph
Grammars and Computing by Graph Transformation, Vol. 1. World
Scientific, 163-246.
Fischer, I., Koch, M., Taentzer, G. & Volle, V. (1999). Visual Design of
Distributed Systems by Graph Transformation. In H. Ehrig, H.J. Kreowski,
U. Montanari & G. Rozenberg (Eds.), Handbook of Graph Grammars
and Graph Transformation, (Vol. 3, pp. 269-340).
Fowler, M. (1999). Refactoring: Improving the Design of Existing Code.
New York: Addison-Wesley.
Heckel, R. & Wagner, A. (1995). Ensuring Consistency in Conditional Graph
Grammars: A Constructive Approach. Proceedings of SEGRAGRA’95,
ENTCS, Vol. 2. Retrieved from the WWW:
http://www.elsevier.nl/locate/entcs/volume2.html
Koch, M. & Parisi Presicce, F. (2002). Describing policies with graph con-
straints and rules. In A. Corradini, H. Ehrig, H.J. Kreowski & G. Rozenberg
(Eds.), Proc. ICGT 2002, LNCS 2505, Springer, 223-238.
Kreowski, H.J., Kuske, S. & Schürr, A. (1997). Nested graph transformation
units. International Journal on Software Engineering and Knowledge
Engineering, 7(4), 479-502.
Matz, M. (2002). Design and Implementation of a Consistency Checking
Algorithm for Attributed Graph Transformation. (In German.) Diploma
Thesis, Technical University of Berlin.
Section II
Elaboration of Transformation Approaches
Chapter VI
Declarative Transformation for Object-Oriented Models
Keith Duddy, CRC for Enterprise Distributed Systems Technology
(DSTC), Queensland, Australia
Abstract
Introduction
In Model-Driven Architecture - A Technical Perspective (2001), the Object
Management Group (OMG) describes an approach to enterprise-distributed
system development that separates the specification of system functionality
from the specification of the implementation of that functionality on a specific
technology platform. The MDA approach envisions mappings from Plat-
form Independent Models (PIMs) to one or more Platform Specific Models
(PSMs).
The potential benefits of such an approach are obvious: support for system
evolution, high-level models that truly represent and document the implemented
system, support for integration and interoperability, and the ability to migrate to
new platforms and technologies as they become available.
While technologies such as the Meta Object Facility (MOF v1.3.1, 2001) and the
Unified Modelling Language (UML, 2001) are well-established foundations on
which to build PIMs and PSMs, there is as yet no well-established foundation
suitable for describing how we take an instance of a PIM and transform it to
produce an instance of a PSM.
In addressing this gap, our focus is on model-to-model transformations and not
on model-to-text transformations. The latter come into play when taking a final
PSM model and using it to produce, for example, Java code or SQL statements.
We believe that there are sufficient particular requirements and properties of a
model-to-text transformation, such as templating and boilerplating, that a
specialised technology can be used. One such technology is Anti-Yacc (Hearnden
& Raymond, 2002) and we deal briefly with such concrete syntax issues later in
the chapter.
This chapter focuses on a particular program transformation language, designed
specifically for use with object-oriented models and programming languages.
We provide an overview of the general problem of software model transforma-
tion and survey some technologies that address this space. The technology we
then describe is designed to satisfy a set of identified requirements and is
• A program may traverse the model using CORBA or Java interfaces, and
populate another model in a different repository.
• Partial transformations of data may be described in the CWM.
In developing our response to the QVT RFP, the authors considered a number
of alternative approaches (Gerber, Lawley, Raymond, Steel & Wood, 2002). The
results, along with a review of other submissions to the QVT RFP, are
summarised below.
Chapter 13 of the OMG’s Common Warehouse Metamodel Specification (2001)
defines a model for describing transformations. It supports the concepts of both
black-box and white-box transformations. Black-box transformations only asso-
ciate source and target elements without describing how one is obtained from the
other. White-box transformations, however, describe fine-grained links between
source and target elements via the Transformation element’s association to a
ProcedureExpression. Unfortunately, because it is a generic model and re-
uses concepts from UML, a ProcedureExpression can be expressed in any
language capable of taking the source element and producing the target element.
Thus CWM offers no actual mechanism for implementing transformations,
merely a model for describing the existence of specific mappings for specific
model instances.
Varró and Gyapay (2000) and Varró, Varró and Pataricza (2002) describe a
system for model transformation based on Graph Transformations (Andries et
al., 1999). In their approach, a transformation consists of a set of rules combined
[Figure: fragments of the example source and target metamodels — an object-oriented NamedElt with name : String and Classifier with isAbstract : boolean, connected by type, key, and sub associations, and a relational Named with name : String.]
A detailed list of these requirements is presented in our response (DSTC, IBM & CBOP, 2003)
to the OMG’s QVT RFP.
The major functional requirements are as follows. A model-transformation
language must be able to:
• match elements, and ad-hoc tuples of elements, by type (include instances
of sub-types) and precise-type (exclude instances of sub-types);
• filter the set of matched elements or tuples based on associations, attribute
values, and other context;
• match collections of elements, not just individual elements. For
example, we may need to count the number of Attributes a Class has;
• establish named relationships between source and target model elements.
These relationships can then be used for maintaining traceability informa-
tion;
• specify ordering constraints (of ordered multi-valued attributes or ordered
association links), either when matching source elements or producing
target elements;
• handle recursive structure with arbitrary levels of nesting. For example, to
deal with the subclassing association in our example Class model;
• match and create elements at different meta-levels;
• support both multiple source extents and multiple target extents.
Our experiences have shown that there are three fairly common styles to
structuring a large or complex transformation, reflecting the nature of the
transformation. They are:
• Source-driven, in which each transformation rule is a simple pattern (often
selecting a single instance of a class or association link). The matched
element(s) are transformed to some larger set of target elements. This style
is often used in high-level to low-level transformations (e.g., compilations)
and tends to favour a traversal style of transformation specification. This
works well when the source instance is tree-like, but is less suited to graph-
like sources;
• Target-driven, in which each transformation rule is a complex pattern of
source elements (involving some highly constrained selection of various
classes and association links). The matched elements are transformed to a
simple target pattern (often consisting of a single element). This style is
often used for reverse-engineering (low-level to high-level) or for perform-
ing optimizations (e.g., replacing a large set of very similar elements with
a common generic element);
• Aspect-driven, in which the transformation rule is not structured around
objects and links in either the source or target, but more typically around
semantic concepts, e.g., transforming all imperial measurements to metric
ones, replacing one naming system with another, or the various parts of the
object-relational transformation described above.
A Declarative Object-Oriented
Transformation Language
We describe a declarative object-oriented transformation environment that
satisfies the requirements described in the previous section. We present both a
formal model for transformations and a concrete syntax and illustrate the
transformation language through a series of simple examples.
This section presents the transformation language that we have designed to
address the problems faced when realising the MDA, by illustrating how the
language would be used to solve the object-relational mapping problem at hand.
A transformation in our language consists of the following major concepts:
transformation rules, tracking relationships, and pattern definitions.
• Transformation rules are used to describe the things that should exist in
a target extent based on the things that are matched in a source extent.
Transformation rules can be extended, allowing for modular and incremen-
tal description of transformations. More powerfully, a transformation rule
may also supersede another transformation rule. This allows for general-
case rules to be written, and then special-cases dealt with via superseding
rules. For example, one might write a naive transformation rule initially,
then supersede it with a more sophisticated rule that can only be applied
under certain circumstances. Superseding is not only ideal for rule optimi-
zation and rule parameterization, but also enhances reusability since
general purpose rules can be tailored after-the-fact without having to
modify them directly.
• Tracking relationships are used to associate a target element with the
source elements that lead to its creation. Since a tracking relationship is
generally established by several separate rules, other rules are able to
match elements based on the tracking relationship independently of which
rules were applied or how the target elements were created. This allows
one set of rules to define what constitutes a particular relationship, while
another set depends only on the existence of the relationship without
needing to know how it was defined. This kind of rule decoupling is essential
for rule reuse via extending and superseding to be useful.
Establishing and maintaining Tracking relationships is also essential for
supporting round-trip development and the incremental propagation of
source-model updates through the transformation to the target model(s).
• Pattern definitions are used to label common structures that may be
repeated throughout a transformation. A pattern definition has a name, a set
of parameter variables, a set of local variables, and a term. Parameter
variables can also be thought of as formal by-reference parameters.
Pattern definitions are used to name a query or pattern-match defined by
the term. The result of applying a pattern definition via a pattern use is a
collection of bindings for the pattern definition’s parameter variables.
[Figure: fragment of the transformation language metamodel — classes such as Scope and MofFeature, with associations including scope, representation, compoundTerm, expr, use, tgt, extraction, and feature.]
The first key element of our transformation is that a Class will be transformed
into a Table with an object-id Column, so this becomes our first rule. We also
want to make sure that we preserve a tracking relationship between the table we
create and the class from which we create it. The next major mapping, from an
Attribute to a Column, is similar, as is the rule for DataTypes. As such, we start
with the following simple rules:
RULE class2table
FORALL Class Cls
MAKE Table Tbl, Column idCol,
idCol.name="id", idCol.owner=Tbl
LINKING Cls to Tbl by c2t;
RULE attr2col
FORALL Attribute Att
MAKE Column Col
LINKING Att to Col by a2c;
Both Class and Attribute are subtypes of NamedElt, and we want their names
to be mapped to the names of their corresponding Tables and Columns. We can
make sure we have the right Class-Table or Attribute-Column pair by looking up
the tracking relationships we established earlier. We can then write a rule from
an OO NamedElt to a Relational Named like this:
RULE named2named
FORALL NamedElt n1
WHERE c2t LINKS n1 to n2
OR a2c LINKS n1 to n2
MAKE Named n2,
n2.name = n1.name;
We see here that trackings can be used to tie rules together, thus giving us the
ability to express rules as fine-grained mappings rather than having to write
complex, coarse-grained rules.
However, further inspection of our class diagram reveals that DataType names
must also be mapped. Rather than adding another OR clause to our rule, we
introduce generalization to our tracking relationships. So, we make another
tracking relationship that stands as a superset of the two we have already used,
and look up the parent tracking rather than alternating over the children, like so:
RULE named2named
FORALL NamedElt n1
WHERE named2named LINKS n1 to n2
MAKE Named n2, n2.name=n1.name;
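The chapter does not show the concrete syntax for declaring that c2t and a2c are subsets of named2named. One plausible form, in which the TRACKING and UNDER keywords are purely our assumption, would be:

TRACKING c2t UNDER named2named;
TRACKING a2c UNDER named2named;

Under such declarations, every link established by c2t or a2c (and, later, by dt2t) would also be visible through the parent tracking named2named.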
Next, we need to make sure that the column resulting from the transformation
of an attribute will be contained by the appropriate table, i.e., the table resulting
from the transformation of the attribute’s containing class. We do this by again
looking up the tracking relationships established in our earlier rules. This gives
us the following rule:
RULE clsAttr2tblCol
FORALL Attribute Att, Class Cls
WHERE Att.owner = Cls
AND c2t LINKS Cls to Tbl
AND a2c LINKS Att to Col
MAKE Table Tbl, Column Col,
Col.owner = Tbl;
We already have a rule for transforming Attributes. However, we now find that
we wish to transform multi-valued attributes differently. The values of a multi-
valued attribute will be stored in a separate table, with one column for the values
and one column for the Class’s object-id.
This new rule for attributes will need to match a subset of the cases that were
true for the previous rule, and we can reuse the earlier Attribute rule’s matching
pattern by using rule extension. However, we also want to indicate that the
earlier Attribute rule should not run when this new Attribute rule runs, and we
can do this using rule supersession.
So now we have a rule for transforming Attributes to Columns, and another for
linking the Column to a Table. However, we find that we want to map multi-
valued attributes differently. The Column for a multi-valued Attribute should
instead have its own Table, with another Column to link back to the key in the
main Table for the Class. Therefore, we make a new rule that will supersede the
rule that puts Columns in Tables, and link the Attribute’s Column to a new Table
with a new key Column.
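A sketch of the superseding rule, in the concrete syntax used so far (the SUPERSEDES clause and the multiValued attribute are our assumptions — the chapter describes this rule only in prose):

RULE multiAttr2tblCol SUPERSEDES clsAttr2tblCol
FORALL Attribute Att, Class Cls
WHERE Att.owner = Cls
AND Att.multiValued = true
AND c2t LINKS Cls to Tbl
AND a2c LINKS Att to Col
MAKE Table AttTbl, Column keyCol,
Col.owner = AttTbl, keyCol.owner = AttTbl,
keyCol.name = "id";

Because it supersedes clsAttr2tblCol, the general rule no longer fires for multi-valued attributes, whose columns are placed in a separate table together with a key column referring back to the class's table.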
Having created and placed these Columns, we need to give them an appropriate
type. So we need rules for mapping DataTypes to Types, and for assigning the
appropriate Type to a Column. The latter case requires two rules, since an
Attribute whose type is a Class must be given the type used for key values,
while an Attribute whose type is a DataType gets the corresponding Type.
RULE datatype2type
FORALL DataType Dt
MAKE Type T
LINKING Dt to T by dt2t;
RULE atype2ctype
FORALL Attribute Att, DataType Dt
WHERE a2c LINKS Att to Col
AND dt2t LINKS Dt to T
AND Att.type = Dt
MAKE Column Col, Type T,
Col.type = T;
RULE actype2ctype
FORALL Attribute Att, Class C
WHERE Att.type = C
AND a2c LINKS Att to Col
MAKE Column Col, Type T,
Col.type = T, T.name = "String";
To collect both the direct and the inherited attributes of a class, we define a recursive pattern:

PATTERN hasAttr(C, A)
FORALL Class C, Attribute A, Class C2
WHERE A.owner = C
OR (C.super = C2 AND hasAttr(C2, A));
Having defined this pattern, we can make a rule for creating a column for each
inherited attribute. To handle the linking of these columns to their tables, we need
to change the Attribute to Column tracking to include Class as a source, by
modifying the earlier rules, attr2col and clsAttr2tblCol. The new rule,
as well as these modified rules, is below:
RULE superattr2col
FORALL Attribute Att, Class Cls
WHERE hasAttr(Cls, Att)
AND c2t LINKS Cls to Tbl
MAKE Table Tbl, Column Col
LINKING Att, Cls to Col by a2c;
RULE attr2col
FORALL Attribute Att, Class C
WHERE Att.owner = C
MAKE Column Col
LINKING Att, C to Col by a2c;
RULE clsAttr2tblCol
FORALL Attribute Att, Class Cls
WHERE c2t LINKS Cls to Tbl
AND a2c LINKS Att, Cls to Col
MAKE Table Tbl, Column Col,
Col.owner = Tbl;
Advanced Transformations
As with expert systems, a substantial transformation embodies a significant
investment in capturing domain knowledge and, therefore, the careful organisation
and structuring of the transformation will aid its long-term maintenance and
evolution.
Several features of the transformation language described in this chapter are key
to supporting both re-use and maintenance of transformation definitions. These
features are the supersedes and extends relationships, and dynamic typing of
variables.
Duddy, Gerber, Lawley, Raymond & Steel (2003) describe in detail how
superseding and extending can be used in the context of a transformation to an
Entity Java Bean (EJB) model. Specifically, they show how a mapping that
results in remote access to EntityBeans can be modified to instead employ the
Session Façade pattern (Brown, 2001) using a SessionBean that delegates
methods to local EntityBeans. One could also use an extra source model as a
parameterisation, or marking model, to provide finer grain control over which
rules are applied to which source model elements.
Dynamic typing simplifies rule writing: if an object bound to a variable does not
have the attribute or reference mentioned in a term, then the term simply
evaluates to false rather than requiring explicit runtime-type introspection and
typically addressing issues such as horizontal and vertical white space, and
delimiters.
Finally, it would often be very useful to be able to visualise or edit a graphical
representation of the models being transformed. However, since much of the
time their metamodels may be purpose-designed and therefore have no standard
graphical representation (let alone a tool to display/edit the model), it would be
extremely useful to be able to generate such a tool in a manner analogous to the
HUTN approach. That is, to employ a set of standard, simple visual concepts
(box, line, label, containment, proximity, etc.) to render a given model. Such a tool
is currently under development by the authors.
Conclusions
In this chapter we have introduced the problem of model-to-model transforma-
tion for the purpose of building distributed systems from high-level models
describing the system to be built in platform-independent terms and then generating
the system implementation for a particular, technology-specific platform. This
is the vision embodied in the OMG’s Model Driven Architecture (MDA).
We have described the functional and non-functional design requirements
identified for a language suitable for writing transformation definitions and
presented a language satisfying these requirements along with examples of a
usable, familiar concrete syntax for the language. We have also briefly touched
on issues relating to advanced transformations and mentioned a number of
additional technologies required for dealing with textual and graphical forms of
the models.
It should be noted that the transformation language presented here is evolving as
we gain further experience and as a result of the OMG’s RFP process. In
particular, influenced by the Compuware/Sun submission, we are extending the
concept of Trackings to more closely approximate a class model. Also, compo-
sition of transformations is essential for the use and extension of existing
transformations. While there is no explicit mention of this in the language
presented here, the ability to reference elements in one MOF model from another
MOF model should be sufficient for simple composition of transformations.
However, more sophisticated forms of composition, such as producing a
transformation that maps A to C from one that maps A to B and one that maps
B to C, or producing a transformation that merges A and B to produce C from the
A to B and B to C transformations, are the subject of future research.
Additionally, the transformations discussed in this chapter have generally dealt
with transformation in a single direction, from model A to model B. Another use
References
A Human-Usable Textual Notation for the UML Profile for EDOC: Request
for Proposal. (1999). OMG Document ad/99-03-12.
Alcatel, Softeam, Thales & TNI-Valiosys. (2003). Response to the MOF 2.0
Queries/Views/Transformations RFP. OMG document ad/03-03-35.
Andries et al. (1999). Graph transformation for specification and programming.
Science of Computer Programming, 34(1), 1-54.
Boldsoft, Rational Software Corporation, IONA & Adaptive Ltd. (2003).
Response to the UML 2.0 OCL RFP. OMG Document ad/2003-01-02.
Brown, K. (2001). Rules and Patterns for Session Facades. IBM’s WebSphere
Developer Domain. Retrieved from the WWW:
http://www.boulder.ibm.com/wsdd/library/techarticles/0106_brown/sessionfacades.html
Codagen Technologies Corporation. (2003). MOF 2.0 Query/Views/Transfor-
mations. OMG Document ad/2003-03-23.
Common Warehouse Metamodel (CWM) Specification. (2001). OMG Docu-
ments ad/01-02-01, ad/01-02-02, ad/01-02-03.
Compuware Corporation & Sun Microsystems. (2003). XMOF Queries Views
and Transformations on Models using MOF, OCL and Patterns. OMG
Document ad/2003-03-24.
DSTC, IBM, & CBOP. (2003). MOF Query/Views/Transformations Initial
Submission. OMG Document ad/2003-02-03.
Gerber, A., Lawley, M., Raymond, K., Steel, J. & Wood, A. (2002). Transfor-
mation: The Missing Link of MDA. Proceedings of the First Interna-
tional Conference on Graph Transformation (ICGT’02), Barcelona,
Spain (pp. 90-105). LNCS 2505.
Hearnden, D. & Raymond, K. (2002). Anti-Yacc: MOF-to-text. Proceedings
of the Sixth IEEE International Enterprise Distributed Object Comput-
ing Conference, Lausanne, Switzerland. IEEE.
Human-Usable Textual Notation (HUTN) Specification. (2002). OMG Docu-
ment ptc/02-12-01.
Johnson, S. (1974). YACC - Yet Another Compiler Compiler. CSTR 32, Bell
Laboratories.
Meta Object Facility (MOF) v1.3.1. (2001). OMG Document: formal/01-11-02.
Model Driven Architecture - A Technical Perspective. (2001). OMG Docu-
ment ormsc/01-07-01.
MOF 2.0 Queries/Views/Transformations: Request for Proposal. (2002).
OMG Document ad/02-04-10.
Peltier, M., Bézivin, J. & Guillaume, G. (2001). MTRANS: A general frame-
work, based on XSLT, for model transformations. Proceedings of the
Workshop on Transformations in UML, Genova, Italy.
Peltier, M., Ziserman, F. & Bézivin, J. (2000). On levels of model transforma-
tion. In XML Europe 2000, Paris.
Unified Modelling Language v1.4. (2001). OMG Document: formal/01-09-67.
Varró, D. & Gyapay, S. (2000). Automatic Algorithm Generation for Visual
Control Structures. Retrieved February 8, 2001 from the WWW:
http://www.inf.mit.bme.hu/FTSRG/Publications/TR-12-2000.pdf
Varró, D., Varró, G. & Pataricza, A. (2002). Designing the Automatic
Transformation of Visual Languages. Science of Computer Program-
ming, 44(2), 205-227.
XMI Production of XML Schema. (2001). OMG Document ptc/2001-12-03.
XML Metadata Interchange (XMI) Version 1.2. (2002). OMG Document
formal/2002-01-02.
XSL Transformations (XSLT) v1.0. (1999). W3C Recommendation. Retrieved
from the WWW: http://www.w3.org/TR/xslt
Endnote
* Jim Steel is now at INRIA/Irisa, University of Rennes 1, France.
Chapter VII
From Conceptual Models to Data Models
Antonio Badia, University of Louisville, USA
Abstract
This chapter describes transformations between conceptual models (mainly
entity-relationship diagrams and also UML) and data models. It describes
algorithms to transform a given conceptual model into a data model for a
relational, object-relational, object-oriented and XML database. Some
examples are used to illustrate the transformations. While some
transformations are well known, some (like the transformation into XML or
into object-relational schemas) have not been investigated in depth. The
chapter shows that most of these transformations offer options which
involve important trade-offs that database designers should be aware of.
Introduction
Conceptual models aim at capturing the structure of reality; they are high-level
and computer-independent. Data models, on the other hand, aim at representing
reality in the computer, and are therefore less abstract. It is assumed that, in
creating an Information System, a conceptual model will be developed as part of
the Requirements Specification, from which a data model will be derived later on,
in the Design phase (Davis, 1993). Thus, mappings between conceptual models
and data models are one of the most vital transformations in the development of
an Information System (Elmasri and Navathe, 2003). The purpose of this chapter
is to present transformations between conceptual models and data models.
Transformations between well known and used conceptual models (Entity-
Relationship diagrams and UML Class Diagrams) and the most common and
important data models (relational and object-oriented) have been developed and
are well understood. However, new data models like XML and the Object-
Relational data model are not included in these mappings. Translation into XML
has lately been the focus of some research, but this is a relatively new area and not much
work has been done yet. Translation into Object-Relational databases is a
virtually unexplored topic, perhaps because it is felt that existing mappings into
the (pure) relational model are easy to extend to this case. However, Object-
Relational databases provide options to the designer that are not available in the
relational case. Therefore, some guidance is needed for the choices that appear
in the mapping process.
In this chapter, we review existing mappings and extend them to include these
new data models. We start with a review of the basic concepts, to establish some
vocabulary and make the chapter self-contained, followed by a description of
recent work in the area, including new and existing translations. One of the
purposes of the chapter is to put all of this research in a wider perspective and
examine the different approaches, something that is missing from the current
literature.
Background
For lack of space, we do not discuss conceptual or data models in depth; we
assume the reader is familiar with the basic ideas. However, we review some
basic concepts to establish a vocabulary.
Conceptual Models
[Figure 1: example E-R diagram — entity types Employee, Department, and Project; relationships annotated with cardinality constraints such as (1,1) and (1,M); attributes including street and pname.]

In an E-R diagram, entity types are depicted as rectangles with a name inside,
attributes as ovals, and relationships as lines with a diamond shape on them.
Entity types represent things either real or conceptual. They denote sets of
objects, not particular objects; in this respect they are close to classes in object-
oriented models. The set of objects modeled by an entity type are called its
extension; particular objects are called entities.
Relationships are connections among entity types. Relationships may involve any
number of entity types; those involving two entity types (called binary relation-
ships) are the most common. However, n-ary relationships (involving n > 2 entity
types) are also possible (in particular, relationships relating one entity type to
itself are allowed). Relationships are fundamental in an E-R model in that they
carry very important information in the form of constraints: the participation
constraint tells us whether all objects in the extension of an entity type are
involved in the relationship, or whether some may not be. For example,
Department and Employee have a relationship Works-for between them. If
all employees work for some department, then participation of Employee in
Works-for is total. However, if there can be employees which are not assigned
to a particular department, then participation is partial. The cardinality constraint
tells us how many times an object in the entity type’s extension may be involved
in a relationship, and allows us to classify binary relationship as one-to-one,
one-to-many or many-to-many. There are several notations to state con-
straints in an E-R diagram. The one chosen here associates with each entity type
E and relationship R a pair of numbers (min,max), where min represents the
minimum number of times an entity in E appears in R (thus, min represents the
participation constraint by being 0 for partial and 1 for total), and max represents
the maximum number of times an entity in E appears in R (thus, max represents
the cardinality constraint by being 1 for one-to relationships and greater than 1
for many-to relationships; the latter case is traditionally represented by using the
letters M or N). Thus, the (1,1) by Employee and Works-For indicates that all
employees work for exactly one department; the (0,M) by Department and
Manages indicates that not all departments manage projects, but those that do
may manage more than one.
Entity types and relationships may have attributes, which are properties with a
value. Attributes convey characteristics or descriptive information about the
entity type or relationship to which they belong. Attributes may be simple or
composite (made up of simpler parts, like the attribute Address of entity
Department in the example, which is made up of parts named street, city
and zip), single or multivalued (being capable of having one or several values
for a particular entity; multivalued attributes are displayed as dual ovals, like
locations in our example, meaning that some department may have multiple
locations), primitive or derived (a derived attribute value is computable from
other information in the model). A key attribute is an attribute whose value is
guaranteed to exist and be different for each entity in the entity type. Therefore,
this attribute (primary key) is enough to point out a particular entity. All entity
types are assumed to have at least one key attribute.
A contentious issue is whether attributes are required (i.e., every entity of the
type must have a value for each attribute of the type) or optional (i.e., some
entities of the type may or may not have values for some attributes). Different
authors take different views on this issue, some even arguing that it is a mistake
to consider attributes optional (Bodart et al., 2001). Since this has an impact
when transforming E-R models into different data models, we will point out how
to deal with each view.
Some E-R models admit weak entities, entities with no key attributes; these
entities are connected by a one-to-many relationship to a regular entity, called the
strong entity. What characterizes a weak entity is its lack of clear identity
(reflected in the lack of a key) and its dependence for existence on the strong
entity. As a typical example, entity Employee has an associated weak entity
Dependent (shown as a double box). Clearly, an employee may be associated
with several dependents (hence the one-to-many relationship), and if an em-
ployee is deleted from the model (say the employee is fired), then the associated
dependents also go away.
Many proposals for additional features have been made over the years. The most
successful one is the addition of class hierarchies, by introducing IS-A (class/
subclass) relations between entities. This addition, obviously motivated by the
success of object-oriented methods for analysis, allows the designer to recognize
commonalities among entities; usually this means shared attributes exist. Shared
attributes are removed and put together in a new entity (class) that is a
generalization of the others, and a class-subclass relationship is created. As in
object-oriented approaches, inheritance of attributes is assumed. In Figure 1,
entity type Employee has two subtypes, Hourly-employee and Salaried-
Employee. The IS-A relationship is indicated by a downward triangle in the line
joining the involved entity types. The IS-A relationship can be annotated to
distinguish several situations: whether the subclasses are disjoint or not; and
whether the subclasses together cover the superclass (i.e., every entity of the
superclass must also belong to one of the subclasses) or not. Note that both
dimensions are orthogonal to each other; hence, two annotations are needed to
determine the exact situation.
The Unified Modeling Language (UML) (Rumbaugh et al., 1999) is the
conceptual model for object-oriented design. In its current incarnation, it is a
large, complex model made up of several different parts. Here we concentrate
on static diagrams, which are the diagrams explaining the structure of informa-
tion in the system and its interconnections. The previous E-R diagram is given
in Figure 2 as a UML diagram for illustration purposes.
A static or class diagram has many similarities to an E-R diagram. The world
is made up of classes, which are the equivalent of entity types. As in E-R models,
classes are defined based on their attributes. Composite attributes can be
modeled by a structured domain, a domain with an internal structure (as an
example, see address in Department). In theory, a multivalued attribute
requires its own separate class; however, modern versions of UML allow
specifying a cardinality constraint on the link between attribute and class. This
seemingly minor point is very important since it gives the model the flexibility to
deal with optional attributes (by setting minimum cardinality to 0) and multivalued
attributes in a uniform framework, similar to XML (however, this is far from an
accepted norm; see next subsection). It is usual in class diagrams to give a data
type for attributes, and it is possible to give them a default value. In UML,
classes also have methods, procedural attributes which describe the behavior of
the objects of the class. Methods, though, cannot be completely specified in the
conceptual model because they are not declarative, but procedural (i.e., code).
In a class diagram, classes are depicted as boxes divided into three parts: the top
one contains the class name; the bottom one, methods; and the middle one,
attributes.
Relationships are also present in UML, although there they are called associa-
tions. Associations are displayed as lines connecting the involved classes.
Associations can have attributes (displayed on a box related to the association
by a dashed line). However, unlike relationships, associations cannot involve
more than two classes. Therefore, if a relationship relates more than two classes,
it must be reified into a class (which is then connected by binary associations to
the related classes; see Review in Figure 2); the resulting class is sometimes
called an association class.

[Figure 2: the example model as a UML class diagram — classes Department, Employee, Hourly Employee, Salaried Employee, and Project; the reified association class Reviews; multiplicities such as 1..1 and 1..N; attributes including ssn, name, dname, address:{street, city, zip}, pname, hourly rate, salary, and budget.]

Associations can have roles and cardinality and
participation constraints, though. Two special kinds of association are aggrega-
tion and composition. Their semantics are somewhat vague; aggregation is
supposed to represent a part-whole relationship, while composition is supposed
to be somewhat stronger than aggregation, in that an object contains another one
as a part. The main difference seems to be that objects that stand in a composition
association have a lifetime dependence: when an object that contains another is
destroyed, the contained objects must be destroyed, too (in some versions,
composition may involve an arbitrary number of objects). Thus, what we called
weak entities in the E-R model should be captured as a class associated by
composition with another class. However, the semantics of UML are informal
and fail to capture the special meaning that a part-whole relationship has
(Motschnig-Pitrik et al., 1999). Aggregations are displayed with a hollow
diamond, and compositions with a filled diamond. Another special association is
the one between class and subclass, called a generalization. This is displayed
as a hollow triangular arrowhead pointing at the superclass.
Data Models
collection type. Atomic objects are not objects without internal structure; they
correspond to atomic or structured literals. For each object, properties (i.e.,
attributes and relationships) and operations are specified.
The Object-Relational (also called extended-relational, or universal) data
model is exemplified by the latest version of SQL, SQL-99 (Melton, 2003). It can
be seen as an attempt to capture as many as possible of the object-oriented
concepts introduced in the previous subsection and wrap them around a
relational shell. A more modest view of it regards the model as extending the base
of the relational model (instead of the model itself) by making it easier to add
more complex data types to serve as domain definitions. Here we will describe
the basics of the standard (Melton, 2003), since each commercial DBMS has its
own version of the model, with different names and different syntax.
The basic idea is to substitute domains by (possibly complex) types, called User
Defined Types (UDTs). The name comes from the fact that the model provides
constructors so that users can define their own types as needed by an application;
the emphasis is on extensibility. UDTs come in two kinds, distinct types and
structured types. Distinct types are based on a single built-in data type. Distinct
types are not compatible with any other type (including the one they are based
on). The following is an example of an UDT called age based on the built-in type
integer:
CREATE TYPE age AS INTEGER (CHECK age BETWEEN 0 and 100) FINAL;
FINAL refers to whether type can be extended with subtypes: distinct types
cannot, structured ones can. Optionally, operations and comparisons can be
defined for types. Structured types can have internal structure, with parts called
attributes. Attributes do not have to be built-in; they may be complex (SQL99
offers “built-in” structured types ARRAY and ROW), but cannot be of the type
being defined (i.e., recursion is not allowed). Structured types do not have
identity. One cannot specify constraints on the attributes of a structured type.
It is possible to define a hierarchy of types by defining a new type UNDER
another; (single) inheritance applies. Types may be NOT INSTANTIABLE
(cannot have values of that type; this is used for abstract superclasses).
Structured types can be used as columns of tables, and also as tuples of tables.
A typed table is a table whose rows are of a structured type. The attributes of
the type become attributes of the table. Typed tables also have inheritance
hierarchies, corresponding to the hierarchies of the types. Subtables cannot have
primary keys, but supertables can. They can also have UNIQUE, NOT NULL,
CHECK and integrity constraints. There is also a self-reference value created
automatically, unique for each row of the table. SQL99 provides REFERENCE
types, which give structured types an id. Maximal supertables must specify the
self-referencing column (it is inherited by subtables, which cannot specify such
a column on their own). References can be generated by system, by using some
built-in type, or by using some attributes of the type.
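As a minimal sketch of these constructs (the type, table, and column names are ours, and vendor syntax differs in details):

CREATE TYPE emp_t AS (
    ssn  CHAR(9),
    name VARCHAR(40)
) NOT FINAL;

CREATE TABLE Employee OF emp_t
    (REF IS emp_oid SYSTEM GENERATED);

Here Employee is a typed table whose rows are values of the structured type emp_t, and emp_oid is its system-generated self-referencing column, which REF-typed attributes elsewhere can point to.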
Semistructured data is assumed not to have a strict type, but to possess an
irregular, partial organization (Abiteboul et al., 1999). Because, in addition, the
data may evolve rapidly, the schema for such data is usually large, dynamic, and
is not strictly respected. Data models have been proposed that try to adapt to
those characteristics. Several of them have in common that they can be thought
of as directed, labeled graphs (trees, in some cases) (Papakonstantinou et al.,
1995). In this model, nodes are objects, and edges point to other objects which
are components of the given object. However, this model, while very general,
gives very few constraints as to what the data looks like. In practice, some
information about the components of objects is usually available. Thus, in most
cases the term semistructured data is reserved for data that does have some
(loose) structure. XML (Bray et al., 1999) provides the tools to describe such
information, either through a DTD or an XML Schema (note, however, that XML
has additional properties over semistructured data — order, attributes — while
semistructured data does not assume the existence of any schema. Hence, both
terms are, strictly speaking, different).
Basically, in XML an object may have attributes and elements. Simple parts of
an object can be represented by attributes or elements, while complex parts are
represented by elements, since they in turn are also objects. Elements are
defined by a pair of matching start and end tags. Elements may have embedded
subelements, indicated by nested pairs of tags (empty elements are also
allowed). A root element may be declared which is not a subelement of any other
element. Subelements have associated information to indicate a cardinality
constraint, in the form of a pair of attributes minOccurs and maxOccurs: none
or several occurrences (a Kleene star, indicated by ‘*’ in DTDs) is represented
by setting minOccurs to 0 and maxOccurs to ‘’unbounded’’; none or one
occurrences (indicated by ‘?’ in DTDs) is represented by setting minOccurs to
0 and maxOccurs to 1; one or several occurrences (indicated by ‘+’ in DTDs)
is represented by setting minOccurs to 1 and maxOccurs to ‘’un-
bounded’’. Finally, exactly one occurrence (no mark in DTDs) is represented
by setting minOccurs and maxOccurs to 1. There is also a “choice” construct
(indicated by ‘|’ in DTDs) which represents union of types.
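For instance, reusing the running example (the element names and types here are our own), a department with a name and one or more locations could be declared in XML Schema as follows; the location declaration corresponds to the ‘+’ mark of a DTD:

<xs:element name="department">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="dname" type="xs:string"/>
      <xs:element name="location" type="xs:string"
                  minOccurs="1" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>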
Attributes in XML are associated with types, not with elements, and are
expressed as an equation attribute name = value within the starting tag of the
corresponding element; the names must be unique within an element. Attributes
can be mandatory (expressed with the keyword #REQUIRED) or optional
(expressed with the keyword #IMPLIED). Keywords ID and IDREF are used to
establish links among elements, denoting the pointed at and pointing elements in
the link, respectively. Atomic elements and attributes are simply of #PCDATA
or CDATA type (this is a string, but is used to represent any atomic type: integer,
...). Attributes are only for complex types; for simple types, there are facets
(restrictions on simple types).
Since in XML, unlike HTML, tags are not pre-defined, one must have an idea of
what tags one may find in a given document. A Document Type Definition
(DTD) or an XML schema can be used to give information about the components
of an object. An XML document is well-formed when it respects the general
XML syntax (nested, balanced tags), and is valid (with respect to a given XML
schema or DTD) if it respects the tag definitions of that XML schema or DTD.
DTDs do not distinguish between entity types and relationships; relationships are
implicitly expressed through the element-subelement connection. Because rela-
tionships are attached to objects, XML DTDs are much better at handling
hierarchical relations than many-to-many relations (or n-ary relationships (n > 2)
or relationships with attributes), and indeed this is the way most DTDs are used
(Sahuguet, 2001).
XML schemas generalize DTDs. An XML schema has XML syntax (start/end
tags), and enforces type checking and references. It classifies elements as
simple type (integers, strings) or complex type (regular expressions, like in
DTDs). For a complex type, it defines the structure. An XML schema also
introduces the notion of Namespace, a collection of tags and their definitions.
There is a notion of key and foreign key. In XML Schema, a key for a complex
type T is declared by giving it a name, a path selector (that leads from the root
to T) and a field (to say which element(s) within T form the key). Note that a key
can be made up by several elements; also, several alternative keys can be
specified. Foreign keys are defined with a keyref keyword and a reference for
the key being denoted, using also a path and a field selector. Therefore, integrity
constraints can be specified in XML Schema.
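As a sketch, with names of our own choosing, a key on department names and a foreign key from employees to it could be declared (within the enclosing element declaration) as:

<xs:key name="deptKey">
  <xs:selector xpath="department"/>
  <xs:field xpath="dname"/>
</xs:key>
<xs:keyref name="worksForRef" refer="deptKey">
  <xs:selector xpath="employee"/>
  <xs:field xpath="dept"/>
</xs:keyref>

The selector path leads to the constrained elements, and the field names the element (or attribute) whose values form the key.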
One can specify mixed content, in which plain text and tags are intertwined.
There is also an Any type that allows anything. Finally, inheritance can be
modeled with derived types. There are two ways to derive: by extension and by
restriction. In the second, types are derived by giving a base type and restricting
the properties (cardinalities, choices) of the elements.
Translations
We divide the translations according to the target data model, as this creates the most salient differences. We first give an algorithm that uses the E-R model as source, and then point out the differences needed to handle UML.
Employee(ssn, name)
Dependant(ssn, lname, date-of-birth)
Department(dname, address-street, address-city, address-zip)
Dept-Loc(dname, location)
Project(pname, budget)
Works-for(ssn, dname, start-date)
Manages(dname, pname)
Started(dname, pname, date)
Reviews(ssn, pname, dname)
optional), the only way to get rid of nulls is to break up every table into a set of
tables, each one containing the key of the entity and one of its attributes. This
way, entities with no value for a given attribute simply would have no entry in the
appropriate table, but no nulls would be produced. However, this approach
produces a heavily partitioned database, with most queries requiring a large
number of joins. Hence, the issue is usually ignored in the process.
The above algorithm admits some inlining, as follows: when a binary relationship
R exists between entities E1 and E2, and R is one-to-many or one-to-one, it is
possible to express R not by creating a new table, but by modifying the table
corresponding to the entity on the one side (if R is one-to-one, either entity will
do). Say this entity is E1; then the key of E2 is added to the schema of E1, together
with any attributes R may have. Note that if participation of E1 in the relationship
is not total, this will result in some nulls; hence, this procedure must be used only
when participation of E1 is total or nulls can be dealt with adequately. In the
database of our example, we could create, instead of tables Employee and
Works-for, a single table Employee(ssn, name, dname, start-date).
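In SQL terms, the inlined table might look as follows (the column types are assumptions, and hyphens in the original attribute names are replaced by underscores to form legal identifiers):

    CREATE TABLE Employee (
      ssn        CHAR(9)     PRIMARY KEY,
      name       VARCHAR(60),
      dname      VARCHAR(40) REFERENCES Department(dname), -- key of the one side
      start_date DATE                                      -- attribute of Works-for
    );

If participation of Employee in Works-for is not total, dname and start_date must admit nulls.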
The procedure must be extended to deal with IS-A links. If entity type E1 is a subtype of entity type E2, a table must be created for each. However, there are two options as to how to configure the relations:
1. Add to the table for the subtype (E1) all attributes of the supertype (E2). That is, repeat all attributes of the schema of E2 in the schema of E1. This implies that objects that are in subclass E1 will be stored in their own table. Applied to the E-R model of Figure 1, this approach creates tables Employee(ssn, name), Hourly-Employee(ssn, name, hourly-rate), and Salaried-Employee(ssn, name, salary). Note that if the two subclasses are a cover of the superclass, the table for Employee would not be necessary (it would remain empty); but then, queries asking for all employees would have no direct reference. Note also that if there were overlap (although obviously not the case in this example), information about employees could be duplicated.
2. Give as the schema of the superclass (E2) only those attributes that are proper to E2, and as the schema of the subclass (E1) only those attributes that are proper to E1 (plus the key). This implies that objects which are in subclass E1 will be stored in two tables: the table for E2 takes care of the inherited attributes, while the table for E1 holds only those attributes that are proper to E1. Applied to the E-R model of Figure 1, this approach creates tables Employee(ssn, name), Hourly-Employee(ssn, hourly-rate), and Salaried-Employee(ssn, salary). Note that if the two subclasses are a cover, the table for the superclass is still needed. Note also that if the subclasses overlap, there would be no redundancy in this schema.
Which option is better depends entirely on the intended use of the database (i.e., the applications it must support) and on the characteristics of the class/subclass relationship (overlap, cover). Clearly, queries that refer to the whole superclass frequently will be more efficient over the first option, while queries that refer to specific classes more frequently will be more efficient over the second option.
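A minimal SQL sketch of the two options for the employee hierarchy of Figure 1 (the column types are assumptions):

    -- Option 1: each subtype table repeats the supertype attributes.
    CREATE TABLE Hourly_Employee (
      ssn         CHAR(9) PRIMARY KEY,
      name        VARCHAR(60),
      hourly_rate DECIMAL(8,2)
    );

    -- Option 2: subtype tables hold only their proper attributes plus the key.
    CREATE TABLE Employee (
      ssn  CHAR(9) PRIMARY KEY,
      name VARCHAR(60)
    );
    CREATE TABLE Salaried_Employee (
      ssn    CHAR(9) PRIMARY KEY REFERENCES Employee(ssn),
      salary DECIMAL(10,2)
    );

Under option 2, retrieving a complete salaried employee requires a join of the two tables on ssn.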
As for the translation from UML, the basis is exactly the same: each class is translated into a table, and each association is also translated into a table. Note that all associations in UML are binary, and hence no rule is needed to handle other kinds of associations. Note also that when associations have attributes, or have an arity greater than two, they get promoted to classes (the so-called association classes); in this case, the class and the associated associations should be translated as one, since they represent a single relationship. As an example, the class Review in Figure 2 represents a ternary relationship, and is connected by three binary associations to classes Employee, Department and Project. A single table should be created, containing the attributes of class Review and representing the associations by adding the primary keys of Employee, Department and Project as foreign keys in the new table. Otherwise (if we treated association classes like other classes), the resulting database would be technically correct but would introduce artificial (vertical) partitioning of the database, which in turn would make querying more expensive by requiring additional joins to compute. Finally, in those cases in which attributes have cardinality constraints, we still follow the same procedure as for the E-R model, simply identifying attributes as multivalued if their max cardinality is greater than 1 (attributes whose min cardinality is 0 are considered nullable in the database, but must still be present in the table).
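A hedged SQL sketch of the single table for the association class Review (types are assumed; any attributes proper to Review would be added as further columns):

    CREATE TABLE Review (
      ssn   CHAR(9)     REFERENCES Employee(ssn),
      dname VARCHAR(40) REFERENCES Department(dname),
      pname VARCHAR(40) REFERENCES Project(pname),
      PRIMARY KEY (ssn, dname, pname)
    );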
When starting with an E-R diagram, all of the above applies. The most important difference is that there will be n-ary relationships. To translate an n-ary relationship R involving entities E1, …, En, one must do the following: R must be reified, just like a binary relationship with attributes. Thus, a class R' must be created, having as attributes all the attributes of R, and then n binary relations between R' and each Ei must be established. The final result is the same as the procedure shown above, since the process of reification is basically the same transformation that takes place in UML when an association class is created to represent n-ary relations (n > 2).
The object-relational data model offers several degrees of freedom with respect to the (purely) relational data model. These degrees of freedom, in turn, open up different possibilities for the translation from conceptual models. Unfortunately, almost no research has been done on the possible different translations, their weaknesses and strengths, and their comparison. Here, we outline two different approaches to the translation and evaluate them.
The first translation is based on the intuition that the real goal of the object-relational data model is to allow for new data types in the relational model, without changing its structure. Under this view, a translation into an object-relational database would be the same as a translation into a purely relational database, with the following differences:
• Subtypes of an entity type can be directly mapped into subtables of a table, since the object-relational approach supports inheritance directly.
• Multivalued atomic attributes can be modeled as an attribute with an internal ROW or ARRAY constructor, at least if an upper bound on the number of values is known (see Endnote 1). This will help reduce table fragmentation and will allow, in many cases, mapping each entity of an entity type into one tuple in one table.
Because this is almost the same translation as the purely relational, we call this
approach conservative. There is also another possible approach, which is more
extreme, in that it uses the full-blown potential of the object-relational model. In
the extreme approach, we treat the data model as one of objects represented in
tables (i.e., in tabular form). Hence, we not only map subtypes into subtables and
multivalued attributes into complex attributes, we also map weak entities into
complex subelements of the strong entity. That is, given weak entity E1 related
by a 1-M relationship to strong entity E2, we create a type for E1, and include,
as an attribute of E2, a collection attribute representing an ARRAY or SET of
the type created for E1.
1-M relationships can also be mapped by embedding the entities in the M side into
the table for the entities in the 1 side. If the relationship has attributes, a table is
created that contains, on each tuple, the attributes of the relationship plus a
complex type corresponding to the entity. This table is then embedded as a
subtable (complex) attribute of the table representing the entity on the 1 side.
Note that there may be further references to the embedded entity, since it may
be involved in other relationships; these are handled by using REF types. Binary
relationships without attributes can be implemented through REFs (or sets of
REFs), in a manner similar to the object-oriented model.
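A sketch of the extreme approach in SQL:1999-style syntax (vendor support varies, and the type and column names are assumptions; note that, as discussed in the Endnote, the standard offers ARRAY rather than SET, so an upper bound must be chosen):

    CREATE TYPE dependant_t AS (
      lname         VARCHAR(40),
      date_of_birth DATE
    ) NOT FINAL;

    CREATE TYPE employee_t AS (
      ssn        CHAR(9),
      name       VARCHAR(60),
      dependants dependant_t ARRAY[10]  -- weak entity embedded as a collection
    ) NOT FINAL;

    CREATE TABLE Employee OF employee_t;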
Note that this approach creates a series of choices as to how to represent many-
relationships and complex types: directly by embedding or inlining, or indirectly
through references. Since this creates differences in how the relations are
physically stored on disk, and therefore on access patterns, which option to use
depends on the expected workload (queries), size and performance requirements
of the database. As an example, consider the entity type Department, con-
nected by a 1-M relationship to entity type Employee (i.e., a Department may
have several Employees, each Employee belongs to a single Department). One
option is to create the complex type for Department using the complex type for
Employee as a (multivalued, complex) element. In this approach, there is only
one table (Department), and queries asking about Department information
(including Employees per Department) may run somewhat faster, since
Departments are explicitly joined to Employees. However, queries about Em-
ployees may be somewhat slower (of course, the final performance depends on
many other factors, like clustering, indices, etc.). This approach is somewhat biased towards Department (considered a “primary” type), to the detriment of Employee (considered a “secondary” type). Moreover, if Employee is connected by another relationship to another entity, this second relationship has to be handled by reference, unless it is also 1-M. In our example, Employee is
involved in a 1-M relationship with Dependant. Then it is possible to inline the
dependant type within the Employee type, which in turn is within the Depart-
ment type. Direct access to dependants may be slow and complex in this
schema, though. However, Employee is also involved in a many-to-many-to-
many relationship with Department and Project. Then it is not possible to
inline this relationship since this would cause redundancy (a project would be
repeated on each employee that reviewed it). In this case, using a separate
Review table with references to Employees may be a better choice. The
problem applies to any relationship that is many-to-many, which cannot be
embedded without causing redundancy.
In contrast, the other option is to create a complex type for Department, one
for Employee, a table for each, and handle the relationship either with a separate
table (as in the relational model), or by using references (sets of references to
Employee on each Department, and a reference to Department on each
Employee). This approach is more neutral, in that both Departments and
Employees are considered “top level” tables. Further relationships with other
tables, like Project, are easier this way. On the other hand, every join between
Departments and Employees needs support from separate indices and/or
algorithms. Note that both approaches would create subtables Hourly-
Employee(hourly-rate) and Salaried-Employee(salary), under table
Employee. A possible heuristic to decide between the conservative and extreme
approaches is this: the extreme approach works well for weak entity types, and
for entity types connected only by a 1-M relationship to other entity types.
However, the presence of M-N relationships strongly suggests the use of the
conservative approach.
Finally, we note that both approaches can handle required attributes but have
problems with optional attributes. Since both approaches ultimately make object
types fit into table schemas, the absence of a value creates a null in any case.
Mapping from E-R to XML is made complicated by the fact that XML is
decidedly a hierarchical model while E-R is more of a flat model (all entities are
at the same level) (Mani et al., 2001; Lee et al., 2003; Conrad et al., 2000; Dos
Santos et al., 2001). This presents two options as to how to transform an E-R
model into XML: to somehow transform the E-R model to fit the hierarchical
nature of XML, or to create a flat model that can be expressed in XML Schema.
The choice is similar to the conservative and extreme approaches outlined in the
previous subsection. In the case of XML, the first option may be taken to an
extreme by choosing some entity type to become the root of the XML model, and
embedding everything under it. M-N relationships, relationships with attributes,
and n-ary relationships (n > 2) are difficult to represent in this manner;
hierarchical relationships are, after all, 1-M. A less extreme approach would be
to identify some entity types in the E-R model which are somehow “more
important” and use them as first elements under an artificial root; this is the
approach used in Bird et al. (2000). The second approach is basically to use the
E-R to relational translation, and express the resulting (flat) database in XML
Schema. Thanks to its ability to represent foreign key-primary key relationships,
XML Schema can faithfully represent the flat structure of a relational database.
Here, we apply both approaches to our example and compare the results.
The first option requires us to choose a “root of the hierarchy” entity. Assume
that, when applied to the E-R model of Figure 1, this method chooses Employee
as the root; this results in the DTD of Figure 7.
Note that this approach is inherently biased. Also, many-to-many relationships
are described as one-to-many, since this is the only type of relation compatible
with the hierarchical organization. Moreover, attributes in relationships are lost,
Figure 7.
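The figure itself is not reproduced; a plausible sketch of the kind of DTD the text describes, with Employee as the root and everything embedded under it (the element layout is hypothetical), is:

    <!ELEMENT Employee   (name, Dependant*, Department?, Project*)>
    <!ELEMENT Dependant  (lname, date-of-birth)>
    <!ELEMENT Department (dname, location*)>
    <!ELEMENT Project    (pname, budget)>

Note how Project, which is really related to Employee through a many-to-many relationship, can only appear here as a repeated child of Employee.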
too. Hence, this approach leaves much to be desired. The work of Bird et al.
(2000) creates hierarchies when the structure is favorable (one-to-many rela-
tionships), but breaks it up in favor of a flat approach in other cases.
The second approach takes the flat approach for all of the models, and it can be expressed in the algorithm of Figure 8. To start the algorithm, we declare one complex element D to be the root element of the schema; D represents the whole database. Then, one element is created per entity type and one per relationship; links are established through keys and foreign keys. Inheritance is modeled with derived types. Constraints on inheritance (exclusive or not, total covering or not) cannot be modeled in XML Schema at the moment. All attributes (optional, choice, and multivalued) are treated equally, since they can all be handled in XML Schema. If an attribute is optional but not multivalued, we use '?'; if an attribute is optional and multivalued, we use '*'. Hence, XML is the only data model considered here that can adequately deal with optional attributes.
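A hedged fragment of the kind of schema the algorithm produces (the optional attributes phone and hobby are invented for the illustration):

    <xs:element name="db">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="Employee" minOccurs="0" maxOccurs="unbounded">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="ssn"   type="xs:string"/>
                <xs:element name="name"  type="xs:string"/>
                <!-- optional, single-valued: the '?' case -->
                <xs:element name="phone" type="xs:string" minOccurs="0"/>
                <!-- optional, multivalued: the '*' case -->
                <xs:element name="hobby" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:sequence>
      </xs:complexType>
    </xs:element>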
The decision to consider attributes as subelements instead of XML attributes is
based on two factors: it allows for more parallelism with the relational model, and
hence facilitates integration of the two; and it allows saving the XML attributes
for metadata. However, clearly the option exists to treat E-R single-valued,
simple attributes as XML attributes.
As for UML, the whole process is simplified by the fact that all relations
(associations) are binary. Hence, if an association is 1-to-many, we can consider
a hierarchical organization, by making the entity on the many side a subelement
of the entity on the 1 side. However, many-to-many associations still require two
separate elements and the use of primary keys and foreign keys in XML Schema
to be adequately modeled.
everything into relations, and hence objects are diluted, in that the information about an object may be distributed over several tuples of several relations. Finally, the entity-relationship conceptual model treats them both as equally important; this has probably contributed to its popularity, since many applications may be best described using both constructs. Another conclusion is that all data models except XML fail to accommodate irregular data. In particular, no model except XML can deal with optional attributes without resorting to nulls or other special values.
The most important conclusion of this overview is that there is a severe lack of research in transformations for the object-relational and semistructured data models, as can be seen from the fact that several translations, with different advantages and drawbacks, are possible, and none is considered the right translation. The object-relational data model offers several possibilities for all but the most trivial conceptual models; the trade-offs among these options are not well understood. Hopefully, new research will shed light on these issues in the near future.
References
Abiteboul, S., Buneman, P. & Suciu, D. (1999). Data on the Web: From
Relations to Semistructured Data and XML. Morgan Kaufmann.
Bird, L., Goodchild, A. & Halpin, T. (2000). Object Role Modeling and XML
Schema. Proceedings of E-R 2000, Springer-Verlag (pp. 309-322).
Bodart, F., Patel, A., Sim, M. & Weber, R. (2001). Should Optional Properties
Be Used in Conceptual Modeling? A Theory and Three Empirical Tests.
Information Systems Research, 12(4), 384-405.
Bray, T., Paoli, J. & Sperberg-McQueen, C.M. (Eds.). Extensible Markup Language (XML) 1.0. W3C Recommendation (2nd edition). Retrieved from http://www.w3.org/TR/REC-xml-20001006
Camps-Pare, R. (2002). From Ternary Relationship to Relational Tables: A
Case Against Common Beliefs. SIGMOD Record, 31(2), 46-49.
Cattell et al. (Eds.). (2000). The Object Data Standard: ODMG 3.0. Morgan Kaufmann.
Chen, P. (1976). The Entity-Relationship Model-Toward a Unified View of
Data. ACM Transactions on Database Systems, 1(1), 9-36.
Conrad, R., Scheffner, D. and Freytag, J. C. (2000). XML Conceptual Modeling
Using UML. Proceedings of ER 2000, Springer-Verlag, (pp. 558-571).
Endnote

1. Technically, the SQL-99 standard does not have a SET constructor, and hence does not have the ability to capture multivalued attributes (or many-relationships). Indeed, without such a constructor the standard does not even have the ability to represent nested relations (Melton, 2003). How-
Chapter VIII
An Algorithm for
Transforming XML
Documents Schema
into Relational
Database Schema
Abad Shah, University of Engineering & Technology (UET), Pakistan
Jacob Adeniyi, King Saud University, Saudi Arabia
Tariq Al Tuwairqi, King Saud University, Saudi Arabia
Abstract

The Web and XML have influenced all walks of life of those who transact business over the Internet. People like to do their transactions from their homes to save time and money. For example, customers like to pay their utility bills and perform other banking transactions from their homes through the Internet. Most companies, including banks, maintain their records using relational database technology. But the traditional relational database technology is unable to provide all these new facilities to the customers. To make the traditional relational database technology cope with the Web and XML technologies, we need a transformation between the XML technology and the relational database technology.
Introduction
An electronic document on the Web contains regular and irregular structures that may not be understood by users (Suciu, 1999; Abiteboul & Vianu, 1997; Brayan, 1997). Such a document (or data) is referred to as semistructured data (Suciu, 1999; Abiteboul, 1997). Contrary to the data in relational databases (RDBs), semistructured data is stored without any schema or with a vague schema (Buneman, 1997; Suciu, 1999). Besides the Web, there are many other sources of semistructured data, such as heterogeneous networking of integrated systems, file systems, electronic mail systems, digital libraries, etc. (Abiteboul, 1997; Buneman, 1997).
The introduction of the Extensible Markup Language (XML) as a standard data/information representation has facilitated the publication of electronic data on the Web (W3C, 2003). This language also provides a hierarchical format for structured data exchange over the Web (St. Laurent, 1999; Bray, Paoli, Sperberg-McQueen, & Maler, 2002). Information in an XML document is represented as nested element structures, which start with a root element. An element can have attributes and sub-elements (for further details about XML, see W3C (2003) and Bray et al. (2002)). An XML document has an optional part, which is called the Document Type Definition (DTD). The DTD of an XML document is considered as the schema of the XML document (W3C, 2003; Bray et al., 2002; Men-Hin & Fu, 2001).
A relational database (RDB) has two main components: a schema and data files (or operational files), which are created according to the schema. As said earlier, a DTD is considered as a schema of an XML document, but there are noticeable differences between an RDB schema and an XML document schema (DTD). We give a complete comparison between them in Table 1. The basic difference between them is that a DTD represents a hierarchical structure whereas an RDB schema represents a relational (tabular) structure. We can consider an XML document schema as analogous to the classical hierarchical data model.
XML is considered as the best tool for representing and exchanging information
on the Web (St. Laurent, 1999; Bray, Paoli, Sperberg-McQueen & Maler, 2002).
The language allows users to define and also display data on the Web. These
features make XML powerful and different from Hypertext Markup Language
(HTML) (Suciu, 1999; Comer, 2000). XML enables the user to define his own structures using the syntax of the elements in a DTD. A DTD describes the structure of information in an XML document in a hierarchical way (Bray, Paoli, Sperberg-McQueen & Maler, 2002). The structure of a DTD consists of elements, which are further specified by attributes and/or sub-elements. Recursive and optional types of sub-element can be defined using the operators * (zero or more times), + (one or more times), ? (optional), and | (or). Many types of data value can be assigned to attributes, e.g., string-type or entity. The data value ANY means that an arbitrary declaration can be made by the programmer. An element in an XML document is uniquely identified by a special attribute ID. This unique attribute of an element can be regarded as the primary key of the element. As mentioned in Table 1, a DTD does not support the concept of a composite ID (or key). An attribute can be referenced in another element through a field called IDREF, which is a type-less attribute. The concept of an IDREF is similar to the concept of a foreign key in relational databases. There is no concept of a root of a DTD (Bray et al., 2002).
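As a small hedged illustration (the element names anticipate the case study used later in this chapter, but these particular declarations are ours):

    <!ELEMENT Publisher (#PCDATA)>
    <!ATTLIST Publisher PName ID    #REQUIRED>
    <!ELEMENT Books     (#PCDATA)>
    <!ATTLIST Books     LCNo  ID    #REQUIRED
                        PName IDREF #REQUIRED>

Here LCNo acts as the primary key of Books, and PName in Books plays the role of a foreign key referring to a Publisher.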
Nowadays, financial organizations want to empower their customers so that they
can perform their transactions from their homes through the Internet. For these
financial organizations to provide their customers with this facility, it is necessary
and beneficial that the databases (which are mostly RDBs) should be presented
and processed in the XML format. We therefore need a technique that will
process and transform an RDB and queries into an XML format and vice versa.
This technique (or transformation) is essential because most of the commercially
available database management systems (DBMSs) are relational DBMSs.
This transformation will integrate and handle heterogeneous RDBs in the same manner. Researchers agree that the currently available RDB technology on its own is not adequate to achieve this objective of using RDBs on the Web without such a transformation (Shanmugasundaram et al., 1999). Recently, some researchers have proposed algorithms for this transformation (Shanmugasundaram
et al., 1999; Williams et al., 2000; Mani & Lee, 2002; Men-Hin & Fu, 2001).
In these transformation algorithms, most investigators have considered a DTD as the schema of the XML document, and they have used the tree data structure during the transformation. We know that the processes of creating and maintaining tree data structures are costly and affect the performance of the transformation process, as pointed out by Shanmugasundaram et al. (1999). Also, there are many syntax options available for writing DTDs. Most of the existing transformation algorithms (from DTD into RDB schema) are unable to handle DTDs written in different ways (Shanmugasundaram et al., 1999; Men-Hin & Fu, 2001). In this chapter, we propose a different approach for transforming any DTD of an XML document into a relational database schema. This approach can handle DTDs written in different ways and transform them into a relational database schema in a simple and elegant way.
The remainder of this chapter is organized as follows. We describe and analyze
the existing approaches for transforming a DTD of an XML document into a
relational database schema. We then present our proposed approach for
transforming a DTD into a relational database schema, and demonstrate the
proposed approach in a case study. Finally, we give our concluding remarks and
future direction of this work.
In the first algorithm of Men-Hin and Fu, functional dependencies are found in Step 5, first by analyzing the XML data, and then by applying the algorithm for efficient discovery of functional and approximate dependencies using partitioning. Step 6 of this algorithm is time-consuming, according to Men-Hin and Fu. They modified this step to make the first algorithm more efficient (Men-Hin & Fu, 2001). The modified algorithm decomposes a DTD into small prototypes in
sion definition is used as the first step in this transformation process of XML schema into relational database schema. The entities and relationships, which form the basic items of data modeling, are represented as elements and attributes of a DTD.
The process of mapping an XML schema (or DTD) into an RDB schema raises several issues, which have been pointed out by Mani and Lee (2002). One of the most important among them is the semantic constraints that exist in the XML model. Since a relational database schema cannot express the constraints of the XML schema languages, a useful and meaningful subset of those constraints should therefore be found in the mapping process. This process of finding the subset needs simplification of an XML schema. The concept of the inlining technique is used for generating an efficient relational schema (Mani & Lee, 2002). However, the inlining technique presented in this work generates a huge number of relations. In addition, this work does not present any proposal for assigning data types to the attributes of tables after or during the transformation process. The transformation process from an XML DTD to a relational schema maps each element in the DTD to a relation, and it maps the attributes of an element to the attributes of the relation. However, there is no direct correspondence between the elements and attributes of DTDs and the entities and attributes of the ER model. The attributes in an ER model are often represented as elements in a DTD. The following DTD definition illustrates this issue of the transformation.
Pre-Processing Algorithm
As mentioned earlier, there is no standard, fixed method for writing a DTD of an XML document. In other words, different users can write DTDs in different ways using the options provided by the syntax of DTDs. The main objective of the Pre-Processing Algorithm is to enable the overall transformation process to handle DTDs that are written in different ways. Hence, the main function of this algorithm is to transform DTDs written in different forms into a uniform and standard form. The output of this algorithm is the standard DTD, denoted as DTDs, and it is used as the input to the Transformation Algorithm (Figure 1).
Now we summarize the objectives of the standard DTD and give its format (or syntax). The main objectives of the standard DTD are listed below:
(i) to provide input to the Transformation Algorithm in a uniform and standard form; and
(ii) to enable the overall transformation process to handle DTDs written in different ways.
The format of the standard DTD begins with the document type declaration <!DOCTYPE DTD_Name [ ... ]>; within it, each element is declared with an <!ELEMENT element_name ...> line and its attributes with an <!ATTLIST element_name ...> line.
The steps of the Pre-Processing Algorithm are given in Figure 4. The algorithm takes any DTD (DTDa) and transforms it into the standard DTD (DTDs).
In the Global Schema Extraction Algorithm (Men-Hin & Fu, 2001), elements and attributes are treated equally and represented as separate nodes in the prototype tree. The same approach is used in the DTD-splitting Schema Extraction Algorithm (Men-Hin & Fu, 2001) and the Basic Inlining Algorithm (Shanmugasundaram et al., 1999). But our proposed Pre-Processing Algorithm treats them differently.
(Figure 4 presents the Pre-Processing Algorithm in pseudocode. For each root element, the algorithm finds the total number n of main elements; it then loops over the n main elements and, for each one having m > 0 attributes, loops over the m attributes, rewriting each attribute declaration, such as attribute_name ID #REQUIRED, into the standard form of DTDs as it is defined in DTDa.)
Transformation Algorithm
(Figure 5 presents the Transformation Algorithm in pseudocode. It finds the total number n of main elements in DTDs and loops over them to build the RDB_schema; for each element with m > 0 attributes, it scans every attribute_name and adds it to the corresponding Tablei.)
The outer loop deals with the elements of DTDs and transforms them into corresponding tables/relations. The inner loop transforms every attribute of an element into an attribute of the corresponding relation. In Step (iii) of the algorithm (Figure 5), ID and IDREF attributes of an element are transformed into the primary key and a foreign key of the relation, respectively. Note that since the syntax of DTDs does not support the concept of a composite key, our proposed transformation process, therefore, does not support this concept either.
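A sketch of the mapping the two loops perform, using the Loans element of the case study (the SQL data types are assumptions, since a DTD carries no type information):

    <!ATTLIST Loans CardNo ID    #REQUIRED
                    LCNo   IDREF #REQUIRED
                    Date   CDATA #REQUIRED>

    CREATE TABLE Table_Loans (
      CardNo VARCHAR(20) PRIMARY KEY,                  -- ID    -> primary key
      LCNo   VARCHAR(20) REFERENCES Table_Books(LCNo), -- IDREF -> foreign key
      "Date" VARCHAR(20)                               -- quoted: DATE is a reserved word
    );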
The data structure built for the Library DTD (DTD name: Library; four main elements):

    Element  Element    No. of  Attr.  Attribute  ID  IDREF  S.No.
    No.      Name       Attrs.  No.    Name
    1        Books      4       1      title      N   N      1
                                2      author     N   N      2
                                3      PName      N   N      3
                                4      LCNo       Y   -      4
    2        Publisher  3       1      PName      Y   -      5
                                2      PAddr      N   N      6
                                3      PCity      N   N      7
    3        Borrowers  4       1      Name       N   N      8
                                2      Addr       N   N      9
                                3      City       N   N      10
                                4      CardNo     Y   -      11
    4        Loans      3       1      CardNo     Y   -      12
                                2      LCNo       N   Y      13
                                3      Date       N   N      14

A second table repeats these attribute rows without serial numbers, differing only in that attribute PName of element Books is marked as an IDREF (N Y) rather than N N.
Table 5. A comparison of our approach with existing approaches (Basic Inlining, SI&HI*, Global Extracting, DTD-splitting Extracting, Williams et al., and Regular Tree Grammars) along four dimensions: the data structure used (graph, tree, or regular tree grammars; our approach uses no such data structure); the handling of DTD operators (e.g., Basic Inlining creates a relation for every element in the DTD; Global Extracting ensures that each element is represented only once in a relation; Williams et al. support XML Schema, not DTD; in our approach, the Pre-Processing Algorithm processes the operators); advantages (e.g., handling the fragmentation problem, sharing the attributes of common elements, preserving entities and definitions, maintaining semantic constraints; our approach offers a simple, direct mapping and maintains the semantics of the RDB); and disadvantages (e.g., a large number of relations or joins, working with a limited number of elements and attributes, an exponential mapping process, vague rules for mapping complex elements, or human intervention for assigning data types to attributes).

* A direct mapping of elements to relations leads to excessive fragmentation of attributes (for more details, see Mani & Lee, 2002).
Table_Publisher(PName, PAddr, PCity)
Table_Borrowers(Name, Addr, City, CardNo)
Table_Loans(CardNo, LCNo, Date)
Melton, 2002). Most of these approaches use the tree data structure, which requires costly operations to create and maintain. Our approach does not use a tree data structure, and this makes it simpler and more efficient than those approaches that do. We have also given a brief comparative study of our proposed approach and some existing approaches in Table 5.
Future Directions
References
Abiteboul, S. (1997). Querying semistructured data. Proceedings of the International Conference on Database Theory.
Abiteboul, S., Buneman, P. & Suciu, D. (2000). Data on the Web. CA: Morgan Kaufmann.
Abiteboul, S. & Vianu, V. (1997). Querying the Web. Proceedings of the ICDT.
Chapter IX
Imprecise and
Uncertain Engineering
Information Modeling
in Databases:
Models and
Formal Transformations
Z. M. Ma, Université de Sherbrooke, Canada
Abstract
Introduction
Nowadays computer-based information systems have become the nerve center
of current manufacturing systems. Engineering information modeling in data-
bases is thus essential. From the point of view of database systems, engineering
information modeling can be identified at two levels: conceptual data modeling
and logical database modeling. Correspondingly, we have conceptual data
models and logical database models for engineering information modeling,
respectively. Product data models, for example, can be viewed as a class of
conceptual data models that take into account the needs of engineering data
(Shaw, Bloor & Pennington, 1989). Much attention has been directed at
conceptual data modeling of engineering information because conceptual data
models can capture and represent rich and complex semantics in engineering
applications at a high abstract level. Limited by the power of traditional ER/EER
(Chen, 1976) in engineering modeling, the International Organization for Stan-
dardization (ISO) has developed the Standard for the Exchange of Product Data
(STEP, ISO 10303) in order to define a common data model and procedures for
the exchange of information. STEP provides a means to describe a product
model throughout its life cycle and to exchange data between different units.
STEP consists of four major categories: description methods, implementation
methods, conformance testing methodology and framework, and standard-
ized application data models/schemata. EXPRESS (Schenck & Wilson, 1994), being the description method of STEP and a conceptual schema language, can model product design, manufacturing, and production data, and the EXPRESS model has hereby become one of the major conceptual data models for engineering information modeling (Eastman & Fereshetian, 1994). Note, however, that unlike ER/EER and IDEF1X, EXPRESS is not a graphical schema language. In order to construct an EXPRESS data model at a higher abstract level, EXPRESS-G is introduced as the graphical representation of EXPRESS; EXPRESS-G can only express a subset of the full language of EXPRESS. In addition to EXPRESS-G, it is also suggested in STEP that IDEF1X or ER/EER can be used as one of the optional languages for EXPRESS data model design. Then EXPRESS-G, IDEF1X, ER/EER, or even UML data
Basic Knowledge
given range (interval or set) of values, but we do not know exactly which one to choose at present. For example, “between 20 and 30 years old” and “young” for the attribute Age are imprecise and vague values, respectively. In general, vague information is represented by linguistic values.
Imprecise values generally denote range values of the form {ai1, ai2, ..., aim} or [ai1, ai2] for a discrete or continuous universe of discourse, respectively, meaning that exactly one of the values is the true value for a single-valued attribute, or that at least one of the values is the true value for a multivalued attribute. Imprecise information hereby has two interpretations: disjunctive information and conjunctive information. In addition to range values, there exists one special kind of imprecise information, namely, null values (Codd, 1987; Motro, 1990; Parsons, 1996; Zaniolo, 1986).
Information uncertainty is related to the degree of truth of an attribute value, and it means that we can apportion some, but not all, of our belief to a given value or a group of values. For example, the sentence, “I am 95 percent sure that Tom is married,” represents information uncertainty. Random uncertainty, described using probability theory, is not considered in this chapter.
Generally speaking, several different kinds of imperfection can co-exist with respect to the same piece of information. For example, the statement that it is almost sure that John is very young involves information uncertainty and vagueness simultaneously. Imprecision, uncertainty, and vagueness are three major types of imperfect information. Vagueness and uncertainty can be modeled with possibility theory (Zadeh, 1978). Therefore, we mainly concentrate on the fuzzy extension of the EXPRESS-G model and on fuzzy nested relational databases in this chapter.
Fuzzy data was originally described as a fuzzy set by Zadeh (1965). Let U be a universe of discourse; then a fuzzy value on U is characterized by a fuzzy set F in U. A membership function µF: U → [0, 1] is defined for the fuzzy set F, where µF(u), for each u ∈ U, denotes the degree of membership of u in the fuzzy set F. Thus the fuzzy set F is described as follows:
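In the standard notation of Zadeh (1965):

    F = {µF(u1)/u1, µF(u2)/u2, ..., µF(un)/un}

Equivalently, a fuzzy value can be written as a possibility distribution πX of a variable X on U (Zadeh, 1978):

    πX = {πX(u1)/u1, πX(u2)/u2, ..., πX(un)/un}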
Here, for each ui ∈ U, πX(ui) denotes the possibility that X takes the value ui. Let πX and F be the possibility distribution representation and the fuzzy set representation of a fuzzy value, respectively. Then πX and F are usually regarded as the same thing, i.e., πX = F.
EXPRESS-G
EXPRESS-G is the graphical representation of EXPRESS, which uses graphical symbols to form a diagram (Eastman & Fereshetian, 1994; Shaw, Bloor & Pennington, 1989). Note that it can only represent a subset of the full language of EXPRESS. EXPRESS-G provides support for the notions of entity, type, relationship, cardinality, and schema. The functions, procedures, and rules of the EXPRESS language are not supported by EXPRESS-G.
EXPRESS-G has three basic kinds of symbols for definition, relationship, and
composition. Definition and relation symbols are used to define the contents and
structure of an information model. Composition symbols enable the diagrams to
be spread across many physical pages.
Definition symbols. A definition symbol is a rectangle enclosing the name of
the thing being defined. The type of the definition is denoted by the style of the
box. Symbols are provided for simple type, defined type, entity type, and schema.
• Simple type symbols. A number of predefined simple types offered by
EXPRESS language include Binary, Boolean, Integer, Logical, Number,
Real, and String. The symbol for them is a solid rectangle with a double
vertical line at its right end. The name of the type is enclosed within the box.
The symbols for these simple types are shown in Figure 1.
• Type symbols. The symbols for the select, enumeration and defined data
types are dashed boxes as shown in Figure 2.
• Entity symbols. The symbol for an entity is shown in Figure 3.
• Schema symbols. The symbol for a schema is shown in Figure 4.
Note that the lines with open circles denote relationship directions in EXPRESS-
G.
Composition symbols. Graphical representation of models often spans many
pages. Each page in a model must be numbered so that we can keep track of
where we are in the model and enable inter-page referencing. In addition, a
schema may utilize definitions from another schema. Therefore, there are two
kinds of composition symbols for page references and inter-schema references,
which are shown in Figure 6 and Figure 7, respectively. EXPRESS-G provides
two levels of modeling, namely, schema level model and entity level model.
Therefore, we discuss the fuzziness in the entity level model and in the schema
level model in the following, respectively.
(Figure 7 shows the two inter-schema reference symbols, each labelled Schema.def with an alias: one for a definition referenced from another schema and one for a definition used from another schema.)
In the traditional relational database model, attribute values are assumed to be atomic. It is clear that this assumption limits the expressive power of the traditional relational database model in modeling complex objects with rich data types and semantic relationships in real applications.
The first attempt to relax this limitation was made by Makinouchi (1977). In this initial work, attribute values in the relational instance may be atomic or set-valued. We call such relational databases non-first normal form (NF2) relational databases. After Makinouchi's proposal, the NF2 database model was further extended (Motro & Smets, 1997; Schek & Scholl, 1986; Yazici et al., 1999). The NF2 database model in its common sense now means that attribute values in the relational instances are either atomic or set-valued, and may even be relations themselves. So NF2 databases are also called nested relational databases. In this chapter, we do not differentiate between these two notions. It can be seen that the NF2 database model is a generalized relational data model, which can nevertheless model complex objects and relationships. A formal definition of the NF2 database model (Yazici et al., 1999) is given as follows.
Definition. Let a relation r have schema R = (A1, A2, ..., An) and let D1, D2, ..., Dn be the corresponding domains from which values for the attributes (A1, A2, ..., An) are selected. Attribute Aj is a higher order attribute if its schema appears on the left-hand side of a rule; otherwise, it is simple.
An NF2 relation, denoted by r, consists of attributes (A1, A2, ..., An). A tuple of an NF2 relation is an element of r, denoted as <a1, a2, ..., an> and consisting of n components. Each component aj (1 ≤ j ≤ n) may be an atomic value, a null value, or another tuple. If Aj is a higher order attribute, then the value aj need not be a single value, but may be an element of the subset of the Cartesian product of the associated domains Dj1, Dj2, ..., Djm.
Let us look at the hierarchical structure of car products shown in Figure 8 (Erens, Mckay & Bloor, 1994; Li, Zhang & Tso, 2000; Zhang & Li, 1999). The Car structure can be defined as a nested data model utilizing the following forms:
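A hedged sketch of the kind of nested form intended (the component and attribute names are illustrative only):

    Car = (Car_Id, Body(Body_Id, Color), Engine(Engine_Id, Power), Interior(Seat, Dashboard))

Here Body, Engine, and Interior are higher order attributes of Car, each with its own (possibly again nested) schema.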
To convert between flat and nested relations, two restructuring operators, Nest and Unnest (Ozsoyoglu et al., 1987; Roth, Korth & Batory, 1987; as well as Pack and Unpack in Ozsoyoglu et al., 1987), have been introduced. The Nest operator produces a nested relation containing complex-valued attributes. The Unnest operator is used to flatten a nested relation: it takes a relation nested on a set of attributes and desegregates it, creating a “flatter” structure. The formal definitions and the properties of these two operators, as well as the ordinary relational algebra for the NF2 data model, can be found in Colby (1990) and Venkatramen and Sen (1993).
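A small illustration of the two operators on an invented flat relation r(Car_Id, Part):

    r:             <c1, body>, <c1, engine>, <c2, body>

    Nest on Part:  <c1, {body, engine}>, <c2, {body}>

Applying Unnest to the nested attribute restores the original flat relation r.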
The NF2 data model is useful in engineering data modeling due to its capacity for modeling complex objects with hierarchical structure, which are very common in engineering areas. Let's look at the Instance-As-Type (IAT) problem proposed in Erens, Mckay and Bloor (1994) and Li, Zhang and Tso (2000). IAT means that an object appears as a type in one information base, but also as an instance in another information base at the same time. An IAT phenomenon occurs, for example, in the interior of the car in the above example. IAT problems can result in greater difficulty and cost in maintaining information consistency. So we must resolve them in product data modeling, or update anomalies occur. It is shown in Table 2.1 that the NF2 data model can avoid the IAT problems naturally.
Extended EXPRESS-G
Since entity and schema are the key notions of the EXPRESS-G information model, in this section we extend EXPRESS-G for fuzzy information modeling in the entity level and schema level models, respectively. In an entity level model, we mainly investigate the fuzziness in data types, attributes, entities, and relationships. At the schema level, we mainly investigate the fuzziness between schemas. The corresponding notations are hereby introduced.
An entity level model is an EXPRESS-G model that represents the definitions and relationships that comprise a single schema. So the components of such a model consist of type, entity, and relationship symbols, together with role and cardinality information.
Fuzziness can also be found in type modeling. First let's have a look at the enumeration type. As we know, an enumeration type is an ordered list of values represented by names, where the list has a perfect boundary. A named value either belongs to the enumeration type or does not belong to it. It is possible, however, that a value belongs to the enumeration type with a degree, namely, that the value is fuzzy. For example, consider the enumeration type HairType = ENUMERATION OF (Black, Red, Brown, Golden, Grey) and a person whose hair type is red and brown.
A defined data type is created based on an underlying type. The defined data type generally has the same domain of values as the underlying type unless a constraint is put on it. The underlying type can be a simple type, collection type, enumeration type, select type, or named type. We have shown that a value of a simple type, collection type, or enumeration type may be fuzzy or imprecise. The imprecision and fuzziness of the values of a select type or entity type are shown in the following. Thus, the value of a defined data type may be imprecise or fuzzy.
A select type defines a named collection of other types called a select list. A value of a select type is a value of one of the types specified in the select list, where each item is an entity type or a defined type. The imprecision or fuzziness of a value of a select type comes from the imprecision or fuzziness of its component type, i.e., a fuzzy or imprecise entity type or defined type.
The symbols for modeling imprecise and fuzzy types are shown in Figure 11 and Figure 12, respectively.
Fuzzy entity modeling can be classified into two levels. The first level is the
fuzziness in the entity sets, namely, an entity has a degree of membership in the
model. For example, an entity Engine may be fuzzy in the product data model. The second level is related to the fuzzy occurrences of entities. For the entity Research Student, for example, it may be uncertain whether John is a Ph.D. student. Such an entity is represented using the definition symbol in Figure 13.
For the first level of fuzziness, memberships can be placed inside the solid rectangle along with the names of the entities. Let E be an entity and µ(E) be its grade of membership in the model; then “µ(E)/E” is enclosed in the solid rectangle. If µ(E) = 1.0, “1.0/E” is usually written simply as “E”. The graphical representation of such an entity is shown in Figure 14.
In a classical situation, if there exist two entities E1 and E2 such that for any entity instance e, e ∈ E2 implies e ∈ E1, then E2 is called a subtype of E1, and E1 is called a supertype of E2. As mentioned above, an instance of an entity, say e, may be fuzzy with respect to an entity, say E. Therefore, there exist fuzzy supertypes/subtypes in EXPRESS. Let E and S be two fuzzy entities with membership functions µE and µS, respectively. Then S is a fuzzy subtype of E and E is a fuzzy supertype of S if and only if the following is true:
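In the usual fuzzy-set formulation (stated here as an assumption, since the formula is not reproduced): (∀e)(µS(e) ≤ µE(e)).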
Considering a fuzzy supertype E and its fuzzy subtypes S1, S2, …, Sn with membership functions µE, µS1, µS2, ..., µSn, respectively, the following relationship holds:
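In the usual formulation, again given as an assumption: for every instance e, µE(e) ≥ max(µS1(e), µS2(e), ..., µSn(e)).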
For a fuzzy subtype with multiple fuzzy supertypes, let E be the fuzzy subtype and S1, S2, …, Sn be its fuzzy supertypes, whose membership functions are, respectively, µE, µS1, µS2, ..., µSn.
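A standard formulation of the corresponding condition, likewise an assumption here, would be: for every instance e, µE(e) ≤ min(µS1(e), µS2(e), ..., µSn(e)).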
As mentioned above, there are dashed lines, thick solid lines, and thin solid lines
in EXPRESS-G. Dashed lines and thin solid lines connecting attributes represent
that attributes must belong to the corresponding entity. In fact, it is also possible that we do not know for sure whether an attribute belongs to the corresponding entity. In this case, such an attribute should be associated with a membership degree to indicate a fuzzy attribute-entity relationship. We can place membership degrees upon dashed lines and thin solid lines. In addition, we use thick dashed lines to represent the fuzzy supertype/subtype relationship described above. The symbols for these three kinds of lines are shown in Figure 15, where A and µ denote the name of an attribute and its membership degree, respectively.
A schema level model is one that displays the schemas and the relationships between these schemas. Since fuzziness can occur in entities, the relationships among these entities may also be fuzzy. Following the two kinds of schema relationships in EXPRESS-G, Use and Reference, fuzzy Use and Reference relationships in fuzzy EXPRESS-G are denoted by normal-width relation lines and dashed relation lines with membership degrees, as shown in Figure 16.
Through the discussion above, three levels of fuzziness can be found in fuzzy
EXPRESS-G, namely, the fuzziness at the level of attribute value (the third level),
the fuzziness at the level of instance/entity (the second level), and the fuzziness
at the level of entity and attribute (the first level). The fuzziness at the third level
means that attributes take fuzzy values. The second level of fuzziness means that each instance of an entity belongs to the entity with a membership degree. The first level of fuzziness means that attributes belong to an entity with membership degrees or entities belong to a schema with membership degrees.
(Figure 17: a fuzzy EXPRESS-G model of entity Tank and its subtypes Air tank and 0.7/Water tank, with attributes such as Tank_Id, Body_Id, 0.5/Thickness, Capacity, Volume, Material, Length, and Radius, typed Number, Real, or String.)
An Example Illustration
In Figure 17, we give a simple fuzzy EXPRESS-G data model utilizing some
notations introduced above. Entity Tank is a supertype, which has three
subtypes, namely, Air tank, Water tank, and Light oil tank. Among these three
subtypes, it is known for certain that entities Light oil tank and Air tank are subtypes of entity Tank; in other words, the membership degrees that Light oil tank and Air tank are subtypes of Tank are both 1.0. However, it is not known for certain whether entity Water tank is a subtype of entity Tank; it is only known that the membership degree that Water tank is a subtype of Tank is 0.7. In addition, entity Light oil tank is a fuzzy entity with fuzzy
instances. The entity Air tank has eight attributes. The attribute Body_Id is a
perfect one with string type. Attribute Thickness is associated with a membership degree of 0.5, which means that the possibility that entity Air tank has attribute Thickness is only 0.5 rather than 1.0. The attributes Volume, Capacity, and Over height can take fuzzy real values. The attributes Length and Radius are imprecise ones that can take range values. It should be noted that attribute Material is of enumeration type.
6. The set of relation values. The corresponding attribute value, say ai, is a tuple of the form <ai1, ai2, ..., aim>, which is an element of Di1 × Di2 × ... × Dim (m > 1 and 1 ≤ i ≤ n), where each Dij (1 ≤ j ≤ m) may be a domain in (1), (2), (3), (4), or (5), or even the set of relation values itself.
Here, we focus on the modeling of these parameter values as well as their structure information in the product data model using the fuzzy NF2 database model. A possible database schema and an instance are partially represented in Table 2 (due to space limitations). Note that the attribute pM can be omitted from the fuzzy NF2 relation when all tuples have the value 1.0 on pM. Here, “[25, 600],” “[10, 100],” “[30, 650],” and “[15, 110]” are imprecise value intervals, and “about 2.5e+03,” “about 1.0e+06,” “about 2.5e+04,” “about 1.0e+07,” “less than 627.50,” “less than 106.75,” “less than 630.00,” and “less than 112.50” are all fuzzy values. Assume that these fuzzy values are represented by possibility distributions as follows:
“about 2.5e+03”:
{1.0/2.5e+03, 0.96/5.0e+03, 0.88/7.5e+03, 0.75/1.0e+04, 0.57/1.25e+04, 0.32/
1.5e+04, 0.08/1.75e+04};
“about 1.0e+06”:
{0.05/1.0e+05, 0.18/2.0e+05, 0.37/3.0e+05, 0.55/4.0e+05, 0.69/5.0e+05, 0.78/
6.0e+05, 0.87/7.0e+05, 0.93/8.0e+05, 0.96/9.0e+05, 0.97/1.0e+06};
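As a small illustration (ours, not the chapter's), such possibility distributions can be represented directly as value-to-degree mappings; the function names below are hypothetical:

# A possibility distribution maps candidate values to possibility degrees in [0, 1].
about_2_5e03 = {
    2.5e+03: 1.0, 5.0e+03: 0.96, 7.5e+03: 0.88, 1.0e+04: 0.75,
    1.25e+04: 0.57, 1.5e+04: 0.32, 1.75e+04: 0.08,
}

def possibility(dist, value):
    # Degree to which `value` is regarded possible; 0.0 for unlisted values.
    return dist.get(value, 0.0)

def possibility_in_range(dist, low, high):
    # Possibility that the fuzzy value falls in [low, high]: the maximum degree
    # over all candidate values inside the interval.
    return max((p for v, p in dist.items() if low <= v <= high), default=0.0)

print(possibility(about_2_5e03, 7.5e+03))          # 0.88
print(possibility_in_range(about_2_5e03, 0, 1e4))  # 1.0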
The entity instances are identified by their unique identifiers in the EXPRESS information model. The entity identifiers resemble the keys in (nested) relational databases, but they are different: keys are component parts of the information content, whereas entity identifiers are not. We can view an entity as a database relation and the instances of the entity as the tuples of that relation. When we want to represent entity instances in relational databases, we have to solve the problem of how to identify entity instances there; in other words, we must indicate keys for the tuples originating from entity instances. As we know, in EXPRESS information models there are attributes with UNIQUE constraints. When an entity is mapped into a relation and each entity instance is mapped into a tuple, such attributes can clearly be viewed as the key of the tuples to identify instances. Therefore, an EXPRESS information model must contain at least one attribute with a UNIQUE constraint when relational databases are used to model it.
In EXPRESS, there are entities whose attributes are themselves other entities; these are called complex entities. Complex entities and subtype/supertype entities in an EXPRESS information model require special treatment in this mapping.
Formal Mapping
As claimed above, the fuzziness in EXPRESS-G can be classified into three levels. The second and third levels of fuzziness, namely, the fuzziness at the level of instance/entity and the fuzziness at the level of attribute value, can be represented in fuzzy NF2 databases. Relational database models and nested relational database models focus only on instance modeling, and their meta-structures are implicitly represented in the schemas. The fuzziness at the level of entity and attribute cannot be modeled in fuzzy NF2 databases due to the limitation of NF2 databases in meta-data modeling.
The following three kinds of entities can be identified in an EXPRESS-G model:
1. Member entities. A member entity is an entity that is a component part of other entities or that serves as the underlying type of enumeration and select types.
2. Subtype entities. A subtype entity is an entity that participates in supertype/subtype relationships as the subtype of the supertype entity/entities.
3. Root entities. A root entity is neither a subtype entity nor a member entity.
A member entity in case (1) is not mapped to a class of its own, but to a complex attribute of another class that is composed of the member entities. The fuzziness in the member entity, however, can be handled according to the same principles as for a common entity.
Following the formal rules given above, we map the fuzzy EXPRESS-G model
in Figure 17 into the fuzzy nested relational database in Table 2. Note that
attribute “0.5/Thickness” in Figure 17 cannot be mapped into the fuzzy nested
relational database due to its first level of fuzziness.
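For illustration only (Table 2 itself is not reproduced here), one tuple of such a fuzzy nested relation could be sketched as follows; the concrete attribute values are invented, while the use of intervals, possibility distributions, and the membership attribute pM follows the text:

# One tuple of a fuzzy nested relation obtained from the fuzzy EXPRESS-G model.
air_tank_tuple = {
    "Body_Id": "B-07",                        # perfect (crisp) attribute
    "Length": (25, 600),                      # imprecise value interval
    "Volume": {2.5e+03: 1.0, 5.0e+03: 0.96},  # fuzzy value, e.g. "about 2.5e+03" (abridged)
    "Material": "steel",                      # enumeration-type attribute
    "pM": 1.0,                                # tuple membership; omitted if always 1.0
}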
It should be noted that we do not discuss the mapping of data types in fuzzy EXPRESS-G models; we assume that fuzzy nested relational databases support the data types in fuzzy EXPRESS-G models. In fact, the data types supported by different database products vary, and more and more data types are supported by the latest releases of database management systems. Our focus here is on mapping the entities and the attributes associated with entities in fuzzy EXPRESS-G models. We have identified all three kinds of entities in fuzzy EXPRESS-G models and given mapping methods that map fuzzy entities and attributes into fuzzy nested relational databases. The mapping methods given in this chapter can thus be used to solve the problem of fuzzy engineering data model transformations.
Conclusions
In this chapter, we have proposed a fuzzy extension to EXPRESS-G that can capture imprecise and uncertain engineering information. In addition, fuzzy nested relational databases have been introduced, and formal approaches to mapping a fuzzy EXPRESS-G schema to a fuzzy nested relational database schema have been developed.
It should be noted that EXPRESS-G is only a subset of the full EXPRESS language. Clearly, it is necessary to extend EXPRESS itself for imprecise and uncertain engineering information modeling and then to map fuzzy EXPRESS models into databases. It would also be interesting to formally compare transformations between EXPRESS-G and other conceptual data models, such as ER/EER, UML, and IDEF1X. We will investigate these issues in our future work.
Section III
Additional Topics
Chapter X
Analysing Transformations in Performance Management
Bernd Wondergem, LogicaCMG Consulting, The Netherlands
Norbert Vincent, LogicaCMG Consulting, The Netherlands
Abstract
Introduction
Performance management (PM) is a way of managing in which the organisation’s strategic goals and its organisation model are made explicit. In addition, the connection between the two is made by stating how the organisation model produces the organisation’s output. The explicitly stated goals (the what) and organisation model (the how) form the core of the management model for steering the organisation.
In PM, running a business is all about transformations. First, of course, a general notion of transformation applies: the organisation transforms some form of input into some form of output. More specific to PM, steering the organisation towards its strategic goals is done by repeatedly transforming the organisation and the management model. In this chapter, we focus on these latter forms of transformation, which describe the essence of performance management.
This chapter sets out to do two things. First, we describe a framework for analysing transformations in performance management. Second, we use the framework to identify several types of transformations and describe which properties apply to them. The results of this chapter may enhance the understanding of performance management and thus lead to more effective management.
This chapter has the following structure: it provides different views of, and
approaches to, PM and presents our vision on the subject. Next, the framework
for analysing transformations is presented: the performance management model.
In the following section, we use this model for describing several types of
transformations. The chapter then deals with future trends. Finally, we provide
concluding remarks and an outlook on further research.
Background
In general, organisations try to find, reach and sustain a strategic position in their
environment. Mintzberg (1991) has classified the ways to do this into two
categories: emergent strategies and planned strategies. “Organisations develop
plans for the future and they evolve patterns out of their past” (Mintzberg, 1994).
Performance management falls into the category of planned strategies.
Performance Management has a typical set-up. First, the organisation formulates a strategy. Formulating a mission, creating a vision and formulating goals are often seen as preceding steps in strategy formulation. However, these steps are not always explicitly taken or repeated in formulating or revising the strategy.
Second, the strategy and the corresponding goals are translated into performance indicators (PI’s). PI’s are measurable indicators which give a quantitative view of the organisation’s performance. The PI’s are often put on a scorecard, an instrument used for communicating and analysing the performance.
The scorecard is used for steering towards the strategic goals. Therefore, it is used in a cycle of continuous improvement. Deming’s cycle (Deming, 1982), consisting of the steps “plan,” “do,” “check” and “act,” is probably the best-known variant. In this cycle, strategy formulation forms part of the “plan” step. In addition, this step concerns setting up the organisation for the production of value. Figure 1 sketches the place of Deming’s cycle in the general set-up of performance management. In the “do” step, the organisation produces its products or services and measures its performance through PI’s. This provides a fact-based insight into the current performance. The results are evaluated and actions to improve future performance are defined in the “check” step. Finally, the “act” step consists of implementing the actions. After this, the strategy may be revised and a new cycle starts. The information that is explicitly used in the “check” and “act” steps constitutes the so-called performance management model, which is elaborated upon later in this chapter.
In this chapter, we will consider three aspects of PM as its essence. We define Performance Management as the management method and instrument that:
1. Translates the organisation’s strategy into measurable indicators. The “what” of the strategy is thus explicitly translated into quantitative performance indicators.
Goal Model
The goal model explicitly states the desired results. It forms a strategic map consisting of measured items, causal relations between them, and performance indicators. The causal relations between the measured items describe dependencies between the strategic goals and subgoals. The PI’s operationalise the measured items: they make explicit how performance on the measured items is measured.
The goal model is defined as a tuple GM = (MI, CR, PI, O), where:
• MI is a set of measured items,
• CR is a relation on MI, the causal relations of GM,
• PI is a set of performance indicators, and
• O is a relation between PI and MI, stating which PI’s operationalise which measured items.
A number of properties of the goal model that are used in PM are described below. The impact value of a causal relation (mi1 → mi2) denotes the part of mi2 that is explained by mi1. The completeness of an MI, say mi1, is the sum of all impact values of the causal relations (mi2 → mi1). The score of a PI expresses the current performance with respect to the PI. The score of a measured item is then given as the average of the scores of all PI’s that operationalise it. In order to compute the average of the PI’s, they first have to be made comparable (Chang & Morgan, 2000).
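As an illustration of these properties (ours; the identifiers and numbers are invented), completeness and scores can be computed directly from the goal model's relations:

# Goal model fragments: causal relations with impact values, and PI scores.
impact = {("mi2", "mi1"): 0.6, ("mi3", "mi1"): 0.3}  # (source, target) -> impact value
pi_scores = {"pi1": 0.8, "pi2": 0.6}                 # current PI performance
operationalises = {"mi1": ["pi1", "pi2"]}            # which PI's measure an MI

def completeness(mi):
    # Sum of impact values of all causal relations pointing at mi.
    return sum(v for (src, tgt), v in impact.items() if tgt == mi)

def score(mi):
    # Average score of the PI's that operationalise mi (assumed comparable).
    pis = operationalises.get(mi, [])
    return sum(pi_scores[p] for p in pis) / len(pis) if pis else None

print(completeness("mi1"))  # 0.9 -> mi1 is not fully explained yet
print(score("mi1"))         # 0.7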
Organisation Model
The organisation model provides the factors that drive and produce the
organisation’s output. Numerous ways of modeling organisations exist. For our
purposes, it suffices to see the organisation model as a directed graph: OM = (N, L),
where N is a set of organisational elements and L a set of links. The links model dependencies between the organisational elements.
For PM, it is important that the OM is complete, i.e., that no relevant organisational elements or links are missed; this would diminish the steering capabilities of the OM. Furthermore, it is important that the OM provides enough detail, enabling focused steering.
Handles
Handles connect the goals (measured items) to elements from the organisation model. The handles can thus be formally described as a relation on GM.MI × OM.N. However, the handles are often not stated explicitly; rather, they reside in the heads of managers. In that sense, the handles are subjectively defined: each manager may have his own opinion about which factors influence results. This is in line with the “idiosyncratic set of knowledge” managers are said to form in Van den Bosch and Van Wijk (2001).
The scope of the handles is defined as the portion of all connections between
organisation model and goal model that are included in the handles. The scope
delimits the possible effectiveness of the management model. A large scope may
consider many possibilities to influence a certain goal but may also be time-
consuming in decision making. A small scope may quickly lead to a decision about
how to steer, but may miss relevant options.
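A minimal sketch of the scope notion (our own; all names are invented):

# Handles as a relation on GM.MI x OM.N; scope is the portion of all
# possible connections that the handles actually include.
measured_items = {"quality_of_contact", "costs"}
org_elements = {"staffing", "training", "call_scripts"}
handles = {("quality_of_contact", "training"), ("costs", "staffing")}

def scope(handles, mis, nodes):
    return len(handles) / (len(mis) * len(nodes))

print(scope(handles, measured_items, org_elements))  # 2/6, about 0.33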
Transformations in Performance Management
Transforming the goal model can be identified at two levels. First, measured items and causal relations may be altered. Second, performance indicators and their connection to measured items can be changed.
On the level of MI’s and causal relations, the notion of completeness is important. If the GM is not complete enough, MI’s may be added. The precondition of such a transformation can be stated as: the completeness of mi1 is too low, where mi1 is a measured item of the goal model. The action in the transformation consists of adding a measured item mi2 and the causal relation (mi2 → mi1). The post-condition of the transformation then states that the completeness of mi1 is higher. The proof of this claim hinges on the assumption that the impact value of mi2 on mi1 is positive.
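The transformation can be sketched as a guarded operation on the goal model; this is our reading of the text, with the completeness threshold as an invented parameter:

def completeness_of(goal_model, mi):
    # Sum of impact values of causal relations pointing at mi.
    return sum(v for (src, tgt), v in goal_model["CR"].items() if tgt == mi)

def add_measured_item(goal_model, mi1, mi2, impact_value, threshold=1.0):
    # Precondition: the completeness of mi1 is too low.
    if completeness_of(goal_model, mi1) >= threshold:
        return goal_model
    # A positive impact value is assumed, so the post-condition will hold.
    assert impact_value > 0
    # Action: add mi2 and the causal relation (mi2 -> mi1).
    goal_model["MI"].add(mi2)
    goal_model["CR"][(mi2, mi1)] = impact_value
    # Post-condition: the completeness of mi1 is now higher.
    return goal_model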
Example: Adding measured items: Consider the goal model of the first example
(Figure 2). Suppose that the MI “Satisfaction of product” is not complete
enough. Therefore, this MI is augmented with the following underlying
Changing MI’s and causal relations can also be required by a change of strategy.
In contrast to the previous transformation, which is internally oriented through
the notion of completeness, a change of strategy is an external factor to the goal
model. Since organisations reside in dynamic environments, strategies need to be
frequently adjusted and sometimes rigorously changed. As said before, the goal
model forms a strategic map. Therefore, the goal model should properly reflect
the strategic goals and relations between them. Coherence between strategy and
goal model is, however, a subjective issue. This means that transformations that
aim at adjusting the goal model to the strategy heavily rely on the business model
of the individual manager.
Changes in PI’s may stem from several reasons, as shown in Wondergem and Wulferink (2002), and the corresponding transformations of the goal model therefore aim at different improvements. A first reason to alter PI’s is the availability of source data, with which the contents of the PI’s are filled. PI’s for
which the source data is not sufficiently (easily) available, may be deleted or
replaced by more suitable variants. Second, PI’s need to be recognisable, which
means that management needs to either be familiar or become familiar with the
PI’s. If PI’s are not recognisable, this hinders effective steering. Insufficiently
recognisable PI’s are therefore replaced by PI’s that better suit the manager’s
experience. As a third reason to change PI’s, we mention dysfunctional
behaviour (Birnberg et al., 1983). PI’s measure specific aspects of the
organisation’s performance, leaving room for “gaming”: giving insufficient
attention to other important issues. These issues can in turn be covered by PI’s
as well, resulting in a balanced set of PI’s. In addition, the measurement of PI’s
should leave little room for “smoothing” the results. PI’s which are automatically
delivered from a data warehouse may serve this purpose, since their computation
from source data is strictly specified. Fourth, PI’s need to be consistent in
definitions, providing the possibility to compare PI’s. This is for instance required
for benchmarking. Finally, PI’s, or rather the way in which performance is measured, should fit within the culture of the organisation. Measuring performance at the individual level, for instance, requires a culture that supports this. Otherwise, it may be seen as an intrusion on individual rights and as such may harm effective steering.
The organisation model reflects the explicitly stated scope managers have for
finding actions for improvement. Transformations of the organisation model may
Transformations of Handles
Handles form the connection between the organisation model and the goal
model. As such, they constitute the possibilities in the OM that a manager
explicitly considers for reaching a certain goal in the GM. In general, managers
start with a certain view on the handles. By experience, they learn that certain
handles do or do not work. This enhances or concentrates the set of options they
consider in steering. In that way, the set of handles is personally defined and
alters over time. In this respect, the professionalisation of management aims at
providing managers with the right set of handles.
Performance Management can be used to transform the control style and the
budgeting process of the organisation. In addition, it makes the current way of
working more explicit and thus transparent. This leads organisations toward a
result-driven style of management.
The control style and the organisation’s budgeting process can have an impact on the way an organisation sets up its performance management model. Different control styles can be identified. Goold and Campbell (as described in Strikwerda, 2000) have defined three control styles: strategic planning, strategic control and financial control. An organisation should select a dominant style to make clear what the organisation expects from the PMM (De Waal, 2001). In addition, this signals the dominant type of performance indicators which will be used. Financial control will primarily use financial indicators, while strategic planning requires information from non-financial sources as well.
The nature of the budgeting process influences the construction of the PMM. In
result-based budgets, as opposed to cost-based budgets, the relationship be-
tween performance and patterns of action-reaction is more complex. This
requires detailed insight into the causal relationship between the MI’s and the
links with handles. The use of a balanced scorecard will thus coincide with the
construction of more professional models of performance management.
Example: Results and indicators: The contact center from example two is directly responsible for the quality of contact only. The contact center has two goals: (1) quality of contact, and (2) costs. In the case of financial control, the contact center will mainly be reviewed on total costs. In the strategic planning style, not only the costs but also the quality becomes more important; Table 1 then reflects the relevant performance indicators. Organizations that change their control style from financial to strategic or vice versa thus also transform their performance management model.
The Deming circle, as stated before, is widely used to manage the process of
continuous performance improvement. The Deming circle is executed at strate-
gic, tactical and operational levels. The integration between the management
levels (vertical integration) and between the departments on the same level
(horizontal integration) should ensure that the actions are coordinated and all
directed toward the strategic goals. The Deming circle can be seen as the
operationalisation of management control. Management control consists of
several control subsystems each with their own view on strategy (Simons, 1995).
We mention two types: (1) the diagnostic control subsystem, where strategy is
seen as a plan and performance indicators are used as control object and (2)
interactive control, where strategy is viewed as a pattern of actions. In our
consulting practice, we see that performance management often solely focuses
on the diagnostic control system (Wondergem & Eskens, 2003). The diagnostic
control system, however, is a single-loop learning process, while successful
implementation of performance management requires at least a double-loop-
learning process (Kaplan & Norton, 1992). In single-loop learning, the actions
are directed towards realising the target of the PI’s. Double-loop learning
enables organisations to analyse and revise the assumptions underlying the PMM
and uses the evidence to define new norms. With double-loop learning, the
organisation can plan the strategy and the necessary actions to realise the
business strategy. In that, the organisation uses the PMM to operationalise the
business strategy and can evaluate the strategic map and the causal relation-
ships. Double-loop learning can make the connection between the diagnostic and
interactive control system. Finally, deutero learning can be distinguished (Argyris,
1982). Deutero learning is about the speed and quality of the learning process and
thus influences the flexibility and adaptability of the organisation. Considering the
increased dynamic nature of the business environment, we envision that deutero
learning will become a strategic necessity for many organisations.
In general, the business environment transforms from a make-and-sell environ-
ment (industry era) into a sense-and-respond environment (information era).
Changes in the environment make strategies obsolete. Therefore, the speed of evaluating the chosen strategy increases, and the speed and frequency of walking through the Deming circle should keep the same pace. As an effect, the strategic planning horizon shortens and budgets become obsolete sooner. As a
consequence, information should be available in real time and actions are
focused on short-term results. To make sure that the organisation develops the
right product features and is able to adapt to the changes in a flexible manner,
the performance results must be reviewed more frequently. We envision that,
instead of making a yearly budget, organisations make quarterly rolling forecasts
and align their resource allocation with the strategic requirements. This is in line
with the vision of the Working Council for CFO’s (WCC, 2001).
Example: The responsibility of the Contact Center for only the quality of the
contact is based on the assumption that the product (make and sell) and the
service of the product (sense and respond) can be separated and that the
Contact Center is only a service entrance and not a sales point. Nowadays,
however, customers do not separate sales and service; customers calling
a contact center also want information about products or even want to buy
products (cross-selling). Fulfilling customer needs with additional product
features (extra games for a Nintendo) has a high impact on the satisfaction
of the product.
When the satisfaction of the product is declining and the customer Contact
Center meets all of their goals (quality of contact, quality of process,
Conclusions
We have described a framework for analysing transformations in performance
management, including the performance management model. Using this frame-
work, several types of transformation were described. The framework describes
which elements of the organisation and its management model can be trans-
formed by performance management, as well as the factors that play a role in
the transformations. In addition, an initial description of the properties of the
transformations was given and future consequences for the organisation
were sketched.
This chapter has focused strongly on the information that is used in transforma-
tions in performance management, as formulated in the goal model and the
organisation model. As an additional aspect, the section about future directions
sketched possible paths of evolution for organisations that use performance
management. Actually using the information in PM was only briefly touched
upon in this chapter. It is, however, an important issue since it heavily influences
the success of the implementation of PM. The combination of insights into which
information is necessary for steering, how to organise performance management
and knowledge of effective ways of actually using the information, will provide
better means for successfully implementing performance management.
References
Argyris, C. (1982). Reasoning, Learning and Action: Individual and
Organisational. San Francisco, CA: Jossey-Bass.
Birnberg, J.G., Turpolec, L. & Young, S.M. (1983). The organisational context
of accounting. Accounting, Organizations and Society, 8, 111-130.
Chang, R.Y. & Morgan, M.W. (2000). Performance Scorecards. San Fran-
cisco, CA: Jossey-Bass.
COPC. (2002). COPC Performance Management System – Release 3.2B.
Customer Operations Performance Center, Inc.
Deming, W.E. (1982). Out of the crisis: Quality, productivity and competi-
tive position. Cambridge: Cambridge University Press.
De Waal, A. (2001). Towards world-class performance management.
Tijdschrift Financieel Management. In Dutch.
Harry, M.J. (1998). The Vision of Six Sigma, 8 volumes. Phoenix, AZ: Tri Star
Publishing.
INK. (2001). Manual for assessing the position of businesses. Zaltbommel,
The Netherlands: INK.
Kaplan, R. & Norton, D. (1992, January/February). The Balanced Scorecard
– Measures that Drive Performance. Harvard Business Review.
Kaplan, R. & Norton, D. (2000). The Strategy Focused Organization.
Harvard Business School Press.
Locuratolo, E. (2002). Designing Methods for Quality. Information Modelling
and Knowledge Bases XIII. IOS Press.
Mazur, G.H. (1993). QFD for Service Industries. Proceedings of the Fifth
Symposium on Quality Function Deployment, Novi, Michigan.
Mintzberg, H. (1983). Structures in Five: Designing Effective Organiza-
tions. Prentice Hall.
Mintzberg, H. (1991). Strategy and intuition – A conversation with Henry
Mintzberg. Long Range Planning, 24(2), 108-111.
Mintzberg, H. (1994). The rise and fall of strategic planning: Reconceiving
roles for planning, plans, planners. New York: The Free Press.
Chapter XI
Multimedia Conversion with the Focus on Continuous Media
Maciej Suchomski,
Friedrich-Alexander University of Erlangen-Nuremberg, Germany
Klaus Meyer-Wegener,
Friedrich-Alexander University of Erlangen-Nuremberg, Germany
Abstract
Introduction
Multimedia data are ubiquitous today. The formerly separated areas of music
recordings, radio, and television are all moving to digital formats, which in
essence means that all recordings are becoming data and can be stored and
manipulated as data. Standard data storage systems can be used for sound and
video, and both can be transmitted over computer networks. The multimedia
computer as the endpoint is beginning to replace facilities such as telephone,
radio, VCR, TV, and disk players.
While this looks like integration and simplification, the computers themselves are
anything but homogeneous. They are equipped with many different kinds of
displays and audio/video boards, not to speak of software. Hence, the same piece
of media content must be available in a large variety of formats. User require-
ments regarding platform and quality on one hand and resource limitations on the
other even increase this variety. The simplest way is to create copies in all known
formats, but this has many deficiencies, in particular when updates are neces-
sary. As an alternative, transformations are available. So it seems to be a useful
approach to keep media assets in a single copy and in a neutral format, and to
transform them on request into the format needed by a particular user. This is
even more useful in a large archive of media assets that is used by applications
on many different platforms, e.g., in an authoring or teaching scenario. Assume
for instance an archive of medical images and videos which must be kept without
any loss of information and thus will potentially be rather large. In lectures and
other presentations, however, a compressed version on a laptop computer will be
more appropriate. While some of the images and videos can be transformed
offline before the presentation, a discussion could create the need to access other
objects online. Then a transformation at the time of the request is unavoidable.
In the following, the term “media object” (MO) will be used for any kind of media data that belongs to a single medium, i.e., text, image, audio, or video. Of course, media objects can be combined into multimedia objects (MMO’s), but their handling must be clarified first. If a media object is available in one form and is then requested in another, it must be transformed. In this chapter, the term “conversion” will be used to denote all forms of transformation on multimedia data. Many conversion algorithms and programs are at hand, so they should be re-used in this context. In order to fulfill any kind of user request,
Fundamentals
This section provides technical definitions for the field under discussion. Obviously, the meaning of media data and multimedia data should be defined first. Media data are text, image (natural pictures, 2D and 3D graphics, 3D pictures), audio (natural sounds including human voice, synthetic sounds) and video (natural video, 2D and 3D animation, 3D video). These data have a special digital representation when used in computers, called a media object (MO). Multimedia data combine more than one of the mentioned media and are represented by multimedia objects (MMO’s).
While text and image are well known and reasonably simple to understand, the
emphasis here is on audio and video — called audio stream and video stream
respectively from now on. An audio stream consists of discrete values (samples)
usually obtained during the process of sampling audio with a certain frequency.
A video stream consists of images (frames) that have also been captured with
a certain frequency. The characteristic distinguishing these streams from other kinds of data is the time constraint: the occurrence of the events (samples or frames) is ordered, and the periods between them are constant. These characteristics specify the continuous properties of a data stream (and justify why they
are called time-dependent data or timed data). Thus a media or multimedia object
with this time constraint is usually referred to as a timed MO or a timed MMO
respectively. Because continuous properties are almost always present in
MMO’s, the term timed is usually skipped (just as non-timed is skipped when
referring to MO’s which are not time-dependent, e.g., images). By the way, the
terms “timed” and “time-dependent” are often interchanged with the term
“continuous”.
In reality audio-video streams (as well as non-timed objects) often contain
additional information that describes the stream (object) itself. It is called meta
information, and it includes among others: stream properties (e.g., duration, bit
rate) and quality properties (such as resolution, frame or sample rate, etc.). Meta
information heavily depends on the data it describes, and because each type of
media object differs from the other (e.g., stream properties are not present in
non-timed objects), meta information contains different properties. Because an
MMO consists of more than one MO, it must first identify the MO’s it includes
and second it must store some arrangement information (additional properties in
meta information), e.g., temporal and spatial layout.
In available specifications like MPEG-4 (Battista et al., 1999) and H.263 (ITU,
1996), the pictures (frames) of a video stream are grouped into groups of pictures
(GOP’s) and further into video sequences. To generalize this, a term is adopted
here from Gemmel et al. (1995): A quant is a portion of data that is treated as
one logical unit occurring at a given time. Representatives of quanta are a
sample, a frame, a text, or a combination of them (e.g., a GOP), etc. So, an
abstraction of the streams mentioned so far is a multimedia stream as a timed
data stream that consists of quanta.
In order to explain the transformation of MO’s and MMO’s in the following
sections, a logical description is needed. Some models of MO’s and MMO’s have
already been defined and discussed in the literature. It is common that each MO has a type, a format and a content, referred to as MO.type (e.g., text, audio, etc.), MO.format and MO.content, respectively. Within MO.format,
structure (e.g., frame rate, pixel depth, resolution) and coding scheme (e.g.,
QuickTime, MPEG-1/-2, DivX, XviD, MP3, AAC) can be further distinguished.
An MMO is structured similarly, but it adds to MMO.format data about relations
among the included MO’s, i.e., data on temporal/spatial relations.
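As a sketch (ours) of this logical description, MO's and MMO's can be written as simple data types; the field names mirror MO.type, MO.format, and MO.content:

from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class MediaObject:
    # MO.type (e.g., "text", "audio", "video"), MO.format (structure such as
    # frame rate or resolution plus the coding scheme), and MO.content.
    type: str
    format: Dict[str, Any]
    content: Any

@dataclass
class MultimediaObject:
    # An MMO bundles MO's and adds temporal/spatial relations to its format.
    parts: List[MediaObject]
    format: Dict[str, Any] = field(default_factory=dict)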
(Figure 1: a conversion process organised as a graph of converters; single converters are coupled into converter chains, and chains are combined into a converter graph.)
Based on this model of MO’s and MMO’s, all components of the conversion
process will now be described. This process is depicted in Figure 1. A converter
applies a conversion function to an MMO. It may add, remove, or change content
as well as format and media type of any part. A chain of converters couples
a few converters in order to perform a conversion process with a more complex
functionality. A chain is a directed, one-path, acyclic graph, i.e., a sequential line
of converters, which passes MMO’s from one node only to the next. A graph of converters consists of more than one connected converter chain. It is used, for instance, if an MMO must be split into MO’s in order to use media-specific converters, i.e., converters working only with a given type and/or format of media object.
Regarding timed (M)MO’s, this general model of conversion graphs must be
extended to reflect the timing. Instead of whole (M)MO’s, only quanta are
passed from one converter to the next in the chain, and that must be done at a
particular point in time, e.g., every 40 milliseconds for a video with 25 frames per
second. The main goal in modeling timed MO or MMO conversions is to define
such an extended model independent of hardware, implementation, and environ-
ment (Marder, 2002). In our opinion, a very promising model is that of jitter-
constrained periodic event streams proposed by Hamann (1997). The author in 2001 added jitter-constrained data streams, which suit multimedia very well. These streams consist of a time stream and a volume stream. The former is defined as t = (T, D, τ, t0), where T is the average event distance (the period), D the minimum distance, τ the maximum jitter (lateness), and t0 the starting point. Analogously, the volume stream is defined as s = (S, M, α, s0), where S is the average quant size, M the minimum quant size, α the maximum jitter (deviation), and s0 the initial value. Later, these models will be used in the deployment phase to derive the important characteristics of the conversion process.
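A sketch (ours) of the two stream tuples as data types; the derived average data rate at the end is an illustrative consequence of the model, not part of it:

from dataclasses import dataclass

@dataclass
class TimeStream:          # t = (T, D, tau, t0)
    T: float               # average event distance (period)
    D: float               # minimum distance between events
    tau: float             # maximum jitter (lateness)
    t0: float              # starting point

@dataclass
class VolumeStream:        # s = (S, M, alpha, s0)
    S: float               # average quant size
    M: float               # minimum quant size
    alpha: float           # maximum jitter (deviation)
    s0: float              # initial value

# 25 frames/s video: period 40 ms; average data rate = S / T.
t = TimeStream(T=0.040, D=0.030, tau=0.010, t0=0.0)
s = VolumeStream(S=15_000.0, M=4_000.0, alpha=30_000.0, s0=0.0)
print(s.S / t.T)  # average bytes per second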
Another important issue regarding the transformation of timed (M)MO’s is providing and controlling the level of quality. Quality of Service (QoS) is defined by the ITU-T as “a set of qualities related to the collective behavior of one or more objects.”
(Figure: end-to-end quality of service between client and server, with QoSNET denoting the network’s share.)
Related Work
Media-data transformations have been well-known issues for quite some time.
Many applications for converting or transforming audio and video can be found.
Some built-in support for media files is available in many operating systems
(OS’s). Because of that, they are sometimes called Multimedia OS’s, but usually
the support can neither meet all requirements nor handle all possible kinds of
media data. In order to better solve the problems associated with the large
diversity of formats, frameworks have been proposed, e.g., DirectShow, Java
Media Framework (JMF), CORBA A/V Streams, and the MME Toolkit. But to our knowledge, no published work develops a theory of multimedia transformations in a broad sense, i.e., one that takes the other solutions into account. Of course, the use of filters in multimedia networking and mobile communication has been covered extensively.
The pioneers, Pasquale et al. (1993) and Yeadon (1996), introduced some
generalizations of video transformations. Pasquale defined a filter as a trans-
former of one or more input streams of a multi-stream into an output stream,
where the output stream replaces the input streams in the multi-stream (Pasquale
et al., 1993). He classified filters into three groups regarding their functionality:
selective, transforming, and mixing. Yeadon (1996) presented five generic filter
mechanisms: hierarchical, frame-dropping, codec, splitting/mixing, and parsing.
He also proposed the QoS-Filtering Model which uses a few key objects to
constitute the overall architecture: sources, sinks, filtering entities, streams, and
agents. There are many other papers, e.g., Margaritidis and Polyzos (2000) and
Wittmann and Zitterbart (1997), but they follow or somehow adopt the above-
mentioned classifications of the pioneers. All of them consider only the commu-
nication layer (networking) aspects, which is not sufficient when talking about
multimedia transformations.
The Microsoft DirectX platform is an ideal example of media transformations in
an OS-specific environment. The most interesting part of DirectX is DirectShow
(Microsoft, 2002b), which is responsible for dealing with multimedia files —
especially audio/video. It uses a filter-graph manager and a set of components
working with different formats. These are specially designed “filters” (also
called media codecs). Filter graphs are built manually or automatically (Microsoft,
2002a). Unfortunately, DirectX is only available under one OS family, and it does not support QoS or real-time, so its use on the client side is limited.
The Java Media Framework (JMF) by Sun (Sun Microsystems, 1999) is a
competitor of MS DirectShow. JMF uses processors (similar to filter graphs)
that are built from controls (filters) and are ordered in transformation chains.
Processors can be configured with suitable controls by hand or on the basis of
processor models. In contrast to filter graphs, processors can be combined with
each other. JMF is not limited to just one OS, but it does not support QoS or real-
time, either.
Posnak et al. (1997) proposed an adaptive framework for developing multimedia
software components called the presentation processing engine (PPE) frame-
work. PPE relies on a library of reusable modules implementing primitive
transformations (Posnak et al., 1996). They also proposed a mechanism for
composing processing pipelines from these modules.
Another work of some importance is VirtualMedia (Marder, 2000). It defines a
theory of multimedia metacomputing, i.e., a new approach to the management
and processing of multimedia data in web-based information systems. Marder
(2001) offered a solution for application independence of multimedia data by
introducing an advanced abstraction concept (called transformation indepen-
dence). It includes several ideas like device independence, location transpar-
ency, execution transparency, and data independence. In Marder (2002), the
author presented an approach to construct a set of connected filters, a descrip-
tion of the conversion process, and an algorithm to set up the conversion graph.
The transformation issues are solved by using individual signatures (media signatures as well as filter signatures). Unfortunately, a proof-of-concept implementation is still missing.
Other work in the field of audio/video transformation relates to the concept of
video transcoding (Kan & Fan, 1998; Keesm et al., 1996; Morrison, 1997), a
method allowing for interoperability in heterogeneous networks by changing
format, resolution, and/or transmission rate. So, they refer to a converter as a
transcoder. Dogan (2002) talks about video transcoding (VT) in two aspects:
homogeneous and heterogeneous. Homogeneous VT only changes bit rate,
frame rate, or resolution, while heterogeneous VT allows for transformations
between different formats and networks topologies, i.e., different video stan-
dards like H.263 and MPEG-4. Dogan (2002) gives a good overview and proposes a solution, but he covers only H.263 and MPEG-4 and does not address transformations involving other standards.
Last, but not least, related work that must not be omitted is an open-source
program for audio/video transcoding (Östreich, 2003). It is called “transcode,”
and it is still under heavy development, but a stable version is available. The goal
is to produce a utility for video-stream processing that can be run from a Linux
text console. The approach uses raw (uncompressed) data between input and
output, i.e., transcoding is done by loading modules that are either responsible for
decoding and feeding transcode with raw video/audio streams (import modules),
or for encoding the frames (export modules). Up to now, the tool supports many
popular formats (AVI, MOV, ES, PES, VOB, etc.) and compression methods
(video: MPEG-1, MPEG-2, MPEG-4/DivX/XviD, DV, M-JPEG; sound: AC3,
MP3, PCM, ADPCM), but it does not support real-time.
Summarizing, there are interesting solutions for media transformations that are
ready to be applied in certain fields, but still there is no solution that supports QoS,
real-time, and format independence in a single framework.
Modeling Conversions
The new idea here is to build a framework for managing transformations of
MMO’s that work in real-time if necessary and guarantee QoS. An architecture
based on conversion graphs is defined that includes an abstract model of
converters, categories of conversions, and a processing model for converters
(Schmidt et al., 2003; Märcz & Meyer-Wegener, 2002).
The first thing needed to transform MMO’s with QoS constraints is a basic
converter model. It must describe the conversion itself, the transport of data
during the conversion process, and the QoS guarantees. Due to the high diversity
and complexity of multimedia conversions, existing converters are used. In a
case of timed MMO’s however, they must be adjusted to provide real-time and
QoS.
In general, a converter can be regarded as a black box. It converts incoming media objects moi, 1 ≤ i ≤ n, to outgoing media objects mo'k, 1 ≤ k ≤ m (Figure 3). Neither input nor output is restricted to just one object, and the numbers need not match. In many cases, n and m are less than three.
The functionality of the conversion is described by a set of conversion functions C^C_k : mo'_k = C^C_k(mo_1, mo_2, ..., mo_n). Generally these functions consist
of three parts. First, each function maps the parameters of the incoming
stream(s) to those of its outgoing stream. For timed MO’s, these parameters are
defined by the model of jitter-constrained periodic streams and include average
quanta size with jitter (for volume streams) or average event distance with jitter
[Figure 3. A converter as a black box: incoming media objects mo_1, mo_2, ..., mo_n are transformed into outgoing media objects mo'_1, mo'_2, ..., mo'_m]
(for time streams). Second, the functions describe the format conversions F^C_k : f'_k = F^C_k(f_1, f_2, ..., f_n), which map the incoming media-object format description(s) (f_i = mo_i.format) to the outgoing media-object format description (f'_k = mo'_k.format). As a third part, the incoming media objects are themselves converted into the outgoing media object. This is given by the converter code, in contrast with the other parts, which must be defined during a converter analysis based on a given converter.
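To make this three-part model concrete, here is a minimal Python sketch of a converter as a black box; MediaObject, Converter, and all field names are illustrative assumptions, not notation from the chapter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class MediaObject:
    """A media object mo with a media type, a format description
    f = mo.format, and content."""
    mtype: str                 # e.g. "video", "audio", "text"
    format: Dict[str, object]  # format description, e.g. {"codec": "MPEG-4"}
    content: object            # the (possibly compressed) data itself

@dataclass
class Converter:
    """Black box mapping n incoming media objects to m outgoing ones."""
    # F^C_k : (f_1, ..., f_n) -> f'_k   -- format conversion functions
    format_functions: List[Callable[[List[Dict]], Dict]]
    # C^C_k : (mo_1, ..., mo_n) -> mo'_k -- conversion functions
    conversion_functions: List[Callable[[List[MediaObject]], MediaObject]]

    def convert(self, inputs: List[MediaObject]) -> List[MediaObject]:
        # every outgoing object mo'_k sees the complete input tuple
        return [cc(inputs) for cc in self.conversion_functions]
```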
Categories of Transformations
To organize the set of all converters, their conversion functions are grouped in
categories with individual features (Figure 4). The first category is that of
media-type changers. Converters in this category transform the media type of
one of the incoming media objects (mo_i.type) into another media type for an outgoing media object (mo'_k.type), where ∃i,k : mo_i.type ≠ mo'_k.type and f_i ≠ f'_k and mo_i.content = mo'_k.content. Because two different media types do not have
common formats, the format of the media object must be changed, too. The
content, however, should remain the same, which is naive in many real applications but is the goal of this operation. A typical example is a media-type change forced by hardware limitations (no audio device) or needed by clients with disabilities. Hence, speech recognition is used to turn audio into text, and text-to-speech "readers" do it the other way around.
Converters of the second category do not change the media type, but the format
of the media object. They are called format changers, and are described by
∃i,k : mo_i.type = mo'_k.type and f_i ≠ f'_k. While not changing the content (mo_i.content = mo'_k.content) is the goal, it may not be possible in reality, because
[Figure 4. Categories of converters: media-type changers, format changers, and content changers]
some formats require compression and thus a loss of information. The content
of mo_1 and mo_2 is considered to be the same if mo_1 can be transformed into mo_2 and vice versa without using external knowledge or information. So format changers are split into two subcategories, namely lossless and lossy. Lossless changers have an inverse conversion function C'^C_h with ∀i ∃h : mo_i = C'^C_h(C^C_1(mo_1, mo_2, ..., mo_n), C^C_2(...), ..., C^C_m(...)). For lossy conversions, the format change implies
a content loss which must be known to the clients and must be acceptable for
them. More details can be found in Marder (2002). Examples of this category are
the typical encoders and decoders, many of which compress media objects (e.g.,
DivX, MPEG, H.26x, RLE). A format change can be caused by hardware
limitations (channel coding, colour reduction, down-sampling) or by user demand
(resizing).
Finally, the third category of converters comprises those which explicitly change the content of a media object, while the format is not affected: ∃i,k : mo_i.type = mo'_k.type and f_i = f'_k and mo_i.content ≠ mo'_k.content. They are called content
changers. Examples are editors, edge markers, high-pass or low-pass filters.
A very special case of content change is the dropping of quanta, which can occur
because of resource limitations. In order to guarantee a certain QoS, uncon-
trolled quanta loss must be avoided. However, a special converter can be
scheduled as a lossy converter with a controlled (statistical) quanta loss. This
often means lower resource requirements for the whole converter graph,
compared to the guarantee of no quanta loss. Normally these lower resource
requirements are preferred, but the decision between lossless and lossy delivery
must be made by the user (accepting the quanta loss). This kind of content change makes sense in particular if the quanta loss is not even noticed by the user, that is, if the same quality of experience (QoE) is produced.
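Building on the MediaObject sketch above, the three categories can be distinguished by exactly these predicates; the following classifier is an illustration of ours, not code from the chapter.

```python
def categorise(mo_in: MediaObject, mo_out: MediaObject) -> str:
    """Classify one input/output pair by the criteria defined above."""
    if mo_in.mtype != mo_out.mtype:
        # different media types never share a format, so the format
        # changes as well; ideally the content stays the same
        return "media-type changer"
    if mo_in.format != mo_out.format:
        # further split into lossless/lossy depending on whether an
        # inverse conversion function C'^C_h exists
        return "format changer"
    if mo_in.content != mo_out.content:
        return "content changer"
    return "identity"
```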
As described above, timed MO’s are structured into quanta (frames, samples,
GOP’s, etc.). Transfer is then done quant by quant, forming a streaming process.
In this process, a converter reads a sequence of quanta (or just one quant) from
its input stream, processes these quanta and writes a sequence of quanta into its
output stream. Here it is no longer sufficient to treat a converter as a black box,
a closer look is needed. Hence, a converter is broken down into an input, an
output, and a conversion (or processing) part. The conversion part is treated as
a black box again. These three parts are organized in the main loop, which is the
characteristic structure of each streaming converter (Figure 5).
In general, the main loop starts after initialization with reading the first quant
(beginning of stream), iterates over all other quanta, and ends with reading the
last quant (end of stream) and writing the associated output.
[Figure 5. The characteristic structure of a streaming converter: after the converter begins, the main loop processes quanta until the end of stream]
For timed MO's, the main loop operates periodically, so there is a time limit for the processing of each
quant. Since size and contents of the quanta may differ significantly, the
processing time varies. Some quanta are processed completely before the end
of the period, while others need more time than available. It has been observed
that the maximum cumulative deviation from the end of the period is much more
useful in calculating the effects than the distribution of processing times. This is
described by the jitter-constrained periodic time streams that have already been
introduced.
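A minimal sketch of this main loop, assuming callables read_quant, process, and write_quant provided by the import, conversion, and export parts; none of these names come from the chapter.

```python
import time

def run_converter(read_quant, process, write_quant, period: float) -> None:
    """Characteristic main loop of a streaming converter (sketch).

    read_quant() returns the next quant or None at end of stream;
    process() is the conversion part, still treated as a black box;
    period is the time budget per quant for timed media objects."""
    cumulative_deviation = 0.0
    quant = read_quant()                     # beginning of stream
    while quant is not None:
        start = time.monotonic()
        write_quant(process(quant))          # convert and write output
        cumulative_deviation += (time.monotonic() - start) - period
        # individual quanta may run longer than the period; the model
        # bounds only this cumulative deviation (the time jitter)
        quant = read_quant()                 # next quant / end of stream
```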
While they model the time jitter and thus the maximum lateness, it is equally
important to handle the deviation of transferred data volumes, e.g., for the
calculation of buffer sizes. This is described as a jitter-constrained periodic
volume stream. The jitter in volume streams results from different quant types
and compression levels. In summary, each converter instance must have a
description of all incoming media objects moi and all outgoing media objects mo’k
as jitter-constrained periodic time and volume streams. Please note that these
descriptions are data-dependent, i.e., the streams are different for different
media objects. It is then necessary to find mappings from the description of the
input streams to the description of the output streams.
To achieve that, the use of resources by the converter must be taken into
account. All resources are managed by the underlying operating system.
Resource usage can again be characterized by jitter-constrained periodic time
and volume streams. Processing the quanta is done in periods with jitter, hence
the use of resources is also periodical (in general with a shorter period), and the
volume of data handled varies around an average size (which may be smaller than
the average quant size), so the same kind of model can be used.
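The chapter does not list the stream parameters at this point, but a jitter-constrained periodic stream can plausibly be sketched with an average period and quant size plus two jitter bounds; the parameter names and the conservative buffer bound below are our assumptions.

```python
from dataclasses import dataclass
import math

@dataclass
class JCPStream:
    """Jitter-constrained periodic stream (illustrative parameters)."""
    period: float        # average distance between quanta (time stream)
    time_jitter: float   # bound on cumulative deviation from the period
    quant_size: float    # average quant size (volume stream)
    size_jitter: float   # bound on cumulative deviation of the volume

    def backlog_quanta(self) -> int:
        """How many quanta may pile up when production runs maximally
        early and consumption maximally late (a conservative guess)."""
        return math.ceil(self.time_jitter / self.period) + 1

    def buffer_bound(self) -> float:
        """A conservative buffer size estimate."""
        return self.backlog_quanta() * self.quant_size + self.size_jitter
```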
Model-Specific Issues
Several methods can be used to combine converters into chains. The first assumes that converters are coupled by interfaces. This means that the
format of the output data of one converter is accepted as input by the consecutive
converter. In other words, the interfaces of output and input (metadata of
quanta) must be the same (or at least must allow for the creation of a connection
that makes sense). This means that the input interface of the subsequent
converter includes or extends the output interface specification of the previous
converter. For instance, if the first converter produces quanta in YCrCb colour
format, the second converter must accept this format (it may accept other
formats as well).
The second method is based on the functions of the converters. Each converter
performs a specific task. The managing application stores information on
functions of available converters and on functionally correct chains (e.g., colour
conversion, resizing). Based on this, converters that have the requested function-
ality are chosen to build a chain. For example, suppose display on screen (RGB colour space) with a resolution and colour space different from those of the source is requested.
So, converters doing resizing and converters doing colour conversion are chosen
in order to build a conversion graph for this task.
Both methods have their limitations. In particular, interface-based matching might lead to a logically correct chain that does not provide the requested functionality. Similarly, a chain that is logically correct with respect to functionality may contain interfaces that do not match, so that no quant can be passed from one converter to the next (that would require a compatible type of quant). In order
to be on the safe side, a third and most reasonable method is defined. It combines
both methods and builds a chain using the functions as well as the interfaces of
the converters. The result is a logically correct chain in all respects, i.e., function
and data correctness are provided. The way to build up correct conversion
graphs respecting both functionality and interfaces is given by a couple of
optimisation algorithms, e.g., Greedy, Simulated Annealing or Evolutionary
Algorithms (Michalewicz & Fogel, 2000). In Marder (2002), which was already
described as related work, signatures for media objects and converters (alias
filters) were introduced. Based on these signatures, a set of functionally correct
conversion graphs can be found with the aid of optimisation algorithms. The work
also provides information on how to use these algorithms. Additionally, converter-
graph transformations can be used for further optimisations.
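A rough sketch of the combined method, assuming each converter is described by a function name and input/output interface sets; the greedy depth-first search stands in for the optimisation algorithms mentioned above, and all names are illustrative.

```python
from typing import List, Optional, Set, Tuple

# A converter description: (function name, accepted input interface,
# produced output interface).
Conv = Tuple[str, Set[str], Set[str]]

def build_chain(convs: List[Conv], source: Set[str],
                wanted: List[str]) -> Optional[List[Conv]]:
    """Depth-first search for a chain realising the requested functions
    in order while keeping interfaces compatible (the combined method)."""
    if not wanted:
        return []
    for fn, accepts, produces in convs:
        # function must match the next requested step, and the previous
        # stage's output interface must be included in the accepted input
        if fn == wanted[0] and source <= accepts:
            rest = build_chain(convs, produces, wanted[1:])
            if rest is not None:
                return [(fn, accepts, produces)] + rest
    return None

# e.g. build_chain(
#     [("resize", {"yuv"}, {"yuv"}), ("colour", {"yuv"}, {"rgb"})],
#     source={"yuv"}, wanted=["resize", "colour"])
# -> a two-stage chain ending in RGB output
```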
With a given conversion graph, the next step is to guarantee a certain QoS
(Claypool & Tanner, 1999; Abdelzaher & Shin, 1999; Campbell, 1996). This for
example means that only a limited number of frames in a video stream are
dropped, or that the jitter in frame presentation or recording must not exceed a
given maximum.
Here, a real-time environment with an efficient resource scheduling is needed
that can only be provided by a real-time operating system (RTOS) (QNX, 2001;
Härtig et al., 1998; Yodaiken & Barabanov, 1996). The task of the multimedia
transformation is to provide the scheduling parameters to the RTOS and to perform admission control to evaluate a converter graph as schedulable or non-schedulable.
This means that some system resources are scheduled at application level
(Märcz & Meyer-Wegener, 2002).
Resource requirements can be divided into two parts, a static part which does not
depend on time (e.g., memory), and a dynamic part referring to active resources
which depends on time (e.g., CPU, DMA, busses). While the static part is
described by the volume required over the whole transformation process, the
dynamic part is treated as a bandwidth (resource use per second). “Bandwidth”
here means the number of operations or the amount of data that must be
processed or transmitted in the time interval between quant arrival (before
processing) and quant departure (after processing). Processing a quant must not
affect the resources needed for the next quant. Hence, the time interval between
quant arrival and quant departure must be long enough to handle all possible
operations on a quant. This can be determined by worst-case analysis or —
considering the high jitter in execution times of quant processing — by mean
values plus a bounded jitter. Both worst-case and mean-plus-jitter can be
described as jitter-constrained periodic streams.
Now, a significant part of the scheduling is the description of the converters
involved, including resource requirements as jitter-constrained periodic streams,
which must be done separately for each converter. This description is easy to
find for converters without content dependencies, i.e., the resource require-
ments can be derived directly from the parameters of the input formats fi =
moi.format. Otherwise the resource requirements of a converter depend on the
media object itself, i.e., the format fi plus the content moi.content. At the moment,
these content-dependent resource requirements are merged into the jitter value
of the jitter-constrained periodic streams. The calculation of an accurate jitter
value from format fi and some characteristic content-description parameters
(e.g., motion complexity) is yet to be developed.
For each input object mo_i, the function R^C_r : s_r = R^C_r(mo_i) yields the dynamic requirements s_r of converter C for resource r, again described as a jitter-constrained periodic stream.
∀r : ∑_{C running} Q^C_r ≤ C_r .

∀r : ∑_{C running} Q^C_r + ∑_{C ∈ new chain} Q^C_r ≤ C_r ,
which means that the additional requirements of the new converter chain must be less than or equal to the available resources, or equivalently, the sum over all chains, including the new one, must be less than or equal to the system capacity. In some situations (with dependencies between two dynamic resources) this is only a necessary condition.
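The admission condition translates directly into a bandwidth check; a minimal sketch, assuming per-converter requirement maps Q and a capacity map C (names are illustrative):

```python
from typing import Dict, List

def admissible(running: List[Dict[str, float]],
               new_chain: List[Dict[str, float]],
               capacity: Dict[str, float]) -> bool:
    """Necessary admission condition: for every dynamic resource r, the
    bandwidths Q_r of all running converters plus those of the new chain
    must not exceed the capacity C_r."""
    for resource, c_r in capacity.items():
        demand = sum(q.get(resource, 0.0) for q in running + new_chain)
        if demand > c_r:
            return False   # the new chain would over-use this resource
    return True

# e.g. admissible([{"cpu": 0.4}], [{"cpu": 0.3}, {"bus": 0.2}],
#                 {"cpu": 1.0, "bus": 1.0})  ->  True
```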
Evaluation
Many projects have been started by researchers all over the world, but they have
followed different development directions such as multimedia transformations
on gateways in heterogeneous networks, transformations on client-side soft-
ware and hardware systems, end-to-end AV conversion solutions for telecom-
munications, etc. Here, the work focuses on the multimedia database aspect.
The goal of obtaining format independence for stored multimedia data has
provided the background for the investigation of conversion. Two ongoing
projects, memo.REAL (Märcz, 2003) and RETAVIC (Suchomski, 2003), are the
means to evaluate the models of conversion. An overview is given in Figure 7.
[Figure 7. Overview of the memo.REAL and RETAVIC projects: analysis and benchmarking of converters and media data feed the models (data, conversion, resources), which build on jitter-constrained periodic streams; these support the adaptation to real-time and QoS (rate control) as well as scheduling and admission (bandwidth); converters run on the RTOS (DROPS) via the component streaming interface (CSI)]
The tasks covered by each of the projects are marked in grey, and the
surrounding ellipses represent exterior factors.
The first step is analysis and benchmarking of existing converters with available
media data. The outcome is used in a modeling phase that covers three different
aspects: data, conversion, and resource models (by applying the theory of jitter-
constrained periodic streams). The next step is to develop scheduling and
admission control methods that are strongly influenced by RTOS. In these
projects, DROPS (Härtig et al., 1998) is used. The execution of a conversion
process is also placed in an RTOS, which serves as a fully-controllable run-time
environment. To enable a converter to run in an RTOS, it has to be adapted to
this environment, i.e., it has to be extended to use the component streaming
interface (CSI), which is defined especially for controlling and real-time oper-
ability (Schmidt et al., 2003). It can further be used by a control application. So,
the process of developing real-time converters requires the following inputs: a
given non-real-time converter, the analysis of behaviour, and the description in
terms of the defined models. Moreover, it is very likely that in the implementation, the model and the adaptation will influence each other.
Summarizing, the thesis works of five students (master's theses and study projects) have already been finished. They have contributed to RETAVIC and memo.REAL by producing usable results that support the theoretical ideas. Namely, a benchmark for AV compression, a bit-rate control algorithm for an
Conclusions
While the overall subject of multimedia conversion is very complex and thus still
includes unsolved problems, some initial solutions are available. It is common
knowledge that converters should be organized in chains or graphs instead of
building new converters for each task. In order to make such a conversion graph
executable in a real-time environment, models are needed for the different
components involved. This includes the data being processed. It consists of
quanta which are manipulated and transferred in periods. The converters used
must be described by their functionality, i.e., the mapping of input to output, as
well as the resource requirements. The model of jitter-constrained periodic
streams turned out to be very useful here, for the timing and for the data volumes
handled. This model is rather simple in that it only needs four parameters per
stream. Still it allows for a variety of derivations needed in this context. Based
on such a description, initial scheduling of conversion graphs in a real-time
environment becomes possible. A simple bandwidth model allows us to prevent
the over-use of resources.
Much more work is required. The models need to be refined. In parallel, a system
is being built to evaluate the decisions made on their basis. This will significantly
influence the work on the models, because their simplicity is an advantage that
should not be given up without reason.
References
Abdelzaher, T.F. & Shin, K.G. (1999). QoS Provisioning with qContracts in Web
and Multimedia Servers. Proceedings of the 20th IEEE Real-Time
Systems Symposium, Phoenix, AZ, USA, December 1-3, (pp. 44–53). Los
Alamitos, CA: IEEE Computer Society.
Battista, S., Casalino, F. & Lande, C. (1999). MPEG-4: A Multimedia Standard
for the Third Millennium, Part 1. IEEE Multimedia, 6(4), 74-83.
Pasquale, J., Polyzos, G., Anderson, E. & Kompella, V. (1993). Filter Propaga-
tion in Dissemination Trees: Trading Off Bandwidth and Processing in
Continuous Media Networks. In D. Shepherd, G.S. Blair, G. Coulson, N.
Davies & F. Garcia (Eds.), Network and Operating System Support for
Digital Audio and Video, Fourth International Workshop, NOSSDAV
’93, (pp. 259-268).
Plagemann, T., Saethre, K.A. & Goebel, V. (1995). Application Requirements
and QoS Negotiation in Multimedia Systems. Proceedings of Second
Workshop on Protocols for Multimedia Systems (PROMS’95), Octo-
ber. Salzburg, Austria.
Posnak, E.J., Lavender, R.G. & Vin, H.M. (1997). An Adaptive Framework for
Developing Multimedia Software Components. Communications of the
ACM, 40(10), 43-47.
Posnak, E.J., Vin, H.M. & Lavender, R.G. (1996). Presentation Processing
Support for Adaptive Multimedia Applications. In M. Freeman, P. Jardetzky
& H.M. Vin (Eds.), Proceedings of SPIE Vol. 2667, Multimedia Computing and Networking 1996, San Jose, CA, USA, January 29-31 (pp. 234-245).
QNX. (2001). QNX Neutrino RTOS (version 6.1). QNX Software Systems
Ltd.
Schmidt, S., Märcz, A., Lehner, W., Suchomski, M. & Meyer-Wegener, K.
(2003). Quality-of-Service-based Delivery of Multimedia Database Ob-
jects without Compromising Format Independence. Proceedings of the
Ninth International Conference on Distributed Multimedia Systems,
Miami, FL, USA, September 24-26.
Suchomski, M. (2003). The RETAVIC Project. Retrieved July 25, 2003, from
the WWW: http://www6.informatik.uni-erlangen.de/retavic/
Sun Microsystems, Inc. (1999). Java Media Framework API Guide (Nov. 19,
1999). Retrieved January 10, 2003, from the WWW: http://java.sun.com/products/java-media/jmf/2.1.1/guide/
Tsinaraki, Ch., Papadomanolakis, S. & Christodoulakis, S. (2001). A Video
Metadata Model supporting Personalization & Recommendation in Video-
based Services. Proceedings of the First International Workshop on
Multimedia Data and Document Engineering (MDDE 2001).
Wittmann, R. & Zitterbart, M. (1997). Towards Support for Heterogeneous
Multimedia Communications. Proceedings of 6th IEEE Workshop on
Future Trends of Distributed Computing Systems, Bologna, IT, Octo-
ber 31–November 2. Los Alamitos, CA: IEEE Computer Society.
Yeadon, N.J. (1996). Quality of Service Filtering for Multimedia Communi-
cations. Ph.D. thesis. Lancaster, UK: Lancaster University.
Chapter XII
Coherence in
Data Schema
Transformations:
The Notion of Semantic
Change Patterns
Lex Wedemeijer, ABP Pensioenen, The Netherlands
Abstract
Introduction
Background
Limited Models
[Figure: symbolic models in the conceptual realm, among them the Conceptual Schema]
Consequently, two CSs may cover the same part of reality but their specifica-
tions may be incompatible due to some fundamental differences in their
underlying data model theories (McBrien & Poulovassilis, 1998). Such theoretic
differences do not derive from the outside world and do not reflect evolution of
environment in any way (Bézevin, 2000). Even if the data model theory is firmly
fixed, two CSs modeling the same UoD may differ. This is because many data
model theories are “rich”, i.e., they provide alternative ways to capture and
model a particular real-world feature (Saiedian, 1997; Knapp, 1998). Hence, a
real-world feature can be modeled first in one way and later in another, even if
there is no change in the real world. Again, such differences, commonly referred
to as semantic heterogeneity or discrepancy (Kent, 1991; Tseng, Chiang &
Yang, 1998), do not derive from the outside world and do not reflect evolution
of environment in any way.
Regrettably, standard data model theories such as E-R and UML come in many
variants and can permit a real-world feature to be modeled in several ways.
Moreover, these data model theories may cover aspects that we consider non-
semantic, such as primary key composition. Hence, to present our results, we
cannot rely on standard data model theories. Instead, we have to employ a
semantic data model theory that is “essential” as opposed to “rich” to avoid the
kinds of theoretic differences and heterogeneities as explained above. Main
features of this data model theory are outlined in the appendix.
We emphasize that we focus on the operational life, not on the design phase of
the CS life cycle. Several authors have studied the notion of design primitives
(Batini, Ceri & Navathe, 1992; Fernandez & Yuan, 2000; Hartmann, 2000). In
CS design, an abstracted model is created and gradually adjusted and improved
to capture the semantics of the UoD. However, our interest lies with the
operational CS. We regard design adjustments to be part of the schema development phase, and do not consider them true CS evolution. The amount and
characteristics of adjustments in the design phase are a hallmark of the
designer’s ability and experience in modeling, rather than an expression of real
changes in the UoD and corresponding evolution of the operational CS.
First principles say that the CS ought to change only if the information structure
of the UoD changes. Literature has it that major business events like mergers,
acquisitions and diversifications are the change drivers that cause change in the
CS (ANSI/X3/Sparc, 1975; Galliers, 1993; Orna, 1999). In theory, CS changes
can only be justified by true structural changes in the UoD. However, we feel
that it is too limited a view to study only justified changes in the CS. Other events,
such as downsizing to another database platform, or a general change in
modeling guidelines, may present engineers with legitimate opportunities to
change the CS. Such CS changes may be unjustified, but in practice, they are
quite common. The same principles also suggest that the timing of a CS change
must coincide with the change driver. However, it is our experience that some
CS changes may be postponed for quite some time, while other changes may
even precede the change in the UoD. The currency conversion to Euro is an
example where CSs have been prepared well in advance of an upcoming change
in the real world.
analyzing the possible changes of constructs in the data model theory, a research
approach referred to as taxonomy (Roddick, Craske & Richards, 1993).
However, Brèche (1996) already pointed out the gap between “schema changes
by means of primitives closely related to the respective data model,” sometimes
referred to as syntactic changes, and what he calls “advanced primitives,” i.e.,
changes of a more semantic nature.
Indeed, the example of adding a single “entity” construct into a CS is fictitious:
it never happens in operational systems. An entity is added only if it has some
significance in the CS, and therefore the new entity will always be related to
something else in the schema. We feel that an understanding of syntactic
changes only is inadequate (Liu, Chryssanthis & Chang, 1994). The shortcomings lie both in a lack of real-world semantics and in ignoring the impact of change.
Change is Data-Preserving
Problem Statement
The majority of current approaches for schema change are targeted towards the
CS design phase. However, schema changes in the “systems development
phase” are fundamentally different from the kinds of changes that are necessary
in the operational phase. Therefore, it cannot be assumed that such design
approaches will suit the needs of maintenance engineers. Indeed, accommodating a change in a running information system and database calls for different techniques than changing a system still under development.
Design approaches fail to systematically incorporate the knowledge of actual
changes as observed in operational CSs. The customary design approach is to
create a CS design early in the design phase, and not to alter it thereafter.
However, semantic change can and will occur in later phases of the CS life
cycle.
Theoretic approaches to schema-evolution and change taxonomies are primarily
concerned with the potential for change that the data model theory may bring
about in the CS. They do not yield insight into the types of actual changes in
operational CSs or their complications.
CS Change in Business
carefully considered, not only for the stored data but for other components of the
information system as well: existing business procedures, applications and user
interfaces, user training, etc. It is no surprise that engineers try to keep the impact
of change as small as possible. In effect, they aim for an adapted CS that is an
adequate model of the new UoD, but at the same time as close to the old CS as
possible. Indeed, the major difference between CS change in the operational
phase versus design is the need for data coercion. In design, no operational data
has yet been stored and hence, data coercion is irrelevant.
It is our experience that many CS modifications are not new and innovative, but
appear to be modeled on some basic “model to be copied” whenever appropriate.
A pattern, as commonly understood in information systems engineering, is an
abstracted, generalized solution that solves a range of problems with shared
characteristics. Gamma, Helm, Johnson and Vlissides (1995) have it that: “each
pattern describes a problem (..), and then describes the core of the solution to that
problem, in such a way that you can use this solution a million times over, without
doing it the same way twice.”
We understand the notion of semantic change pattern to be: any change in the
operational CS that is coherent, has unambiguous semantics, and is accommo-
dated in a single maintenance effort.
The notion of patterns is not new to the database community; however, the patterns described in current literature concern design only. The semantic change patterns that we will describe consider the operational life-cycle phase, and account for both the level of CS semantics and the stored data. We restrict ourselves, however, to the conceptual level of the stored data. We do not digress to either
External Schemas or the implementation level of the Internal Schema.
Pattern Detection
A few patterns are outlined in some detail, but due to space limitations, we cannot
describe all patterns in full detail. Instead, we simply show the patterns as a set
of diagrams: before and after the change. We add a brief description to convey
the essence of the pattern, so that the reader will be able to apply the change
patterns in a practical situation by adjusting the appropriate pattern to meet the
needs at hand. We express the diagrams in a variant of the Entity-Relationship
data model theory, but this does not limit the validity of the patterns. The
semantics of the change patterns will still be valid in other data model theories,
the only differences being in syntax.
Append an Entity
This semantic change pattern is perhaps the most familiar one. It clearly
demonstrates the schema extensibility that is often alluded to.
• The pattern applies when the relevant real world extends, or equivalently,
when users perceive it to be extended. A typical example is schema
integration: the real world is unaffected, but user perception thereof is
extended and users require that extra information be captured that extends
on data captured about existing real-world objects.
• The pattern syntax is to insert a new entity, and add a reference to an
existing entity that now becomes an “owner” entity. A new specialization
is induced in the existing entity, i.e., the set of instances that is being
referred to by the new member entity. Data manipulation routines must be
adjusted in order to ensure referential integrity when owner instances are
updated or deleted.
• The pattern comes with little or no complications, and no conversion of
stored data is needed.
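As an illustration only, the pattern can be mimicked on a toy schema representation; the dictionary layout and the function name are our assumptions, not notation from the chapter.

```python
def append_entity(schema: dict, new_entity: str, owner: str) -> dict:
    """Apply the 'Append an Entity' pattern to a toy schema model.

    schema maps an entity name to its attribute list and its references,
    each reference being a (member, owner) pair with N:1 cardinality."""
    if owner not in schema:
        raise ValueError("the pattern extends an *existing* entity")
    schema = dict(schema)  # shallow copy; the old schema stays intact
    schema[new_entity] = {"attributes": [],
                          "references": [(new_entity, owner)]}
    # the owner acquires an induced specialisation: those of its
    # instances that are referred to by the new member entity.
    # No stored data needs conversion; the new entity starts out empty.
    return schema
```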
Superimpose an Entity
Connect by Reference
Connect by Intersection
On the CS level, elimination is the reversal of the Append pattern; on the data level, however, it is not. There are numerous reasons why information in the CS may
turn obsolete, but obsolescence alone does not constitute a need to change the
CS. Indeed, eliminations are rarely committed as soon as possible, but are often
postponed until maintenance is done for some other reason.
Entity intent is its definition, i.e., the accurate description of relevant real world
objects in the UoD that it is supposed to record. To extend the intent is to allow
more real world objects to be recorded by the entity. For example, a “car” entity
may be extended to also record motorcycles.
This pattern is a clear illustration of extensibility. The semantic change is not very
evident in the CS diagram. The extension is often left implicit, or it may even go
unnoticed. Nonetheless, the CS is changed and the extended entity will structur-
ally have more instances on record. Also, instances may be kept on record that
were previously deleted, e.g., when a “partner” entity is extended to capture ex-
partners as well.
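Seen abstractly, entity intent is a membership predicate over real-world objects, and extending the intent weakens that predicate; the following two-predicate illustration is ours, not the chapter's.

```python
# Intent of a "car" entity before and after the extension: the weaker
# predicate admits motorcycles as well, so more instances are recorded.
old_intent = lambda obj: obj["kind"] == "car"
new_intent = lambda obj: obj["kind"] in ("car", "motorcycle")

assert new_intent({"kind": "car"}) and new_intent({"kind": "motorcycle"})
assert not old_intent({"kind": "motorcycle"})
```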
This pattern is the opposite of the previous pattern that extends entity intent.
Fewer instances will be recorded, i.e., some particular specialization is dropped.
Frequently, one or more attributes (often the optional ones) can be dropped as well. The
diagram shows an example where two entities are restricted at the same time.
The reference between the entities is restricted accordingly.
Promote a Specialization
Relax a Reference
latter variant. This semantic change pattern is not very frequent. The few
examples that we observed in our cases are all associated with minor CS design
flaws that emerged in operation.
Discussion
Validity
Two arguments underpin the validity of the described patterns. First, these
patterns are based on real changes in operational Conceptual Schemas seen to
evolve in the business environment. We extracted the catalogue of 12 semantic
change patterns from a series of case studies, where we established that these twelve patterns account for over 75% of the observed semantic changes.
Evidently, these patterns embody proven practices that can be encountered in the
business environment.
Second, we have found that practitioners readily recognize these patterns, and
we take this as another mark that the notion of Semantic Change Pattern has
practical value and significance. In fact, we are convinced that the listed patterns
are already known and used in practice, but have not yet been recognized as an important tool for maintenance.
Absent Patterns
We remind the reader that we excluded attributes and constraints from our
investigations, so no change patterns involving attributes and constraints are to
be expected. For entities and references however, certain patterns may have
been expected, such as:
• Eliminate a superimposed entity,
• Redirect a reference to an arbitrary entity,
• Restrict a reference; either from optional to compulsory, or from N:1 to 1:1
cardinality,
• Generalize entities into one unified entity, in particular to enfold a hierarchic
stack of entities into one entity with a reflexive reference (Veldwijk, 1995),
• Reify a reference into an entity.
One reason for their absence may be that we have not explored enough cases,
and the semantic change patterns have not yet emerged. However, why did we
expect to see these patterns in the first place? Expectations are guided by what
is advocated as good design practices. Please bear in mind that most literature
focuses on the design phase, and it cannot be assumed that patterns intended for
design will meet the needs of maintenance engineers working in a turbulent
business environment. We already mentioned how the conservative tendency to
safeguard current investments in running information systems generally pre-
vents extensive restructuring of the CS. We consider this call for compatibility
to be a major reason why theoretically attractive patterns like generalization and
reification are unfit for maintenance. A good pattern for design is unacceptable
in maintenance if it causes incompatibilities or a massive impact of change.
Future Trends
At the time of writing, the catalogue of Semantic Change Patterns outlined in this
chapter is bound to be an incomplete set. However, completeness has not been
our aim. We want to outline realistic patterns of semantic change for the engineer
to learn and copy from.
As the importance of schema maintenance will continue to increase, researchers
and practitioners will detect more and more change patterns in operational
environments. Hence, the catalogue will continue to be extended. Moreover, we
hope to see semantic change patterns of attributes and constraints added to the
catalogue.
Pro-active use of the patterns in the design phase of the CS life cycle allows a
designer to experiment with various kinds of alterations in the design proposal,
and to get a feeling for how the proposed CS will react to likely changes in
requirements. This may uncover flaws in the design that the designer can correct
by applying the pattern.
The semantic change patterns can also be useful in the reverse-engineering of
a legacy CS. Knowing common patterns will help to understand the evolution
history and hence will ease the recovery of the schema.
Finally, the semantic change patterns offer an excellent opportunity to replace
the schema-evolution operations of today’s database management systems with
data conversion solutions that are semantically meaningful. This will provide
maintenance engineers with easy to configure data conversion routines that are
well coordinated with the corresponding CS changes.
Conclusions
This chapter introduced the notion of Semantic Change Patterns and outlined a
catalogue of a dozen patterns relevant in Conceptual Schema evolution. Seman-
tic change patterns have their value in enabling the graceful evolution of database
schemas. The patterns are targeted toward adjusting Conceptual Schema
semantics and database contents in accordance with changing user perception
of the real world and information needs in the business environment.
Changing an operational CS is no small matter. Current business procedures,
user interfaces, applications, etc., all have to be reviewed to determine the full
impact of change, and the possible consequences for the stored data must also
be carefully weighed. As enterprises try to keep the impact of change as small
as possible, the adapted CS is required to be a good model of the new UoD, and
at the same time to be “as close to the old CS as possible”. This usually translates
into the demand for compatibility, thus reducing the need for complex data
conversions and application reprogramming.
The best argument for validity of the semantic change patterns is that these are
based on actual changes observed in operational Conceptual Schemas evolving
in the live business environment. The dozen patterns that we extracted from a series of case studies account for more than 75% of the observed semantic changes. In other words, maintenance engineers already apply the patterns, even though the patterns have not been recognized as such before.
Much has yet to be learned about the relation between organizational need for
information and the systems (or indeed the Conceptual Schemas) that deliver it.
The literature points out change drivers such as organizational change, mergers
and diversifications, Business Process Reengineering projects, and ongoing
innovations in information system technology and databases. How such change
drivers are related to specific changes taking place in Conceptual Schemas at the
core of operational information systems is not yet well understood. We believe
that the notion of semantic change pattern is a significant step towards that
understanding.
References
ANSI/X3/Sparc Special Interest Group on Management of Data. (1975). Study
Group on Data Base Management Systems Interim Report. ACM-SIGMOD
Newsletter, 7(2).
Batini, C.W., Ceri, S. & Navathe, S.B. (1992). Conceptual Database Design:
An Entity-Relationship Approach. CA: Benjamin/Cummings Publishing.
Bézevin, J. (2000). New Trends in Model Engineering. Proceedings of the
IRMA2000 International Conference. Hershey, PA: Idea Group Publish-
ing, 1185-1187.
Bommel, P. van. (1995). Database Optimization: An Evolutionary Ap-
proach. Dissertation. Katholieke Universiteit Nijmegen, The Netherlands.
To be clear: objects and relations are encountered in the Universe of
Discourse, whereas entities, instances and references are constructs that exist
in the Conceptual Schema.
Graphical Conventions
We use the following graphical conventions that make for compact and easy to
read diagrams with a hierarchic, top-down structure:
• Rectangles depict entities. An enclosed rectangle depicts a specializa-
tion, a diagramming technique borrowed from Venn diagrams in set theory.
The injective is-a reference is evident from the inclusion.
• Arrows depict the references between entities. The arrow points from the
member entity upwards to the owner, with N:1 cardinality. This notation is
used to suggest some “pointer” attribute in the member entity.
• References may have 1:1 cardinality; i.e., each owner instance is referred
to by at most one member instance. This is depicted by omitting the head
of the arrow.
Chapter XIII
Model Transformations
in Designing the
ASSO Methodology
Elvira Locuratolo, ISTI, Italy
Abstract
Introduction
Methodologies used in designing database applications can be based on informal models that, although easy to understand, are unable to cover both static and dynamic aspects of modelling in an integrated way and may cause inconsistencies. Further, they can be inadequate to guarantee that the schema supported by the database system satisfies the requirements specified at the conceptual level. Finally, they cannot ensure the coexistence of two classically conflicting quality requirements, i.e., flexibility in reflecting on the schema the changes occurring in real life, and efficiency in accessing information.
B (Abrial, 1996), a formal method of software engineering, uses mathematical
notations for modelling statics and dynamics and for performing consistency
proofs. The refinement of B, supported again by proofs, allows the derivation of
correct implementations; however, the direct use of B for developing database
applications presents some shortcomings since B lacks the high level abstraction
mechanisms used for modelling database schemas and its refinement has not
been specifically designed for obtaining efficient database implementations.
ASSO (Castelli & Locuratolo, 1995; Locuratolo, 1997; Locuratolo & Matthews,
1999) is an innovative methodology for the achievement of quality requirements,
which combines features of database design with the B-Method in order to
ensure easiness in schema specifications, flexibility in reflecting the changes
occurring in real life on the schema, consistency between static and dynamic
modelling, correctness of implementations and efficiency in accessing informa-
tion. Formality in ASSO is completely transparent to the designer until he or she decides to perform proofs.
Designing formal environments for the specification and the development of
database applications is currently an interesting topic of research (Mammar &
Laleau, 2003). This is because the growing use of databases in various
application domains where economical interests require a certain degree of
safety, e.g., e-business or financial systems, favours the call for the integration
of databases and formal methods (Laleau, 2000, 2002).
ASSO results from the intuitions of researchers and students with backgrounds in different disciplinary areas. MetaASSO (Locuratolo, 2002), the
approach employed to design ASSO, highlights these intuitions while providing
a high-level description of interacting components, called methodological tools.
The following methodological tools have been designed to achieve quality in
ASSO:
the Revisited Partitioning, a formal method working on static
aspects of database conceptual schemas; the Structured Database
Schema, a formal conceptual model which integrates consistently
Background
This section highlights some aspects of database design and some aspects of the
B-Method which can be usefully exploited for comparing ASSO with related
works.
Database Design
The database design process (Elmasri & Navathe, 1997) consists of two parallel
activities: the first involves the content and the structure of data, whereas the
second relates to the design of database applications. Traditionally, database
design methodologies have primarily focused on the first of these activities,
whereas software design has focused on the second. It is being recognised by
database designers and software engineers that the two activities should proceed
in parallel and design tools are increasingly combining them. ASSO comprises a
phase of conceptual database design and a phase of logical design. The goal of
the former phase is to produce a conceptual schema, i.e., a high-level description
of the database structure and behaviour independent from the particular Data-
base Management Systems (DBMS) which will be used. The goal of the latter
phase is to map the conceptual schema from the high-level model into the lower-
level model of a chosen type of DBMS.
The data-driven methodologies (Batini, Ceri & Navathe, 1992) focus mainly on
the definition of data and their properties. The applications, which access and
modify the database, are a complement of the static description. The data-driven
methodologies generally consist of two steps: the conceptual schema construc-
tion and the logical schema generation. In order to make the conceptual schema
easy to be understood, high level abstraction models, such as Semantic Data
models (Cardenas & McLeod, 1990) or Entity-Relationship Models (Chen,
1976), are employed with a diagrammatic representation. The abstraction
mechanisms of these models closely resemble those used in describing their
applications. In order to represent the complementary dynamic aspects, state-
based and data-flow models are employed; however, as the models employed to
represent statics and dynamics are either informal or have non-integrated
formalisations, it is not possible to prove that the specified operations preserve
the database consistency. The construction of the conceptual schema is followed
by the generation of the logical schema. Within this step of design, information
specified in the conceptual schema is represented in terms of relational models.
The generation of the logical schema is a complex step since the mapping process
is not isomorphic and there is the possibility of introducing errors. When the data-
driven methodologies are used, the high-level description of the database given
B-Method
[Figure 1. Class hierarchies over Person with subclasses Employee and Student (attributes: income, salary, identifier): (1.a) the is-a graph; (1.b) object intersection of the conceptual classes, where Student and Employee may overlap; (1.c) object intersection of the logical classes, where Student and Employee are disjoint; (1.d) logical classes with the multiple-inheritance class Stud_Empl]
The graph of Figure 1a, in the following called logical classes, is supported by
an object system if, besides the above properties of classification, inheritance
and object inclusion, there is also the restriction that each object instance
belongs to one and only one class. As a consequence, in the logical classes,
the object inclusion property can only be enjoyed indirectly, i.e., as the
specialised class employee inherits all the attributes from the class person, the
class employee can be considered as enclosed in the class person, but the sets
representing the object instances of the two classes are really two disjoint sets.
The object intersection of the conceptual classes in Figure 1a is represented in
Figure 1b, whereas the object intersection of the logical classes is represented
in Figures 1c and 1d. In Figure 1b the intersection between the object instances
of the class student and of the class employee may be non-empty, whereas in
Figure 1c no object instance can belong simultaneously to both the class student
and the class employee. This is allowed only in the case of multiple inheritance, i.e., when the subclass student•employee belongs to the graph. Figure 1d represents this last case, showing that the object instances which are simultaneously student and employee define a class, the class student•employee, which is enclosed indirectly in both the class student and the class employee.
The following properties make the difference between semantic and object
models:
• Semantic data models: each object instance can belong to any class of the
graph. This enhances flexibility while limiting efficiency.
• Object models: each object instance belongs to one and only one class of
the graph. This enhances efficiency while limiting flexibility.
In order to show how object models limit the flexibility in reflecting the changes
occurring in real life, let us suppose that the student John becomes an employee.
In this case, the corresponding object instance must be removed from the class
student and must be inserted into the class student•employee. If John completes
his studies later on, the corresponding object instance must be removed from the
class student•employee and must be inserted into the class employee. On the
contrary, in semantic data models, the object instance corresponding to John can
be inserted into the class employee when the student John becomes an
employee and can be removed from the class student when John completes his
studies.
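The difference can be made concrete with a small sketch of the John example; the extent names and functions are illustrative assumptions, not ASSO constructs.

```python
# Object model: the class extents partition the instances, so a change
# of role means migrating the instance between disjoint extents.
extents = {"student": {"john"}, "employee": set(), "student_employee": set()}

def employ(person: str) -> None:
    """A student takes a job: move to the multiple-inheritance class."""
    extents["student"].discard(person)
    extents["student_employee"].add(person)

def finish_studies(person: str) -> None:
    """An employed student graduates: move to the employee class."""
    extents["student_employee"].discard(person)
    extents["employee"].add(person)

employ("john")            # john is now only in student_employee
finish_studies("john")    # ... and later only in employee

# In a semantic data model, "john" could instead simply be a member of
# both the student and the employee extents at the same time.
```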
The Revisited Partitioning maps conceptual classes into logical classes linking
features of formal and informal methods. This method is correct, i.e., the
objects of the obtained classes are all and only those of the original conceptual
classes and each of them has all and only the original declared attributes. The
method is complete, i.e., it generates all the classes implicitly enclosed in the
conceptual classes.
The Revisited Partitioning is composed of two phases, called representation
and decomposition, respectively. The former permits describing the conceptual
classes, whereas the latter permits decomposing them while reconstructing
hierarchies supported by object systems. The logical classes represented in
Figure 1d enclose one more class than the original conceptual classes repre-
sented in Figure 1b. The next section describes the model transformations
resulting in ASSO.
• Definition (class): A class is a tuple (name, Att, Const, Op), where name
is a term connoting the class name and denoting the class objects. The term
name, called the class extension, represents a subset of a given set. Att is a
finite set of terms called attributes; each of them is defined as a function
from the extension name to either a given set or the extension of another
class. Both the extension and the attributes define the class state variables,
whereas the predicate that formalises their definitions is called the class
constraints. Const is a predicate on the class state variables which
formalises a set of properties, called the class application constraints. Op is
a finite set of operations, defined as functions from predicates establishing
the class constraints to predicates establishing the class constraints. A
special operation, called initialisation, belongs to Op; it associates with
the state variables initial values establishing the class constraints.
The concept of class extends that provided by the database conceptual lan-
guages with application constraints and operations. The extension has been
designed to reflect some features of the model supported by B: the
operations have first been designed as basic operations, i.e., operations that add
objects, remove objects, modify attributes or leave the class unchanged, and have
then been enriched with constructors, recursively applied to the basic operations.
The operations define pre-conditioned, partial and non-deterministic trans-
formations.
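
As a reading aid, the tuple (name, Att, Const, Op) can be rendered in Python
roughly as follows; the encoding is an assumption made here for illustration
(the chapter defines operations over predicates, not over concrete states):

from dataclasses import dataclass
from typing import Any, Callable, Dict

State = Dict[str, Any]   # the class state variables: extension plus attributes

@dataclass
class ClassSchema:
    name: str                                  # class extension, a subset of a given set
    att: Dict[str, Callable[[Any], Any]]       # attributes: functions on the extension
    const: Callable[[State], bool]             # class application constraints
    op: Dict[str, Callable[[State], State]]    # operations, including "initialisation"

    def initialise(self) -> State:
        # The initialisation associates with the state variables initial
        # values establishing the class constraints.
        state = self.op["initialisation"]({})
        assert self.const(state), "initialisation must establish the constraints"
        return state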
Subclass operations which specialise the corresponding operations on the extension
name are called specialisation. Other specific operations on the class name are
possible.
The two definitions of class and is-a* relationship allow the introduction of the
following definition of Structured Database Schema.
The definition of Structured Database Schema as a graph has been given in order
to allow the applicability of the Revisited Partitioning after steps of behavioural
refinement, which abolish the non-determinism and weaken the pre-condition-
ing. The next section presents an example of conceptual schema specification,
a step of behavioural refinement and a step of Revisited Partitioning.
Example: The following syntactic forms are used to specify a schema supported
by the Structured Database Schema (Locuratolo, 2002):
• Class name1 of GIVEN-SET with (att-list; const; op-list)
• Class name2 is-a* name1 with (att-list; const; op-list)
The former is the basic constructor used to specify the root class of a Structured
Database Schema; the latter is used to specify the remaining classes. Within
these forms, name1 and name2 denote the class names; att-list, const and op-list
denote, respectively, the attributes, the application constraints and the operation
list of the class name1 in the former specification and of the class name2 in the
latter. The class and the is-a* constraints are implicitly specified with
the class constructors.
Figure 2 presents the specification of the conceptual schema. This specification
describes information about:
• a set of persons and their income,
• a subset of working persons and their salary,
• a subset of students and their identifiers.
The income of each person is greater than or equal to 1,000; the salary of each
employee is greater than or equal to 500; each student has a unique identifier.
Information is added when a new person is inserted into the database. This
insertion is specialised both when the person is employed and when the person
becomes a student.
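
Since Figure 2 is not reproduced here, the following Python sketch reconstructs
only what the text states, under assumed encodings (classes as dictionaries
keyed by object name; the operation names are illustrative):

person = {}    # name -> income
employee = {}  # subset of person; name -> salary
student = {}   # subset of person; name -> identifier

def constraints_hold() -> bool:
    ids = list(student.values())
    return (all(income >= 1000 for income in person.values())
            and all(salary >= 500 for salary in employee.values())
            and set(employee) <= set(person)
            and set(student) <= set(person)
            and len(ids) == len(set(ids)))   # each identifier is unique

def new_person(name, income):
    person[name] = income                    # information added on insertion

def new_employee(name, income, salary):
    new_person(name, income)                 # specialisation of the insertion
    employee[name] = salary

def new_student(name, income, identifier):
    new_person(name, income)                 # specialisation of the insertion
    student[name] = identifier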
The conceptual specification associates with each new student one and only one
identifier that has not been used before. A possible step of behavioural
refinement consists in associating with each new student the maximum of all
existing identifiers incremented by one. A Structured Database Schema
behavioural refinement is a modular refinement of classes; i.e., if a class is
refined, the entire Structured Database Schema is refined.
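
This step can be pictured as follows (a hypothetical rendering: the
non-deterministic choice of any unused identifier becomes the deterministic
choice of the maximum plus one, which certainly has not been used before):

def fresh_identifier(existing_ids):
    # max + 1 is guaranteed not to occur among the existing identifiers
    return max(existing_ids) + 1 if existing_ids else 1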
After a step of behavioural refinement for the class student, the Conceptual
Schema in Figure 2 has been decomposed into two: the Structured Database
Schema1 with root class person−employee (the persons who are not employees)
and the Structured Database Schema2 with root class person•employee. The
former root class takes only the attributes, application constraints and
operations of the class person, whereas the latter root class person•employee
takes the attributes, initialisation and constraints of both the classes person
and employee; its operations are parallel compositions of the corresponding
operations on the classes person and employee. Each Structured Database
Schema also takes a copy of the class student, implicitly splitting this class
according to the partition of the class person. A further step of decomposition
yields four disjoint classes, which are recomposed to define a class hierarchy
in which each object instance belongs to one and only one class.
The logical schema specifies more information than the conceptual schema,
since the class person•employee•student is explicitly specified.
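
The decomposition itself can be pictured with ordinary set operations. The
following sketch is an assumption about the construction, with illustrative
data; it shows how the partition of person also splits the copy of student,
yielding the four disjoint classes mentioned above:

def partition(person, employee, student):
    only_person = person - employee        # root of Structured Database Schema1
    person_employee = person & employee    # root of Structured Database Schema2
    return {
        "person-employee": only_person - student,
        "person-employee.student": only_person & student,
        "person.employee": person_employee - student,
        "person.employee.student": person_employee & student,  # the extra class
    }

parts = partition({"Ann", "Bob", "Mia"}, {"Bob", "Mia"}, {"Ann", "Mia"})
# The four extensions are pairwise disjoint and cover person exactly, so each
# object instance belongs to one and only one class of the recomposed hierarchy.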
The definition of Structured Database Schema based on a graph has been given
in order to apply the Revisited Partitioning after some steps of behavioural
refinement; however, before performing any model transformation, the
consistency of the conceptual schema needs to be proved. In order to prove the
Conceptual Schema consistency, the relationship between ASSO and B is usefully
exploited. The next section describes the ASSO-B relationship.
ASSO-B Relationship
In the following, the ASSO-B relationships (Locuratolo & Matthews, 1999) will
be captured through properties (Locuratolo, 2002) and used to prove the
Structured Database Schema consistency:
If the initialisation establishes the class constraints and the operations preserve
the class constraints, a class can be identified with a B-Machine whose state
variables are constrained to satisfy the class constraints. This means that, in
order to prove consistency, no class constraint obligation needs to be proved, but
only the application constraint obligations. In the following, the term class-
machine will refer to a B-Machine that identifies a class.
With regard to the specification given in section 4.1, the class-machine person,
the class-machine employee and the class-machine student need to be proved
consistent.
In this chapter, the Structured Database Schema has been designed as a model
in which operations and application constraints involve only variables of single
classes. The model (Locuratolo, 2001) has been enriched in order to allow
operations and application constraints to involve variables of two or more
classes. In order to prove consistency of the enriched Structured Database
Schema, the concept of specialised class-machine, a concept induced from the
is-a* relationship between classes, has been introduced. With this concept,
the proof of the inherited operations can be avoided. Thus, in the case of
operations and application constraints involving only single classes, the
specialised class-machine consistency reduces to the class-machine consistency,
whereas, in the case of operations and application constraints involving
variables of specialised class-machines, the consistency proof of the
specialised machine can be optimised.
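
Operationally, the obligations can be phrased as the following testing-style
Python sketch (an assumption made for illustration; ASSO discharges the
obligations as B proofs rather than by testing):

def check_class_machine(initialisation, operations, constraints, sample_states):
    # Obligation 1: the initialisation establishes the constraints.
    assert constraints(initialisation()), "initialisation must establish constraints"
    # Obligation 2: every operation preserves the constraints.
    for op in operations:
        for state in sample_states:
            if constraints(state):           # hypothesis of the obligation
                assert constraints(op(state)), "operation must preserve constraints"

For an operation inherited from an already consistent class-machine, the
corresponding check can be skipped; this is the optimisation referred to above.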
Acknowledgments
The author would like to thank her husband, Antonio Canonico, and her children
Paola and Floriano for their help and advice.
Figures 2, 3 and 4 have been reprinted from “Designing Methods for Quality” by
Elvira Locuratolo in Information Modelling and Knowledge Bases XIII, edited
by Hannu Kangassalo, Hannu Jaakkola, Eiji Kawaguchi and Tatjana Welzer, p. 287
and p. 289, copyright 2002, with kind permission from IOS Press.
References
Abrial, J. R. (1996). The B-Book: Assigning Programs to Meanings. Cam-
bridge University Press.
Albano, A., Cardelli, L., & Orsini, R. (1985). Galileo: A strongly-typed interac-
tive conceptual language. ACM Transactions on Database Systems,
10(2), 230-260.
Andolina, R., & Locuratolo, E. (1997). ASSO: Behavioural specialisation
modelling. In H. Kangassalo (Ed.), Information Modelling and Knowl-
edge Bases VIII, (pp. 241-259). IOS Press.
B-Core. (n.d.). B-Toolkit (online manual). Oxford, UK. Available:
http://www.b-core.com
Batini, C., Ceri, S., & Navathe, S.B. (1992). Conceptual Database Design: An
Entity-Relationship Approach. Redwood City, CA: Benjamin Cummings.
Booch, G. (1994). Object-Oriented Analysis and Design with Applications.
Benjamin Cummings.
Cardenas, A. F., & McLeod, D. (1990). Research Foundations in Object-
Oriented and Semantic Database Systems. Englewood Cliffs, NJ: Prentice
Hall.
Castelli, D., & Locuratolo, E. (1994). A formal notation for database concep-
tual schema specifications. In H. Jaakkola (Ed.), Information Modelling
and Knowledge Bases VI. IOS Press.
Castelli, D., & Locuratolo, E. (1995). ASSO: A formal database design meth-
odology. In H. Jaakkola (Ed.), Information Modelling and Knowledge
Bases VI, (pp. 145-158). IOS Press.
Castelli, D., & Locuratolo, E. (1995). Enhancing database system quality
through formal design. Fourth Software Quality Conference, University
of Abertay Dundee & Napier University, pp. 359-366.
Ceri, S., & Fraternali, P. (1997). Database Applications with Objects and
Rules. Harlow, Essex, UK: Addison Wesley Longman.
Chen, P. P. (1976). The entity-relationship model: Toward a unified view of
data. ACM Transactions on Database Systems, 1(1), 9-36.
Coad, P., & Yourdon, E. (1991). Object-Oriented Design. Yourdon Press.
Elmasri, R., & Navathe, S. (1997). Fundamentals of Database Systems. Addison-
Wesley.
Facon, P., Laleau, R., & Nguyen, H. P. (1996). Mapping object diagrams into
B specifications. In A. Bryant & L.T. Semmens (Eds.), Proceedings of
the Methods Integration Workshop, Electronics Workshops in Com-
puting, BCS.
Jarke, M., Mylopoulos, J., Schmidt, J. W., & Vassiliou, Y. (1992). DAIDA: An
environment for evolving information systems. ACM Transactions on
Information Systems, 10(1), 1-50.
Laleau, R. (2000). On the interest of combining UML with the B formal method
for the specification of database applications. ICEIS 2000, 2nd Interna-
tional Conference on Enterprise Information Systems. Stafford, UK.
Laleau, R. (2002). Conception et développement formels d’applications bases
de données [Formal design and development of database applications].
Habilitation Thesis, CEDRIC Laboratory, Évry, France.
Available: http://cedric.cnam.fr/PUBLIS/RS424.ps.gz
Locuratolo, E. (1997). ASSO: Evolution of a formal database design methodol-
ogy. Proceedings of Symposium on Software Technology, (SoST’97),
Buenos Aires, August 12-13.
Locuratolo, E. (1998). ASSO: Portability as a Methodological Goal. TR IEI
B4-05-02.
Locuratolo, E. (2002). Designing methods for quality. In H. Kangassalo, H.
Jaakkola, E. Kawaguchi & T. Welzer (Eds.), Information Modelling
and Knowledge Bases XIII (pp. 279-295). IOS Press.
Locuratolo, E., & Matthews, B.M. (1998). Translating structured database
schemas into abstract machines. Proceedings of the 2nd Irish Workshop
on Formal Methods, Cork, Ireland.
Locuratolo, E., & Matthews, B. M. (1999). ASSO: A formal methodology of
conceptual database design. In S. Gnesi & D. Latella (Eds.), Proceedings
of the Fourth International ERCIM Workshop on Formal Methods for
Industrial Critical Systems (pp. 205-224), held at the Federated Logic
Conference.
Locuratolo, E., & Matthews, B. M. (1999). Formal development of databases
in ASSO and B. In J. Wing, J. Woodcock & J. Davies (Eds.), FM’99 –
Formal Methods, LNCS 1708 (pp. 388-410). Berlin Heidelberg: Springer-
Verlag.
About the Authors
Patrick van Bommel received his master’s degree in Computer Science (1990)
and PhD from the Faculty of Mathematics and Computer Science, University of
Nijmegen, The Netherlands (1995). He is currently an assistant professor at the
University of Nijmegen. He gives courses in foundations of databases and
information systems, information analysis and design, and he supervises a semi-
commercial student software house. His main research interests include infor-
mation modeling and information retrieval. In information modeling, he particu-
larly deals with: modeling techniques for information systems; data models for
hyperdocuments and Web sites; semi-structured data; equivalence and transfor-
mation of data models; the transformation of conceptual data models into
implementations; and the transformation of database populations, operations and
constraints. In information retrieval, his interests include: document data
modelling; WWW-based retrieval and filtering; document characterization
languages; and digital libraries.
* * *
Paolo Bottoni graduated in Physics in 1988 and obtained his doctoral degree in
Computer Science (1995). Since 1994, he has been with the Department of
Computer Science of the University “La Sapienza” of Rome, first as a re-
searcher, and, since 2000, as an associate professor. His research interests are
mainly in the area of interactive computing, and include: definition of pictorial and
visual languages, visual simulation, formal models of visual interactive comput-
ing, agent-based computing. On these topics, he has published 100 scientific
papers in international journals, contributed volumes and conference proceed-
ings.
completed in 2000. In between, she was a fellow within the European Research
Network GETGRATS at the University of Bordeaux in France. Her research
interests are mainly directed towards formal methods and include computing by
graph transformation, context-free generation of graphs and graph-like objects,
syntactic picture generation, and gender and teaching issues in computer
science.
Z. M. Ma received his PhD from the City University of Hong Kong (2001). His
current research interests include intelligent database systems and design, Web-
based data management, e-learning systems, engineering database modeling,
enterprise information systems, knowledge management, intelligent planning and
scheduling, decision making, and robot path/motion planning. He has published
many journal, conference, and book chapter papers in these areas. He is
currently editing and authoring two upcoming books being published by Idea
Group Inc. and Kluwer Academic Publishers, respectively.
Andreas Märcz was born in Großenhain near Dresden in the former GDR (eastern
part of Germany). From 1983 to 1995, he attended elementary, grammar and
secondary school in Dresden and obtained a university-entrance diploma (German
Abitur). Because of his interest in databases and software engineering, he
studied Information Technology at the Dresden University of Technology (TU
Dresden) from 1995 to 2000. In 2000, he graduated with an MSc in Computer
Science and took a position at the database chair of TU Dresden, with scientific,
teaching and administration activities. His current research focuses on format
independence for multimedia databases, especially for time-dependent objects
and the resource management they require.
Jim Steel has worked for the last four years as a research scientist at the
Distributed Systems Technology Centre. He has extensive experience in
metamodeling and model-driven standards and techniques, including the Meta-
Object Facility (MOF), XML Metadata Interchange (XMI) and Enterprise
Distributed Object Computing (EDOC) standards within the Object Management
Group (OMG). He also serves as chairman of the Human-Usable Textual Notation
(HUTN) Finalisation Task Force. He is currently working with the DSTC’s
Pegamento project on a language for describing model transformations. His
research interests include metamodeling, generative and generic programming,
automated systems development, and language design.
Lex Wedemeijer has been working in the areas of data administration and data
management in several Dutch companies for more than 20 years. His main
interests are in the quality and evolution of data schemas, both their theoretical
foundations and the practical implications encountered in the business environ-
ment. In 2002, he received a PhD from Delft University for a dissertation titled
Exploring Conceptual Schema Evolution. Currently, Lex Wedemeijer is a data
architect at ABP Netherlands, Europe’s largest pension fund.
Index
V
validity 274
value-based management (VBM) 220
variable replacement 114
vertex 105
vertex identity 106
video 237
video stream 237
VirtualMedia 242
visual languages 31
W
weak entity 151, 271
well-formedness constraints 121
World Wide Web Consortium (W3C) 76
X
XML 149
XML model interchange (XMI) 129
XPATH 131
XQuery 131