The Transformational Approach To Database Engineering (2006)
Jean-Luc Hainaut
1 Introduction
Data structure manipulation has long proved to be a fertile domain for transforma-
tional engineering process modelling. Several contributions have made this approach
a fruitful baseline to solve the complex mapping problems that are at the core of many
database engineering processes.
We can mention the normalization theory, which laid the basis for data- and
constraint-preserving schema transformations [13], but also the now standard 3-schema
data modeling architecture [48] which clearly complied, more than 25 years ago, with
what the SE community currently calls Model-Driven Engineering (MDE). Generally
built on these principles, most database design methodologies rely on four
expressions of the database structure, namely the conceptual schema, the logical
schema, the physical schema and the DDL1 code (Fig. 17). According to these
approaches, a schema at one level derives from a more abstract schema at the upper
1
Data Description Language. That part of the DBMS language dedicated to the creation of data
structures.
R. Lämmel, J. Saraiva, and J. Visser (Eds.): GTTSE 2005, LNCS 4143, pp. 95 – 143, 2006.
© Springer-Verlag Berlin Heidelberg 2006
level through some kind of translation rules that preserve its information contents,
which clearly are schema transformations. For instance, a logical relational schema
can be produced from the conceptual schema by applying to non SQL-compliant
conceptual structures rewriting rules that produce relational constructs such as tables,
columns and keys. If the rules are carefully selected, the relational schema has the
same information contents as its conceptual origin.
An increasing number of bodies (e.g., the OMG) and of authors recognize the
merits of transformational approaches, which can systematically produce correct,
compilable and efficient database structures from abstract models.
Transformations that are proved to preserve the correctness of the source specifica-
tions have been proposed in virtually all the activities related to schema engineering:
schema normalization [39], logical design [4, 19, 41], schema integration [4, 34],
view derivation [35, 33], schema equivalence [11, 28, 29, 32], data conversion [36,
12, 46], reverse engineering [6, 8, 18, 19], database interoperability [34, 45], schema
optimization [19, 25], wrapper generation [45] and others.
Warning
In the database community, a general formalism in which database specifications
can be built is called a model. The specification of a definite database structure
expressed in such a model is called a schema. Example: the conceptual schema of
the Customer database is expressed in the Entity-relationship model, while its logical
schema, which is made up of table, column and key definitions, complies with the
relational model.
A First Illustration
Before discussing in deeper detail the concept of transformation and its properties, let
us have a look at a first practical application of the concept. The schemas of Fig. 1
show a popular example, namely the production of a relational schema (top right),
from a small conceptual schema (top left) that describes a set of books for which a
collection of copies are available. The graphical conventions will be described later,
but the essence of the schemas can be grasped without further explanation.
The main stream of the process is covered by the two top schemas. The translation
rules that have been applied can be identified easily:
1. each entity type is represented by a table,
2. each single-valued attribute is represented by a column,
3. each all-attribute identifier is represented by a primary or alternate key,
4. each one-to-many relationship type is represented by a foreign key,
5. each multivalued attribute is represented by a table, comprising the source at-
tribute that is declared a primary key, and by an additional table made up of a
foreign key to the table that represents the entity type of the attribute and an-
other foreign key to the new attribute table; both foreign keys form the primary
key of their table.
Of course, other, more or less sophisticated, sets of rules exist, but this one is
adequate for demonstration purposes.
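As a minimal sketch, the relational schema obtained with these five rules could be declared as follows. The table, column and key names come from Fig. 1; the column types, the NOT NULL clauses, the function name build_library_db and the use of SQLite through Python are assumptions made for illustration only. The constraint "no more than 5 WRITE rows per BOOK row" is not enforced by this DDL.

```python
import sqlite3

# Assumed DDL: names and keys follow Fig. 1, types are illustrative guesses.
DDL = """
CREATE TABLE BOOK (
    ISBN          TEXT PRIMARY KEY,
    Title         TEXT NOT NULL,
    DatePublished TEXT NOT NULL
);
CREATE TABLE COPY (
    ISBN          TEXT NOT NULL REFERENCES BOOK(ISBN),
    CopyNbr       INTEGER NOT NULL,
    DatePurchased TEXT NOT NULL,
    PRIMARY KEY (ISBN, CopyNbr)
);
CREATE TABLE AUTHOR (
    AuthorName TEXT PRIMARY KEY
);
CREATE TABLE "WRITE" (
    ISBN       TEXT NOT NULL REFERENCES BOOK(ISBN),
    AuthorName TEXT NOT NULL REFERENCES AUTHOR(AuthorName),
    PRIMARY KEY (ISBN, AuthorName)
);
"""


def build_library_db():
    """Create an in-memory database holding the four tables of Fig. 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only on request
    conn.executescript(DDL)
    return conn
```

Each table stems from rule 1 or rule 5, each column from rule 2, each primary key from rule 3, and the foreign key of COPY from rule 4.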
We can read this derivation process from another, transformational, point of view.
We do not produce another schema, but we progressively modify the source concep-
tual schema, until it complies with the structural patterns allowed by the relational
model.
This interpretation, which will prove much more powerful and flexible than the
translation rules approach, is illustrated in the alternate circuit (top → down → right
→ up) of Fig. 1.
[Figure: four schemas. Top left, the source conceptual schema (BOOK, COPY, relationship type of); top right, the relational schema (tables BOOK, COPY, AUTHOR and WRITE, with the annotation "No more than 5 WRITE rows per BOOK row."); bottom left and bottom right, the two intermediate schemas.]
Fig. 1. Two ways to describe the derivation of a relational schema from a conceptual schema
The first modified schema (bottom left) derives from the source conceptual schema
(top left) as follows: the multivalued attribute Author has been replaced with the
entity type AUTHOR comprising the identifying attribute AuthorName, and the many-
to-many relationship type write.
Then (bottom right), the new many-to-many relationship type write is replaced
with entity type WRITE together with two one-to-many relationship types bw and aw.
The schema does not include multivalued attributes or complex relationship types
anymore.
Finally, each one-to-many relationship type is replaced with a foreign key. Hence
the final version, at the top right side.
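The three steps can be sketched as a chain of functions, each taking a schema and returning a modified copy. The dictionary encoding below is a hypothetical simplification, not the GER: cardinalities are reduced to "1" and "N" (so the limit of five authors per book is not carried along), and all function names are invented for this illustration.

```python
import copy

# Hypothetical encoding: an entity type has single-valued attributes ("attrs"),
# multivalued attributes ("multi"), an identifier ("id") and foreign keys ("refs").
# A relationship type is a pair of roles [(parent, "N"), (child, "1" or "N")].
SOURCE = {
    "entities": {
        "BOOK": {"attrs": ["ISBN", "Title", "DatePublished"],
                 "multi": ["Author"], "id": ["ISBN"], "refs": []},
        "COPY": {"attrs": ["CopyNbr", "DatePurchased"],
                 "multi": [], "id": ["of.BOOK", "CopyNbr"], "refs": []},
    },
    "rels": {"of": [("BOOK", "N"), ("COPY", "1")]},
}


def attribute_to_entity(schema, entity, attr, new_entity, new_attr, rel):
    """Replace a multivalued attribute with an entity type and a many-to-many relationship type."""
    s = copy.deepcopy(schema)
    s["entities"][entity]["multi"].remove(attr)
    s["entities"][new_entity] = {"attrs": [new_attr], "multi": [],
                                 "id": [new_attr], "refs": []}
    s["rels"][rel] = [(entity, "N"), (new_entity, "N")]
    return s


def rel_to_entity(schema, rel, new_entity, new_rels):
    """Replace a many-to-many relationship type with an entity type and two one-to-many ones."""
    s = copy.deepcopy(schema)
    (e1, _), (e2, _) = s["rels"].pop(rel)
    r1, r2 = new_rels
    s["entities"][new_entity] = {"attrs": [], "multi": [],
                                 "id": [f"{r1}.{e1}", f"{r2}.{e2}"], "refs": []}
    s["rels"][r1] = [(e1, "N"), (new_entity, "1")]
    s["rels"][r2] = [(e2, "N"), (new_entity, "1")]
    return s


def rels_to_foreign_keys(schema):
    """Replace each one-to-many relationship type with a foreign key in the child."""
    s = copy.deepcopy(schema)
    for rel, ((parent, _), (child, card)) in list(s["rels"].items()):
        assert card == "1", "only one-to-many relationship types are handled"
        key = list(s["entities"][parent]["id"])  # assumed to hold plain attributes
        target = s["entities"][child]
        target["attrs"] = key + target["attrs"]
        # A role such as "of.BOOK" in the child's identifier becomes the key columns.
        target["id"] = [col for part in target["id"]
                        for col in (key if part == f"{rel}.{parent}" else [part])]
        target["refs"].append((key, parent))
        del s["rels"][rel]
    return s


def to_relational(schema):
    """Apply the three steps in the order described in the text."""
    s = attribute_to_entity(schema, "BOOK", "Author", "AUTHOR", "AuthorName", "write")
    s = rel_to_entity(s, "write", "WRITE", ("bw", "aw"))
    return rels_to_foreign_keys(s)
```

Calling to_relational(SOURCE) leaves SOURCE untouched and returns a schema without multivalued attributes or relationship types, in which WRITE is identified by ISBN and AuthorName.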
This short illustration raises several questions and problems, some of which this
paper will try to answer, at least partially. The paper is organized in two parts, which
allow two levels of reading.
The first part, that includes Sections 2 to 8, develops practical aspects of the
transformational paradigm. Section 2 positions the role of transformation in the data-
base realm. In Section 3, we show that dealing with multiple databases leads us to
introduce a generic pivot model, the GER, that is intended to represent a large variety
of operational models. Its graphical representation is sketched and a formal semantics
is suggested. In this section, we also show how specific operational models can be
defined in the GER. The concept of schema transformation is precisely defined in
Section 4, in which the property of semantics-preservation is defined and analyzed. In
Section 5, we describe some useful elementary and complex GER transformations,
that are then used in Section 6 to revisit the Database Design process, showing that it
is intrinsically a (complex) schema transformation. Similarly, Section 7 studies the
Reverse Engineering process as an application of the transformational paradigm.
Section 8 discusses the role of transformations in CASE tools, and illustrates this
point with the toolkit and the assistants of DB-MAIN.
The second part, comprising Sections 9 to 12, develops formal aspects of trans-
formations that were only sketched and suggested in Part 1. Section 9 describes the
ERM, an extended N1NF2 relational model the semantics of which is borrowed from
the relational theory. Section 10 maps the GER onto the ERM so that the former can
be given a precise formal semantics. Section 11 describes a small set of ERM
transformations that can be proved to be semantics-preserving. Finally, Section 12 exploits
the GER-ERM mapping to prove the semantics-preservation property of selected
practical GER transformations.
Section 13 concludes the paper.
2 Transformational Engineering
Producing efficient software by applying systematic transformations to abstract speci-
fications has been one of the most mythical goals of software engineering since the
2
Non 1st Normal Form. Qualifies a relational structure that uses non simple domains. Elements
of a non simple domain can be tuples and/or sets. In particular, a relation or the powerset of a
relation can be a valid domain. A N1NF relational model is a relational model in which non
simple domains are allowed.
late seventies. For instance, [3] and [14] consider that the process of developing a
program [can be] formalized as a set of correctness-preserving transformations [...]
aimed at compilable and efficient program production. In this context, according to
[37], a transformation is a relation between two program schemes P and P' (a pro-
gram scheme is the [parameterized] representation of a class of related programs; a
program of this class is obtained by instantiating the scheme parameters). It is said to
be correct if a certain semantic relation holds between P and P'. The revival of this
dream has now got the name of Model-Driven Architecture [38], or, more generally,
Model-Driven Engineering (MDA/MDE).
It is not surprising that this view has been adopted and developed by the database
community since the seventies. Indeed, the data domain has relied on strong
theories that can cope with most of the essential aspects of database engineering,
from clean data structuring (including normalization) to operational data structures
generation.
In particular, producing a target schema from a source schema can be modeled
either by a set of translation rules, or by a chain of restructuring operators or
transformations. The latter has proved particularly attractive, notably in complex,
incremental, processes.
The question of how many operators are needed to cover the current needs in database
engineering is still open, though it was raised long ago: in the 1980s, authors
suggested that four [15] [29] to six [11] were enough, but experience has shown that
there is no clear answer, except that surely more transformations are needed, as we
will show in the following.
One of the peculiarities of transformational approaches in the database realm is that
they must, in all generality, cope with three aspects of the application system, namely
the data structures, the data, and the programs. Let us consider a scenario in which a
database must be migrated from one technology to another. Clearly, this database
must be transformed, whatever the meaning of this term, into another database. This
means that three components of the application must be modified.
1. The database schema, that must comply with the data model of the target tech-
nology, and, possibly, include additional requirements that have emerged since
the last schema modification.
2. The data themselves, that must be restructured according to the new schema,
possibly through some kind of ETL process.
3. The application programs, that must interface with the new schema and comply
with the new API. This generally involves rewriting some sections of the source
code.
Each of these modifications follows its own rules, but we should not be surprised
by the idea that the first one should strongly influence the others. This view currently
is emerging under the name co-transformation [30]. Indeed, it has been proved that it
is possible to automatically derive data transformation (ETL) directives, as a SQL
script for instance, from schema transformations [27]. Program transformation is
much more complex. Automating this conversion has been studied in [26] and [9],
and has been proved to be feasible.
One of the arguments of this paper is that one can study all transformations, includ-
ing inter-model transformations, in the framework of a single model3. This raises the
question of the nature of this generic model. Two approaches have been proposed,
that distinguish themselves by the granularity of the model [24].
One approach, that can be called minimalistic, or bottom-up, relies on a very simple
and abstract formalism, from which one can define more elaborate and richer
models. Such a model generally represents the schema constructs specific to a definite
model by abstract nodes and edges, together with assembly constraints. AutoMED
[7] is a typical representative of this approach.
Another approach, symmetrically called maximalistic or top-down, is based on a
large spectrum model, that includes, though in an abstract form, the main constructs
of the set of operational models that are used in the engineering processes. The GER
model follows this principle. It has been described in [21] and [24], and will be the
basis of this paper.
Some database engineering processes transform schemas from one model to itself,
involving one model only. Such is the case of relational normalisation, and of XML
manipulation. These processes make use of intra-model transformations. Being dedi-
cated to this model, their form generally is quite specific (e.g., respectively relational
algebra and XSLT) and cannot be reused for other models.
Other processes, on the contrary, produce a result that is expressed in a model
that is different from that of the source schema. The most obvious example is the so-called
database logical design, the goal of which is to transform an abstract
conceptual schema into an operational (say, relational) logical schema as will be
discussed in Section 6.2. In such cases one makes use of inter-model
transformations. Many comprehensive processes, such as database design, reverse
engineering and integration involve several abstraction levels and several
technologies (and therefore models).
To master this complexity, several approaches rely on some kind of pivot model.
The idea is quite simple, and has been adopted as an elegant way to solve the combi-
natorial explosion in situations in which mappings must be developed from any of M
formalisms to any of N formalisms. Theoretically, one would need N × M distinct
mappings. Thanks to the introduction of an intermediate or pivot formalism P, one
needs to define M + N mappings only. Language translation and platform-independent
components are two of the most common examples.
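As a worked illustration of this saving, assume M = N = 12 (an arbitrary figure chosen for the example). Direct mappings between every source and every target formalism are then compared with mappings that go through the pivot P:

```latex
\[
M \times N = 12 \times 12 = 144
\qquad\text{versus}\qquad
M + N = 12 + 12 = 24 .
\]
```

The gap widens as formalisms are added, since the first count grows with the product and the second with the sum.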
In the database engineering realm, dealing with a dozen models is not uncommon
in large organizations. Developing, migrating, integrating, reverse engineering
databases or publishing corporate data on the web all are processes that require inter-
model schema transformation and, accordingly, data conversion. Considering
N operational models, and admitting that the mappings among any pair of models
are potentially useful in some processes, we need to define N² mappings, while
introducing a pivot model reduces this number (Fig. 2).
3
As illustrated in Fig. 1.
Fig. 2. Introducing a pivot model among N models reduces the number of inter/intra-model
mappings
The example of relational logical design, that is, producing a relational schema
from a conceptual schema, is illustrated in Fig. 3, which is just a subset of Fig. 2. It
reads as follows:
1. the source conceptual schema is transformed into the pivot model (Σer>p),
2. the resulting schema is transformed through a set of rules (Σp>p) such as those
that are largely described in the literature (see [4] for example4);
3. finally, the transformed schema obtained is expressed into the target relational
model (Σp>rel).
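The three steps amount to composing three mappings. The sketch below only illustrates that composition: the mappings are placeholders that record their name and relabel the model, and make_mapping, compose and relational_logical_design are hypothetical names, not part of any tool mentioned in this paper.

```python
def make_mapping(name, target_model):
    """Build a placeholder mapping that records its name and relabels the model."""
    def mapping(schema):
        return {"model": target_model, "trace": schema["trace"] + [name]}
    return mapping


# Placeholders standing for the three mappings named in the text.
er_to_pivot = make_mapping("Σer>p", "pivot")
pivot_rules = make_mapping("Σp>p", "pivot")
pivot_to_rel = make_mapping("Σp>rel", "relational")


def compose(*steps):
    """Return the mapping that applies the given steps from left to right."""
    def run(schema):
        for step in steps:
            schema = step(schema)
        return schema
    return run


relational_logical_design = compose(er_to_pivot, pivot_rules, pivot_to_rel)
```

Supporting an additional source or target model would then require adding one mapping to or from the pivot, while the middle step is reused unchanged.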
The next section describes in an informal way the main constructs of a pivot model
on which we will base our discussion, namely the GER model.
Remark. The interpretation of Fig. 2, 3 and some of those that follow needs to be
made more precise. All schemas that can be expressed in model M are represented
by the M-labelled ellipse. The mapping Σm>m' states that any schema expressed in the
source model M is transformed through Σm>m' into a schema that complies with the
target model M'.
4
Not the most recent reference actually, but still one of the best.
The GER model, GER for short, is an extended Entity-relationship model that inclu-
des, among others, the concepts of schema, entity type, domain, attribute, relationship
type, keys, as well as various constraints. In this model, a schema is a description of
data structures. It is made up of specification constructs which can be, for conve-
nience, classified into the usual three abstraction levels, namely conceptual, logical
and physical. We will enumerate some of the main constructs that can appear at each
level (Fig. 4).
• A conceptual schema comprises entity types (with/without attributes;
with/without identifiers), super/subtype hierarchies (single/multiple; total and
disjoint properties), relationship types (binary/N-ary; cyclic/acyclic; with/without
attributes; with/without identifiers), roles of relationship type (with min-max
cardinalities5; with/without explicit name; single/multi-entity-type), attributes (of
entity or relationship types; multi/single-valued; atomic/compound; with
cardinality6), identifiers (of entity type, relationship type, multivalued attribute;
comprising attributes and/or roles), constraints (inclusion, exclusion, coexistence,
at-least-one, etc.)
• A logical schema comprises record types, fields, arrays, single-/multi-valued
foreign keys, redundant constructs, etc.
• A physical schema comprises files, record types, fields, access keys (a generic
term for index, calc key, etc), physical data types, bag/list/array multivalued
attributes, and other implementation details.
5
The role cardinality constraint, denoted by i-j, specifies the range of the number of relation-
ships in which an entity can appear in a definite role. Value N of j denotes infinity.
6
Same as role cardinality applied to the number of attribute values per parent instance.
Fig. 4. A typical hybrid schema made up of conceptual constructs (e.g., entity types PERSON,
CUSTOMER, EMPLOYEE and ACCOUNT, relationship type of, identifiers Customer ID of
CUSTOMER), logical constructs (e.g., record type ORDER, with various kinds of fields inclu-
ding an array, foreign keys ORIGIN and DETAIL.REFERENCE) and physical objects (e.g.,
table PRODUCT with primary key PRO_CODE and indexes PRO_CODE and CATEGORY,
table space PRODUCT.DAT). Note that the identifier of ACCOUNT, stating that the accounts
of a customer have distinct account numbers (Account NBR), makes it a dependent or weak
entity type.
more formal treatment is necessary. In particular, we must base the definition and
the evaluation of the properties of each operator on a rigorous basis, that is, a
formal semantics of the GER. This is important for at least two reasons. First,
formal semantics allows us to reason about transformations, and in particular to
state their main properties such as the preservation of the information capacity of the
source schemas. Secondly, implementing a set of transformations, for instance in a
CASE tool, must rely on a completely defined semantics of both the model and the
operators.
In Part 2, Sections 9 and 10, we will give the GER a precise semantics by stating
mapping rules between the constructs of the GER and constructs of a variant of the
N1NF relational formalism, called Extended Relational Model (ERM). Each GER
construct will be given an ERM interpretation, and, conversely, each construct of the
ERM will be given a GER interpretation. Basically, these mappings are the inter-
model transformations Σger>erm and Σerm>ger depicted in Fig. 5. The ERM is
described in Section 9 while mapping Σger>erm and its inverse are presented in
Section 10. The reader will find a complete formalization of the GER in [16].
[44] follows another approach. The author associates with HERM, a variant of the
ER model, a specific notation with a precise ad hoc semantics, that includes an
algebra and a calculus.
Fig. 5. Expressing the semantics of the GER model by a bidirectional mapping with the
Extended Relational Model (ERM)
Note. The interpretation of the inverse mapping Σerm>ger is a bit more complex than
suggested. Indeed, Σger>erm is not surjective, so that some ERM schemas have no
GER counterpart. To be quite precise, we should define the subset ERM* of ERM
that makes Σger>erm surjective. However, we will ignore this for simplicity's sake.
This is no problem since ERM* is closed under the set of ERM transformations
Σerm>erm that we will use7. Proving this is fairly easy but would lead us beyond the
scope of this paper. In the rest of this paper, we will admit that the composition
Σerm>ger ° Σger>erm is the identity mapping without loss of generality.
Popular operational formalisms, that is, those which are in practical use among developers,
can be expressed as specializations of the GER. In general, deriving model M
from model M0 (here the GER) consists in:
1. selecting the constructs of M0 that are pertinent in M;
2. specifying the structural constraints on this subset so that only schemas valid in
M can be built;
3. renaming these constructs in order to make them compliant with the taxonomy
of M; this step will be ignored in this paper.
This process materializes the mapping ΣM>M0. We will briefly discuss this mapping
for two models, namely the Entity-relationship model and the SQL2 relational
model (Fig. 6). Similar mappings can be (and have been) developed for CODASYL
and IMS models, for standard file structures, and for XML DTDs and Schemas.
Let us first observe that there is no such thing as a standard ER model. At least a
dozen formalisms have been proposed, some of them being widely used in popular
7
Σerm>erm is the set of ERM-to-ERM transformations. Applying operators from the subset of
Σerm>erm that underlies Σger>ger (as discussed in Section 12) to any ERM* schema pro-
duces an ERM* schema.
text books and in CASE tools. However, despite divergent details, they all share
essential constructs such as entity type, relationship types with roles, some kind of
role cardinality/multiplicity, attributes and unique keys. Due to the nature of the GER,
restricting it to a definite Entity-relationship model is fairly straightforward, so that we
do not propose to develop the Σer>ger mapping.
The increasing popularity of the UML class model8 (aka class diagrams) incites
some authors and practitioners to use them to specify database conceptual and logical
schemas. This was not the primary objective of the UML formalism, so that it exhibits
severe flaws and weaknesses in database modelling. However, mapping Σuml>ger can
be developed in the same way as for other models.
A relational schema mainly includes tables, domains, columns, primary keys, unique
constraints, not null constraints and foreign keys. The relational model can therefore
be defined as in Fig. 7.
Fig. 7. Defining the standard relational (SQL2) model as a subset of the GER model (mapping
Σrel>ger)
A GER schema made up of constructs from the second column only, and that satis-
fies the assembly rules stated in the third column, can be called a relational GER
schema. As a consequence, a relational schema cannot comprise is-a relations, rela-
tionship types, multivalued attributes nor compound attributes.
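This consequence can be turned into a simple check. The sketch below assumes a schema given as a flat list of (construct kind, name) pairs, which is a hypothetical encoding; the four forbidden kinds are those listed in the text, and the function names are invented for the example. The full assembly rules of Fig. 7 are not covered.

```python
# Construct kinds that, according to the text, a relational schema cannot comprise.
FORBIDDEN_IN_RELATIONAL = {
    "is-a relation",
    "relationship type",
    "multivalued attribute",
    "compound attribute",
}


def violations(schema):
    """Return the sorted names of the constructs that are not allowed in a relational schema."""
    return sorted(name for kind, name in schema if kind in FORBIDDEN_IN_RELATIONAL)


def is_relational_candidate(schema):
    """True if the schema uses none of the forbidden construct kinds."""
    return not violations(schema)


# Assumed encoding of the conceptual schema of Fig. 1.
CONCEPTUAL = [
    ("entity type", "BOOK"),
    ("single-valued attribute", "BOOK.ISBN"),
    ("multivalued attribute", "BOOK.Author"),
    ("entity type", "COPY"),
    ("relationship type", "of"),
]
```

The names returned by violations(CONCEPTUAL) are the constructs that still have to be transformed before the schema can comply with the relational model.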
8
The term UML model has a specific interpretation in UML, where it denotes what we call a
schema in this paper.
4 Schema Transformation
Let us denote by M the unique model in which the source and target schemas are ex-
pressed, by S the schema on which the transformation is to be applied and by S' the
schema resulting from this application. Let us also consider sch(M), a function that
returns the set of all the valid schemas that can be expressed in model M, and inst(S), a
function that returns the set of all the instances that comply with schema S.
          T
   C ---------> C' = T(C)
   ^            ^
   | inst_of    | inst_of
   |      t     |
   c ---------> c' = t(c)
Fig. 8. The two mappings of schema transformation Σ ≡ <T,t>. The inst_of arrow from x to X
indicates that x is an instance of X.
Let us consider relation R, the attributes of which are partitioned into non empty
subsets I, J and K. Considering Σ, a (lossy) variant of relational decomposition trans-
formation, predicates P and Q as well as instance transformation t can be written as
follows:
9
Description logic [2] could be a good candidate for P and Q.
A generic transformation can be partially instantiated if some, but not all, variables
of P have been instantiated.
Each transformation Σ is associated with an inverse transformation Σ' which can
undo the result of the former under certain conditions that will be detailed in the next
section.
One of the most important properties of a transformation is the extent to which the
target schema can replace the source schema without losing information. This property
is called semantics preservation or reversibility.
Some transformations appear to augment the semantics of the source schema (e.g.,
adding an attribute), some remove semantics (e.g., removing an entity type), while
others leave the semantics unchanged (e.g., replacing a relationship type with an
equivalent entity type). The latter are called reversible or semantics-preserving. If a
transformation is reversible, then the source and the target schemas have the same
descriptive power, and describe the same universe of discourse, although with a dif-
ferent presentation.
Similarly, in the pure software engineering domain, [3] introduces the concept of
correctness-preserving transformation aimed at compilable and efficient program
production.
Example
The so-called decomposition theorem of the 1NF relational theory [13] is an example
of reversible transformation that can be described as follows10.
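As a minimal sketch of what reversibility means here, consider a relation R over three attributes I, J and K (single attributes rather than attribute subsets, an assumption made to keep the code short). The structural mapping T and the instance mappings t and t_inverse below are illustrative names. The standard result on multivalued dependencies states that R equals the join of its projections on (I, J) and (I, K) exactly when I →→ J holds in R; without that dependency the join may contain spurious rows, so the decomposition is lossy.

```python
def T(schema):
    """Structural mapping: R(I, J, K) becomes R1(I, J) and R2(I, K)."""
    i, j, k = schema["R"]
    return {"R1": (i, j), "R2": (i, k)}


def t(r):
    """Instance mapping: the two projections R[I,J] and R[I,K] of a set of (i, j, k) rows."""
    return {(i, j) for i, j, _ in r}, {(i, k) for i, _, k in r}


def t_inverse(r1, r2):
    """Candidate inverse instance mapping: the join R1 * R2 on the shared attribute I."""
    return {(i, j, k) for i, j in r1 for i2, k in r2 if i == i2}


# I ->> J holds: every J value of "b1" is combined with every K value of "b1".
WITH_MVD = {("b1", "a1", "k1"), ("b1", "a1", "k2"),
            ("b1", "a2", "k1"), ("b1", "a2", "k2")}

# I ->> J does not hold: ("b1", "a1", "k2") and ("b1", "a2", "k1") are missing.
WITHOUT_MVD = {("b1", "a1", "k1"), ("b1", "a2", "k2")}
```

With WITH_MVD, joining the two projections gives back the original rows; with WITHOUT_MVD, the join contains four rows, two of which were never in the source instance.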
The complexity of high-level models, and that of the GER in particular, makes the
study of their transformations particularly complex. To begin with, experience shows
that several dozens of operators can be useful, if not necessary, to describe the most
important engineering processes. Then, identifying and proving the reversibility de-
gree of each of them can be a huge and complex task, notably since there is no agreed
upon algebra or calculus to express Entity-relationship queries.
10
→→ denotes a multivalued dependency, [ ] the projection operator and * the join operator.
The key lies in the ERM formalism that expresses the semantics of the GER. Indeed,
the relational model, from which the ERM inherits, includes a strong and simple
body of properties and inference rules that can be used to build a relational
transformational theory. We can reasonably expect the set of transformations defined for the
ERM to be far smaller and simpler than that of the GER.
If this idea proves to be correct, then we will be provided with a nice way to generate,
explain, and reason about GER transformations. Fig. 9 illustrates this approach.
Fig. 9. Generating and specifying GER transformations through their expression in the
Extended Relational Model
According to this view, each GER transformation can be modelled as the compound
mapping:
Σger>ger = Σerm>ger ° Σerm>erm ° Σger>erm
This section describes several families of GER transformations with which complex
engineering processes will be built.
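[Figure: the mutation transformations Σ1 to Σ4.
Σ1: relationship type r between A (role 0-N) and B (role i-j) ⇔ entity type R linked to A and B by relationship types rA and rB (roles of R: 1-1; id of R: rA.A, rB.B).
Σ2: relationship type between A (id: A1) and B ⇔ foreign key A1[i-j] in B (ref: A1[*]).
Σ3: attribute A2[i-j] of A ⇔ entity type EA2 (A2; id: A2) linked to A by relationship type ra2 (roles i-j and 1-N).
Σ4: attribute A2[i-j] of A ⇔ entity type EA2 (A2; id: ra2.A, A2) linked to A by relationship type ra2 (roles i-j and 1-1).]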
The mutation transformations can solve many database engineering problems, but
other operators are needed to model special situations.
Expressing supertype/subtype hierarchies in DMS that do not support them
explicitly is a recurrent problem. The technique of Fig. 11 is one of the most
commonly used [4] [23]. It consists in representing each source entity type by an
independent entity type, then in linking each subtype to its supertype through a one-to-
one relationship type. The latter can, if needed, be further transformed into foreign
keys by application of Σ2-direct.
Fig. 11. Transforming an is-a hierarchy into one-to-one relationship types and conversely. The
exclusion constraint (excl:s.C,r.B) states that an A entity cannot be simultaneously linked to a B
entity and a C entity. It derives from the disjoint property (D) of the subtypes.
[Figure: Σ6 (T6/T6') converts attribute A2[0-5] array of A into the compound attribute A2[5-5], with components Index and Value[0-1] and id(A2): Index.]
Fig. 12. Converting an array A2 into a set-multivalued attribute and conversely. The values are
distinct wrt component Index (id(A2):Index). The latter indicates the position of the cell that
contains the value (Value). The domain of Index is the range [1..5].
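At the instance level, this conversion could be sketched as follows. The encoding is an assumption: the array is a Python list of exactly five cells in which None marks an empty cell, and the multivalued attribute is a set of (Index, Value) pairs. The names SIZE, array_to_indexed and indexed_to_array are hypothetical.

```python
SIZE = 5  # the domain of Index is the range [1..5]


def array_to_indexed(cells):
    """Turn a 5-cell array (None = empty cell) into a set of (Index, Value) pairs."""
    if len(cells) != SIZE:
        raise ValueError("the array must have exactly 5 cells")
    return {(index, value) for index, value in enumerate(cells, start=1)}


def indexed_to_array(pairs):
    """Rebuild the array; Index must identify the values and cover 1..5."""
    by_index = dict(pairs)
    if len(by_index) != len(pairs) or sorted(by_index) != list(range(1, SIZE + 1)):
        raise ValueError("Index values must be distinct and cover 1..5")
    return [by_index[i] for i in range(1, SIZE + 1)]
```

Because each pair keeps the position of its cell, the order of the array can be rebuilt from an unordered set, which is what makes the conversion reversible in this sketch.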
Attributes defined on the same domain and the name of which suggests a spatial or
temporal dimension (e.g., departments, countries, years or pure numbers) are called
homogeneous serial attributes. In many situations, they can be interpreted as the
representation of an indexed multivalued attribute (Fig. 13). The identification of these
attributes must be confirmed by the analyst.
[Figure: Σ7 (T7/T7') replaces attributes A2X, A2Y and A2Z of A with the compound attribute A2[3-3], with components Dimension and Value, id(A2): Dimension and dom(A2.Dimension) = {'X','Y','Z'}.]
Fig. 13. Transforming homogeneous serial attributes {A2X, A2Y, A2Z} into a multivalued
compound attribute A2 and conversely. The values (Value) are indexed with the distinctive
suffix of the source attribute names, interpreted as a dimension (sub-attribute Dimension).
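The instance-level effect of this transformation could be sketched as follows, assuming that an entity is encoded as a Python dictionary and that the serial attributes share a common prefix followed by the dimension suffix. The function names are hypothetical.

```python
def serial_to_compound(entity, prefix, dimensions):
    """Replace attributes prefix+d (d in dimensions) with one multivalued compound attribute."""
    serial = {prefix + d for d in dimensions}
    result = {name: value for name, value in entity.items() if name not in serial}
    result[prefix] = [{"Dimension": d, "Value": entity[prefix + d]} for d in dimensions]
    return result


def compound_to_serial(entity, prefix):
    """Inverse mapping: one attribute per Dimension value of the compound attribute."""
    result = {name: value for name, value in entity.items() if name != prefix}
    for item in entity[prefix]:
        result[prefix + item["Dimension"]] = item["Value"]
    return result
```

For instance, serial_to_compound({"A1": 1, "A2X": 10, "A2Y": 20, "A2Z": 30, "A3": "z"}, "A2", ["X", "Y", "Z"]) yields an entity whose A2 attribute holds three (Dimension, Value) items, and compound_to_serial restores the original attributes.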
[Figure: transformation Σ8 (T8'), on a schema comprising PRODUCT (id: Prod_ID), COUNTRY (id: Ctry_Name) and export (Volume; id: Ctry_Name, Prod_ID; ref: Prod_ID; ref: Ctry_Name).]
extracts this attribute into entity type EXPENSE, with attributes Year and Amount
(transformation Σ4-direct). Then, the same operator is applied to attribute Year,
yielding entity type YEAR, with attribute Year. Finally, entity type EXPENSE is
transformed into relationship type expense (Σ1-inverse).
[Figure: Σ9 (T9/T9') replaces attributes Expense-2000 to Expense-2004 of Project (Dep#, InitialBudget) with relationship type expense (attribute Amount; roles 5-5 for Project and 1-N for YEAR) and entity type YEAR (Year; id: Year; dom(Year) = [2000..2004]).]
The process of designing and implementing a database that is to meet definite user
requirements has been described extensively in the literature [4] and has been available
for several decades in CASE tools. It comprises four main sub-processes, namely
(Fig. 17):
1. Conceptual design, the goal of which is to translate user requirements into a
conceptual schema, which is a technology-independent abstract specification11.
2. Logical design, which produces a logical schema that losslessly translates the
constructs of the conceptual schema according to a specific technology family12.
3. Physical design, which augments the logical schema with performance-oriented
constructs and parameters, such as indexes, buffer management policies or lock
management parameters.
11
Or Platform-Independent Model (PIM according to the MDA/MDE vocabulary.
12
The logical and physical schemas can be called Platform-Specific Model (PSM in the
MDA/MDE vocabulary).
116 J.-L. Hainaut
4. Coding, that translates the physical schema (and some other artefacts) into the
DDL code of the DBMS.
Calling the whole process DB-Design, and the four sub-processes respectively
ConcD, LogD, PhysD and Coding, we can describe them with the transformational
notation:
We consider the most popular conceptual source model, namely the Entity-relationship
model, and the most popular logical target model, the SQL2 relational model, with
which Oracle, SQL Server, DB2, PostgreSQL, Firebird and many others comply. The
GER expression of the SQL2 model has been developed in Fig. 7. By complementing
this table, we identify the Entity-relationship constructs that do not belong to the
SQL2 model, the four most important of which are transformed as follows.
[Fig. 18 flowchart: transform is-a relations (Σ5d); transform complex rel-types (Σ1d); while non-simple attributes remain: transform level-1 multivalued attributes (Σ4d) and disaggregate level-1 compound attributes (Σ10d); transform functional rel-types (Σ2d); on any failure, add a technical Id where needed (Σ11d); process names.]
Fig. 18. A simple transformation plan for logical relational database design
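The loop at the heart of this plan (repeat the attribute transformations until only simple attributes remain) can be sketched as follows. This is a toy model under stated assumptions: the `Attr` class, the `flatten_attributes` function and the naming conventions (`ENTITY_Attribute` for extracted tables, `ref_ENTITY` for the reference column, `Parent_Child` for disaggregated components) are hypothetical, and the other steps of the plan (is-a relations, rel-types, technical identifiers, names) are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Attr:
    name: str
    multivalued: bool = False
    parts: list = field(default_factory=list)  # non-empty means compound


def flatten_attributes(entities):
    """Repeat the two attribute transformations until all attributes are simple."""
    entities = {name: list(attrs) for name, attrs in entities.items()}
    changed = True
    while changed:  # "still non simple attributes?"
        changed = False
        for ename in list(entities):
            kept = []
            for a in entities[ename]:
                if a.multivalued:
                    # In the spirit of Σ4d: extract the attribute into its own structure.
                    entities[f"{ename}_{a.name}"] = [
                        Attr(f"ref_{ename}"),
                        Attr(a.name, False, a.parts),
                    ]
                    changed = True
                elif a.parts:
                    # In the spirit of Σ10d: replace a compound attribute by its components.
                    kept += [Attr(f"{a.name}_{p.name}", p.multivalued, p.parts)
                             for p in a.parts]
                    changed = True
                else:
                    kept.append(a)
            entities[ename] = kept
    return entities


schema = {"BORROWER": [
    Attr("PID"),
    Attr("Name"),
    Attr("Address", parts=[Attr("Street"), Attr("City")]),
    Attr("Phone", multivalued=True),
]}
tables = flatten_attributes(schema)
```

Nested structures are handled by the repetition itself: a multivalued compound attribute is first extracted, then disaggregated on the next pass.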
[Fig. 19 diagram: conceptual schema. Entity types: DOCUMENT (DocID, Title, Date-Published, Keyword[0-10]; id: DocID), with disjoint (D) subtypes REPORT (Report Code, Version; id': Report Code) and BOOK (ISBN, Publisher; id': ISBN); AUTHOR (Name, First-Name[0-1], Origin[0-1]); BORROWER (PID, Name, Address (Street, City), Phone[1-5]; id: PID); COPY (Serial-No, Date-Acquired, Location (Store, Shelf, Row); id: of.BOOK, Serial-No); PROJECT (ProjCode, Title, ContractNo[0-1], Company; id: ProjCode; id': ContractNo). Rel-types: written (DOCUMENT [0-N], AUTHOR [1-N]), reserved (attribute Reservation date), responsible-for (role responsible), of, works on, and borrowing (Borrow-Date, Return-Date[0-1]; id: COPY, Borrow-Date).]
¹³ This assertion is not quite correct if we only use the transformations presented in this
paper. In particular, some constraints can be lost, or incompletely translated. Such is the
case for cardinality constraints [i-j] where 1 < j < N. A more comprehensive plan, making use
of more precise transformations, can preserve these constraints until the coding phase, e.g.,
in the form of SQL triggers.
¹⁴ Several DBMSs do not correctly manage candidate keys comprising a nullable column.
[Fig. 20 diagram, tables recoverable from the figure: REPORT (DocID, Report Code, Version; id: DocID; id': Report Code); BOOK (DocID, ISBN, Publisher; id: DocID; id': ISBN); reserved (PID, DocID, Reservation date; ref: DocID; ref: PID); BORROWER (PID, Name, Add_Street, Add_City, ProjCode[0-1], Responsible[0-1]; id: PID; ref: Responsible; ref: ProjCode); COPY (DocID, Serial-No, Date-Acquired, Loc_Store, Loc_Shelf, Loc_Row; id: DocID, Serial-No; ref: DocID); borrowing (DocID, Serial-No, Borrow-Date, PID, ProjCode, Return-Date[0-1]; id: DocID, Serial-No, Borrow-Date; ref: DocID, Serial-No; ref: PID; ref: ProjCode); PROJECT (ProjCode, Title, Company; id: ProjCode); ContractNo (ContractNo, ProjCode; id: ContractNo; id': ProjCode).]
Fig. 20. The relational schema obtained by the application of the transformation plan of Fig. 18
on the conceptual schema of Fig. 19
In complex projects, for instance when the database includes several hundred or
several thousand tables¹⁵, the core of the process will be organized as described in Fig. 21.
It comprises four main sub-processes, namely:
1. Parsing, which rebuilds the raw physical schema by merely parsing the DDL code
(code_ddl). Only the constructs that have been explicitly declared in the code can
be recovered.
2. Refinement, which enriches the raw physical schema with the undeclared
constructs and constraints that have been elicited through the analysis of program
code (code_prg), as well as other sources that we will ignore here. Sometimes
more than 50% of the specifications can be retrieved in this way.
3. Cleaning, which removes the technical constructs, such as the indexes, and which
produces the logical schema.
4. Conceptualization, which derives a plausible conceptual schema from the logical
schema.
¹⁵ An SAP database can comprise 30,000 tables and more than 200,000 columns.
Calling the whole process DB-REng, and the four sub-processes Parse, Refine,
Clean and Concept respectively, we can write:
An interesting, and not really surprising, aspect of database reverse engineering is that
all the processes we have mentioned appear to be the reverse of database design
processes. Indeed, we have the following relations:
Refine ∘ Parse = Coding⁻¹
Clean = PhysD⁻¹
Concept = LogD⁻¹
This observation has a deep influence on the specifications and the strategies of the
reverse processes. For instance, since the Conceptualization process is the inverse of
Logical design, it should be possible to derive a transformation plan for the former
just by reversing the plan of Logical design. Though this approach has proved
successful, the problem is a bit more complex due to the undisciplined way legacy
databases were designed. When the logical schema was built, it had to meet not only
functional requirements (that is, to express all the semantics of the conceptual
schema), but also non-functional requirements such as time-space optimization,
security or privacy. The satisfaction of the latter requirements can deeply affect the
readability of the logical schema, to such an extent that it becomes quite difficult to
understand.
In the next section, we will very briefly describe the Conceptualization process as
a transformation process, and elaborate a representative transformation plan.
Reversing a transformation plan is a new concept that would deserve some further
discussion [22]. Due to space limits, we will give a simplified definition that is valid
for linear plans only, that is, plans which do not include if-then-else or loop
constructs:
Fig. 22. Building a linear transformation plan for the Conceptualization process
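One plausible reading of plan reversal for linear plans, stated here as an assumption, is that each operator is replaced by its inverse and that the inverses are applied in the opposite order. The sketch below illustrates that reading on a toy schema reduced to a tuple of column names; `Step`, `rename`, `run` and `reverse_plan` are hypothetical names introduced for the example.

```python
from typing import Callable, NamedTuple


class Step(NamedTuple):
    name: str
    direct: Callable
    inverse: Callable


def rename(old, new):
    """A reversible toy operator: rename one column."""
    def renamer(src, dst):
        return lambda cols: tuple(dst if c == src else c for c in cols)
    return Step(f"rename {old}->{new}", renamer(old, new), renamer(new, old))


def run(plan, schema):
    """Apply the direct form of each step, in order."""
    for step in plan:
        schema = step.direct(schema)
    return schema


def reverse_plan(plan):
    """Swap direct and inverse forms, and reverse the order of the steps."""
    return [Step(s.name + " (inverse)", s.inverse, s.direct) for s in reversed(plan)]


plan = [rename("Name", "Cust_Name"), rename("Cust_Name", "CName")]
logical = run(plan, ("CId", "Name"))
recovered = run(reverse_plan(plan), logical)
```

The order matters: applying the inverses in the original order would leave the column named Cust_Name instead of restoring Name.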
¹⁶ A desirable property of these plans is their idempotence. It is not guaranteed in general.
¹⁷ The free Education edition of DB-MAIN is available at the following address:
http://www.info.fundp.ac.be/libd, select "DB-MAIN CASE".
9.1 Domain
A domain is a named set of elements. It is declared by its name and the specification
of the set of elements. A domain is dynamic if its set can change over time. Some
predefined basic domains are provided, such as number, string or date. The model
includes a special dynamic basic domain, called entities, whose structure is
immaterial, but the goal of which could be to denote application domain entities. A
user-defined domain is defined by an element set which is a subset of that of another
domain. A relation is a valid domain. Any domain defined as a subset of the domain
entities is an entity domain, and so forth transitively.
Examples
address ( Street: name,
City: name );
employee ( PId: number,
Name: name,
1st-name[0-1]: name,
Phone[1-5]: phone,
Contact: address);
Interpretation: an employee has one (default [1-1], that is, from 1 to 1) personal ID,
one name, from 0 to 1 first name, from 1 to 5 phone numbers, and one contact, which
is made up of one street and one city.
If the concept of address is not considered important (for instance, it is not referred
to elsewhere), the domain address could be specified in line as follows:
employee (…, Contact: (Street: name, City: name));
In some situations, the specification of the domain will be ignored for simplicity.
Consequently, the following notation will be allowed.
employee (PId, Name, 1st-name[0-1], Phone[1-5], Contact: address);
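The cardinality properties of the employee example can be checked mechanically. The sketch below is an illustration only: it assumes that each attribute value is stored as a Python list, and the names `EMPLOYEE_CARD` and `cardinality_violations` are hypothetical.

```python
# Cardinality [min-max] of each attribute of employee; [1-1] is the default.
EMPLOYEE_CARD = {
    "PId": (1, 1),
    "Name": (1, 1),
    "1st-name": (0, 1),
    "Phone": (1, 5),
    "Contact": (1, 1),
}


def cardinality_violations(entity, cards):
    """Return the attributes whose number of values lies outside [min-max]."""
    bad = []
    for attr, (low, high) in cards.items():
        n = len(entity.get(attr, []))  # a missing attribute counts as zero values
        if not low <= n <= high:
            bad.append(attr)
    return bad


ok = {
    "PId": [7],
    "Name": ["Smith"],
    "Phone": ["111", "222"],
    "Contact": [("Main St", "Namur")],
}
too_many = {**ok, "Phone": ["1", "2", "3", "4", "5", "6"], "1st-name": ["Ann", "Bea"]}
```

The first tuple is valid even though it has no first name, since 1st-name has cardinality [0-1]; the second one violates both [0-1] and [1-5].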
By default, the value of an attribute is a set of domain values. Due to the generality of
the GER, which is intended, among others, to describe logical and physical schemas, we
need more powerful data structures, such as set, bag, list and array attributes:
9.4 Constraints
ERM includes the uniqueness and inclusion constraints, as well as various
dependencies, such as functional (FD) and multivalued (MV), of the standard relational
model¹⁸. Candidate key {A,B,C} of R will be declared by the clause id(R): A,B,C.
When possible, and where no ambiguity may arise, this specification can be replaced
by continuously underlining the components of the key. Inclusion constraints between
algebraic expressions are allowed.
¹⁸ These constraints have been defined on 1NF models, and their generalization to
N1NF models is far from trivial. Due to the limited scope of this paper, and without
loss of generality, we will ignore the complexity of the constraint patterns of N1NF
models.
Examples
1. R (A, B, C, D); id(R): A,B; also defined as: R (A, B, C, D), with A and B continuously underlined¹⁹
2. S(E, G, H); S[G,H] ⊆ R[A,B];
3. T(A, B, C); T[A] = A;
Example 1 declares a candidate key in both alternative syntaxes. Example 2 declares a
foreign key through an inclusion constraint. Expression R[A,B] denotes the projection
of the current instance of R on attributes (A,B). Example 3 expresses a domain
constraint. Every element of domain A must appear as the value of attribute A of at
least one tuple of the current instance of T.
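The three kinds of constraints in these examples can be checked on small instances. The sketch below assumes that a relation instance is a list of dictionaries without duplicates; the helper names (`project`, `is_key`, `included`, `covers_domain`) are introduced for the illustration only.

```python
def project(rel, attrs):
    """Projection R[attrs] as a set of value tuples."""
    return {tuple(t[a] for a in attrs) for t in rel}


def is_key(rel, attrs):
    """id(R): attrs holds if no two tuples agree on attrs."""
    return len(project(rel, attrs)) == len(rel)


def included(sub, sub_attrs, sup, sup_attrs):
    """Inclusion constraint sub[sub_attrs] ⊆ sup[sup_attrs]."""
    return project(sub, sub_attrs) <= project(sup, sup_attrs)


def covers_domain(rel, attr, domain):
    """Domain constraint T[A] = A: every element of the domain appears in the relation."""
    return project(rel, [attr]) == {(v,) for v in domain}


R = [{"A": 1, "B": 1, "C": "x", "D": "p"},
     {"A": 1, "B": 2, "C": "y", "D": "p"}]
S = [{"E": "e1", "G": 1, "H": 2}]
T = [{"A": 1, "B": 0, "C": 0},
     {"A": 2, "B": 0, "C": 0}]
```

On these instances, {A,B} is a key of R while {A} alone is not, S[G,H] is included in R[A,B], and T covers the domain {1, 2} but would not cover {1, 2, 3}.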
In a N1NF structure, a local key can hold in a multivalued compound attribute. In
the following example, we declare that, for each product tuple, the candidate key
{Year} holds in each instance of Sales (no two sales in the same year):
ERM includes a special form of cardinality constraint, through which we can state
how many tuples of the current instance of a relation must/can share a common
domain value.
Considering the relation schema R(A,B,C) and an instance r of R,
card(R.A): [I-J],
is interpreted as²⁰
∀a ∈ A, I ≤ |r(A=a)| ≤ J
Examples
1. R (A, B, C); card(R.A): [0-5];
2. R (A, B, C); card(R.(B,C)): [1-3];
Example 1 declares that any value of domain A may not appear in more than 5 tuples
of (any instance of) R. Example 2 shows a generalization of the constraint. It declares
that any couple of values of domains B and C must appear in 1 to 3 tuples of (any
instance of) R.
Note that candidate keys as well as the domain constraint T[A] = A are special cases
of cardinality constraint. Note also that cardinality properties and cardinality
constraints serve different purposes, and that neither can replace the other.
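The interpretation of card(R.A): [I-J] translates directly into a check over an instance. In the sketch below, the function name `satisfies_card` and the data are assumptions; the reading of a candidate key as card [0-1] and of the domain constraint as card [1-N] is our interpretation of the remark above, with N represented by infinity.

```python
from collections import Counter
from math import inf


def satisfies_card(rel, attrs, domain, low, high):
    """card(R.attrs): [low-high], i.e. each domain value occurs in low..high tuples."""
    counts = Counter(tuple(t[a] for a in attrs) for t in rel)
    # A Counter returns 0 for values that appear in no tuple.
    return all(low <= counts[value] <= high for value in domain)


owns = [{"owner": "c1", "CAR": "car1"},
        {"owner": "c1", "CAR": "car2"},
        {"owner": "c2", "CAR": "car3"}]
CUSTOMER = {("c1",), ("c2",), ("c3",)}
CAR = {("car1",), ("car2",), ("car3",)}

at_most_five = satisfies_card(owns, ["owner"], CUSTOMER, 0, 5)  # card(owns.owner): [0-5]
every_car_owned = satisfies_card(owns, ["CAR"], CAR, 1, inf)    # owns[CAR] = CAR
car_is_key = satisfies_card(owns, ["CAR"], CAR, 0, 1)           # id(owns): CAR
owner_is_key = satisfies_card(owns, ["owner"], CUSTOMER, 0, 1)  # fails: c1 owns two cars
```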
¹⁹ The graphical convention is as follows: when A and B are underlined by one continuous
line, the key of R(A,B,C) is {A,B}, while when A and B are underlined separately, R(A,B,C)
has two keys, {A} and {B}.
²⁰ Expression r(A=a) denotes the set of tuples of r where A=a.
• domains
CUSTOMER: entities;
VEHICLE: entities;
CAR, BOAT: VEHICLE;
Name: string;
• relations
cust (CUSTOMER, CId: number, Name: name, Phone[0-3]: string),
owns (owner: CUSTOMER, CAR);
• constraints
VEHICLE = CAR ∪ BOAT;
id(cust): CId;
card(owns.owner): [0-5];
owns(CAR) = CAR;
The mapping Σger>erm (Section 3.3) is fairly straightforward for most GER
constructs. The inverse mapping is easy to derive as well. The main rules are
presented in Fig. 24, and need little explanation, except for the representation of an
entity type, since it seems to differ from the usual way one translates a conceptual
schema into relational structures, as illustrated in Fig. 1 for example. First, let us
recall that the goal of this section is not to produce relational databases, as
discussed in Section 6.2, but rather to give an operational model rigorous
semantics.
An entity type E is merely represented by an entity domain, with name E,
independently of any other feature, such as attributes, it may be concerned with.
When entity type E participates in relationship type (rel-type for short) R, with role
r, its representation also appears as the domain of ERM attribute r of the relation R
that expresses this rel-type (see rel-types of and export in Fig. 24).
Now, how to express the GER attributes of E? Through a special relation that
aggregates each entity with its GER attribute values. The relation is given the
conventional name desc-E, for description of E. This relation comprises an entity
attribute, with name E, and defined on entity domain E. This attribute is a key of the
relation. Then, for each GER attribute, it comprises an ERM attribute, with the same
name and the same domain. Later on, we will see that, in some circumstances, this
relation can include other entity domains.
In this way, we can easily describe, beyond plain GER structures, an entity type
without attributes, or without identifiers, or with complex constraint patterns.
Fig. 24. Main GER-to-ERM transformations (left to right) and their inverse (right to left)
A rel-type is functional if it is binary, has no attributes and if at least one of its roles
has cardinality [i-1]. Let us consider the functional rel-type of, between ACCOUNT and
CUSTOMER, in Fig. 4, and recalled in Fig. 24. These three constructs translate in
ERM as follows (note that the identifier of ACCOUNT has not been translated yet):
Principle
A relation R in which a multivalued dependency (e.g., a FD) holds can be
decomposed into smaller fragments according to this dependency [13].
Structural mapping
P R(U); {I,J,K} is a partition of U; I →→ J|K;
Q R1(I J); R2(I K); R1[I]=R2[I];
Variants
The project-join transformation can be particularized to relations in which I, J and/or
K are made up of one attribute only, in which K is optional, in which K is multivalued,
in which J is empty, and in which J and K are multivalued.
Signatures
direct : (R1,R2) ←⎯ PJ(R,I,J)
reverse : R ←⎯ PJ⁻¹(R1,R2,I)
Discussion
This transformation is the variant of the relational decomposition theorem mentioned
in Section 4.3. It is therefore symmetrically reversible.
Example
Source schema works(who:EMP,in:PROJ,for:DEPART)
works:who ⎯→ for
Transformation (works-in,works-for) ←⎯ PJ(works,{who},{in})
Target schema works-in(who:EMP,in:PROJ)
works-for(who:EMP,for:DEPART)
works-in[who] = works-for[who]
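The example can be replayed on a small instance. The sketch below represents a tuple as a frozenset of (attribute, value) pairs, which is a choice made for the illustration; `row`, `attributes`, `project`, `join` and `pj` are hypothetical helper names, and the natural join plays the role of the inverse of PJ.

```python
def row(mapping):
    return frozenset(mapping.items())


def attributes(rel):
    return {a for t in rel for a, _ in t}


def project(rel, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def join(r1, r2):
    """Natural join: merge the tuples that agree on their common attributes."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(t1 | t2)
    return out


def pj(rel, i, j):
    """Project rel on I ∪ J and on I ∪ K, where K holds the remaining attributes."""
    k = attributes(rel) - set(i) - set(j)
    return project(rel, set(i) | set(j)), project(rel, set(i) | k)


# who determines for: e1 always works for d1, e2 for d2.
works = {
    row({"who": "e1", "in": "p1", "for": "d1"}),
    row({"who": "e1", "in": "p2", "for": "d1"}),
    row({"who": "e2", "in": "p1", "for": "d2"}),
}
works_in, works_for = pj(works, {"who"}, {"in"})
rebuilt = join(works_in, works_for)
```

When the dependency does not hold in the source relation, the same join produces spurious tuples, which is why the dependency appears as a precondition of the transformation.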
Signatures
direct : (X,D,{A1,…,An}) ←⎯ den(S, )
reverse : () ←⎯ den-1(X,D)
Principle
The projection of a relation R on a subset {I1,…,In} of its attributes is explicitly
represented by surrogate domain X. Bijective relation D acts as a dictionary for the
elements of X. This domain replaces I in R, leading to relation T.
Structural mapping
Variants
When I = U, J is empty, so that the transformation degenerates into:
P R(U);
Q domain X; D(X,U); X appears in D only
Signatures
Extension
direct : (X,D,T) ←⎯ ext(R,I)
reverse : R ←⎯ ext⁻¹(X,D,T)
Extension decomposition
direct : (X,{D1,D2,..,Dm},T) ←⎯ ext-dec(R,{I1,I2,..,Im})
reverse : R ←⎯ ext-dec⁻¹(X,{D1,D2,..,Dm},T)
Discussion
This family of operators is particularly powerful, since it allows us to generate most
entity-generating and entity-removing transformations [17]. Based on the den and PJ-1
transformations, it is symmetrically reversible. The role of the parameter I can be inter-
preted as follows: the subset I of attributes of R seems to represent an outstanding
concept which would deserve being described by a new surrogate domain X.
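The principle of the extension transformation can be illustrated on an instance. The sketch below is built from the Principle only: it assumes that D has the form D(X, I) and that T is R in which I has been replaced by X. The function names, the surrogate values ("x1", "x2", ...) and the sample data are hypothetical.

```python
from itertools import count


def ext(rel, i, x="X"):
    """Return (D, T): D maps each surrogate to its I-values, T replaces I by X."""
    surrogate = {}
    fresh = count(1)
    d, t = [], []
    for row in rel:
        key = tuple(row[a] for a in i)
        if key not in surrogate:
            # One surrogate per distinct combination of I-values keeps D bijective.
            surrogate[key] = f"x{next(fresh)}"
            d.append({x: surrogate[key], **{a: row[a] for a in i}})
        t_row = {x: surrogate[key], **{a: v for a, v in row.items() if a not in i}}
        if t_row not in t:
            t.append(t_row)
    return d, t


def ext_inverse(d, t, x="X"):
    """Rebuild R by substituting the I-values back for each surrogate."""
    by_x = {row[x]: {a: v for a, v in row.items() if a != x} for row in d}
    return [{**by_x[row[x]], **{a: v for a, v in row.items() if a != x}} for row in t]


R = [{"Name": "Ann", "Street": "Main St", "City": "Namur"},
     {"Name": "Bob", "Street": "Main St", "City": "Namur"},
     {"Name": "Cy", "Street": "High St", "City": "Liege"}]
D, T = ext(R, ["Street", "City"], x="ADDRESS")
rebuilt = ext_inverse(D, T, x="ADDRESS")
```

Here the pair (Street, City) is treated as the outstanding concept: two addresses receive a surrogate, and the three source tuples now refer to them.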
Principle
A relation S is replaced by its composition T with another relation R.
Structural mapping
P R(I K); S(K L); S[K] ⊆ R[K]; I,K,L not empty;
Q R(I K); T(I L); T[I] ⊆ R[I]; R*T: K →→ L|I
Variants
The transformation simplifies when R is bijective:
Discussion
These operators derive from transformations PJ and PJ⁻¹. Therefore they are
symmetrically reversible. In the bijective variants, the transformation is symmetrical
and can be seen as substituting in S a key I of R for the key K.
Example
Source schema manages(MANAGER,DEPART)
works-in(EMPLOYEE,DEPART)
works-in[DEPART] ⊆ manages[DEPART]
Transformation works-for ←⎯ comp(manages,works-in,{DEPART})
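Following the structural mapping, with I = {MANAGER}, K = {DEPART} and L = {EMPLOYEE}, the composed relation should be works-for(MANAGER, EMPLOYEE). The sketch below computes it on a toy instance as the projection of the join on the attributes outside K; the function `comp` and the data are assumptions made for the illustration.

```python
def comp(r, s, k):
    """Compose r and s on the common attributes k, then drop k from the result."""
    t = []
    for rr in r:
        for ss in s:
            if all(rr[a] == ss[a] for a in k):
                row = {a: v for a, v in {**rr, **ss}.items() if a not in k}
                if row not in t:
                    t.append(row)
    return t


manages = [{"MANAGER": "m1", "DEPART": "d1"},
           {"MANAGER": "m2", "DEPART": "d2"}]
works_in = [{"EMPLOYEE": "e1", "DEPART": "d1"},
            {"EMPLOYEE": "e2", "DEPART": "d1"},
            {"EMPLOYEE": "e3", "DEPART": "d2"}]

works_for = comp(manages, works_in, ["DEPART"])
# manages is bijective in this instance, so composing again on MANAGER
# gives back the original works-in tuples.
back = comp(manages, works_for, ["MANAGER"])
```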
Principle
A N1NF relation R that comprises a multivalued attribute B is replaced by S, its
equivalent 1NF version [43] [31].
Structural mapping
P R(I,B[1-N]);
Q S(I,B);
Variants
The cardinality of attribute B prohibits empty sets (otherwise values of I are lost),
which can be too strong a precondition. Hence the following variant, in which the
tuples of R with an empty B set can be rebuilt from the elements of the evaluation of
that do not appear in S:
P R(I,B(K));
Q S(I,K);
Signatures
direct : S ←⎯ unnest(R,B)
reverse : R ←⎯ unnest⁻¹(S,B)
Discussion
Unnest, together with its inverse nest, are the main algebraic operators specific to
N1NF relational models. This version of unnest is symmetrically reversible. Indeed,
R meets the following criterion of reversibility (see [10] for instance): considering the
relation R(A,B[0-N],C), the application of the unnest relational operator on B is
(symmetrically) reversible iff:
• no tuple of R has an empty B value (as if the cardinality property of B actually
was [1-N]),
• B is functionally (possibly non minimally) dependent on the set of all the other
attributes of R.
Examples
Source schema contacts(EMPLOYEE,PHONE[1-N])
Transformation contact ←⎯ unnest(contacts,PHONE)
Target schema contact(EMPLOYEE,PHONE)
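A minimal sketch of unnest and of its inverse on the contacts example follows. It assumes that a multivalued attribute is held as a Python list inside a dictionary row; `unnest` and `nest` are illustrative names, not the API of any tool.

```python
def unnest(rel, b):
    """One output tuple per value of the multivalued attribute b."""
    return [{**{a: v for a, v in row.items() if a != b}, b: val}
            for row in rel for val in row[b]]


def nest(rel, b):
    """Group the tuples that agree on all the other attributes and collect b."""
    groups = {}
    for row in rel:
        key = tuple(sorted((a, v) for a, v in row.items() if a != b))
        groups.setdefault(key, []).append(row[b])
    return [{**dict(key), b: vals} for key, vals in groups.items()]


contacts = [{"EMPLOYEE": "e1", "PHONE": ["111", "222"]},
            {"EMPLOYEE": "e2", "PHONE": ["333"]}]
contact = unnest(contacts, "PHONE")
rebuilt = nest(contact, "PHONE")

# With cardinality [0-N], an employee without phone disappears from the result.
with_empty = contacts + [{"EMPLOYEE": "e3", "PHONE": []}]
lossy = unnest(with_empty, "PHONE")
```

The last two lines show why the [1-N] precondition matters: the tuple of e3 leaves no trace in the unnested relation, so it cannot be rebuilt from it alone.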
The issue is to prove that a known, but possibly ill-defined, practical transformation is
correct and complete as far as semantics preservation is concerned. In this context, we
will revisit the three transformations that we have informally used in the introductory
example of Fig. 1, and that also are the most popular, notably in database logical design.
Due to space limits, only the main patterns will be discussed. For any variant of the
source schema, such as those that are suggested below, the reader is invited to examine
the ERM expression and to infer the actual resulting schema. For example, in the
transformation of attribute A2 into an entity type, no hypothesis is made on the
participation of A2 in an identifier of A. If this is the case, the ERM expression clearly
shows how to deal with this pattern, based on the dependency theory²¹. This is left as
an exercise.
²¹ More precisely, the rules that govern the propagation of FDs in the projection, the join
and the selection.
[Diagram: P: entity type A (A1, A2[0-N], A3) ⇔ Q: entity type A (A1, A3) linked through rel-type rA (A: [0-N], EA2: [1-N]) to entity type EA2 (A2; id: A2).]
Signatures
direct : (EA2,rA) ←⎯ att-to-et(A,A2)
reverse : A2 ←⎯ att-to-et⁻¹(EA2)
Analysis
We express the source schema (left) in ERM, then we extract and flatten the
multivalued attribute:
A: entities;
desc-A(A,A1,A2[0-N],A3);
desc-A[A]=A;
⇔
(desc-A',R) ←⎯ PJ(desc-A,{A},{A2})
A: entities;
desc-A'(A,A1,A3);
R(A,A2[1-N]);
desc-A'[A]=A;
⇔
R' ←⎯ unnest(R,A2)
A: entities;
desc-A'(A,A1,A3);
R'(A,A2);
desc-A'[A]=A;
(EA2,{desc-EA2},rA) ←⎯ ext(R',{A2})
A,EA2: entities;
desc-A'(A,A1,A3);
desc-EA2(EA2,A2);
rA(A,EA2);
desc-A'[A]=A;
desc-EA2[EA2]=rA[EA2]=EA2;
Interpreting this schema in the GER gives the expected target schema (right). We
conclude that att-to-et is an SR-transformation.
In the illustration of Fig. 1, we transformed relationship type write into entity type
WRITE. Here is a generalization of this operator for N-ary relationship types, that can
also have attributes (Fig. 26).
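[Fig. 26 diagram: P: rel-type R (attributes R1, R2) among entity types A, B and C, each role with cardinality [0-N] ⇔ Q: entity type R (R1, R2; id: rA.A, rB.B, rC.C) linked to A, B and C through rel-types rA, rB and rC, with cardinality [1-1] on the R side and [0-N] on the other side.]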
Variants. The roles of R have cardinality constraints other than [0-N]; R is binary;
one (or more) of the roles of R has cardinality [0-1]; R has one (or more) explicit
identifiers²².
Signatures
direct : (R,{(A,rA),(B,rB),(C,rC)}) ←⎯ rt-to-et(R)
reverse : R ←⎯ rt-to-et⁻¹(R)
²² The default (not necessarily minimal) identifier of a relationship type is made up of the
set of its roles.
Analysis
We express the source schema (left) in ERM, then we represent the set of roles by the
new entity domain R:
A,B,C: entities;
R(A,B,C,R1,R2);
desc-A[A]=A;
⇔
(R,{rA,rB,rC},desc-R) ←⎯ ext-dec(R,{{A},{B},{C}})
A,B,C,R: entities;
rA(R,A); rB(R,B); rC(R,C);
desc-R(R,R1,R2);
rA*rB*rC: A,B,C ⎯→ R;
rA[R]=rB[R]=rC[R]=desc-R[R]=R;
Interpreting this schema in the GER gives the expected target schema (right). We
conclude that rt-to-et is an SR-transformation.
In Fig. 1, we transformed all the one-to-many relationship types into attributes, then
we declared them foreign keys.
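[Diagram: P: entity type B (B1, B2) linked through rel-type R (B: [1-1], A: [0-N]) to entity type A (A1, A2; id: A1) ⇔ Q: entity type A (A1, A2; id: A1) and entity type B (B1, B2, A1; ref: A1).]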
Signatures
direct : {A1} ←⎯ rt-to-att(R.B)
reverse : R ←⎯ rt-to-att⁻¹(B,{A1},A)
Analysis
We express the source schema (left) in ERM, then we apply the composition
transformation:
A,B: entities;
desc-A(A,A1,A2); desc-B(B,B1,B2); R(A,B);
R[B]=B; desc-A[A]=A; desc-B[B]=B;
⇔
R' ←⎯ comp(desc-A,R,{A},{A1})
A,B: entities;
desc-A(A,A1,A2); desc-B(B,B1,B2); R'(A1,B);
desc-B[B]=R'[B]=B; desc-A[A]=A;
R'[A1] ⊆ desc-A[A1];
⇔
desc-B' ←⎯ PJ⁻¹(desc-B,R',B)
A,B: entities;
desc-A(A,A1,A2); desc-B'(B,B1,B2,A1);
desc-A[A]=A; desc-B'[B]=B;
desc-B'[A1] ⊆ desc-A[A1];
Interpreting the latter schema in the GER gives the expected target schema (right).
We conclude that rt-to-att is an SR-transformation.
This process consists in exploiting the parametric nature of most ERM
transformations to discover new practical GER transformations. This problem is open,
but we can illustrate it through a more in-depth examination of the
extension-decomposition transformation.
Let us consider the transformation depicted in Fig. 26. Its analysis is based on
the ERM ext-dec transformation of the ERM relation R(A,B,C,R1,R2) that models the
relationship type R.
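[Diagram: P: rel-type R (R1, R2) among entity types A, B and C, each role with cardinality [0-N] ⇔ Q: entity type R (R1, R2; id: rA.A, rB.B), rel-types rA and rB, and an element labelled AB; the cardinalities shown are [0-N], [1-1] and [1-N].]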
Database engineering intrinsically has been model-driven for more than three
decades. Designing, normalizing, merging, optimizing data structures can be performed
at an abstraction level that is, to a large extent, platform independent.
The transformational approach enriches this framework considerably, since it
opens the way to more structured and more reliable engineering processes. This paper
shows that such an approach brings several essential benefits.
• Being formal, it can be used to study rigorously basic properties such as semantics
preservation, that states how the operators preserve the information contents of the
schemas;
• To be fruitful, and to avoid combinatorial explosion, a pivot model, with which we
associate a relational semantics, has proved necessary;
• From the pedagogical viewpoint, this approach provides a disciplined and reliable
way to conduct important processes such as logical design, which many students
too often tend to consider as some kind of magic;
• Developing CASE tools based on the transformational approach leads to more
reliable products, notably as far as generation completeness is concerned;
• A transformational approach based on a pivot model is by construction scalable;
introducing a new model M involves the development of components independent
of the existing models.
Several problems remain to be addressed, of which we mention a sample.
20. Hainaut, J-L. Transformation-based database engineering. Tutorial notes, VLDB'95, Zürich, Switzerland. (1995) (available at http://www.info.fundp.ac.be/libd)
21. Hainaut, J-L. Specification preservation in schema transformations - application to semantics and statistics, Data & Knowledge Engineering, 11(1). (1996)
22. Hainaut, J-L., Henrard, J., Hick, J-M., Roland, D., Englebert, V. Database Design Recovery, in Proc. of the 8th Conf. on Advanced Information Systems Engineering (CAiSE'96), Springer-Verlag. (1996)
23. Hainaut, J.-L., Hick, J.-M., Englebert, V., Henrard, J., Roland, D. Understanding implementations of IS-A Relations, in Proc. of the conference on the ER Approach, Cottbus, Oct. 1996, LNCS, Springer-Verlag. (1996)
24. Hainaut, J-L. Transformation-based Database Engineering. In: [47]. (2005) 1-28
25. Halpin, T., A., & Proper, H., A. Database schema transformation and optimization. Proc. of the 14th Int. Conf. on ER/OO Modelling (ERA). (1995)
26. Henrard, J., Hick, J-M., Thiran, Ph., Hainaut, J-L. Strategies for Data Reengineering, in Proc. of WCRE'02, IEEE Computer Society Press. (2002)
27. Hick, J-M., Hainaut, J-L. Strategy for Database Application Evolution: the DB-MAIN Approach, in Proc. ER'2003 conference, Chicago, Oct. 2003, LNCS, Springer-Verlag. (2003)
28. Jajodia, S., Ng, P., A., & Springsteel, F., N. The problem of Equivalence for Entity-Relationship Diagrams, IEEE Trans. on Soft. Eng., SE-9(5). (1983)
29. Kobayashi, I. Losslessness and Semantic Correctness of Database Schema Transformation: another look of Schema Equivalence, Information Systems, 11(1). (1986) 41-59
30. Lämmel, R. Coupled Software Transformations (Extended Abstract), in Proc. First International Workshop on Software Evolution Transformations (SET 2004). (2004) [http://banff.cs.queensu.ca/set2004/set2004_proceedings_acrobat4.pdf]
31. Levene, M. The Nested Universal Relation Database Model, LNCS 595, Springer-Verlag. (1992)
32. Lien, Y., E. On the equivalence of database models, JACM, 29(2). (1982)
33. Ling, T., W. External schemas of Entity-Relationship based DBMS, in Proc. of Entity-Relationship Approach: a Bridge to the User, North-Holland. (1989)
34. McBrien, P., & Poulovassilis, A. Data integration by bi-directional schema transformation rules, Proc. 19th International Conference on Data Engineering (ICDE'03), IEEE Computer Society Press. (2003)
35. Motro, A. Superviews: Virtual integration of Multiple Databases, IEEE Trans. on Soft. Eng., SE-13(7). (1987)
36. Navathe, S., B. Schema Analysis for Database Restructuring, ACM TODS, 5(2), June 1980. (1980)
37. Partsch, H., & Steinbrüggen, R. Program Transformation Systems. Computing Surveys, 15(3). (1983)
38. Poole, J. Model-Driven Architecture: Vision, Standards And Emerging Technologies, in Proc. of ECOOP 2001, Workshop on Metamodeling and Adaptive Object Models. (2001)
39. Rauh, O., & Stickel, E. Standard Transformations for the Normalization of ER Schemata. Proc. of the CAiSE'95 Conf., Jyväskylä, Finland, LNCS, Springer-Verlag. (1995)
40. Roland, D. Database engineering process modelling, PhD Thesis, University of Namur. http://www.info.fundp.ac.be/~dbm/publication/2003/these-dro.pdf (2003)
41. Rosenthal, A., & Reiner, D. Theoretically sound transformations for practical database design. Proc. of Entity-Relationship Approach. (1988)
42. Rosenthal, A., & Reiner, D. Tools and Transformations - Rigorous and Otherwise - for Practical Database Design, ACM TODS, 19(2). (1994)
43. Schek, H-J., Scholl, M., H. The relational model with relation-valued attributes, Information Systems, 11. (1986) 137-147
44. Thalheim, B. Entity-Relationship Modeling: Foundation of Database Technology. Springer-Verlag. (2000)
45. Thiran, Ph., Hainaut, J-L. Wrapper Development for Legacy Data Reuse. Proc. of WCRE'01, IEEE Computer Society Press. (2001)
46. Thiran, Ph., Estiévenart, F., Hainaut, J-L., Houben, G-J. A Generic Framework for Extracting XML Data from Legacy Databases, in Journal of Web Engineering, Rinton Press. (2005)
47. van Bommel, P. (Ed.). Transformation of Knowledge, Information and Data: Theory and Applications, Information Science Publ., Hershey. (2005)
48. van Griethuysen, J.J. (Ed.). Concepts and Terminology for the Conceptual Schema and the Information Base. Publ. nr. ISO/TC97/SC5-N695. (1982)