
TOWARDS THE SUPPORT OF INTEGRATED VIEWS OF MULTIPLE DATABASES:

AN AGGREGATE SCHEMA FACILITY

Donald Swartwout
James P. Fry
Database Systems Research Group
The University of Michigan, Ann Arbor, Michigan
ABSTRACT: Supporting multiple user views of databases is currently an important problem area in database management system development. An interesting facet of this problem arises whenever a user needs an integrated view of several distinct databases. Using traditional database concepts, an aggregate schema facility has been developed to address this problem. The basic functions of an aggregate schema facility are discussed, as well as their implementation in a CODASYL/DBTG-like environment. Interest in an aggregate schema facility grew out of a problem in restructuring large databases. The application of this facility to restructuring is discussed, as well as potential applications to dynamic translation and distributed databases.

KEYWORDS AND PHRASES: Database integration, aggregate schema, database management systems, data definition languages, database restructuring, data translation, dynamic translation, distributed databases.

1.0 INTRODUCTION

Today we find numerous databases implemented under various accessing schemes being utilized by many diverse users. With the installation of more and more database management systems, the trend has been toward databases which are larger (in terms of volume), more complex (in terms of interrelationships among records), and used by a broader spectrum of users. To reduce this complexity, and to some degree improve security, the database administrator often provides a subset of the database for the user to process. In CODASYL terms, this would be a subschema, whereas in IMS terminology it is called a PSB (Program Specification Block) or, in ANSI/SPARC vocabulary, an External Schema. Independent of the logical subsetting capabilities, DBMS also provide features to enhance database access--CODASYL provides AREA/REALM and indexing mechanisms, and IMS provides several access methods--physical databases, secondary dataset groups, and indexing. The problem is that in either case, there is no facility for the direct connection of a user view with an "optimized" access method.

While this general problem of supporting multiple user "windows" is encountered whenever data is shared by a diverse user community, it has appeared in restricted form in the restructuring of large databases. During the development of the Michigan Data Translator (MDT) we found that while some restructuring transformations substantially alter a database, many affect only a few record and set types. In such cases, only a small portion of the total data actually requires processing, while the remainder need only be copied from the original to the translated database.

Our efforts to exploit this fact soon led to a situation which sometimes required one of the MDT's major modules to process two distinct databases as if they were a single database. To facilitate this unusual processing, the Aggregate Schema Facility was developed. It allows the user (in this case a translator module) to view and to access two distinct, but possibly interrelated, network databases at the same time, as if they were a single database.

More generally, an aggregate schema facility is any set of data definition and manipulation capabilities which permit the processing of several physically and schematically distinct, possibly interrelated databases as if they were a single database. This single database is referred to as an aggregate schema database; its schema is an aggregate schema. The physically existing databases are referred to as underlying or component databases; their schemas as underlying or component schemas. An aggregate schema facility can represent a considerable relief for a user whose processing requires substantial interfacing of several databases. The user is no longer responsible for keeping track of the current database, selecting the database in which relevant data resides, accounting for naming discrepancies among the databases, etc. In fact, in the MDT implementation, the module which uses the aggregate schema facility is unaware of the number of underlying databases it is processing.

Therefore, an aggregate schema facility must provide a mechanism for translating references to aggregate schema names into references to the appropriate underlying schema names.
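As a concrete illustration of this translation step, the mapping can be thought of as a table from aggregate schema names to (underlying database, underlying name) pairs. The following sketch is our own, not the ASF implementation; the table layout and record/item names are assumptions drawn from the examples later in the paper:

```python
# Hypothetical sketch of aggregate-to-underlying name mapping.
# The table contents and function name are illustrative, not the ASF's own.

NAME_MAP = {
    # (construct, aggregate name): (underlying database, underlying name)
    ("RECORD", "PATIENT"):  ("HOSPITAL", "PATIENT"),
    ("RECORD", "AGENT"):    ("SALES", "AGENT"),
    ("ITEM",   "ADDRESS"):  ("SALES", "ADDR"),      # names may differ
    ("SET",    "AGENT-FOR"): ("SALES", "AGENT-FOR"),
}

def resolve(kind: str, aggregate_name: str) -> tuple[str, str]:
    """Translate an aggregate schema reference into an underlying one."""
    try:
        return NAME_MAP[(kind, aggregate_name)]
    except KeyError:
        raise KeyError(f"{kind} {aggregate_name} is not in the aggregate schema")

print(resolve("ITEM", "ADDRESS"))   # ('SALES', 'ADDR')
```

A dispatcher built on such a table can route each DML call to the correct underlying database without the user ever naming that database.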

While similar at the conceptual level to the IMS logical database concept, where several physical hierarchical databases can be integrated into a complex logical structure, the aggregate schema facility differs in the following way. An IMS user may only process/access (in his PSB) a hierarchical subset of the logical database, although the logical database may in fact be a network. The ASF permits full access to the underlying databases. The aggregate schema approach also bears similarity to the CODASYL DBTG Schema/Subschema facility. While this facility is a true subsetting capability, perhaps the best way to view the aggregate schema facility is as a "supersetting capability". An aggregate schema database is an aggregation of subsets of databases. Objects (records, items, sets) in an aggregate network database correspond more or less directly with objects in the underlying databases. An aggregate schema facility is by no means as general as an ANSI/SPARC External/Internal Schema interface, but it is a step in this direction.

The implementation details for such a facility are not very complex, and are determined largely by the time at which the binding of aggregate schema names to underlying schema names is performed. Choices for this binding time are at aggregate schema DDL compile time, at aggregate database application time, or at each DML call. The latter is unlikely to be advantageous unless applications are expected to issue relatively few DML calls which accomplish large amounts of data transfer. The first two options differ in one important respect: binding at DDL compile time requires recompiling the aggregate schema DDL whenever a recompile of one of the underlying database DDLs occurs, but binding at run time does not. In the ASF, binding occurs at DDL compile time, since the underlying schemas were expected to be stable enough that the increased flexibility of later binding would not outweigh the increased cost.

This paper describes the functions of aggregate schema facilities, and some of the implementation problems they present. The ASF is used as an example throughout. Section 2 discusses the basic tasks of an aggregate schema facility, and Section 3 describes the implementation of the ASF. The application of the ASF to the restructuring of large databases is described in Section 4, and Section 5 concludes by discussing some additional applications.
2.0 BASIC TASKS OF AN AGGREGATE SCHEMA FACILITY

In order to achieve functional equivalence between an aggregate schema database and its set of underlying databases, four basic tasks must be performed: i) mapping of aggregate schema names to underlying databases, ii) maintenance of inter-database connections, iii) maintenance of currency for the aggregate database, and iv) protecting the consistency of the aggregate databases. DBTG terminology will be used in the discussion, but analogous tasks exist for other classes of databases. Also, we discuss exclusively aggregate schema facilities built "on top of" an existing DBMS; that is, those which do not alter the existing DBMS functions or implementation. At run time, they act approximately as dispatchers translating the aggregate schema operations into DML calls for the appropriate underlying database. The more complicated problem of incorporating aggregate schema capabilities into existing DBMS functions is not considered here. Finally, implementation strategies used in the Aggregate Schema Facility (ASF) will be identified.

2.1 Name Mapping

Many aggregate schema names differ from the names of the corresponding underlying schema constructs. In fact, if a particular name is used for different objects in two or more of the underlying databases, only one of them can use the common name as its aggregate schema name.

Finally, a somewhat less obvious form of name-mapping is necessary. Each DBTG record instance has a name, known as its database key. Database keys generally do not contain information identifying the database in which the record instance resides. As a result, if the aggregate schema system permits the user access to database keys, some sort of database identification must be appended to underlying database keys when they are passed to the user. Conversely, a database key received from the user must be decoded into an identifier for an underlying database and a database key for a record residing in that database.

2.2 Inter-database Connections

Name mapping is the only major task required of an aggregate schema facility which does not recognize connections among underlying databases. However, among databases which are reasonable candidates for aggregation, there are likely to exist implicit and/or explicit inter-database relationships. In the environment which led to the development of the ASF, such relationships were a necessary feature of the underlying databases (see Section 4). In DBTG environments, inter-database relationships take two natural forms. In the first form, the owner of a set resides in one of the underlying databases, and its member record types in another. In the second form, data items which the aggregate schema user views as the items of a single record type are divided among record types in different underlying databases. Such a record type in an aggregate schema is referred to as a "split" record type, and the underlying record types are its "components". In the same spirit, we will refer to a divided set as a "split set".

At the heart of both forms is the necessity for information contained in a record instance in one of the underlying databases to identify a record instance in one of the other underlying databases. Record instances can be identified explicitly (by database keys) or implicitly (by the data they contain). Split records and split sets can be implemented using either approach. If records are identified by database keys, each record underlying a split record will contain, as a specially designated data item, the aggregate schema database keys of the other components of the split record. Similarly, the records which participate in a split set will contain pointer space for cross-database pointers. In a member record type, for instance, these might include the aggregate schema database keys for the owner record instance, as well as the next and previous member records. For practical identification of records based on their data, each record instance's position in its database must be computable (at least approximately) from the values of some or all of its data items. Borrowing an obvious term from relational databases, we will refer to such a set of data items as a record's "primary key". Provided that all of the participating records can in fact be located in their databases by hashing on their primary keys, the aggregate schema database keys referred to above could be replaced by the values of the primary keys of the desired records. Thus each member of a split set might contain the primary key of the next member instance, and each record underlying a split record would contain the primary keys of all the other components, etc.
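The database-key form of identification relies on the decoding described in Section 2.1: an aggregate schema database key must carry an identifier for the underlying database as well as that database's own key. A minimal sketch of such an encoding follows; the bit layout and database identifiers are our own assumptions, not the ASF's actual representation:

```python
# Illustrative sketch: encode/decode aggregate schema database keys.
# Packing a database identifier into the high bits is an assumed encoding.

DB_IDS = {"SALES": 1, "PREMIUM/PAYMENT": 2, "HOSPITAL": 3}
DB_NAMES = {v: k for k, v in DB_IDS.items()}

def encode(db_name: str, underlying_key: int) -> int:
    """Append a database identifier to an underlying database key."""
    return (DB_IDS[db_name] << 28) | underlying_key

def decode(aggregate_key: int) -> tuple[str, int]:
    """Recover the underlying database and its local database key."""
    return DB_NAMES[aggregate_key >> 28], aggregate_key & ((1 << 28) - 1)

agg = encode("HOSPITAL", 4711)
assert decode(agg) == ("HOSPITAL", 4711)
```

With such a scheme, every database key handed to the aggregate schema user is self-describing, and every key received from the user can be routed to the correct underlying database.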

Figure 2-1 presents an example of the two types of inter-database connection. Small portions of three underlying database schemas are shown. Underlined items are primary keys used to hash the AGENT and both POLICY records to their locations in the two insurance databases. The two underlying POLICY records represent different types of information about insurance policies and are natural candidates for aggregation into a split record type POLICY. The components of an instance of this split record type would be a sales POLICY record instance and a premium/payment POLICY record instance with matching Numbers. In Figure 2-2 the same split record is implemented with database keys (Premium-Key and Sales-Key) because the premium/payment POLICY record cannot be located by hashing.

A potential split set exists between the premium/payment POLICY record type and the hospital PATIENT record type. An instance of the COVERS set consists of a POLICY record instance as owner and, as members, all PATIENTs covered by the POLICY. Membership in the COVERS set is established by the Insurance-Number item, whose value is the number of each patient's health insurance policy. A PATIENT record instance's owner along COVERS can be located by hashing into the premium/payment database, using the value of Insurance-Number as hash key. Notice that locating all PATIENTs associated with a particular POLICY requires accessing all PATIENT record instances. To avoid such potential inefficiencies, the ASF permits only member-to-owner traversal of split sets implemented by primary keys. This problem disappears when the split set is implemented by database keys, as in Figure 2-2. Of course, a split set implemented by database keys poses considerable update problems. In Figure 2-2 the discharge of the first PATIENT covered by a POLICY will leave a pointer to nowhere in First-Covered, unless the discharge program is aware of the aggregation. The ASF does not support split sets implemented by database keys.

Notice that a potential split set analogous to COVERS exists between the sales POLICY record type and PATIENT. Naturally, the user is under no obligation to recognize all potential connections among the underlying databases. An aggregate schema which uses the split POLICY record and split COVERS set of Figure 2-1 is shown in Figure 2-3. A split record can participate in all sets (ordinary or split) in which its components participate. This is illustrated by the AGENT-FOR and COVERS sets of Figure 2-3. Naturally, the data used to maintain inter-database connections may be meaningless when an underlying database is processed individually. Thus split records and sets may require significant overhead in storage space, and their integrity is always in danger of violation by applications which process a single underlying database without knowledge of its participation in an aggregate database. Inter-database connections are therefore a rather mixed blessing; for an example of their application see Section 4.

2.3 Currency

Split records and sets add a new wrinkle to the currency system of an aggregate schema facility. For example, the current instance of a split record type involves two or more current record instances in two or more underlying databases. Three basic approaches to the currency problem suggest themselves. First, the aggregate schema facility software maintains appropriate current constructs in the underlying databases, using the underlying DBMS' currency system. Second, all currency is maintained by the aggregate schema software, ignoring the underlying currency system. And third, the aggregate schema software maintains currency for split records and sets, while the underlying DBMS is used to maintain currency for aggregate schema records and sets which correspond directly to underlying schema constructs. We know of no general advantages of one of these options over the others, but within the ASF architecture, the second option involved the least overhead.

2.4 Consistency

Except for split records and sets, the integrity of the underlying databases guarantees that of an aggregate database. (Provided, of course, that the aggregate schema DDL is correct.) Furthermore, the potential inconsistencies for split sets--illegal member or owner types, pointers to nowhere, missing pointers, etc.--parallel those for ordinary sets. As a result, split set integrity control comparable to that available for ordinary sets can be achieved by implementing split set manipulation functions with the same consistency checks used in the underlying set manipulation functions. Split records do not have such a parallel in the underlying DBMS facilities, and they pose two special consistency problems for

the aggregate schema facility. First, a valid split record instance exists only if each of its component records exists and correctly references the other components. The aggregate schema software may elect to check that each component record instance underlying a split record contains valid references to the other components whenever the split record is created or modified, and/or whenever data is read from one of the components. Second, aggregate schema items in a split record type may correspond to an item in exactly one of the component records, or the aggregate schema user may expect the same item to appear in two or more of the component records. Whenever the value of an item of the latter variety is set by the aggregate schema user, item assignments must take place in several underlying databases. Also, whenever the value of such an item is read by the aggregate schema system, it may be desirable to check for matching values in other component record instances. This is the case because applications which process an underlying database outside the aggregation--individually or as part of another aggregation--might update the item in one of the components but not the others. The ASF user has the option of requesting such a verification in the aggregate schema DDL.

[Figure 2-1: Inter-database Connections. Portions of the INSURANCE SALES, INSURANCE PREMIUM/PAYMENT, and HOSPITAL database schemas, showing the potential split record POLICY and the potential split set COVERS.]

[Figure 2-2: Inter-Database Connection Implemented by Database Keys. The same schemas, with the split POLICY record represented by Premium-Key and Sales-Key items and the COVERS set by First-Covered and Next-Covered pointers.]

[Figure 2-3: An Aggregate Schema. Record types AGENT, POLICY, and PATIENT, connected by the AGENT-FOR and COVERS sets.]

3.0 IMPLEMENTATION OF AN AGGREGATE SCHEMA FACILITY

In the previous section we discussed the basic tasks of an aggregate schema facility. This section discusses the design and implementation of such a facility in a DBTG-like database management system, ADBMS [UY2]. An important goal of this design is that no major modifications are to be made to the existing architecture of the database management system. The ASF interacts with ADBMS as a user would, albeit a sophisticated one. The complete ASF consists of the Aggregate Schema DDL (ASDDL) and the Aggregate Schema Processor.

3.1 Aggregate Schema DDL

The ASDDL is the description mechanism which permits the user to specify an aggregated view of multiple databases. It is a DBTG-like DDL, with the added dimension of inter-database connection facilities. In this section we will highlight the salient features of the language.

An aggregate schema record type has two basic modes. In the first mode the record type is based exclusively upon a record type in one of the underlying databases. In the second mode, the record type is based on the coupling of record types in several underlying schemas. To avoid potentially large numbers of aggregate schema pointers in the components of a split record, the ASF permits exactly two components for each split record type. The coupling of the record instances which make up a split record instance takes one of two forms:

1. Identical Random Access Keys (VIA HASH). In this situation the underlying record types have identical primary keys. Two record instances make up a split record instance when their primary keys have identical values.

2. By Pointers (VIA iname-1, iname-2). Here each of the underlying records contains an item (iname-1 and iname-2) whose value is the database key of the other underlying record.

An item in a split record type may correspond to an item in one of the underlying record types, or both. In the latter case the item value is assumed to be the same in both underlying records. A verification mechanism is available.

Aggregate schema set types also have two modes. An aggregate schema set may correspond to a set of one of the underlying schemas, or it may be a split set. As it does with split records, the ASF restricts the possible implementations for split sets. ASF split sets are all established by item values. That is, the owner record type's address must be computable from its primary key value, and each member record must contain a copy of its owner's primary key.

Figure 3-1 shows ADBMS DDL descriptions for the three underlying databases of Figure 2-1, and Figure 3-2 presents an ASDDL description for the aggregate schema of Figure 2-3.

INSURANCE SALES DATABASE ADBMS DDL EXCERPT

RECORD  AGENT           HASH    PRIMARY KEY
ITEM    NAME            CHAR    25
ITEM    PHONE           CHAR    40
ITEM    ADDR            CHAR    15
RECORD  POLICY          HASH    PRIMARY KEY
ITEM    NUMBER          CHAR    15
ITEM    SALE-DATE       CHAR    8
ITEM    EFFECTIVE-DATE  CHAR    8
SET     AGENT-FOR       FIFO
OWNER   AGENT
MEMBER  POLICY

INSURANCE PREMIUM/PAYMENT DATABASE ADBMS DDL EXCERPT

RECORD  POLICY          HASH    PRIMARY KEY
ITEM    NUMBER          CHAR    15
ITEM    PREMIUM         INTEG   4
ITEM    BENEFIT-CODE    INTEG   4

HOSPITAL DATABASE ADBMS DDL EXCERPT

RECORD  PATIENT
ITEM    NAME             CHAR   32
ITEM    AGE              INTEG
ITEM    INSURANCE-NUMBER CHAR   15

Figure 3-1  Sample ADBMS DDL

AGGREGATE SCHEMA DDL

RECORD AGENT IS AGENT IN SALES
ITEM NAME IS NAME
ITEM PHONE IS PHONE
ITEM ADDRESS IS ADDR
RECORD POLICY COUPLE POLICY IN SALES WITH POLICY IN PREMIUM/PAYMENT VIA HASH
ITEM NUMBER IS NUMBER, NUMBER
ITEM SALE-DATE IS SALE-DATE IN FIRST
ITEM EFFECTIVE-DATE IS EFFECTIVE-DATE IN FIRST
ITEM PREMIUM IS PREMIUM IN SECOND
ITEM BENEFIT-CODE IS BENEFIT-CODE IN SECOND
RECORD PATIENT IS PATIENT IN HOSPITAL
ITEM NAME IS NAME
ITEM AGE IS AGE
ITEM INSURANCE-NUMBER IS INSURANCE-NUMBER
SET AGENT-FOR IS AGENT-FOR IN SALES
OWNER IS AGENT
MEMBER IS POLICY
SET COVERS MATCH-KEY
OWNER IS POLICY
MEMBER IS PATIENT
REPRESENTATION IS MATCH-STRING INSURANCE-NUMBER

Figure 3-2  ASDDL
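The VIA HASH coupling declared for POLICY above can be illustrated with a small sketch. This is our own code, not ASF code; the dict-based "databases" stand in for hash-addressed ADBMS files, and the record contents are the hypothetical insurance data of Figure 2-1:

```python
# Illustrative sketch of VIA HASH split-record coupling.
# Two components form one split POLICY instance exactly when the same
# primary key (NUMBER) locates a record in both underlying databases.

sales = {"P-100": {"NUMBER": "P-100", "SALE-DATE": "19760301"}}
premium = {"P-100": {"NUMBER": "P-100", "PREMIUM": 120}}

def fetch_split_policy(number: str) -> dict:
    """Assemble one aggregate POLICY instance from its two components."""
    first, second = sales.get(number), premium.get(number)
    if first is None or second is None:
        raise LookupError(f"no valid split POLICY instance for {number}")
    return {**first, **second}   # items drawn from both components

print(fetch_split_policy("P-100")["PREMIUM"])   # 120
```

The user sees a single record carrying NUMBER, SALE-DATE, PREMIUM, and so on, while each underlying database stores only its own component.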

3.2 The Binding of Underlying Schemas to Form an Aggregate Schema

The CODASYL Database Task Group Specification states that "Object versions of primary and auxiliary data definitions may be compiled independently of any user program and of each other, and stored in a library" [S5]. To achieve this data independence, the binding of the underlying schemas to form an aggregate schema must take place at run time. This means that all aggregate schema pointers must be assigned values at run time. It was our opinion that the overhead required by this approach was not worth the flexibility it provided. As a result the schemas are bound at analysis time, and the DDL analysis dates are checked at run time. This assures that consistency with the underlying schemas and their analysis dates is maintained.

3.3 Aggregate Schema Processor

Since the ASF provides an aggregation of several underlying schemas and corresponding physical databases, the DML operations on the aggregate schema must be distributed accordingly. In order to make this transparent to the user, the Aggregate Schema Processor transforms the user's DML commands on the aggregate schema into specific commands on the schemas of the underlying databases. The two basic mechanisms that provide this capability are the currency and split record mechanisms.

The currency mechanism of the ASF is based upon the existing currency system of ADBMS. The ASF maintains a currency table containing the aggregate schema database keys of the current member and owner of each set type and of the current record of each type. The split record mechanism is responsible for translating data manipulation operations on a split record instance into ordinary operations on components. When an aggregate schema user issues DML calls such as "Get Field" or "Set Field" for an item in a split record, the split record mechanism determines which component contains the desired item, and issues an appropriate ordinary DML call to the appropriate underlying database. Two such ordinary DML calls are necessary when the aggregate schema item corresponds to matching items in both components, as in the case of the Number item in the aggregate POLICY record of Figure 3-2.

4.0 APPLICATION: USE OF THE ASF IN PARTIAL RESTRUCTURING

Work on the data translation problem of partial restructuring provided the impetus for development of the ASF. This section summarizes the partial restructuring strategy and the role of the ASF in its implementation.

4.1 Partial Restructuring

Practical experience in data translation has shown that many real-world database restructuring problems involve actual changes to only a small portion (often 10% or less) of the data. The remainder is simply copied from the source (existing) database to the target (desired) database [UT10]. The basic architecture of the Michigan Data Translator (MDT) is shown in Figure 4-1. Any data translation system constructed according to this architecture performs the copying cited above three times: once in each of the major modules. In partial restructuring, data which is not changed during translation bypasses the Restructurer module, and is transferred directly from the source internal form database to the user's target database by the Writer. Data flow in a partial restructuring translation is shown in Figure 4-2.

Economies in both translation processing time and secondary storage overhead result from shortened Restructurer execution and smaller target internal form databases. However, implementation of the partial restructuring strategy is contingent upon the solution of two fundamental problems--the static portion connection problem and the Writer input problem.

4.2 The Writer Input Problem

A given data translation may or may not involve partial restructuring. If it does not, the Writer module's input consists solely of the internal form target database output by the Restructurer. But if partial restructuring is involved, then the databases output by both the Reader and Restructurer modules are input to the Writer. Although they are physically distinct databases, conceptually they represent a single body of information: the user's target data. The Writer is therefore a natural aggregate schema facility user. Also, the Writer's architecture profits considerably from the use of the ASF. It always expects an aggregate database as input, without knowing whether there are one or two underlying databases. The Writer is thus relieved of responsibility for dealing with two very different input configurations. The MDT architecture with the ASF is shown in Figure 4-3.

4.3 The Static Portion Connection Problem

It became clear early in the partial restructuring effort that sets having an unaltered record as owner type and an altered record as member type, and occasionally vice versa as well, would have to be maintained. That is, split sets, generally with owner in the internal form source database and member in the database for altered data, were called for. Two factors prevented the use of split sets, however. First, such split sets represented by database keys would have been effectively impossible for the Restructurer to create. The other option for split sets is representation by primary keys. Placement of internal form source records by hashing on primary keys was not possible for the Reader module, and as discussed in Section 2, this prevented the use of the second form for split sets. Thus split records are used to connect the altered and unaltered data. Unaltered record types which are related by at least one set to an altered record type are known as "fringe" records, and reside in the aggregate target database as split records.
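The split record dispatch of Section 3.3, on which this use of split records rests, can be made concrete with a short sketch. The item-to-component table and function below are our own illustration, modeled on the aggregate POLICY record of Figure 3-2, not ASF code:

```python
# Hypothetical sketch of the split record mechanism's "Get Field" dispatch.
# ITEM_LOCATION records which component(s) of a split record hold each item.

ITEM_LOCATION = {
    "NUMBER":    ("FIRST", "SECOND"),   # matching item in both components
    "SALE-DATE": ("FIRST",),
    "PREMIUM":   ("SECOND",),
}

def get_field(components: dict, item: str):
    """Issue one ordinary get per component carrying the item; when the item
    is held by both components, verify that the two values match."""
    values = [components[c][item] for c in ITEM_LOCATION[item]]
    if len(set(values)) > 1:
        raise ValueError(f"components disagree on item {item}: {values}")
    return values[0]

policy = {"FIRST":  {"NUMBER": "P-100", "SALE-DATE": "19760301"},
          "SECOND": {"NUMBER": "P-100", "PREMIUM": 120}}
print(get_field(policy, "NUMBER"))   # P-100
```

A "Set Field" dispatch is symmetric: one ordinary call per component listed for the item, so a matching item is written in both underlying databases.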

[Figure 4-1: Basic MDT Architecture. Data flows from the Source Data through the Reader, Restructurer, and Writer modules, via internal form databases, to the Target Data.]

[Figure 4-2: Partial Restructuring Scenario. Unaltered data bypasses the Restructurer and flows from the Reader's internal form database directly to the Writer; altered data passes through the Restructurer.]

[Figure 4-3: MDT With ASF.]

[Figure 4-4: Distribution of a Fringe Record. Unaltered records A and B and altered records C appear in the user target schema; the fringe B record is split between the altered data (internal form) and the source data (internal form), with inter-component references connecting its two components.]

The component of a fringe record which resides in the internal form source database contains all of the record's data items and the pointers which establish its set relationships with other unaltered record types. This component is created by the Reader module. The component which resides in the altered data contains no actual data; it serves as a "place-holder" for the record, participating in sets which relate it to altered records. This component is created by the Restructurer module. The two components reference each other via database keys. Both reference items are set by the Restructurer upon creation of the altered-data component. Figure 4-4 illustrates this distribution of a fringe record.
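This division of labor between Reader and Restructurer can be sketched as follows; the data layout and function names are ours, standing in for the modules' actual record-building operations:

```python
# Hypothetical sketch of fringe record creation, after Figure 4-4.
# The Reader builds the full source-side component; the Restructurer later
# creates the empty place-holder and sets both cross-references.

source_db, altered_db = {}, {}

def reader_create(source_key: int, items: dict) -> None:
    # Full component: all data items; cross-reference not yet known.
    source_db[source_key] = {"items": items, "ref": None}

def restructurer_create(source_key: int, altered_key: int) -> None:
    # Place-holder component: no data, only set participation and references.
    altered_db[altered_key] = {"items": {}, "ref": source_key}
    source_db[source_key]["ref"] = altered_key   # both references set here

reader_create(7, {"NAME": "B-instance"})
restructurer_create(7, 42)
assert source_db[7]["ref"] == 42 and altered_db[42]["ref"] == 7
```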

4.4 Performance Estimates

The time required for the translation of aggregate schema DML calls into DML calls against the underlying databases increases the processing required for aggregate schema DML calls, compared to equivalent ordinary DML calls. This translation is of a very simple nature in most cases, and the resulting increase was not expected to be significant, especially when compared to the essentially unchanged I/O requirements. Performance tests of the ASF indicate that approximately 3 to 5 percent of total DBMS processing time is spent in aggregate schema functions. Since the ASF maintains its own currency tables and other information concerning schema constructs that is roughly equivalent in size to the information maintained by the underlying DBMS, storage requirements for DBMS in-core tables approximately doubled with the addition of the ASF. This increase was offset somewhat by the simplifications permitted in the Writer by the aggregate schema approach. Finally, time spent for automatic generation of ASDDL within the translator was approximately equal to the time used for generation of DDL for the two underlying databases.

5.0 FURTHER APPLICATIONS

The ASF was used successfully to improve the MDT's performance on data translations involving limited restructuring. We believe that aggregate schema facilities can prove useful in several other areas as well. These areas include dynamic translation, distributed databases, and design of complex databases.

5.1 Dynamic Translation

As it is done currently, data translation is generally an off-line process, with all transactions locked out during translation. Since this can mean several days of no activity other than translation, even if the alterations affect only a very small portion of the database, restructuring operations that are desirable in the long run may not be done because of their short-term cost. To ease the pain of translation, a dynamic translation approach might be taken. Here, the data translator is seen as "just another user", reading data from the original database and entering restructured data in an "alterations database". Ordinary users could be presented with an aggregate database made up of the original database and the alterations, making the translator's activity invisible to them. The alterations database is an application of the differential file concept of Severance and Lohman [G12] to data translation.

5.2 Distributed Databases

A distributed database can be viewed very naturally as an aggregate database. A database which resides physically at several nodes of a computer network can be presented to the user as an aggregation of the underlying databases at each node of the network. Deppe and Fry [DB22] point out that maintenance of a distributed database requires maintenance of integrated views of several underlying databases.

5.3 Design of Complex Databases

As databases become larger and more complex, it will be necessary to provide an optimized logical and physical subset to particular users and at the same time an integrated view to the general user. An aggregate schema facility provides the capability to integrate a collection of optimized subsets, thereby allowing the database administrator to tailor/optimize particular portions of the general database.

ACKNOWLEDGMENT

The authors wish to express their appreciation to David E. Bakkom, who developed the original idea for the ASF, and to Andrew M. Marine and Linda A. Hutchins, who implemented it.

REFERENCES

DB22  FRY, J. P. and DEPPE, M. E., "Distributed Data Bases: A Summary of Research," Computer Networks 1,2 (1976): 1-13.

G12   SEVERANCE, D. G. and LOHMAN, G. M., "Differential Files: Their Application to the Maintenance of Large Databases," ACM Transactions on Database Systems 1,3 (1976): 256-267.

S5    CODASYL DATA DESCRIPTION LANGUAGE COMMITTEE, CODASYL Data Description Language Journal of Development, June 1973, NBS Handbook 113, ACM, N.Y., Jan. 1974.

UT10  BAKKOM, D. E., et al., "Partial Restructuring Approach to Data Translation," Working Paper 76 DT 8.1, Data Translation Project, The University of Michigan, Ann Arbor, 1976.

UY2   HERSHEY, E. A., III, DISSEN, R. L. and MESSINK, P. W., "A Description of ADBMS," Working Paper No. 122, ISDOS Research Project, The University of Michigan, Ann Arbor, July 1975.
