
Edgar H. Sibley
Panel Editor

Of the many approaches to relational database design, the Object Modeling Technique (OMT) is particularly effective. A comprehensive explanation and two applications show the semantic improvement of OMT over other approaches.

RELATIONAL DATABASE DESIGN USING AN OBJECT-ORIENTED METHODOLOGY

MICHAEL R. BLAHA, WILLIAM J. PREMERLANI and JAMES E. RUMBAUGH

Computing Practices
Communications of the ACM, April 1988, Volume 31, Number 4
© 1988 ACM 0001-0782/88/0400-0414 $1.50

Object-oriented concepts provide a useful abstraction for relational database design. In this article, we present a design technique that has been used for several projects at General Electric. The methodology is intuitive, expressive, and extensible. Object modeling promotes adherence to normal forms and improves integration between databases and applications.

Database design or data modeling is one aspect of software engineering. A data model is the first design step towards using a database in an application. It defines the structure of a database. For a relational Database Management System (DBMS), this structure includes details like defining attributes and tables and specifying rules to guarantee the integrity of tables. Applications populate the database structure and make the information accessible to the user.

The goal of database modeling is to design a better database. The merit of a database design can be measured in a variety of ways. Some important criteria are:

1. performance: Does the structure of the database promote the availability of the data? Can users quickly retrieve and update relevant data?
2. integrity: To what extent does the database guarantee that correct data is stored? (The definition of "correct" depends on the application.)
3. understandability: How coherent is the structure of the database to end users, other database architects, and the original designers after a period of time?
4. extensibility: How easily can the database be extended to new applications without disrupting ongoing work?

We have developed a new approach to relational database design that has been effective in meeting these goals. This approach is based on the work of Loomis, Shah, and Rumbaugh [5]. We only focus on relational database design. Relational DBMS have a better theoretical foundation than network and hierarchical systems and are the focus of intense commercial activity.

RELATED WORK

There are many approaches to database design. Wiederhold, for instance, lists eleven categories of database models in his study [10]. Although a thorough review of data modeling techniques is beyond the scope of this article, some of Wiederhold's comments are apt. "We believe that having a wide variety of [database] models is valid, since an equally wide variety of objectives is being served . . . . Choosing a good way to represent a problem is a major step toward its solution" [10, pp. 115, 116]. The model presented here is particularly effective for large, complex database problems often found in science and engineering. We will limit our discussion to the best alternatives to our methodology.

Simple Tables (SQL Language)

The first question that arises is why should a data modeling technique be used in the first place? Why not just directly express the database structure in a DBMS language like SQL? Software engineering addresses this question.

SQL has undergone extensive human factors studies and is one of the better DBMS languages in the commercial marketplace, just as LISP, Ada, and C are some of the better programming languages. Just as one would quickly dismiss the idea of immediately writing C code, however, one must dismiss the temptation to begin with SQL code. The up-front planning, analysis, and design are an integral part of an effective data model.

Chen's Entity-Relationship Model (ER)

The entity-relationship (ER) approach [1] is the most widely accepted technique for logical data modeling. The ER model supports entities and relationships. An entity is something that exists and is distinguishable. A group of similar entities form an entity set. A relationship is a logical binding between entities. Entities and relationships are described by attributes.

ER diagrams are more expressive than mere tables. Relational tables are attractive vehicles for implementing a data model because they are simple, theoretically sound, understood, and supported by commercial DBMS. Nevertheless, the simplicity of relational tables interferes with designing a data model. Higher levels of abstraction, such as ER diagrams, are conducive to creative thinking and effective communication.

Despite its usefulness, the ER method fails to fully capture the data modeler's intent, especially for large, complex applications. ER lacks a substructure for entities and relationships. An even more powerful modeling tool is necessary. This claim is evident from research that supports extension or replacement of the ER method [10]. So far, however, none of these techniques has matched the popularity of the ER method. Apparently, the other techniques do not satisfy the need for power beyond ER.

Teorey's Logical Relational Design Methodology (LRDM)

Scores of papers have been written on variations of the ER method. We have selected Teorey's approach as representative of state-of-the-art. Teorey and coworkers extend the ER approach with their Logical Relational Design Methodology (LRDM) [9]. LRDM, similar to ER, is a graphical data modeling technique that supports four basic concepts: entities, generalization, aggregation, and association. These terms will be defined later. For now, it suffices to say that the additional concepts improve the expressive power of LRDM. ER is a vast improvement over simple tables. Similarly, LRDM is more powerful than ER.

The Object Modeling Technique (OMT)

The Smalltalk-80 programming language [3] demonstrates many object-oriented concepts. An object-oriented program encapsulates data with procedures that act upon the data. Each package of data and operations is called an object. Objects cleanly separate external specification from internal implementation. An object affects other objects only through its external protocol. Objects are grouped to facilitate reuse of similar code. Object technology is most appropriate for complex, deeply structured problems.

Object-oriented data models share many of the characteristics and benefits of object-oriented programs. A database stores the passive component of objects, that is, their private data or internal state. Applications combine this data with an active component: the procedures or operations.

The Object Modeling Technique (OMT) improves upon the ER and LRDM approaches. One advantage of object-oriented data models is the straightforward integration with object-oriented programs. In general, it is difficult to meld database interaction with procedural code. The use of a common object metaphor and the same design notation for data models and programs helps this situation.

APPLICATION OF THE OMT TO RELATIONAL DATABASE DESIGN

Three Levels of Representation

Figure 1 summarizes our database design methodology. This methodology uses three levels of representation. We will refer to them as the high, middle, and low levels. All three levels describe the same problem at different levels of abstraction. The initial, logical data model is successively converted into ideal relational tables and then into DBMS data definition commands. Multiple levels are a useful construct for encouraging logical database design while still addressing implementation realities.

    High level (abstraction): logical data model
        |
        |  Decision box: mapping of object structures to tables; candidate keys;
        |  no-null attributes; domains; frequently accessed attributes
        v
    Middle level (ideal relational): DBMS-independent tables
        |
        |  Decision box: placement of tables within files; length of names;
        |  domain definitions; primary keys; secondary indexes
        v
    Low level (reality): actual DBMS data definition commands

FIGURE 1. Three Levels of Representation

The high level focuses on the fundamental data structure. The high level is a subset of a graphical notation that was developed by Loomis, Shah, and Rumbaugh [5] for object-oriented programming. They call their notation the Object Modeling Technique (OMT). The authors of this paper do not take credit for developing the OMT. We have taken the OMT and extended it into the realm of databases. An object-oriented database design presents a simple and concise logical abstraction of data that is straightforward to implement with a commercial DBMS. We found that non-DBMS application experts were able to read OMT diagrams after a few hours of explanation.

The middle level contains generic, DBMS-independent tables. The motivation for the middle level is to decouple the general problem of mapping objects to tables from the idiosyncrasies of each DBMS. The middle level is wordier and less effective at conveying the overall structure of the model than the high level but documents more details. The middle level addresses issues such as the mapping of object structures to tables, domains, and keys.

The low level is the data definition language of the target DBMS. This level contains the actual DBMS commands that create tables, attributes, and indexes. The low level considers DBMS-specific details such as placement of tables within database files, a limited set of data types, and choice of performance tuning mechanisms. It also deals with the arbitrary restrictions such as size limitations.

The mapping between levels is mechanical except for that presented in the two boxes in Figure 1. These boxes contain decisions that the data modeler must make during the mapping process. We performed most of the conversion manually, but an automatic conversion is possible.

High-Level Representation

Objects

An object is a thing that exists and has identity. Examples of objects are items such as the chair in the corner, room 101, and George Washington. A group of similar objects form an object class. Chair, room, and people are examples of object classes. An object is an instance of an object class described by attributes or fields. The notion of an object is synonymous with entity in the ER and LRDM methods.

The boxes in Figure 2 denote object classes. The equipment class has equipment name, cost, and weight fields. Pump has suction pressure, discharge pressure, and flow rate fields.

[Figure 2: generalization hierarchy for equipment. Labels recoverable from the diagram and the text: Pump (suction pressure, discharge pressure, flow rate); Heat exchanger (surface area, tube pressure, shell pressure, tube diameter); discriminator Pump type; Centrifugal pump (impeller diameter, number of blades, axis of rotation); Diaphragm pump (diaphragm material); Plunger pump (plunger length, plunger diameter, number of cylinders).]

FIGURE 2. Generalization Relationship

Relationships

A relationship is a logical binding between objects. There are three types of relationships: generalization, aggregation, and association. We indicate a relationship with a line or lines between objects.

Special symbols at the ends of a relationship line indicate how many objects of one class relate to each object of another class. We call this the multiplicity of the relationship. For instance, a small solid circle means many. Many, in this context, is zero or more. A small hollow circle means zero or one. A straight line ending without a symbol denotes exactly one.

The ER method uses the term relationship in a different and much narrower sense than LRDM and OMT. The ER relationship is equivalent to the association relationship of the LRDM and OMT. The ER method has no construct that corresponds to generalization and aggregation.

Generalization Relationship

A generalization or is-a relationship partitions a class into mutually exclusive subclasses. Generalization may have an arbitrary number of levels. The heavy triangles in Figure 2 symbolize generalization. A piece of equipment can be a pump, heat exchanger, tank, or something else. Pumps subdivide into centrifugal, diaphragm, plunger, and other. For the top generalization, equipment is the superclass; pump, heat exchanger, and tank are subclasses. The superclass stores general data like name, cost, and weight. The subclasses store data particular to each type of equipment. Similarly, for the lower generalization, pump is the superclass while centrifugal pump, diaphragm pump, and plunger pump are subclasses.

Each box in Figure 2 corresponds to an object class. Each box does not correspond to an object. The same object is being represented at each level of the generalization. Existence dependency holds; a pump cannot be entered into the centrifugal pump table unless entries are also made in the pump and equipment tables. Attributes are inherited from the top level down. Each centrifugal pump has an equipment name, cost, weight, suction pressure, discharge pressure, flow rate, impeller diameter, number of blades, and axis of rotation.

Note that the OMT supports multiple inheritance. Each object may participate in more than one generalization hierarchy.

Aggregation Relationship

Aggregation is an assembly-component or a-part-of relationship. One well known example of this relationship is the "bill-of-materials" or "parts explosion" problem. Aggregation combines low level objects into composite objects. Aggregation may be multilevel and recursive. For example, a data structure may recursively refer to itself.

As shown in Figure 3, a roof is a part of a car; many doors are part of a car. The same type of door and roof can be used for a variety of cars. In this case, car is an assembly and door and roof are components. Note that the arrows point towards the composite object. Aggregations often exhibit existence dependency.

[Figure 3: aggregation of a car from its components. Labels recoverable from the diagram: Car (name, year, color); CAR-DOOR; CAR-ROOF; Number of windows.]

FIGURE 3. Aggregation Relationships

Association Relationship

An association relates two or more independent objects. Associations do not exhibit existence dependency. Figure 4 shows that many employees work for a company and an employee manages other employees. We arbitrarily restricted an employee to working for one company. In some contexts, multiple companies may be more appropriate. The precise choice of objects, relationships, and multiplicity of relationships depends on the problem domain. Associations may have one or more properties. These are circled in the diagram.

[Figure 4: associations between a company and its employees. Labels recoverable from the diagram: Company (name, address); an employee class (name, social security number, address); WORKS-FOR; MANAGES; Job title.]

FIGURE 4. Association Relationships
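The attribute inheritance described for the generalization hierarchy of Figure 2 can be illustrated with a small sketch. This is not part of the OMT notation or of the authors' tooling; it is a minimal Python illustration in which the class names and field names follow the text, while the field types are assumptions made only so the example runs.

```python
from dataclasses import dataclass, fields


@dataclass
class Equipment:
    # Superclass: general data shared by every piece of equipment.
    equipment_name: str
    cost: float      # type is an assumption; the text names only the field
    weight: float    # type is an assumption


@dataclass
class Pump(Equipment):
    # Subclass of Equipment: data particular to pumps.
    suction_pressure: float
    discharge_pressure: float
    flow_rate: float


@dataclass
class CentrifugalPump(Pump):
    # Subclass of Pump: data particular to centrifugal pumps.
    impeller_diameter: float
    number_of_blades: int
    axis_of_rotation: str


def field_names(cls):
    """Return every field of a class, inherited fields first."""
    return [f.name for f in fields(cls)]
```

Calling `field_names(CentrifugalPump)` lists the nine fields the text attributes to a centrifugal pump, with the equipment fields first and the centrifugal pump fields last, which mirrors inheritance "from the top level down".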


Qualification of Relationships

Qualification adds information about the many end of a relationship. Figure 5 presents an aggregation, with and without qualification. A plant has many pieces of equipment that are distinguished by equipment name. Equipment name is a qualification field. Either form, qualified or unqualified, supports storage and retrieval of equipment data. The notion of qualification refines the notation.

Qualification has two benefits: improved semantic accuracy and more visible navigation paths. Both forms state that a plant has many pieces of equipment; however, the qualified form adds a unique name to each piece of equipment in a given plant. To find a piece of equipment, we first choose a plant and then specify an equipment name. Qualification is a major advantage of the OMT approach. Qualification frequently occurs and is worthy of special semantic support.

[Figure 5: the plant-equipment aggregation drawn twice, once not qualified and once qualified by equipment name. Labels recoverable from the diagram: Plant (plant name, supervised by); Equipment name.]

FIGURE 5. Aggregation Relationships

Middle Level Representation

The middle level maps high-level object structures into generic tables. The middle level decouples the general problem of mapping objects to tables from the idiosyncrasies of a DBMS. This improves documentation and eases porting to a new DBMS.

In our applications we have observed that the resulting tables tend to be in third normal form. Third normal form is an intrinsic benefit of object modeling. Normal forms improve data integrity. A table is in first normal form when each attribute value is atomic and does not contain a repeating group. A table is in second normal form when it satisfies first normal form and each row has a unique key. A table is in third normal form when it satisfies second normal form and each non-key attribute directly depends on the primary key.

One meets first normal form by decomposing complex objects. The extent of this decomposition depends on the meaning of atomic and the application. For example, it may be perfectly reasonable to consider an array an atomic object when it stores the composition of a fluid. In a different context, however, an array may require decomposition.

It is easy to see why object-derived tables satisfy second normal form. Earlier, we defined objects as things that exist and are distinguishable. Objects have a unique key when provided with the distinguishing information. Relationships are between objects. Thus relationships also have a unique key: the combined keys of participating objects.

Our claim for third normal form is a bit weaker. Most violations of third normal form seem to occur when extraneous information is introduced into a table or a table lacks focus. Relational tables allow unrealistic constructs and are at too low a level for design. The object paradigm is at a higher level of abstraction and tends to block unreasonable designs. Objects are less flexible and less dangerous than relational tables. Building a data model from a small number of coherent entities is superior to the traditional approach of collecting all the attributes, ferreting out the functional dependencies, and synthesizing tables.

Objects

Each object class maps directly to one table. All object fields become attributes of tables. Note that Figure 6 introduces an additional attribute: "Plant ID". Our data modeling methodology provides strong support for the notion of object identity [4]. Each object has a unique ID; all references to objects are made via the ID. Object identity is implicit in object diagrams and must be made explicit in tables.

There are many reasons for adopting a strong sense of object identity. One advantage is that object IDs are immutable and completely independent of changes in data value and physical location. The stability of object IDs is particularly important for relationships since they refer to objects. Contrast this with referring to objects by name. Changing a name requires update of many relationships. Object identity provides a uniform mechanism for referencing all objects.

The middle level controls the use of null values. Null means that an attribute value is unknown or not applicable for a given row. "N" forbids nulls; "Y" permits them. Attributes in candidate keys must not be null. The "Nulls?" column gives the data modeler the option of requiring values for additional fields.
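The mapping of an object class to a table with an explicit ID and a "Nulls?" decision per attribute can be sketched as follows. This is a hedged illustration rather than the authors' implementation: it uses SQLite through Python's standard `sqlite3` module as a stand-in DBMS, the table and column names are hypothetical spellings of the attributes named in the text, and treating "supervised by" as nullable is an assumption.

```python
import sqlite3


def build():
    """Create a Plant table and an Equipment table that refers to plants by ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE plant (
            plant_id      INTEGER PRIMARY KEY,   -- object identity made explicit
            plant_name    TEXT NOT NULL UNIQUE,  -- Nulls? = N, candidate key
            supervised_by TEXT                   -- Nulls? = Y (assumed)
        );
        CREATE TABLE equipment (
            equipment_id   INTEGER PRIMARY KEY,
            plant_id       INTEGER NOT NULL REFERENCES plant(plant_id),
            equipment_name TEXT NOT NULL
        );
    """)
    conn.execute("INSERT INTO plant VALUES (1, 'North plant', NULL)")
    conn.executemany(
        "INSERT INTO equipment VALUES (?, 1, ?)",
        [(10, "P-101"), (11, "E-201")],
    )
    return conn


def rename_plant(conn, plant_id, new_name):
    """Rename a plant and return the number of rows that had to change."""
    cur = conn.execute(
        "UPDATE plant SET plant_name = ? WHERE plant_id = ?",
        (new_name, plant_id),
    )
    return cur.rowcount
```

Because equipment rows refer to the plant by ID, `rename_plant` touches exactly one row and no equipment row needs updating. Had the equipment table referred to the plant by name, the rename would have had to be repeated in every referring row.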


[Figure 6: the high-level Plant object class beside its middle-level table (columns: Attribute name, Nulls?, Domain). Candidate keys: (Plant ID) (Plant name). Frequently accessed: (Plant ID) (Plant name).]

FIGURE 6. Middle Level for an Object

Each attribute has a domain or set of legal attribute values. It would be undesirable to give "plant name" a domain of long name in one table and short name in another. Consistency is important. Domains ensure consistent decisions on attribute length and prevent operations on incompatible entities. It does not make sense to add a cost to a weight. The concept of domain is similar to strong typing in a programming language.

Figure 6 lists candidate keys. A candidate key is a set of attributes that uniquely identify each row. Each attribute may belong to zero, one, or more candidate keys. Figure 6 also lists groups of attributes that are likely to experience frequent access. These groups would be prime targets for indexing or hashing. We would expect IDs and names to be common references. The order of the attributes within a group may or may not be relevant to the low level implementation.

Generalization Relationship

A generalization relationship has one superclass table and multiple subclass tables. Figure 7 illustrates the general mechanism. For each piece of equipment, there is one superclass row and one subclass row with a common Equipment ID. Recall that literally the same object is being represented at each level of the generalization. Equipment type is the superclass discriminator field that partitions the subclasses. Each value of equipment type corresponds to one subclass table.

[Figure 7: the high-level Equipment generalization (equipment name, cost, weight; discriminator Equipment type) beside its middle-level tables.]

    Equipment table
    Attribute name    Nulls?  Domain
    Equipment ID      N       ID
    Equipment name    N       Long name
    Equipment type    N       Equip type
    Cost              Y       Dollars
    Weight            Y       Weight
    Candidate keys: (Equipment ID) (Equipment name)
    Frequently accessed: (Equipment ID) (Equipment name)

    Subclass table
    Candidate keys: (Equipment ID)
    Frequently accessed: (Equipment ID)

FIGURE 7. Middle Level for a Generalization Relationship

Aggregation Relationship

Many-to-many relationships by necessity map to distinct tables. This is a consequence of normal form. One-to-one and one-to-many relationships may be mapped to distinct tables or merged with a participating object. Our handling of one-to-one and one-to-many aggregations depends on the context. We merge existence dependent aggregations with an object table to simplify integrity enforcement. Within the context of the application for Figure 8, every piece of equipment must be assigned to a plant. Freestanding aggregations are stored in distinct tables. In Figure 9, each type of roof is used for several car models. A roof is a part that exists independent of any particular car.

[Figure 8: the high-level plant-equipment aggregation, qualified by equipment name, beside its middle-level merged tables.]

    Plant table
    Candidate keys: (Plant ID) (Plant name)
    Frequently accessed: (Plant ID) (Plant name)

    Equipment table
    Attribute name    Nulls?  Domain
    Equipment ID      N       ID
    Plant ID          N       ID
    Equipment name    N       Long name
    Cost              Y       Dollars
    Weight            Y       Weight
    Candidate keys: (Equip ID) (Plant ID, Equip name)
    Frequently accessed: (Equip ID) (Plant ID, Equip name)

FIGURE 8. Middle Level for an Existence-Dependent Aggregation

[Figure 9: the high-level car-roof aggregation beside its middle-level distinct tables (columns: Attribute name, Nulls?, Domain).]

    Car table
    Candidate keys: (Car ID) (Name, year)
    Frequently accessed: (Car ID) (Name) (Year)

    Roof table
    Candidate keys: (Roof ID)
    Frequently accessed: (Roof ID)

    Car-Roof table
    Candidate keys: (Car ID)
    Frequently accessed: (Car ID) (Roof ID)

FIGURE 9. Middle Level for a Free-Standing Aggregation

Association Relationship

As a rule, we map associations to distinct tables, as in Figure 9. Properties of an association become attributes of the association table. We do not collapse associations with a corresponding object, as in Figure 8, except for performance bottlenecks. There are many reasons for externalizing associations. They are as follows:

1. Associations are between independent objects of equal syntactic weight. In general, it seems inappropriate to contaminate objects with knowledge of other objects;
2. collapsing associations with descriptive properties into objects may violate third normal form;
3. it is difficult to get multiplicity right on the first few design passes. Choice of multiplicity is sometimes a rather arbitrary decision and may change as the subset of the world being modeled evolves. One-to-one and one-to-many associations may be externalized. Many-to-many associations must be externalized; and
4. a symmetrical representation simplifies search and update.

Low-Level Representation

The low level is the data definition language of the target DBMS. This level contains the actual DBMS commands that create tables, attributes, and indexes. This level exploits DBMS features and compensates for shortcomings and quirks. The specific details of the low level depend upon the choice of target DBMS.

MIMER was the DBMS for the two applications that we will discuss later and will be the basis for discussion in this section. MIMER is an SQL-like, relational DBMS [6].

Primary Keys

MIMER requires that each table have a primary key composed of one or more attributes. This desirable feature improves the integrity of MIMER databases. The primary key must be unique. None of the participating fields may be null. MIMER sorts each table on its primary key. The primary key is the fastest access path to a MIMER row.

The middle level identifies candidate keys. One candidate key must be chosen as the primary key. In general, our philosophy would be to use the object ID as the primary key for object tables. The primary key for relationship tables would be one or more IDs from participating objects. Unfortunately, MIMER interferes with this approach.

We deliberately chose to make IDs the primary key even though they have no inherent meaning to the user. Most scientific applications are structurally complex and difficult for the unassisted user to navigate. Furthermore, commercial DBMS lack proper support for integrity (specifically referential integrity [2]). Thus, complex applications must mediate user access with custom programs. If we are going to restrict database access through a program, we might as well do our access through IDs. IDs never change and they have a small fixed size (that can be implemented as an integer) that speeds selects and joins.

Secondary Indexes

MIMER is deficient in its support for secondary indexes. Secondary indexes in most relational DBMS serve a dual role. They improve the performance of some queries by quickly finding the rows with a certain attribute value. Secondary indexes can also enforce the uniqueness of candidate keys.

The problem is that MIMER restricts secondary indexes to a single attribute. This provides adequate performance but it damages integrity. MIMER secondary indexes cannot enforce the uniqueness of multiattribute candidate keys. Only primary keys may be multiattribute. To compensate for this anomaly, we were forced to compromise some choices of primary key.

An example may clarify this point. In Figure 8 for the equipment table, we wanted to make "Equipment ID" the primary key. "Plant ID" + "Equipment name" becomes a unique secondary index. This satisfies our desire for object identity and meets candidate key and performance specifications. Since MIMER does not support a unique secondary index, we were forced to compromise. Our ultimate decision was to make "Plant ID" + "Equipment name" the primary key and "Equipment ID" a unique secondary index.

This example illustrates some of the value of our multilevel modeling approach. The middle level enables us to clearly indicate our intent. The low level generates executable code. A future software port to another DBMS with different features and problems will be more likely to honor our original intent.
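The intent stated for the equipment table of Figure 8 (object ID as primary key, with Plant ID + Equipment name enforced as a unique secondary index) can be sketched for a DBMS that does support unique multiattribute indexes. The example below uses SQLite through Python's standard `sqlite3` module purely as a stand-in; it is not MIMER syntax, and the table, column, and index names are hypothetical spellings of the attributes in Figure 8.

```python
import sqlite3


def build_equipment():
    """Create the equipment table with the keys the middle level asks for."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE equipment (
            equipment_id   INTEGER PRIMARY KEY,  -- object ID as primary key
            plant_id       INTEGER NOT NULL,
            equipment_name TEXT NOT NULL,
            cost           REAL,                 -- Nulls? = Y
            weight         REAL                  -- Nulls? = Y
        );
        -- Second candidate key, enforced by a unique two-attribute index.
        CREATE UNIQUE INDEX equipment_plant_name
            ON equipment (plant_id, equipment_name);
    """)
    return conn


def try_insert(conn, row):
    """Insert one equipment row; return False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO equipment VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this definition, the same equipment name may appear in two different plants, but a second row with the same plant and name is rejected, as is a second row with an existing equipment ID. Both candidate keys from the middle level are therefore enforced without giving up the object ID as primary key.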


Other Details

MIMER regards a database as a collection of files. Each table is wholly contained within one file. The low level assigns each table to a file. MIMER restricts names to a maximum of eight characters. One must specify a MIMER name for each table and attribute. MIMER provides some support for domains. Data modelers can assign data type, data length, default value, range check, and edit mask to domains.

ANOTHER LOOK AT ALTERNATE METHODOLOGIES

Now, let us revisit our comparison of data modeling techniques. Recall that we reject the idea of directly designing a database with a DBMS language like SQL. A DBMS language is at too low a level and violates the principles of good software engineering. We are left with the ER, LRDM, and OMT for further discussion.

Shortcomings of Chen's ER

The entity-relationship (ER) method has certainly been a useful and successful technique for database design. The ER method, however, leaves much room for improvement, especially in certain problem domains.

ER lacks a substructure for entities. It has no counterpart to generalization hierarchies. Generalization allows one to refine the structure of entities and add detail as needed. One can choose the proper level of abstraction for each context. The resulting design is robust and extensible. Generalization and its extension to programming is the fundamental idea in object-oriented languages like Smalltalk, C++, and Objective C.
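The remark that generalization lets one choose the proper level of abstraction for each context can be made concrete with the superclass and subclass tables described earlier for the middle level. The sketch below is an illustration under stated assumptions, not the authors' schema: it uses SQLite through Python's standard `sqlite3` module, the table and column names are hypothetical spellings of the fields named in the text, and the sample rows are invented.

```python
import sqlite3


def build():
    """Create one superclass table and two subclass tables sharing an ID."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE equipment (
            equipment_id   INTEGER PRIMARY KEY,
            equipment_name TEXT NOT NULL,
            equipment_type TEXT NOT NULL,        -- discriminator field
            cost           REAL,
            weight         REAL
        );
        CREATE TABLE pump (
            equipment_id       INTEGER PRIMARY KEY
                               REFERENCES equipment(equipment_id),
            suction_pressure   REAL,
            discharge_pressure REAL,
            flow_rate          REAL
        );
        CREATE TABLE heat_exchanger (
            equipment_id INTEGER PRIMARY KEY
                         REFERENCES equipment(equipment_id),
            surface_area REAL
        );
    """)
    # Invented sample data: one superclass row and one subclass row per object.
    conn.execute("INSERT INTO equipment VALUES (1, 'P-101', 'pump', 1200.0, 85.0)")
    conn.execute("INSERT INTO pump VALUES (1, 1.0, 6.5, 30.0)")
    conn.execute("INSERT INTO equipment VALUES (2, 'E-201', 'heat exchanger', 5000.0, 900.0)")
    conn.execute("INSERT INTO heat_exchanger VALUES (2, 42.0)")
    return conn


def all_equipment_names(conn):
    """General level: every piece of equipment, whatever its subclass."""
    rows = conn.execute(
        "SELECT equipment_name FROM equipment ORDER BY equipment_id")
    return [r[0] for r in rows]


def pump_details(conn):
    """Detailed level: pumps only, with inherited and pump-specific fields."""
    return conn.execute("""
        SELECT e.equipment_name, p.flow_rate
        FROM equipment AS e
        JOIN pump AS p ON p.equipment_id = e.equipment_id
    """).fetchall()
```

A cost report can work entirely at the superclass level with `all_equipment_names`-style queries, while a hydraulic calculation joins down to the pump table. Adding a new subclass table later does not disturb either kind of query, which is one reading of the extensibility claim.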

[Figure 10: equipment aggregation hierarchy for the first application. Labels recoverable from the diagram: Site; Location name; Plant name; Plant version (creation time, last update time, comment); Section name; Equipment type; Heat exchanger; Suction press; Discharge press; Flow rate; Surface area; Tube pressure; Shell pressure; Tube diameter; Height.]

FIGURE 10. Application 1, Aggregation Hierarchy


Generalization reduces the semantic gap between sign programs, simulation programs, and cost programs.
the data modeler and the database design language. Most of these programs already exist. Current practice
Similarly, it reduces the semantic gap between the data at best relies on converting and passing files. This is
model and applications. The addition of generalization awkward, since n x n interfaces are required for n
to ER is a substantial step forward, in the same way programs. Current practice often degenerates into man-
that ER was from database languages. ER also lacks a ual data reentry.
substructure for relationships. Whereas ER only offers The solution is to exchange data with a database
association, newer approaches support aggregation and rather than move data between each pair of programs.
association. Then for n programs, one requires 2n interfaces. Most of
For many database problems, the ER approach is suf- these programs are mature, carefully debugged code
ficient. For many database problems, it would be the and to tamper with them is undesirable. Thus, these
method of choice. Many design productivity products programs must use database services in batch mode.
are available in the commercial marketplace to assist A preprocessor extracts information from the database
the ER data modeler. The ER approach has had the and generates an input file. The application program
benefit of close scrutiny and much research. For large, runs. Then a postprocessor digests the output file(s)
complex problems, however, ER lacks power. Scientific and updates the database. The application remains un-
applications are pushing the frontier of database re- changed and runs as before, unaware that it is receiv-
search, and this requires all the help that is available. ing database services.
The two applications in the next section required The first application is dominated by four aggregation
about 20 dense pages of OMT diagrams and six months hierarchies: equipment, piping, graphics, and mathe-
of database design work. It is not difficult to envision matical simulation. The bulk of the data model refines
several hundred pages of OMT diagrams taking several these hierarchies and forms associations between the
years for more complex projects. A more effective tool levels of the hierarchies.
directly affects the quality of the resulting design and Figure 10 shows the equipment aggregation hier-
the effort expended. archy. A site name uniquely identifies a site. For that
site, a plant name identifies a plant. The plant has mul-
Comparison of OMT with Teorey’s LRDM tiple versions. Selecting a plant version and a section
In their article, Teorey [9] and coworkers claim that name locates a particular section. A section combined
their LRDM approach improves upon the ER method. with an equipment name finds a piece of equipment.
We agree. LRDM has generalization and it has aggrega- A piece of equipment may be a pump, heat exchanger,
tion. Our OMT-based approach builds upon LRDM as tank, or some other object.
follows:
1. Qualification further refines the structure of rela- Description of the Second Application
tionships; The second application focuses on electrical engineer-
2. the OMT directly extends into the realm of program- ing. The goal was to develop an interactive graphical
ming. (The OMT supports methods.) The OMT pro- editor for electric power diagrams. Typical operations
vides a consistent notation for database models and include creating, deleting, moving, copying, cutting,
application programs; and pasting of buses, circuits, and devices. This pro-
3. the OMT graphical syntax appears to be cleaner gram must run fast despite frequent interaction with a
than that of LRDM; and database during its course of execution. The database
4. an intermediate level between high level database provides a neutral format for interfacing to other appli-
design and a DBMS language is provided. This is cations, crash recovery, and multiuser concurrency. As
more flexible than Teorey’s direct mapping between of this writing, we have designed the database. We are
graphical diagrams and a DBMS language. preparing to implement the procedural code. We will
Our work emanates from an industrial environment be using an in-house object-oriented language called
and has been refined by use on real problems. Our DSM [?‘I that is built on top of C.
experience with database design cannot match that of This application decomposes a diagram into a series
the ER approach, but it is still significant. About 12 of sheets. Each sheet corresponds to one piece of paper
people have influenced the evolution of the OMT. More upon output. Nearly all information can be assigned to
than one hundred have been trained in its use. a single sheet.
This application runs in real time. It cannot pause for
APPLICATION OF THE METHODOLOGY a database operation. We plan to boost performance by
These OMT applications were performed by two differ- shadowing the database in memory. The user selects
ent people. The intent is to convey some measure of the some sheets for study and the system reads them into
size and complexity of these applications. memory. All read requests are satisfied through RAM
data structures-a quick response. Update operations
Description of the First Application are accumulated in memory and posted to the database
The first application is a chemical engineering problem. upon an explicit save request. This save request spawns
The objective was to integrate the data from many free- an asynchronous process with a series of database
standing programs that include drawing programs, de- commands.
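
The shadowing scheme planned for the second application can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' DSM implementation: the `item` table, the `ShadowedSheets` class, and all method names are hypothetical, SQLite stands in for the relational DBMS, and a background thread stands in for the asynchronous process mentioned in the text.

```python
import sqlite3
import threading


def create_schema(db_path):
    """Create the hypothetical one-table store used by this sketch."""
    con = sqlite3.connect(db_path)
    try:
        with con:
            con.execute(
                "CREATE TABLE IF NOT EXISTS item ("
                " sheet INTEGER, name TEXT, value TEXT,"
                " PRIMARY KEY (sheet, name))")
    finally:
        con.close()


class ShadowedSheets:
    """In-memory shadow of selected sheets; writes are posted on save()."""

    def __init__(self, db_path):
        self.db_path = db_path
        self.sheets = {}    # sheet id -> {object name: value}, the RAM copy
        self.pending = []   # updates accumulated since the last save

    def load(self, sheet_ids):
        """Read the selected sheets into memory once, up front."""
        con = sqlite3.connect(self.db_path)
        try:
            for sid in sheet_ids:
                rows = con.execute(
                    "SELECT name, value FROM item WHERE sheet = ?", (sid,))
                self.sheets[sid] = dict(rows)
        finally:
            con.close()

    def read(self, sheet_id, name):
        """Reads are answered from RAM and never touch the database."""
        return self.sheets[sheet_id].get(name)

    def update(self, sheet_id, name, value):
        """Apply the change in RAM and queue it for the next save."""
        self.sheets[sheet_id][name] = value
        self.pending.append((sheet_id, name, value))

    def save(self):
        """Post the queued updates from a background thread; return it."""
        batch, self.pending = self.pending, []
        worker = threading.Thread(target=self._post, args=(batch,))
        worker.start()
        return worker

    def _post(self, batch):
        con = sqlite3.connect(self.db_path)
        try:
            with con:  # one transaction for the whole batch
                con.executemany(
                    "INSERT OR REPLACE INTO item (sheet, name, value) "
                    "VALUES (?, ?, ?)", batch)
        finally:
            con.close()
```

In this sketch the editor would call `load` when the user selects sheets, `read` and `update` during interaction, and `save` on an explicit save request. The caller may `join()` the returned thread if it needs to know the batch has been posted.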

April 1988 Volume 31 Number 4 Communications of the ACM 423


Computing Practices

There are three major components of the second application data model: a
geometry aggregation hierarchy, a simulation model, and user interface. The
bulk of the data model fleshes out the geometry and simulation subsections
and relates the two.

Figure 11 is a fragment of the geometry aggregation model. Buses and circuits
have two ends. A device has an arbitrary number of pins. These possible
points of contact generalize into a pin. We improve integrity and performance
by associating pin with a connection rather than with other pins. This model
can quickly answer the following types of questions:
1. What connects to the following electrical object?;
2. What connects to a particular pin on an electrical object?; and
3. What electrical objects connect at a screen location?.

FIGURE 11. Application 2, Electric Power Connectivity (diagram not
reproduced; recoverable labels: Connection, Pin)

Application Statistics
Table I quantifies the complexity and diversity of the two applications. The
statistics are purely a by-product of our application work. There was no
deliberate attempt to warp the data models so the statistics would support a
particular point.

TABLE I. Application Statistics

                                                Application
                                                  1       2
Total number of tables                           71     108
Maximum attributes per table                    100      50
Total number of object classes                   48      82
Total number of relationships                    66      45
Distribution of relationship type:
  Qualified aggregation                          21      11
  Unqualified aggregation                        18       1
  Generalization                                  4       7
  Binary, unqualified association                20      25
  Qualified association                           1       1
  Ternary association                             2       0
Distribution of relationship multiplicity:
  One-to-one                                      5      14
  One-to-many                                    44      30
  Many-to-many                                   17       1
Maximum number of aggregation levels              6       4
Maximum levels below generalization symbol        1       1
Maximum generalization fan-out                  100      50

The numbers in Table I are approximate. There is some subjectivity or
discretion in how these statistics are compiled. The generalization fan-out
for the first application is shown as 100. The equipment object in Figure 10
generalizes into many types of equipment: pumps, tanks, columns, reactors,
and so forth. There would be approximately 100 different types of equipment
and hence about 100 subclass tables. We actually developed tables for two
types of equipment. We counted 100 as the generalization fan-out and two
towards the number of objects and tables.

Table I does not count tables whose sole purpose was to remedy DBMS
shortcomings and/or tighten integrity. An example may clarify this point.
Each type of equipment has one or more materials of construction. If the
database stored material names, confusion could arise from abbreviations,
synonyms, mistyping, and so on. So instead of storing material name, we store
an ID or pointer into a table of material names. For 100 types of equipment
there would be approximately 100 different references into a material list.
This would skew the statistics. We felt that these references into a material
list were not of the same stature as associations between independent and
freestanding objects.

We should also comment on the multiplicity numbers. This issue arises for
qualified aggregations and qualified associations. We counted the qualified
aggregation between site and plant as one-to-many rather than the one-to-one
shown in Figure 10. The ER approach, the traditional way of viewing data
models, does not qualify relationships. We felt that multiplicity statistics
would have the most meaning if placed within the ER context. So, to
summarize, we counted the mul-


tiplicity for qualified relationships as if the qualification was not there.
A site has many plants.

Note the usefulness of qualified aggregation. The low count for
generalization may be misleading. The large maximum fan-out is a better
indicator of the importance of generalization.

FUTURE DIRECTIONS
Automate OMT-Based Database Design
Currently, the transformation between levels is a combination of ad hoc tools
and much manual effort. In the future we envision a fully automatic process.
The data modeler draws OMT diagrams on the screen. The drawing software
captures objects and relationships while actively supporting the semantics.
Data flows forward to the middle and low-levels.

In this scenario, the data modeler has an efficient, integrated data modeling
tool. Tight control of redundant data enhances model integrity. A meta-data
model and a DBMS lie at the core of such a system.

Further Enrich the Semantic Support
The OMT improves upon the ER and LRDM methods and provides richer semantic
support. We see many opportunities for further improvements beyond that of
the OMT:
1. versioning. Current databases are a snapshot of time. We also need the
   history of the data. We have made some crude attempts at capturing
   versions in our applications, but an elegant solution has been elusive;
2. accountability. Who provided the data? Who approved the data?;
3. data quality. How much confidence do we have in the data? This area
   becomes particularly murky as we combine data; and
4. expert knowledge. How do we merge implicit knowledge with an explicit
   database? Knowledge based systems are one answer, but they only handle
   small amounts of data.

CONCLUSION
Our OMT-based approach to database design has many advantages. It is:
1. intuitive, easy to use, and easy to understand. Non-DBMS application
   experts were able to read OMT diagrams after a few hours of explanation;
2. expressive. It provides a richer set of constructs for modeling data than
   alternative approaches.
3. extensible. It accommodates changes in the scope of the data model and
   ports to other DBMS.
4. a useful level of abstraction. It matches real world problems well and
   maps naturally to a relational DBMS.
5. good performance. It is easy to visualize patterns of access to the data
   when using the OMT.
6. promotes database integrity. Object-derived tables tend to be in third
   normal form.
7. improves integration. The object paradigm helps bridge the semantic gap
   between databases and applications.
8. has been tested. The OMT has been applied to real problems. It has
   suffered critical review and several iterations of refinement.

The OMT is evolving and maturing. This methodology is an improved version of
that used for our past applications. Object modeling has improved the clarity
of our thought and ability to communicate during the design process. At the
same time, our application work has generated feedback to fine tune the
methodology. This cycle continues.
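
As a hedged illustration of the claim that the object model maps naturally to a relational DBMS, the sketch below renders a small part of the equipment model as tables. It is one plausible mapping, not necessarily the one used in the applications: every table and column name is an assumption, the version and section levels of the hierarchy are omitted, and SQLite stands in for the DBMS. A qualified aggregation becomes a uniqueness constraint over the parent key plus the qualifier, and a subclass table shares the key of its superclass table.

```python
import sqlite3

SCHEMA = """
CREATE TABLE site (
    site_id   INTEGER PRIMARY KEY,
    site_name TEXT NOT NULL UNIQUE        -- a site name identifies a site
);
CREATE TABLE plant (
    plant_id   INTEGER PRIMARY KEY,
    site_id    INTEGER NOT NULL REFERENCES site(site_id),
    plant_name TEXT NOT NULL,
    UNIQUE (site_id, plant_name)          -- qualifier: unique within a site
);
CREATE TABLE equipment (                  -- superclass table
    equipment_id   INTEGER PRIMARY KEY,
    plant_id       INTEGER NOT NULL REFERENCES plant(plant_id),
    equipment_name TEXT NOT NULL,
    UNIQUE (plant_id, equipment_name)     -- qualifier: unique within a plant
);
CREATE TABLE tank (                       -- subclass table shares the key
    equipment_id INTEGER PRIMARY KEY REFERENCES equipment(equipment_id),
    diameter     REAL,
    height       REAL
);
"""


def open_model():
    """Return an in-memory database holding the sketch schema."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript(SCHEMA)
    return con


def find_plant(con, site_name, plant_name):
    """Qualified lookup: a site plus a plant name yields at most one plant."""
    row = con.execute(
        "SELECT p.plant_id FROM plant p "
        "JOIN site s ON s.site_id = p.site_id "
        "WHERE s.site_name = ? AND p.plant_name = ?",
        (site_name, plant_name)).fetchone()
    return row[0] if row else None
```

Viewed without the qualifier, `plant` is simply a one-to-many child of `site`, which is how the multiplicity was counted for Table I.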

Appendix A. Textual format for object models
A graphical model permits the designer to see the overall structure of an
application at a glance and to easily trace out the relationships among
various objects. Nevertheless, there are times when a linear textual format
is useful, particularly when object classes have many fields or methods. It
is also necessary to have a textual format if the object model is to be
compiled for use with an object-oriented language or automatically converted
into a database schema. The following examples are taken from the
object-oriented language DSM (Appendix B). The authors and colleagues have
implemented a graphical editor for the OMT which generates DSM declaration
text as an output.

A typical object class is declared with the format:

    define equipment class
      fields
        equipment-name:text
        cost:money
        weight:weight

This class is implicitly a subclass of the predefined class object, the
ultimate ancestor of all classes since no superclass has been specified. Each
field has a name and a data type. The name must be unique within the class.
The type declarations establish the domain of a data field. It is possible to
declare a type as being object, in which case it can hold any kind of object.
A type can specify a particular object class, in which case the object must
be an instance of the class or one of its descendent classes. A type need not
be an object; for example, integer, boolean, or text are pure values, not
objects. In this case, money and weight are user defined value types.

Generalization relationships specify a superclass as part of a class
declaration:

    define tank subclass of equipment
      fields
        diameter:length
        height:length

Class tank inherits all the fields of class equipment plus two new ones of
its own.

A one-to-many aggregation is declared with the format:


ailed associations are declared in a similar ma

EiKh C&P containa


of only one car. ‘.

d by descendax% of he &as unles


by another method with the same
di@n to the class name and list c
x can contam a list of methods ,
@list box. In IBM, each class declaration can : i : ;:~”
_’;n nij
d &tbti& StWion Of the following format:
the format:

fields

The format is similar to the aggr


tionship fields are values associate
.1 * .. #Z 4% r&l -El. .;. per
Field values are functionally dependent on the component classes of the
association. The fields section is optional; most associations do not have
field values.

The multiplicity of a relationship can be specified by one of the following:

    *-*  many-to-many
    *-1  many-to-one
    1-*  one-to-many
    1-1  one-to-one

Each component of an association can have an optional name to distinguish it
from other components:

    define manages association
      manager:employee 1-* worker:employee

The component names manager and worker are called role names because they
indicate the role that each component plays in the relationship. They are
particularly useful for associations between objects of the same class. When
objects are of different classes, class names serve to differentiate the
components.

A role name establishes a pseudo-field with respect to a class. For example,
class employee has a pseudo-field manager that yields an employee object and
a pseudo-field worker that yields a set of employee objects. Unlike actual
fields, the pseudo-fields generated by an association declaration are not
independent; they are views on the relationship.

A qualified aggregation is declared with the format:

    define aggregation
      plant [equipment-name:text] 1-1 equipment

In this case, equipment name qualifies a plant object to yield a unique piece
of equipment; each piece of equipment has a unique name within some plant.
The role names for the plant and equipment have been omitted, as has the name
for the relationship itself; a default relationship name is provided by the
compiler. Quali-

each method is implicitly an
Additional arguments are in
value type must be specified.
given to each argument.

Methods are important to object-oriented programming. Database schemas
typically omit any specification of the operations that are permitted on the
data, in part because a database is supposed to hold data but not restrict
its use. When modeling a database, we have nevertheless found it useful to
consider and specify the operations that will be applied to the data. Even if
the operations will be implemented externally to the database, their
specification permits the object model to be evaluated for logical
completeness and correctness early in the design cycle. We have found the
object-oriented paradigm useful for these specifications.

Appendix B. Data Structure Manager (DSM)
The Data Structure Manager (DSM) is a software development system developed
at GE-Corporate Research and Development and GE-Calma to support
object-oriented programming in the C language. DSM implements all the
concepts of a standard object-oriented system such as Smalltalk within the
context of the C language while adding a number of extensions that greatly
extend its power, such as support for relationships.

DSM comprises a set of preprocessors and a run-time object library written in
C. The user defines an object class hierarchy with single or multiple
inheritance and relationships between classes. Full metaclass hierarchy is
supported including class fields and class methods. This is similar to
Smalltalk. All system objects are created dynamically at run time and are
fully extensible. The system manages memory but does not include a garbage
collection scheme. Using class descriptor and relation descriptor objects,
generic operations such as pretty printing, copying, destroying, and saving
and restoring objects in a file are possible.
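
To make the idea of descriptor-driven generic operations concrete, here is a small sketch. It is not DSM and does not reflect the DSM run-time library: `ClassDescriptor`, `Instance`, and the helper functions are hypothetical names, the field lists are taken from the equipment and tank declarations of Appendix A (with hyphens replaced by underscores), and JSON text stands in for saving to a file. Each generic operation walks the descriptor's field list instead of being written once per class.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ClassDescriptor:
    """Run-time description of a class: its name, fields, and superclass."""
    name: str
    fields: Tuple[Tuple[str, str], ...]       # (field name, type name) pairs
    parent: Optional["ClassDescriptor"] = None

    def all_fields(self):
        """Inherited fields first, then the fields declared by this class."""
        inherited = self.parent.all_fields() if self.parent else ()
        return inherited + self.fields


class Instance:
    """An object whose layout is given entirely by its descriptor."""

    def __init__(self, descriptor, **values):
        self.descriptor = descriptor
        self.values = {name: values.get(name)
                       for name, _ in descriptor.all_fields()}


def pretty_print(obj):
    """Generic printer: one line per field listed in the descriptor."""
    lines = [obj.descriptor.name]
    for name, type_name in obj.descriptor.all_fields():
        lines.append(f"  {name}:{type_name} = {obj.values[name]!r}")
    return "\n".join(lines)


def copy_instance(obj):
    """Generic copy: a new instance with the same field values."""
    return Instance(obj.descriptor, **obj.values)


def save(obj):
    """Generic save: the class name plus field values, as JSON text."""
    return json.dumps({"class": obj.descriptor.name, "values": obj.values})


def restore(text, registry):
    """Generic restore: look the descriptor up by name, then rebuild."""
    record = json.loads(text)
    return Instance(registry[record["class"]], **record["values"])


EQUIPMENT = ClassDescriptor(
    "equipment",
    (("equipment_name", "text"), ("cost", "money"), ("weight", "weight")))
TANK = ClassDescriptor(
    "tank", (("diameter", "length"), ("height", "length")), parent=EQUIPMENT)
```

Adding a new class in this sketch means adding one descriptor; the four generic operations need no change.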



can be called using the run-time message search mechanism based on the object
class or called directly as a C function call if the object class is known.
The DSM compiler automates this decision in most cases. Object fields
(instance variables in Smalltalk) can have C types as their values, so the
overhead of objects is not necessary to hold pure values such as integers or
structures of values. The object class hierarchy is a part of the C type
hierarchy. This approach allows easy interfacing of DSM programs with
existing non-DSM code written in C or other languages.

DSM has been used for creating production software
in ~S~~~~fbut Bert be bootstrapped on a system that runs standard C. It has
been ported to several different systems including Sun/Unix, VAX/VMS, and
Apollo/Aegis workstations. The DSM system includes an object pretty-printer
and an interpreter for displaying objects and data structures for debugging
programs.

The most significant novel feature of DSM as an object-oriented language is
its support for relationships in both language syntax and run-time library
[8]. Users of the system have found relationships intuitive and useful and
have requested extensions to the language that have increased its power, such
as the ability to treat relationships as pseudofields using role names.
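
The notion of role names acting as pseudofields can be sketched outside DSM as follows. This is an illustrative analogue only, using the manages association of Appendix A: the `Association` class, its methods, and the use of Python properties are assumptions made for this example and are not DSM syntax or its run-time library. The links are stored once, in the association, and the two pseudofields are computed views on it.

```python
class Association:
    """A one-to-many association stored once, as (one, many) pairs."""

    def __init__(self):
        self.links = set()

    def link(self, one, many):
        """Relate `many` to `one`, replacing any earlier partner of `many`."""
        self.links = {(o, m) for (o, m) in self.links if m is not many}
        self.links.add((one, many))

    def one_of(self, many):
        """Return the single partner on the 'one' side, or None."""
        for o, m in self.links:
            if m is many:
                return o
        return None

    def many_of(self, one):
        """Return the set of partners on the 'many' side."""
        return {m for (o, m) in self.links if o is one}


# manager:employee 1-* worker:employee
manages = Association()


class Employee:
    def __init__(self, name):
        self.name = name

    @property
    def manager(self):
        """Pseudo-field: a view on the relationship, not stored here."""
        return manages.one_of(self)

    @property
    def worker(self):
        """Pseudo-field: the set of employees this one manages."""
        return manages.many_of(self)
```

Because neither pseudofield holds data of its own, updating the association through `manages.link` changes what both `manager` and `worker` report, which is the sense in which the text calls them views on the relationship.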

Acknowledgments. We would like to thank Esin Ulug, Ashwin Shah, Mary Loomis,
and the referees for their careful and constructive review of this document.

REFERENCES
1. Chen, P.P. The entity-relationship model: Toward a unified view of data.
   ACM TODS 1, 1 (Mar. 1976).
2. Date, C.J. Relational Database: Selected Writings. Addison-Wesley,
   Reading, Mass., 1986.
3. Goldberg, A., and Robson, D. Smalltalk-80: The Language and Its
   Implementation. Addison-Wesley, Reading, Mass., 1983.
4. Khoshafian, S.N., and Copeland, G.P. Object identity. In Proceedings of
   the ACM Conference on Object-Oriented Programming Systems, Languages, and
   Applications. (Portland, Oregon). SIGPLAN Not. 21, 11 (1986).
5. Loomis, M.E.S., Shah, A.V.S., and Rumbaugh, J.E. An object modeling
   technique for conceptual design. In Proceedings of the European Conference
   on Object-Oriented Programming. (Paris, France, June 15-17). Lecture Notes
   in Computer Science, 276. Springer-Verlag, New York, 1987.
6. MIMER Information Systems AB. Uppsala, Sweden.
7. Rumbaugh, J.E. Data Structure Manager Reference Manual. GE internal
   document. Schenectady, New York, 1987.
8. Rumbaugh, J.E. Relations as semantic constructs in an object-oriented
   language. In Proceedings of the ACM Conference on Object-Oriented
   Programming Systems, Languages, and Applications. (Orlando, Fla., Oct.).
   SIGPLAN Not. 22, 12 (1987).
9. Teorey, T.J., Yang, D., and Fry, J.P. A logical design methodology for
   relational databases using the extended entity-relationship model. ACM
   Comput. Surv. 18, 2 (June 1986).
10. Wiederhold, G. Modeling databases. Inform. Sci. 29, 2 (1983).

CR Categories and Subject Descriptors: D.2.2 [Software Engineering]: Tools
and Techniques; D.2.10 [Software Engineering]: Design-methodologies; H.2.1
[Database Management]: Logical Design; H.2.6 [Database Management]: Database
Applications; J.2 [Physical Sciences and Engineering]; J.6 [Computer-Aided
Engineering]: computer-aided design
General Terms: Design, Documentation
Additional Key Words and Phrases: Entity-relationship method, normal forms,
object, object-oriented, relational database

Authors’ present Address: Michael R. Blaha, William J. Premerlani, and James
E. Rumbaugh, GE, Corporate Research and Development, Schenectady, NY.

Permission to copy without fee all or part of this material is granted
provided that the copies are not made or distributed for direct commercial
advantage, the ACM copyright notice and the title of the publication and its
date appear, and notice is given that copying is by permission of the
Association for Computing Machinery. To copy otherwise, or to republish,
requires a fee and/or specific permission.
