
BAXENDALE, Editor

A Relational Model of Data for Large Shared Data Banks

E. F. CODD

IBM Research Laboratory, San Jose, California

Future users of large data banks must be protected from

having to know how the data is organized in the machine (the

internal representation). A prompting service which supplies

such information is not a satisfactory solution. Activities of users

at terminals and most application programs should remain

unaffected when the internal representation of data is changed

and even when some aspects of the external representation

are changed. Changes in data representation will often be

needed as a result of changes in query, update, and report

traffic and natural growth in the types of stored information.

Existing noninferential, formatted data systems provide users

with tree-structured files or slightly more general network

models of the data. In Section 1, inadequacies of these models

are discussed. A model based on n-ary relations, a normal

form for data base relations, and the concept of a universal

data sublanguage are introduced. In Section 2, certain opera-

tions on relations (other than logical inference) are discussed

and applied to the problems of redundancy and consistency

in the user's model.

KEY WORDS AND PHRASES: data bank, data base, data structure, data

organization, hierarchies of data, networks of data, relations, derivability,

redundancy, consistency, composition, join, retrieval language, predicate

calculus, security, data integrity

CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22, 4.29

1. Relational Model and Normal Form

1.1. INTRODUCTION

This paper is concerned with the application of ele-

mentary relation theory to systems which provide shared

access to large banks of formatted data. Except for a paper

by Childs [1], the principal application of relations to data

systems has been to deductive question-answering systems.

Levien and Maron [2] provide numerous references to work

in this area.

In contrast, the problems treated here are those of data

independence--the independence of application programs

and terminal activities from growth in data types and

changes in data representation--and certain kinds of data

inconsistency which are expected to become troublesome

even in nondeductive systems.

Volume 13 / Number 6 / June, 1970

The relational view (or model) of data described in

Section 1 appears to be superior in several respects to the

graph or network model [3, 4] presently in vogue for non-

inferential systems. It provides a means of describing data

with its natural structure only--that is, without superimposing

any additional structure for machine representation

purposes. Accordingly, it provides a basis for a high level

data language which will yield maximal independence be-

tween programs on the one hand and machine representa-

tion and organization of data on the other.

A further advantage of the relational view is that it

forms a sound basis for treating derivability, redundancy,

and consistency of relations--these are discussed in Section

2. The network model, on the other hand, has spawned a

number of confusions, not the least of which is mistaking

the derivation of connections for the derivation of rela-

tions (see remarks in Section 2 on the "connection trap").

Finally, the relational view permits a clearer evaluation

of the scope and logical limitations of present formatted

data systems, and also the relative merits (from a logical

standpoint) of competing representations of data within a

single system. Examples of this clearer perspective are

cited in various parts of this paper. Implementations of

systems to support the relational model are not discussed.

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS

The provision of data description tables in recently de-

veloped information systems represents a major advance

toward the goal of data independence [5, 6, 7]. Such tables

facilitate changing certain characteristics of the data repre-

sentation stored in a data bank. However, the variety of

data representation characteristics which can be changed

without logically impairing some application programs is

still quite limited. Further, the model of data with which

users interact is still cluttered with representational prop-

erties, particularly in regard to the representation of col-

lections of data (as opposed to individual items). Three of

the principal kinds of data dependencies which still need

to be removed are: ordering dependence, indexing depend-

ence, and access path dependence. In some systems these

dependencies are not clearly separable from one another.

1.2.1. Ordering Dependence. Elements of data in a

data bank may be stored in a variety of ways, some involv-

ing no concern for ordering, some permitting each element

to participate in one ordering only, others permitting each

element to participate in several orderings. Let us consider

those existing systems which either require or permit data

elements to be stored in at least one total ordering which is

closely associated with the hardware-determined ordering

of addresses. For example, the records of a file concerning

parts might be stored in ascending order by part serial

number. Such systems normally permit application programs

to assume that the order of presentation of records

from such a file is identical to (or is a subordering of) the

Communications of the ACM 377

stored ordering. Those application programs which take

advantage of the stored ordering of a file are likely to fail

to operate correctly if for some reason it becomes necessary

to replace that ordering by a different one. Similar remarks

hold for a stored ordering implemented by means of

pointers.

It is unnecessary to single out any system as an example,

because all the well-known information systems that are

marketed today fail to make a clear distinction between

order of presentation on the one hand and stored ordering

on the other. Significant implementation problems must be

solved to provide this kind of independence.

1.2.2. Indexing Dependence. In the context of for-

matted data, an index is usually thought of as a purely

performance-oriented component of the data representa-

tion. It tends to improve response to queries and updates

and, at the same time, slow down response to insertions

and deletions. From an informational standpoint, an index

is a redundant component of the data representation. If a

system uses indices at all and if it is to perform well in an

environment with changing patterns of activity on the data

bank, an ability to create and destroy indices from time to

time will probably be necessary. The question then arises:

Can application programs and terminal activities remain

invariant as indices come and go?

Present formatted data systems take widely different

approaches to indexing. TDMS [7] unconditionally pro-

vides indexing on all attributes. The presently released

version of IMS [5] provides the user with a choice for each

file: a choice between no indexing at all (the hierarchic se-

quential organization) or indexing on the primary key

only (the hierarchic indexed sequential organization). In

neither case is the user's application logic dependent on the

existence of the unconditionally provided indices. IDS

[8], however, permits the file designers to select attributes

to be indexed and to incorporate indices into the file struc-

ture by means of additional chains. Application programs

taking advantage of the performance benefit of these indexing

chains must refer to those chains by name. Such programs

do not operate correctly if these chains are later

removed.

1.2.3. Access Path Dependence. Many of the existing

formatted data systems provide users with tree-structured

files or slightly more general network models of the data.

Application programs developed to work with these sys-

tems tend to be logically impaired if the trees or networks

are changed in structure. A simple example follows.

Suppose the data bank contains information about parts

and projects. For each part, the part number, part name,

part description, quantity-on-hand, and quantity-on-order

are recorded. For each project, the project number, project

name, project description are recorded. Whenever a project

makes use of a certain part, the quantity of that part committed

to the given project is also recorded. Suppose that

the system requires the user or file designer to declare or

define the data in terms of tree structures. Then, any one

of the hierarchical structures may be adopted for the infor-

mation mentioned above (see Structures 1-5).

Structure 1. Projects Subordinate to Parts

File  Segment  Fields
F     PART     part #
               part name
               part description
               quantity-on-hand
               quantity-on-order
      PROJECT  project #
               project name
               project description
               quantity committed

Structure 2. Parts Subordinate to Projects

File  Segment  Fields
F     PROJECT  project #
               project name
               project description
      PART     part #
               part name
               part description
               quantity-on-hand
               quantity-on-order
               quantity committed

Structure 3. Parts and Projects as Peers
Commitment Relationship Subordinate to Projects

File  Segment  Fields
F     PART     part #
               part name
               part description
               quantity-on-hand
               quantity-on-order
G     PROJECT  project #
               project name
               project description
      PART     part #
               quantity committed

Structure 4. Parts and Projects as Peers
Commitment Relationship Subordinate to Parts

File  Segment  Fields
F     PART     part #
               part description
               quantity-on-hand
               quantity-on-order
      PROJECT  project #
               quantity committed
G     PROJECT  project #
               project name
               project description

Structure 5. Parts, Projects, and
Commitment Relationship as Peers

File  Segment  Fields
F     PART     part #
               part name
               part description
               quantity-on-hand
               quantity-on-order
G     PROJECT  project #
               project name
               project description
H     COMMIT   part #
               project #
               quantity committed


Now, consider the problem of printing out the part number, part name, and quantity committed for every part used in the project whose project name is "alpha." The following observations may be made regardless of which available tree-oriented information system is selected to tackle this problem. If a program P is developed for this problem assuming one of the five structures above--that is, P makes no test to determine which structure is in effect--then P will fail on at least three of the remaining structures. More specifically, if P succeeds with structure 5, it will fail with all the others; if P succeeds with structure 3 or 4, it will fail with at least 1, 2, and 5; if P succeeds with 1 or 2, it will fail with at least 3, 4, and 5. The reason is simple in each case. In the absence of a test to determine which structure is in effect, P fails because an attempt is made to execute a reference to a nonexistent file (available systems treat this as an error) or no attempt is made to execute a reference to a file containing needed information. The reader who is not convinced should develop sample programs for this simple problem.

Since, in general, it is not practical to develop application programs which test for all tree structurings permitted by the system, these programs fail when a change in structure becomes necessary.
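The failure described above can be illustrated with a small sketch. It is not taken from any of the systems cited; the nested dictionaries below are hypothetical in-memory stand-ins for Structures 5 and 1, and the part names and quantities are invented sample data. The program is written against Structure 5 and makes no test of which structure is in effect.

```python
# Hypothetical stand-in for Structure 5: peer files F (parts),
# G (projects), and H (commitments). All values are invented sample data.
structure_5 = {
    "F": [{"part#": 1, "part name": "bolt"},
          {"part#": 2, "part name": "nut"}],
    "G": [{"project#": 10, "project name": "alpha"}],
    "H": [{"part#": 1, "project#": 10, "quantity committed": 40},
          {"part#": 2, "project#": 10, "quantity committed": 15}],
}

# Hypothetical stand-in for Structure 1: a single file F in which
# PROJECT segments are subordinate to each PART segment.
structure_1 = {
    "F": [
        {"part#": 1, "part name": "bolt",
         "PROJECT": [{"project#": 10, "project name": "alpha",
                      "quantity committed": 40}]},
        {"part#": 2, "part name": "nut",
         "PROJECT": [{"project#": 10, "project name": "alpha",
                      "quantity committed": 15}]},
    ],
}


def program_p(bank):
    """List (part#, part name, quantity committed) for project "alpha".

    Written for Structure 5 only: it navigates files G, F, and H by name.
    """
    alpha = {g["project#"] for g in bank["G"] if g["project name"] == "alpha"}
    names = {f["part#"]: f["part name"] for f in bank["F"]}
    return sorted(
        (h["part#"], names[h["part#"]], h["quantity committed"])
        for h in bank["H"]
        if h["project#"] in alpha
    )


result_5 = program_p(structure_5)

# The same program on Structure 1 refers to a nonexistent file G.
try:
    program_p(structure_1)
    failed_on_1 = False
except KeyError:
    failed_on_1 = True
```

Here the reference to the nonexistent file G surfaces as a `KeyError`, which plays the part of the error that available systems raise in this situation.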

Systems which provide users with a network model of the data run into similar difficulties. In both the tree and network cases, the user (or his program) is required to exploit a collection of user access paths to the data. It does not matter whether these paths are in close correspondence with pointer-defined paths in the stored representation--in IDS the correspondence is extremely simple, in TDMS it is just the opposite. The consequence, regardless of the stored representation, is that terminal activities and programs become dependent on the continued existence of the user access paths.

One solution to this is to adopt the policy that once a user access path is defined it will not be made obsolete until all application programs using that path have become obsolete. Such a policy is not practical, because the number of access paths in the total model for the community of users of a data bank would eventually become excessively large.

1.3. A RELATIONAL VIEW OF DATA

The term relation is used here in its accepted mathematical sense. Given sets $S_1, S_2, \ldots, S_n$ (not necessarily distinct), R is a relation on these n sets if it is a set of n-tuples each of which has its first element from $S_1$, its second element from $S_2$, and so on.¹ We shall refer to $S_j$ as the jth domain of R. As defined above, R is said to have degree n. Relations of degree 1 are often called unary, degree 2 binary, degree 3 ternary, and degree n n-ary.

¹ More concisely, R is a subset of the Cartesian product $S_1 \times S_2 \times \cdots \times S_n$.

For expository reasons, we shall frequently make use of an array representation of relations, but it must be remembered that this particular representation is not an essential part of the relational view being expounded. An array which represents an n-ary relation R has the following properties:

(1) Each row represents an n-tuple of R.

(2) The ordering of rows is immaterial.

(3) All rows are distinct.

(4) The ordering of columns is significant--it corresponds to the ordering $S_1, S_2, \ldots, S_n$ of the domains on which R is defined (see, however, remarks below on domain-ordered and domain-unordered relations).

(5) The significance of each column is partially conveyed by labeling it with the name of the corresponding domain.

The example in Figure 1 illustrates a relation of degree 4, called supply, which reflects the shipments-in-progress of parts from specified suppliers to specified projects in specified quantities.

supply (supplier  part  project  quantity)
        1         2     5        17
        1         3     5        23
        2         3     7         9
        2         7     5         4
        4         1     1        12

FIG. 1. A relation of degree 4
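As a minimal sketch of the definition, the relation of Figure 1 can be written as a Python set of 4-tuples. A set gives properties (2) and (3) of the array representation directly: rows are unordered and distinct. The four finite domains below are an assumption made for illustration only (the paper does not enumerate them), and `is_relation_on` is a hypothetical helper name.

```python
from itertools import product

# The supply relation of Figure 1 as a set of 4-tuples
# (supplier, part, project, quantity).
supply = {
    (1, 2, 5, 17),
    (1, 3, 5, 23),
    (2, 3, 7, 9),
    (2, 7, 5, 4),
    (4, 1, 1, 12),
}

# Assumed finite domains, chosen only so the example is checkable.
suppliers = {1, 2, 4}
parts = {1, 2, 3, 7}
projects = {1, 5, 7}
quantities = {4, 9, 12, 17, 23}


def is_relation_on(r, *domains):
    """True if r is a subset of the Cartesian product of the given domains."""
    return r <= set(product(*domains))


# The degree is the length of every tuple in the relation.
degree = len(next(iter(supply)))

# Presenting the rows in another order yields the same relation.
same_relation = set(sorted(supply, reverse=True)) == supply
```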

One might ask: If the columns are labeled by the name of corresponding domains, why should the ordering of columns matter? As the example in Figure 2 shows, two columns may have identical headings (indicating identical domains) but possess distinct meanings with respect to the relation. The relation depicted is called component. It is a ternary relation, whose first two domains are called part and third domain is called quantity. The meaning of component (x, y, z) is that part x is an immediate component (or subassembly) of part y, and z units of part x are needed to assemble one unit of part y. It is a relation which plays a critical role in the parts explosion problem.

component (part  part  quantity)
           1     5      9
           2     5      7
           3     5      2
           2     6     12
           3     6      3
           4     7      1
           6     7      1

FIG. 2. A relation with two identical domains

It is a remarkable fact that several existing information systems (chiefly those based on tree-structured files) fail to provide data representations for relations which have two or more identical domains. The present version of IMS/360 [5] is an example of such a system.

The totality of data in a data bank may be viewed as a collection of time-varying relations. These relations are of assorted degrees. As time progresses, each n-ary relation may be subject to insertion of additional n-tuples, deletion of existing ones, and alteration of components of any of its existing n-tuples.


In many commercial, governmental, and scientific data banks, however, some of the relations are of quite high degree (a degree of 30 is not at all uncommon). Users should not normally be burdened with remembering the domain ordering of any relation (for example, the ordering supplier, then part, then project, then quantity in the relation supply). Accordingly, we propose that users deal, not with relations which are domain-ordered, but with relationships which are their domain-unordered counterparts.² To accomplish this, domains must be uniquely identifiable at least within any given relation, without using position. Thus, where there are two or more identical domains, we require in each case that the domain name be qualified by a distinctive role name, which serves to identify the role played by that domain in the given relation. For example, in the relation component of Figure 2, the first domain part might be qualified by the role name sub, and the second by super, so that users could deal with the relationship component and its domains--sub.part, super.part, quantity--without regard to any ordering between these domains.
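One way to picture a domain-unordered relationship is sketched below, under the assumption that each tuple is stored as a set of (role-qualified name, value) pairs. The helper names `to_relationship` and `immediate_components` are hypothetical. The sketch shows that a column-permuted copy of component yields the same relationship once the role names are attached.

```python
# The component relation of Figure 2: (sub part, super part, quantity).
component = {
    (1, 5, 9), (2, 5, 7), (3, 5, 2),
    (2, 6, 12), (3, 6, 3), (4, 7, 1), (6, 7, 1),
}


def to_relationship(relation, names):
    """Label each position with a name, so that position no longer matters."""
    return {frozenset(zip(names, row)) for row in relation}


relationship = to_relationship(
    component, ("sub.part", "super.part", "quantity"))

# The same data with its columns permuted is a different relation ...
permuted = {(q, sup, sub) for (sub, sup, q) in component}

# ... but the same relationship once the role names are attached.
same = to_relationship(
    permuted, ("quantity", "super.part", "sub.part")) == relationship


def immediate_components(rel, super_part):
    """Sub-parts of a given part, addressed by role name, not by position."""
    return sorted(
        dict(t)["sub.part"] for t in rel if dict(t)["super.part"] == super_part
    )
```

For instance, `immediate_components(relationship, 5)` selects on super.part and reads sub.part without any reference to column order.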

To sum up, it is proposed that most users should interact with a relational model of the data consisting of a collection of time-varying relationships (rather than relations). Each user need not know more about any relationship than its name together with the names of its domains (role qualified whenever necessary).³ Even this information might be offered in menu style by the system (subject to security and privacy constraints) upon request by the user.

There are usually many alternative ways in which a relational model may be established for a data bank. In order to discuss a preferred way (or normal form), we must first introduce a few additional concepts (active domain, primary key, foreign key, nonsimple domain) and establish some links with terminology currently in use in information systems programming. In the remainder of this paper, we shall not bother to distinguish between relations and relationships except where it appears advantageous to be explicit.

Consider an example of a data bank which includes relations concerning parts, projects, and suppliers. One relation called part is defined on the following domains:

(1) part number

(2) part name

(3) part color

(4) part weight

(5) quantity on hand

(6) quantity on order

and possibly other domains as well. Each of these domains is, in effect, a pool of values, some or all of which may be represented in the data bank at any instant. While it is conceivable that, at some instant, all part colors are present, it is unlikely that all possible part weights, part names, and part numbers are. We shall call the set of values represented at some instant the active domain at that instant.

² In mathematical terms, a relationship is an equivalence class of those relations that are equivalent under permutation of domains (see Section 2.1.1).

³ Naturally, as with any data put into and retrieved from a computer system, the user will normally make far more effective use of the data if he is aware of its meaning.

Normally, one domain (or combination of domains) of a given relation has values which uniquely identify each element (n-tuple) of that relation. Such a domain (or combination) is called a primary key. In the example above, part number would be a primary key, while part color would not be. A primary key is nonredundant if it is either a simple domain (not a combination) or a combination such that none of the participating simple domains is superfluous in uniquely identifying each element. A relation may possess more than one nonredundant primary key. This would be the case in the example if different parts were always given distinct names. Whenever a relation has two or more nonredundant primary keys, one of them is arbitrarily selected and called the primary key of that relation.

A common requirement is for elements of a relation to cross-reference other elements of the same relation or elements of a different relation. Keys provide a user-oriented means (but not the only means) of expressing such cross-references. We shall call a domain (or domain combination) of relation R a foreign key if it is not the primary key of R but its elements are values of the primary key of some relation S (the possibility that S and R are identical is not excluded). In the relation supply of Figure 1, the combination of supplier, part, project is the primary key, while each of these three domains taken separately is a foreign key.
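The key concepts can be checked mechanically on a snapshot of the data, as the sketch below suggests. Two caveats apply. First, a key is a property intended to hold at every instant, so a check against the tuples present at one instant can refute a proposed key but cannot establish it. Second, the rows of part, the function names, and the use of column positions are all assumptions made for illustration; `is_foreign_key` tests only that the referenced values exist and does not test the condition that the domain is not the primary key of R.

```python
from itertools import combinations


def is_key(rows, cols):
    """True if no two rows agree on every column position in cols."""
    projected = [tuple(row[c] for c in cols) for row in rows]
    return len(projected) == len(set(projected))


def is_nonredundant_key(rows, cols):
    """True if cols is a key and no proper nonempty subset of cols is one."""
    if not is_key(rows, cols):
        return False
    return not any(
        is_key(rows, subset)
        for size in range(1, len(cols))
        for subset in combinations(cols, size)
    )


def is_foreign_key(r_rows, r_col, s_rows, s_key_col):
    """True if every value in column r_col of R is a key value of S."""
    s_keys = {row[s_key_col] for row in s_rows}
    return all(row[r_col] in s_keys for row in r_rows)


# Hypothetical snapshot of part: (part number, part name, part color).
part = [(1, "bolt", "red"), (2, "nut", "red"),
        (3, "gear", "blue"), (7, "cam", "blue")]

# The supply relation of Figure 1: (supplier, part, project, quantity).
supply = [(1, 2, 5, 17), (1, 3, 5, 23), (2, 3, 7, 9),
          (2, 7, 5, 4), (4, 1, 1, 12)]

part_number_is_key = is_key(part, (0,))
part_color_is_key = is_key(part, (2,))
number_and_name_nonredundant = is_nonredundant_key(part, (0, 1))
supply_part_references_part = is_foreign_key(supply, 1, part, 0)
```

In this snapshot the combination (part number, part name) identifies each row but is redundant, because part number alone already does so.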

In previous work there has been a strong tendency to treat the data in a data bank as consisting of two parts, one part consisting of entity descriptions (for example, descriptions of suppliers) and the other part consisting of relations between the various entities or types of entities (for example, the supply relation). This distinction is difficult to maintain when one may have foreign keys in any relation whatsoever. In the user's relational model there appears to be no advantage to making such a distinction (there may be some advantage, however, when one applies relational concepts to machine representations of the user's set of relationships).

So far, we have discussed examples of relations which are defined on simple domains--domains whose elements are atomic (nondecomposable) values. Nonatomic values can be discussed within the relational framework. Thus, some domains may have relations as elements. These relations may, in turn, be defined on nonsimple domains, and so on. For example, one of the domains on which the relation employee is defined might be salary history. An element of the salary history domain is a binary relation defined on the domain date and the domain salary. The salary history domain is the set of all such binary relations. At any instant of time there are as many instances of the salary history relation in the data bank as there are employees. In contrast, there is only one instance of the employee relation.

The terms attribute and repeating group in present data base terminology are roughly analogous to simple domain and nonsimple domain, respectively. Much of the confusion in present terminology is due to failure to distinguish between type and instance (as in "record") and between components of a user model of the data on the one hand and their machine representation counterparts on the other hand (again, we cite "record" as an example).

1.4. NORMAL FORM

A relation whose domains are all simple can be repre-

sented in storage by a two-dimensional column-homo-

geneous array of the kind discussed above. Some more

complicated data structure is necessary for a relation with

one or more nonsimple domains. For this reason (and others

to be cited below) the possibility of eliminating nonsimple

domains appears worth investigating.⁴ There is, in fact, a

very simple elimination procedure, which we shall call

normalization.

Consider, for example, the collection of relations ex-

hibited in Figure 3 (a). Job history and children are non-

simple domains of the relation employee. Salary history is a

nonsimple domain of the relation job history. The tree in

Figure 3 (a) shows just these interrelationships of the non-

simple domains.

              employee
             /        \
      jobhistory     children
          |
    salaryhistory

employee      (man#, name, birthdate, jobhistory, children)
jobhistory    (jobdate, title, salaryhistory)
salaryhistory (salarydate, salary)
children      (childname, birthyear)

FIG. 3(a). Unnormalized set

employee'      (man#, name, birthdate)
jobhistory'    (man#, jobdate, title)
salaryhistory' (man#, jobdate, salarydate, salary)
children'      (man#, childname, birthyear)

FIG. 3(b). Normalized set

Normalization proceeds as follows. Starting with the re-

lation at the top of the tree, take its primary key and ex-

pand each of the immediately subordinate relations by

inserting this primary key domain or domain combination.

The primary key of each expanded relation consists of the

primary key before expansion augmented by the primary

key copied down from the parent relation. Now, strike out

from the parent relation all nonsimple domains, remove the

top node of the tree, and repeat the same sequence of

operations on each remaining subtree.

The result of normalizing the collection of relations in

Figure 3 (a) is the collection in Figure 3 (b). The primary

key of each relation is italicized to show how such keys

are expanded by the normalization.
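The procedure can be sketched in a few lines, under stated assumptions. Nonsimple domains are modeled here as Python lists of nested rows, the pre-expansion primary keys in `KEYS` are read off Figure 3(a), and the employee data is invented. The sketch recurses down the tree rather than repeatedly removing the top node, which visits the same relations in the same parent-before-child order.

```python
# Primary key of each relation before expansion (assumed from Figure 3(a)).
KEYS = {
    "employee": ["man#"],
    "jobhistory": ["jobdate"],
    "salaryhistory": ["salarydate"],
    "children": ["childname"],
}

# Invented sample data; nonsimple domains are represented as lists of rows.
employees = [
    {"man#": 1, "name": "Smith", "birthdate": "1930-01-01",
     "jobhistory": [
         {"jobdate": "1960", "title": "clerk",
          "salaryhistory": [
              {"salarydate": "1960", "salary": 4000},
              {"salarydate": "1962", "salary": 4500},
          ]},
     ],
     "children": [{"childname": "Ann", "birthyear": 1955}]},
]


def normalize(name, rows, inherited=None, out=None):
    """Flatten nested relations by copying parent primary keys downward."""
    inherited = inherited or {}
    out = {} if out is None else out
    for row in rows:
        # Strike out the nonsimple domains, keeping only simple ones.
        simple = {k: v for k, v in row.items() if not isinstance(v, list)}
        flat = {**inherited, **simple}
        out.setdefault(name, []).append(flat)
        # Expanded key: the inherited key plus this relation's own key.
        key = {k: flat[k] for k in list(inherited) + KEYS[name]}
        for child_name, child_rows in row.items():
            if isinstance(child_rows, list):
                normalize(child_name, child_rows, key, out)
    return out


normalized = normalize("employee", employees)
```

With this data, `normalized["salaryhistory"]` holds rows over man#, jobdate, salarydate, and salary, matching the shape of salaryhistory' in Figure 3(b).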

⁴ M. E. Sanko of IBM, San Jose, independently recognized the

desirability of eliminating nonsimple domains.

If normalization as described above is to be applicable,

the unnormalized collection of relations must satisfy the

following conditions:

(1) The graph of interrelationships of the nonsimple

domains is a collection of trees.

(2) No primary key has a component domain which is

nonsimple.

The writer knows of no application which would require

any relaxation of these conditions. Further operations of a

normalizing kind are possible. These are not discussed in

this paper.

The simplicity of the array representation which becomes

feasible when all relations are cast in normal form is not

only an advantage for storage purposes but also for com-

munication of bulk data between systems which use widely

different representations of the data. The communication

form would be a suitably compressed version of the array

representation and would have the following advantages:

(1) It would be devoid of pointers (address-valued or

displacement-valued).

(2) It would avoid all dependence on hash addressing

schemes.

(3) It would contain no indices or ordering lists.

If the user's relational model is set up in normal form,

names of items of data in the data bank can take a simpler

form than would otherwise be the case. A general name

would take a form such as

R (g).r.d

where R is a relational name; g is a generation identifier

(optional); r is a role name (optional); d is a domain name.

Since g is needed only when several generations of a given

relation exist, or are anticipated to exist, and r is needed

only when the relation R has two or more domains named

d, the simple form R.d will often be adequate.

1.5. SOME LINGUISTIC ASPECTS

The adoption of a relational model of data, as described

above, permits the development of a universal data sub-

language based on an applied predicate calculus. A first-

order predicate calculus suffices if the collection of relations

is in normal form. Such a language would provide a yard-

stick of linguistic power for all other proposed data lan-

guages, and would itself be a strong candidate for embed-

ding (with appropriate syntactic modification) in a variety

of host languages (programming, command- or problem-

oriented). While it is not the purpose of this paper to

describe such a language in detail, its salient features

would be as follows.

Let us denote the data sublanguage by R and the host

language by H. R permits the declaration of relations and

their domains. Each declaration of a relation identifies the

primary key for that relation. Declared relations are added

to the system catalog for use by any members of the user

community who have appropriate authorization. H per-

mits supporting declarations which indicate, perhaps less

permanently, how these relations are represented in storage.

R permits the specification for retrieval of any subset

of data from the data bank. Action on such a retrieval request

is subject to security constraints.

The universality of the data sublanguage lies in its descriptive ability (not its computing ability). In a large data bank each subset of the data has a very large number of possible (and sensible) descriptions, even when we assume (as we do) that there is only a finite set of function subroutines to which the system has access for use in qualifying data for retrieval. Thus, the class of qualification expressions which can be used in a set specification must have the descriptive power of the class of well-formed formulas of an applied predicate calculus. It is well known that to preserve this descriptive power it is unnecessary to express (in whatever syntax is chosen) every formula of the selected predicate calculus. For example, just those in prenex normal form are adequate [9].

Arithmetic functions may be needed in the qualification or other parts of retrieval statements. Such functions can be defined in H and invoked in R.

A set so specified may be fetched for query purposes only, or it may be held for possible changes. Insertions take the form of adding new elements to declared relations without regard to any ordering that may be present in their machine representation. Deletions which are effective for the community (as opposed to the individual user or subcommunities) take the form of removing elements from declared relations. Some deletions and updates may be triggered by others, if deletion and update dependencies between specified relations are declared in R.

One important effect that the view adopted toward data has on the language used to retrieve it is in the naming of data elements and sets. Some aspects of this have been discussed in the previous section. With the usual network view, users will often be burdened with coining and using more relation names than are absolutely necessary, since names are associated with paths (or path types) rather than with relations.

Once a user is aware that a certain relation is stored, he will expect to be able to exploit⁵ it using any combination of its arguments as "knowns" and the remaining arguments as "unknowns," because the information (like Everest) is there. This is a system feature (missing from many current information systems) which we shall call (logically) symmetric exploitation of relations. Naturally, symmetry in performance is not to be expected.

To support symmetric exploitation of a single binary relation, two directed paths are needed. For a relation of degree n, the number of paths to be named and controlled is n factorial.

Again, if a relational view is adopted in which every n-ary relation (n > 2) has to be expressed by the user as a nested expression involving only binary relations (see Feldman's LEAP System [10], for example) then 2n - 1 names have to be coined instead of only n + 1 with direct n-ary notation as described in Section 1.2. For example, the 4-ary relation supply of Figure 1, which entails 5 names in n-ary notation, would be represented in the form

P (supplier, Q (part, R (project, quantity)))

in nested binary notation and, thus, employ 7 names.

⁵ Exploiting a relation includes query, update, and delete.

A further disadvantage of this kind of expression is its asymmetry. Although this asymmetry does not prohibit symmetric exploitation, it certainly makes some bases of interrogation very awkward for the user to express (consider, for example, a query for those parts and quantities related to certain given projects via Q and R).

1.6. EXPRESSIBLE, NAMED, AND STORED RELATIONS

Associated with a data bank are two collections of relations: the named set and the expressible set. The named set is the collection of all those relations that the community of users can identify by means of a simple name (or identifier). A relation R acquires membership in the named set when a suitably authorized user declares R; it loses membership when a suitably authorized user cancels the declaration of R.

The expressible set is the total collection of relations that can be designated by expressions in the data language. Such expressions are constructed from simple names of relations in the named set; names of generations, roles and domains; logical connectives; the quantifiers of the predicate calculus;⁶ and certain constant relation symbols such as =, >. The named set is a subset of the expressible set, usually a very small subset.

Since some relations in the named set may be time-independent combinations of others in that set, it is useful to consider associating with the named set a collection of statements that define these time-independent constraints. We shall postpone further discussion of this until we have introduced several operations on relations (see Section 2).

One of the major problems confronting the designer of a data system which is to support a relational model for its users is that of determining the class of stored representations to be supported. Ideally, the variety of permitted data representations should be just adequate to cover the spectrum of performance requirements of the total collection of installations. Too great a variety leads to unnecessary overhead in storage and continual reinterpretation of descriptions for the structures currently in effect.

For any selected class of stored representations the data system must provide a means of translating user requests expressed in the data language of the relational model into corresponding (and efficient) actions on the current stored representation. For a high level data language this presents a challenging design problem. Nevertheless, it is a problem which must be solved: as more users obtain concurrent access to a large data bank, responsibility for providing efficient response and throughput shifts from the individual user to the data system.

⁶ Because each relation in a practical data bank is a finite set at every instant of time, the existential and universal quantifiers can be expressed in terms of a function that counts the number of elements in any finite set.
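The remark in footnote 6 can be illustrated with a short sketch. It assumes a relation is modeled as a finite Python set of tuples (a modeling choice made here, not one stated in the paper); the helper names `count`, `exists`, and `for_all` and the sample data are hypothetical.

```python
def count(relation, predicate):
    """Number of elements of a finite relation that satisfy predicate."""
    return sum(1 for element in relation if predicate(element))


def exists(relation, predicate):
    # Existential quantifier: at least one element satisfies the predicate.
    return count(relation, predicate) > 0


def for_all(relation, predicate):
    # Universal quantifier: the satisfying elements are all of the elements.
    return count(relation, predicate) == len(relation)


# Hypothetical (supplier, part) data, for illustration only.
supplies = {(1, "bolt"), (2, "bolt"), (2, "nut")}

some_nut = exists(supplies, lambda t: t[1] == "nut")
all_bolts = for_all(supplies, lambda t: t[1] == "bolt")
```

Both quantifiers reduce to a comparison on a count, which is possible only because the relation is finite at the instant it is examined.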


2. Redundancy and Consistency

2.1. OPERATIONS ON RELATIONS

Since relations are sets, all of the usual set operations are applicable to them. Nevertheless, the result may not be a relation; for example, the union of a binary relation and a ternary relation is not a relation.

The operations discussed below are specifically for relations. These operations are introduced because of their key role in deriving relations from other relations. Their principal application is in noninferential information systems (systems which do not provide logical inference services), although their applicability is not necessarily destroyed when such services are added.

Most users would not be directly concerned with these operations. Information systems designers and people concerned with data bank control should, however, be thoroughly familiar with them.

2.1.1. Permutation. A binary relation has an array representation with two columns. Interchanging these columns yields the converse relation. More generally, if a permutation is applied to the columns of an n-ary relation, the resulting relation is said to be a permutation of the given relation. There are, for example, 4! = 24 permutations of the relation supply in Figure 1, if we include the identity permutation which leaves the ordering of columns unchanged.

Since the user's relational model consists of a collection of relationships (domain-unordered relations), permutation is not relevant to such a model considered in isolation. It is, however, relevant to the consideration of stored representations of the model. In a system which provides symmetric exploitation of relations, the set of queries answerable by a stored relation is identical to the set answerable by any permutation of that relation. Although it is logically unnecessary to store both a relation and some permutation of it, performance considerations could make it advisable.

2.1.2. Projection. Suppose now we select certain columns of a relation (striking out the others) and then remove from the resulting array any duplication in the rows. The final array represents a relation which is said to be a projection of the given relation.

A selection operator $\pi$ is used to obtain any desired permutation, projection, or combination of the two operations. Thus, if L is a list of k indices⁷ $L = i_1, i_2, \ldots, i_k$ and R is an n-ary relation ($n \ge k$), then $\pi_L(R)$ is the k-ary relation whose jth column is column $i_j$ of R ($j = 1, 2, \ldots, k$) except that duplication in resulting rows is removed. Consider the relation supply of Figure 1. A permuted projection of this relation is exhibited in Figure 4. Note that, in this particular case, the projection has fewer n-tuples than the relation from which it is derived.
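The operator $\pi_L$ can be sketched as follows, assuming relations are modeled as Python sets of tuples and indices are 1-based as in the text. The function name `project` is hypothetical, and the rows of `supply` are an assumption chosen only to be consistent with Figure 4, since Figure 1 lies outside this section.

```python
def project(relation, indices):
    """pi_L(R): column j of the result is column indices[j-1] of R (1-based).

    Building a set removes duplicate rows, as the definition requires.
    """
    return {tuple(row[i - 1] for i in indices) for row in relation}


# Assumed rows (supplier, part, project, quantity); not taken from this section.
supply = {
    (1, 2, 5, 17),
    (1, 3, 5, 23),
    (2, 3, 7, 9),
    (2, 7, 5, 4),
    (4, 1, 1, 12),
}

# Permuted projection onto (project, supplier), as in Figure 4.
pi_31 = project(supply, [3, 1])
```

With these assumed rows, two tuples of `supply` collapse into the single row (5, 1), so the projection has four rows against the five of the relation it is derived from.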

2.1.3. Join. Suppose we are given two binary relations, which have some domain in common. Under what circumstances can we combine these relations to form a ternary relation which preserves all of the information in the given relations?

⁷ When dealing with relationships, we use domain names (role-qualified whenever necessary) instead of domain positions.

The example in Figure 5 shows two relations R, S, which are joinable without loss of information, while Figure 6 shows a join of R with S. A binary relation R is joinable with a binary relation S if there exists a ternary relation U such that $\pi_{12}(U) = R$ and $\pi_{23}(U) = S$. Any such ternary relation is called a join of R with S. If R, S are binary relations such that $\pi_2(R) = \pi_1(S)$, then R is joinable with S. One join that always exists in such a case is the natural join of R with S defined by

$$R * S = \{(a, b, c) : R(a, b) \wedge S(b, c)\}$$

where R(a, b) has the value true if (a, b) is a member of R and similarly for S(b, c). It is immediate that

$$\pi_{12}(R * S) = R$$

and

$$\pi_{23}(R * S) = S.$$

Note that the join shown in Figure 6 is the natural join of R with S from Figure 5. Another join is shown in Figure 7.

$\Pi_{31}(\text{supply})$

project   supplier
   5         1
   5         2
   1         4
   7         2

FIG. 4. A permuted projection of the relation in Figure 1

R                        S
supplier   part          part   project
   1         1             1       1
   2         1             1       2
   2         2             2       1

FIG. 5. Two joinable relations

$R * S$

supplier   part   project
   1         1       1
   1         1       2
   2         1       1
   2         1       2
   2         2       1

FIG. 6. The natural join of R with S (from Figure 5)

U

supplier   part   project
   1         1       2
   2         1       1
   2         2       1

FIG. 7. Another join of R with S (from Figure 5)
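The definitions of join and natural join can be checked against Figures 5, 6, and 7 with a small sketch. Relations are again assumed to be Python sets of tuples; `project`, `natural_join`, and `is_join` are hypothetical helper names, not part of any data sublanguage described here.

```python
def project(relation, indices):
    """pi_L(R) with 1-based column indices; duplicates vanish in the set."""
    return {tuple(row[i - 1] for i in indices) for row in relation}


def natural_join(r, s):
    """R*S = {(a, b, c) : R(a, b) and S(b, c)} for binary R and S."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}


def is_join(u, r, s):
    """U is a join of R with S if pi_12(U) = R and pi_23(U) = S."""
    return project(u, [1, 2]) == r and project(u, [2, 3]) == s


R = {(1, 1), (2, 1), (2, 2)}            # Figure 5: (supplier, part)
S = {(1, 1), (1, 2), (2, 1)}            # Figure 5: (part, project)
U = {(1, 1, 2), (2, 1, 1), (2, 2, 1)}   # Figure 7

figure_6 = natural_join(R, S)
both_are_joins = is_join(figure_6, R, S) and is_join(U, R, S)
```

Every join of R with S is a subset of the natural join, since each of its rows must project into both R and S; Figure 7 is a proper subset of Figure 6.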

Inspection of these relations reveals an element (element 1) of the domain part (the domain on which the join is to be made) with the property that it possesses more than one relative under R and also under S. It is this element which gives rise to the plurality of joins. Such an element in the joining domain is called a point of ambiguity with respect to the joining of R with S.

If either $\pi_{21}(R)$ or S is a function,⁸ no point of ambiguity can occur in joining R with S. In such a case, the natural join of R with S is the only join of R with S. Note that the reiterated qualification "of R with S" is necessary, because S might be joinable with R (as well as R with S), and this join would be an entirely separate consideration. In Figure 5, none of the relations R, $\pi_{21}(R)$, S, $\pi_{21}(S)$ is a function.
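A point of ambiguity can be located mechanically. The sketch below assumes binary relations stored as Python sets of pairs, joined on the second column of R and the first column of S; the names `is_function`, `converse`, and `points_of_ambiguity` are hypothetical, and `S_fn` is made-up data.

```python
from collections import defaultdict


def is_function(relation):
    """True if no first component is paired with two different second components."""
    image = {}
    for a, b in relation:
        if image.setdefault(a, b) != b:
            return False
    return True


def converse(relation):
    """pi_21(R): interchange the two columns."""
    return {(b, a) for (a, b) in relation}


def points_of_ambiguity(r, s):
    """Elements of the joining domain with more than one relative under R and under S."""
    under_r, under_s = defaultdict(set), defaultdict(set)
    for a, b in r:
        under_r[b].add(a)
    for b, c in s:
        under_s[b].add(c)
    return {b for b in under_r if len(under_r[b]) > 1 and len(under_s[b]) > 1}


R = {(1, 1), (2, 1), (2, 2)}   # Figure 5: (supplier, part)
S = {(1, 1), (1, 2), (2, 1)}   # Figure 5: (part, project)
ambiguous = points_of_ambiguity(R, S)

# Hypothetical S in which each part goes to exactly one project.
S_fn = {(1, 1), (2, 1)}
ambiguous_with_function = points_of_ambiguity(R, S_fn)
```

For Figure 5 the only point of ambiguity is part 1; replacing S by a function removes it, in line with the statement above.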

Ambiguity in the joining of R with S can sometimes be resolved by means of other relations. Suppose we are given, or can derive from sources independent of R and S, a relation T on the domains project and supplier with the following properties:

(1) $\pi_1(T) = \pi_2(S)$,
(2) $\pi_2(T) = \pi_1(R)$,
(3) $T(j, s) \rightarrow \exists p\,(R(s, p) \wedge S(p, j))$,
(4) $R(s, p) \rightarrow \exists j\,(S(p, j) \wedge T(j, s))$,
(5) $S(p, j) \rightarrow \exists s\,(T(j, s) \wedge R(s, p))$,

then we may form a three-way join of R, S, T; that is, a ternary relation U such that

$$\pi_{12}(U) = R, \quad \pi_{23}(U) = S, \quad \pi_{31}(U) = T.$$

Such a join will be called a cyclic 3-join to distinguish it from a linear 3-join which would be a quaternary relation V such that

$$\pi_{12}(V) = R, \quad \pi_{23}(V) = S, \quad \pi_{34}(V) = T.$$

While it is possible for more than one cyclic 3-join to exist (see Figures 8, 9, for an example), the circumstances under which this can occur entail much more severe constraints than those for a plurality of 2-joins. To be specific, the relations R, S, T must possess points of ambiguity with respect to joining R with S (say point x), S with T (say y), and T with R (say z), and, furthermore, y must be a relative of x under S, z a relative of y under T, and x a relative of z under R. Note that in Figure 8 the points x = a; y = d; z = 2 have this property.

R              S              T
s   p          p   j          j   s
1   a          a   d          d   1
2   a          a   e          d   2
2   b          b   d          e   2
               b   e          e   2

FIG. 8. Binary relations with a plurality of cyclic 3-joins

U                   U'
s   p   j           s   p   j
1   a   d           1   a   d
2   a   e           2   a   d
2   b   d           2   a   e
2   b   e           2   b   d
                    2   b   e

FIG. 9. Two cyclic 3-joins of the relations in Figure 8

⁸ A function is a binary relation, which is one-one or many-one, but not one-many.

The natural linear 3-join of three binary relations R, S, T is given by

$$R * S * T = \{(a, b, c, d) : R(a, b) \wedge S(b, c) \wedge T(c, d)\}$$

where parentheses are not needed on the left-hand side because the natural 2-join (*) is associative. To obtain the cyclic counterpart, we introduce the operator $\gamma$ which produces a relation of degree n - 1 from a relation of degree n by tying its ends together. Thus, if R is an n-ary relation ($n \ge 2$), the tie of R is defined by the equation

$$\gamma(R) = \{(a_1, a_2, \ldots, a_{n-1}) : R(a_1, a_2, \ldots, a_{n-1}, a_n) \wedge a_1 = a_n\}.$$

We may now represent the natural cyclic 3-join of R, S, T by the expression

$$\gamma(R * S * T).$$
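The tie operator and the natural cyclic 3-join can be sketched on the relations of Figure 8. As before, relations are assumed to be Python sets of tuples, and the names `natural_linear_3join` and `tie` are hypothetical.

```python
def natural_linear_3join(r, s, t):
    """R*S*T = {(a, b, c, d) : R(a, b) and S(b, c) and T(c, d)}."""
    return {
        (a, b, c, d)
        for (a, b) in r
        for (b2, c) in s if b2 == b
        for (c2, d) in t if c2 == c
    }


def tie(relation):
    """gamma(R): keep rows whose first and last components agree, drop the last."""
    return {row[:-1] for row in relation if row[0] == row[-1]}


# Figure 8. The figure prints the row (e, 2) of T twice; a set holds it once.
R = {(1, "a"), (2, "a"), (2, "b")}                      # (s, p)
S = {("a", "d"), ("a", "e"), ("b", "d"), ("b", "e")}    # (p, j)
T = {("d", 1), ("d", 2), ("e", 2)}                      # (j, s)

natural_cyclic = tie(natural_linear_3join(R, S, T))
```

On this data the result coincides with the relation U' of Figure 9; the relation U of that figure is a proper subset of it.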

Extension of the notions of linear and cyclic 3-join and their natural counterparts to the joining of n binary relations (where n > 3) is obvious. A few words may be appropriate, however, regarding the joining of relations which are not necessarily binary. Consider the case of two relations R (degree r), S (degree s) which are to be joined on p of their domains (p < r, p < s). For simplicity, suppose these p domains are the last p of the r domains of R, and the first p of the s domains of S. If this were not so, we could always apply appropriate permutations to make it so. Now, take the Cartesian product of the first r-p domains of R, and call this new domain A. Take the Cartesian product of the last p domains of R, and call this B. Take the Cartesian product of the last s-p domains of S and call this C.

We can treat R as if it were a binary relation on the domains A, B. Similarly, we can treat S as if it were a binary relation on the domains B, C. The notions of linear and cyclic 3-join are now directly applicable. A similar approach can be taken with the linear and cyclic n-joins of n relations of assorted degrees.

2.1.4. Composition. The reader is probably familiar with the notion of composition applied to functions. We shall discuss a generalization of that concept and apply it first to binary relations. Our definitions of composition and composability are based very directly on the definitions of join and joinability given above.

Suppose we are given two relations R, S. T is a composition of R with S if there exists a join U of R with S such that $T = \pi_{13}(U)$. Thus, two relations are composable if and only if they are joinable. However, the existence of more than one join of R with S does not imply the existence of more than one composition of R with S.

Corresponding to the natural join of R with S is the natural composition⁹ of R with S defined by

$$R \cdot S = \pi_{13}(R * S).$$

Taking the relations R, S from Figure 5, their natural composition is exhibited in Figure 10 and another composition is exhibited in Figure 11 (derived from the join exhibited in Figure 7).

$R \cdot S$

project   supplier
   1         1
   1         2
   2         1
   2         2

FIG. 10. The natural composition of R with S (from Figure 5)

T

project   supplier
   1         2
   2         1

FIG. 11. Another composition of R with S (from Figure 5)

When two or more joins exist, the number of distinct compositions may be as few as one or as many as the number of distinct joins. Figure 12 shows an example of two relations which have several joins but only one composition. Note that the ambiguity of point c is lost in composing R with S, because of unambiguous associations made via the points a, b, d, e.

R                        S
supplier   part          part   project
   1         a             a       g
   1         b             b       f
   1         c             c       f
   2         c             c       g
   2         d             d       g
   2         e             e       f

FIG. 12. Many joins, only one composition
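The claim made for Figure 12 can be verified by brute force. The sketch below assumes relations are Python sets of tuples and enumerates candidate joins as subsets of the natural join, which is sufficient because every row of a join must project into both R and S. The helper names are hypothetical, and the exhaustive search is practical only for very small relations.

```python
from itertools import combinations


def project(relation, indices):
    """pi_L(R) with 1-based column indices."""
    return {tuple(row[i - 1] for i in indices) for row in relation}


def natural_join(r, s):
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}


def all_joins(r, s):
    """Every ternary U with pi_12(U) = R and pi_23(U) = S."""
    candidates = sorted(natural_join(r, s))
    joins = []
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            u = set(subset)
            if project(u, [1, 2]) == r and project(u, [2, 3]) == s:
                joins.append(u)
    return joins


def all_compositions(r, s):
    """Distinct relations pi_13(U) over all joins U of R with S."""
    return {frozenset(project(u, [1, 3])) for u in all_joins(r, s)}


# Figure 12.
R = {(1, "a"), (1, "b"), (1, "c"), (2, "c"), (2, "d"), (2, "e")}
S = {("a", "g"), ("b", "f"), ("c", "f"), ("c", "g"), ("d", "g"), ("e", "f")}

joins = all_joins(R, S)
compositions = all_compositions(R, S)
```

The rows through a, b, d, e are forced into every join and already pair each supplier with each project, so the choice made at the ambiguous point c never changes the composition.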

Extension of composition to pairs of relations which are not necessarily binary (and which may be of different degrees) follows the same pattern as extension of pairwise joining to such relations.

A lack of understanding of relational composition has led several systems designers into what may be called the connection trap. This trap may be described in terms of the following example. Suppose each supplier description is linked by pointers to the descriptions of each part supplied by that supplier, and each part description is similarly linked to the descriptions of each project which uses that part. A conclusion is now drawn which is, in general, erroneous: namely that, if all possible paths are followed from a given supplier via the parts he supplies to the projects using those parts, one will obtain a valid set of all projects supplied by that supplier. Such a conclusion is correct only in the very special case that the target relation between projects and suppliers is, in fact, the natural composition of the other two relations, and we must normally add the phrase "for all time," because this is usually implied in claims concerning path-following techniques.

⁹ Other writers tend to ignore compositions other than the natural one, and accordingly refer to this particular composition as the composition; see, for example, Kelley's "General Topology."
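The connection trap can be made concrete with the relations of Figure 5. In the sketch below, path following is modeled as two set lookups, and the "actual" supplier-to-project facts are assumed, purely for illustration, to be those recorded in the join U of Figure 7. The function name `projects_by_path_following` is hypothetical.

```python
R = {(1, 1), (2, 1), (2, 2)}   # Figure 5: (supplier, part)
S = {(1, 1), (1, 2), (2, 1)}   # Figure 5: (part, project)


def projects_by_path_following(supplier, r, s):
    """Follow supplier -> parts -> projects, as a pointer-chasing system would."""
    parts = {p for (sup, p) in r if sup == supplier}
    return {j for (p, j) in s if p in parts}


# Assumption for this example: the true three-way facts are Figure 7's join.
U = {(1, 1, 2), (2, 1, 1), (2, 2, 1)}

reached = projects_by_path_following(1, R, S)
actual = {j for (sup, _part, j) in U if sup == 1}
spurious = reached - actual
```

Path following yields the natural composition, so supplier 1 appears to supply project 1 as well as project 2, although under the assumed facts it supplies only project 2.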

2.1.5. Restriction. A subset of a relation is a relation. One way in which a relation S may act on a relation R to generate a subset of R is through the operation restriction of R by S. This operation is a generalization of the restriction of a function to a subset of its domain, and is defined as follows.

Let L, M be equal-length lists of indices such that $L = i_1, i_2, \ldots, i_k$, $M = j_1, j_2, \ldots, j_k$ where $k \le$ degree of R and $k \le$ degree of S. Then the L, M restriction of R by S denoted $R_L|_M S$ is the maximal subset R' of R such that

$$\pi_L(R') = \pi_M(S).$$

The operation is defined only if equality is applicable between elements of $\pi_{i_h}(R)$ on the one hand and $\pi_{j_h}(S)$ on the other for all $h = 1, 2, \ldots, k$.

The three relations R, S, R' of Figure 13 satisfy the equation $R' = R_{(2,3)}|_{(1,2)} S$.

R                 S             R'
s   p   j         p   j         s   p   j
1   a   A         a   A         1   a   A
2   a   A         c   B         2   a   A
2   a   B         b   B         2   b   B
2   b   A
2   b   B

FIG. 13. Example of restriction
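Restriction can be sketched on the data of Figure 13. The sketch reads the definition as keeping those rows of R whose L-columns occur among the M-columns of S, which is the reading that reproduces the figure; relations are assumed to be Python sets of tuples, and `project` and `restrict` are hypothetical names.

```python
def project(relation, indices):
    """pi_L(R) with 1-based column indices."""
    return {tuple(row[i - 1] for i in indices) for row in relation}


def restrict(r, l, s, m):
    """L, M restriction of R by S: rows of R whose L-columns appear in pi_M(S)."""
    allowed = project(s, m)
    return {row for row in r if tuple(row[i - 1] for i in l) in allowed}


# Figure 13.
R = {(1, "a", "A"), (2, "a", "A"), (2, "a", "B"), (2, "b", "A"), (2, "b", "B")}
S = {("a", "A"), ("c", "B"), ("b", "B")}

R_prime = restrict(R, [2, 3], S, [1, 2])
```

The row (c, B) of S matches nothing in R and therefore contributes no row to R'.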

We are now in a position to consider various applications

of these operations on relations.

2.2. REDUNDANCY

Redundancy in the named set of relations must be distinguished from redundancy in the stored set of representations. We are primarily concerned here with the former. To begin with, we need a precise notion of derivability for relations.

Suppose $\theta$ is a collection of operations on relations and each operation has the property that from its operands it yields a unique relation (thus natural join is eligible, but join is not). A relation R is $\theta$-derivable from a set S of relations if there exists a sequence of operations from the collection $\theta$ which, for all time, yields R from members of S. The phrase "for all time" is present, because we are dealing with time-varying relations, and our interest is in derivability which holds over a significant period of time. For the named set of relationships in noninferential systems, it appears that an adequate collection $\theta_1$ contains the following operations: projection, natural join, tie, and restriction. Permutation is irrelevant and natural composition need not be included, because it is obtainable by taking a natural join and then a projection. For the stored set of representations, an adequate collection $\theta_2$ of operations would include permutation and additional operations concerned with subsetting and merging relations, and ordering and connecting their elements.

2.2.1. Strong Redundancy. A set of relations is strongly redundant if it contains at least one relation that possesses a projection which is derivable from other projections of relations in the set. The following two examples are intended to explain why strong redundancy is defined this way, and to demonstrate its practical use. In the first example the collection of relations consists of just the following relation:

employee (serial#, name, manager#, managername)

with serial# as the primary key and manager# as a foreign key. Let us denote the active domain by $\Delta_t$, and suppose that

$$\Delta_t(\text{manager\#}) \subset \Delta_t(\text{serial\#})$$

and

$$\Delta_t(\text{managername}) \subset \Delta_t(\text{name})$$

for all time t. In this case the redundancy is obvious: the domain managername is unnecessary. To see that it is a strong redundancy as defined above, we observe that

$$\pi_{34}(\text{employee}) = \pi_{12}(\text{employee})\,{}_1|_1\,\pi_3(\text{employee}).$$
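The equation for the employee relation can be checked on a small instance. The rows below are invented for illustration, and `project` and `restrict` are hypothetical helpers in which restriction keeps the rows of its first operand whose selected columns occur in the selected columns of the second.

```python
def project(relation, indices):
    """pi_L(R) with 1-based column indices."""
    return {tuple(row[i - 1] for i in indices) for row in relation}


def restrict(r, l, s, m):
    """L, M restriction of R by S: rows of R whose L-columns appear in pi_M(S)."""
    allowed = project(s, m)
    return {row for row in r if tuple(row[i - 1] for i in l) in allowed}


# Hypothetical rows: (serial#, name, manager#, managername).
employee = {
    (1, "Ann", 3, "Cy"),
    (2, "Bob", 3, "Cy"),
    (3, "Cy", 4, "Di"),
    (4, "Di", 4, "Di"),
}

lhs = project(employee, [3, 4])
rhs = restrict(project(employee, [1, 2]), [1], project(employee, [3]), [1])
```

Both sides list the managers with their names, so the (manager#, managername) columns carry nothing that (serial#, name) does not already provide, as long as every manager# also occurs as a serial#.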

In the second example the collection of relations includes a relation S describing suppliers with primary key s#, a relation D describing departments with primary key d#, a relation J describing projects with primary key j#, and the following relations:

P (s#, d#, ...), Q (s#, j#, ...), R (d#, j#, ...),

where in each case ... denotes domains other than s#, d#, j#. Let us suppose the following condition C is known to hold independent of time: supplier s supplies department d (relation P) if and only if supplier s supplies some project j (relation Q) to which d is assigned (relation R). Then, we can write the equation

$$\pi_{12}(P) = \pi_{12}(Q) \cdot \pi_{21}(R)$$

and thereby exhibit a strong redundancy.

An important reason for the existence of strong redundancies in the named set of relationships is user convenience. A particular case of this is the retention of semi-obsolete relationships in the named set so that old programs that refer to them by name can continue to run correctly. Knowledge of the existence of strong redundancies in the named set gives a system or data base administrator greater freedom in the selection of stored representations to cope more efficiently with current traffic. If the strong redundancies in the named set are directly reflected in strong redundancies in the stored set (or if other strong redundancies are introduced into the stored set), then, generally speaking, extra storage space and update time are consumed with a potential drop in query time for some queries and in load on the central processing units.

2.2.2. Weak Redundancy. A second type of redundancy may exist. In contrast to strong redundancy it is not characterized by an equation. A collection of relations is weakly redundant if it contains a relation that has a projection which is not derivable from other members but is at all times a projection of some join of other projections of relations in the collection.

We can exhibit a weak redundancy by taking the second example (cited above) for a strong redundancy, and assuming now that condition C does not hold at all times. The relations $\pi_{12}(P)$, $\pi_{12}(Q)$, $\pi_{12}(R)$ are complex¹⁰ relations with the possibility of points of ambiguity occurring from time to time in the potential joining of any two. Under these circumstances, none of them is derivable from the other two. However, constraints do exist between them, since each is a projection of some cyclic join of the three of them. One of the weak redundancies can be characterized by the statement: for all time, $\pi_{12}(P)$ is some composition of $\pi_{12}(Q)$ with $\pi_{21}(R)$. The composition in question might be the natural one at some instant and a nonnatural one at another instant.

Generally speaking, weak redundancies are inherent in the logical needs of the community of users. They are not removable by the system or data base administrator. If they appear at all, they appear in both the named set and the stored set of representations.

2.3. CONSISTENCY

Whenever the named set of relations is redundant in either sense, we shall associate with that set a collection of statements which define all of the redundancies which hold independent of time between the member relations. If the information system lacks (and it most probably will) detailed semantic information about each named relation, it cannot deduce the redundancies applicable to the named set. It might, over a period of time, make attempts to induce the redundancies, but such attempts would be fallible.

Given a collection C of time-varying relations, an associated set Z of constraint statements and an instantaneous value V for C, we shall call the state (C, Z, V) consistent or inconsistent according as V does or does not satisfy Z. For example, given stored relations R, S, T together with the constraint statement "$\pi_{12}(T)$ is a composition of $\pi_{12}(R)$ with $\pi_{12}(S)$", we may check from time to time that the values stored for R, S, T satisfy this constraint. An algorithm for making this check would examine the first two columns of each of R, S, T (in whatever way they are represented in the system) and determine whether

(1) $\pi_1(T) = \pi_1(R)$,
(2) $\pi_2(T) = \pi_2(S)$,
(3) for every element pair (a, c) in the relation $\pi_{12}(T)$ there is an element b such that (a, b) is in $\pi_{12}(R)$ and (b, c) is in $\pi_{12}(S)$.
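The three checks listed above can be transcribed directly. The sketch assumes the stored relations have already been read into Python sets of tuples (how they are represented in storage is outside its scope); `satisfies_constraint` and the sample value `T_bad` are hypothetical, while R, S are taken from Figure 5 and T from Figure 11.

```python
def project(relation, indices):
    """pi_L(R) with 1-based column indices."""
    return {tuple(row[i - 1] for i in indices) for row in relation}


def satisfies_constraint(r, s, t):
    """Apply checks (1), (2), (3) to the first two columns of R, S, T."""
    r12, s12, t12 = project(r, [1, 2]), project(s, [1, 2]), project(t, [1, 2])
    if project(t12, [1]) != project(r12, [1]):        # check (1)
        return False
    if project(t12, [2]) != project(s12, [2]):        # check (2)
        return False
    # Check (3): each (a, c) in T needs some b with (a, b) in R and (b, c) in S.
    return all(
        any((b, c) in s12 for (a2, b) in r12 if a2 == a)
        for (a, c) in t12
    )


R = {(1, 1), (2, 1), (2, 2)}
S = {(1, 1), (1, 2), (2, 1)}
T = {(1, 2), (2, 1)}

consistent = satisfies_constraint(R, S, T)

# Hypothetical value from which the pair (2, 1) is missing.
T_bad = {(1, 2)}
inconsistent = not satisfies_constraint(R, S, T_bad)
```

The function inspects only the instantaneous values handed to it, which matches the definition of consistency as a property of a state rather than of the history that produced it.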

There are practical problems (which we shall not discuss here) in taking an instantaneous snapshot of a collection of relations, some of which may be very large and highly variable.

It is important to note that consistency as defined above is a property of the instantaneous state of a data bank, and is independent of how that state came about. Thus, in particular, there is no distinction made on the basis of whether a user generated an inconsistency due to an act of omission or an act of commission. Examination of a simple example will show the reasonableness of this (possibly unconventional) approach to consistency.

¹⁰ A binary relation is complex if neither it nor its converse is a function.

Suppose the named set C includes the relations S, J, D, P, Q, R of the example in Section 2.2 and that P, Q, R possess either the strong or weak redundancies described therein (in the particular case now under consideration, it does not matter which kind of redundancy occurs). Further, suppose that at some time t the data bank state is consistent and contains no project j such that supplier 2 supplies project j and j is assigned to department 5. Accordingly, there is no element (2, 5) in $\pi_{12}(P)$. Now, a user introduces the element (2, 5) into $\pi_{12}(P)$ by inserting some appropriate element into P. The data bank state is now inconsistent. The inconsistency could have arisen from an act of omission, if the input (2, 5) is correct, and there does exist a project j such that supplier 2 supplies j and j is assigned to department 5. In this case, it is very likely that the user intends in the near future to insert elements into Q and R which will have the effect of introducing (2, j) into $\pi_{12}(Q)$ and (5, j) into $\pi_{12}(R)$. On the other hand, the input (2, 5) might have been faulty. It could be the case that the user intended to insert some other element into P, an element whose insertion would transform a consistent state into a consistent state. The point is that the system will normally have no way of resolving this question without interrogating its environment (perhaps the user who created the inconsistency).

There are, of course, several possible ways in which a system can detect inconsistencies and respond to them. In one approach the system checks for possible inconsistency whenever an insertion, deletion, or key update occurs. Naturally, such checking will slow these operations down. If an inconsistency has been generated, details are logged internally, and if it is not remedied within some reasonable time interval, either the user or someone responsible for the security and integrity of the data is notified. Another approach is to conduct consistency checking as a batch operation once a day or less frequently. Inputs causing the inconsistencies which remain in the data bank state at checking time can be tracked down if the system maintains a journal of all state-changing transactions. This latter approach would certainly be superior if few nontransitory inconsistencies occurred.

2.4. SUMMARY

In Section 1 a relational model of data is proposed as a basis for protecting users of formatted data systems from the potentially disruptive changes in data representation caused by growth in the data bank and changes in traffic. A normal form for the time-varying collection of relationships is introduced.

In Section 2 operations on relations and two types of redundancy are defined and applied to the problem of maintaining the data in a consistent state. This is bound to become a serious practical problem as more and more different types of data are integrated together into common data banks.

Many questions are raised and left unanswered. For example, only a few of the more important properties of the data sublanguage in Section 1.4 are mentioned. Neither the purely linguistic details of such a language nor the implementation problems are discussed. Nevertheless, the material presented should be adequate for experienced systems programmers to visualize several approaches. It is also hoped that this paper can contribute to greater precision in work on formatted data systems.

Acknowledgment. It was C. T. Davies of IBM Poughkeepsie who convinced the author of the need for data independence in future information systems. The author wishes to thank him and also F. P. Palermo, C. P. Wang, E. B. Altman, and M. E. Senko of the IBM San Jose Research Laboratory for helpful discussions.

RECEIVED SEPTEMBER, 1969; REVISED FEBRUARY, 1970

REFERENCES

1. CHILDS, D. L. Feasibility of a set-theoretical data structure: a general structure based on a reconstituted definition of relation. Proc. IFIP Cong., 1968, North Holland Pub. Co., Amsterdam, pp. 162-172.

2. LEVIEN, R. E., AND MARON, M. E. A computer system for inference execution and data retrieval. Comm. ACM 10, 11 (Nov. 1967), 715-721.

3. BACHMAN, C. W. Software for random access processing. Datamation (Apr. 1965), 36-41.

4. McGEE, W. C. Generalized file processing. In Annual Review in Automatic Programming 5, 13, Pergamon Press, New York, 1969, pp. 77-149.

5. Information Management System/360, Application Description Manual H20-0524-1. IBM Corp., White Plains, N. Y., July 1968.

6. GIS (Generalized Information System), Application Description Manual H20-0574. IBM Corp., White Plains, N. Y., 1965.

7. BLEIER, R. E. Treating hierarchical data structures in the SDC time-shared data management system (TDMS). Proc. ACM 22nd Nat. Conf., 1967, MDI Publications, Wayne, Pa., pp. 41-49.

8. IDS Reference Manual GE 625/635, GE Inform. Sys. Div., Phoenix, Ariz., CPB 1093B, Feb. 1968.

9. CHURCH, A. An Introduction to Mathematical Logic I. Princeton U. Press, Princeton, N.J., 1956.

10. FELDMAN, J. A., AND ROVNER, P. D. An Algol-based associative language. Stanford Artificial Intelligence Rep. AI-66, Aug. 1, 1968.
