
The Nested Relational Data Model is Not a Good Idea

Robert M. Colomb
School of Information Technology and Electrical Engineering
The University of Queensland
Queensland 4072 Australia
colomb@itee.uq.edu.au
Work performed partly while visiting LADSEB-CNR; Corso Stati Uniti, 4; Padova, Italy

Abstract
The nested relational data model is a natural generalisation of the relational
data model, but it often leads to designs which hide the data structures
needed to specify queries and updates in the information system. The
relational data model on the other hand exposes the specifications of the
data structures and permits the minimal specification of queries and updates
using SQL. The deficiencies in relational systems leading to a demand for
object-oriented nested relational solutions are seen to be deficiencies in the
implementations of relational database systems, not in the data model itself.

Introduction
The nested relational data model is a natural generalisation of the relational data model,
but it often leads to designs which hide the data structures needed to specify queries and
updates in the information system. The relational data model on the other hand exposes
the specifications of the data structures and permits the minimal specification of queries
and updates using SQL. However, there are deficiencies in relational systems, which lead
to a demand for object-oriented nested relational solutions. This paper argues that these
deficiencies are not inherent in the relational data model, but are deficiencies in the
implementations of relational database systems.
The paper first sketches how the nested-relational model is a natural extension of the
object-relational data model, then shows how the nested relational model, while sound, is
expensive to use. It then examines the object-oriented paradigm for software engineering,
and shows that it gives very little benefit in database applications. Rather, the relational
model as represented in conceptual modelling languages is argued to provide an ideal
view of the data. The ultimate thesis is that a better strategy is to employ a main-memory
relational database optimised for queries on complex objects, with a query interface
based on a conceptual model query language.

Object-relational data model leads to nested relations


The object-relational data model (Stonebraker, Brown and Moore 1999) arises out of the
realisation that the relational data model abstracts away from the value sets of attribute
functions. If we think in terms of tuple identifiers in relations (keys), then a relation is
simply a collection of attribute functions mapping the key into value sets.
The pure relational data model is based on set theory, and operates in terms of
projections, cartesian products and selection predicates. Cartesian product simply creates
new sets from existing sets, while projection requires the notion of identity, since the
projection operation can produce duplicates, which must be identified. Selection requires
the concept of a predicate, but the relational model abstracts away from the content of the
predicate, requiring only a function from a tuple of value sets into {true, false}. The
relational system requires only the ability to combine predicates using the propositional
calculus.
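The three operations can be illustrated with a minimal sketch that treats a relation as a Python set of tuples. The function names and the sample relations r and s are assumptions made for illustration; attributes are addressed by position rather than by name to keep the sketch short.

```python
from itertools import product as cartesian

def select(relation, predicate):
    """Keep the tuples for which the predicate is true."""
    return {t for t in relation if predicate(t)}

def project(relation, *positions):
    """Keep only the given attribute positions; the set collapses duplicates."""
    return {tuple(t[i] for i in positions) for t in relation}

def product(r, s):
    """Cartesian product: concatenate every tuple of r with every tuple of s."""
    return {a + b for a, b in cartesian(r, s)}

# Illustrative relations, not taken from the paper.
r = {(1, "a"), (2, "a"), (3, "b")}
s = {("x",), ("y",)}

# Projection on the second attribute yields only two tuples, because the two
# occurrences of "a" are identified as the same value.
second = project(r, 1)

# The predicate is an ordinary boolean function of the tuple, combined here
# with a propositional connective.
chosen = select(r, lambda t: t[0] > 1 and t[1] == "a")

pairs = product(r, s)
```

Note that the only property of the value sets used by project is equality of values, which is what the set type relies on to discard duplicates.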
Particular value sets have properties which are used in predicates and in other operations.
The only operator used in the pure relational model is identity. The presence of this
operator is guaranteed by the requirement that the value sets be sets, although in practice
some value sets do not, for practical purposes, support identity (e.g. real numbers represented
as floating point).
This realisation that the relational data model abstracts away from the types of value sets
and from the operators which are available to types has allowed the design of database
systems where the value sets can be of any type. Besides integers, strings, reals, and
booleans, object-relational databases can support text, images, video, animation,
programs and many other types. Each type supports a set of operations and predicates
which can be integrated with the relational operations into practical solutions (each type
is an abstract data type).
If a value set can be of any type, why not a set of elements of some type? Why not a
tuple? If we allow sets and tuples, then why not sets of tuples? Sets of tuples are relations
and the corresponding abstract data type is the relational algebra. Thus the object-
relational data model leads to the possibility of relation-valued attributes in relations.
Having relation-valued attributes in relations looks as if it might violate first normal
form. However, the outer relational operations can only result in tuples whose attribute
values are either copies of attribute values from the original relations or are functions of
those values, in the same way as if the value sets were integers, the results are either the
integers present in the original tables or functions like square root of those integers. In
other words, the outer relational system can only see inside a relation-valued attribute to
the extent that a function is supplied to do so. These functions are particular to the
schema of the relation-valued attribute, and have no knowledge of the outer schema.
Since the outer relational model and the abstract data type of a relation-valued attribute
are the same abstract data type, it makes sense to introduce a relationship between the two.
The standard relationships are unnest and nest. Unnest is an operator which modifies the
scheme of the outer data model, replacing the relation-valued attribute function by a
collection of attribute functions corresponding to the scheme of the inner relation. Nest is
the reverse operation, which modifies the outer scheme by packaging a collection of
attributes into a single relation-valued attribute.
Having relation-valued attributes together with nest and unnest operations between the
outer and inner relational systems is called the nested relational data model. We see that
the nested relational data model is a natural extension of the object-relational data model.
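A minimal sketch of the two operators follows, assuming a relation is represented as a list of Python dictionaries and a relation-valued attribute as a list of inner dictionaries. The function names, the representation, and the sample rows are illustrative assumptions; lists stand in for sets because dictionaries are not hashable.

```python
from typing import Any

Row = dict[str, Any]

def unnest(rows: list[Row], attr: str) -> list[Row]:
    """Replace the relation-valued attribute `attr` by the attributes of its inner tuples."""
    flat = []
    for row in rows:
        outer = {k: v for k, v in row.items() if k != attr}
        for inner in row[attr]:
            flat.append({**outer, **inner})
    return flat

def nest(rows: list[Row], inner_attrs: list[str], attr: str) -> list[Row]:
    """Package the attributes `inner_attrs` into one relation-valued attribute `attr`."""
    groups: dict[tuple, list[Row]] = {}
    for row in rows:
        # Rows agreeing on all remaining (outer) attributes fall into one group.
        outer_key = tuple(sorted((k, v) for k, v in row.items() if k not in inner_attrs))
        groups.setdefault(outer_key, []).append({k: row[k] for k in inner_attrs})
    return [{**dict(key), attr: inner} for key, inner in groups.items()]

# Illustrative nested relation with one relation-valued attribute, `part`.
nested = [
    {"ID": 1, "part": [{"code": "p1", "qty": 2}, {"code": "p2", "qty": 5}]},
    {"ID": 2, "part": [{"code": "p3", "qty": 1}]},
]

flat = unnest(nested, "part")
renested = nest(flat, ["code", "qty"], "part")
```

In this sketch nest reverses unnest only when every inner relation is non-empty: an outer tuple whose inner relation is empty contributes no rows to the unnested result and so cannot be recovered.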

Use of the nested relational data model for object-oriented development
In recent years the object-oriented model has become the dominant programming model
and is becoming more common in systems design, including information systems. The
data in an object-oriented system consists typically of complex data structures built from
tuple and collector types. The tuple type is the same as the tuple type in the object-
relational model. A collector type is either a set, list or multiset. The latter two can be
seen as sets with an additional attribute: a list is a set with a sequence attribute, while a
multiset is a set with an additional identifying attribute. So a nested-relational data model
can represent data from an object-oriented design.
Accordingly, object-relational databases with object-relational nested SQL can be used to
implement object-oriented databases. How this is done is described for example by
Stonebraker, Brown and Moore (1999) (henceforth SBM).
We should note that both the relational and object-oriented data models are
implementations of more abstract conceptual data models expressed in conceptual data
modelling languages such as the Entity-Relationship-Attribute (ERA) method. Well-
established information systems design methods begin the analysis of data with a
conceptual model, moving to a particular database implementation at a later stage.
An example adapted from SBM will clarify some issues. Consider the data model in
Figure 1. Since the relationship between department and vehicle is one-to-many,
associated with each department is a set of vehicles.

Figure 1 A conceptual model


A nested-relational implementation of this conceptual data model is
Dept(ID: int, other: various,
     car: set of (vehID: string, make: string, year: int))    (1)
and a typical population might be
Figure 2 Sample NR population
Dept
ID   Car
     VehID    Make     Year
1    006BKL   Laser    1991
     099CDR   Holden   1999
2    656TTR   Falcon   2000
     881SQL   Honda    1998
Note that the object-relational table is Dept, with two attributes, ID and Car. Car is a
relation-valued attribute with scheme (VehID, Make, Year).

OR SQL on nested relations
SQL has been extended by SBM among others to handle object-relational databases,
mainly by permitting in SQL statements the predicates and operators particular to the
abstract data types supporting the value sets. In particular, nested relational systems are
supported by extending and overloading the dot notation for disambiguating attribute
names. For example, in
Select ID from dept where car.year = 1999 (2)
the dot in car.year identifies the year attribute of the car tuple, and also designates the
membership of a tuple where year = 1999 in the set of tuples which is the value set of dept.car. The
result of this query on the table of Figure 2 is ID = 1.
As a consequence of this overloading, the boolean operator and in the WHERE clause
becomes, if not ambiguous, at least counterintuitive to someone used to standard SQL.
The query
Select ID from dept where car.make = Laser (3)
has the same result, ID = 1. Since the outermost interpretation of dot is set membership,
in the query
Select ID from dept where car.year = 1999 and car.make = Laser (4)
the and operator is interpreted as set intersection, and the result is also ID = 1.
This result, although correct, is probably not what the maker of the query intended. They
would more likely have been looking for a department which has a 1999 Laser, and the
response they would be looking for would be none.
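The difference between the two readings of query (4) can be sketched as follows, using the population of Figure 2 held as Python dictionaries. The helper ids_where_any is a hypothetical name introduced for illustration; it models the set-membership reading of the dot described above, not any particular OR SQL implementation.

```python
# Population of Figure 2: each department carries a set of car tuples.
dept = [
    {"ID": 1, "car": [{"vehID": "006BKL", "make": "Laser", "year": 1991},
                      {"vehID": "099CDR", "make": "Holden", "year": 1999}]},
    {"ID": 2, "car": [{"vehID": "656TTR", "make": "Falcon", "year": 2000},
                      {"vehID": "881SQL", "make": "Honda", "year": 1998}]},
]

def ids_where_any(rows, predicate):
    """IDs of the departments having at least one car that satisfies the predicate."""
    return {row["ID"] for row in rows if any(predicate(c) for c in row["car"])}

# Reading of query (4): each condition is its own membership test on dept.car,
# and the boolean `and` intersects the two sets of departments.
q4 = (ids_where_any(dept, lambda c: c["year"] == 1999)
      & ids_where_any(dept, lambda c: c["make"] == "Laser"))

# Reading the user probably intended: both conditions hold for the same car tuple.
intended = ids_where_any(dept, lambda c: c["year"] == 1999 and c["make"] == "Laser")

print(q4)        # {1}
print(intended)  # set()
```

Department 1 qualifies under the first reading because its 1999 Holden satisfies one condition and its 1991 Laser the other, although no single car satisfies both.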
There are two ways to fix this problem. One is to import a new and operator from the
relational ADT, so that (4) becomes
Select ID from dept where car.year = 1999 and2 car.make = Laser (5)
In this solution, both arguments of and2 must be the same relation-valued attribute of the
outer system.
The other solution is to unnest the table so that the standard relational operator works in
the way it does in standard SQL
Select ID from dept, dept.car where (6)
car.year = 1999 and car.make = Laser
where the addition of dept.car to the FROM clause signifies unnesting.
The former method is problematic since nesting can occur to any level, and the second is
problematic since it requires the user to introduce navigation information into the query.
The same sort of problem occurs when we try to correlate the SELECT clause with the
WHERE clause
Select ID, car.year from dept where car.make = Laser (7)
returns the table
ID = 1, Year = {1991, 1999} (8)
when applied to the table of Figure 2, as a consequence of first normal form. We need
again to use unnest to convert the nested structure to a flat relational structure in order to
make the query mean what we want to say.
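The effect of query (7) with and without unnesting can be sketched in Python over the population of Figure 2. The variable names are illustrative, and the comprehensions are an assumed model of the semantics described in the text rather than the output of a real OR SQL engine.

```python
# Population of Figure 2.
dept = [
    {"ID": 1, "car": [{"vehID": "006BKL", "make": "Laser", "year": 1991},
                      {"vehID": "099CDR", "make": "Holden", "year": 1999}]},
    {"ID": 2, "car": [{"vehID": "656TTR", "make": "Falcon", "year": 2000},
                      {"vehID": "881SQL", "make": "Honda", "year": 1998}]},
]

# Query (7) as written: WHERE selects whole departments, and SELECT then
# returns the year of every car in each selected department.
nested_result = [
    (row["ID"], {c["year"] for c in row["car"]})
    for row in dept
    if any(c["make"] == "Laser" for c in row["car"])
]

# After unnesting dept.car there is one row per (department, car) pair, so
# SELECT and WHERE refer to the same car tuple.
unnested_result = [
    (row["ID"], c["year"])
    for row in dept
    for c in row["car"]
    if c["make"] == "Laser"
]

print(nested_result)    # [(1, {1991, 1999})]
print(unnested_result)  # [(1, 1991)]
```

Only the unnested form answers the question "in which year was each Laser made", since the nested form also reports the year of the Holden.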
Although OR SQL is a sound and complete query language, the simple-looking queries
tend to be not very useful, and in order to make useful queries additional syntax and a
good understanding of the possibly complex and possibly multiple nesting structure is
essential. The author’s experience is that it is very hard to teach, even to very advanced
students.

Representation of many to many relationships


If we are going to use the nested relational model to represent complex data structures,
then we must take account of many to many relationships, as in Figure 3.
[Diagram: Student N:M Course, Course N:M Lecturer]

Figure 3 A many to many relationship


There are several different ways to implement this application in the nested relational
model, taking each of the entities as the outermost relation. If implemented as a single
table, two of the entities would be stored redundantly because of the many-to-many
relationships. So the normalised way is to store the relationships as sets of reference types
(attributes whose value sets are object identifiers).
If the query follows the nesting structure used in the implementation, then we have only
the problems of correlation of various clauses in the SQL query described in the last
section.
However, if the query does not follow the nesting structure, it can get very complex. For
example, if the table has a set of courses associated with each student and a set of
lecturers associated with each course, then in order to find the students associated with a
given lecturer, the whole structure needs to be unnested, and done so across reference
types. The query is hard to specify, and would be very complex to implement.
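For contrast, the same query against a standard relational representation of Figure 3 is a plain chain of joins, and the reverse query has the same shape. The sketch below uses Python's sqlite3 module; the table names, column names, and sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student  (sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course   (cid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE lecturer (lid INTEGER PRIMARY KEY, name TEXT);
-- One table per many-to-many relationship.
CREATE TABLE enrolled (sid INTEGER REFERENCES student,
                       cid INTEGER REFERENCES course,
                       PRIMARY KEY (sid, cid));
CREATE TABLE teaches  (lid INTEGER REFERENCES lecturer,
                       cid INTEGER REFERENCES course,
                       PRIMARY KEY (lid, cid));
INSERT INTO student  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO course   VALUES (10, 'Databases'), (20, 'Logic');
INSERT INTO lecturer VALUES (100, 'Lee'), (200, 'Kim');
INSERT INTO enrolled VALUES (1, 10), (2, 10), (2, 20), (3, 20);
INSERT INTO teaches  VALUES (100, 10), (200, 10), (200, 20);
""")

def students_of(lecturer_name):
    """Students taking at least one course taught by the given lecturer."""
    rows = conn.execute("""
        SELECT DISTINCT s.name
        FROM student s
        JOIN enrolled e ON e.sid = s.sid
        JOIN teaches  t ON t.cid = e.cid
        JOIN lecturer l ON l.lid = t.lid
        WHERE l.name = ?
        ORDER BY s.name
    """, (lecturer_name,)).fetchall()
    return [name for (name,) in rows]

def lecturers_of(student_name):
    """Lecturers teaching at least one course taken by the given student."""
    rows = conn.execute("""
        SELECT DISTINCT l.name
        FROM lecturer l
        JOIN teaches  t ON t.lid = l.lid
        JOIN enrolled e ON e.cid = t.cid
        JOIN student  s ON s.sid = e.sid
        WHERE s.name = ?
        ORDER BY l.name
    """, (student_name,)).fetchall()
    return [name for (name,) in rows]

print(students_of("Lee"))   # ['Ann', 'Bob']
print(lecturers_of("Ann"))  # ['Kim', 'Lee']
```

Neither direction is privileged by the schema, so neither query needs to undo a packaging decision before it can be stated.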
One might argue that one should not use the nested relational model for many to many
relationships. But nested systems can interact, as in Figure 4.

[Diagram: Event 1:N Race; Team 1:N Competitor; Race N:M Competitor]

Figure 4 A many-to-many with nesting


In this case, an event has a set of races, and a team has a set of competitors, and we have
to decide whether a race has a set of references to competitor or vice versa. What if we
want to find what events a team participates in? The whole structure must be unnested.
The point is that representing these commonly occurring complex data structures using a
nested relational model is very much more complex than representing them in the
standard relational model.

Reconsideration of using NR model for OO concepts


We have seen that the nested relational model arises naturally from the object-relational
model, and that it has a sound and complete query language based on first normal form.
However, we have seen several practical problems:
• Using the NR model forces the designer to make more choices at the database schema
level than if the standard relational model is used.
• A query on an NR model must include navigation paths.
• A query must often unnest complex structures, often very deeply for even
semantically simple queries.
So even though the nested relational model is sound, it is very much more difficult to use
than the standard relational model, so may be thought of as much more expensive to use.
In order for a more expensive tool to be a sound engineering choice, there must be a
corresponding benefit. Let us therefore look at the benefits of the object-oriented
programming model.
OO programming and design originated in the software engineering domain. In this
domain, it is considered beneficial to hide the details of the implementation of a program
specification. This information hiding makes use of objects more transparent, and ensures
that modifications made to objects which do not affect functionality may be made
without side effects. The principles of information hiding were a major advance in
software engineering.
The benefits of using an OO approach in a database therefore would come from
information hiding, that is hiding implementation details not required for understanding
the specification of an object.

Let us see how this applies to the specification of data in an information system. As we
have seen, it is common to use a conceptual modelling technique to specify such data, as
in Figures 1, 3 and 4. The implementation of this data is ultimately in terms of disk
addresses, file organisations and access methods, but is generally done in several stages.
The first stage of implementation is normally the specification of schemas in a database
data description language, very often in a relational database system. This stage of
implementation is almost a transliteration, frequently introducing no additional design
decisions. Algorithms for the purpose are given for example by Elmasri and Navathe
(2000).
Further stages of implementation are performed almost entirely within the database
manager software (DBMS), sometimes with the guidance of a database administrator
who will identify attributes of tables which need rapid access, or give the DBMS some
parameters which it will use to choose among pre-programmed design options. In effect,
the implementation of the data model is almost entirely automated, and generally not the
concern of the applications programmer.
So the conceptual data model is a specification, the almost equivalent DBMS table
schemas are in effect also specifications, and the programmer does not generally proceed
further with refinement.
On the programming side, an information system generally has a large number of
modules which update or query the tables. In a relational system, these programs are
generally written using the SQL data manipulation language. The SQL statement is at a
very high level, and is generally also refined in several stages:
• The order of execution of the various relational operators must be chosen.
• Various secondary and primary indexes can be created or employed.
• Decisions need to be made as to the size of blocks retrieved from disk, what is to be
cached in main memory, whether intermediate results need to be sorted, and what sort
algorithms to use.
But, again, these refinement decisions are made by the DBMS using pre-programmed
design decisions depending on statistics of the tables held in the system catalog and to a
degree on parameters supplied by the database administrator. The programmer is
generally not concerned with them.
So it makes sense to think of an SQL statement not as a program but as a specification for
a program. It is hard to see what might be removed from an SQL statement while
retaining the same specified result. The SELECT clause determines which columns are to
appear in the result, the FROM clause determines which tables to retrieve data from (in
effect which entities and relationships the data is to come from), and the WHERE clause
determines which rows to retrieve data from.
We have seen that the benefit of information hiding in object-oriented design is that the
programmer can work with the specifications of the data and methods of a system
without having to worry about how the specifications are implemented. However, in
information systems, the programmer works only with specifications of data structures
and access/update methods. The implementation is hidden already in the DBMS. So in a
DBMS environment the programmer never has to worry how the specifications are
implemented. Information hiding is already employed no matter what design method the
programmer uses.
What the nested relational data model does is hide aspects of the structure of the specified
data, whereas the standard relational model exposes the specified structure of the data.
Using the NR data model, the data designer must make what amount to packaging design
decisions in the implementation of a conceptual model. In this sense, an NR model is more
refined than a standard relational model, and is therefore more expensive to build. On the
other hand, when a query is planned, in the NR model the programmer, besides
specifying the data that is to appear in the query, must also specify how to unpackage the
data to expose sufficient structure to specify the result. So as we have seen, the query is
also more expensive. Both the data representation and the query are unnecessarily more
expensive than the standard relational representation, since the information being hidden
is part of the specification, not how the specifications are implemented.

So why don’t people use RDBs for OO applications?


One might ask why people don’t already use relational databases for problems calling for
object-oriented approaches. The usual reason given is that RDBs are too slow. The
paradigmatic object-oriented application is system design, say a VLSI design or the
design of a large software system. There is often only one (very complex) object in the
system. This object has many parts, which are themselves complex. A relational
implementation therefore calls for many subordinate tables with limited context; and
processing data in the application generally requires large numbers of joins.
Database managers tend to be designed to support transactional applications, where there
are a large number of objects of limited complexity. The space of pre-programmed design
options for the implementation of data structures and queries does not generally extend to
the situation where there are a small number of very complex objects.
Rejection of the standard relational data model for these applications is therefore not a
rejection of the model per se, but a recognition that current implementations of the
standard relational data model do not perform well enough for these problems.

What can be done?


Two problems have been identified which make the standard relational model difficult to
use for OO applications: the slowness of the implementation and the necessity for the
definition of a large number of tables with limited context.
The former problem is technical. A large amount of investment has been made in the
design of implementations for transaction-oriented applications. Given sufficient
effective demand, there is no reason why a sufficient investment cannot be made for
applications of the OO type. In particular, there are already relational database systems
optimised around storage of data primarily in main memory rather than on disk. For
example, a research project of the National Research Institute for Mathematics and Computer
Science in the Netherlands together with the Free University of Amsterdam, called
Monet (see footnote 1), has published a number of papers on the various design issues in this area. A
search on the Web identifies many such products. The problem of slowness of standard
relational implementations for OO applications can be taken to be on the way to solution.
The latter problem, that the data definition for an OO application requires a large number
of tables with limited context, is a problem with the expressiveness of the standard
relational data model. In an OO application one frequently wants to navigate the complex
data structures specified. For example, from the model in Figure 4 one might want the set
of teams participating in a particular race in a particular event, or the set of events in
which a particular competitor from a particular team is competing, or the association
between teams and events defined by the many-to-many relationship between Race and
Competitor. From the point of view of each of those queries, there is a nested-relational
packaging of the conceptual model which makes the query simple, simpler than the
standard relational representation. The unsuitability of the NR model is that these NR
packagings are all different, and that a query not following the chosen packaging
structure is very complex.
However, we have already seen that the primary representation of the data can be in a
conceptual model. The relational representation can be, and generally is, constructed
algorithmically. If the DBMS creates the relational representation of the conceptual
model, then the conceptual model should be the basis for the query language. A query
expressed on the conceptual model can be translated into SQL DML in the same sort of
way that the model itself is translated into SQL DDL. In fact, there are a number of
conceptual query languages which permit the programmer to construct a query by
specifying a navigation through the conceptual model, for example ConQuer (Bloesch
and Halpin, 1996, 1997).
Using a language like ConQuer, the programmer can specify a navigation path through
the conceptual model, which when it traverses a one-to-many relationship opens the set
of instances on the target side. When it traverses a many-to-many relationship, the view
from the source of the path looks like a one-to-many. Any of the views of Figure 4
described above can be used. Such a traversal of the conceptual model provides a sort of
virtual nested-relational data packaging, which can be translated into standard SQL
without the programmer being aware of exactly how the data is packaged. This approach
therefore is more true to the spirit of object-oriented software development since the
implementation of the specification is completely hidden.
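The translation step can be sketched as follows. This is not ConQuer and does not use its syntax; it is a hypothetical illustration in which a navigation path over the entities of Figure 4 is turned into standard SQL joins. The relational mapping in LINKS, the table and column names, the helper functions, and the sample data are all assumptions.

```python
import sqlite3

# Hypothetical relational mapping of Figure 4. Each conceptual relationship is
# stored once, as the join steps (table, condition) that walk it in one direction.
LINKS = {
    ("event", "race"): [("race", "race.eid = event.eid")],
    ("race", "competitor"): [("entry", "entry.rid = race.rid"),
                             ("competitor", "competitor.cid = entry.cid")],
    ("competitor", "team"): [("team", "team.tid = competitor.tid")],
}

def link(a, b):
    """Join steps for walking from entity a to the neighbouring entity b."""
    if (a, b) in LINKS:
        return LINKS[(a, b)]
    stored = LINKS[(b, a)]                    # relationship is stored as b -> a
    tables = [b] + [t for t, _ in stored]     # tables visited going from b to a
    conds = [c for _, c in stored]
    # Walking a -> b visits the same tables backwards with the same conditions.
    return list(zip(reversed(tables[:-1]), reversed(conds)))

def to_sql(path, select, where):
    """Translate a navigation path over the conceptual model into flat SQL."""
    lines = [f"SELECT DISTINCT {select}", f"FROM {path[0]}"]
    for a, b in zip(path, path[1:]):
        lines += [f"JOIN {table} ON {cond}" for table, cond in link(a, b)]
    lines += [f"WHERE {where}", f"ORDER BY {select}"]
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event      (eid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE race       (rid INTEGER PRIMARY KEY, eid INTEGER REFERENCES event);
CREATE TABLE team       (tid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE competitor (cid INTEGER PRIMARY KEY, tid INTEGER REFERENCES team);
CREATE TABLE entry      (rid INTEGER REFERENCES race,
                         cid INTEGER REFERENCES competitor,
                         PRIMARY KEY (rid, cid));
INSERT INTO event      VALUES (1, 'Regatta'), (2, 'Sprint Cup');
INSERT INTO race       VALUES (11, 1), (12, 1), (21, 2);
INSERT INTO team       VALUES (1, 'Red'), (2, 'Blue');
INSERT INTO competitor VALUES (101, 1), (102, 1), (201, 2);
INSERT INTO entry      VALUES (11, 101), (12, 102), (21, 102), (11, 201);
""")

def run(path, select, where, value):
    """Execute the SQL generated for a path, binding one parameter."""
    return [v for (v,) in conn.execute(to_sql(path, select, where), (value,))]

# Two views of the same data, each stated only as a path through the model.
events_of_red = run(["team", "competitor", "race", "event"],
                    "event.name", "team.name = ?", "Red")
teams_in_sprint = run(["event", "race", "competitor", "team"],
                      "team.name", "event.name = ?", "Sprint Cup")

print(events_of_red)    # ['Regatta', 'Sprint Cup']
print(teams_in_sprint)  # ['Red']
```

Both paths run over the same flat tables, so neither view has to be chosen in advance as the stored packaging. The select and where fragments are interpolated as text purely to keep the sketch short; only the compared value is passed as a bound parameter.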

Conclusion
The thesis of this paper is therefore that a standard relational data model, where the DDL
and DML are both hidden beneath a conceptual data modelling language and the DBMS
is a main-memory implementation optimised for OO-style applications, presents a much
superior approach to the problem of OO applications than does the nested relational data
model.

1 http://dbs.cwi.nl:8080/cwwwi/owa/cwwwi.print_projects?ID=41

References
Bloesch, A. and Halpin, T. (1996) “ConQuer: a Conceptual Query Language”, Proc.
ER’96: 15th International Conference on Conceptual Modeling, Springer LNCS 1157,
pp. 121-33.
Bloesch, A. and Halpin, T. (1997) “Conceptual Queries Using ConQuer-II”, in David W.
Embley and Robert C. Goldstein (Eds.), Conceptual Modeling - ER '97, 16th International
Conference on Conceptual Modeling, Los Angeles, California, USA, November 3-5,
1997, Proceedings, Springer LNCS 1331.
Elmasri, R. and Navathe, S. B. (2000) Fundamentals of Database Systems (3rd ed.),
Addison Wesley, Reading, Mass.
Stonebraker, M., Brown, P. and Moore, D. (1999) Object-Relational DBMSs: Tracking
the Next Great Wave, Morgan Kaufmann Publishers, San Francisco, Calif.
