Robert M. Colomb
School of Information Technology and Electrical Engineering
The University of Queensland
Queensland 4072 Australia
colomb@itee.uq.edu.au
Work performed partly while visiting LADSEB-CNR; Corso Stati Uniti, 4; Padova, Italy
Abstract
The nested relational data model is a natural generalisation of the relational
data model, but it often leads to designs which hide the data structures
needed to specify queries and updates in the information system. The
relational data model on the other hand exposes the specifications of the
data structures and permits the minimal specification of queries and updates
using SQL. The deficiencies in relational systems leading to a demand for
object-oriented nested relational solutions are seen to be deficiencies in the
implementations of relational database systems, not in the data model itself.
Introduction
The nested relational data model is a natural generalisation of the relational data model,
but it often leads to designs which hide the data structures needed to specify queries and
updates in the information system. The relational data model on the other hand exposes
the specifications of the data structures and permits the minimal specification of queries
and updates using SQL. However, there are deficiencies in relational systems, which lead
to a demand for object-oriented nested relational solutions. This paper argues that these
deficiencies are not inherent in the relational data model, but are deficiencies in the
implementations of relational database systems.
The paper first sketches how the nested-relational model is a natural extension of the
relational data model, then shows how the nested relational model, while sound, is
expensive to use. It then examines the object-oriented paradigm for software engineering,
and shows that it gives very little benefit in database applications. Rather, the relational
model as represented in conceptual modelling languages is argued to provide an ideal
view of the data. The ultimate thesis is that a better strategy is to employ a main-memory
relational database optimised for queries on complex objects, with a query interface
based on a conceptual model query language.
Use of the nested relational data model for object-oriented development
In recent years the object-oriented model has become the dominant programming model
and is becoming more common in systems design, including information systems. The
data in an object-oriented system consists typically of complex data structures built from
tuple and collector types. The tuple type is the same as the tuple type in the object-
relational model. A collector type is either a set, list or multiset. The latter two can be
seen as sets with an additional attribute: a list is a set with a sequence attribute, while a
multiset is a set with an additional identifying attribute. So a nested-relational data model
can represent data from an object-oriented design.
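The encoding of lists and multisets as sets can be made concrete. The following is a minimal Python sketch, assuming hypothetical helper names (`list_as_set`, `multiset_as_set` and their inverses). A list gains a sequence attribute, and a multiset gains a surrogate identifying attribute whose only job is to keep duplicate values distinct.

```python
from collections import Counter

def list_as_set(items):
    """Encode a list as a set of (sequence_number, value) tuples."""
    return {(pos, value) for pos, value in enumerate(items)}

def set_as_list(encoded):
    """Recover the list by ordering on the sequence attribute."""
    return [value for _, value in sorted(encoded)]

def multiset_as_set(items):
    """Encode a multiset as a set of (identifier, value) tuples. The
    identifier is an arbitrary surrogate that keeps duplicates distinct;
    here it is just a counter, and its order carries no meaning."""
    return {(ident, value) for ident, value in enumerate(items)}

def set_as_multiset(encoded):
    """Recover the multiset by projecting away the identifier."""
    return Counter(value for _, value in encoded)

# Usage: order and duplicates both survive the round trip through plain sets.
makes = ["Laser", "Falcon", "Laser"]
assert set_as_list(list_as_set(makes)) == makes
assert set_as_multiset(multiset_as_set(makes)) == Counter(makes)
```

Once every collector is a set of tuples, a relation-valued attribute is enough to hold any of the three collector types.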
Accordingly, object-relational databases with object-relational nested SQL can be used to
implement object-oriented databases. How this is done is described, for example, by
Stonebraker, Brown and Moore (1999) (henceforth SBM).
We should note that both the relational and object-oriented data models are
implementations of more abstract conceptual data models expressed in conceptual data
modelling languages such as the Entity-Relationship-Attribute (ERA) method. Well-
established information systems design methods begin the analysis of data with a
conceptual model, moving to a particular database implementation at a later stage.
An example adapted from SBM will clarify some issues. Consider the data model in
Figure 1. Since the relationship between department and vehicle is one-to-many,
associated with each department is a set of vehicles.
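A minimal sketch of such a nested table follows, in Python. It assumes a car tuple with make and year attributes (the attributes used in the queries below) and purely hypothetical data, not the contents of Figure 2. The `unnest` function shows the equivalent first-normal-form table, with one row per department-car pair.

```python
from typing import FrozenSet, List, NamedTuple, Tuple

class Car(NamedTuple):
    make: str
    year: int

class Dept(NamedTuple):
    id: int
    cars: FrozenSet[Car]  # relation-valued attribute: a set of car tuples

# Hypothetical instance (not the contents of Figure 2).
depts = [
    Dept(1, frozenset({Car("Laser", 1991), Car("Falcon", 1999)})),
    Dept(2, frozenset({Car("Falcon", 1995)})),
]

def unnest(rows: List[Dept]) -> List[Tuple[int, str, int]]:
    """Flatten the nested table into a first-normal-form table with one
    (dept_id, make, year) row per department-car pair."""
    return sorted((d.id, c.make, c.year) for d in rows for c in d.cars)

print(unnest(depts))
# [(1, 'Falcon', 1999), (1, 'Laser', 1991), (2, 'Falcon', 1995)]
```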
OR SQL on nested relations
SQL has been extended by SBM among others to handle object-relational databases,
mainly by permitting in SQL statements the predicates and operators particular to the
abstract data types supporting the value sets. In particular, nested relational systems are
supported by extending and overloading the dot notation for disambiguating attribute
names. For example, in
Select ID from dept where car.year = 1999 (2)
Here the dot in car.year identifies the year attribute of the car tuple. It also designates
membership of a tuple with year = 1999 in the set of tuples that is the value of dept.car. The
result of this query on the table of Figure 2 is ID = 1.
As a consequence of this overloading, the and boolean operator in the WHERE clause
becomes, if not ambiguous, at least counterintuitive to someone used to standard SQL.
The query
Select ID from dept where car.make = 'Laser' (3)
has the same result, ID = 1. Since the outermost interpretation of dot is set membership,
in the query
Select ID from dept where car.year = 1999 and car.make = 'Laser' (4)
the and operator is interpreted as set intersection, and the result is also ID = 1.
This result, although correct, is probably not what the author of the query intended. They
would more likely have been looking for a department which has a 1999 Laser, and the
answer they expect is the empty result.
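The two readings of query (4) can be contrasted directly. This is a minimal Python sketch over a hypothetical nested table in which department 1 has a 1999 car and a Laser but no 1999 Laser. The function names are illustrative only. The overloaded reading tests each dotted predicate separately for membership, which is equivalent to intersecting the sets of IDs each predicate selects. The intended reading requires a single car tuple to satisfy every predicate.

```python
from typing import Callable, Dict, List, Set, Tuple

Car = Tuple[str, int]  # (make, year)
Pred = Callable[[Car], bool]

# Hypothetical nested table: dept ID -> set of (make, year) car tuples.
dept: Dict[int, Set[Car]] = {
    1: {("Laser", 1991), ("Falcon", 1999)},
    2: {("Falcon", 1995)},
}

def overloaded_and(preds: List[Pred]) -> List[int]:
    """Each predicate is tested separately for membership in dept.car,
    so different car tuples may satisfy different predicates."""
    return sorted(d for d, cars in dept.items()
                  if all(any(p(c) for c in cars) for p in preds))

def same_tuple_and(preds: List[Pred]) -> List[int]:
    """One car tuple must satisfy every predicate (the likely intent)."""
    return sorted(d for d, cars in dept.items()
                  if any(all(p(c) for p in preds) for c in cars))

def is_1999(car: Car) -> bool:
    return car[1] == 1999

def is_laser(car: Car) -> bool:
    return car[0] == "Laser"

print(overloaded_and([is_1999, is_laser]))  # [1]
print(same_tuple_and([is_1999, is_laser]))  # []
```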
There are two ways to fix this problem. One is to import a new and operator from the
relational ADT, so that (4) becomes
Select ID from dept where car.year = 1999 and2 car.make = 'Laser' (5)
In this solution, both arguments of and2 must refer to the same relation-valued attribute of
the outer relation.
The other solution is to unnest the table so that the standard and operator works in
the way it does in standard SQL:
Select ID from dept, dept.car where (6)
car.year = 1999 and car.make = 'Laser'
where the addition of dept.car to the FROM clause signifies unnesting.
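In a standard relational database the unnested form is simply the stored form. The following minimal sketch uses Python's sqlite3 module. It assumes dept.car is held as a separate table `car` linked by a hypothetical `dept_id` column, with the same hypothetical data as before. The unnesting in (6) then becomes an ordinary join, and the ordinary AND applies both predicates to the same car row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY);
    CREATE TABLE car (
        dept_id INTEGER REFERENCES dept (id),
        make    TEXT,
        year    INTEGER
    );
    INSERT INTO dept VALUES (1), (2);
    INSERT INTO car VALUES (1, 'Laser', 1991), (1, 'Falcon', 1999),
                           (2, 'Falcon', 1995);
""")

# Query (6): both predicates constrain the same joined car row.
query6 = """
    SELECT DISTINCT dept.id
    FROM dept JOIN car ON car.dept_id = dept.id
    WHERE car.year = 1999 AND car.make = 'Laser'
"""
result = [row[0] for row in conn.execute(query6)]

# For comparison, the single predicate of query (2).
only_1999 = [row[0] for row in conn.execute(
    "SELECT DISTINCT dept_id FROM car WHERE year = 1999")]

print(result, only_1999)  # [] [1]
```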
The former method is problematic since nesting can occur to any level, and the second is
problematic since it requires the user to introduce navigation information into the query.
The same sort of problem occurs when we try to correlate the SELECT clause with the
WHERE clause. The query
Select ID, car.year from dept where car.make = 'Laser' (7)
returns the table
ID = 1, Year = {1991, 1999} (8)
when applied to the table of Figure 2, as a consequence of first normal form. Again we
need to unnest, converting the nested structure to a flat relational structure, in order to
make the query say what we mean.
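The effect of query (7) can be reproduced with a minimal sketch under the same hypothetical data, in which department 1 owns a 1991 Laser and a 1999 Falcon. `nested_select` is an illustrative emulation of the nested semantics, not an OR SQL engine. The WHERE clause selects the department, and car.year then projects the years of all its cars. The unnested table, queried with standard SQL, returns only the Laser's year.

```python
import sqlite3

# Hypothetical nested data: dept ID -> set of (make, year) car tuples.
nested = {1: {("Laser", 1991), ("Falcon", 1999)}}

def nested_select(make):
    """Emulate query (7): pick departments owning a car of this make,
    then project the years of all cars in each such department."""
    return [(d, {year for _, year in cars})
            for d, cars in nested.items()
            if any(m == make for m, _ in cars)]

# The same data unnested into a flat table and queried with standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (dept_id INTEGER, make TEXT, year INTEGER)")
conn.executemany("INSERT INTO car VALUES (?, ?, ?)",
                 [(d, m, y) for d, cars in nested.items() for m, y in cars])
flat = conn.execute(
    "SELECT dept_id, year FROM car WHERE make = ?", ("Laser",)).fetchall()

print(nested_select("Laser"))  # [(1, {1991, 1999})]
print(flat)                    # [(1, 1991)]
```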
Although OR SQL is a sound and complete query language, the simple-looking queries
tend not to be very useful. To write useful queries, additional syntax and a good
understanding of the possibly complex and possibly multiple nesting structure are
essential. The author’s experience is that it is very hard to teach, even to very advanced
students.
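[Figure: conceptual model with entities Event, Team, Race and Competitor; Event to Race and Team to Competitor are one-to-many (1:N), and Race to Competitor is many-to-many (N:M).]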
Let us see how this applies to the specification of data in an information system. As we
have seen, it is common to use a conceptual modelling technique to specify such data, as
in Figures 1, 3 and 4. The implementation of this data is ultimately in terms of disk
addresses, file organisations and access methods, but is generally done in several stages.
The first stage of implementation is normally the specification of schemas in a database
data description language, very often in a relational database system. This stage of
implementation is almost a transliteration, frequently introducing no additional design
decisions. Algorithms for this purpose are given, for example, by Elmasri and Navathe
(2000).
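The transliteration can be sketched in a few lines. The following Python applies the usual textbook mapping rules: one table per entity, a foreign key on the "many" side of a one-to-many relationship, and a junction table for a many-to-many relationship. It targets the Event, Team, Race and Competitor model of Figure 4, assuming Event to Race and Team to Competitor are one-to-many and Race to Competitor is many-to-many. All column names and the uniform TEXT type are hypothetical simplifications, and the generated DDL is checked by executing it in SQLite.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical conceptual model: entity -> attributes (first one is the key).
entities: Dict[str, List[str]] = {
    "event": ["event_id", "name"],
    "team": ["team_id", "name"],
    "race": ["race_id", "start_time"],
    "competitor": ["competitor_id", "name"],
}
# (source, target, cardinality) for each relationship.
relationships: List[Tuple[str, str, str]] = [
    ("event", "race", "1:N"),
    ("team", "competitor", "1:N"),
    ("race", "competitor", "N:M"),
]

def to_ddl(entities, relationships) -> List[str]:
    """Transliterate the conceptual model into CREATE TABLE statements."""
    key = {e: attrs[0] for e, attrs in entities.items()}
    cols = {e: [f"{attrs[0]} TEXT PRIMARY KEY"] + [f"{a} TEXT" for a in attrs[1:]]
            for e, attrs in entities.items()}
    junctions = []
    for src, tgt, card in relationships:
        if card == "1:N":
            # Foreign key goes on the "many" side.
            cols[tgt].append(f"{key[src]} TEXT REFERENCES {src} ({key[src]})")
        elif card == "N:M":
            # Junction table keyed by the pair of foreign keys.
            junctions.append(
                f"CREATE TABLE {src}_{tgt} ("
                f"{key[src]} TEXT REFERENCES {src} ({key[src]}), "
                f"{key[tgt]} TEXT REFERENCES {tgt} ({key[tgt]}), "
                f"PRIMARY KEY ({key[src]}, {key[tgt]}))")
        else:
            raise ValueError(f"unsupported cardinality: {card}")
    tables = [f"CREATE TABLE {e} ({', '.join(c)})" for e, c in cols.items()]
    return tables + junctions

conn = sqlite3.connect(":memory:")
for statement in to_ddl(entities, relationships):
    print(statement + ";")
    conn.execute(statement)  # SQLite accepts the generated DDL
```

No design decision appears in `to_ddl` beyond the mapping rules themselves, which is the sense in which this stage is almost a transliteration.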
Further stages of implementation are performed almost entirely within the database
management system (DBMS), sometimes with the guidance of a database administrator
who will identify attributes of tables which need rapid access, or give the DBMS some
parameters which it will use to choose among pre-programmed design options. In effect,
the implementation of the data model is almost entirely automated, and generally not the
concern of the applications programmer.
So the conceptual data model is a specification, the almost equivalent DBMS table
schemas are in effect also specifications, and the programmer does not generally proceed
further with refinement.
On the programming side, an information system generally has a large number of
modules which update or query the tables. In a relational system, these programs are
generally written using the SQL data manipulation language. The SQL statement is at a
very high level, and is generally also refined in several stages:
• The order of execution of the various relational operators must be chosen.
• Various secondary and primary indexes can be created or employed.
• Decisions need to be made as to the size of blocks retrieved from disk, what is to be
cached in main memory, whether intermediate results need to be sorted, and what sort
algorithms to use.
But, again, these refinement decisions are made by the DBMS using pre-programmed
design decisions depending on statistics of the tables held in the system catalog and to a
degree on parameters supplied by the database administrator. The programmer is
generally not concerned with them.
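SQLite offers a small illustration of this division of labour. In the sketch below, the administrator-style step is a single CREATE INDEX on a hypothetical `car` table. The query text names no access path, and SQLite's planner decides on its own whether to use the index; `EXPLAIN QUERY PLAN` reports that choice. The exact wording of the report varies between SQLite versions, so only its last column (the detail text) is printed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (dept_id INTEGER, make TEXT, year INTEGER)")
# The only physical-design input: year needs rapid access.
conn.execute("CREATE INDEX car_year ON car (year)")

# The query says what is wanted, not how to retrieve it.
query = "SELECT dept_id, make FROM car WHERE year = 1999"

# Ask the DBMS which access path it chose; the last column is the detail text.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
print(plan)
```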
So it makes sense to think of an SQL statement not as a program but as a specification for
a program. It is hard to see what might be removed from an SQL statement while
retaining the same specified result. The SELECT clause determines which columns are to
appear in the result, the FROM clause determines which tables to retrieve data from (in
effect which entities and relationships the data is to come from), and the WHERE clause
determines which rows to retrieve data from.
The benefit of information hiding in object-oriented design is that the
programmer can work with the specifications of the data and methods of a system
without having to worry about how the specifications are implemented. However, in
information systems, the programmer works only with specifications of data structures
and access/update methods; the implementation is already hidden in the DBMS. So in a
DBMS environment the programmer never has to worry about how the specifications are
implemented. Information hiding is already employed, whatever design method the
programmer uses.
What the nested relational data model does is hide aspects of the structure of the specified
data, whereas the standard relational model exposes the specified structure of the data.
Using the NR data model, the data designer must make what amount to packaging design
decisions in the implementation of a conceptual model. In this sense, an NR model is more
refined than a standard relational model, and is therefore more expensive to build. On the
other hand, when a query is planned in the NR model, the programmer, besides
specifying the data that is to appear in the query, must also specify how to unpackage the
data to expose enough structure to specify the result. So, as we have seen, the query is
also more expensive. Both the data representation and the query are more expensive than
in the standard relational representation, and unnecessarily so, since the information being
hidden is part of the specification, not of how the specification is implemented.
Monet¹, has published a number of papers on the various design issues in this area. A
search on the Web identifies many such products. The problem of the slowness of standard
relational implementations for OO applications can therefore be taken to be on its way to a solution.
The latter problem, that the data definition for an OO application requires a large number
of tables with limited context, is a problem with the expressiveness of the standard
relational data model. In an OO application one frequently wants to navigate the complex
data structures specified. For example, from the model in Figure 4 one might want the set
of teams participating in a particular race in a particular event, or the set of events in
which a particular competitor from a particular team is competing, or the association
between teams and events defined by the many-to-many relationship between Race and
Competitor. From the point of view of each of those queries, there is a nested-relational
packaging of the conceptual model which makes the query simple, simpler than the
standard relational representation. The unsuitability of the NR model is that these NR
packagings are all different, and that a query not following the chosen packaging
structure is very complex.
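The asymmetry between packagings shows up even on a tiny instance of the many-to-many relationship between Race and Competitor. The following minimal Python sketch uses hypothetical data and function names. It builds two nested packagings of the same facts, one grouped by race and one by competitor, and asks the same question of each: which races does a competitor enter? Following the packaging takes a single lookup. Going against it means searching every nested set. The flat relation answers either direction with one selection.

```python
from collections import defaultdict

# Hypothetical instances of the Race-Competitor N:M relationship.
entries = {("R1", "Ann"), ("R1", "Bob"), ("R2", "Bob")}

# Two nested-relational packagings of the same facts.
by_race = defaultdict(set)        # race -> set of competitors
by_competitor = defaultdict(set)  # competitor -> set of races
for race, comp in entries:
    by_race[race].add(comp)
    by_competitor[comp].add(race)

def races_of_via_by_competitor(comp):
    """Follows the packaging: a single lookup."""
    return by_competitor.get(comp, set())

def races_of_via_by_race(comp):
    """Goes against the packaging: every nested set must be searched."""
    return {race for race, comps in by_race.items() if comp in comps}

def races_of_flat(comp):
    """The flat relation answers either direction with one selection."""
    return {race for race, c in entries if c == comp}

print(sorted(races_of_via_by_race("Bob")))  # ['R1', 'R2']
```

With one level of nesting the mismatch costs one extra comprehension. With several levels, each query against the grain needs its own unpacking logic, which is the complexity the text describes.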
However, we have already seen that the primary representation of the data can be in a
conceptual model. The relational representation can be, and generally is, constructed
algorithmically. If the DBMS creates the relational representation of the conceptual
model, then the conceptual model should be the basis for the query language. A query
expressed on the conceptual model can be translated into SQL DML in the same sort of
way that the model itself is translated into SQL DDL. In fact, there are a number of
conceptual query languages which permit the programmer to construct a query by
specifying a navigation through the conceptual model, for example ConQuer (Bloesch
and Halpin, 1996, 1997).
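The translation from a conceptual navigation path to SQL can be sketched as follows. This is not ConQuer's syntax or its translation algorithm. It is a hypothetical Python illustration in which each step between neighbouring entities of the Figure 4 model is recorded, once, as the tables and join conditions it needs. The N:M step from Race to Competitor silently adds the junction table, so the user names only entities. The schema, the `STEPS` table and `path_to_sql` are all assumptions for illustration.

```python
import sqlite3

# Hypothetical join metadata derived once from the conceptual model:
# (source, target) -> list of (table to add, join condition).
STEPS = {
    ("event", "race"): [("race", "race.event_id = event.event_id")],
    ("race", "competitor"): [
        ("race_competitor", "race_competitor.race_id = race.race_id"),
        ("competitor", "competitor.competitor_id = race_competitor.competitor_id"),
    ],
    ("competitor", "team"): [("team", "team.team_id = competitor.team_id")],
}

def path_to_sql(path, select, where):
    """Translate a path such as event -> race -> competitor -> team into a
    SELECT with the joins spelled out. `where` maps columns to values."""
    tables, conds = [path[0]], []
    for src, tgt in zip(path, path[1:]):
        for table, cond in STEPS[(src, tgt)]:
            tables.append(table)
            conds.append(cond)
    conds += [f"{col} = ?" for col in where]
    sql = f"SELECT DISTINCT {select} FROM {', '.join(tables)}"
    if conds:
        sql += " WHERE " + " AND ".join(conds)
    return sql, list(where.values())

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (event_id TEXT PRIMARY KEY);
    CREATE TABLE race (race_id TEXT PRIMARY KEY, event_id TEXT);
    CREATE TABLE team (team_id TEXT PRIMARY KEY);
    CREATE TABLE competitor (competitor_id TEXT PRIMARY KEY, team_id TEXT);
    CREATE TABLE race_competitor (race_id TEXT, competitor_id TEXT);
    INSERT INTO event VALUES ('E1');
    INSERT INTO race VALUES ('R1', 'E1'), ('R2', 'E1');
    INSERT INTO team VALUES ('T1'), ('T2');
    INSERT INTO competitor VALUES ('Ann', 'T1'), ('Bob', 'T2');
    INSERT INTO race_competitor VALUES ('R1', 'Ann'), ('R2', 'Bob');
""")

# "The teams participating in race R1 of event E1."
sql, params = path_to_sql(["event", "race", "competitor", "team"],
                          select="team.team_id",
                          where={"event.event_id": "E1", "race.race_id": "R1"})
teams = [row[0] for row in conn.execute(sql, params)]
print(teams)  # ['T1']
```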
Using a language like ConQuer, the programmer can specify a navigation path through
the conceptual model, which when it traverses a one-to-many relationship opens the set
of instances on the target side. When it traverses a many-to-many relationship, the view
from the source of the path looks like a one-to-many. Any of the views of Figure 4
described above can be used. Such a traversal of the conceptual model provides a sort of
virtual nested-relational data packaging, which can be translated into standard SQL
without the programmer being aware of exactly how the data is packaged. This approach
is therefore truer to the spirit of object-oriented software development, since the
implementation of the specification is completely hidden.
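Such a virtual packaging need not be stored anywhere. It can be computed from a flat query result on demand. The following minimal sketch assumes a hypothetical `race_competitor` table and a hypothetical `nested_view` function. It groups the flat rows by whichever side of the many-to-many relationship the path starts from, so that each view looks one-to-many from its source.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE race_competitor (race_id TEXT, competitor_id TEXT);
    INSERT INTO race_competitor VALUES
        ('R1', 'Ann'), ('R1', 'Bob'), ('R2', 'Bob');
""")

def nested_view(conn, source, target):
    """Present the flat N:M table as a nested relation seen from `source`:
    each source value paired with the set of its target values.
    `source` and `target` must be trusted column names, not user input."""
    rows = conn.execute(
        f"SELECT {source}, {target} FROM race_competitor ORDER BY {source}")
    return {key: {t for _, t in group}
            for key, group in groupby(rows, key=itemgetter(0))}

by_race = nested_view(conn, "race_id", "competitor_id")
by_competitor = nested_view(conn, "competitor_id", "race_id")
```

Both packagings come from the same stored table, so choosing one for a particular query commits the schema to nothing.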
Conclusion
The thesis of this paper is therefore that a standard relational data model, where the DDL
and DML are both hidden beneath a conceptual data modelling language and the DBMS
is a main-memory implementation optimised for OO-style applications, is a much
better approach to OO applications than the nested relational data model.
¹ http://dbs.cwi.nl:8080/cwwwi/owa/cwwwi.print_projects?ID=41
References
Bloesch, A. and Halpin, T. (1996) “ConQuer: a Conceptual Query Language” Proc.
ER’96: 15th International Conference on Conceptual Modeling, Springer LNCS, no.
1157, pp. 121-33.
Bloesch, A. and Halpin, T. (1997) “Conceptual Queries Using ConQuer-II” in Embley, D. W.
and Goldstein, R. C. (Eds.) Conceptual Modeling - ER '97: 16th International Conference
on Conceptual Modeling, Los Angeles, California, USA, November 3-5, 1997, Proceedings.
Springer LNCS, no. 1331.
Elmasri, R. and Navathe, S. B. (2000) Fundamentals of Database Systems, 3rd ed.
Addison Wesley, Reading, Mass.
Stonebraker, M., Brown, P. and Moore, D. (1999) Object-Relational DBMSs: Tracking
the Next Great Wave. Morgan Kaufmann, San Francisco, Calif.