Professional Documents
Culture Documents
Goetz Graefe
Portland State University, Computer Science Depaxtatent
P.O. Box 751, Portland, Oregon 97207-0751
Abstract The foundation of physical data independence is a
A cornerstone of modern tJatabase systems is physical separation of conceptual data and their physical repre-
data independence, i.e.,the separationof a type and its sentation. On the logical level, data are encapsulated
associated operations from its physical representation in (abstract) data types with a well-defined set of oper-
in memory and on storage media. Users manipulate ations. All instances of the same type support the
and query data at the logical level; the DBMS trans- same operations and exhibit, to the outside, exactly the
lates these logical operations to operations on files, same behavior. On the inside, however, i.e., on the
indices, records, and disks. The efficiency of these level of physical representations, it is entirely possible
physical operations depends very much on the choice that two instances of the same type be represented dif-
of data representations. ferently, as long as the different representations sup-
port the same behavior to the outside world. Reasons
Choosing a physical representation for a logical for different internal implementations are typically
database is called physical database design. The num- time and space efficiencies combined with an
ber of possible choices in physical database design is instance's size and its frequency and modes of retrieval
very large; moreover, they very often interact with and update. Of course, this separation and physical
each other. W e attempt to list and classify these data independence are very closely related to the sepa-
choices and to explore theirinteractions. The purpose ration of interface specification and implementation in
of Ibis paper is to provide an overview of possible abstract data types.
options to the D B M S developer and some guidance to
the DBMS administrator and user. Let us consider some very concrete examples. In
the relational data model, a relation is a set of tuples.
While much of our disenssion will draw on the rela- Notice that it is a set, not a sequence or list: thus, there
tional data model, physical database design is of even is no inherent order. In the storage representation and
more importance for object-oriented and extensible in processing, however, order can make a big differ-
systems. The reasons are simple: First, the number of ence in query processing performance and therefore is
logical data types and their operations is larger, requir- an important physical database design issue. Partition-
ing and permitting more choices for their representa- ing or striping a relation over multiple disks as well as
tion. Second, the state of the art in query optimization indexing are relevant and important performance tech-
for these systems is much less developed than for rela- niques that also have no place in the relational data
tional systems, making careful physical database model, since it is a logical or conceptual data model.
design even more imperative for object-oriented Moreover, tuple representations are not specified in the
database systems. model: a tuple can be a record, it can be divided into
multiple records (e.g., fixed-length and variable-length
1. I n t r o d u c t i o n fields), it can be represented by pointers into files rep-
The most significant contributions of database research resenting the underlying domains, etc. In the object-
to computer science have probably been the transac- oriented database world, objects are individuals with
tion concept, non-procedural query languages and their identity and state m none of the object-oriented data
effective optimization into efficient and parallel execu- models specifies physical issues such as object cluster-
tion plans, and physical data independence. In the last ing or caching in workstation-server architectures,
two decades, formal specifications and efficient imple- although these issues clearly can have a significant
mentations have been developed for each of these three performance effect. These issues are captured in the
concepts. In the present survey, we focus on physical physical database design, which is a separate step fol-
data independence. lowing logical database design in any effort of intro-
ducing a database management system.