Professional Documents
Culture Documents
Ed. A. Makinouchi @world Scientific Publishing Co.: Database Systems For Advanced Applications 91
Ed. A. Makinouchi @world Scientific Publishing Co.: Database Systems For Advanced Applications 91
109
--
PC++ provides some major features that are not supported by class PERSISTENT (
relational database management systems such as support for public:
C++ object modeling, long transaction, and object versioning. PERSISTFNT(int object-size,
Section of this paper describes the PC++ system in detail. Sec- cluster c = DEFAULT-CLUSTER);
tion 3 relates PC++ to other works, and discusses future exten- virtual -PERSISTENT();
sions of several PC++ facilities. Section 4 states some protected:
conclusions. PERSISTENT *modify();
<other protected functions>
2.0 PC++ Architecture and Functionalities private:
PERSISTENT-ID id;
Figure 1 depicts the PC++ system architecture. PC++ employs a cother private data>
distributed client-server architecture that consists of two compo-
1;
nents: the application workspace manager and the database
server. PC++ provides a C++ programming interface through The pre-defined class PERSISTENT-ID defines the following
which applications use PC++ facilities. Section 2.1 describes the properties for object identifiers:
PC++ programming model. Section 2.2 describes the application
workspace manager. Section 2.3 describes the database server. class PERSISTENTJD (
Section 2.4 describes the optimistic concurrency control and public:
crash recovery mechanisms. Section 2.5 reports some perfor- PERSISTENT *operator-r();
mance characteristics of PC++. protected:
<protected functions for manipulating ids>
private:
<declarations for page and object numbers>
110
class node-id : public PERSISTENTJD ( node *nl = new node(...);
public: node *n2 = new node(...);
node(<args>); n I->set_leftmost_child(n2);
/* pointer-to-identifier conversion function */
node-id(node *); Note that the function ser~l&mosr~chiZd accepts a node pointer
/* node id-to-node mapping operator */ (n2) as its argument because the constructor node id(node *) of
node *operator->0 the class node-id automatically converts a node pointer to a node
( return ((node *) PERSISTENT-ID::->()); ) identifier. PC++ provides a C-++ macro implementing the con-
1; version function to prevent programming error and to relieve
implementation burden on the programmet.
PC++ supports only persistent heap objects; an application can-
not create static or local persistent objects. To create a persistent The overloaded operator -> defined in each persistent object
object, an application uses the C++ new operator. For example, identifier class allows the natural use of C++ syntax in accessing
to create a node: a persistent object’s members through the persistent object’s
identifier. For example, the following C++ expression invokes
node *n = new node(<args>); the get-value member function of node nl ‘s leftmost child
which is an object identifier of type node-id:
In C++, the order of constructors execution for a class hierarchy
is from ancestors to descendants. Thus, the constructor of the n 1->getJeftmost-child()->get-value0
class PERSISTENT executes before the constructor of the class
node executes. The PERSISTENT constructor allocates space The overloaded operator -> of the class PERSISTENT imple-
for a new node object, and assigns a unique identifier to the new ments the identifier-to-object mapping. Each persistent class
node object. The node constructor passes the size of the node needs to provide only the proper casting of the mapping result.
information through the object-size formal argument of the For example, the overloaded operator -> of the class node
PERSISTENT constructor. The node constructor also optionally invokes the overloaded operator -> of the class PERSISTENT
specifies a cluster where the new node object belongs through and then casts the result to &come a node pointer.
the c formal argument of the PERSISTENT constructor. Clusters
are referred to by cluster identifiers (enumeration constants) that The pre-defined class WORKSPACE supplies functions for
are defined in a C++ header file. checkpointing, transaction processing, and initialization. The
WORKSPACE class supplies the following interface:
When modifying an object, an application must invoke the mod-
zfi function which marks the object as modified so that during class WORKSPACE (
transaction processing the application workspace manager can public:
identify which objects need to be sent to the server for updating. status checkin();
For example, the set leftmost-child member function of class status abort();
node invokes the modify function before modifying the lmc field. status checkpoint(char *checkpoint-file);
status update(int version-number = LATEST);
To delete a persistent object, an application uses the C++ delete status assign-root(PERSISTENTJD root-id);
operator. The delete operator invokes the PERSISTENT virtual PERSISTENT *get-root();
destructor after the descendants’ virtual destructors. The PER- status initialize(char *bootstrap-file);
SISTENT virtual destructor immediately reclaims the space and 1;
the object identifier of the persistent object being deleted. PC++
employs the C++ heap space management model, and assumes PC++ supports long transaction by providing a checkpoint and a
that the application programmer is responsible for assuring that checkin function. The checkpoint function saves all new and
there is no dangling persistent pointer. The PERSISTENT con- modified objects to a user-specifiable file in case there is a sys-
structor and destructor for persistent C++ objects have the same tem failure. The checkin function requests the database server to
semantics as those of the C++ malloc and free library functions, perform a transaction commit, and sends all new and modified
respectively, for non-persistent objects. Thus, PC++ does not objects to the server for transaction processing. PC++ starts a
employ a garbage collection mechanism (as there is no garbage transaction automatically after a transaction commit or abort. To
collection facility in C++). abort a transaction, an application invokes the abort function
which resets the workspace.
Pointers to persistent object are automatically converted to per-
sistent object identifiers by conversion functions. Thus, persis- A PC++ database is an extension to a C++ application’s heap
tent object pointers are used syntactically the same as non- space. The identifier-to-object mapping function provides
persistent object pointers. For example, the following C++ state- implicit indexing on object identifiers which is critical for navi-
ments set node n2 as node nl’s leftmost child: gation-based applications. Currently, PC++ does not provide any
explicit indexing capability for accessing objects based on
attributes or names. PC++ assumes that all persistent objects are
reachable from a roof object. The WORKSPACE class provides
111
the function assign root for assigning an object as the root find the object through the object index table on the page. The
object, and the function get-root for accessing the root object operator -> returns a pointer to the object.
from which all objects can then be accessed through their iden-
tifiers. The persistent heap space consists of pages grouped in clusters.
When allocating an object in a specified cluster, the cluster man-
The WORKSPACE class provides an initialization function for ager finds an existing page with enough free space for the object.
starting up a C++ application. The initialization function accepts If there is not enough free space for the object in the specified
a UNIX path to a database bootstrap file. The database bootstrap cluster, the cluster manager allocates a new page for the cluster.
file contains the location of the followings: the database server, The cluster manager then allocates space to the object on the
the database files, the checkpoint file, and the local persistent page, and assigns to the object an identifier whose page number
heap swap file. Both the application workspace manager and the is that of the page and whose object number is the next free index
database server use the bootstrap file for starting up. in the page’s object index table.
2.2 The Application Workspace Manager The variable-size page index table is a hash table whose entries
are pairs of page numbers and pointers to pages in the heap
An application integrates with the PC++ application workspace space. The page index table grows as the cluster manager allo-
manager and runs as a database client process. The workspace cates new pages.
manager implements the PC++ programming interface, manages
local persistent objects, and interfaces to the database server The application workspace manager provides and manages addi-
through a stream socket interprocess communication channel on tional swap space for persistent objects because the application’s
behalf of the application. Figure 2 depicts the structure of the process swap space may exhaust before the application commits
application workspace manager. a long transaction. The application workspace manager employs
a least recently used (LRU) algorithm to swap pages. the algo-
t object mapping
rithm freely discards pages containing only old unmodified
objects since these pages can always be refetched from the data-
base. For pages containing modified or new objects, the algo-
rithm swaps them in from and out to the persistent swap file.
, t checkin/lctch
112
pages are logically liked together to facilitate new version cre- produce persistent C++ objects with average size of 30 bytes.
ation (during checkin) and page reconstruction (during fetch-
ing). Each SMARTsystem benchmark consists of invoking a series of
SMARTsystem tool commands simulating code development
The server employs a reverse delta reconstruction algorithm to and maintenance activities (text navigation, editing, parsing,
reconstruct any version of a database. When a client requests a symbols creation and lookup, and call graph browsing) on a pro-
page of a previous database version, the server applies the ver- duction C program. The benchmark showed that PC++ takes 15
sion objects of a page to the base page. Since multiple clients microseconds to map an object identifier to the object on a Sun
may request the same page of different versions, the server 3/60 machine; this translates to roughly 66,000 object traversals
caches both base and version pages. The server uses an LRU per seconds (providing that they’re already in memory). The
algorithm for replacing pages in the page cache. benchmark also showed that PC++ takes 100 microseconds to
create a new object (roughly 10,000 new objects per second).
2.4 Concurrency Control and Crash Recovery
The SMARTsystem performs well when the working set
PC++ provides optimistic concurrency- concurrent object cre- IDen of a benchmark fits the available main memory. For
ation, access, and modifications by multiple clients- through example, navigating and editing with the SMARTsystem syntax-
mechanisms for unique object identifier generation, and work- sensitive editor is as fast as using Emacs[StaSl]. However, some
space version updating and conflict resolution. By providing incremental global symbol generation appears sluggish because
optimistic concurrency, PC++ does not need to provide a locking it traverse more objects than the available memory can hold. Pro-
facility. filing the later case showed that the system spent 90% of its time
waiting in the UNIX read system call [Rit74], and the remaining
The application workspace manager generates unique identifiers 10% of its time in executing tool functions, other system calls,
by requesting the database server to dynamically allocate unique or the identifier-to-object mapping function.
sets of contiguous identifiers on demand. When a client creates
a new object and exhausted the previously allocated set of Heuristically grouping objects (for example, objects belonging
unique identifiers, the client sends a message to the server to the parse tree, or the symbol table) using the static clustering
requesting for a new set of unique identifiers. The server returns mechanism improved SMARTsystem performance from 10% to
a set of unique page numbers that the client can freely use to allo- 20%. The improvement comes from increased locality, and less
cate new objects. internal fragmentation because objects of the same type (and
thus same size) are grouped together. On the average, 6% of a
PC++ enables a client to validate the effects and resolve conflicts page’s space is overhead (header information and object index
(if there are any) resulting from other clients’ checkins by pro- table), and less than 2% of a page’s space is unoccupied by
viding an update workspace function where all the old objects at objects.
version V in the client’s workspace are replaced by their latest
version. When a client successfully checkins a new database ver- 3.0 Related and Future Works
sion, other clients are neither notified nor updated immediately.
Consequently, a client who starts out at version V of the database The PC++ programming model is similar to that of PS-Algol
may not always able to checkin on top of version V. And there- tAtk831. Both systems view the persistent heap space as part of
fore, the client needs to use the update workspace function to virtual memory, and assume that persistent objects are reachable
validate the semantic correctness of the new database version from a root object. PC++ applications, like the E system [Ric89]
before the client commits its checkin. applications, use C++ to define both persistent types and their
methods. Thus, PC++ and E smoothly integrate their object-ori-
PC++ automatically detects failures and recover through a roll- ented database systems with the C++ programming language.
back mechanism. If there is a failure during checkin, the server Other systems such as the Vbase system [And871 provide one
rollbacks to the latest consistent version before the checkin language for defining types and a host language (some superset
started. Rollback consists looking for base pages with the new of the C programming language) for definirig methods. Thus,
version number, and reconstructing them back to the previous Vbase provide more flexibility for supporting other host lan-
consistent version. Checkin processing facilitates rollback by guages.
first constructing the version object (for the previous version) of
a page and writing the version object out to disk before overwrit- Some SMARTsystem tools produce intermediate objects in
ing the base page with the new version. Thus, the server can computing some final results. After obtaining the final results,
always reconstruct the previous version of a base page. here is no need to save the intermediate objects to the database.
By making persistency independent from type, PC++ can allow
2.5 Performance Characteristics applications to specify whether an object of a class deriving from
the class pERSISTENT is persistent. Specification of persis-
PC++ serves as the foundation for the SMARTsystem. PC++ tency can be done at object creation time through an argument of
performance is characterized by numerous SMARTsystems the new operator of the class PERSISTENT [SQW. PS-Algo*
benchmarks that produce databases ranging in sizes from hun- and the ODE system [Arg89] provide strict orthogonal persis-
dreds of kilobytes to hundreds of megabytes. These benchmarks tence.
113
Like in Avalon/C++ [Det88], in PC++ persistence types are [And871 T. Andrews, and C. Harris, “Combining Language and
defined by inheriting from a pre-defined type that supplies per- Database Advances in an Object-oriented Development
sistent properties. PC++ does not employ compilation technol- Environment”, Proceedings of the Conference on Object-
ogy, such as employed by the E system, to support persistency. Oriented Programming Systems, Languages, and Applica-
tions, Orlando, FL, October 1987.
An object identifier in PC++ is a physical address. Thus, an
object’s physical location (the page location) can be computed [Atk83] M. P. Atkinson, I? J. Bailey, W. P. Cockshott, K. J.
from its identifier without using a lookup table. Consequently, Chisholm, and R. Morrison, “An Approach to Persistent Pro-
the identifier-to-object mapping mechanism is simple and highly gramming”, Computer Journal, 26(4), 360-365, 1983.
efficient. The object size limitation does not hamper the con-
struction of the SMARTsystem in any way. However, applica- men681 P. Denning, “The Working Set Model for Program
tions in image processing, graphics, or multimedia may require Behavior”, Communications of the ACM, 11(5), 323-333,
support for large objects with arbitrary sizes. PC++ can support May 1968.
arbitrarily-sized objects by dividing the object name space into
two name spaces: one for small objects (less than 4K bytes) and pet881 D. D. Detlefs, M. I? Herlihy, and J. M. Wing, “Inherit-
one for larger objects. ance of Synchronization and Recovery Properties in Avalon/
C++“, IEEE Computer, 21(12), 57-69, December 1988.
A more flexible object clustering mechanism should include
dynamic clustering. With dynamic clustering, the application [Fis88] C. N. Fischer, R. J. LeBlanc, Jr., Crafting a Compiler,
workspace manager creates new clusters at run time on demands Benjamin/Cummings, 1988.
by the application. The existing unique object identifier genera-
tion mechanism can be generalized for generating dynamic clus- @as821 R. Haskings, and R. Lorie, “On Extending Functions of
ter identifiers. With both static and dynamic clustering, PC++ a Relational Database System”, Proceedings of the 1982
can support all object clustering options as described in ACM-SIGMOD International Conference on the Manage-
[Kim90]. ment of Data, Orlando, FL, June 1982.
114
Sun 3/60 is a trademark of Sun Microsystems, Inc.
DEC is a trademark of Digital Equipment Corporation
Apollo is a trademark of Hewlett-Packard Corporation
UNIX is a trademark of AT&T Bell Laboratories
SMARTsystem is a trademark of proCASE Corporation
115
.i -. - -.- -. --