Professional Documents
Culture Documents
Jason McHugh, Serge Abiteboul, Roy Goldman, Dallan Quass, Jennifer Widom
Stanford University
fmchughj,abitebou,royg,quass,widomg@db.stanford.edu
http://www-db.stanford.edu/lore
&1
Member Project
Member Project
Member
Member
&2 &3 &4 Project &5 &6
&7 &8 &9 &10 &11 &12 &13 &14 &15 &16
"Clark" "Smith" 46 "Gates 252" "Jones" 28 "Lore" "Tsimmis"
API
Results Lore
System
Queries Query Compilation
QUERY erful bulk loading facility that allows merging new data into
select M.Name, an existing database. There is also a means of attaching
( select M.Project.Title variables to certain objects on paths, or even to the labels
where M.Project.Title != "Lore" ) or paths themselves (in the style of the attribute and path
from DBGroup.Member M variables of [CACS94]), which yields a rich mechanism for
where M.Project.Title = "Lore" structure discovery. Such features, described in [AQM+ 96],
RESULT
are beyond the scope of this paper.
Member
Name "Jones" 3 System Architecture
Title "Tsimmis"
Over a larger database, this query would construct one The basic architecture of the Lore system is depicted in Fig-
Member object for each group member in the result, con- ure 2. This section gives a brief introduction to the com-
taining the member's Name and a Title for each qualifying ponents that make up Lore. More detailed discussions of
project. individual components appear in subsequent sections.
A Lore database is modied using Lorel's declarative up- Access to the Lore system is through a variety of applica-
date language, as in the following example: tions or directly via the Lore Application Program Interface
(API). There is a simple textual interface, primarily used
by the system developers, but suitable for learning system
update P.Member += functionality and exploring small databases. The graphical
( select DBGroup.Member interface, the primary interface for end users, provides pow-
where DBGroup.Member.Name = "Clark" ) erful tools for browsing query results, a DataGuide feature
from DBGroup.Project P for seeing the structure of the data and formulating sim-
where P.Title = "Lore" or ple queries \by example," a way of saving frequently asked
P.Title = "Tsimmis" queries, and mechanisms for viewing the multimedia atomic
types such as video, audio, and java. These two interface
This update adds all group members named Clark as modules, along with other applications, communicate with
members of the Lore and Tsimmis projects. Intuitively, the Lore through the API. Details of interfaces are discussed in
from and where clauses are rst evaluated, providing bind- Section 6.
ings for P . For each binding, the expression \P.Member +=" The Query Compilation layer of the Lore system consists
species to add Member edges between P and every object of the parser, preprocessor, query plan generator, and query
returned by the subquery. In general, the update language optimizer. The parser accepts a textual representation of a
supports the insertion and removal of edges, the creation of query, transforms it into a parse tree, and then passes the
new vertices (objects), and the modication of atomic values parse tree to the preprocessor. The preprocessor handles the
and name assignments. (As mentioned earlier, object dele- transformation of the Lorel query into an OQL-like query
tion is by unreachability, i.e., garbage collection, so there is (recall Section 2.2). A query plan is generated from the
no explicit delete operation.) transformed query and then passed to the query optimizer.
Lorel also oers grouping and aggregate functions in the In addition to doing some (currently simple) transformations
style of OQL, external functions and predicates, and a pow- on the query plan, the optimizer also decides whether the
use of indexes is feasible. The optimized query plan is then processing, as described in, e.g., [Gra93]. With iterators,
sent to the Data Engine layer. execution begins at the top of the query plan, with each node
The Data Engine layer houses the OEM object manager, in the plan requesting a tuple at a time from its children and
query operators, external data manager, and various utili- performing some operation on the tuple(s). After a node
ties. The query operators execute the generated query plans completes its operation, it passes a resulting tuple up to its
and are explained in Section 4. The object manager func- parent. For many operators, an iterator approach avoids
tions as the translation layer between OEM and the low- creation of temporary relations.
level le constructs. It supports basic primitives such as The \tuples" we operate on are Object Assignments, or
fetching an object, comparing two objects, performing sim- OAs. An OA is a simple data structure containing slots cor-
ple coercion, and iterating over the subobjects of a complex responding to range variables in the query, along with some
object. In addition, some performance features, such as a additional slots depending on the form of the query. For
cache of frequently accessed objects, are implemented in this example, the OA slots for the example query are shown in
component. The index manager, external data manager, Figure 4. Intuitively, each slot within an OA will hold the
and DataGuide manager are discussed in Sections 4.3, 5.1, oid of a vertex on a data path currently being considered
and 5.2 respectively. Finally, bulk loading and physical ob- by the query engine. For example, if OA1 holds the oid for
ject layout on disk are discussed in Section 4.5. member \Smith", then OA2 and OA3 can hold the oids for
one of Smith's Oce subobjects and one of his Age subob-
jects, respectively. Note that at a given point during query
4 Query and Update Processing in Lore processing, not all slots of the current OA necessarily con-
tain a valid oid. Indeed, the goal of query execution is to
As depicted in Figure 2, the basic steps that Lore follows build complete OAs. Once a valid OA reaches the top of the
when answering a query are: (1) the query is parsed; (2) the query plan, oids in appropriate slots are used to construct a
parse tree is preprocessed and translated into an OQL-like component of the query result.
query; (3) a query plan is constructed; (4) query optimiza-
tion occurs; and (5) the optimized query plan is executed. 4.2 Query Operators
Query processing in Lorel is fairly conventional, with some
notable exceptions: We now brie
y explain the query operators appearing as
Because of the
exibility of Lorel, the preprocessing of
nodes in Figure 3; query operators not appearing in this
the parse tree to produce the OQL-like query is com- plan are discussed later. Each operator takes a number of
plex. We+have implemented the specication described arguments, with the last argument being the OA slot that
will contain the result of the operation. Exceptions to this
in [AQM 96] and we will not discuss the issue further are the Select and Project operators, which do not have a
here. target slot.
Although the Lore engine is built around standard op- The Scan operator, which is used in several leaf nodes,
erators (such as Scan and Join), some take an original is similar in functionality to a relational scan. Here, how-
avor. For example, Scan may take as argument a gen- ever, instead of scanning the set of tuples in a relation, our
eral path expression, and therefore may entail complex scan returns all oids that are subobjects of a given object,
searches in the database graph. following a specied path expression. The Scan operator is
dened as:
A unique feature of Lore is its automatic coercion of
atomic values. Coercion has an impact on the imple-
mentation of comparators (e.g., = or <), but more Scan (StartingOASlot, Path_expression,
TargetOASlot)
importantly we shall see that it has important eects
on indexing. Scan starts from the oid stored in the StartingOASlot, and
The result of a Lorel query is always a set of OEM ob- at each iteration places into the TargetOASlot the oid of
jects, which become subobjects of a newly created Result the next subobject that satises the Path expression, until
object. The Result object is returned through the API. The there are no more matching subobjects. Note that in most
application may then use routines provided by the API to cases Path expression consists of a single label, however it
traverse the result subobjects and display them in a suitable may be a complex data structure representing an arbitrary
fashion to the user. component of a general path expression (recall Section 2.2),
essentially a regular expression. For the regular expressions
To illustrate the sequence of steps that Lore follows when that we currently support, it is sucient for the Scan op-
answering a query, we will trace an example through query erator to keep a run-time stack of objects visited in order
planning and then discuss the operators used to execute the to match the Path expression. However, for general regu-
query plan. Consider the query introduced in Section 2, lar expressions a nite-state automaton is required. Recall
whose OQL-like version is: that to avoid innite numbers of matching paths, we match
acyclic paths in the data only. Currently, the Scan operator
select O can avoid traversing a cycle by ensuring that no oid appears
from DBGroup.Member M, M.Office O more than once on its stack. Since the stack grows no larger
where exists A in M.Age : A > 30 than acyclic paths in the database, we do not expect its size
to be a problem.
The initial query plan generated for this query is given in As a simple example of the Scan operator, consider the
Figure 3. Before discussing the various operators in this following node from our example plan:
plan, it is necessary to rst understand the
ow of control
and the auxiliary data structures used when executing such Scan (OA1, "Office", OA2)
a plan.
This iterator will place into slot OA2, one at a time, all
4.1 Iterators and Object Assignments Oce subobjects of the object appearing in slot OA1. Note
the special form for the lower left Scan:
Our query execution strategy is based on familiar database
operators. We use a recursive iterator approach in query Scan (Root, "DBGroup", OA0)
Project
(OA2)
Join
Select
Join (OA4= TRUE)
Scan Aggr
Join (OA1,"Office",OA2) (Exists, OA3, OA4)
Scan
(OA1,"Age",OA3)
Instead of using an OA slot as the rst argument, the value oid for the new set of objects within the target slot in the
Root, which is a system-known object from which all names OA. Finally, the Groupby operator handles (sub)queries that
(such as DBGroup) can be reached, is used. include a groupby expression.
The Join, Project, and Select nodes are nearly identical To give a more in-depth
avor of query plan construction,
to their corresponding relational operators. Like a relational we consider a second query. This query asks for the names
nested-loop join, the Join node coordinates its left and right and the number of publications for each database group3
children. For each partially completed OA that the left child member who is in the Computer Science (\CS") department.
returns, the right child is called exhaustively until no more
new OAs are possible. Then the left child is instructed to
retrieve its next (partial) OA. The iteration continues until select M.Name, count(M.Publication)
the left side produces no more OAs. The Project node is used from DBGroup.Member M
to limit which objects should be returned by specifying a set where M.Dept = "CS"
of OA slots, while the Select node applies a predicate to the
object identied by the oid in the OA slot specied. It is important to note that both M.Name and M.Publication
The Aggregation node (shown in Figure 3 on the right appearing in the select clause are sets of objects, and in
side of the query plan as Aggr) is used in a somewhat novel the general case are represented by subqueries. Thus, the
way, since it implements quantication as well as aggrega- OQL-like translation of this query is:
tion. At a high level, the aggregation node calls its child
exhaustively, storing the results temporarily or computing select (select N from M.Name N),
the aggregate incrementally. When the child can produce count(select P
no more valid OAs, a new object is created whose value is from M.Publication P)
the nal aggregation; this new object is identied within the from DBGroup.Member M
target OA slot. In the example shown, the aggregation node where exists D in M.Dept : D = "CS"
adds to the target slot (OA4) the result of the aggregation,
which here is the value true if the existential quantication To see the construction of the query plan, refer to Figure 5.
is satised (an object exists in OA3) and false otherwise. The subtree for the from clause is constructed rst. Each
Filtering of OAs whose quantication is true occurs in the simple path expression (or range variable) appearing within
Select node immediately above the aggregation node. Note the from becomes a Scan node. If several of these exist,
that the exists aggregation operator \short circuits" when it then a left-deep tree of Scan nodes with Join nodes con-
nds the rst satisfying OA, while other aggregation opera- necting them is constructed. At the top of the from subtree
tors may need to look at all OAs. a Join node connects the from clause with the subtree for
There are four other primary query operators in Lore, the where clause. For where, each exists becomes a Select,
in addition to operators for plans that use indexes (see Sec- Aggr, and Scan node, and each predicate becomes a Select
tion 4.3): SetOp, ArithOp, CreateSet, and Groupby. SetOp node. Finally, for the select clause, another Join node is
handles the Lorel set operations Union, Intersect, and Ex- added to the top of the tree, and the query plan subtree for
cept. Likewise, ArithOp handles arithmetic operations such the select clause becomes the right child.
as addition, multiplication, etc. CreateSet is used to pack- Let us further consider the subtree for the select clause.
age the results of an arbitrary subquery before proceeding; The plans for the two expressions constituting the select
it calls its child exhaustively, storing each oid returned as clause are combined via union (using the SetOp operator).
part of a newly created complex object. After the child has
produced all possible OAs, the CreateSet operator stores the 3 Several of our group members are in the Electrical Engineering
department.
Join
Join Select
Join (OA3
= TRUE)
Scan Scan
(Root,"DBGroup",OA0) (OA0,"Member",OA1) Aggr
Scan Scan (Exists, OA2, OA3)
(Root,"DBGroup",OA0) (OA0,"Member",OA1)
From clause Select
From and Where clauses (OA2 = "CS")
Scan
(OA1,"Dept",OA2)
Project
(OA7)
SetOp
Join (Union,OA5,
Select OA6, OA7)
Join (OA3
= TRUE)
CreateSet Aggr
Aggr (OA4, OA5) (Count, OA6, OA7)
Scan Scan (Exists, OA2, OA3)
(Root,"DBGroup",OA0) (OA0,"Member",OA1) Scan
Scan
Select (OA1,"Publications",
(OA1,"Name",OA4)
(OA2 = "CS") OA6)
Scan
(OA1,"Dept",OA2)
Thus, each (complex) object in the result contains the set arg2
of all Name subobjects of a Member (the left subtree of the arg1 string real int
Union), together with the count of all publications for that string ? string ! real both ! real
member. (In Lorel, a select list indicates union, while or- real string ! real ? int ! real
dered pairs would be achieved using a tuple constructor op- int both ! real int ! real ?
erator [AQM+ 96].) The CreateSet operator, described ear-
lier, is needed to obtain all Name children of a given member
before returning its object assignment up the query tree. A Table 1: Coercion for basic comparison operators
CreateSet operator is not used in the right subtree, however,
since the Aggregation operator by denition already calls its
subquery to exhaustion (and then applies the aggregation are useful for range (inequality) as well as point (equality)
operator, in this case count) before continuing. queries, they are implemented as B+-trees. Lindexes, on
the other hand, are used for single object lookups and thus
4.3 Query Optimization and Indexing are implemented using linear hashing [Lit80].
Used in conjunction, these two kinds of indexes enable
The Lore query processor currently implements only a few query processing in Lore to avoid the standard Scan opera-
simple heuristic query optimization techniques. For exam- tor. Before examining query plans that exploit indexes, we
ple, we do push selection operators down the query tree, and rst take a more detailed look at Vindexes and how they
in some cases we eliminate or combine redundant operators. handle the coercion present in Lorel.
In the future, we plan to consider additional heuristic op-
timizations, as well as the possibility of truly exploring the
search space of feasible plans. 4.3.1 Value Indexes
Despite the lack of sophisticated query optimization, Lore Value indexing in Lore requires some novel features due to
does explore query plans that use indexes when feasible. In its non-strict typing system. When comparing two values
a traditional relational DBMS, an index is created on an of dierent types, Lore always attempts to coerce the val-
attribute in order to locate tuples with particular attribute ues into comparable types. Currently, our indexing system
values quickly. In Lore, such a value index alone is not suf- deals with coercions involving integers, reals, and strings
cient, since the path to an object is as important as the only. Table 1 illustrates the coercion that Lore performs for
value of the object. Thus, we have two kinds of indexes in these types; note that we simplify the situation by always
Lore: a link (edge) index, or Lindex, and a value index, or coercing integers to reals. Now, in order to use Vindexes
Vindex. A Lindex takes an oid and a label, and returns the for comparisons, Lore must maintain three dierent kinds
oids of all parents via the specied label. (If the label is of Vindexes:
omitted all parents are returned.) The Lindex essentially
provides \parent pointers," since they are not supported by 1. A String Vindex, which contains index entries for all
Lore's object manager. A Vindex takes a label, operator, string-based atomic values (string, HTML, URL, etc.).
and value. It returns all atomic objects having an incom-
ing edge with the specied label and a value satisfying the 2. A Real Vindex, which contains index entries for all
specied operator and value (e.g., < 5). Because Vindexes numeric-based atomic values (integer and real).
Project
(OA3)
Join
Scan
Join
(OA1,"Office",OA3)
Vindex Join
("Age", >, 30, OA2)
Once Named_Obj
(OA1) ("DBGroup", OA0)
Lindex Lindex
(OA2,"Age",OA1) (OA1,"Member",OA0)
For each label over which a Vindex is created, three separate A query plan using indexes is shown in Figure 6. This plan
B+-trees, one for each type, are constructed. introduces four new query operators: Vindex, Lindex, Once,
When using a Vindex for a comparison (e.g., nd all Age and Named Obj. The Vindex operator, which appears as
objects > 30), there are two cases to consider, based upon the left child of the second Join operator, iteratively nds
the type of comparison value: all atomic objects with value less than 30 and an incoming
edge labeled Age, placing their oids in slot OA2. The Lindex
1. If the value is of type string, then: (i) do a lookup in operator that appears below the Once operator iteratively
places into OA1 all parents of the object in OA2 via an Age
the String Vindex; (ii) if the value can be coerced to edge. (Since OEM data may have arbitrary graph structure,
a real, then also do a lookup for the coerced value in the object could potentially have several parents via Age, as
the Real Vindex. well as parents via other labels.) Since Age is existentially
2. If the value is of type real (or integer), then: (i) do a quantied in the query, we only want to consider each par-
lookup in the Real Vindex; (ii) also do a lookup in the ent once, even if it has several Age subobjects; this is the
String-coerced-to-real Vindex. purpose of the Once query operator. The second Lindex
operator nds all parents of the OA1 object via a Member
edge, placing them in OA0. Since we want the object in
4.3.2 Index Query Plans OA0 to be the named object DBGroup, the Named Obj op-
erator checks whether this is so. Once we have traversed
If the user's query contains a comparison between a path up the database using index calls and constructed a valid
expression and an integer, real, or string (e.g., \DBGroup. OA, we nally use a Scan operator to nd all Office sub-
Member.Age > 30"), and the appropriate Vindexes and Lin-
objects, which are returned as the result via the topmost
dexes exist, then a query plan that uses indexes will be gen- Project operator.
erated. For simplicity, let us consider only queries in which Currently, for processing where clauses, Lore only consid-
the where clause consists of one such comparison. ers subplans that are completely index-based (i.e., bottom-
Query plans using indexes are dierent in shape from up), such as the one discussed here, or subplans that are
those based on Scan operators. Intuitively, index plans tra- completely Scan-based (i.e., top-down), such as the one in
verse the database bottom-up, while scan-based plans per- Figure 3. An interesting research topic that we have just be-
form a top-down traversal. An index query plan rst locates gun to address is how to combine both bottom-up (index)
all objects with desired values and appropriately labeled in- and top-down (Scan) traversals. When the two traversals
coming edges via the Vindex. A sequence of Lindex oper- reach a predened \meeting point", the intersection of the
ations then traverses up from these objects attempting to objects discovered by the index calls and the Scan operators
match the full path expression in the comparison.4 Note identify paths that satisfy the where clause. The appropri-
that once we have an OA that satises the where clause, ate meeting point depends on the \fan-in" and \fan-out" of
it may be necessary to use one or more Scan operations to the vertices and labels in the database, and requires the use
nd those components of the select expression that do not of statistical information.
appear in the where clause.
Let us consider the following query (in its OQL-like form), 4.4 Update Query Plans
rst introduced in Section 2:
Thanks to query plan modularity, we were able to handle
arbitrary Lorel update statements by adding a single opera-
tor, Update, to the query execution engine. We illustrate the
4 An obvious alternative is to use full path indexes in place of the
approach with our example update query from Section 2.2:
Lindex. Path indexes would be (much) more expensive to maintain
but (much) faster at query time. Path indexes are discussed in more
detail in [GW97].
Update
(Create_Edge, OA1,
OA5, "Member")
Query plan to find all projects with Query plan to find all members
the title "Lore" or "Tsimmis", with name "Clark", results
results placed in OA1 placed in OA5
Name
Name Publications Publications
Fetched
Subgraph
containing "Pub_Fetch.o" 120
all of Jim's Type Type Query Label
Publications
6 Interfaces to Lore
As shown in Figure 2, the Lore Application Programming
Interface (API) provides a gateway between Lore and any
user interface or client application. It is used, for instance,
by the system's textual interface, which passes user com-
mands to Lore and presents query results in a hierarchical
display. After summarizing the API, we describe a Java-
based Web interface that makes Lore simple to use in an
interactive fashion. Figure 10: A DataGuide in Java
6.1 Application Programming Interface
then presented with a Java program featuring a DataGuide,
The Lore API is composed of a small collection of C++ as described in Section 5.2. Users can quickly and easily
classes. For any client, Lore is simply viewed as a single browse the DataGuide to explore the structure of the under-
library, accessible through the API classes and methods de- lying database. Through the Web interface, the user may
clared in a single header le. (Eventually we hope to move submit a textual Lorel query or select a sample prewrit-
Lore toward a traditional client-server model.) At the high- ten query. Furthermore, in a style similar to Query-By-
est level, the API allows a client program to connect to a Example [Zlo77], queries may be formulated and submitted
Lore database, submit queries and commands, and process without any knowledge of Lorel by using the DataGuide
query results. to select path expressions and specify selection conditions.
Currently, DataGuide queries can express Lorel queries with
Any session with a Lore database is encapsulated in an simple path expressions and a where clause that is conjunc-
instance of the LoreConnection class. A client will rst tive with respect to unique path expressions.
Connect to a specic database (and eventually Disconnect
when nished). Clients submit Lorel queries using the As an example, Figure 10 is a screen snapshot of the
Submit function. Submit is also used for other Lore sys-
Java presentation of a DataGuide. This DataGuide summa-
tem commands, such as index creation and updates. When rizes an existing database for Stanford's Database Group,
called with a Lorel query, Submit returns the query result as similar in structure to (but much larger than) the sample
a LoreOem object. A LoreOem instance initially contains only database used throughout this paper. Arrows accompany
an oid; the actual value is fetched from the database on de- complex objects and are used to expand or collapse sub-
mand. For atomic objects, a client may request the Type and objects. Also, a diamond is associated with each displayed
Value of the object. To traverse the subobjects of a complex
label, corresponding to a unique path expression from the
object, a client instantiates a LoreIterator. Each succes- root. When the user clicks on a diamond, a dialog box
sive call to the iterator's Next method returns a dierent pops up, from which the user may view sample values, se-
LoreOem subobject and its Label. By nesting LoreIterator
lect the path expression for the query result, or add ltering
instances, a client may perform arbitrary traversals of OEM conditions. When the user selects a path expression, the
objects. corresponding diamond is rendered in a dierent color. Fil-
tering conditions are displayed next to the corresponding la-
bel. The DataGuide shown in Figure 10 represents a query
6.2 Web Interface to select all group members that are PhD students, have a
research interest in semistructured data, and have been at
A user connects to our graphical Web interface by visit- Stanford more than one year but less than six. When the
ing a specic URL and choosing a database. The user is user clicks Go, the Java program automatically generates an
equivalent Lorel query and sends it to Lore to be processed.
Regardless of how a query is submitted, the interface dis- (child) pointers. We plan to compare the performance of a
plays query results in HTML, in a hierarchical format that storage manager with inverse pointers to that of our current
is easy to read and navigate. By formatting OEM objects in approach based on Lindexes. We also plan to consider us-
HTML, we can leverage Web browser support for our multi- ing path indexes in place of the Lindex. Interestingly, the
media data (such as gif les, audio, or video). To make the functionality of path indexes is incorporated easily into the
hierarchical display of OEM more readable, we perform two DataGuide, as discussed in [GW97].
small presentation transformations. First, if several objects Currently all \expansions" of path expressions in query
share the same label, we display the label only once and paths are done at run-time. However, for some classes of
show the values of the objects underneath it. For example, path expressions, it is possible to use information in the
if a query result contains ten objects, each with the same DataGuide to expand the regular expressions to all pos-
label Project, we create an HTML page that begins with sible completions at query compilation time. We plan to
a single header Projects, followed by the values for all ten explore the compile-time approach and compare its perfor-
projects. Second, we present complex OEM objects as ac- mance against the run-time approach we now take.
tive hyperlinks. Clicking on the link brings up a new HTML
page showing the subobjects of that complex object.
7.3 New Functionality
7 System Status and Future Work We are in the process of implementing transaction support
for concurrency control and recovery. As with other aspects
As of June 1997, the Lore system is functional and robust for of Lore, the semistructured nature of Lore's data is requiring
a large subset of the Lorel language. It consists of approxi- us to rethink some aspects of traditional solutions.
mately 60,000 lines of C++ code. Some language features, In the user interface area, we plan to increase the expres-
such as external predicates and functions, are still under siveness of DataGuide queries toward the full power of Lorel.
implementation. Also, general path expressions are not yet In addition, to follow the recent trend of enabling database
implemented in their full generality, although a substantial systems to dynamically generate customized HTML displays
and very useful subset is. of query results [Gaf97, BDK92], we plan to investigate more
A Lore server with sample databases is available for pub- sophisticated techniques for customizing the presentation of
lic use. Users can submit queries and can experiment with OEM objects in a Web environment.
features such as DataGuides and result browsing. To visit In a companion project, we have extended OEM and
our on-line demo, see http://www-db.stanford.edu/lore. Lorel in order to treat changes to the data as a rst-class
In addition, Lore system binaries for several platforms are concept [CAW97], similar to the Heraclitus system that op-
available through the Web page. erates on structured data [GHJ96]. Currently we are imple-
We are considering many possible enhancements and ex- menting this model and language on top of the Lore system.
tensions to Lore, as follows. Initial work is underway to dene both view and trigger
mechanisms appropriate for semistructured data, and to im-
7.1 Compatibility and Interoperability plement them in Lore. (See [AGM+ 97] for a discussion of
views in the context of OEM and Lorel.) Finally, because
As mentioned in Section 2 and covered in detail in [AQM+ 96], many applications appropriate for a semistructured DBMS
OEM and Lorel can be translated to ODMG and OQL such as Lore include a signicant amount of text data, we
[Cat94]. In the translation, OEM objects are represented plan to incorporate a special text type along with a full-text
by ODMG objects, while Lorel queries are transformed into indexing system into Lore.
pure OQL queries that use method calls to handle Lorel
features such as type coercion and general path expressions. Acknowledgments
As a proof-of-concept for the translation, we have imple-
menting Lorel on top of the O2 object-oriented database
management system [BDK92]. Note that this implemen- For their many contributions to the Lore project and system
tation enables the storage of semistructured (OEM) and implementation we are grateful to (alphabetically) Kevin
structured (ODMG) data in a single repository, providing Haas, Matt Jacobsen, Tirthankar Lahiri, Qingshan Luo,
a useful setting in which we are studying integration of the Svetlozar Nestorov, Anand Rajaraman, Hugo Rivero,
two data models. We also plan to explore how Lorel could Michael Rys, and Takeshi Yokokawa. We also thank many
be translated to SQL3 and thus implemented on top of an other members of the Stanford Database Group for fruitful
object-relational database management system. discussions about Lore and Lorel, including (alphabetically)
Sudarshan Chawathe, Joachim Hammer, Shuky Sagiv, Je
Ullman, Janet Wiener, and Jun Yang. Finally, we are grate-
7.2 Performance Issues ful to an anonymous referee for a careful reading and helpful
comments.
To date we have done little performance analysis of Lore.
There are a number of performance aspects we want to con- References
sider, such as overall performance and bottlenecks in the sys-
tem, scalability of the system to extremely large databases,
and comparing the performance of Lore against our imple- [Abi97] S. Abiteboul. Querying semistructured data. In
mentation of Lorel on top of O2 (see Section 7.1). Proceedings of the International Conference on
Database Theory, Delphi, Greece, January 1997.
There is signicant additional research to do in query +
[AGM 97] S. Abiteboul, R. Goldman, J. McHugh, V. Vassa-
optimization, including query rewriting, operation ordering, los, and Y. Zhuge. Views for semistructured data.
selecting the best use of indexes in query plans, and exploit- In Proceedings of the Workshop on Management
ing information stored in the DataGuide. of Semistructured Data, pages 83{90, Tucson, Ari-
As described in Section 4.3, we can build in Lore a link zona, May 1997.
index (Lindex) in order to quickly nd all parents of a given [AQM+ 96] S. Abiteboul, D. Quass, J. McHugh, J. Widom,
object reachable via a given label. Alternatively, we could and J. Wiener. The Lorel query language for semi-
instead augment our storage manager to store with objects structured data. Journal of Digital Libraries, 1(1),
their inverse (parent) pointers in addition to their subobject November 1996.
[BCK+ 94] G. Blake, M. Consens, P. Kilpelainen, P. Larson, [Lit80] W. Litwin. Linear hashing: a new tool for le and
T. Snider, and F. Tompa. Text/relational data- table addressing. In Proceedings of the Interna-
base management systems: Harmonizing SQL and tional Conference on Very Large Data Bases, pages
SGML. In Proceedings of the First International 212{223, Montreal, Canada, October 1980.
Conference on Applications of Databases, pages [LMR90] W. Litwin, L. Mark, and N. Roussopoulos. Interop-
267{280, Vadstena, Sweden, 1994. erability of multiple autonomous databases. ACM
[BDHS96] P. Buneman, S. Davidson, G. Hillebrand, and Computing Surveys, 22(3):267{293, 1990.
D. Suciu. A query language and optimization tech- [LSS96] L.V.S. Lakshmanan, F. Sadri, and I.N. Subrama-
nieques for unstructured data. In Proceedings of nian. A declarative language for querying and re-
the ACM SIGMOD International Conference on structuringthe Web. In Proceedings of the Sixth In-
Management of Data, pages 505{516, Montreal, ternational Workshop on Research Issues in Data
Canada, June 1996. Engineering (RIDE '96), New Orleans, February
[BDK92] F. Bancilhon, C. Delobel, and P. Kanellakis, edi- 1996.
tors. Building an Object-Oriented Database Sys- [MMM96] A.O. Mendelzon, G. Mihaila, and T. Milo. Querying
tem: The Story of O2 . Morgan Kaufmann, San the world wide web. In Proceedings of the Confer-
Francisco, California, 1992. ence on Parallel and Distributed Information Sys-
[BDS95] P. Buneman, S. Davidson, and D. Suciu. Program- tems (PDIS'96), December 1996. Full version to
ming constructs for unstructured data. In Proceed- appear in the Journal of Digital Libraries.
ings of the 1995 International Workshop on Data- [MS93] J. Melton and A.R. Simon. Understanding the New
base Programming Languages (DBPL), 1995. SQL: A Complete Guide. Morgan Kaufmann, San
[BK94] C. Beeri and Y. Kornatski. A logical query language Francisco, California, 1993.
for hypermedia systems. Information Sciences, 77, [MW93] T. Minohara and R. Watanabe. Queries on struc-
1994. ture in hypertext. In Proceedings of the Conference
[CACS94] V. Christophides, S. Abiteboul, S. Cluet, and on Foundations of Data Organization (FODO '93),
M. Scholl. From structured documents to novel pages 394{411. Springer Verlag, 1993.
query facilities. In Proceedings of the ACM SIG- [MW95] A. O. Mendelzon and P. T. Wood. Finding regular
MOD International Conference on Management of simple paths in graph databases. SIAM Journal of
Data, pages 313{324, Minneapolis, Minnesota, May Computing, 24(6), 1995.
1994. [MW97] J. McHugh and J. Widom. Integratingdynamically-
[Cat94] R.G.G. Cattell. The Object Database Standard: fetched external information into a dbms for semi-
ODMG-93. Morgan Kaufmann, San Francisco, Cal- structured data. In Proceedings of the Workshop on
ifornia, 1994. Management of Semistructured Data, pages 75{82,
Tucson, Arizona, May 1997.
[CAW97] S. Chawathe, S. Abiteboul, and J. Widom. Rep- [O'N87] Patrick O'Neil. Model 204 architecture and per-
resenting and querying changes in semistructured formance. In Proceedings of the 2nd International
data. Technical report, Stanford University Data- Workshop on High Performance Transaction Sys-
base Group, February 1997. tems (HPTS), pages 40{59, Asilomar, CA, 1987.
[CCM96] V. Christophides, S. Cluet, and G. Moerkotte. Eval- [PGGMU95] Y. Papakonstantinou, A. Gupta, H. Garcia-Molina,
uating queries with generalized path expressions. and J. Ullman. A query translation scheme for
In Proceedings of the ACM SIGMOD International rapid implementation of wrappers. In Proceedings
Conference on Management of Data, pages 413{ of the Fourth International Conference on Deduc-
422, Montreal, Canada, June 1996. tive and Object-Oriented Databases, Singapore, De-
[CM89] M.P. Consens and A.O. Mendelzon. Expressing cember 1995.
structural hypertext queries in GraphLog. In Pro- [PGMW95] Y. Papakonstantinou, H. Garcia-Molina, and
ceedings of the Second ACM Conference on Hy- J. Widom. Object exchange across heterogeneous
pertext, pages 269{292, Pittsburgh, Pennsylvania, information sources. In Proceedings of the Eleventh
November 1989. International Conference on Data Engineering,
[Com91] IEEE Computer. Special Issue on Heterogeneous pages 251{260, Taipei, Taiwan, March 1995.
Distributed Database Systems, 24(12), December [QRS+ 95] D. Quass, A. Rajaraman, Y. Sagiv, J. Ullman,
1991. and J. Widom. Querying semistructured hetero-
[Gaf97] J. Ganey. Illustra's web datablade module. geneous information. In Proceedings of the Fourth
Technical report, Informix Corporation, Febru- International Conference on Deductive and Object-
ary 1997. Available at http://www.informix.com Oriented Databases, pages 319{344, Singapore, De-
as /informix/corpinfo/zines/whitpprs/illuswp/ cember 1995.
dblade.htm. [QWG+ 96] D. Quass, J. Widom, R. Goldman, K. Haas, Q. Luo,
[GHJ96] S. Ghandeharizadeh, R. Hull, and D. Jacobs. Her- J. McHugh, S. Nestorov, A. Rajaraman, H. Rivero,
aclitus: Elevating deltas to be rst class citizens in S. Abiteboul, J. Ullman, and J. Wiener. LORE: A
a database programming language. ACM Transac- Lightweight Object REpository for Semistructured
tions on Database Systems, 21(3):370{426, 1996. Data. In Proceedings of the ACM SIGMOD Inter-
[Gra93] G. Graefe. Query evaluation techniques for large national Conference on Management of Data, page
databases. ACM Computing Surveys, 25(2):73{170, 549, Montreal, Canada, June 1996. Demonstration
1993. description.
[SK91] M. Stonebraker and G. Kemnitz. The POST-
[GW97] R. Goldman and J. Widom. Dataguides: Enabling GRES next-generation database management sys-
query formulation and optimization in semistruc- tem. Communications of the ACM, 34(10):78{92,
tured databases. In Proceedings of the Twenty- October 1991.
Third International Conference on Very Large [SL90] A. Sheth and J.A. Larson. Federated database sys-
Data Bases, Athens, Greece, August 1997. tems for managing distributed, heterogeneous, and
[KKS92] M. Kifer, W. Kim, and Y. Sagiv. Querying object- autonomous databases. ACM Computing Surveys,
oriented databases. In Proceedings of the ACM 22(3):183{236, 1990.
SIGMOD International Conference on Manage- [YA94] T. Yan and J. Annevelink. Integratinga structured-
ment of Data, pages 393{402, San Diego, Califor- text retrieval system with an object-oriented data-
nia, June 1992. base system. In Proceedings of the Twentieth In-
[KS95] D. Konopnicki and O. Shmueli. W3QS: A query ternational Conference on Very Large Data Bases,
system for the World Wide Web. In Proceedings of pages 740{749, Santiago, Chile, September 1994.
the Twenty-First International Conference on Very [Zlo77] M.M. Zloof. Qurey-by-Example: a data base lan-
Large Data Bases, pages 54{65, Zurich, Switzer- guage. IBM Systems Journal, 16(4):324{343, 1977.
land, September 1995.