You are on page 1of 12

Making Database Systems Usable

H. V. Jagadish Adriane Chapman Aaron Elkiss


Magesh Jayapandian Yunyao Li Arnab Nandi Cong Yu
{jag,apchapma,aelkiss,jmagesh,yunyaol,arnab,congy}@umich.edu

University of Michigan
Ann Arbor, MI 48109-2122

ABSTRACT database systems are used extensively, we find an army of


Database researchers have striven to improve the capability database administrators, consultants, and other technical
of a database in terms of both performance and functional- experts all busily helping users get data into and out of a
ity. We assert that the usability of a database is as important database. For almost all organizations, the indirect cost of
as its capability. In this paper, we study why database sys- maintaining a technical support team far exceeds the di-
tems today are so difficult to use. We identify a set of five rect cost of hardware infrastructure and database product
pain points and propose a research agenda to address these. licenses. Not only are support staff expensive, they also
In particular, we introduce a presentation data model and interpose themselves between the users and the databases.
recommend direct data manipulation with a schema later Users cannot interact with the database directly and are
approach. We also stress the importance of provenance and therefore less likely to try less straightforward operations.
of consistency across presentation models. This hidden opportunity cost may be greater than the vis-
ible costs of hardware/software and technical staff. Most
of us remember the day not too long ago when booking a
Categories and Subject Descriptors flight meant calling a travel agent who used magic incanta-
H.2.0 [General]; H.5.0 [General] tions at an arcane system to pull up information regarding
flights and to make bookings. Today, most of us book our
General Terms own flights on the web through interfaces that are simple
enough for anyone to use. Many enjoy the power of being
Design, Human Factors able to explore options for themselves that would have been
too much trouble to explain to an agent, such as willingness
Keywords to trade off price against convenience of a flight connection.
Search engines have done a remarkable job at directly
Database, Usability, User Interface
connecting users with the web. The simple keyword-based
query mechanism allows web users to issue queries freely; the
1. INTRODUCTION almost instantaneous response encourages the user to refine
Database technology has made great strides in the past queries until satisfactory results are found. While search
decades. Today, we are able to efficiently process ever larger engines today are still far from perfect, their huge success
numbers of ever more complex queries on ever more humon- suggests that usability is key. An information system pro-
gous data sets. As a field, we can be justifiably proud of vides value to its users through the ability to get information
what we have accomplished. into and out of the system easily and efficiently. Unfortu-
However, when we see how information is created, ac- nately, databases today are hard to design, hard to modify,
cessed, and shared today, database technology remains only and hard to query.
a bit player: much of the data in the world today remains One obvious question to ask is whether database systems
outside database systems. Even worse, in the places where can simply have a search engine sit on top of them and let
the search engine handle the interaction with the users. The
Supported in part by NSF grant IIS 0438909 and NIH
grants R01 LM008106 and U54 DA021519. We thank Mark answer, we argue, is “No.” While the search engine interface
Ackerman, Jignesh Patel and Barbara Mirel for their com- works well for the web, it does not address all the usabil-
ments on a draft of this paper. ity problems database systems are facing. This is due to
many characteristics that stem from users’ expectations for
interacting with databases, which are fundamentally differ-
ent from expectations for the web.
Permission to make digital or hard copies of all or part of this work for The first characteristic that users expect is the ability to
personal or classroom use is granted without fee provided that copies are query the database in a more sophisticated way. While users
not made or distributed for profit or commercial advantage and that copies are content with searching the web with keywords, they want
bear this notice and the full citation on the first page. To copy otherwise, to to express more complex query semantics when interact-
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee. ing with databases simply because they know databases are
SIGMOD’07, June 11–14, 2007, Beijing, China. more than collections of documents and there are structures
Copyright 2007 ACM 978-1-59593-686-8/07/0006 ...$5.00.
inside databases that can be used to answer their queries. the views, they tend to become confused and lose trust in
The use of boolean connectives, quote phrases, and colon the system. A good presentation data model can be enor-
prefixes are just the beginning of the complex semantics we mously helpful, but too many such models becomes counter-
can expect from database users. For example, when query- productive. We describe this problem of “painful options”
ing an airline database for available flights, a user will natu- in Section 4.2.
rally search on explicit travel dates, departure/arrival cities The expectation of perfect precision and recall for database
or airports, and sometimes even airlines. Such complex se- search introduces yet another usability issue: the need to is-
mantics cannot be handled by a keyword-based interface sue explanations to the user when the database system pro-
that search engines provide. duces unexpected results or fails to produce the expected
The second characteristic is that users expect more pre- results. The first case is a failure in precision—some of
cise and complete answers from database search. One of the the results produced are not relevant in the mind of the
hidden reasons behind the success of search engines is the user. The database system will need to be prepared to an-
fact that web users can tolerate a search that returns many swer questions like “where does this result come from?” and
irrelevant results (i.e., less than perfect precision) and they provenance tracking [89] is essential in this regard. Second,
typically do not care if there are relevant results missing some potentially relevant results may not be produced—a
(i.e., less than perfect recall), as long as some relevant re- failure in recall. Some of those missing results can be iden-
sults are returned to them. In the database world, however, tified by the system. For example, consider a query asking
these assumptions no longer hold. Users issuing a database for “all flights with booking fee less than $10” and assume
search expect to obtain all and only the relevant results: there are flights in the airline database that do not have
anything less than perfect precision and recall will have to a known booking fee. It is reasonable, or even desirable,
be explained to the user. not to return those flights to the user. However, the sys-
The third characteristic is that users have an expectation tem must be able to explain to the user that those flights
of structure in the result. In the case of a web search, a user are not included in the result because their booking fees are
expects simply a set of links, with almost no interrelation- unknown, and not because their booking fees exceed $10.
ship between them. In the case of a database search, a user Some missing results cannot ever be identified by the sys-
may expect to see a table, a network, a spatial presentation tem. For example, consider a query asking for “a Tuesday
on a map, or a set of points in a multidimensional space — flight from DTW to PEK on any airline” and assume the air-
the specific structure depends on the users mental model of line database does not carry flights from Northwest Airlines,
the application. which does have a flight between DTW and PEK. While the
The fourth characteristic is that users often expect to cre- system may not be aware of the existence of such a flight
ate and update databases. While search engines are all with Northwest Airlines, it still needs to explain to the user
about searching, database users frequently want to design that this omission is due to its incomplete coverage. We
new databases to store their own information or to generate describe this problem of “unexpected pain” in Section 4.3.
information to put into existing databases. The usability Another way in which a user can get unexpected results
issues in designing a database structure from scratch and from a system is when the user makes an error in specify-
creating structured information for existing databases are ing the query. Unfortunately, given how difficult it can be
completely unexplored so far and need to be addressed for to specify a query correctly, such errors are all too com-
databases to become widely used by ordinary users. mon. A fundamental difficulty in a traditionally architected
These fundamental characteristics suggest that the data- database system is that the user has a labor-intensive query
base research community needs to think about database us- construction phase followed by a potentially lengthy query
ability not just as a query interface, but as a more compre- evaluation phase, so that after the results are obtained the
hensive and integral component of the database systems. costs to reformulating the query are often high. Worse still,
Structured query models like SQL and XQuery are the errors in query construction remain uncaught until the con-
current means provided by database systems to allow users struction phase is completed and the query submitted for
to express complex query semantics. While powerful, those evaluation. This sort of write and debug cycle is something
models are fundamentally difficult for users to adopt because we computer scientists get used to as part of learning how
they require users to fully comprehend the structure of the to program. Other users need mechanisms so that they can
database (i.e., schema) and to express queries in terms of see what they are doing with the database at all times. This
that particular structure. We argue that while the logical absence of the ability to manipulate data directly is what we
schema of the database is an appropriate abstraction over call “unseen pain” and discussed in Section 4.4.
the underlying physical schema for data organization, it is Finally, the users’ expectation of being able to create a
still at a level too low to serve as the abstraction with which database from scratch and/or to create structured informa-
users should interact directly. Instead, a higher level pre- tion for an existing database introduces a whole new set of
sentation data model abstraction is needed to allow users usability issues that are currently unexplored. Requiring a
to structure information in a natural way. We describe this user to go through a rigorous schema design process before
problem of “painful relations” in Section 4.1. Because differ- any information can be stored, as in the case of many cur-
ent users have different views (i.e. presentation data models) rent database design practices, puts too much burden on the
on how information should be organized, one would natu- user and runs the risk of user forgoing a database approach
rally like those presentation data models to be personalized altogether. Similarly, requiring a user to study the exist-
for individual users. Our experience with the MiMI [51] ing database schema and restructure their data according
project, however, has told us a different story. When users to this schema before she can update the database with her
are presented with multiple ways to access the information own data also imposes an unnecessary burden. We describe
but do not understand the underlying differences between this problem of “birthing pain” in Section 4.5.
Many systems such as DBXplorer [2], BANKS [11], DIS-
Paper Outline: The rest of the paper is organized as fol- COVER [47] and an early work of Goldman et al. [40] at-
lows. We describe the current research activities related to tempt to extend the simplicity of keyword search to rela-
database usability in Section 2 and describe our ongoing tional data. This is not merely an integration of full-text
MiMI project [51] as a case study of database usability in search with relational query capability—such an approach
Section 3. In Section 4, we describe in detail the usabil- still requires knowledge of the database schema. Rather, the
ity challenges facing database systems. Section 5 presents core principle is to provide keyword search across tuples.
a research agenda that suggests some directions for future Most of these systems find tuples that individually match
research in database usability. Finally, we conclude in Sec- each keyword and then find a path of primary/foreign key
tion 6. relationships that relate the tuples. Result ranking is typ-
ically provided based on the path of primary/foreign key
relationships. A common ranking approach is to use some
2. CURRENT APPROACHES variation of PageRank [14], where documents and hyperlinks
There is evidence that human error is the leading cause are replaced with tuples and referential constraints.
for the failure of complex systems [77, 15]. To this effect, the A parallel thread of research examines keyword search in
nature of human usage has received considerable attention in XML databases. The basic problem is the same as in re-
research, e.g., there is a recent move in the software systems lational databases: given a set of keywords, we must find
community to conduct serious user studies [91]. Database data fragments that match the individual keywords. In-
usability started to receive attention more than 25 years stead of only referential constraints, XML databases mostly
ago [32]. Since then, research in database usability has been have parent/child relationships between individual elements.
following two main directions: innovative query interface The problem of determining if a data fragment is meaning-
design (including both visual and keyword-based interfaces) fully related becomes much more important. Approaches to
and database personalization. We describe recent accom- determine the meaningfulness as well as the relevance of a
plishments in those areas, as well as other related areas like data fragment have ranged from simple tree distance used by
automatic configuration and management of database sys- XSEarch [29] to XRANK’s adaptation of PageRank [41] to
tems. approaches such as Schema-Free XQuery [62, 63] that look
at either the entire database or the database schema [98] to
2.1 Visual Interface determine if a data fragment is meaningful.
Visual query specification interface is perhaps the oldest A different approach with a long history is the construc-
and most prominent field related to database usability, in tion of natural language interfaces to databases [5]. Natural
term of both academic research (e.g., QBE [100]) and in- language understanding is an extremely difficult problem,
dustrial practices (e.g., Microsoft Access and IBM Visual and commercial systems such as Microsoft English Query [12]
XQuery Builder). Many visual query interfaces have been tend to be unreliable or unable to answer questions outside
proposed to assist users in building queries incrementally, a manually predefined narrow domain [83].
including XQBE [13], MIX [71], Xing [37], VISIONARY [9], Other systems assume the user has some imperfect knowl-
Kaleidoquery [73] and QBT [85]. edge of the structure of the data as could occur with het-
Forms-based query interface design has also been receiv- erogeneous or evolving schema. This is an intermediate step
ing attention. Early works on such interfaces include [26, 36] in user complexity between pure keyword search and rigid
and provide users with visual tools to frame queries and to structural search. Research such as FleXPath [4] and the
perform tasks such as database design and view definition. work of Kanza and Sagiv [54] has focused on relaxation of
The GRIDS system [84] generates forms that allow users fully specified structural queries; other systems such as Ju-
to pose queries in a semi-IR, semi-declarative fashion; the ruXML [21] support querying and result ranking based on
Acuity project [90] developed form generation techniques similarity to a user-specified XML fragment.
for data-entry operations such as updating tables. More re- A more recent trend in keyword-based search is to analyze
cently in XML database systems, efforts have been made to a keyword query and automatically discover the hidden se-
shield users from both the details of the XQuery syntax and mantic structures that the query carries. This trend has in-
the textual representation of XML. FoXQ [1] and EquiX [28] fluenced the design of projects for both traditional database
are systems that helps users build queries incrementally by search [52] as well as web search [65].
navigating through layers of forms. Semi-automatic form
generation tools have been proposed in QURSED [79]. Fur- 2.3 Context and Personalization
thermore, [93] proposes the use of XML rather than HTML Advancements in query interface design, while making it
to represent forms, making them more reusable, scalable, easier for users to interact with the database, are mostly
machine-readable and easier to standardize. Another in- generic: they do not take into account the specific user
teresting project in UI design is DRIVE [68], a runtime and her unique problems. The notion of personalization ad-
user interface development environment for object-oriented dresses this problem by attempting to customize database
databases, which uses the NOODL data model to enable systems for each individual user [33]. This approach has
context-sensitive interface editing. received great attention in the context of websites [80, 69],
where the content and structure of the website is tailored
2.2 Text Interface to the needs of each user by analyzing usage patterns. The
Our sister field of information retrieval has had wider notion of user context and personalization has also found
adoption by normal users. This has prompted database in- interest in the information retrieval community, where the
terface designers to take the approach of providing database ranking of search results is biased using a certain personal-
systems with an IR-style keyword-based search interface. ized metric [45, 46, 53].
Database research has made advancements in accommo- experiments have varying degrees of reliability. Several pub-
dating user and contextual information into query process- lic databases of protein interactions have arisen, each with
ing. Koutrika and Ioannidis [56] define a user preference its own focus (e.g., organism, disease, high-throughput tech-
model and describe methods to evaluate the degree of per- nologies, etc.). Protein entries are sometimes repeated within
sonal interest in query results. Chen and Li [24] provide and across the repositories. A scientist interested in learning
methods to mine query logs and cluster results according to about a particular protein might have to visit half a dozen
user characteristics. Ioannidis and Viglas [49] also propose sites and merge information obtained from them, some over-
the idea of conversational querying in which queries are in- lapping, some even contradictory. We created MiMI [51], a
terpreted in the context of the previous queries in a query deep integration of several of the best-regarded protein in-
session. teraction databases. Provenance was retained to describe
where the data originated, and the entire dataset and meta-
2.4 Other Related Work data were stored in Timber [50].
Commercial database systems come with a suite of auxil- Given the XML representation of MiMI data, XQuery was
iary tools. The AutoAdmin project [3, 23] at Microsoft, ini- our first choice for accessing the database. Indeed, some
tiated by Surajit Chaudhury and his colleagues, makes great users wanted the power of a declarative query language, even
strides with respect to many aspects of database configura- if they didn’t have the training to write such queries. A
tion including physical design and index tuning. Similarly, majority of users, however, were complete technophobes and
the Autonomic Computing project [64, 66] at IBM provides preferred forms-based interfaces. (Such interfaces do a good
a platform to tune a database system, including query op- job today for specific applications—quite complex back-end
timization. However, none of these projects deal with the queries can be run, for instance in an airline reservations
user-level database usability that is the focus of this paper. database, while the user is shielded from this complexity by
Much work has been done by the HCI community in the a simple form-based query interface.) Aside from these were
area of usability improvement for computer system inter- a few users who wanted to download the entire dataset and
faces in general. Some of the earliest works in database write Perl scripts to slice and dice it. Our challenge in MiMI
usability includes [87], which analyzed the expressive power was to provide easy-to-use interfaces beyond a few hand-
of a declarative query language (SEQUEL) in comparison designed forms for some common queries. In fact, MiMI
to natural language. Usability of information retrieval sys- allows users to access data through various interfaces, which
tems was studied in [92, 99], which analyzed usability er- are depicted in Figure 1 and discussed in greater detail in
rors and design flaws, and also in [34], which performed a the next section.
comparison of usability testing methods. Principles of user-
centered design were introduced in [55, 94], including how 3.1 Accomplishments
they could complement software engineering techniques to
We started with query interfaces at two opposite ends of
create interactive systems. Incorporating usability into the
the spectrum: XQuery and a simple forms-based interface.
evaluation of computer systems was studied in [16], which
Almost right away, we decided to add a visual query builder,
analyzed human behavior with a dependability benchmark.
MQuery, as an intermediate option between the two oppo-
An extensive user study was performed in [22] to identify
sites. MQuery enables users to create declarative queries
the reasons for user frustration in computing experiences,
incrementally by clicking on elements of interest in a graph-
while [20] takes a more formal approach to model user be-
ical schema tree and filling in form fields associated with
havior for usability analysis. However, for database systems
each of them. However, we found that only a few people
in particular, these only scratch the surface of what needs
preferred this interface—it was considered not much simpler
to be done to improve usability.
than writing XQuery in many circumstances.
It turned out that users’ difficulties with both XQuery
3. A CASE STUDY and MQuery are caused not only by the syntax of the lan-
Just a few short years ago, we were a traditional database guage, but more importantly by the mere complexity of the
research group at the University of Michigan, focusing on MiMI schema. Almost all users found it difficult to locate el-
data structures, algorithms, and performance. Usability was ements of interest to be used in their query specification. To
not a topic that we paid too much attention to. A signif- assist such users we aggressively tackled this problem with
icant project at that time was the development of Timber multiple approaches. One approach we developed is that of
[50], a native XML database. As we looked for challeng- schema summarization [97]. The idea is to develop a rep-
ing applications to run on our database system, we started resentation of the underlying complex schema at different
collaborating with biologists. When we put their data on levels of detail. A user unfamiliar with the database would
our system and had them try to use it, we became aware of first be shown a high-level schema summary comprising only
many unexpected issues. The insights gleaned from watch- a small number of concepts. Progressively greater detail is
ing very smart but mostly non-technical people use database revealed on demand as the user zooms in on the portions
systems, both Timber and commercial relational systems, of the schema of greatest interest. Based on the schema
led to the ideas presented in this paper. In this section, we summary concept, we have recently developed a new query
present a short history of these efforts as essential context model called Meaningful Summary Query [98], which allows
for what follows. a user to query the database through the schema summary
After the sequencing of the human genome was completed, directly, without the knowledge of the underlying complex
biologists began focusing their attention on the proteins ex- schema and with high result quality and query performance.
pressed by these genes, their interactions, and their func- Another approach we developed was that of Schema-Free
tions. Scientists perform a wide variety of experiments to XQuery [62, 63]. The idea here is to allow users to specify
determine which proteins interact with one another. These query entities of interest without specifying how they are
(b)

(c)

(a)

(d) (e)

Figure 1: Methods users can interact with MiMI: (a) XQuery (b) Keyword (c) Natural Language (d) Forms (e)
MQuery.

hierarchically related. The system then automatically de- directed iterative refinement method to assist users in re-
termines local structures in the XML database that match stating queries when it is unable to understand the users’
these specifications. The users can specify queries based natural language statements [60]. We extended NaLIX to
on whatever partial knowledge of the schema they have— permit conversational querying so that users can construct
knowing the full schema, they can write regular XQuery complex queries as modifications of previously issued queries
statements; not knowing the schema at all, they can just [61]. We also added a domain learning component to the
specify labeled keywords; most importantly, they can spec- generic NaLIX system, leveraging off the iterative restate-
ify queries somewhere in between. The system will respect ment of queries. This system, DaNaLIX is being demon-
whatever specifications are given. More recently, we have ex- strated at this conference [57].
tended this work to Ranked Relatedness Queries [35], which In short, we developed a rich panoply of interfaces with
return a set of matches with associated relevance scores. which to access MiMI. Our intention was that each user
For certain types of queries, our users, being accustomed could choose to interact with the database using the inter-
to web search, preferred keyword-based search. We promptly face they prefer. Many issues, however, still remain.
added such an interface to MiMI. All textual fields within
the database were used to populate a Lucene index [44]. One 3.2 Remaining Issues
challenge we faced was that there were multiple alternative Many of our users have strong preferences for some in-
spellings of the names of an entity (i.e., a gene or a protein) terfaces over others. But users do access MiMI through
and names were quite long. Frequently, a user issued a key- multiple interfaces. Somewhat surprisingly, we receive com-
word query only to find later that the results were incorrect plaints about inconsistencies between different interfaces—
because the wrong keywords were used. An instantaneous some users found different results by going through different
feedback mechanism was highly desired. This desire led us interfaces, as the following example shows.
to the creation of the word autocompletion interface, which
had since evolved into a phrase autocompletion interface
[75], and more recently, into a query autocompletion facil- Example 1. A user issues a keyword query ‘Wee1’ through
ity. In particular, the query autocompletion facility is being the keyword interface. The query evaluation accesses the
demonstrated at this conference [74]. Lucene index, which is constructed on all textual fields, ir-
Finally, some of our users, being complete technophobes, respective of which field it occurs in. Hundreds of results are
preferred to access MiMI in their own language, English. returned, including molecules that bear some relationship to
We developed an interactive Natural Language Interface for Wee1 and mention the string ‘Wee1’ somewhere in one of
Querying XML, NaLIX [58, 59], which is built on top of their fields. Later on, the same user issue a form-based query
Schema-Free XQuery. NaLIX is a generic natural language through the MQuery interface, by typing ‘Wee1’ into the
query interface capable of handling queries with not just se- ‘Molecule Name’ field, and only ten results are returned—
lection, but also joins, nesting, and aggregation. It uses a the molecules that contain Wee1 as their name. The user
complains to us for producing inconsistent results.
While we, as computer scientists, can see immediately flight. Yet, in our normalized relational representation, this
where the problem is, our users don’t. And the burden is on single concept is recorded across four different tables. Such
us to explain to our users, in an effective way, the reasons “splattering” of data decreases the usability of the database
behind this inconsistency. in terms of schema comprehension, join computation, and
Another set of complaints we frequently get are the inabil- query expression.
ity to explore and manipulate the data directly, in a graphi- First, given the large number of tables in a database, often
cal setting. We partially addressed this issue by integrating with poorly named entities, it is usually not easy to under-
a popular graphical tool, Cytoscape [86], into MiMI. While stand how to locate a particular piece of data. Even in a
users cannot issue complex queries over the database be- toy schema such as Figure 2, there is the possibility of trou-
cause of the limitations of Cytoscape, they are happy to be ble. Obviously, the airports table has information about
able to graphically manipulate the results, which are viewed the starting location and the destination. However, how do
as a graph of interacting protein nodes. One primary bene- we figure out what is used by a particular flight? The words
fit of using Cytoscape is the ease of specifying joins to find f id and tid have no meaning. Instead we must bring up
related interactions or proteins in the graphical setting. (In the schema, press our fingers to the monitors, and create a
fact, we found joins to be extremely hard for our users to greasy smear as we follow the foreign key constraint. Al-
reason with correctly.) How to allow complex query seman- ternatively, we could suffer almost as much pain by reading
tics over a graphical representation of the data is a major the database creation statements for the same information.
research issue that we are currently pursuing. The current solution to manage this pain is to hire DBAs
Finally, many of our users frequently generate scientific re- and offer them copious amounts of money not to leave once
sults from the experiments performed in their labs—results they have learned the company’s database schema well.
that they would like to put into MiMI for easy access by The next problem users face is computing the joins. We
others. However, MiMI’s rigid structure, as exemplified by break apart information during the database design phase
a schema that is updated only a few times a year, prevents such that everything is normalized—space efficient and up-
users from simply putting their data into MiMI. Instead, datable. However, the users will have to stitch the informa-
they need to understand the MiMI structure first and con- tion back together to answer most of the real queries. The
vert their data to the MiMI format before the update can be fundamental issue is that joins destroy the connections be-
made. In reality, little data has been uploaded into MiMI tween information pertaining to the same real world entities
from our users primarily because they are all busy scientists and are nonintuitive to most normal users. We note that
whose time is simply too precious to be spent understanding many commercial database systems carefully denormalize
the schema of a database they use as a tool. their schema to reduce the number of joins required, al-
As we analyze our accomplishments, and more impor- though the purpose there is to speed up query evaluation.
tantly, the many remaining issues described above, we have Finally, queries become painful to express across multiple
come to realize that the usability of a database system is tables. Because joins innately disrupt data cohesion, such
much more than skin deep. Our work on query interfaces queries are problematic for many users. For example, con-
may contribute towards the usability of a system, but they sider a query as simple as “Find all flights from Detroit to
are far from enough to provide the optimal user experience. Beijing” in our airline database. Even though we are inter-
In the next section, we enunciate what we believe are the ested only in information about flights, the city names that
major database usability problems. specify the selection predicate are found only in the airports
relation, which must be joined twice with f light inf o to ex-
4. THE PERSISTENCE OF PAIN press our query.
For queries like this one with only a few simple joins, it
When we look at how users struggle with database sys-
is not so hard to see how one could reconstruct the un-
tems today, we see several major issues. Whereas we have
derlying object that was broken apart to produce the nor-
certainly been motivated to study issues of usability because
malized schema. One could even envisage an automated
of MiMI and our biological collaborations, we believe that
tool to do so. However, when we start having recursive
the usability concepts we present in this paper are applica-
self-joins or one-to-many joins with variable cardinality, or
ble universally and not just to scientific data. To stress this
non-equijoins, it is not even clear how to produce a single
point, and to make this paper accessible to the database re-
tuple for the user to operate on in a tabular fashion. In
searcher who may not know much Biology, we have chosen
fact, even experts may run into trouble, as pointed out by
an airline database as our running example in this paper
David Beech [6], the technical guru at Oracle: “Supposing
and avoided the use of biological terms.
that the same set of features is widely available in different
4.1 Painful Relations implementations, will the standards be well enough under-
stood by users who are not programming wizards? I’m not
Whereas a single table of data is natural for most peo-
implying that application developers will all use SQL di-
ple, joins between multiple tables are not. Unfortunately,
rectly. Even if higher-level tools conceal the syntax of the
normalization is at the center of relational design. Indeed
language, users must clearly understand the data model or
normalization saves space, avoids update anomalies, and is
type system of the manipulated information.”
a desirable property from many perspectives. However, the
Some practitioners have realized that typical commercial
use of joins in a relational model does not retain the integrity
databases are too heavy duty and too confusing for the aver-
of data objects that a user regards as one unit.
age user. Approaches such as DabbleDB [31] or OpenRecord
Consider an airline database with a basic schema shown
[76] present users with “easy” relational systems. However,
in Figure 2, for tracing planes and flights. The data encap-
even these systems, specifically geared toward taking the
sulated is starting location, destination, plane information,
onus of SQL away from the user, are still haunted by the
and times—essentially what every passenger thinks of as a
Forms limit database access to a specific set of queries felt
airplane
to be of most use to their target users. They offer a conve-
id nient entry-point to the database that requires no knowledge
type flight_info
of query languages or data organization that other querying
serial_number
id airports mechanisms like SQL require to obtain the same informa-
flight_number tion. But this simplicity comes at a price. While we would
airplane_id id like to limit options, it is not easy to determine which op-
tid city_name
schedule fid airport_name tions to keep and which options to leave out. “What do
schedule_id users want?” is a difficult question to answer, especially
id
date since users are not all the same. While cutting down on op-
day_of_week
departure_time tions can greatly enhance the querying experience for some
arrival_time users, it might dissatisfy many others. The challenge is to
simplify querying for novice users, while providing the ex-
pert user with the tools she needs to be productive.
Figure 2: The base tables needed to store a “flight”. A
flight contains from location, destination, airplane info 4.3 Unexpected Pain
and schedule, yet consists of at least four tables. Note
Another place where database systems can frustrate users
that an actual schema for such data is likely to involve
is when they produce results that are unexpected with no
many more attributes and tables.
explanation. Shielded from the details of the system, a user
can obtain results that do not make sense to her because
her mental model and assumptions of the system conflict
with the actual underpinnings of the system. A counter-
join curse. The normalization/join notion creeps into even
example is a web search engine, which has a complex ranking
the best interfaces because of its centrality in the relational
procedure for search results and does not reveal the technical
mind set.
details to most users. Users seem to accept search engines
4.2 Painful Options because: 1) Expectations are set correctly. Search engines
are upfront about performing behind the scenes magic that
Most computer scientists will consider a system that sup-
the user cannot influence. 2) Usually, the top few results
ports both A and B as being superior to a system that only
returned do contain what the user is looking for. 3) If the
supports A, for any functionality A and B. After all, the first
result is not there, the user has the option to either trawl
system can emulate the second, while the second cannot em-
through pages of returned options, or to try a different query
ulate the first. Similarly, a system that permits adjustment
path. 4) The web is huge, and no one knows exactly what
of some tuning parameter is logically superior to another
is out there. If a user is not presented with a relevant page,
that does not permit this parameter to be adjusted.
chances are he would not know he missed it, and would make
Unfortunately, this sort of logic quickly leads us to create
do with a different entry.
software with too much functionality and too many options.
Unfortunately, the search engine strategy will not work
The problem with irrelevant options is that they compete
for database systems mainly because of (4). When a user
with the relevant ones. Exercising options is not costless.
queries an airline reservation system, she may know that
A cost analysis of weighing diverse options was performed
there are flights between Detroit and Beijing. When the sys-
in [88]. Psychologists can also tell us about regret regard-
tem tells her otherwise, it is unexpected and she will demand
ing “paths not taken,” a cost that would not be there if the
an explanation. This unexpectedness comes in two forms,
alternative paths were not there in the first place. [48] stud-
unable to query and unexpected results. We have mentioned
ies the effects of forgone options and models its cost to the
the latter previously. We now describe both of them in more
user. Too many choices can also have an adverse effect on
detail.
a user’s “need for closure” which is studied in [67]. A great
deal of the mystique of Apple Inc. has to do with their re- 4.3.1 Unable to Query
ducing the functionality of their product, and hence making
When a system hinders users from querying the data in
them better. Witness the success of the iPod: the simplest
the way that they want, it frustrates the user. It is especially
set of features that can possibly work is what attracts large
frustrating when the user knows that the underlying data
numbers of users—not the latest in whizbang gadgetry [81].
exists, yet she cannot query it:
The database metamodel today is all about options. Query
optimization is at the heart of what we do. Determining al- Example 2. Consider a world traveler who has infinite
gebraic equivalences and generating alternative query plans flexibility, many destinations to visit but limited money. She
is at the core of our heart. The same query can be posed visits her favorite airline reservation site, and chooses “Flex-
multiple equivalent ways in SQL. ible Dates”. After filling in a bit more information, she at-
Instead, we should design systems for customized value tempts to specify multiple stops. Suddenly, she is forced to
and care only about how well users can get their job done. enter fixed dates into the system!
Unexplored alternatives provide no value, as the nervous in-
terviewee was told during job search, “It is not the number The system behaved unexpectedly, which led to the user
of offers that counts, but rather the quality of the one offer not being able to construct her query, but why? Was it
that you accept.” It is no surprise that forms-based inter- because she cannot have multiple hops with flexible dates?
faces, for all their limitations, have been the primary means Or was it some other piece of information she entered? Even
through which non-experts interact with databases. if the user determined that she could not enter flexible dates
with multiple hops, there was still unexpectedness. The data
obviously exists in the database, so has she merely chosen Example 4. A user, looking for an escape, peruses the
the wrong interface to access it? If she had started in a list of cheap flights provided by her favorite airline. She
different place, or followed a different path, could she have can get to Los Angeles for $75, Boston for $100 and San
fulfilled her request? Francisco for $400. Why is San Francisco on this list? It is
This is a fundamental problem with forms-based inter- not a particularly cheap fare, but it must have satisfied some
faces. Forms, by definition, provide only a limited view of criteria to be placed there.
the underlying data. When users’ mental model of the data
In addition to where a result comes from, why a result is
differs from that of the form designer, unexpected pain re-
returned is also essential. The latter describes why a par-
sults. The user has to focus on how to obtain results rather
ticular item is included in a set [8, 30]. For instance, in the
than on what results they want. This practically eliminates
above example, if the criteria for inclusion is that the fare
the benefit of declarative querying that an RDBMS provides.
is less than the average flight price for the next month, and
The need for application-specific forms-based interfaces
the San Francisco fare satisfies this, it should be included
stems from the general database usability problems. If the
on the list.
average user could successfully query a relational database,
When users encounter unexpected results, it is responsi-
there would be no need for a separate form for each appli-
bility of the database system to explain to them the where
cation. Ideally, the system could adapt to the user’s view of
and why. The usability of the system can be significantly
what the data represents and allow the user to query it in
affected when no such explanation can be given.
whatever way makes sense for her.
Developing methods for querying relational databases that 4.4 Unseen Pain
are agnostic with respect to the true underlying structure
As computer scientists, when we think of database users,
poses many challenges. First, the system must determine
our instinct is that they will think like us. But we are not
what the user wants to query for. In the cases of simple se-
typical database users. For example, the vast majority of
lection conditions, this determination is not too hard so long
us today prefer to use LATEX for document creation. Yet an
as the user knows what fields are available and the system
overwhelming majority of the rest of the world prefers Mi-
knows how to join the tables containing these fields. In the
crosoft Word. As a computer scientist, you can explain why
case of queries requiring aggregation and multiple complex
you prefer LATEX to Word—the former is elegant, it permits
joins, it is not so clear how to provide the user a straightfor-
global changes more easily, it separates content from format-
ward yet comprehensive way of specifying what they want
ting, it stores everything in small ASCII files, and so on. Yet,
to query.
Word has one overwhelming advantage over LATEX—it has
Even if the user successfully specifies what to query, the
the “What You See is What You Get” (WYSIWYG) prop-
system may be unable or unwilling to perform the query.
erty. As a LATEX user, you edit the source file and predict
The reasons for this are manifold: performance or security
what your modifications will do to the output generated.
concerns, data or program errors at some level of the system,
If you are an experienced LATEX user, and have your brain
etc. To avoid unnecessary pain, the system must be able
wired like a Computer Scientist, your predictions are cor-
to report these failures to the user in a meaningful way.
rect most of the time. If your predictions are correct often
Determining the appropriate level of detail so as to help the
enough, your few mistakes are easy to take in stride. For
user without overwhelming her is a challenge in and of itself.
the lay user, though, this can become a frustrating barrier
to use.
4.3.2 Unexpected Results
Our situation with database manipulation is similar. Es-
There is still plenty of opportunity for unexpected pain sentially all query languages, including visual query builders,
even when a user is able to successfully navigate through a separate query specification from output. A user issues a
system’s query interface. When the user encounters unex- query, and hopes that it will produce the desired output. If
pected results and no explanation is provided, the user is it does not, then she has to revise the query and resubmit.
again frustrated. Previous work [70] focuses on explanation There has been some discussion in the database community
for empty results. However, even non-empty results can still of query sequences, but the assumption is that a query is be-
be unexpected. We present two different ways that users can ing reformulated because the user is “exploring” the data.
encounter problems with the results. The first is with the While this may be true in some cases, often the query re-
base data itself. formulation is because the user did not initially specify the
query correctly.
Example 3. A user books a trip through the airline reser- Querying in its current form requires prediction on the
vation system and requests lowest fare and a window seat. part of the user. In our airline database example, consider
However, the system keeps giving him an aisle seat without the specification of a three letter airport code. Some in-
any error message. Where does the aisle seat come from? terfaces provide a drop down list of all the cities that the
Is it from the pool of general seats or is it from the pool of airline flies into. For an airline of any size, this list can have
seats with the lowest fare? hundreds of entries, most of which are not relevant to the
user. The fact that it is alphabetized may not help—there
Often the need to know where a result comes from is only may be multiple airports for some major cities, the airport
requested when something goes wrong. However, it can be may be named for a neighboring city, and so on. A better
necessary in its own right. For instance, you may be inter- interface allows a user to enter the name of the place they
ested in the list of prohibited items published by the gov- want to get to, and then looks for close matches. This can-
ernment agency TSA, but not by the individual airlines. It not be a simple string comparison—we need Inchon airport
matters where a result comes from. However, where alone is to be suggested no matter whether the user entered Inchon
not enough, as the next example will demonstrate. or Seoul or even Soul. This does not seem too hard, and
some airline web sites will do this. But now consider a user contained a list of items and quantities of each to be pur-
who wants to visit KAIST, and so enters Daejeon as the city chased. After the first shopping trip, Jane realized that she
to fly to. No search interface today, to our knowledge, can needed to add price information to the list to monitor her ex-
suggest flying into Inchon airport even though that is likely penses and she also started marking items that were not in
to be the preferred solution for most travelers. stock at the store. A week before Thanksgiving, Jane created
A significant part of database query specification is result another shopping list. However, this time, the items were
construction. Once the FROM and WHERE clauses of a gifts to her friends, and information about the friends there-
SQL query have been executed, we have data in hand that fore needed to be added to create this “gift list.” A week after
must be manipulated to produce the desired output. In Christmas, Jane started to create another “gift list” to track
the case of report generation from a data warehouse, there gifts she received from her friends. However, the friends in-
may not even be a selection condition to apply—the entire formation were now about friends giving her gifts. In the
query specification is about how to aggregate and present end, what started as a simple list of items for Jane had be-
the results. Indeed, the only examples we are aware of that come a repository of items, stores, and more importantly,
provide WYSIWYG capabilities in the database context are friends – an important part of Jane’s life.
warehouse report generation tools.
What does WYSIWYG mean for databases? After all, the The above example, although simple, illustrates how an
point of specifying a query is to get at information that the everyday database evolves and the many usability challenges
user does not possess. Even search engines are not WYSI- facing a database system. First, users do not have a clear
WYG. A WYSIWYG interface for selection specification knowledge of what the final structure of the database will
and data results involves a constant predictive capability on be and therefore a comprehensive design of the database
the part of the system. For example, instantaneous-response is impossible at the beginning. For example, Jane did not
interfaces [74] allow users to gain insights into the schema know that she needed to keep track of information about her
and the data during query time, which allows the user to friends until the time had come to buy gifts for them. Sec-
continuously refine the query as they are typing the initial ond, the structure of the database grows as more information
query. By the time the user has typed out the entire query, become available. For example, the information about price
the query has been correctly formulated and the results have and out of stock only became available after the shopping
returned. trip. Finally, information structures may be heterogeneous.
Other examples of WYSIWYG in databases can be seen in For example, the two “gift lists” that Jane created had differ-
a geographical context. Consider the display of a world map. ent semantics in their friends information and the database
The user could zoom into the area of interest and select air- needs to gracefully handle this heterogeneity.
ports geographically from the choices presented. Most travel In summary, for everyday data, the structure grows in-
sites already provide a facility to specify dates using a pop- crementally and a database system must provide interfaces
up calendar. It is just a question of taking this WYSIWYG for users to easily create both unstructured and structured
approach and pushing it farther. Most map databases today information and to fluidly manipulate the structure when
provide excellent direct manipulation capabilities, including necessary.
pan, zoom, and so on. Imagine a map database without
these facilities that requires users to specify, through a text 5. THE PAINLESS FUTURE
selection of zip code or latitude/longitude, the portion of the
When we speak of usability, we mean much more than
map that is of interest each time. We would find it terribly
just the user interface, which is only a part of the usability
frustrating. Unfortunately, most database query interfaces
equation. A more fundamental concern is that of the under-
today are not WYSIWYG and can be compared to this hy-
lying architecture. To understand this, consider computer
pothetical frustrating map query interface.
system security as a parallel example. When we think of
4.5 Birthing Pain security, many would immediately think of firewalls, and in-
deed firewalls are an important part of establishing a secure
While database systems have fully established themselves
computing environment. Yet, we all appreciate that a fire-
in the corporate market, they have not made a large im-
wall in itself is not enough—security is truly obtained only
pact on how users organize their everyday information. It
when it is designed into every aspect of the system.
is not because users do not want to store their informa-
While they provide visual means to let users manipulate
tion inside a database. Rather, there are many everyday
queries easily, state-of-the-art query builders and graphi-
data a user would like to put into her databases [7] such as
cal interfaces on top of current database systems still re-
shopping lists, expense reports, etc. The main reason for
quire abstraction of the query semantics through the user—
this “birthing pain” is that creating a database and putting
something at which she may not be particularly adept. They
information into a database are not easy tasks. For exam-
also tend to expose the underlying database schema to the
ple, creating a database in current systems requires a care-
user, adding to her cognitive burden. Furthermore, there
ful design of the database schema, for which ordinary users
are few friendly ways for a user to create or edit a database.
simply do not have the inclination or expertise. Similarly,
We need database systems that reflect the user’s model
putting information into an existing database may require
of the data, rather than forcing the data to fit a particular
the user to re-organize her information according to the spe-
model. Even if we have a relational implementation under
cific structure in this existing database, which involves un-
the hood, it should be hidden from the user, who should see
derstanding this structure and developing a mapping to it
the data presented in a form that is “natural.” This means
from the data.
there is no single standard presentation data model of data.
Example 5. Consider our user, Jane, who started to keep However, there are at least a few major models that work
track of her shopping lists. The first list she created simply well to model significant segments of user applications:
• Geographic: Many data sources of interest have geo- is applied, say on price, in the list view, the user should ex-
graphic or spatial distributions. These include not just pect that selection to be reflected in the geographic view. If
traditional geospatial data and map data, but also any these are implemented as views on the underlying data in
information with a location component. In fact, on the the manner we suggested in the preceding paragraph, then
web, mashups have been tremendously successful in this type of consistency should be maintained automatically.
presenting joins between data sets using a geographic Furthermore, it should be noted that while having two op-
location as the basis. tions for views may be appreciated by users, it is probably
the case that eight options would be considered too much.
• Network: We may have a graph or network represen- In addition to consistency, notions of data provenance
tation of data that is natural in many circumstances. must be integrated into the presentation data models. Prove-
In the case of MiMI, we found that for many scientists, nance [10, 18, 96], both why provenance and where prove-
protein interaction data is most naturally viewed as a nance, can assist the user in understanding the results pre-
graph. This is the case even when the scientist is not sented to her. Most discussions of provenance today are in
directly interested in graph properties: for example, the context of scientific workflows and scientific data man-
when viewing the properties of a single interaction be- agement [38, 78, 95]. However, data provenance is important
tween a pair of proteins, scientists still prefer to view for most application domains, including everyday tasks such
the interaction as an edge in the graph. We conjecture as travel planning and weather monitoring. Provenance us-
that this preference is because the local neighborhood ability is still in its infancy and presents fertile ground to
of the graph establishes “context” for the scientist and explore. To successfully include provenance in any system,
provides her with confidence that she is indeed looking we must find an easy, automated and unobtrusive way to
at the correct interaction. Moreover, it is also much capture it. Recent work [17, 72] present initial attempts
easier to point and click than to type. at this. However, how to succeed in capturing the correct
• Multidimensional: The multidimensional data model information, unobtrusively throughout the entire system is
is the presentation data model that is perhaps the most still an open topic. Moreover, once captured, the amount
successful commercially. It was explicitly called a data of provenance can easily outweigh the size of the data itself.
model, and introduced for decision support queries on Good provenance storage, compression, and query mecha-
warehoused data almost fifteen years ago [82]. It has nisms need to be in place. Finally, there must be a way
since been adopted widely, and even today it is at to make this provenance information understandable to the
the cutting edge of database user interaction [43, 19]. user: [25, 27] present provenance viewing strategies, but
While data in the warehouse itself may be stored in a understandability is far more than just easily viewing the
star schema with multiple tables, users of the multi- provenance entries.
dimensional data model think of the data as points in To allow more intuitive user interaction with the database,
multi-dimensional space, with aggregates of measure the presentation data model should be capable of direct
attributes being computed over specified ranges of di- data manipulation. Users are very good at point-and-
mension axes. click, drag-and-drop, and, to a lesser extent, filling in textboxes.
But, whatever they do, they should not be surprised by what
• Tabular: While joins across multiple normalized ta- they get. In addition, we must develop an algebra of oper-
bles may be difficult, people are certainly used to see- ations in the presentation data model such that the basic
ing data represented in simple two-dimensional tables. needs of most users are met by a very small number of op-
The popularity of the Excel spreadsheet as a data erators, thus reducing the barrier to adoption. As users
model speaks to this. For situations where data can be gain experience with the data model, they can become more
represented conveniently as a table, a tabular model is proficient at manipulating it, and can add to the suite of
certainly appropriate. operators they know how to invoke, thereby increasing the
expressive power of the algebra.
The concept of a view has been around almost since the be-
Finally, database systems must accommodate users who
ginning of relational data management. Traditionally, this
expect to create and update their databases and yet have
has been just another relation, defined as the result of eval-
no interest or expertise in database design and database in-
uating a query. Given the need to support various presen-
tegration. In spirit similar to the Dataspace concept [42,
tation data models, including those just listed, we can
65], we argue that database systems need to support inter-
generalize the notion of a view to be not just a table, but
faces for casual “schema-later” and “heterogeneous”
a representation of derived information in the presentation
database design: a database can be created with data
data model. Manipulating data through the presentation
that is unstructured or (heterogeneously) structured, and
data model leads to the well-known problem of updating
the system needs to take advantage of whatever structure
through views [39]. There are excellent research problems
the data currently has. Furthermore, the system needs to
to be addressed in characterizing presentation data models
provide functionality for users to add structure easily when
and types of updates that can be supported without am-
there is a need and in a manner convenient to them.
biguous updates.
Given a data set, it may not always be the case that a sin-
gle presentation data model is best to serve all user needs. 6. CONCLUSION
For example, most travel sites will show hotel options in Database systems today, for all their virtues, are extremely
both a geographical view and a textual list view. Each view difficult for most people to interact with. This difficulty can-
has its strengths, and most users seem to have no trouble not be fixed just by improving the query interface. Rather,
handling this choice of views. However, there is an expecta- we must rethink the architecture of the database system as
tion of consistency among view options—if a selection a whole. This paper has suggested a framework for this
purpose comprising a presentation data model as a distinct Database System. In VLDB, 2000.
layer above the usual logical data model. We envision this [24] Z. Chen and T. Li. Addressing Diverse User Preferences in
presentation data model to allow effective user interaction SQL-Query-Result Navigation. In SIGMOD, 2007.
with the database through direct data manipulation. [25] K. Cheung and J. Hunter. Provenance Explorer -
Customized Provenance Views Using Semantic
Inferencing. In ISWC, 2006.
7. REFERENCES [26] J. Choobineh, M. V. Mannino, and V. P. Tseng. A
[1] R. Abraham. FoXQ - XQuery by forms. In IEEE Form-Based Approach for Database Analysis and Design.
Symposium on Human Centric Computing Languages and CACM, 35(2), 1992.
Environments, 2003.
[27] S. Cohen, S. C. Boulakia, and S. Davidson. Towards a
[2] S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: A Model of Scientific Workflows and User Views. In DILS,
System for Keyword-Based Search over Relational 2006.
Databases. In ICDE, 2002.
[28] S. Cohen, Y. Kanza, Y. Kogan, Y. Sagiv, W. Nutt, and
[3] S. Agrawal, S. Chaudhuri, L. Kollar, A. Marathe, A. Serebrenik. EquiX–A Search and Query Language for
V. Narasayya, and M. Syamala. Database Tuning Advisor XML. JASIST, 53(6), 2002.
for Microsoft SQL Server 2005. In VLDB, 2004.
[29] S. Cohen, J. Mamou, Y. Kanza, and Y. Sagiv. XSEarch:
[4] S. Amer-Yahia, L. V. S. Lakshmanan, and S. Pandit. A Semantic Search Engine for XML. In VLDB, 2003.
FleXPath: Flexible Structure and Full-Text Querying for
[30] Y. Cui and J. Widom. Lineage Tracing for General Data
XML. In SIGMOD, 2004.
Warehouse Transformations. In VLDB, 2001.
[5] I. Androutsopoulos, G. Ritchie, and P. Thanisch. Natural
[31] DabbleDB. http://www.dabbledb.com/.
Language Interfaces to Databases–an introduction.
Journal of Language Engineering, 1(1):29–81, 1995. [32] C. J. Date. Database Usability. In SIGMOD, New York,
NY, USA, 1983. ACM Press.
[6] D. Beech. Can SQL3 Be Simplified? Database
Programming and Design, 10(1):46–50, Jan 1997. [33] X. Dong and A. Halevy. A Platform for Personal
Information Management and Integration. In CIDR, 2005.
[7] G. Bell and J. Gemmell. A Digital Life, 2007.
[34] A. Doubleday, M. Ryan, M. Springett, and A. Sutcliffe. A
[8] O. Benjelloun, A. D. Sarma, A. Halevy, and J. Widom.
Comparison of Usability Techniques for Evaluating
ULDBs: Databases with Uncertainty and Lineage. In
Design. In DIS, 1997.
VLDB, 2006.
[35] A. Elkiss, Y. Li, and H. V. Jagadish. Ranked Relatedness
[9] F. Benzi, D. Maio, and S. Rizzi. Visionary: A
Queries for XML Databases. Technical report, University
Viewpoint-based Visual Language for Querying Relational
of Michigan, 2007.
Databases. Journal of Visual Languages and Computing,
10(2), 1999. [36] D. W. Embley. NFQL: The Natural Forms Query
Language. ACM Trans. Database Syst., 1989.
[10] D. Bhagwat, L. Chiticariu, W. C. Tan, and
G. Vijayvargiya. An Annotation Management System for [37] M. Erwig. A Visual Language for XML. In IEEE
Relational Databases. In VLDB, 2005. Symposium on Visual Languages, 2000.
[11] G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and [38] J. Frew and R. Bose. Earth System Science Workbench: A
S. Sudarshan. Keyword Searching and Browsing in Data Management Infrastructure for Earth Science
Databases using BANKS. In ICDE, 2002. Products. In SSDBM, 2001.
[12] A. Blum. Microsoft English Query 7.5: Automatic [39] A. Furtado and M. Casanova. Updating relational views.
Extraction of Semantics from Relational Databases and In Query Processing in Database Systems, 1985.
OLAP Cubes. In VLDB, 1999. [40] R. Goldman, N. Shivakumar, S. Venkatasubramanian, and
[13] D. Braga, A. Campi, and S. Ceri. XQBE (XQuery By H. Garcia-Molina. Proximity Search in Databases. In
Example): A Visual Interface to the Standard XML Query VLDB, 1998.
Language. ACM Trans. Database Syst., 30(2), 2005. [41] L. Guo, F. Shao, C. Botev, and J. Shanmugasundaram.
[14] S. Brin and L. Page. The Anatomy of a Large-Scale XRANK: Ranked Keyword Search over XML Documents.
Hypertextual Web Search Engine. Computer Networks, In SIGMOD, 2003.
30(1-7):107–117, 1998. [42] A. Y. Halevy, M. J. Franklin, and D. Maier. Principles of
[15] A. Brown, L. Chung, W. Kakes, C. Ling, and Dataspace Systems. In PODS, 2006.
D. Patterson. Experience With Evaluating [43] P. Hanrahan. VizQL: A Language for Query, Analysis and
Human-Assisted Recovery Processes. Dependable Systems Visualization. SIGMOD, pages 721–721, 2006.
and Networks, pages 405–410, 2004. [44] E. Hatcher and O. Gospodnetic. Lucene in Action.
[16] A. B. Brown, L. C. Chung, and D. A. Patterson. Including Manning Publications, 2004.
the Human Factor in Dependability Benchmarks. In DSN [45] T. Haveliwala. Topic-Sensitive PageRank: A
Workshop on Dependability Benchmarking, 2002. Context-Sensitive Ranking Algorithm for Web Search.
[17] P. Buneman, A. Chapman, and J. Cheney. Provenance IEEE Transactions on Knowledge and Data Engineering,
Management in Curated Databases. In SIGMOD, 2006. 15(4):784–796, 2003.
[18] P. Buneman, S. Khanna, and W.-C. Tan. Why and Where: [46] T. Haveliwala, S. Kamvar, and G. Jeh. An Analytical
A Characterization of Data Provenance. In ICDT, 2001. Comparison of Approaches to Personalizing PageRank,
[19] Business Objects, Inc. Crystal Xcelsius, Preprint, June 2003.
http://xcelsius.com. [47] V. Hristidis and Y. Papakonstantinou. DISCOVER:
[20] R. Butterworth, A. Blandford, and D. Duke. Using Formal Keyword Search in Relational Databases. In VLDB, 2002.
Models to Explore Display-Based Usability Issues. Journal [48] J. J. Inman, J. S. Dyer, and J. Jia. A Generalized Utility
of Visual Languages and Computing, 10(5), 1999. Model of Disappointement and Regret Effects on
[21] D. Carmel, Y. S. Maarek, M. Mandelbrod, Y. Mass, and Post-Choice Valuation. Marketing Science, 16(2):97–111,
A. Soffer. Searching XML Documents via XML 1997.
Fragments. In SIGIR, 2003. [49] Y. E. Ioannidis and S. Viglas. Conversational Querying.
[22] I. Ceaparu, J. Lazar, K. Bessiere, J. Robinson, and Inf. Syst, 31(1):33–56, 2006.
B. Shneiderman. Determining Causes and Severity of [50] H. V. Jagadish, S. Al-Khalifa, A. Chapman, L. V.
End-User Frustration. International Journal of Human Lakshmanan, A. Nierman, S. Paparizos, J. M. Patel,
Computer Interaction, 17(3), 2004. D. Srivastava, N. Wiwatwattana, Y. Wu, and C. Yu.
[23] S. Chaudhuri and G. Weikum. Rethinking Database TIMBER: A Native XML Database. VLDB Journal,
System Architecture: Towards a Self-Tuning, RISC-style 11(4):274–291, 2002.
[51] M. Jayapandian, A. Chapman, V. G. Tarcea, C. Yu, Instant-Response Interfaces. In SIGMOD, 2007.
A. Elkiss, A. Ianni, B. Liu, A. Nandi, C. Santos, [75] A. Nandi and H. V. Jagadish. Effective Phrase Prediction.
P. Andrews, B. Athey, D. States, and H. Jagadish. Technical report, University of Michigan, 2007.
Michigan Molecular Interactions (MiMI): Putting the [76] OpenRecord. http://www.openrecord.org/.
Jigsaw Puzzle Together. Nucleic Acids Research, pages [77] D. Oppenheimer. The Importance of Understanding
D566–D571, Jan 2007. Distributed System Configuration. System Administrators
[52] T. S. Jayram, R. Krishnamurthy, S. Raghavan, are Users, Too: Designing Workspaces for Managing
S. Vaithyanathan, and H. Zhu. Avatar Information Internet-scale Systems, CHI 2003 Workshop, 2003.
Extraction System. IEEE Data Eng. Bull., 29(1):40–48, [78] C. Pancerella, J. Hewson, W. Koegler, et al. Metadata in
2006. the Collaboratory for Multi-Scale Chemical Science. In
[53] G. Jeh and J. Widom. Scaling Personalized Web Search. Dublin Core Conference, 2003.
WWW, pages 271–279, 2003. [79] Y. Papakonstantinou, M. Petropoulos, and V. Vassalos.
[54] Y. Kanza and Y. Sagiv. Flexible Queries Over QURSED: Querying and Reporting Semistructured Data.
Semistructured Data. In PODS, 2001. In SIGMOD, 2002.
[55] J. F. Kelley. An Iterative Design Methodology for [80] M. Perkowitz and O. Etzioni. Adaptive Web Sites.
User-Friendly Natural Language Office Information CACM, 43(8):152–158, 2000.
Applications. ACM Trans. Database Syst., 2(1), 1984. [81] A. Pfeiffer. Why Features Don’t Matter Anymore: The
[56] G. Koutrika and Y. Ioannidis. Personalization of Queries New Laws of Digital Technology. Ubiquity, 7(7), Feburary
in Database Systems. In ICDE, 2004. 2006.
[57] Y. Li, I. Chaudhuri, H. Yang, S. Singh, and H. V. [82] Pilot Software. http://www.pilotsoftware.com/.
Jagadish. DaNaLIX: a Domain-adaptive Natural [83] A.-M. Popescu, O. Etzioni, and H. A. Kautz. Towards a
Language Interface for Querying XML. In SIGMOD, 2007. Theory of Natural Language Interfaces to Databases. In
[58] Y. Li, H. Yang, and H. V. Jagadish. NaLIX: an Interactive IUI, 2003.
Natural Language Interface for Querying XML. In [84] R. E. Sabin and T. K. Yap. Integrating Information
SIGMOD, 2005. Retrieval Techniques with Traditional DB Methods in a
[59] Y. Li, H. Yang, and H. V. Jagadish. Constructing a Web-Based Database Browser. In SAC, 1998.
Generic Natural Language Interface for an XML [85] A. Sengupta and A. Dillon. Query by Templates: A
Database. In EDBT, 2006. Generalized Approach for Visual Query Formulation for
[60] Y. Li, H. Yang, and H. V. Jagadish. Term Disambiguation Text Dominated Databases. In ADL, 1997.
in Natural Language Query for XML. In FQAS, 2006. [86] P. Shannon et al. Cytoscape: A Software Environment for
[61] Y. Li, H. Yang, and H. V. Jagadish. NaLIX: A Generic Integrated Models of Biomolecular Interaction Networks.
Natural Language Search Environment for XML Data. Genome Res, 13(11):2498–504, 2003.
acmtds, accepted. [87] B. Sheneiderman. Improving the Human Factors Aspect of
[62] Y. Li, C. Yu, and H. V. Jagadish. Schema-Free XQuery. Database Interactions. ACM Trans. Database Syst., 3(4),
In VLDB, 2004. 1978.
[63] Y. Li, C. Yu, and H. V. Jagadish. Enabling Schema-Free [88] S. M. Shugan. The Cost of Thinking. Journal of
XQuery with Meaningful Query Focus. VLDB Journal, in Consumer Research, 7(2):99–111, 1980.
press. [89] Y. Simmhan, B. Plale, and D. Gannon. A Survey of Data
[64] S. Lightstone, G. M. Lohman, P. J. Haas, et al. Making Provenance in E-Science. SIGMOD Record, 34(3):31–36,
DB2 Products Self-Managing: Strategies and Experiences. 2005.
IEEE Data Eng. Bull, 29(3):16–23, 2006. [90] S. Sinha, K. Bowers, and S. A. Mamrak. Accessing a
[65] J. Madhavan, S. Jeffery, S. Cohen, X. Dong, D. Ko, C. Yu, Medical Database using WWW-Based User Interfaces.
and A. Halevy. Web-scale Data Integration: You Can Technical report, Ohio State University, 1998.
Only Afford to Pay As You Go. In CIDR, 2007. [91] C. Soules, S. Shah, G. R. Ganger, and B. D. Noble. It’s
[66] V. Markl, G. M. Lohman, and V. Raman. LEO: An Time to Bite the User Study Bullet. Technical report,
Autonomic Query Optimizer for DB2. IBM Systems University of Michigan, 2007.
Journal, 42(1):98–106, 2003. [92] A. Sutcliffe, M. Ryan, A. Doubleday, and M. Springett.
[67] I. Mervielde. The Need for Closure and the Spontaneous Model Mismatch Analysis: Towards a Deeper Explanation
Use of Complex and Simple Cognitive Structures. The of Users’ Usability Problems. Behavior & Information
Journal of Social Psychology, 2003. Technology, 19(1), 2000.
[68] K. Mitchell and J. Kennedy. DRIVE: An Environment for [93] A. Tornqvist, C. Nelson, and M. Johnsson. XML and
the Organized Construction of User-Interfaces to Objects-The Future for E-Forms on the Web. In
Databases. In Interfaces to Databases (IDS-3), 1996. WETICE. IEEE Computer Society, 1999.
[69] B. Mobasher, R. Cooley, and J. Srivastava. Automatic [94] A. I. Wasserman. User Software Engineering and the
Personalization Based on Web Usage Mining. CACM, Design of Interactive Systems. In ICSE, Piscataway, NJ,
43(8):142–151, 2000. USA, 1981. IEEE Press.
[70] A. Motro. Query generalization: A method for [95] J. Widom. Trio: A System for Integrated Management of
interpreting null answers. In Workshop on Expert Data, Accuracy, and Lineage. In CIDR, 2005.
Database Systems, 1986. [96] A. Woodruff and M. Stonebraker. Supporting
[71] P. Mukhopadhyay and Y. Papakonstantinou. Mixing Fine-grained Data Lineage in a Database Visualization
Querying and Navigation in MIX. In ICDE, 2002. Environment. In ICDE, 1997.
[72] S. Munroe, S. Miles, L. Moreau, and J. Vázquez-Salceda. [97] C. Yu and H. V. Jagadish. Schema Summarization. In
PrIMe: A Software Engineering Methodology for VLDB, 2006.
Developing Provenance-Aware Applications. In SEM, [98] C. Yu and H. V. Jagadish. Querying Complex Structured
2006. Databases. Technical report, University of Michigan, 2007.
[73] N. Murray, N. Paton, and C. Goble. Kaleidoquery: A [99] W. Yuan. End-User Searching Behavior in Information
Visual Query Language for Object Databases. In Retrieval: A Longitudinal Study. JASIST, 48(3), 1997.
Advanced Visual Interfaces, 1998. [100] M. M. Zloof. Query-by-Example: the Invocation and
[74] A. Nandi and H. V. Jagadish. Assisted Querying using Definition of Tables and Forms. In VLDB, 1975.

You might also like