What Does it Take to Have a Decision-Support Engine?
ABSTRACT
TBA

Categories and Subject Descriptors
H.2.4 [Database Management]: Systems - Query Processing

General Terms
Languages

Keywords
SQL, query optimization, unnesting

A full version of this paper is available as a technical report at date.spd.louisville.edu/badia/forloop.html. This research was sponsored by NSF under grant IIS-0091928.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DOLAP’02, November 4–9, 2002, McLean, Virginia, USA.
Copyright 2002 ACM 1-58113-492-4/02/0011 ...$5.00.

1. INTRODUCTION

Since Kimball's proclamation that "one size does not fit all" ([9]), it is customary to divide database systems into transaction processing (OLTP) and Decision-Support (OLAP) systems. The differences in several key parameters of the workload, such as type of queries and size of tables, have convinced experts that different systems must be used for optimal performance in each environment: OLAP systems require specific techniques for data warehousing, like materialized views, query rewriting, etc., which implement research results specifically geared towards Decision-Support environments ([8, 2]). However, vendors of existing database systems have not developed new systems from scratch. Eager to exploit their existing technology (and the investment of time and money), they have fine-tuned their products, introducing specific data warehousing tools into their offering. While this has required substantial extensions, it has also shown the enormous flexibility in the underlying relational technology.

It is fair to say that one of the most salient characteristics (even though not the only one) of a Decision-Support system is the complexity of queries and the size of databases supported. This makes query processing and optimization an area which requires special attention when designing an OLAP system. Thus, one of the main obstacles for a non-OLAP system to become one is to develop extremely efficient support for complex queries. The question is, what does it take to achieve such support? The authors faced this question recently when they tried to add support for subqueries to the MySQL system ([12]).

While well known and widely used, MySQL is considered a light system, with a small footprint, light resource usage, and able to handle heavy transaction loads (and thus very popular as a back-end for Web sites), but unable to handle Decision-Support style SQL queries. In fact, MySQL does not handle nested subqueries in its current incarnation (footnote 1).

Footnote 1: Adding subquery support to MySQL has been on the developers' to-do list for quite some time, but it has not materialized at the time of writing this paper. Part of the reason for the postponement is probably the fact that MySQL does not use a real query tree and its back-end processing is geared towards SPJ types of queries. Thus, the addition of subquery processing support would require deep changes in the query processing mechanism.

In this project, we set out to add subquery processing to MySQL while avoiding large changes to the query processing code. Our strategy followed two steps: first, we added the ability to handle subqueries in the FROM clause. Second, we rewrote queries with subqueries in the WHERE clause by moving the subquery to the FROM clause and changing the conditions in the WHERE clause. While the changes are trivial in some cases (e.g., it is well known that an IN condition is equivalent to a semijoin), some other cases required more complex changes. The tricky (and sometimes confusing) semantics of SQL make some of the translations difficult to define and prove correct. In the process, we show that all SQL queries, no matter how complex, can be supported by a query engine with support for projection, selection, join, semijoin, outerjoin, and grouping (union and difference of queries can be handled recursively). Of course, this is not all the system needs; support for large files, special indices (bitmaps, index joins, etc.), materialized views, and more, is needed for a true OLAP system. For instance, an efficient sorting method is also required for the ORDER BY clause of SQL, and as an implementation tool for other algorithms. In this paper, we restrict our attention to the set of algebraic
operators that are required to support SQL queries. We stress that our aim is to obtain correct, complete transformations for SQL subqueries (i.e. transformations applicable to all kinds of subqueries). Issues of optimization are mentioned only from an algebraic perspective, but the larger issue of whether the transformations provide performance advantages is left for further work, since a large part of performance depends on physical characteristics (support for large files, special indices, etc.). Thus, we focus on changes needed to the back-end from an algebraic perspective. The conclusion is that the difference between support of all SQL queries and SPJ queries is not as significant as one would think; the main difference lies in the optimizer (footnote 2).

Footnote 2: A different issue is whether SQL itself is well suited to OLAP. It is by now a well established opinion that SQL needs to be extended and complemented for a true Decision-Support environment ([9, 6]).

In section 2 we describe how MySQL processes queries, and explain the difficulties of adding subquery support in the standard way. In section 3 we show our strategy to deal with subqueries in the FROM clause, and in section 4, with WHERE-clause subqueries. In section 5, we analyze the trade-offs of our approach and discuss optimization issues, and in section 6 we offer some conclusions.

2. OVERVIEW OF MYSQL QUERY PROCESSING

MySQL uses a client-server model with multiple threads on the server. When the database receives a SQL query, it spawns a new thread to handle the query and sends the text of the SQL query to the parser. The parser is a large lex-yacc script which constructs a query structure corresponding to the query. This structure basically contains all strings found in the SQL query, classified by clause. The parser simply checks for correct SQL grammar; it does very little verification of the query's elements (the parser will not check the existence of the tables, the existence of columns, or whether an aggregate function was used without a GROUP BY clause).

The query structure from the parser is passed to the query processor. The first step in the query processing is to check the user's access to all the tables in the table list of the query. The tables need to be opened and locked so the query can execute. Once the tables are opened and locked, the processor moves to a select (query) handler. If a query is a UNION, a recursive union handler will be called. Otherwise, the query handler is called to execute a simple query. This handler will carry out the query by interaction with the back-end processor, which takes care of file (disk) access.

After analyzing MySQL's processing flow and data structures, it seems clear that MySQL does not use a query tree, i.e. a structure where the query is represented with a relational algebra-like expression which can be manipulated for optimization and further processing. The query is represented by two lists: a list of table (file) accesses, called the table list, and a list of joins, called the join list. Selections are attached to the tables to which they apply, and so are part of the table list. Optimization seems to be absent from the system except at the most trivial level. In particular, and as a result of the representation chosen for the query, it seems that all selections are pushed down and pipelined regardless of selectivity or form, and that join order is pretty much fixed by the query. No different orderings are considered; choices for join algorithm seem limited to nested loop (with or without use of indices in the inner relation).

It seems clear that this framework is sufficient to handle the type of queries present in transaction-oriented environments (simple SPJ queries, usually involving one or a few tables, and with highly restrictive selections), while offering low overhead and simplicity. However, the framework is ill-suited for Decision-Support queries; it is in particular difficult to determine how best to extend it to deal with subqueries. The absence of a real query tree makes implementation of standard processing techniques, like unnesting ([10, 3, 11]), extremely complicated (footnote 3). Thus, the approach taken was determined by the goal of changing as little of the code as possible, and the need to work without a query tree.

Footnote 3: On a practical note, the almost nonexistent documentation on system internals and abundant use of global variables make any modifications to the source code quite risky.

3. ADDING FROM CLAUSE SUBQUERIES

The ability to deal with FROM clause subqueries was added by modifying only the front-end. The changes were designed to make the back-end believe it was processing a regular query when it was handling the subquery. Details on the modifications, as well as examples, are given in the technical report of which this paper is a summary ([1]).

A simple example is provided here to illustrate the process: assume a database with tables table1(cola, colb) and table2(cola, colc). Consider the query in Figure 1(a). The query has a FROM-clause subquery in it. When the parser detects it, the subquery is sent for processing through the parser and all the way to the back end, which is instructed to deposit the results in a temporary table. This table is associated with its alias (a). All the information about the table is then added to the table list for the main query before proceeding with the processing of the main query. Thus, the back end sees the queries in Figure 1(b) and in Figure 1(c). By the time this last query is processed, an entry for table a has been created and is pointing to the file where the results (temporary table) have been stored.

    (a) Original Query

        SELECT cola
        FROM (SELECT cola, colb
              FROM table1
              WHERE colb < 4) AS a
        WHERE a.cola > 3

    (b) Subquery

        SELECT cola, colb
        FROM table1
        WHERE colb < 4

    (c) Final Query

        SELECT cola
        FROM a
        WHERE a.cola > 3

Figure 1: Transformation for FROM clause subqueries

There were many areas that needed to be changed to add FROM clause subqueries to MySQL. The parser had to be
modified both to accept the additional functionality in the SQL language and to store it correctly in the query structure. The query structure was expanded to store the information concerning the subquery. The query processor had to be modified to actually execute the subquery and incorporate it into the query structure again so that the back-end could finish the processing of the (main) query. The whole process was designed to be recursive, and therefore it can handle queries with a FROM clause subquery which in turn contains a FROM clause subquery, and so on. Once subquery processing is done, processing of the main query continues. The back-end does not know that the temporary table created by the subquery is not an actual table in the database; it knows that the table has an alias and that it does not have any indices. The subquery processing is contained in its own thread. After the query finishes processing, the temporary table has to be released and the thread terminated.

tion in the subquery (i.e. if such a predicate is absent, it is understood that the subquery is not correlated). Variable names are meant to be descriptive; thus op and op2 stand for operators in predicates; single-col stands for an attribute name which appears alone in a SELECT clause; corr-value stands for an attribute name that introduces correlation in a subquery; and aggr-column denotes the aggregate function and attribute name used in aggregate subqueries. Variables ending in list or list2 are meant to stand for a list of elements.

In our transformations, we have made the following assumptions:

- each relation has an attribute denoted by # which serves as a (non null) primary key; and

- all queries are connected, i.e., all tables appearing in a FROM clause are joined together in a common table.
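
To make the two-step strategy outlined in the Introduction concrete, the sketch below runs an IN query, a rewritten version in which the subquery has been moved to the FROM clause, and a two-step version in which the FROM-clause subquery is first deposited in a temporary table named after its alias (in the manner of Figure 1). It is only an illustration under stated assumptions: Python's sqlite3 module stands in for the database engine, the rows inserted into table1 and table2 are invented, and the DISTINCT-based rewrite is one common way of expressing a semijoin as a join, not necessarily the transformation the authors implemented.

```python
import sqlite3


def rows(conn, sql):
    """Run a query and return its rows sorted, so results can be compared."""
    return sorted(conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
# Schema follows the paper's example tables; the data values are invented.
conn.executescript("""
    CREATE TABLE table1 (cola INTEGER, colb INTEGER);
    CREATE TABLE table2 (cola INTEGER, colc INTEGER);
    INSERT INTO table1 VALUES (1, 1), (4, 2), (4, 3), (5, 9), (7, 3);
    INSERT INTO table2 VALUES (4, 10), (4, 11), (7, 12), (8, 13);
""")

# WHERE-clause subquery: an IN condition (a semijoin).
in_query = """
    SELECT cola, colb FROM table1
    WHERE cola IN (SELECT cola FROM table2)
"""

# A plain join is NOT equivalent: cola = 4 occurs twice in table2,
# so the matching table1 rows would be repeated.
naive_join = """
    SELECT table1.cola, table1.colb
    FROM table1, table2
    WHERE table1.cola = table2.cola
"""

# Rewrite: the subquery moves to the FROM clause and the IN condition
# becomes a join condition. DISTINCT keeps one row per matching value.
from_query = """
    SELECT table1.cola, table1.colb
    FROM table1, (SELECT DISTINCT cola FROM table2) AS a
    WHERE table1.cola = a.cola
"""

result_in = rows(conn, in_query)
result_naive = rows(conn, naive_join)
result_from = rows(conn, from_query)

# Two-step execution in the spirit of Figure 1: materialize the
# FROM-clause subquery in a temporary table named after its alias,
# then run a main query that refers to that table like any other.
conn.execute("CREATE TEMP TABLE a AS SELECT DISTINCT cola FROM table2")
final_query = """
    SELECT table1.cola, table1.colb
    FROM table1, a
    WHERE table1.cola = a.cola
"""
result_final = rows(conn, final_query)
conn.execute("DROP TABLE a")  # release the temporary table afterwards

print(result_in)
print(result_from == result_in, result_final == result_in)
```

The naive join is included only to show why a semijoin cannot be replaced by an ordinary join without care: duplicate values in the subquery result would multiply rows of the outer table.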