
Adding Subqueries to MySQL, or


What Does it Take to Have a Decision-Support Engine?

Antonio Badia Matt Chanda Bin Cao

Computer Engineering and Computer Science Department


University of Louisville
Louisville, KY 40292

ABSTRACT
TBA

Categories and Subject Descriptors
H.2.4 [Database Management]: Systems - Query Processing

General Terms
Languages

Keywords
SQL, query optimization, unnesting

A full version of this paper is available as a technical report at date.spd.louisville.edu/badia/forloop.html. This research was sponsored by NSF under grant IIS-0091928.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DOLAP'02, November 4-9, 2002, McLean, Virginia, USA.
Copyright 2002 ACM 1-58113-492-4/02/0011 ...$5.00.

1. INTRODUCTION

Since Kimball's proclamation that "one size does not fit all" ([9]), it is customary to divide database systems into transaction processing (OLTP) and Decision-Support (OLAP) systems. The differences in several key parameters of the workload, like type of queries, size of tables, ... have convinced experts that different systems must be used for optimal performance in each environment: OLAP systems require specific techniques for data warehousing, like materialized views, query rewriting, etc., which implement research results specifically geared towards Decision-Support environments ([8, 2]). However, vendors of existing database systems have not developed new systems from scratch. Eager to exploit their existing technology (and the investment of time and money), they have fine-tuned their products, introducing specific data warehousing tools into their offering. While this has required substantial extensions, it has also shown the enormous flexibility in the underlying relational technology.

It is fair to say that one of the most salient characteristics (even though not the only one) of a Decision-Support system is the complexity of queries and the size of databases supported. This makes query processing and optimization an area which requires special attention when designing an OLAP system. Thus, one of the main obstacles for a non-OLAP system to become one is to develop extremely efficient support for complex queries. The question is, what does it take to achieve such support? The authors faced this question recently when they tried to add support for subqueries to the MySQL system ([12]).

While well known and widely used, MySQL is considered a light system, with a small footprint, light resource usage, and able to handle heavy transaction loads (and thus very popular as a back-end for Web sites), but unable to handle Decision-Support style SQL queries. In fact, MySQL does not handle nested subqueries in its current incarnation^1. In this project, we set out to add subquery processing to MySQL while avoiding large changes to the query processing code. Our strategy followed two steps: first, we added the ability to handle subqueries in the FROM clause. Second, we rewrote queries with subqueries in the WHERE clause by moving the subquery to the FROM clause and changing the conditions in the WHERE clause. While the changes are trivial in some cases (e.g., it is well known that an IN condition is equivalent to a semijoin), some other cases required more complex changes. The tricky (and sometimes confusing) semantics of SQL make some of the translations difficult to define and prove correct. In the process, we show that all SQL queries, no matter how complex, can be supported by a query engine with support for projection, selection, join, semijoin, outerjoin, and grouping (union and difference of queries can be handled recursively). Of course, this is not all the system needs; support for large files, special indices (bitmaps, index joins, etc.), materialized views, and more, is needed for a true OLAP system. For instance, an efficient sorting method is also required for the ORDER BY clause of SQL, and as an implementation tool for other algorithms. In this paper, we restrict our attention to the set of algebraic operators that is required to support SQL queries. We stress that our aim is to obtain correct, complete transformations for SQL subqueries (i.e. transformations applicable to all kinds of subqueries). Issues of optimization are mentioned only from an algebraic perspective, but the larger issue of whether the transformations provide performance advantages is left for further work, since a large part of performance depends on physical characteristics (support for large files, special indices, etc.). Thus, we focus on changes needed to the back-end from an algebraic perspective. The conclusion is that the difference between support of all SQL queries and SPJ queries is not as significant as one would think; the main difference lies in the optimizer^2.

^1 Adding subquery support to MySQL has been in the developers' to-do list for quite some time, but it has not materialized at the time of writing this paper. Part of the reason for the postponement is probably the fact that MySQL does not use a real query tree and its back-end processing is geared towards SPJ types of queries. Thus, the addition of subquery processing support would require deep changes in the query processing mechanism.

^2 A different issue is whether SQL itself is well suited to OLAP. It is by now a well established opinion that SQL needs to be extended and complemented for a true Decision-Support environment ([9, 6]).

In section 2 we describe how MySQL processes queries, and explain the difficulties of adding subquery support in the standard way. In section 3 we show our strategy to deal with subqueries in the FROM clause, and in section 4, with WHERE-clause subqueries. In section 5, we analyze the trade-offs of our approach and discuss optimization issues, and in section 6 we offer some conclusions.

2. OVERVIEW OF MYSQL QUERY PROCESSING

MySQL uses a client-server model with multiple threads on the server. When the database receives a SQL query, it spawns a new thread to handle the query and sends the text of the SQL query to the parser. The parser is a large lex-yacc script which constructs a query structure corresponding to the query. This structure basically contains all strings found in the SQL query classified by clause. The parser simply checks for correct SQL grammar; it does very little verification of the query's elements (the parser will not check the existence of the tables, the existence of columns or whether an aggregate function was used without a GROUP BY clause).

The query structure from the parser is passed to the query processor. The first step in the query processing is to check the user's access to all the tables in the table list of the query. The tables need to be opened and locked so the query can execute. Once the tables are opened and locked, the processor moves to a select (query) handler. If a query is a UNION, a recursive union handler will be called. Otherwise, the query handler is called to execute a simple query. This handler will carry out the query by interaction with the back-end processor, which takes care of file (disk) access.

After analyzing MySQL's processing flow and data structures, it seems clear that MySQL does not use a query tree, i.e. a structure where the query is represented with a relational algebra-like expression which can be manipulated for optimization and further processing. The query is represented by two lists: a list of table (file) accesses, called the table list, and a list of joins, called the join list. Selections are attached to the tables to which they apply, and so are part of the table list. Optimization seems to be absent from the system except at the most trivial level. In particular, and as a result of the representation chosen for the query, it seems that all selections are pushed down and pipelined regardless of selectivity or form; and that join order is pretty much fixed by the query. No different orderings are considered; choices for join algorithm seem limited to nested loop (with or without use of indices in the inner relation). It seems clear that this framework is sufficient to handle the type of queries present in transaction-oriented environments (simple SPJ queries, usually involving one or few tables, and with highly restrictive selections), while offering low overhead and simplicity. However, the framework is ill-suited for Decision-Support queries; it is in particular difficult to determine how to best extend it to deal with subqueries. The absence of a real query tree makes implementation of standard processing techniques, like unnesting ([10, 3, 11]), extremely complicated^3. Thus, the approach taken was determined by the goal of changing as little of the code as possible, and the need to work without a query tree.

^3 On a practical note, the almost non-existent documentation on system internals and abundant use of global variables makes any modifications to the source code quite risky.

3. ADDING FROM CLAUSE SUBQUERIES

The ability to deal with FROM clause subqueries was added by modifying only the front-end. The changes were designed to make the back-end believe it was processing a regular query when it was handling the subquery. Details on the modifications, as well as examples, are given in the technical report of which this paper is a summary ([1]).

A simple example is provided here to illustrate the process: assume a database with tables table1(cola, colb) and table2(cola, colc). Consider the query in figure 1(a). The query has a FROM-clause subquery in it. When the parser detects it, the subquery is sent for processing through the parser and all the way to the back end, which is instructed to deposit the results in a temporary table. This table is associated with its alias (a). All the information about the table is then added to the table list for the main query before proceeding with the processing of the main query. Thus, the back end sees the queries in 1(b) and in 1(c). By the time this last query is processed, an entry for table a has been created and is pointing to the file where the results (temporary table) have been stored.

    SELECT cola
    FROM (SELECT cola, colb
          FROM table1
          WHERE colb < 4) AS a
    WHERE a.cola > 3

    (a) Original Query

    SELECT cola, colb
    FROM table1
    WHERE colb < 4

    (b) Subquery

    SELECT cola
    FROM a
    WHERE a.cola > 3

    (c) Final Query

Figure 1: Transformation for FROM clause subqueries

There were many areas that needed to be changed to add FROM clause subqueries to MySQL. The parser had to be
modified to both accept the additional functionality in the SQL language and to store it correctly in the query structure. The query structure was expanded to store the information concerning the subquery. The query processor had to be modified to actually execute the subquery and incorporate it into the query structure again so that the back-end could finish the processing of the (main) query. The whole process was designed to be recursive, and therefore it can handle queries with a FROM clause subquery which in turn contains a FROM clause subquery, and so on. Once subquery processing is done, processing of the main query continues. The back-end does not know that the temporary table created by the subquery is not an actual table in the database; it knows that the table has an alias and that it does not have any indices. The subquery processing is contained in its own thread. After the query finishes processing, the temporary table has to be released and the thread terminated.

4. ADDING WHERE CLAUSE SUBQUERIES

We classify SQL subqueries according to the condition that they express. Thus, we distinguish between EXISTS and NOT EXISTS subqueries, IN and NOT IN subqueries, SOME subqueries, ALL subqueries and aggregate subqueries. Each one of these classes is further subdivided according to whether the subquery is correlated or not. We present each class of queries separately; however, a general strategy takes care of all cases with just two variations.

Our strategy was to move the subquery in the WHERE clause to the FROM clause, and then use the temporary table created to finish up processing. While this strategy is trivial in some cases (like EXISTS and IN), there are some others which require careful rewriting of the query conditions. In general, two different procedures were developed: for some cases, moving the query to the FROM clause and adding a join was sufficient. This takes care of EXISTS subqueries (both correlated and non-correlated); and NOT EXISTS, IN and aggregate subqueries (non-correlated case only). For other cases, the table that results from the subquery must be connected to other existing tables through an outer join; further modifications to the WHERE clause are also needed. This takes care of NOT IN, SOME and ALL subqueries (non-correlated case only); and of NOT EXISTS, NOT IN, ALL and aggregate subqueries (correlated case only)^4.

^4 Further work was done to treat the division (or universal quantification) as a case apart. While an interesting rewrite was proposed, it was not implementable.

We describe the transformations with pairs of patterns, i.e. combinations of constants (keywords) and variables that specify the form the rewrite takes. The first pattern in the pair shows the original query, and the second one the result of the rewriting. Use of the same variables in the rewritten pattern shows how elements of the old query are used. Square brackets ([,]) are used to show optional elements. Thus, [a | b] is used to choose one of a or b. In general, this is used for transformations which apply to correlated and non-correlated queries. Parentheses are used for clarity. A different pattern pair is developed for each case. Keywords are shown in uppercase; variables are shown in lowercase. In the WHERE clause of a query, we show explicitly the predicate that connects query and subquery (called the linking predicate), together with any attributes or operators involved. We also show explicitly any predicate introducing correlation in the subquery (i.e. if such a predicate is absent, it is understood that the subquery is not correlated). Variable names are meant to be descriptive; thus op and op2 stand for operators in predicates; single-col stands for an attribute name which appears alone in a SELECT clause; corr-value stands for an attribute name that introduces correlation in a subquery; and aggr-column denotes the aggregate function and attribute name used in aggregate subqueries. Variables ending in list or list2 are meant to stand for a list of elements.

In our transformations, we have made the following assumptions:

  • each relation has an attribute denoted by # which serves as a (non null) primary key; and

  • all queries are connected, i.e., all tables appearing in a FROM clause are joined together in a common table.

4.1 Joined Subqueries

The queries that can be transformed into a join are subqueries with EXISTS and IN (and also NOT EXISTS, in the non-correlated case); subqueries with SOME and subqueries with aggregation, in the non-correlated case. We show each one separately.

1. [NOT] EXISTS non-correlated Subqueries. The EXISTS function checks for the existence of a result from a subquery. In non-correlated subqueries, this result is independent of the main query; thus, this case reduces to checking whether a given query returns the empty set or not. Thus, a very simple transformation can take care of the non-correlated EXISTS. The subquery can simply be inserted in the FROM clause. As a result, a Cartesian product takes place between the tables in the main query and the result of executing the subquery. If such a result is empty, so is the Cartesian product (since, for any relation R, $R \times \emptyset = \emptyset$), and therefore no result is returned (as it would happen in the original query). On the other hand, if the table representing the result of processing the subquery is not empty, the Cartesian product is not empty either, but multiple copies of the tuples in the main query would result. Therefore, we use the following trick: we retrieve an arbitrary constant which does not appear in the database instead of any value. The end result is to retrieve either one tuple or nothing for the subquery. This allows existence testing and gets rid of the duplicate problem. In our implementation, we follow this approach since it can be made very efficient, with some simple additions. The additions include a Boolean variable to hold the result of the subquery, and a Boolean flag to let the subquery processor know that the subquery was in an EXISTS function. If it is, the processor sets the row limit on the query to one row and executes the query into a temporary table^5. It then checks the number of rows in the temporary table. If one row exists then the value of the exists subquery result in the subquery class is set to true. If the NOT keyword is added before the EXISTS function, the correct answer can be obtained by inverting the flag. Thus, no Cartesian product is actually performed in practice. However, the approach cannot be extended to NOT EXISTS in SQL, so we propose the transformation shown in figure 2 for the general case. In this approach, we use counting to determine the number of rows in the resulting table. Whenever the number is 0, the EXISTS predicate is false and the NOT EXISTS predicate is true. Whenever the result is not 0, the EXISTS predicate is true and the NOT EXISTS predicate is false. Thus, we use SQ.count(*) != 0 for EXISTS and SQ.count(*) = 0 for NOT EXISTS. Note our use of '*' instead of any attribute name, in order to make sure that the transformation is correct even in the presence of nulls in column-list2.

^5 The row limit of a query is the maximum number of rows that will be returned by the database for a query, and is usually undefined. This row limit is also set for aggregate subqueries, see next subsection.

    SELECT column-list
    FROM table-list
    WHERE criteria-list AND [EXISTS | NOT EXISTS]
          (SELECT column-list2
           FROM table-list2
           WHERE criteria-list2)

    (a) Original Query

    SELECT column-list
    FROM table-list,
         (SELECT COUNT(*)
          FROM table-list2
          WHERE criteria-list2) AS SQ(CT)
    WHERE criteria-list AND SQ.CT [!= | =] 0

    (b) Translated Query

Figure 2: Transformation for Non-correlated EXISTS/NOT EXISTS Subqueries

2. IN/SOME Subqueries. This case is well known from the literature in query optimization through unnesting ([10, 5, 3, 11]). However, it presents some interesting technical problems. For years, it was considered that both predicates were essentially equivalent to a join. But IN and join are not equivalent, since a join may introduce multiple copies of a given tuple, while the predicate IN is simply true or false for a given tuple, no matter how many matches this tuple has in the subquery result. In fact, the IN case is equivalent to a semijoin, which is not part of many implementations of the SQL standard. In the semijoin, because of the implicit projection, no duplicates are allowed. The same is true of the SOME predicate. In the case of IN, we can solve this problem in SQL by adding the DISTINCT keyword to the SELECT clause of our subquery, so as to remove duplicates. In effect, when a join is later performed with tuples in the main query, those tuples have at most one match and therefore no duplication occurs. However, this does not work for SOME, since the condition in the SOME may cause a tuple to find multiple matches in the join even if duplicates are removed (essentially, IN and = SOME are equivalent, so IN is just a special case of SOME). Thus, as a general solution we use a GROUP BY clause to remove duplicates after the join between the tables in the main query and the result of evaluating the subquery. Note the addition of a key from one of the tables in table-list, indicated by one-of-table-list.#. The transformation is shown in figure 3.

    SELECT column-list
    FROM table-list
    WHERE criteria-list AND
          parent-tbl.col [IN | op1 SOME]
          (SELECT single-col
           FROM table-list2
           WHERE criteria-list2
           [AND table.column op2 corr-value])

    (a) Original Query

    SELECT column-list
    FROM (SELECT column-list, one-of-table-list.#
          FROM table-list,
               (SELECT single-col, [table.column]
                FROM table-list2
                WHERE criteria-list2) AS SQ
          WHERE criteria-list AND
                parent-tbl.col [= | op1] SQ.single-col
                [AND SQ.table.column op2 corr-value]
          GROUP BY one-of-table-list.#, column-list)
         AS Q

    (b) Translated Query

Figure 3: Transformation for IN/SOME Subqueries

Again, the parts in brackets are there for the correlated case; they should be ignored for the non-correlated case.

3. Aggregate Subqueries. In this case, an aggregation in the SELECT of the subquery forces it to return a single value. For this case, we move the subquery to the FROM clause and join the resulting table to the tables in the main query. Note that, since the result of the subquery is guaranteed to have only one row, no duplicates can be introduced. The translation is shown in figure 4. Note also that we have transformed the condition into a join even though we know that temporary table SQ has only one row with one column. Thus, in practice we can think of this as a selection; however, this is the only way to rewrite the query in SQL. In our implementation, we again set the row limit to 1, and this improves performance significantly by holding the returned result in main memory and avoiding a real join.

    SELECT column-list
    FROM table-list
    WHERE criteria-list AND
          parent-tbl.col op (SELECT aggr-column
                             FROM table-list2
                             WHERE criteria-list2)

    (a) Original Query

    SELECT column-list
    FROM table-list,
         (SELECT aggr-column
          FROM table-list2
          WHERE criteria-list2) AS SQ
    WHERE criteria-list AND
          parent-tbl.col op SQ.aggr-column

    (b) Translated Query

Figure 4: Transformation for Non-correlated Aggregate Subqueries

4.2 Outerjoin Subqueries

Queries that need an outerjoin and some further transformation of the conditions are the correlated subqueries with EXISTS and NOT EXISTS; subqueries with NOT IN; subqueries with ALL; and correlated subqueries with aggregation. Again, we present each case separately.

1. Correlated [NOT] EXISTS subqueries. Correlated NOT EXISTS subqueries cannot be dealt with in a manner similar to non-correlated ones. The reason is that in the non-correlated case a subquery was either empty or not regardless of any condition involving values in the main query, while now a subquery is empty if the correlated value finds no match in the subquery. It is not possible to express this negation (or, equivalently, the universal quantification it presupposes) directly in SQL; instead, we paraphrase the negation using the outer join operator. This is the same strategy we will follow in the negation of IN (the NOT IN operator) and in the universal quantifier (ALL). While the idea is very simple, implementing it in SQL calls for solving some issues raised by nulls and by the duplication introduced by the outer join. The first step in the translation for the NOT EXISTS function is to add the value in the subquery that is compared to the correlated attribute to the SELECT clause of the subquery. The subquery is then moved to the FROM clause and the subquery result is outerjoined to the correlated value table. The rows that do not match up in the left outer join are desired, so we count the values in the outer table after joining by one of the key values. Note that in this case we count on the attribute specified by the original query, so that nulls are ignored. Note also that the grouping has the effect of removing duplicates from the final result. The translation is shown in figure 5.

    SELECT column-list
    FROM table-list
    WHERE criteria-list AND [EXISTS | NOT EXISTS]
          (SELECT column-list2
           FROM table-list2
           WHERE criteria-list2 AND
                 table.column op corr-value)

    (a) Original Query

    SELECT column-list
    FROM (SELECT column-list,
                 one-of-table-list1.#
          FROM table-list LEFT OUTER JOIN
               (SELECT column-list2, table.column
                FROM table-list2
                WHERE criteria-list2) AS SQ
               ON SQ.table.column op corr-value
          WHERE criteria-list
          GROUP BY one-of-table-list1.#,
                   column-list
          HAVING COUNT(SQ.table.column) [!= | =] 0)
         AS Q

    (b) Translated Query

Figure 5: Transformation for Correlated EXISTS/NOT EXISTS Subqueries

2. NOT IN/ALL Subqueries. The idea to deal with this case is simple. Unfortunately, the idiosyncrasies of SQL semantics make it a tricky case. Our general strategy for dealing with negation (or, equivalently, universal quantification) is to paraphrase the logical equivalence $\neg\exists x\,\varphi(x) \equiv \forall x\,\neg\varphi(x)$ with the help of the outerjoin operator as follows: an outerjoin based on a condition which is the negation of the original condition in the NOT IN or ALL predicates is used, and then the tuples which are padded with nulls are selected. Thus, to evaluate the predicate att > ALL (SELECT att2 ...), we outerjoin the table containing att with the table that results from evaluating the subquery using the comparison operator <=, and pick the tuples that do not match anything (i.e. if it is never the case that att <= att2, then att > att2 for all values of att2). Unfortunately, this simple approach fails in the presence of nulls: if the attribute att2 contains nulls, the ALL predicate will fail, but so will all comparisons with the negated operator, thus qualifying the original tuple. Also, the predicate returns false when att is null, except when the subquery returns an empty answer, in which case the predicate returns true. Therefore, in our translation we must take care of these situations, which we do by modifying the condition of the outer join as follows: if any of the attributes involved in the outer join condition (att and att2, in our example) is null, we still qualify the tuple as matching. Note that the outer join of relation R and an empty relation results in all tuples in R padded with nulls; and therefore we would pick them all. This coincides with the original query, since when the subquery evaluates to an empty result, all rows in the main query qualify. The behavior of NOT IN is similar: att NOT IN Q, where Q is a subquery, will be false if att is null, unless Q evaluates to an empty answer, in which case the predicate evaluates to true.
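The effect of the null-aware outer join condition can be checked on a small instance. The sketch below is an illustration only, not the MySQL implementation described in this paper: it uses Python's sqlite3 module, assumes two hypothetical tables r(id, att) and s(id, att2) whose id columns play the role of the # key, and covers only NOT IN because SQLite has no ALL quantifier. It runs the original query, a naive outer join on equality alone, and the outer join extended with the two IS NULL tests.

```python
import sqlite3

# Original NOT IN query (non-correlated case).
ORIGINAL = """
    SELECT r.id FROM r
    WHERE r.att NOT IN (SELECT att2 FROM s)
    ORDER BY r.id
"""

# Naive paraphrase: outer join on equality, keep the rows padded with nulls.
NAIVE = """
    SELECT r.id
    FROM r LEFT OUTER JOIN (SELECT att2, id AS sid FROM s) AS sq
         ON r.att = sq.att2
    WHERE sq.sid IS NULL
    ORDER BY r.id
"""

# Null-aware paraphrase: a null on either side also counts as a match,
# so such tuples are not selected by the final IS NULL test.
NULL_AWARE = """
    SELECT r.id
    FROM r LEFT OUTER JOIN (SELECT att2, id AS sid FROM s) AS sq
         ON (r.att = sq.att2 OR r.att IS NULL OR sq.att2 IS NULL)
    WHERE sq.sid IS NULL
    ORDER BY r.id
"""


def run_all(r_values, s_values):
    """Load the hypothetical tables r and s, run the three queries, return ids."""
    conn = sqlite3.connect(":memory:")
    # id is the (non null) primary key assumed for every relation.
    conn.execute("CREATE TABLE r (id INTEGER PRIMARY KEY, att INTEGER)")
    conn.execute("CREATE TABLE s (id INTEGER PRIMARY KEY, att2 INTEGER)")
    conn.executemany("INSERT INTO r (att) VALUES (?)", [(v,) for v in r_values])
    conn.executemany("INSERT INTO s (att2) VALUES (?)", [(v,) for v in s_values])
    queries = (("original", ORIGINAL), ("naive", NAIVE), ("null_aware", NULL_AWARE))
    results = {name: [row[0] for row in conn.execute(sql)] for name, sql in queries}
    conn.close()
    return results


if __name__ == "__main__":
    for s_values in ([1, 2], [1, None], []):
        print(s_values, run_all([1, 3, None], s_values))
```

With r.att holding 1, 3 and NULL (ids 1, 2, 3), a subquery result of {1, NULL} gives an empty answer for the original query and for the null-aware rewrite, while the naive rewrite wrongly returns ids 2 and 3. With an empty s, all three queries return every row of r, which is the empty-subquery behavior discussed above.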
The translation, shown in figure 6, will move the subquery to the FROM clause, but will move the correlation criteria to the outer join condition (whenever a correlation exists) and complement it as explained above. This pattern is to be used for both correlated and non-correlated subqueries, using or not the part in brackets.

    SELECT column-list
    FROM table-list
    WHERE criteria-list AND
          parent-tbl.col [NOT IN | op1 ALL]
          (SELECT single-col
           FROM table-list2
           WHERE criteria-list2
           [AND table.column op2 corr-value])

    (a) Original Query

    SELECT column-list
    FROM (SELECT column-list, SQ.#
          FROM table-list LEFT OUTER JOIN
               (SELECT single-col, [table.column],
                       one-of-table-list2.# AS #
                FROM table-list2
                WHERE criteria-list2) AS SQ
               ON ((parent-tbl.col [= | opp-op1] SQ.single-col
                    OR parent-tbl.col IS NULL
                    OR SQ.single-col IS NULL)
                   [AND SQ.table.column op2 corr-value])
          WHERE criteria-list) AS Q
    WHERE Q.SQ.# IS NULL

    (b) Translated Query

Figure 6: Transformation for NOT IN/ALL Subqueries

3. Aggregate Subqueries. The case of correlated subqueries with aggregation is probably the best known and most studied in the literature since it originated the zero count bug of Kim's approach ([5]). Our translation follows the approach of using an outerjoin before computing the aggregate and grouping, as suggested in ([3, 11]). This computation is moved to the FROM clause, as in the other cases. The query is then finished by executing the linking condition in the WHERE clause of the main query. The transformation is shown in figure 7. Also, whenever the aggregate being computed is COUNT(*), we replace it by COUNT(one-of-tables.#) in the translated query.

    SELECT column-list
    FROM table-list
    WHERE criteria-list AND
          parent-tbl.col op1
          (SELECT aggr-column
           FROM table-list2
           WHERE criteria-list2 AND
                 table.column op2 corr-value)

    (a) Original Query

    SELECT column-list
    FROM table-list,
         (SELECT aggr-column,
                 corr-value-table.# AS #
          FROM corr-value-table
               LEFT OUTER JOIN table-list2
               ON (table.column op2 corr-value)
          WHERE criteria-list2
          GROUP BY corr-value-table.#) AS SQ
    WHERE criteria-list AND
          parent-tbl.col op1 SQ.aggr-column AND
          one-of-table-list.# = SQ.#

    (b) Translated Query

Figure 7: Transformation for Correlated Aggregate Subqueries

5. OPTIMIZATION ISSUES

Our approach is motivated by having a complete and correct mechanism to handle SQL subqueries. Thus, we cover all types of SQL subqueries, while most rewriting approaches (including unnesting) do not have complete coverage. Also, we deal with null values and repeated tuples, to make sure that the semantics of the original queries are respected. However, the purpose of rewriting is usually optimization, and therefore the question must be asked as to whether the present approach presents advantages from a performance point of view. It is obvious that the proposed approach presents a serious issue with respect to optimization: by moving subqueries from the WHERE to the FROM clause, we are still dividing the work in two parts dictated by the SQL query. Traditional unnesting merges operators beyond the boundaries of query and subquery, and therefore provides a greater degree of optimization. However, in the present case there are good reasons for considering the approach taken as a sensible alternative. From the point of view of implementation in MySQL, the absence of a query tree makes such algebraic transformations extremely hard to implement. For the purposes of implementation in MySQL, our approach localized all changes in the front-end. Moreover, we point out that a query with a FROM clause subquery can be considered as a single query with no subqueries from an algebraic point of view (i.e. it is similar to the addition of an assignment operator to the relational algebra, which does not increase expressive power and is mainly a matter of convenience). Thus, if a query tree is added to MySQL's query processing, our rewriting would make it very easy to actually carry out such transformations. Finally, complex unnesting is known to limit the amount of rewriting possible, because of the use of outer joins (but see [4] or [7] for some solutions). Therefore, the additional advantage of unnesting is limited.

Nevertheless, it is clear that further possibilities exist for optimization. For instance, in all cases trivial subqueries (no WHERE or GROUP BY clause) should not generate a subquery in the FROM clause; the table name should suffice. There are some additional opportunities for improvement on particular
cases.

For join cases, when an EXISTS subquery is done, all we need to do is check whether the answer (temporary table) is empty. If so, no further processing is necessary, as the answer to the query is going to be empty too. If the temporary table is not empty, the temporary table can be disposed of and the main query can be executed with disregard to the EXISTS subquery. Note that this is equivalent to pushing down the EXISTS predicate to be done always first, despite the fact that it may have a high cost associated with it. For outerjoin cases, an interesting possibility arises. Since after the outer join we are going to keep only rows where the value is null (which we locate by counting the number of non-null matches), this operation really is equivalent to a LEFT ANTIJOIN^6. Thus, a system that is equipped with algorithms for ANTIJOIN could execute this query directly. For non-correlated aggregates, an additional optimization is to take the value that is returned from the subquery and create a constant to remove the additional join from the query plan. Also, if the constant returned is a null, there is no need for further processing, as the result is going to be empty. For the correlated ALL, there is also an opportunity for additional optimization. It should be noted that if nothing passes criteria-list2 then every tuple will return true for the ALL predicate; therefore if we get an empty answer for a certain value of the correlation there is no need for further testing. Finally, it has already been argued that some particular cases offer a simpler, possibly more efficient translation. This includes non-correlated EXISTS and non-correlated IN. For non-correlated EXISTS, the query of figure 2(a) can be transformed into the following:

    SELECT column-list
    FROM table-list, (SELECT c
                      FROM table-list2
                      WHERE criteria-list2)
    WHERE criteria-list

where c is an arbitrary constant, not in the database. As [...]

    SELECT column-list
    FROM (SELECT column-list
          FROM table-list,
               (SELECT DISTINCT single-col
                FROM table-list2
                WHERE criteria-list2) AS SQ
          WHERE criteria-list AND
                parent-tbl.col [= | op1] SQ.single-col)
         AS Q

[...] SQL subqueries and in the presence of a query tree is basically equivalent to more elaborate unnesting. It is also a good starting point for further optimization work, in the sense that complex queries with several subqueries in both FROM and WHERE clauses could in principle be handled with a similar approach. The main obstacle is the idiosyncrasies of the SQL language, which make some transformations tricky.

We are considering other types of rewriting, which are left for further work. In its current incarnation, our implementation includes all non-correlated subqueries. We are currently exploring the work needed to extend the approach to correlated subqueries, and to allow a complete unnesting of all queries. Unfortunately, the MySQL code is not conducive to extensive changes; thus such a project is still under development.

7. REFERENCES

[1] Chanda, M. and Badia, A. Adding Subqueries to MySQL Through Query Rewriting, Technical Report 02-01. Available at http://date.spd.louisville.edu/forloop.
[2] Chaudhuri, S. and Dayal, U. An Overview of Data Warehousing and OLAP Technology, ACM SIGMOD Record 26(1), March 1997.
[3] Dayal, U. Of Nests and Trees: A Unified Approach to Processing Queries That Contain Nested Subqueries, Aggregates, and Quantifiers, in
explained in subse tion 4.1, there is an impli it Cartesian Pro eedings of the 13th VLDB Conferen e, 1987.
produ t here, but the result of the subquery is either empty [4℄ Galindo-Legaria, C. and Rosenthal, A. Outerjoin
or a single tuple, so in pra ti e this query ould be speeded Simpli ation and Reordering for Query
up onsiderably. For IN subqueries, in the non- orrelated Optimization, ACM TODS, 22(1), 1997.
ase the query of gure 3(a) an be transformed into [5℄ Ganski, R. and Wong, H. Optimization of Nested
where the DISTINCT removes dupli ates aused by the join SQL Queries Revisited, in Pro eedings of the
and hen e orresponds to a semijoin operation. Thus, in this 1987 ACM SIGMOD Conferen e.
ase a more straightforward (and potentially more eÆ ient)
translation exists. [6℄ Gray, J., Bosworth, A., Layman, A. and Pirahesh,
H. DataCube: A Relational Aggregation Operator
Generalizing Group By, Cross-Tab, and
6. CONCLUSIONS AND FURTHER RESEARCH Sub-Totals. In Pro eedings of the 12th ICDE
We added the ability to handle subqueries to the MySQL Conferen e, 1996.
system. We hose a rewrite approa h, whi h transforms [7℄ Goel, P. and Iyer, B. SQL Query Optimization:
the queries in the front end. The approa h has some ni e Reordering for a General Class of Queries, in
features: it is easy to implement, extends to all types of Pro eedings of the 1996 ACM SIGMOD
Conferen e.
6
The ANTIJOIN of relations R and S on ondition A  B is [8℄ Jarke, M., Lenzerini, M., Vassiliou, Y. and
de ned as the set of tuples t in s hema s h(R) [ s h(S ) su h Vassiliadis, P. Fundamentals of Data
that (1) t[S ℄ orresponds to a tuple t' in S su h that there Warehousing, Springer-Verlag, 2000.
are no tuples in R satisfying the join ondition for t', and [9℄ R. Kimball, Why De ision Support Fails and how
t[R℄ is padded with nulls; or (2) t[R℄ orresponds to a tuple
t' in R su h that there are no tuples in S satisfying the join to x it, SIGMOD Re ord, 24(3), 1995.
ondition for t', and t[S ℄ is padded with nulls. Left (right) [10℄ W. Kim, On Optimizing an SQL-Like Nested
antijoin is restri ted to tuples in ondition 1 (2). Query, ACM TODS, 7(3), 1982.
[11℄ Muralikrishna, M. Improving Unnesting
Algorithms for Join Aggregate Queries in SQL, in
Pro eedings of the 18th VLDB Conferen e, 1992.
[12℄ MYSQL, http://www.mysql. om.

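As an illustration of the non-correlated IN translation discussed at the end of Section 5, the following is a minimal sketch, not the authors' implementation. It uses Python's standard sqlite3 module rather than MySQL, and the tables orders and vip, their columns, and the helper run are hypothetical names chosen only for this example. It compares the nested query with the rewritten form (DISTINCT subquery moved to the FROM clause) and with a variant that omits DISTINCT.

```python
import sqlite3


def run(conn, sql):
    """Return the rows of a query as a sorted list of tuples."""
    return sorted(conn.execute(sql).fetchall())


# Hypothetical sample data; 'ann' is listed twice in vip on purpose,
# so that a join without DISTINCT produces duplicate result rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT);
    CREATE TABLE vip (customer TEXT);
    INSERT INTO orders VALUES (1, 'ann'), (2, 'bob'), (3, 'cat'), (4, 'ann');
    INSERT INTO vip VALUES ('ann'), ('ann'), ('cat');
""")

# Original form: non-correlated IN subquery in the WHERE clause.
nested = """
    SELECT id, customer
    FROM orders
    WHERE customer IN (SELECT customer FROM vip)
"""

# Rewritten form: the subquery becomes a DISTINCT derived table (SQ)
# in the FROM clause and the IN predicate becomes an equality join.
rewritten = """
    SELECT id, customer
    FROM (SELECT orders.id AS id, orders.customer AS customer
          FROM orders,
               (SELECT DISTINCT customer AS single_col FROM vip) AS SQ
          WHERE orders.customer = SQ.single_col) AS Q
"""

# Same join without DISTINCT: no longer behaves like a semijoin.
no_distinct = """
    SELECT orders.id, orders.customer
    FROM orders, (SELECT customer AS single_col FROM vip) AS SQ
    WHERE orders.customer = SQ.single_col
"""

nested_rows = run(conn, nested)
rewritten_rows = run(conn, rewritten)
naive_rows = run(conn, no_distinct)

print(nested_rows == rewritten_rows)     # both forms agree on this data
print(len(nested_rows), len(naive_rows)) # the join without DISTINCT returns extra rows
```

On this sample data the nested and rewritten queries return the same three rows, while the variant without DISTINCT returns each order of 'ann' twice. This is the duplication that the DISTINCT in the translation is meant to remove.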