The Lost Art of the Self Join

Beat Vontobel, CTO, MeteoNews AG b.vontobel@meteonews.ch

Why and what?

Idea for session dates back to 2005
‣ Sudoku solver in a Stored Procedure (Per-Erik Martin)
‣ „The lost Art of the Join“ (Erik Bergen)
‣ Self Joins in my last year's presentation „The declarative power of VIEWs“

• A few serious, but simpler examples of Self Joins
• One to be taken less seriously, but more complex

From last year: Paradigms

Imperative Programming
‣ PHP, C, Java…
‣ Specify the Algorithm: How?

Declarative Programming
‣ Prolog, Lisp, XSLT, SQL…
‣ Specify the Goal: What?

Every Table needs an Alias
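SELECT child.child AS child,
       sibling.child AS sibling
FROM parents [AS] child
INNER JOIN parents [AS] sibling
        ON child.parent = sibling.parent
WHERE child.child != sibling.child;

Table `parents` (Martha is the parent of Paul and Chris; Chris is the parent of Julie):

+--------+-------+
| parent | child |
+--------+-------+
| martha | paul  |
| chris  | julie |
| martha | chris |
+--------+-------+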

A simple Self Join
SELECT child.child AS child,
       sibling.child AS sibling
FROM parents [AS] child
INNER JOIN parents [AS] sibling
        ON child.parent = sibling.parent
WHERE child.child != sibling.child;

Both aliases, `child` and `sibling`, read the same table `parents`:

+--------+-------+
| parent | child |
+--------+-------+
| martha | paul  |
| chris  | julie |
| martha | chris |
+--------+-------+

Result:

+-------+---------+
| child | sibling |
+-------+---------+
| Paul  | Chris   |
| Chris | Paul    |
+-------+---------+
2 rows in set (0.00 sec)
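The sibling query can be tried without a MySQL server. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in (an assumption, the slides use MySQL); the in-memory table mirrors the three `parents` rows from the slide.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL table on the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parents (parent TEXT, child TEXT)")
conn.executemany(
    "INSERT INTO parents VALUES (?, ?)",
    [("martha", "paul"), ("chris", "julie"), ("martha", "chris")],
)

# The same table is referenced twice, so each reference gets its own alias.
siblings = conn.execute(
    """
    SELECT child.child AS child, sibling.child AS sibling
    FROM parents AS child
    INNER JOIN parents AS sibling
            ON child.parent = sibling.parent
    WHERE child.child != sibling.child
    ORDER BY 1
    """
).fetchall()

for row in siblings:
    print(row)
```

Julie does not show up: no other row shares her parent, so the inner join finds no partner for her.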

Trees in SQL
• Basic Text Book Example: Employees Table
• „Nested Set Model“

Google for „Trees SQL Mike Hillyer“

Restriction on Self Joins: Temporary Tables
mysql> CREATE TEMPORARY TABLE t (i INT);
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM t t1 CROSS JOIN t t2;
ERROR 1137 (HY000): Can't reopen table: 't1'

Workaround: Create global tables with unique names (e.g. using session ID)

mysql> CREATE TABLE t_89372 (i INT);

Example table: Temperatures
temps
  station  CHAR(3)
  dtime    TIMESTAMP
  temp     DECIMAL(3, 1)
  PK

mysql1.intern-test [admin] > SELECT * FROM temps;
+---------+---------------------+------+
| station | dtime               | temp |
+---------+---------------------+------+
| ABO     | 2008-04-04 00:10:00 | -2.0 |
| ABO     | 2008-04-04 00:20:00 | -1.9 |
| …       | …                   | …    |
| BAS     | 2008-04-04 00:10:00 |  6.1 |
| BAS     | 2008-04-04 00:20:00 |  6.2 |
| …       | …                   | …    |
+---------+---------------------+------+

Absolute to relative
SELECT current.station AS stat,
       current.dtime,
       previous.temp AS prev,
       current.temp AS curr,
       current.temp - previous.temp AS diff
FROM temps current
INNER JOIN temps previous
        ON current.station = previous.station
       AND previous.dtime = current.dtime - INTERVAL 10 MINUTE
ORDER BY diff DESC
LIMIT 10

Absolute to relative
+------+---------------------+-------+-------+------+
| stat | dtime               | prev  | curr  | diff |
+------+---------------------+-------+-------+------+
| SAM  | 2008-04-04 08:20:00 |  -4.8 |  -1.1 |  3.7 |
| MAG  | 2008-04-04 01:10:00 |   7.6 |  10.7 |  3.1 |
| BUF  | 2008-04-04 22:10:00 | -13.1 | -10.2 |  2.9 |
| MAG  | 2008-04-04 05:00:00 |   7.3 |  10.1 |  2.8 |
| CIM  | 2008-04-04 10:00:00 |   1.8 |   4.6 |  2.8 |
| MAG  | 2008-04-04 00:20:00 |   6.0 |   8.4 |  2.4 |
| CHZ  | 2008-04-04 09:40:00 |   7.8 |  10.2 |  2.4 |
| MAG  | 2008-04-04 04:20:00 |   6.3 |   8.7 |  2.4 |
| EGH  | 2008-04-04 12:10:00 |  -8.5 |  -6.2 |  2.3 |
| VIS  | 2008-04-04 05:40:00 |  -1.8 |   0.5 |  2.3 |
+------+---------------------+-------+-------+------+
10 rows in set (0.21 sec)
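A runnable sketch of the same "absolute to relative" join, under a few assumptions: SQLite (via Python's sqlite3) stands in for MySQL, so `datetime(…, '-10 minutes')` replaces `INTERVAL 10 MINUTE`; the alias `cur` is used instead of `current`; and one of the sample rows is made up to get a non-trivial ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temps (station TEXT, dtime TEXT, temp REAL,"
    " PRIMARY KEY (station, dtime))"
)
conn.executemany(
    "INSERT INTO temps VALUES (?, ?, ?)",
    [
        ("ABO", "2008-04-04 00:10:00", -2.0),
        ("ABO", "2008-04-04 00:20:00", -1.9),
        ("ABO", "2008-04-04 00:30:00", -1.5),  # made-up sample, not from the slides
        ("BAS", "2008-04-04 00:10:00", 6.1),
        ("BAS", "2008-04-04 00:20:00", 6.2),
    ],
)

# Each row is paired with the row of the same station taken 10 minutes earlier.
changes = conn.execute(
    """
    SELECT cur.station, cur.dtime, prev.temp, cur.temp,
           ROUND(cur.temp - prev.temp, 1) AS diff
    FROM temps AS cur
    INNER JOIN temps AS prev
            ON prev.station = cur.station
           AND prev.dtime = datetime(cur.dtime, '-10 minutes')
    ORDER BY diff DESC, cur.station
    """
).fetchall()

for row in changes:
    print(row)
```

The first reading of each station has no predecessor, so the inner join drops it from the result.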

Missed opportunity for a Self Join
// Typical example of „keeping state“ over
// a loop fetching rows from the database
while ($row = mysql_fetch_row($result)) {
    // Computation involving $oldrow and $row
    …
    $oldrow = $row;
}

Why not in a loop?
• SQL code is clearer
• Logical dependency between SQL and application level
• But: „I have to loop anyway! Might even be faster in some cases…“
  ‣ You have to order by what you need for computation
  ‣ Different order requested for the result?
  ‣ You might miss the opportunity to use a framework

Use for any kind of serial data…
• Meteorological data
• Racing lap times
• Fuel used
• Bank account figures
• Stock values
• Webserver hit statistics
• Mails processed
• …

Fill the gaps (simple linear interpolation)
SELECT current.dtime,
       current.temp AS orig,
       COALESCE(
         current.temp,
         ROUND((prev.temp + next.temp) / 2, 1)
       ) AS interpol
FROM temps current
INNER JOIN temps prev
        ON prev.dtime = current.dtime - INTERVAL 10 MINUTE
       AND prev.station = current.station
INNER JOIN temps next
        ON next.dtime = current.dtime + INTERVAL 10 MINUTE
       AND next.station = current.station
WHERE current.station LIKE 'TAE'

Fill the gaps (simple linear interpolation)
+---------------------+------+----------+
| dtime               | orig | interpol |
+---------------------+------+----------+
| …                   | …    | …        |
| 2008-04-04 16:30:00 |  7.9 |      7.9 |
| 2008-04-04 16:40:00 |  8.0 |      8.0 |
| 2008-04-04 16:50:00 | NULL |      7.9 |
| 2008-04-04 17:00:00 |  7.8 |      7.8 |
| …                   | …    | …        |
+---------------------+------+----------+
142 rows in set (0.01 sec)
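The gap at 16:50 can be reproduced with the four rows shown above. This is a sketch only: SQLite (Python's sqlite3) stands in for MySQL, `datetime(…, '±10 minutes')` replaces the `INTERVAL` arithmetic, and the aliases `cur` and `nxt` are substitutes for `current` and `next`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temps (station TEXT, dtime TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO temps VALUES ('TAE', ?, ?)",
    [
        ("2008-04-04 16:30:00", 7.9),
        ("2008-04-04 16:40:00", 8.0),
        ("2008-04-04 16:50:00", None),  # the gap to be filled
        ("2008-04-04 17:00:00", 7.8),
    ],
)

# A missing value is replaced by the mean of its two neighbours.
filled = conn.execute(
    """
    SELECT cur.dtime,
           cur.temp AS orig,
           COALESCE(cur.temp, ROUND((prev.temp + nxt.temp) / 2, 1)) AS interpol
    FROM temps AS cur
    INNER JOIN temps AS prev
            ON prev.dtime = datetime(cur.dtime, '-10 minutes')
           AND prev.station = cur.station
    INNER JOIN temps AS nxt
            ON nxt.dtime = datetime(cur.dtime, '+10 minutes')
           AND nxt.station = cur.station
    WHERE cur.station = 'TAE'
    ORDER BY cur.dtime
    """
).fetchall()

for row in filled:
    print(row)
```

With only four rows, the first and last reading drop out because the inner joins need both a previous and a next row.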

Walking average
SELECT current.dtime,
       current.temp,
       ROUND(
         ( 3 * current.temp + 2 * prev1.temp + 1 * prev2.temp ) / 6,
         1
       ) AS walking_avg
FROM temps current
INNER JOIN temps prev1
        ON prev1.dtime = current.dtime - INTERVAL 10 MINUTE
       AND prev1.station = current.station
INNER JOIN temps prev2
        ON prev2.dtime = current.dtime - INTERVAL 20 MINUTE
       AND prev2.station = current.station
WHERE current.station LIKE 'TAE'
ORDER BY current.dtime

Walking average
+---------------------+------+-------------+
| dtime               | temp | walking_avg |
+---------------------+------+-------------+
| …                   | …    | …           |
| 2008-04-04 09:10:00 |  5.7 |         5.7 |
| 2008-04-04 09:20:00 |  5.8 |         5.7 |
| 2008-04-04 09:30:00 |  6.3 |         6.0 |
| 2008-04-04 09:40:00 |  6.3 |         6.2 |
| 2008-04-04 09:50:00 |  6.0 |         6.2 |
| 2008-04-04 10:00:00 |  6.2 |         6.2 |
| 2008-04-04 10:10:00 |  6.6 |         6.4 |
| 2008-04-04 10:20:00 |  6.6 |         6.5 |
| 2008-04-04 10:30:00 |  6.6 |         6.6 |
| …                   | …    | …           |
+---------------------+------+-------------+
142 rows in set (0.01 sec)
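The walking average is a weighted moving average with weights 3, 2 and 1. A small sketch with four readings from the table above, again assuming SQLite (Python's sqlite3) in place of MySQL and the alias `cur` in place of `current`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temps (station TEXT, dtime TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO temps VALUES ('TAE', ?, ?)",
    [
        ("2008-04-04 09:10:00", 5.7),
        ("2008-04-04 09:20:00", 5.8),
        ("2008-04-04 09:30:00", 6.3),
        ("2008-04-04 09:40:00", 6.3),
    ],
)

# Two self joins: one for the reading 10 minutes back, one for 20 minutes back.
averages = conn.execute(
    """
    SELECT cur.dtime, cur.temp,
           ROUND((3 * cur.temp + 2 * prev1.temp + 1 * prev2.temp) / 6, 1)
             AS walking_avg
    FROM temps AS cur
    INNER JOIN temps AS prev1
            ON prev1.dtime = datetime(cur.dtime, '-10 minutes')
           AND prev1.station = cur.station
    INNER JOIN temps AS prev2
            ON prev2.dtime = datetime(cur.dtime, '-20 minutes')
           AND prev2.station = cur.station
    WHERE cur.station = 'TAE'
    ORDER BY cur.dtime
    """
).fetchall()

for row in averages:
    print(row)
```

For 09:30 this is (3 * 6.3 + 2 * 5.8 + 5.7) / 6 = 6.03, which rounds to the 6.0 shown in the table.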

Coherence/„Correlation“
SELECT source.station,
       correlated.station,
       STDDEV( source.temp - correlated.temp ) AS dev,
       AVG( source.temp - correlated.temp ) AS offset
FROM temps source
INNER JOIN temps correlated
        ON source.dtime = correlated.dtime
WHERE source.station = 'TAE'
GROUP BY source.station, correlated.station
ORDER BY dev

Coherence („Correlation“)
+---------+---------+---------+----------+
| station | station | dev     | offset   |
+---------+---------+---------+----------+
| TAE     | TAE     | 0.00000 |  0.00000 |
| TAE     | ABO     | 0.60563 |  5.43636 |
| TAE     | FRE     | 0.65031 |  4.14615 |
| …       | …       | …       | …        |
| TAE     | CHZ     | 1.05226 | -1.57063 |
| TAE     | CIM     | 1.07280 |  3.45035 |
| …       | …       | …       | …        |
| TAE     | SBO     | 3.58539 | -6.88811 |
+---------+---------+---------+----------+
87 rows in set (0.04 sec)

Groupwise maximum row (Subquery)
SELECT a.station, a.dtime, a.temp
FROM temps a
WHERE a.temp = (
        SELECT MAX(temp)
        FROM temps b
        WHERE b.station = a.station
      )
ORDER BY a.station, a.temp;

Groupwise maximum row
+---------+---------------------+-------+
| station | dtime               | temp  |
+---------+---------------------+-------+
| ABO     | 2008-04-04 14:40:00 |   4.7 |
| AIG     | 2008-04-04 13:40:00 |  13.0 |
| ALT     | 2008-04-04 12:20:00 |  10.9 |
| ALT     | 2008-04-04 12:30:00 |  10.9 |
| …       | …                   | …     |
| WYN     | 2008-04-04 14:30:00 |  10.5 |
| ZER     | 2008-04-04 14:10:00 |   5.3 |
+---------+---------------------+-------+
114 rows in set (4.03 sec)

Groupwise maximum row (Self Join)
SELECT maximum.station, maximum.dtime, maximum.temp
FROM temps maximum
LEFT JOIN temps higher
       ON maximum.station = higher.station
      AND maximum.temp < higher.temp
WHERE higher.station IS NULL
  AND maximum.temp IS NOT NULL
ORDER BY maximum.station, maximum.temp

Groupwise maximum row (Joined Subquery)
SELECT a.station, a.dtime, a.temp
FROM temps a
INNER JOIN (
        SELECT station, MAX(temp) AS temp
        FROM temps
        GROUP BY station
      ) b
   ON (a.station, a.temp) = (b.station, b.temp)
ORDER BY a.station, a.temp;

Groupwise maximum row (Alternatives)
CORRELATED SUBQUERY
| ZER     | 2008-04-04 14:10:00 |   5.3 |
+---------+---------------------+-------+
114 rows in set (4.03 sec)

SELF JOIN
| ZER     | 2008-04-04 14:10:00 |   5.3 |
+---------+---------------------+-------+
114 rows in set (1.43 sec)

JOINED SUBQUERY
| ZER     | 2008-04-04 14:10:00 |   5.3 |
+---------+---------------------+-------+
114 rows in set (0.05 sec)
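To check that the three formulations are interchangeable, they can be run side by side on a tiny table. The sketch below assumes SQLite (Python's sqlite3) as a stand-in for MySQL, writes the row comparison of the joined subquery as two plain equalities, and uses partly made-up rows, including a NULL reading to exercise the `IS NOT NULL` guard of the self join. It says nothing about timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temps (station TEXT, dtime TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO temps VALUES (?, ?, ?)",
    [
        ("ABO", "2008-04-04 14:30:00", 4.5),   # made-up sample
        ("ABO", "2008-04-04 14:40:00", 4.7),
        ("ALT", "2008-04-04 12:10:00", None),  # made-up gap
        ("ALT", "2008-04-04 12:20:00", 10.9),
        ("ALT", "2008-04-04 12:30:00", 10.9),
    ],
)

QUERIES = {
    "correlated subquery": """
        SELECT a.station, a.dtime, a.temp
        FROM temps a
        WHERE a.temp = (SELECT MAX(temp) FROM temps b
                        WHERE b.station = a.station)
        ORDER BY a.station, a.dtime
    """,
    "self join": """
        SELECT maximum.station, maximum.dtime, maximum.temp
        FROM temps maximum
        LEFT JOIN temps higher
               ON maximum.station = higher.station
              AND maximum.temp < higher.temp
        WHERE higher.station IS NULL
          AND maximum.temp IS NOT NULL
        ORDER BY maximum.station, maximum.dtime
    """,
    "joined subquery": """
        SELECT a.station, a.dtime, a.temp
        FROM temps a
        INNER JOIN (SELECT station, MAX(temp) AS temp
                    FROM temps GROUP BY station) b
                ON a.station = b.station AND a.temp = b.temp
        ORDER BY a.station, a.dtime
    """,
}

results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

A station that reaches its maximum twice (ALT here, as on the slide) yields two rows in every variant.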

Comment from a Blog Post

„I left joined a table with itself once, and a lightning fast query had a second or so delay. Two joins, and it was slow. Three joins, and it took upwards of 15 seconds. This kind of joining a table to itself repeatedly kills database performance.“ (John)

A few Words of Caution
• Rows to scan: n^m (rows^table_references)
• 15'000 rows (example table) (0.003s @ 5 million rows/s)
  ‣ joined once: n^2 = 225'000'000 rows (45s)
  ‣ joined twice: n^3 = 3'375'000'000'000 rows (7.8 days)
  ‣ joined three times: n^4 = 50'625'000'000'000'000 rows (321 years)

• Check your JOIN conditions and indexes
• Do some EXPLAINs

„EXPLAIN demystified“ (Baron Schwartz, today, 2:00pm, Ballroom D)
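The worst-case figures can be recomputed in a few lines. The sketch takes the 15'000 rows and the scan rate of 5 million rows per second from the slide; the function name `worst_case` is made up for this illustration.

```python
ROWS = 15_000          # rows in the example table
SCAN_RATE = 5_000_000  # rows scanned per second (figure from the slide)


def worst_case(table_references):
    """Rows to scan and seconds needed for an unrestricted join."""
    rows = ROWS ** table_references
    return rows, rows / SCAN_RATE


for m in range(1, 5):
    rows, seconds = worst_case(m)
    print(f"{m} table reference(s): {rows:,} rows, {seconds:,.3f} s")
```

These numbers assume that no join condition or index restricts the scan, which is exactly what checking JOIN conditions and running EXPLAIN is meant to rule out.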

A few Words of Caution: Many Joins
• Time spent executing query
• Before that: Time spent finding execution plan!

Sudoku
[Figure: 9-by-9 Sudoku grid with some digits given]

Sudoku: Fill every square with all digits 1–9

Sudoku: No digit repeated on column or row

Solve a Sudoku with one Query?

• SQL: One „solution“ equals „one row“
  ‣ There might be more than one solution
  ‣ Sudoku „spread out“ horizontally (one column per field)
• Table `digits` holding the „base material“: 1, 2, 3, 4, 5…
  ‣ Self Joins: One table reference for every field
  ‣ 9-by-9: 81 table references („80 Joins“)
  ‣ MySQL Limit: 61 Joins (31 back in MySQL 3.23)

How to Solve a Sudoku „Brute Force“

. . 6 . . .
. . . 6 . 3
4 . . 1 . .
1 . . 3 . 6
. . 5 4 6 .
. . . . . 1

[Animation frames: candidate digits are tried field by field on the 6-by-6 grid]

• Try all 6 digits for a field
• Still no solution? Backtrack!
  ‣ Erase field
  ‣ Try something different in the previous field
  ‣ Sometimes this means „back to square one“

So, a long, long time later…

How to Solve a Sudoku „Brute Force“

5 3 6 2 1 4
2 4 1 6 5 3
4 6 3 1 2 5
1 5 2 3 4 6
3 1 5 4 6 2
6 2 4 5 ? 1

Last open field (row 6, column 5): try 1, then 2, then 3.

How to Solve a Sudoku „Brute Force“

We're not finished yet!
‣ There might be another solution…
‣ So, backtrack and try other possibilities…
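For comparison with the single query that follows, here is the imperative version of the brute-force procedure as a short sketch. The given digits are the ones from the `start` table shown later in the deck; the box shape of 2 rows by 3 columns is an assumption read off the inequality conditions of the query, and the names `conflicts` and `solve` are made up for this illustration.

```python
N = 6
BOX_ROWS, BOX_COLS = 2, 3  # assumed box shape (2 rows by 3 columns)

# (row, column) -> given digit, 1-based as in the `start` table
GIVEN = {
    (1, 3): 6, (2, 4): 6, (2, 6): 3, (3, 1): 4, (3, 4): 1, (4, 1): 1,
    (4, 4): 3, (4, 6): 6, (5, 3): 5, (5, 4): 4, (5, 5): 6, (6, 6): 1,
}


def conflicts(grid, r, c, digit):
    """True if digit already occurs in row r, column c or the box of (r, c)."""
    if any(grid[r][k] == digit or grid[k][c] == digit for k in range(N)):
        return True
    r0, c0 = r - r % BOX_ROWS, c - c % BOX_COLS
    return any(
        grid[i][j] == digit
        for i in range(r0, r0 + BOX_ROWS)
        for j in range(c0, c0 + BOX_COLS)
    )


def solve(grid, pos=0):
    """Yield every solution, visiting the fields row by row."""
    if pos == N * N:
        yield [row[:] for row in grid]
        return
    r, c = divmod(pos, N)
    if grid[r][c]:  # given digit: nothing to try here
        yield from solve(grid, pos + 1)
        return
    for digit in range(1, N + 1):  # try all 6 digits for the field
        if not conflicts(grid, r, c, digit):
            grid[r][c] = digit
            yield from solve(grid, pos + 1)
            grid[r][c] = 0  # erase the field and backtrack


grid = [[GIVEN.get((i, j), 0) for j in range(1, N + 1)] for i in range(1, N + 1)]
solutions = list(solve(grid))
print(len(solutions), "solution(s)")
for row in solutions[0]:
    print(*row)
```

The generator keeps searching after the first hit, which corresponds to the "we're not finished yet" step: every remaining possibility is still tried.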

Solving a Sudoku with one SELECT (1)
SELECT CONCAT(
         d11.d, ' ', d12.d, ' ', d13.d, ' ', d14.d, ' ', d15.d, ' ', d16.d, ' ', CHAR(10),
         d21.d, ' ', d22.d, ' ', d23.d, ' ', d24.d, ' ', d25.d, ' ', d26.d, ' ', CHAR(10),
         d31.d, ' ', d32.d, ' ', d33.d, ' ', d34.d, ' ', d35.d, ' ', d36.d, ' ', CHAR(10),
         d41.d, ' ', d42.d, ' ', d43.d, ' ', d44.d, ' ', d45.d, ' ', d46.d, ' ', CHAR(10),
         d51.d, ' ', d52.d, ' ', d53.d, ' ', d54.d, ' ', d55.d, ' ', d56.d, ' ', CHAR(10),
         d61.d, ' ', d62.d, ' ', d63.d, ' ', d64.d, ' ', d65.d, ' ', d66.d, ' ', CHAR(10)
       ) AS solution
FROM digits d11
INNER JOIN digits d12
   ON COALESCE(d12.d = ( SELECT d FROM start WHERE i = 1 AND j = 2 ), 1)
  AND d12.d != d11.d
INNER JOIN digits d13
   ON COALESCE(d13.d = ( SELECT d FROM start WHERE i = 1 AND j = 3 ), 1)
  AND d13.d != d11.d AND d13.d != d12.d
INNER JOIN digits d14
   ON COALESCE(d14.d = ( SELECT d FROM start WHERE i = 1 AND j = 4 ), 1)
  AND d14.d != d11.d AND d14.d != d12.d AND d14.d != d13.d
INNER JOIN digits d15
   ON COALESCE(d15.d = ( SELECT d FROM start WHERE i = 1 AND j = 5 ), 1)
  AND d15.d != d11.d AND d15.d != d12.d AND d15.d != d13.d AND d15.d != d14.d
INNER JOIN digits d16
   ON COALESCE(d16.d = ( SELECT d FROM start WHERE i = 1 AND j = 6 ), 1)
  AND d16.d != d11.d AND d16.d != d12.d AND d16.d != d13.d AND d16.d != d14.d AND d16.d != d15.d
INNER JOIN digits d21
   ON COALESCE(d21.d = ( SELECT d FROM start WHERE i = 2 AND j = 1 ), 1)
  AND d21.d != d11.d
INNER JOIN digits d22
   ON COALESCE(d22.d = ( SELECT d FROM start WHERE i = 2 AND j = 2 ), 1)
  AND d22.d != d21.d
  AND d22.d != d12.d
  AND d22.d != d11.d
INNER JOIN digits d23
   ON COALESCE(d23.d = ( SELECT d FROM start WHERE i = 2 AND j = 3 ), 1)
  AND d23.d != d21.d AND d23.d != d22.d
  AND d23.d != d13.d
  AND d23.d != d11.d AND d23.d != d12.d

Solving a Sudoku with one SELECT (2)
INNER JOIN digits d24
   ON COALESCE(d24.d = ( SELECT d FROM start WHERE i = 2 AND j = 4 ), 1)
  AND d24.d != d21.d AND d24.d != d22.d AND d24.d != d23.d
  AND d24.d != d14.d
INNER JOIN digits d25
   ON COALESCE(d25.d = ( SELECT d FROM start WHERE i = 2 AND j = 5 ), 1)
  AND d25.d != d21.d AND d25.d != d22.d AND d25.d != d23.d AND d25.d != d24.d
  AND d25.d != d15.d
  AND d25.d != d14.d
INNER JOIN digits d26
   ON COALESCE(d26.d = ( SELECT d FROM start WHERE i = 2 AND j = 6 ), 1)
  AND d26.d != d21.d AND d26.d != d22.d AND d26.d != d23.d AND d26.d != d24.d AND d26.d != d25.d
  AND d26.d != d16.d
  AND d26.d != d14.d AND d26.d != d15.d
INNER JOIN digits d31
   ON COALESCE(d31.d = ( SELECT d FROM start WHERE i = 3 AND j = 1 ), 1)
  AND d31.d != d11.d AND d31.d != d21.d
INNER JOIN digits d32
   ON COALESCE(d32.d = ( SELECT d FROM start WHERE i = 3 AND j = 2 ), 1)
  AND d32.d != d31.d
  AND d32.d != d12.d AND d32.d != d22.d
INNER JOIN digits d33
   ON COALESCE(d33.d = ( SELECT d FROM start WHERE i = 3 AND j = 3 ), 1)
  AND d33.d != d31.d AND d33.d != d32.d
  AND d33.d != d13.d AND d33.d != d23.d
INNER JOIN digits d34
   ON COALESCE(d34.d = ( SELECT d FROM start WHERE i = 3 AND j = 4 ), 1)
  AND d34.d != d31.d AND d34.d != d32.d AND d34.d != d33.d
  AND d34.d != d14.d AND d34.d != d24.d
INNER JOIN digits d35
   ON COALESCE(d35.d = ( SELECT d FROM start WHERE i = 3 AND j = 5 ), 1)
  AND d35.d != d31.d AND d35.d != d32.d AND d35.d != d33.d AND d35.d != d34.d
  AND d35.d != d15.d AND d35.d != d25.d
INNER JOIN digits d36
   ON COALESCE(d36.d = ( SELECT d FROM start WHERE i = 3 AND j = 6 ), 1)
  AND d36.d != d31.d AND d36.d != d32.d AND d36.d != d33.d AND d36.d != d34.d AND d36.d != d35.d
  AND d36.d != d16.d AND d36.d != d26.d

Solving a Sudoku with one SELECT (3)
INNER JOIN digits d41
   ON COALESCE(d41.d = ( SELECT d FROM start WHERE i = 4 AND j = 1 ), 1)
  AND d41.d != d11.d AND d41.d != d21.d AND d41.d != d31.d
INNER JOIN digits d42
   ON COALESCE(d42.d = ( SELECT d FROM start WHERE i = 4 AND j = 2 ), 1)
  AND d42.d != d41.d
  AND d42.d != d12.d AND d42.d != d22.d AND d42.d != d32.d
  AND d42.d != d31.d
INNER JOIN digits d43
   ON COALESCE(d43.d = ( SELECT d FROM start WHERE i = 4 AND j = 3 ), 1)
  AND d43.d != d41.d AND d43.d != d42.d
  AND d43.d != d13.d AND d43.d != d23.d AND d43.d != d33.d
  AND d43.d != d31.d AND d43.d != d32.d
INNER JOIN digits d44
   ON COALESCE(d44.d = ( SELECT d FROM start WHERE i = 4 AND j = 4 ), 1)
  AND d44.d != d41.d AND d44.d != d42.d AND d44.d != d43.d
  AND d44.d != d14.d AND d44.d != d24.d AND d44.d != d34.d
INNER JOIN digits d45
   ON COALESCE(d45.d = ( SELECT d FROM start WHERE i = 4 AND j = 5 ), 1)
  AND d45.d != d41.d AND d45.d != d42.d AND d45.d != d43.d AND d45.d != d44.d
  AND d45.d != d15.d AND d45.d != d25.d AND d45.d != d35.d
  AND d45.d != d34.d
INNER JOIN digits d46
   ON COALESCE(d46.d = ( SELECT d FROM start WHERE i = 4 AND j = 6 ), 1)
  AND d46.d != d41.d AND d46.d != d42.d AND d46.d != d43.d AND d46.d != d44.d AND d46.d != d45.d
  AND d46.d != d16.d AND d46.d != d26.d AND d46.d != d36.d
  AND d46.d != d34.d AND d46.d != d35.d
INNER JOIN digits d51
   ON COALESCE(d51.d = ( SELECT d FROM start WHERE i = 5 AND j = 1 ), 1)
  AND d51.d != d11.d AND d51.d != d21.d AND d51.d != d31.d AND d51.d != d41.d
INNER JOIN digits d52
   ON COALESCE(d52.d = ( SELECT d FROM start WHERE i = 5 AND j = 2 ), 1)
  AND d52.d != d51.d
  AND d52.d != d12.d AND d52.d != d22.d AND d52.d != d32.d AND d52.d != d42.d

Solving a Sudoku with one SELECT (4)
INNER JOIN digits d53
   ON COALESCE(d53.d = ( SELECT d FROM start WHERE i = 5 AND j = 3 ), 1)
  AND d53.d != d51.d AND d53.d != d52.d
  AND d53.d != d13.d AND d53.d != d23.d AND d53.d != d33.d AND d53.d != d43.d
INNER JOIN digits d54
   ON COALESCE(d54.d = ( SELECT d FROM start WHERE i = 5 AND j = 4 ), 1)
  AND d54.d != d51.d AND d54.d != d52.d AND d54.d != d53.d
  AND d54.d != d14.d AND d54.d != d24.d AND d54.d != d34.d AND d54.d != d44.d
INNER JOIN digits d55
   ON COALESCE(d55.d = ( SELECT d FROM start WHERE i = 5 AND j = 5 ), 1)
  AND d55.d != d51.d AND d55.d != d52.d AND d55.d != d53.d AND d55.d != d54.d
  AND d55.d != d15.d AND d55.d != d25.d AND d55.d != d35.d AND d55.d != d45.d
INNER JOIN digits d56
   ON COALESCE(d56.d = ( SELECT d FROM start WHERE i = 5 AND j = 6 ), 1)
  AND d56.d != d51.d AND d56.d != d52.d AND d56.d != d53.d AND d56.d != d54.d AND d56.d != d55.d
  AND d56.d != d16.d AND d56.d != d26.d AND d56.d != d36.d AND d56.d != d46.d
INNER JOIN digits d61
   ON COALESCE(d61.d = ( SELECT d FROM start WHERE i = 6 AND j = 1 ), 1)
  AND d61.d != d11.d AND d61.d != d21.d AND d61.d != d31.d AND d61.d != d41.d AND d61.d != d51.d
INNER JOIN digits d62
   ON COALESCE(d62.d = ( SELECT d FROM start WHERE i = 6 AND j = 2 ), 1)
  AND d62.d != d61.d
  AND d62.d != d12.d AND d62.d != d22.d AND d62.d != d32.d AND d62.d != d42.d AND d62.d != d52.d
  AND d62.d != d51.d
INNER JOIN digits d63
   ON COALESCE(d63.d = ( SELECT d FROM start WHERE i = 6 AND j = 3 ), 1)
  AND d63.d != d61.d AND d63.d != d62.d
  AND d63.d != d13.d AND d63.d != d23.d AND d63.d != d33.d AND d63.d != d43.d AND d63.d != d53.d
  AND d63.d != d51.d AND d63.d != d52.d
INNER JOIN digits d64
   ON COALESCE(d64.d = ( SELECT d FROM start WHERE i = 6 AND j = 4 ), 1)
  AND d64.d != d61.d AND d64.d != d62.d AND d64.d != d63.d
  AND d64.d != d14.d AND d64.d != d24.d AND d64.d != d34.d AND d64.d != d44.d AND d64.d != d54.d

Solving a Sudoku with one SELECT (5)
INNER JOIN digits d65
   ON COALESCE(d65.d = ( SELECT d FROM start WHERE i = 6 AND j = 5 ), 1)
  AND d65.d != d61.d AND d65.d != d62.d AND d65.d != d63.d AND d65.d != d64.d
  AND d65.d != d15.d AND d65.d != d25.d AND d65.d != d35.d AND d65.d != d45.d AND d65.d != d55.d
  AND d65.d != d54.d
INNER JOIN digits d66
   ON COALESCE(d66.d = ( SELECT d FROM start WHERE i = 6 AND j = 6 ), 1)
  AND d66.d != d61.d AND d66.d != d62.d AND d66.d != d63.d AND d66.d != d64.d AND d66.d != d65.d
  AND d66.d != d16.d AND d66.d != d26.d AND d66.d != d36.d AND d66.d != d46.d AND d66.d != d56.d
  AND d66.d != d54.d AND d66.d != d55.d
WHERE COALESCE(d11.d = ( SELECT d FROM start WHERE i = 1 AND j = 1 ), 1)

Table `digits` for the „pool“ of digits
+---+
| d |
+---+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
+---+

Table `start` for initial conditions
+---+---+------+
| i | j | d    |
+---+---+------+
| 1 | 3 |    6 |
| 2 | 4 |    6 |
| 2 | 6 |    3 |
| 3 | 1 |    4 |
| 3 | 4 |    1 |
| 4 | 1 |    1 |
| 4 | 4 |    3 |
| 4 | 6 |    6 |
| 5 | 3 |    5 |
| 5 | 4 |    4 |
| 5 | 5 |    6 |
| 6 | 6 |    1 |
+---+---+------+

The same initial conditions as a grid (i = row, j = column):

. . 6 . . .
. . . 6 . 3
4 . . 1 . .
1 . . 3 . 6
. . 5 4 6 .
. . . . . 1

How the query works: First, second and third field
… FROM digits d11
INNER JOIN digits d12
   ON COALESCE( d12.d = ( SELECT d FROM start WHERE i = 1 AND j = 2 ), 1 )
  AND d12.d != d11.d
INNER JOIN digits d13
   ON COALESCE( d13.d = ( SELECT d FROM start WHERE i = 1 AND j = 3 ), 1 )
  AND d13.d != d11.d AND d13.d != d12.d …

How the query works: Last field

… INNER JOIN digits d66
   ON COALESCE( … )
  -- fields in the same row
  AND d66.d != d61.d AND d66.d != d62.d AND d66.d != d63.d AND d66.d != d64.d AND d66.d != d65.d
  -- fields in the same column
  AND d66.d != d16.d AND d66.d != d26.d AND d66.d != d36.d AND d66.d != d46.d AND d66.d != d56.d
  -- remaining fields in the same box
  AND d66.d != d54.d AND d66.d != d55.d …
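The same construction can be tried at toy size. The sketch below generates the self-join query for a 3-by-3 Latin square (rows and columns only, no boxes) and runs it on SQLite through Python's sqlite3, which is an assumption standing in for MySQL. The two given digits in `start` are made up, and the helper `given` is a hypothetical name. Unlike the query as extracted from the slides, the generator compares every earlier field that shares a row or column, so no pair is left out.

```python
import sqlite3

N = 3  # toy size: 3-by-3 Latin square instead of a 6-by-6 Sudoku

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE digits (d INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO digits VALUES (?)", [(d,) for d in range(1, N + 1)])
conn.execute(
    "CREATE TABLE start (i INTEGER, j INTEGER, d INTEGER, PRIMARY KEY (i, j))"
)
# Made-up initial conditions: field (1,1) holds 1, field (2,2) holds 3.
conn.executemany("INSERT INTO start VALUES (?, ?, ?)", [(1, 1, 1), (2, 2, 3)])


def given(i, j):
    """Condition that pins a field to its given digit, if there is one.

    Without a row in `start` the subquery yields NULL, the comparison is
    NULL, and COALESCE turns that into 1 (true): any digit is allowed.
    """
    return (
        f"COALESCE(d{i}{j}.d = "
        f"(SELECT d FROM start WHERE i = {i} AND j = {j}), 1)"
    )


cells = [(i, j) for i in range(1, N + 1) for j in range(1, N + 1)]
joins = []
for pos, (i, j) in enumerate(cells[1:], start=1):
    conditions = [given(i, j)]
    for pi, pj in cells[:pos]:  # every field joined earlier
        if pi == i or pj == j:  # same row or same column
            conditions.append(f"d{i}{j}.d != d{pi}{pj}.d")
    joins.append(f"INNER JOIN digits d{i}{j} ON " + " AND ".join(conditions))

columns = ", ".join(f"d{i}{j}.d" for i, j in cells)
query = (
    f"SELECT {columns} FROM digits d11 "
    + " ".join(joins)
    + f" WHERE {given(1, 1)}"
)

solutions = conn.execute(query).fetchall()
for solution in solutions:
    for r in range(N):
        print(*solution[r * N:(r + 1) * N])
```

One result row is one solution, spread out horizontally with one column per field, as described earlier for the Sudoku case.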

Conclusions from the „Sudoku-Case“

Declarative Paradigm (Constraint Programming)
‣ Don't care about the „how“, but about the „what“
‣ Optimizer does a great job!

• (Ab-)use built-in Backtracking of Join Engine
• A query might look awkward – but still performs!

Some reasons for reasonable performance…
• Very small table (`digits`) and covering index
• Small result set: Always working on one row!
• Subqueries basically optimized away
  ‣ „Impossible WHERE noticed“ (no pre-condition case)
  ‣ Constant (pre-condition case)

Optimizer/Join Engine is good at this stuff!

+----+-------------+-------+-------+---------------+---------+---------+-------------+------+-----------------------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref         | rows | Extra                                               |
+----+-------------+-------+-------+---------------+---------+---------+-------------+------+-----------------------------------------------------+
| 1  | PRIMARY     | d11   | index | NULL          | PRIMARY | 1       | NULL        | 6    | Using where; Using index                            |
| 1  | PRIMARY     | d12   | index | NULL          | PRIMARY | 1       | NULL        | 6    | Using where; Using index                            |
| …  | …           | …     | …     | …             | …       | …       | …           | …    | …                                                   |
| 37 | SUBQUERY    | NULL  | NULL  | NULL          | NULL    | NULL    | NULL        | NULL | Impossible WHERE noticed after reading const tables |
| 36 | SUBQUERY    | start | const | PRIMARY       | PRIMARY | 2       | const,const | 1    |                                                     |
| …  | …           | …     | …     | …             | …       | …       | …           | …    | …                                                   |
+----+-------------+-------+-------+---------------+---------+---------+-------------+------+-----------------------------------------------------+
72 rows in set (0.01 sec)

Final Message

Have fun with the declarative power of SQL!

Despite its flaws…

• Do it the SQL way!
• Slides and code will be made available on conference website
• Check out Developer Zone on MySQL website for an upcoming article version of my last year's session „The declarative power of VIEWs“

This work is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
