
MySQL Magazine ®
Spring 2008, Issue 4


In this issue:
• A Tour of MySQL Certification
• Book Worm
• Coding Corner
• On Efficiently Geo-Referencing IPs With MaxMind GeoIP and MySQL GIS
• Introducing Kickfire
• Automatic Query Optimization with QOT
• News
• log-bin

News, articles and feedback for MySQL database administrators and developers using MySQL on a daily basis to provide some of the best database infrastructure available.
A Tour of MySQL Certification
By Mark Schoonover

CMA
Certified MySQL Associate (CMA) is the entry-level MySQL certification, requiring only a single test. If you have experience with other database systems, you'll find your knowledge applies to the CMA track. If you already understand client/server, data definition language (DDL), and basic SQL with grouping, joining tables and modifying data, you'll have a huge head start on studying, or you could be ready to take the exam!

CMDBA
The CMA will get you started towards the Certified MySQL Database Administrator (CMDBA) certificate. This exam is very comprehensive in what a database administrator should know about MySQL. Keep in mind, this is a starting point for a DBA, not an ending point. You should have about six months of daily experience in a moderately sized MySQL environment. Having the ability to install and use a MySQL learning environment is crucial too. There will be parts of the curriculum that go into areas of MySQL that your environment may not require.

This certification continues where the CMA leaves off. Not only are there more requirements, it goes deeper into the details. You'll be challenged, but if you've created a learning environment, you can try all the commands on a live system to see how things work. This is a crucial concept: some questions may ask you for all the ways to do the same thing. If you know only a single way to accomplish a certain task, you'll get the question wrong.

If you're responsible for the daily operation, tuning and maintenance of a MySQL server, the CMDBA certification is what you want.

CMDEV
Certified MySQL Developer (CMDEV). The exams aren't language specific, so if you're a programmer using Perl, PHP, Java, or connecting via ODBC to a MySQL server, this is the certification you want. You'll cover all the things a developer would need to do with a MySQL server: stored procedures, data types, triggers and views, and importing and exporting data, to name a few. If your task is to develop applications that use MySQL to store and retrieve data, this certification is what you need.

CMCDBA
Essentially part three of the CMDBA certification is the Certified MySQL Cluster Database Administrator. You must pass the CMDBA tests first before taking this exam, and you should have direct experience with MySQL Cluster in a production environment. This certification covers the NDB storage engine, performance tuning, backups and recovery, among other topics. Pass a single exam to obtain this certification.

About the author
Mark Schoonover lives near San Diego, California with his wife, three boys, a neurotic cat, and a retired Greyhound. He's experienced as a DBA, system administrator, network engineer and web developer. He enjoys amateur radio, running marathons, and long distance cycling. He can also be found coaching youth soccer, and getting yelled at as a referee on the weekends.

Book Worm
I am excited to have the chance to review High Performance MySQL – second edition. The authors all have extensive experience in MySQL and it shows. Every section is carefully crafted and covers the important topics for an experienced DBA who is looking to take his or her skill set to the next level. Many topics are covered, including backups and restores, replication, tuning (OS and MySQL), and significant information on scaling applications using sharding. This edition is updated with a lot of new material. It is well worth a read. The book will be released in June.

Coding Corner
Take the Drudgery Out of Writing Crosstabs
By Peter Brawley

A crosstab (or pivot table) is a cross-tabulation. It displays results as they distribute across distinct values of two or more variables. Statisticians call it a contingency table.

Is there an organization in the whole world that doesn't need crosstab reports? Suppose you have a sales table listing product, salesperson and sales amount:

DROP TABLE IF EXISTS sales;
CREATE TABLE sales (
  id int(11) default NULL,
  product char(5) default NULL,
  salesperson char(5) default NULL,
  amount decimal(10,2) default NULL
);
INSERT INTO sales VALUES
  (1,'radio','bob','100.00'),
  (2,'radio','sam','100.00'),
  (3,'radio','sam','100.00'),
  (4,'tv','bob','200.00'),
  (5,'tv','sam','300.00'),
  (6,'radio','bob','100.00');

SELECT * FROM sales;

+------+---------+-------------+--------+
| id   | product | salesperson | amount |
+------+---------+-------------+--------+
|    1 | radio   | bob         | 100.00 |
|    2 | radio   | sam         | 100.00 |
|    3 | radio   | sam         | 100.00 |
|    4 | tv      | bob         | 200.00 |
|    5 | tv      | sam         | 300.00 |
|    6 | radio   | bob         | 100.00 |
+------+---------+-------------+--------+

We wish to see sales totals for each salesperson and each product. To do this we write a query which:
1. groups results by product,
2. uses a conditional sum to accumulate the sales total of each salesperson in its own column, and
3. creates a horizontal sum column:

SELECT
  product,
  SUM( CASE salesperson WHEN 'bob' THEN amount ELSE 0 END ) AS 'Bob',
  SUM( CASE salesperson WHEN 'sam' THEN amount ELSE 0 END ) AS 'Sam',
  SUM( amount ) AS Total
FROM sales
GROUP BY product WITH ROLLUP;

+---------+--------+--------+--------+
| product | Bob    | Sam    | Total  |
+---------+--------+--------+--------+
| radio   | 200.00 | 200.00 | 400.00 |
| tv      | 200.00 | 300.00 | 500.00 |
| NULL    | 400.00 | 500.00 | 900.00 |
+---------+--------+--------+--------+

This crosstab query generates one row per product, one column per salesperson, and a total-by-product column on the far right. The pivoting CASE expressions assign sales.amount values to matching salespersons' columns. WITH ROLLUP adds a row at the bottom for salesperson sums.
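The same generation can be sketched outside the server too. Here is a hypothetical Python helper (not from the article; the function and argument names are invented for illustration) that writes the crosstab query above given the DISTINCT pivot values:

```python
# Illustrative sketch: build the CASE-pivot crosstab query from a list
# of distinct salesperson values, using the article's sales table.

def crosstab_sql(pivot_values, table="sales", group_col="product",
                 pivot_col="salesperson", sum_col="amount"):
    exprs = [
        "SUM( CASE {p} WHEN '{v}' THEN {s} ELSE 0 END ) AS '{V}'".format(
            p=pivot_col, v=v, s=sum_col, V=v.capitalize())
        for v in pivot_values
    ]
    lines = ["SELECT", "  " + group_col + ","]
    lines += ["  " + e + "," for e in exprs]
    lines += ["  SUM( {s} ) AS Total".format(s=sum_col),
              "FROM " + table,
              "GROUP BY " + group_col + " WITH ROLLUP;"]
    return "\n".join(lines)

print(crosstab_sql(["bob", "sam"]))
```

A stored procedure, developed below, keeps this generation inside the server instead.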


This is a snap for two products and two salespersons. When there are dozens of salespersons, writing the query becomes tiresome and error-prone. It is time to invoke the programmer's fundamental rule: what you always have to do, ensure that you never have to do.

Some years ago Giuseppe Maxia published a little query that automates writing these crosstab expressions. His idea was to embed the syntax for lines like the SUM( CASE ... ) lines above in a query for the DISTINCT values. At the time Giuseppe was writing, MySQL did not support stored procedures. Now that it does, we can further generalize Giuseppe's idea by parameterizing it in a stored procedure.

Admittedly it is a little daunting. To write a query with variable names rather than the usual literal table and column names we have to write PREPARE statements. Now we propose to write SQL that writes PREPARE statements. Code which writes code which writes code. Not a job for the back of a napkin.

It is easy enough to write the stored procedure shell. To make sure that all our general-purpose stored routines are in one predictable place and available to all databases on a server, we keep generic routines in a sys database, so our crosstab drudge killer needs parameters specifying database, table, pivot column and (in some cases) the aggregating column.
What do we do next? What worked for us was to proceed from back to front:

1. Write the pivot expressions for specific cases.
2. Write the PREPARE statement that generates those expressions.
3. Parameterize the result of number two.
4. Encapsulate the result of number three in a stored procedure.

To complicate matters a bit more, we soon found that different summary aggregations, for example COUNT and SUM, require different code. Here is the routine for generating COUNT pivot expressions:

USE sys;
DROP PROCEDURE IF EXISTS writecountpivot;
DELIMITER |
CREATE PROCEDURE writecountpivot( db CHAR(64), tbl CHAR(64), col CHAR(64) )
BEGIN
  DECLARE datadelim CHAR(1) DEFAULT '"';
  DECLARE comma CHAR(1) DEFAULT ',';
  DECLARE singlequote CHAR(1) DEFAULT CHAR(39);
  SET @sqlmode = (SELECT @@sql_mode);
  SET @@sql_mode='';
  SET @sql = CONCAT( 'SELECT DISTINCT CONCAT(', singlequote,
                     ',SUM(IF(', col, ' = ', datadelim, singlequote, comma,
                     col, comma, singlequote, datadelim, comma, '1,0)) AS `',
                     singlequote, comma, col, comma, singlequote, '`',
                     singlequote, ') AS countpivotarg FROM ', db, '.', tbl,
                     ' WHERE ', col, ' IS NOT NULL' );
  -- SELECT @sql;
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DEALLOCATE PREPARE stmt;
  SET @@sql_mode=@sqlmode;
END;
|
DELIMITER ;
CALL sys.writecountpivot('test','sales','salesperson');

For our little sales table, this generates ...

SELECT DISTINCT
  CONCAT(',SUM(IF(salesperson = "',salesperson,'",1,0)) AS `',salesperson,'`')
  AS countpivotarg
FROM test.sales
WHERE salesperson IS NOT NULL;

and returns ...

+--------------------------------------------+
| countpivotarg                              |
+--------------------------------------------+
| ,SUM(IF(salesperson = "bob",1,0)) AS `bob` |
| ,SUM(IF(salesperson = "sam",1,0)) AS `sam` |
+--------------------------------------------+

which we plug into ...

SELECT
  product
  ,SUM(IF(salesperson = "bob",1,0)) AS `bob`
  ,SUM(IF(salesperson = "sam",1,0)) AS `sam`
  ,COUNT(*) AS Total
FROM test.sales
GROUP BY product WITH ROLLUP;

+---------+------+------+-------+
| product | bob  | sam  | Total |
+---------+------+------+-------+
| radio   |    2 |    2 |     4 |
| tv      |    1 |    1 |     2 |
| NULL    |    3 |    3 |     6 |
+---------+------+------+-------+

Not overwhelming for two columns. Very convenient indeed if there are fifty. A point to notice is that the two levels of code generation create quote-mark nesting problems. To make the double quote mark '"' available for data value delimiting, we make sure that sql_mode ANSI_QUOTES is not set during code generation.

What is going on when SHOW PROCESSLIST shows connections in the "Sleep" status?

Sleep rows appear when a thread is connected but not doing anything. Sleep rows do not actually pose a problem by themselves. If an application has a connection pool, then sleep rows are expected.

If an application does not utilize connection pooling, it is possible that database connections and handlers are not closing properly. The MySQL server thinks the thread is still connected, but it is not sending any queries. That can be problematic: MySQL allocates some per-thread memory, so many unnecessary threads can eat up memory.

This might be caused by scripts timing out on the application side before they can close the db connection properly, or just not closing the connection properly. The "interactive_timeout" parameter can be set lower, which allows less time for connections to "Sleep". The default is usually 1 day. Note that this applies to any login to the database, so if the timeout is set to 10 minutes, then command-line connections will be closed after 10 minutes of being idle. Basically, you do not want to set it too short.
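As an illustration (not from the sidebar), triaging SHOW PROCESSLIST output for suspicious sleepers might look like the sketch below; the row format and the 600-second threshold are assumptions:

```python
# Given (id, command, time_in_seconds) tuples from SHOW PROCESSLIST,
# pick out Sleep threads idle longer than a threshold: candidates for a
# leaked connection rather than a healthy pool member.

def long_sleepers(processlist, threshold_seconds=600):
    """Return ids of threads sleeping longer than the threshold."""
    return [tid for tid, command, idle in processlist
            if command == "Sleep" and idle > threshold_seconds]

sample = [
    (1, "Query", 0),      # actively running
    (2, "Sleep", 30),     # pooled connection, recently used
    (3, "Sleep", 7200),   # idle two hours: suspicious
]
print(long_sleepers(sample))  # [3]
```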

SUM pivot queries need slightly different syntax:

USE sys;
DROP PROCEDURE IF EXISTS writesumpivot;
DELIMITER |
CREATE PROCEDURE writesumpivot(
  db CHAR(64), tbl CHAR(64), pivotcol CHAR(64), sumcol CHAR(64) )
BEGIN
  DECLARE datadelim CHAR(1) DEFAULT '"';
  DECLARE comma CHAR(1) DEFAULT ',';
  DECLARE singlequote CHAR(1) DEFAULT CHAR(39);
  SET @sqlmode = (SELECT @@sql_mode);
  SET @@sql_mode='';
  SET @sql = CONCAT( 'SELECT DISTINCT CONCAT(', singlequote,
                     ',SUM(IF(', pivotcol, ' = ', datadelim, singlequote,
                     comma, pivotcol, comma, singlequote, datadelim,
                     comma, sumcol, ',0)) AS `',
                     singlequote, comma, pivotcol, comma, singlequote, '`',
                     singlequote, ') AS sumpivotarg FROM ', db, '.', tbl,
                     ' WHERE ', pivotcol, ' IS NOT NULL' );
  -- SELECT @sql;
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DEALLOCATE PREPARE stmt;
  SET @@sql_mode=@sqlmode;
END;
|
DELIMITER ;
CALL sys.writesumpivot('test','sales','salesperson','amount');
+-------------------------------------------------+
| sumpivotarg                                     |
+-------------------------------------------------+
| ,SUM(IF(salesperson = "bob",amount,0)) AS `bob` |
| ,SUM(IF(salesperson = "sam",amount,0)) AS `sam` |
+-------------------------------------------------+

which we insert in the report query ...

SELECT
  product
  ,SUM(IF(salesperson = "bob",amount,0)) AS `bob`
  ,SUM(IF(salesperson = "sam",amount,0)) AS `sam`
  ,SUM(amount) AS Total
FROM test.sales
GROUP BY product;

+---------+--------+--------+--------+
| product | bob    | sam    | Total  |
+---------+--------+--------+--------+
| radio   | 200.00 | 200.00 | 400.00 |
| tv      | 200.00 | 300.00 | 500.00 |
+---------+--------+--------+--------+
Other aggregating functions also require somewhat different code. Maybe you are the reader who will be
inspired to write the completely general case? While you’re at it, consider including logic to write the rest of
the crosstab query, too.
About the author
Peter Brawley is president of Artful Software Development and co-author of Getting It Done With MySQL 5.

On Efficiently Geo-Referencing IPs
with MaxMind GeoIP and MySQL GIS
By Jeremy Cole

Geo-referencing IPs is, in a nutshell, converting an IP address into the name of some entity owning that IP address. There are a number of reasons you might want to geo-reference IP addresses to country, city, etc. Some examples might be simple ad targeting systems, geographic load balancing, web analytics and many more applications.

This is a very common task, but I have never actually seen it done efficiently using MySQL in the wild. There is a lot of questionable advice on forums, blogs and other sites out there on this topic. After working with a Proven Scaling customer, I recently did some thinking and some performance testing on this problem, so I thought I would publish some hard data and advice for everyone.

Unfortunately, R-tree (spatial) indexes have not been added to InnoDB yet, so the tricks in this entry only work efficiently with MyISAM tables. They should work with InnoDB, but they will perform poorly. This is actually OK for the most part, as the geo-referencing functionality most people need doesn't really need transactional support, and since the data tables are basically read-only (there are monthly published updates), the likelihood of corruption in MyISAM due to any server failures isn't very high.

The Data Provided By MaxMind
MaxMind is a great company that produces several geo-referencing databases. They release both a commercial (for-pay, but affordable) product called GeoIP, and a free version of the same databases, called GeoLite. The most popular of their databases that I have seen used is GeoLite Country. This allows you to look up nearly any IP and find out which country (hopefully) its user resides in. The free GeoLite versions are normally good enough, at about 98% accurate, but the for-pay GeoIP versions in theory are more accurate. In this article I will refer to both GeoIP and GeoLite as "GeoIP" for simplicity.

GeoIP Country is available as a CSV file containing the following fields:

• ip from, ip to (text) — The start and end IP addresses in dotted-quad human readable format. This is a handy way for a human to read an IP address, but a very inefficient way for a computer to store and handle IP addresses.
• ip from, ip to (integer) — The same start and end IP addresses as 32-bit integers1, e.g. 50331648.
• country code — The 2-letter ISO country code for the country to which this IP address has been assigned, or in some cases other strings, such as "A2" meaning "Satellite Provider".
• country name — The full country name of the same. This is redundant with the country code if you have a lookup table of country codes (including MaxMind's non-ISO codes), or if you make one from the GeoIP data.

MySQL, MySQL logos and the Sakila dolphin are registered trademarks of MySQL AB in the United States, the European Union and other countries. MySQL Magazine is not affiliated with MySQL AB, and the content is not endorsed, reviewed, approved nor controlled by MySQL AB.

A Simple Way to Search For an IP
Once the data has been loaded into MySQL (which will be explained in depth later), there will be a table with a range (a lower and upper bound), and some metadata about that range. For example, one row from the GeoIP data (without the redundant columns) looks like:

ip_from     ip_to       country_code
50331648    68257567    US

The natural thing that would come to mind (and in fact the solution offered by MaxMind themselves2) is BETWEEN. A simple query to search for the IP would be:

SELECT country_code
FROM ip_country
WHERE INET_ATON('%s') BETWEEN ip_from AND ip_to;

Unfortunately, while simple and natural, this construct is extremely inefficient, and can't effectively use indexes (although it can use them, it isn't efficient). The reason for this is that it's an open-ended range, and it is impossible to close the range by adding anything to the query. In fact I haven't been able to meaningfully improve on the performance at all.

A Much Better Solution
While it probably isn't the first thing that would come to mind, MySQL's GIS support is actually perfect for this task. Geo-referencing an IP address to a country boils down to "find which range or ranges this item belongs to", and this can be done quite efficiently using spatial R-tree indexes in MySQL's GIS implementation.

The way this works is that each IP range of (ip_from, ip_to) is represented as a rectangular polygon from (ip_from, -1) to (ip_to, +1).
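To make the geometry concrete, here is a small Python sketch (an illustration of the construct, not the article's code) of the rectangle stored for each range and the minimum-bounding-rectangle containment test the search relies on:

```python
# Build the 5-point closed rectangle for an (ip_from, ip_to) range, and
# check whether it contains the search point (ip, 0). MySQL answers the
# containment question with an R-tree index; this only shows the shapes.

def ip_range_polygon_wkt(ip_from, ip_to):
    pts = [(ip_from, -1), (ip_to, -1), (ip_to, 1), (ip_from, 1),
           (ip_from, -1)]  # clockwise, ring closed back to the start
    return "POLYGON((%s))" % ", ".join("%d %d" % p for p in pts)

def mbr_contains(ip_from, ip_to, ip):
    # The search point is (ip, 0); y = 0 always lies inside [-1, 1], so
    # containment reduces to ip_from <= ip <= ip_to.
    return ip_from <= ip <= ip_to and -1 <= 0 <= 1

print(ip_range_polygon_wkt(50331648, 68257567))
print(mbr_contains(50331648, 68257567, 50331650))  # True
```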


In SQL/GIS terms, each IP range is represented by a 5-point rectangular POLYGON like this one, representing the IP range of –

POLYGON((
  50331648 -1,
  68257567 -1,
  68257567  1,
  50331648  1,
  50331648 -1
))

The search IP address can be represented as a point of (ip, 0), and that point will have a relationship with at least one of the polygons (provided it's a valid IP and part of the GeoIP database).

It is then possible to search these polygons for a specific point representing an IP address using the GIS spatial relationship function MBRCONTAINS and POINT3 to search for "which polygon contains this point" like this:

SELECT country_code
FROM ip_country
WHERE MBRCONTAINS(ip_poly, POINTFROMWKB(POINT(INET_ATON('%s'), 0)));

[Have a topic for a future edition of the MySQL Magazine? Send articles and ideas to Keith Murphy, Editor.]

Pretty cool huh? I will show how to load the data and get started, then take a look at how it performs in the real world, and compare the raw numbers between the two approaches.

Loading Data and Preparing For Work
First, a table must be created to hold the data. A POLYGON field will be used to store the IP range. Technically, at this point the ip_from and ip_to fields are unnecessary, but given the complexity of extracting the IPs from the POLYGON field using MySQL functions, they will be kept anyway. This schema can be used to hold the data4:

CREATE TABLE ip_country (
  id           INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ip_poly      POLYGON      NOT NULL,
  ip_from      INT UNSIGNED NOT NULL,
  ip_to        INT UNSIGNED NOT NULL,
  country_code CHAR(2)      NOT NULL,
  PRIMARY KEY (id),
  SPATIAL INDEX (ip_poly)
);

After the table has been created, the GeoIP data must be loaded into it from the CSV file, GeoIPCountryWhois.csv, downloaded from MaxMind. The LOAD DATA command can do this like so:

LOAD DATA LOCAL INFILE "GeoIPCountryWhois.csv"
INTO TABLE ip_country
FIELDS TERMINATED BY ","
ENCLOSED BY "\""
LINES TERMINATED BY "\n"
(
  @ip_from_string, @ip_to_string,
  @ip_from, @ip_to,
  @country_code, @country_string
)
SET
  id      := NULL,
  ip_from := @ip_from,
  ip_to   := @ip_to,
  ip_poly := GEOMFROMWKB(POLYGON(LINESTRING(
    /* clockwise, 4 points and back to 0 */
    POINT(@ip_from, -1), /* 0, top left */
    POINT(@ip_to,   -1), /* 1, top right */
    POINT(@ip_to,    1), /* 2, bottom right */
    POINT(@ip_from,  1), /* 3, bottom left */
    POINT(@ip_from, -1)  /* 0, back to start */
  ))),
  country_code := @country_code;

During the load process, the ip_from_string, ip_to_string, and country_string fields are thrown away, as they are redundant. A few GIS functions are used to build the POLYGON for ip_poly from the ip_from and ip_to fields on-the-fly. On my test machine it takes about five seconds to load the 96,641 rows in this month's CSV file.

At this point the data is loaded, and everything is ready to go to use the above SQL query to search for IPs. Try a few out to see if they seem to make sense!

Performance: The test setup
In order to really test things, a bigger load testing framework will be needed, as well as a few machines to generate load. In my tests the machine being tested, kamet, is a Dell PowerEdge 2950 with dual dual-core Xeon 5050 @ 3.00GHz and 4 GB RAM. We have four test clients, makalu{0-3}, which are Apple Mac Minis with 1.66GHz Intel CPUs and 512MB RAM. The machines are all connected with a Netgear JGS524NA 24-port GigE switch. For the purposes of this test, the disk configuration is not important. On the software side, the server is running CentOS 4.5 with kernel 2.6.9-55.0.2.ELsmp. The Grinder 3.0b32 is used as a load generation tool with a custom Jython script5 and Connector/J 5.1.5 to connect to MySQL 5.0.45.

There are a few interesting metrics that I tested for:

• The latency and queries per second with a single client repeatedly querying.
• Does the number of queries handled increase as the number of clients increases?
• Are latency and overall performance adversely affected by many clients?

The test consisted of an IP search using the two different methods, varying the number of clients between one and sixteen in the following configurations:

Clients  Machines  Threads
1        1         1
2        1         2
4        1         4
8        2         4
16       4         4

Each test finds the country code for a random dotted-quad format IP address passed in as a string.

How does it perform? How does it compare?
There are a few metrics for determining the performance of these searches. If you tried the BETWEEN version of this query, you may have noticed that, in terms of human time, it doesn't take very long anyway: I pretty consistently got one row in set (0.00 sec). But don't let that fool you. It's clear that GIS wins hands down.

First a look at raw performance in terms of queries per second. Using BETWEEN, we max out at 264 q/s with sixteen clients. Using MBRCONTAINS, we max out at 17,600 q/s with sixteen clients, and it appears that it's the test clients that are maxed out, not the server.

Next, a look at latency of the individual responses. Using BETWEEN, we start out with a single client at 15.5 ms per request, which is not very good, but still imperceptible to a human. But with sixteen clients, the latency has jumped to 60 ms, which is longer than many web shops allocate to completely construct a response. As the number of test clients increases, the latency gets much worse, because the query is so dependent on CPU.
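Why is the BETWEEN query so CPU-bound? A toy simulation (my own sketch with made-up ranges, not the article's code or data) shows the problem: with a B-tree index on ip_from, the open-ended condition "ip_from <= ip" still matches every range starting at or below the search IP, so the server has O(n) candidate rows to examine.

```python
# Simulate a lookup over non-overlapping (start, end, country) ranges
# using only an index on the start column. The index seek finds where
# to stop, but all earlier entries remain candidates that satisfy the
# open-ended "start <= ip" condition.

import bisect

ranges = [(0, 9, "AA"), (10, 19, "BB"), (20, 29, "CC"), (30, 39, "DD")]
starts = [r[0] for r in ranges]

def lookup_with_index_on_start(ip):
    pos = bisect.bisect_right(starts, ip)  # index seek on start <= ip
    examined = pos                          # every earlier row qualifies
    for start, end, cc in reversed(ranges[:pos]):
        if end >= ip:
            return cc, examined
    return None, examined

print(lookup_with_index_on_start(25))  # ('CC', 3): 3 candidate rows
```

With more ranges, the candidate count grows linearly, which matches the CPU-bound behavior observed above.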


Using MBRCONTAINS, we start out with a single client at 0.333 ms per request, and even with sixteen clients, we are well under 1 ms at 0.743 ms.

Definitely consider using MySQL GIS whenever you need to search for a point within a set of ranges. Performance is fantastic, and it's relatively easy to use. Even if you are an all-InnoDB shop, as most of our customers are (and we would recommend), it may very well be worth it to use MyISAM specifically for this purpose.

Update 1: Another way to do it, and a look at performance
Andy Skelton and Nikolay Bachiyski left a comment below suggesting another way this could be done:
SELECT country_code
FROM ip_country
WHERE ip_to >= INET_ATON('%s')
ORDER BY ip_to ASC
LIMIT 1;

This version of the query doesn't act exactly the same as the other two: if your search IP is not part of any range, it will return the next highest range. You will have to check whether ip_from is <= your IP within your own code. It may be possible to do this in MySQL directly, but I haven't found a way that doesn't kill the performance.

Andy's version actually performs quite well: slightly faster and more scalable than MBRCONTAINS. I added two new performance testing configurations to better show the differences between the two:
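Client-side, the logic of Andy's query can be sketched like this (illustrative Python with invented names; in MySQL the binary search is the B-tree seek on the ip_to index):

```python
# Keep ranges sorted by ip_to, binary-search for the first ip_to >= ip
# (the "ORDER BY ip_to ASC LIMIT 1" row), then verify ip_from <= ip
# yourself, as the article says you must.

import bisect

ranges = sorted([(50331648, 68257567, "US"), (68257568, 68257600, "CA")],
                key=lambda r: r[1])
ip_tos = [r[1] for r in ranges]

def andy_lookup(ip):
    i = bisect.bisect_left(ip_tos, ip)    # first range with ip_to >= ip
    if i == len(ranges):
        return None                        # past every range
    ip_from, ip_to, cc = ranges[i]
    return cc if ip_from <= ip else None   # the client-side check

print(andy_lookup(50331650))  # 'US'
print(andy_lookup(10))        # None: the next range starts above this IP
```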
Clients  Machines  Threads
32       4         8
64       4         16
Here's a performance comparison of MBRCONTAINS vs. Andy's method, in latency (ms, lower is better) and queries per second (higher is better).

Once I get some more time to dig into this, I will look at why exactly BETWEEN is so slow. I've also run into an interesting possible bug in MySQL: if you add a LIMIT 1 to the BETWEEN version of the query, performance goes completely to hell. Huh? Thanks for the feedback, Andy and Nikolay.

FOOTNOTES
1 MySQL provides the INET_ATON() and INET_NTOA() functions for converting back and forth between dotted-quad strings (CHAR(15)) and 32-bit integers (INT UNSIGNED). You can also use the equivalent functions, if they exist, in your favorite programming language, so that you can just feed an integer to MySQL. I haven't tested the (positive) performance implications of doing that.

2 Although, strangely, they offer a different solution specifically for MySQL using <= and >= operators instead of BETWEEN. I don't find that that difference has any effect on MySQL. Maybe it was for a really old version of MySQL that didn't have BETWEEN?

3 Pet peeve: Why does MySQL require you to pass the output of its own POLYGON, LINESTRING, POINT, etc., functions through GEOMFROMWKB in order to use them? It makes life suck that little bit more than necessary.

4 Note that if you're looking to play around with the BETWEEN version of things, you will want to add some indexes on ip_from and ip_to. I would recommend INDEX (ip_from, ip_to) and INDEX (ip_to, ip_from), as those two seemed to perform the best that I could find (given its poor efficiency to start with).

5 Custom Jython script listing:

from net.grinder.script.Grinder import grinder
from net.grinder.script import Test
from java.util import Random
from java.sql import DriverManager
from com.mysql.jdbc import Driver

DriverManager.registerDriver(Driver())

p = grinder.getProperties()
p.setLong("grinder.threads", 8)
p.setLong("grinder.runs", 100000000)
p.setLong("grinder.duration", 120 * 1000)

t = Test(1, "Query")

def getConnection():
    return DriverManager.getConnection(
        "jdbc:mysql://server/geoip", "geoip", "geoip")

class TestRunner:
    def __init__(self):
        self.connection = getConnection()

    def __call__(self):
        r = Random()
        s = self.connection.createStatement()
        q = t.wrap(s)
        ip = "%i.%i.%i.%i" % ((r.nextInt() % 256), (r.nextInt() % 256),
                              (r.nextInt() % 256), (r.nextInt() % 256))

        # Using BETWEEN (uncomment to test instead of MBRCONTAINS)
        #q.execute("select country_code from ip_country_bad where inet_aton('%s') between ip_from and ip_to" % ip)

        # Using MBRCONTAINS
        q.execute("select country_code from ip_country where mbrcontains(ip_poly, pointfromwkb(point(inet_aton('%s'), 0)))" % ip)

        s.close()

    def __del__(self):
        self.connection.close()

About the Author
Jeremy Cole is MySQL Geek and co-founder at Proven Scaling where he consults on architecture, performance, scalability, reliability, and availability. Prior to starting Proven Scaling, Jeremy was MySQL Geek at Yahoo! Inc., where he was responsible for internal consulting and support, and maintained the internal releases of MySQL. Previous to Yahoo! Inc., Jeremy worked for over four years at MySQL AB.

Introducing Kickfire non-profit organization was formed call the

Transaction Processing Performance Council
WARNING (TPC). They have developed standard tests that
The reader needs to understand that the can be used to compare performance of database
Kickfire Database Appliance will be released in servers on a variety of hardware configurations.
Beta version on April 14, 2008. While I have talked One of the test developed by TPC is call TPC-H.
extensively with Kickfire about the product and its The test is defined in Wikipedia, ”The TPC
capabilities I have not had a chance to work Benchmark™H (TPC-H) is a decision support
hands-on with the Kickfire product. Benchmarks benchmark. It consists of a suite of business
in this article were independently execute and oriented ad-hoc queries and concurrent data
audited. That being said, I have seen a demo modifications. The queries and the data
system running the benchmark tests. It does populating the database have been chosen to
perform as reported. In addition, the TPC have broad industry-wide relevance. This
organization audited these test. benchmark illustrates decision support systems
MySQL is not the first thing to come to mind that examine large volumes of data, execute
when thinking “data warehousing”. When queries with a high degree of complexity, and
discussing hundreds and thousands of terabytes give answers to critical business questions.”
of data are in storage companies tend to think
more about Oracle or other specialized servers. Wikipedia continues by saying “The
That might be about to change... performance metric reported by TPC-H is called
A small Silicon Valley company, based in Santa the TPC-H Composite Query-per-Hour
Clara, Calif., is gearing up to release a Performance Metric (QphH@Size), and reflects
revolutionary new product called the Kickfire multiple aspects of the capability of the system to
Database Appliance. Do not let the name mislead process queries. These aspects include the
selected database size against which the queries
you. This product could turn the entire database
are executed, the query processing power when
world on its collective ear. It is a new paradigm
queries are submitted by a single stream, and the
for thinking about SQL and data warehousing and
query throughput when queries are submitted by
how to best utilize it.
multiple concurrent users. The TPC-H
Before we look into the details of what the
Price/Performance metric is expressed as
Kickfire Database Appliance entails, it is useful to
define a few terms so that everyone understands
what the discussion is all about. First, what does
Now that some background has been
data warehousing mean? According to Wikipedia,
established, it is time to answer the ever pressing
a 'data warehouse' is a repository of an
question - what's the big deal? The big deal is a
organization's electronically stored data. Data
chip. To provide some context, remember
warehouses are designed to facilitate reporting
graphics accelerator chips when they first came
and analysis.” This is fairly easy to understand.
out? They transferred some of the graphics
Data warehouses are a fairly new paradigm in the
software manipulation from the operating system
database world with the concept being introduced
level onto a chip. Over time this has resulted in
in the mid '80s by Barry Devlin and Paul Murphy,
dramatic improvements in performance. Kickfire
researchers at IBM. It wasn't until the mid-90s
has done the equivalent of this for MySQL. An
that data warehousing came into significant use.
expansion card that includes a SQL accelerator
When benchmarking data warehousing tools it
chip plugs into a server. With the addition of the
is important to use standard tests. In 1988 a
Kickfire continues on page 16
Page 16 Spring 2008 Issue 4 MySQL Magazine

Linux operating system, MySQL and additional software that acts as glue – the end result is a system that performs significantly faster. The additional chip is called a SQL Chip. The SQL Chip is a device that will at the same time both execute queries in parallel and accelerate their execution time. As this is done directly in the SQL Chip, it offloads the base or host server.
There are three basic components to a Kickfire Database Appliance. The first part is what we are all familiar with – the MySQL server. Next, there is a software component that Kickfire provides – a new storage engine for MySQL, including optimizer components. Finally, the SQL Chip provides acceleration of the SQL execution.
MySQL provides:
● connectivity
● security
● administration

Kickfire software provides:
● optimizer
● column store and cache
● transactional engine

SQL Chip provides:
● SQL execution
● memory management
● loader acceleration

Utilizing the Kickfire Database Appliance a company can perform many tasks that would normally be performed by a high-end Oracle server in a data warehousing setup.
What does the future hold? Kickfire has plans to release some higher-end units that are equipped to handle larger datasets. Also, Kickfire is looking at integrating multiple SQL Chips in a system, which should ultimately help build even higher-throughput solutions.
Is this going to replace the MySQL server powering a busy Internet website? No. The reason why is that the function of the SQL Chip is to accelerate reads from the database. Typically an Online Transaction Processing (OLTP) website will contain a significant mix of reads and writes. That is why this product is targeted at data warehousing, where the dataset does not change that much and the application is primarily reading from the database(s).
The first generation will accelerate queries on up to three terabytes of data on a single node. Over time this is expected to increase to tens of terabytes.
Kickfire currently has audited results of their TPC-H benchmarks on a 100 gigabyte dataset. Three systems were compared: a Sybase IQ/Sun system ($45,467), a Microsoft SQL Server/HP system ($19,437) and a MySQL/Kickfire system ($34,300). The performance of these systems was compared in two ways: Query Geometric Mean (the geometric mean of the 22 queries in the TPC-H benchmark) and QphH (queries executed per hour). With Query G.M. the comparison is as follows:

Sybase IQ/Sun 43.60
Microsoft/HP 64.40
MySQL/Kickfire 5.32

In this case smaller (execution times) is going to be better. Clearly Kickfire is the winner.

With the QphH comparison:

Sybase IQ/Sun 8,587
Microsoft/HP 4,521
MySQL/Kickfire 49,267

With the QphH metric a larger number is better. Once again, Kickfire is the winner, being over 5.5 times faster than the Sybase system and over 10.8 times faster than the MSFT system. Kickfire is now certified by the TPC as the number one data warehousing system in terms of price/performance and the number one system in
terms of performance for non-clustered (single-node) systems.
Kickfire did one other independently audited test run. Percona Consulting ran the TPC-H test on a Dell server running MySQL and also on a Kickfire system (the same one used in the other TPC testing). Here are the results:

Query G.M.
MySQL/Dell ($15,000) 5,188.00
MySQL/Kickfire ($34,300) 5.32
Once again, smaller numbers are better. Clearly the MySQL/Kickfire server is far faster. In fact, some of the queries running on the MySQL/Dell server were killed because they were taking more than three hours to run.
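As an aside, the Query Geometric Mean metric used in these comparisons is simply the geometric mean of the individual execution times of the 22 TPC-H queries. A quick sketch of the calculation (the per-query times below are made-up illustrative values, not the audited figures):

```python
import math

def query_geometric_mean(times):
    """Geometric mean of per-query execution times (seconds).

    Less dominated by a single slow query than the arithmetic mean,
    which is one reason TPC-H reports it alongside QphH.
    """
    assert times and all(t > 0 for t in times)
    return math.exp(sum(math.log(t) for t in times) / len(times))

# Hypothetical query times -- illustrative only.
times = [2.0, 8.0, 4.0]
print(query_geometric_mean(times))  # ~4.0
```

Because the mean is geometric, halving the slowest query helps exactly as much as halving the fastest one, so a system cannot win the metric by optimizing a single outlier.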
In the words of Raj Cherabuddi, Kickfire's CEO and
co-founder, “The problem today is not query
performance. There are lots of vendors that solve
that problem. At Kickfire, we wanted to deliver high
performance without the massive hardware, power,
cooling, and space costs of existing offerings. That's
why what takes competitors half a room of hardware
to do, we can do in a double height pizza box with the power consumption of a microwave oven. “
It is a very safe statement to say that this is what sets Kickfire apart from any other data warehousing
company. To solve problems in an efficient manner you ordinarily have to spend large sums of money.
Kickfire wants to solve problems in an efficient manner without spending more money than many mortgages.
Some might wonder why Kickfire chose MySQL instead of another database. Raj had this to say about their choice of MySQL: “We chose MySQL not only because it is the leading open source database but, perhaps more
importantly, it has emerged as a de facto standard. Being a standard, MySQL enjoys extensive third-party
importantly, it has emerged as a de facto standard. Being a standard, MySQL enjoys extensive third-party
vendor and service provider support and is surrounded by an ecosystem of skills and knowledge all of which
Kickfire can leverage.”
Kickfire is something new. They bring an approach to resolving the problems of data warehousing that I
don't believe has ever been tried before. I am confident in saying that with the level of performance offered,
the pricing is going to be very low when compared to other options available on the market today. I might be
pulling out the crystal ball, but I see a very bright future for Kickfire.
Kickfire Homepage:
TPC Benchmark Resources:
Automatic Query Optimization with QOT

This article is about QOT - a Query analysis and Optimization Tool for MySQL. The initial idea behind QOT was to create an open source tool that would help DBAs and developers create optimal indexes based on existing table structure and the queries that will be executed by the application. Later, once development was under way, it became apparent that along with index generation the tool could provide several additional capabilities based on the query analysis that it performs. These include error checking, static SQL checking, query rewrites, query reverse engineering and more.
Some of these features are already available; others are still in development or are just planned. In this article I will focus on the currently available features (as of version 0.0.2). I will start with the primary feature – index generation.
Automatic Index Generation
Whenever you write a new SELECT query you always face the same question - how fast will it perform or, in other words, how well will the query fit the existing data and index structure? To answer this question you need to have an idea about several facts, such as:

● what is the current data structure?
● what indexes are already defined?
● how can the MySQL optimizer use existing indexes with the new query?
● which indexes can be added to improve query performance?

Needless to say, gathering and analyzing all this information is not a trivial task, and this is where QOT can be useful. The tool can analyze the existing data and index structure and find out which indexes will be available to the optimizer during the execution of the query. It can also automatically generate missing indexes to improve query performance.
We will consider several use cases. Before we start, let me mention that the tool works fully autonomously and doesn’t require a MySQL server. All the input data (queries and DDL) is provided via files and directly on the command line.
In the examples below I will use a sample schema consisting of two tables that could serve as a simple music catalog (I will assume that the model definition below is stored in the file music.sql):


USE music;

CREATE TABLE bands (
  id INT NOT NULL PRIMARY KEY,
  band_title VARCHAR(45));

CREATE TABLE albums (
  id INT NOT NULL PRIMARY KEY,
  band_id INT NOT NULL,
  album_title VARCHAR(45),
  rating INT,
  INDEX ix0 (band_id));

This is a simple model with a one-to-many relationship. The tables have primary keys defined. The band_id field in the albums table stores the relationship, and thus it has an index. This is a very typical design that you would get using an ER design tool.
Now I will write several queries for the music catalog and use QOT to optimize them. Query one will fetch the ids of all albums of a given band:

SELECT id FROM music.albums WHERE band_id = 3;

Run QOT (I won’t focus on command line options, as I hope their meaning is not too hard to guess. Please
refer to the tool documentation for details):

$ qot \
--input-file="music.sql" \
--input-query="SELECT id FROM music.albums WHERE band_id = 3;" \

/* Output produced by qot 0.0.2 GPL */

Query: SELECT id FROM music.albums WHERE band_id = 3
selectivity: zero or more rows
tables used in this query: albums (zero or more rows)
existing lookup indexes for this query: ix0
existing covering indexes for this query:

The default output mode is SQL, that is why all free-text information is enclosed in comments (more on
this later). This query did not produce any error messages, so right under the query text you can find the
information about query selectivity. This value is calculated per table based on the indexes used and the
WHERE-condition. Next goes the information about the existing indexes that can be used with this query. Lookup indexes are indexes that can be used to resolve the WHERE-condition. Covering indexes are ones that can be used to completely avoid table access; in other words, a covering index can be used both to resolve the WHERE-condition and to return the SELECTed item list.
The tool reports that there is an index (ix0) that can be used by the optimizer for lookup and that there are no covering indexes defined. Overall this output means that the optimizer will have only one index to consider for this query. It also means that for every row that matches the WHERE-condition (i.e. for every found music album) MySQL will perform a table access, which implies at least one additional random I/O per found row. This may (depending on the table state) force the optimizer to resort to a full table scan, making the existing index practically unusable. The optimizer would not have to perform table access if all the query fields were included in the index. So it looks like a covering index (i.e. an index that includes all query fields) would save the situation. Let’s use the tool to generate it:


$ qot \
--input-file="music.sql" \
--input-query="SELECT id FROM music.albums WHERE band_id = 3;" \

/* Output produced by qot 0.0.2 GPL */

/* additional covering indexes that can be created to improve query performance */
CREATE UNIQUE INDEX ix1 ON `music`.`albums` (`band_id`, `id`);

QOT has generated an index which can be used as a covering index for this query. Notice that the output is valid SQL, so you can redirect it to the input of the mysql monitor or execute it in Query Browser. You may wonder why the generated index is UNIQUE. That is because it includes the primary key id column, which is unique by definition.
If you apply the new index to the server and run an EXPLAIN SELECT command for the above query you will see that the optimizer really likes the new index:

mysql> explain SELECT id FROM music.albums WHERE band_id = 3 \G

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: albums
type: ref
possible_keys: ix1,ix0
key: ix1
key_len: 4
ref: const
rows: 2
Extra: Using index
1 row in set (0.00 sec)

Notice the Using index value in the Extra column. It indicates precisely what we wanted to achieve – that MySQL will not touch the table.

Should we always generate the missing covering keys? Well, it depends… the major factors are the query selectivity ratio and the time saved in the query with the new covering index versus the additional time lost in UPDATE, INSERT and DELETE queries for index maintenance. Future versions of QOT will give hints to help make the right decision here.
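One back-of-the-envelope way to frame that decision is a simple break-even model — the daily time saved on reads versus the daily time spent maintaining the index on writes. This is a toy model of my own, not something QOT computes, and all four numbers would be workload estimates:

```python
def covering_index_worth_it(reads_per_day, writes_per_day,
                            ms_saved_per_read, ms_cost_per_write):
    """Rough break-even test for adding a covering index.

    Returns True when the daily time saved on SELECTs outweighs the
    daily time spent maintaining the index on INSERT/UPDATE/DELETE.
    """
    saved = reads_per_day * ms_saved_per_read
    cost = writes_per_day * ms_cost_per_write
    return saved > cost

# A read-heavy catalog: clearly worth it.
print(covering_index_worth_it(100_000, 500, 2.0, 0.5))    # True
# A write-heavy table: probably not.
print(covering_index_worth_it(1_000, 100_000, 2.0, 0.5))  # False
```

The real trade-off also involves disk space, buffer pool pressure and selectivity, but even this crude arithmetic makes the read/write ratio argument concrete.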

Now that we have an idea about generating covering indexes it is time to try something more complex. In
the next example there will be queries that do not have lookup indexes. We will see how to fix this using QOT.

QOT continues on page 21

Page 21 Spring 2008 Issue 4 MySQL Magazine

QOT continued from page 20

SELECT album_title
FROM albums
WHERE rating = 5;

SELECT album_title
FROM albums
WHERE band_id = 1 AND rating = 5;

SELECT band_title
FROM albums
INNER JOIN bands ON ( = albums.band_id)
WHERE band_id = 1
AND rating = 5;

Run QOT:

$ qot \
--input-file="music.sql" \
--input-query="select album_title from albums where rating = 5;\
select album_title from albums where band_id = 1 and rating = 5;\
select band_title from albums, bands where band_id = 1 \
and rating = 5 and = band_id;" \
--info \

/* Output produced by qot 0.0.2 GPL */

Query: select album_title from albums where rating = 5
selectivity: zero or more rows
tables used in this query: albums (zero or more rows)
existing lookup indexes for this query:
existing covering indexes for this query:
Query: select album_title from albums where band_id = 1 and rating = 5
selectivity: zero or more rows
tables used in this query: albums (zero or more rows)
existing lookup indexes for this query: ix0
existing covering indexes for this query:
Query: select band_title from albums, bands where band_id = 1 and rating = 5 and
 = band_id
selectivity: zero or more rows
tables used in this query: bands (zero or more rows), albums (zero or more rows)
existing lookup indexes for this query:
existing covering indexes for this query:
/* additional merged lookup indexes that can be created to improve performance of *all*
the above queries */
CREATE INDEX index0 ON `music`.`albums` (`rating`, `band_id`);

The report indicates that the first query cannot use any indexes at all. This means that the optimizer will have to resort to a full table scan. The second query has some relevant indexes, but they are not optimal as they do not cover all the fields from the WHERE-condition. The same is true for the last query. Finally, QOT proposes to create a single index on the albums table that will fix the problems with all three queries. Let me explain why this index would be so good.
The main reason is that the index fits the pattern of the WHERE conditions of all these queries. In this form the index can be used with both expressions like (rating = 5) and (band_id = 1 and rating = 5). Neither the reversed index (band_id, rating) nor separate indexes on the band_id and rating columns would be as good here.
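The rule underneath this is MySQL's leftmost-prefix matching: a composite index can serve as a lookup index only when its leading column is constrained by the WHERE clause. A tiny sketch of that rule (my own simplification — it ignores range conditions, ORDER BY and index merge):

```python
def usable_for_lookup(index_cols, equality_cols):
    """True if a composite index can serve a WHERE clause at all.

    index_cols: tuple of column names in index order.
    equality_cols: set of columns with '=' conditions in the WHERE clause.
    A composite index helps only when its leftmost column is among the
    constrained columns (leftmost-prefix rule, simplified).
    """
    return index_cols[0] in equality_cols

# The proposed index (rating, band_id) serves both query shapes:
print(usable_for_lookup(("rating", "band_id"), {"rating"}))             # True
print(usable_for_lookup(("rating", "band_id"), {"rating", "band_id"}))  # True
# The reversed index (band_id, rating) cannot serve WHERE rating = 5 alone:
print(usable_for_lookup(("band_id", "rating"), {"rating"}))             # False
```

This is exactly why (rating, band_id) wins: every one of the three WHERE patterns constrains rating, so every one of them can enter the index through its first column.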
With this example I will conclude the description of the index generation features. I will not cover per-query lookup index generation, because it is very similar to the last example except that the indexes are generated separately for every query. This makes sense if you want per-query optimization.

Other Features
Besides the index generation this release includes query rewrites and static SQL checking.


Query rewriting is the modification of a query’s SQL source. There is only one rewrite available at the moment – the expansion of ‘*’ wildcards into field lists. Keeping wildcards in your queries can sometimes be unwanted. For example, if you later add a column to the table, it will automatically be fetched by all the queries that use the ‘*’ wildcard, which is often not desired. QOT can help you convert those into fixed field lists. You can do that manually by running the tool and copy/pasting the output, or, if your code editor supports some kind of scripting with the ability to run shell commands, you can write a simple script for this task. To facilitate such scripting QOT supports an XML output mode:

$ qot \
--input-file="music.sql" \
--input-query="select * from albums" \
--rewrite=expand-stars \

<?xml version="1.0" encoding="UTF-8"?>

<qot version="0.0.2 GPL">
<sql><![CDATA[select `music`.`albums`.`id`, `music`.`albums`.`band_id`,
`music`.`albums`.`album_title`, `music`.`albums`.`rating` from `music`.`albums`]]></sql>
</qot>
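As a sketch of the scripting idea, here is how an editor macro might pull the rewritten statement out of that XML with Python's standard library. The element names match the sample above; a real script would capture qot's output via a subprocess call rather than a hard-coded string:

```python
import xml.etree.ElementTree as ET

# Output captured from a `qot --rewrite=expand-stars` run (sample from above).
qot_output = """<?xml version="1.0" encoding="UTF-8"?>
<qot version="0.0.2 GPL">
<sql><![CDATA[select `music`.`albums`.`id`, `music`.`albums`.`band_id`,
`music`.`albums`.`album_title`, `music`.`albums`.`rating` from `music`.`albums`]]></sql>
</qot>"""

def extract_rewritten_sql(xml_text):
    """Return the rewritten SQL statements from qot's XML output."""
    root = ET.fromstring(xml_text)
    return [sql.text.strip() for sql in root.iter("sql")]

for stmt in extract_rewritten_sql(qot_output):
    print(stmt)
```

The editor script would then replace the selected `SELECT *` statement with the extracted text.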
Static checking is the process of testing source code for constructs that are formally correct but might
cause problems during code execution. At the moment QOT is able to analyze SQL for type conversion related
problems. Consider the following query:

SELECT * FROM albums WHERE id='0'

It might look quite innocent, but running the tool:

$ qot \
--input-file="music.sql" \
--input-query="select * from albums where id='0'" \

/* Output produced by qot 0.0.2 GPL */

/* Static query checking results:
WARNING at 'id='0'': string to numeric conversion, all alphabetic values will evaluate to 0 */

The warning suggests that the literal will be parsed as a string – not a number – and the server will perform a conversion of the string value ‘0’ into an integer during query evaluation. This is, of course, a performance issue, but that is not the only problem. Values like ‘0’ or ‘123’ will be converted as expected, but the conversion result might be surprising for values like ‘1abc’ or ‘a’. The first will be converted to 1 and the second to 0.

Using QOT in Physical Database Design

Usually physical database design involves defining indexes on tables, and indexes are usually defined based on the relations between entities. But there is something a bit surprising about this. If you scroll back to the examples above you will see that in four fairly common queries the original index that was created on the band_id field turned out to be suboptimal. Moreover, the newly generated indexes totally superseded it. The only thing it does now is take additional maintenance time. Why did this happen? In general the reason for such problems is that the index design was based on the model relations and not on the queries that were to be executed against the model.
Shouldn’t all indexes be designed based on the queries they are created for? Yes, this seems reasonable, but there is a problem - complexity. Even for a mid-size database application there could be hundreds of queries with various filtering conditions, and figuring out the correct indexes manually based on that set of queries is hard, error-prone work. That is why model designers resort to indirect, approximate methods. For example, if we know that customers and orders have a relation then it is likely that there will be a query that fetches records based on this relation, and so on. This approach works but it is obviously suboptimal, as there is no guaranteed correspondence between indexes and queries.
A better way to design indexes is to avoid defining them based on entity relations and to define manually only primary keys and other semantic constraints. Once the table structure with primary keys is defined and the set of queries that the application will use to fetch data is known, all this information can be passed to QOT to generate optimal indexes. Besides the direct gain of getting an optimal index structure and having all your queries checked for potential problems, you make further development of your database much easier. Every time the database structure changes you just remove the old indexes and rerun the tool.
That is it for now. I hope you find the tool and this article interesting and valuable. To get more information about the tool visit the project’s home page.

About the Author

Vladimir Kolesnikov is a software engineer at MySQL AB. He is involved in the development of the MySQL

log-bin

In case you don't get your MySQL news from anywhere else – I have a shocker. Sun bought MySQL AB for
one billion USD in stock and cash earlier this year. The deal was finalized just a few weeks ago. So there -
now you know. And I thought I would take the time to write a couple of paragraphs about how this will affect
the MySQL community down the road.
There has been much virtual ink (blog entries) written about this deal. Some people feel like this could be
the best thing that could ever happen to MySQL. Some people probably think this is the death knell of MySQL.
Me? I certainly don't think this is the “end” for MySQL. I am somewhere in the middle. Sure, things will change,
but change is part of life.
Examining the worst case scenario first, here is what I see. Let's say Sun pretty much kills MySQL, runs off the core developers and basically buries it. That is very unlikely considering the amount of money invested, but let's assume this is what they do.

While the code of MySQL is very complex, possibly as complex as anything that runs on Linux (including the kernel), it is still open source. That means that the project can “fork”, taking the current codebase of MySQL and going off and playing with it on our own... maybe even under a new name. Even if the server changed names, the project could conceivably continue. Even in this worst case it is entirely possible (and very probable) that MySQL continues.
What is the best case? Do you realize that Sun has some of the best hardware and software engineers in the world? Back in 2000 and 2001 I worked extensively with Sun servers running Solaris. These servers provided an incredible amount of processing power at the time. They supported up to 64 CPUs and 64 gigabytes of RAM. Their hardware and software have only continued to improve. What if Sun “cross-pollinates” some of these engineers to help improve the MySQL codebase? What if engineers who have worked on parallel-processing problems contributed to the MySQL server? And it would be only natural to improve the ability of MySQL to run on Sun hardware with Solaris (the number three platform for MySQL server). In addition, over the last several years, Sun has made a commitment to Linux. Did you know that Sun sells a server that supports up to 16 cores of processing power and a whopping 256 GB of RAM and runs either Linux or Solaris? Meet the SunFire X4600.
I know that it is very important that MySQL scales out. It does this very well. But there are also opportunities for MySQL to scale up. There are companies who run Oracle on some very high-end boxes. Sun engineers working with MySQL engineers can help make it possible to replace Oracle on those servers and save companies millions of dollars. What a way for a company to cut costs. Sure, it costs money to migrate large datasets and applications from Oracle to MySQL. But with the astronomical licensing fees of Oracle it makes perfect sense for companies to do this if MySQL provides the same or better performance on the same hardware. And with a beefed-up support program through Sun, large companies will feel safe putting their data in the hands of a MySQL database. This will only create more opportunities for MySQL DBAs.
Somewhere in between these two views is probably where the future is found. Sun is a gargantuan corporation. MySQL isn't small anymore. Meshing these two companies together will create some problems. There will be MySQL employees who leave. Adjustments will be necessary. Some operational aspects will probably not be as smooth as we would like. However, in the end, there will be new opportunities for growth. The cross-pollination I wrote about will occur even if only on a limited basis. The product that comes out of this marriage will be stronger for it. Sun probably has servers in every Fortune 500 company in the United States. Now that MySQL is a Sun product there will be new opportunities for MySQL in these companies. MySQL can continue to embrace future technology and extend its reach ever deeper into data centers everywhere. That is where I am placing my bet. Feel free to write me with your thoughts at

Thanks,

Keith