Summer 2009
Issue #1
http://www.osdbzine.net
Open Source Database Magazine
Welcome to the inaugural issue of Open Source Database Magazine. It is my goal that this magazine provide a place for people to learn about open source databases of any stripe, be they Postgres, SQLite, MySQL, Drizzle, CouchDB, Hadoop or something else. The exchange of knowledge is part of what makes open source software so powerful. Open Source Database Magazine is designed to help facilitate that exchange. Drop by and learn something new. And remember, there is always a chance to share some of your knowledge with others. Submit an article for the next issue!
News
There has been much news lately in the open source database world. In April there were two huge announcements at the MySQL Conference. Sun announced that there would be a new version of MySQL, MySQL Server 5.4. This version centered primarily on performance and scaling improvements, bringing many community performance patches into the fold. A future issue of OS DB Magazine will cover this release in detail. In addition, it was announced that Oracle would be buying Sun. In June Sun announced that development of MySQL Server would be moving forward with a new development model, one designed to ensure a more consistent release cycle.
July the first brought news from the PostgreSQL world with the announcement of the release of PostgreSQL Server version 8.4. This is a major new release with many features. This issue has a feature article by Robert Treat on these new features beginning on page 20.
The hot days of August bring the second edition of OpenSQL Camp (http://www.opensqlcamp.org). This time around it's the European edition, held in conjunction with FrOSCon. The dates are August 22 and 23, and the location is near Bonn, Germany.
On September the 19th there will be a PgDay in Athens, Georgia at the University of Georgia. The call for papers is ongoing. Details are available at http://www.postgresqlconference.org/2009/pgday/athens.
In This Issue
3 The Book Shelf
5 Coding Corner: Who Knows Where the Time Goes
13 The Lab: The XtraBackup Program
20 Postgres 8.4
24 Transaction Log
Notice: all trademarks and registered trademarks are the property of their respective owners.
The Book Shelf
On the front cover of MySQL Administrator's Bible is a sentence that reads:

"The book you need to succeed!"

I must say, I do agree. Authored by two very experienced DBAs, Sheeri Cabral and Keith Murphy, who have combined their talents to cover what you really need to know to succeed, this book is very versatile. If you are new to MySQL, or experienced in another database and have to start administering MySQL, you need this book. I can honestly say, even if you have years of MySQL experience, you will learn something new. I did. Divided into four parts, MySQL Administrator's Bible covers your First Steps with MySQL, Developing with MySQL, Core MySQL Administration and Extending Your Skills.

First Steps with MySQL starts with a gentle introduction to MySQL with company information, which seems to be changing annually, and most importantly, the MySQL community itself. What makes MySQL so fantastic is the community. After that, you will be led into installing and configuring MySQL on various platforms including Linux, Windows and Solaris, while touching on post-installation configuration too. Basic security is covered, as well as some tips on troubleshooting and accessing your new MySQL installation using tools included with MySQL or third-party software.

Developing with MySQL covers the MySQL Language Structure and, if you're coming from another RDBMS, how MySQL deviates from the SQL standard by extending that standard to make MySQL the number one open source database used on the Internet. After that, this section covers the same type of topics as just about any other mainstream database: using stored procedures, cursors, events, views and transactions.

The Core MySQL Administration section is the heart of this book. It covers MySQL server tuning for all major storage engines including MyISAM, InnoDB, Falcon, PBXT, and NDB, including the first coverage I've seen in print of the Maria storage engine. An entire chapter is devoted to implementing cache tables and using the query cache. Memcached is also mentioned, and mentioned again in the final section. The section continues with what I consider the most important job of a DBA: backup and recovery. Databases are central to running a business; any data loss could put a company out of business. Be prepared.

This section also gives a solid introduction to the topic of dealing with users, and how they are managed within MySQL. Count on coverage of GRANT/REVOKE, using SHOW GRANTS and the mk-show-grants Maatkit tool. Partitioning, logging and replication, and measuring performance round out this section.

If you have experience with another RDBMS, plan on spending a significant amount of time in this section. Not that the other sections aren't important, they are, but this is the bread and butter of what a MySQL DBA does on a daily basis.
The Extending Your Skills section can be considered getting your Masters in Database Administration. Just about every DBA will have to tackle improving queries and tuning indexes. The second most important job of a DBA is monitoring the performance of your MySQL server. Don't let your users be your first line of monitoring! Be proactive; there are plenty of open source monitoring tools available. The most popular are discussed, as well as MySQL Enterprise and third-party offerings. The MySQL data dictionary is covered in detail over 58 pages. This is the most I've read in any book about the data dictionary.

Last but not least, most high-performance MySQL systems involve scaling up or out. MySQL Administrator's Bible covers the usual suspects of replication, MySQL Cluster, and memcached. MySQL Proxy is introduced and has an appendix that expands on that information. MySQL Proxy itself is worthy of its own book. (hint, hint :) ) Two more appendices cover MySQL Functions and Operators, and additional resources.

Even though this book targets MySQL 5.1/6.0, there is plenty of information that will apply to 5.0. If you're still on 5.0, don't hesitate to pick up a copy. This is a book that can stay with you as you upgrade to 5.1 and beyond. The companion website http://www.wiley.com/go/mysqladminbible contains all the code from the book too, rounding out this fine tome.

What didn't I like about the book? There are only a couple of things, all personal I'm sure. First, I really don't care too much for tables of options from the various tools. Most open source tools are developed rather quickly and options change. This could render portions of the book out of date quickly.

The other thing I noticed was that the book doesn't mention the community versions of MySQL supported by Open Query and Percona. The latter has its own storage engine, XtraDB, and backup solution, XtraBackup.

All in all, this is a very solid book on administering MySQL. This book digs deeper, and the experience of the authors really shows. Well done Sheeri and Keith!

Disclaimer: The publisher provided me with a copy of MySQL Administrator's Bible.

Take a listen to the Open Source Database podcast. The first 'cast will feature an interview with Brian Aker talking about the Drizzle project. It will be available July the 24th @ http://www.osdbzine.net/podcast
Coding Corner: Who Knows Where the Time Goes
SQL logic is based on atoms of data. Its atomicity rule is fundamental. But a time period needs a pair of
values to represent it—when it began, and when it ended. And for most time period problems, you have
to worry about more than period starts and stops: a 12-hour period between two timestamps actually has
43,200 data points, one for each second in the period. Remove any second other than the first or last, and
the period no longer exists. There goes atomicity, and with it your hopes for simple time period queries.
Consider: you track resource booking periods in a MySQL database, and you need to report total daily
usage for any bookable resource.
If a booked period started at datetime pStart and ends at datetime pEnd, then today’s usage began at
12 o’clock last night if the booking period began before today, otherwise it starts at pStart. Likewise
if the booking extends past today, today’s usage ends at 12 o’clock tonight, otherwise it ends at pEnd.
So today’s usage from this booking begins at…
IF( pStart < CURDATE(), CAST(CURDATE() AS DATETIME), pStart )
it ends at…
IF( DATE(pEnd) > CURDATE(), CAST(ADDDATE(CURDATE(), 1) AS DATETIME), pEnd )
and usage is the difference in seconds. To unclutter queries, encapsulate this logic in a stored function:
SET GLOBAL log_bin_trust_function_creators=1;
DROP FUNCTION IF EXISTS DaySeconds;
CREATE FUNCTION DaySeconds( pStart datetime, pEnd datetime, pDate date ) RETURNS INT
RETURN UNIX_TIMESTAMP( IF( DATE(pEnd) > pDate, CAST(ADDDATE(pDate, 1) AS DATETIME), pEnd )) -
       UNIX_TIMESTAMP( IF( pStart < pDate, CAST(pDate AS DATETIME ), pStart ));
SELECT CEIL(DaySeconds('2008-1-1 10:05:00','2008-1-1 10:59:30','2008-1-1')/60) AS Mins;
+------+
| Mins |
+------+
|   55 |
+------+
To report daily usage over a date range, we need a calendar table. Here is one for the first 100 days
of 2008:
DROP TABLE IF EXISTS ints,calendar;
CREATE TABLE ints(i int);
INSERT INTO ints VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
CREATE TABLE calendar(date date);
SET @n=-1;
INSERT INTO calendar SELECT ADDDATE('2008-1-1',@n:=@n+1) FROM ints a JOIN ints b;
And we need a table to hold resource bookings. Populate it with data for five bookings:
CREATE TABLE bookings( id INT PRIMARY KEY, resourceID int, startdate datetime,
enddate datetime );
INSERT INTO bookings VALUES
(1,1,'2008-02-03 17:05','2008-02-03 19:00'),
(2,1,'2008-02-04 17:05','2008-02-04 18:00'),
(3,1,'2008-02-04 19:30','2008-02-04 20:00'),
(4,1,'2008-02-05 23:05','2008-02-06 01:00'),
(5,2,'2008-02-05 14:05','2008-02-05 15:00');
Now to retrieve daily usage for resourceID=1, run DaySeconds() against a JOIN from
calendar to bookings:
SELECT
c.date AS date,
SUM( CEIL(( DaySeconds( b.startdate, b.enddate, c.date ) / 60 ))) AS Mins
FROM calendar c
JOIN bookings b ON c.date BETWEEN DATE(b.startdate) AND DATE(b.enddate)
WHERE b.resourceID = 1
GROUP BY c.date;
To report usage over a full date range, join the calendar table to the above query for that range, as the final query in this article does.
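A sketch of what such a query could look like, assuming the calendar and bookings tables built above (the wrapper query and date literals are my illustration, not the author's original listing):

SELECT c.date, IFNULL(sums.Mins, 0) AS Mins
FROM calendar c
LEFT JOIN (
  -- per-day usage for resource 1, computed as shown above
  SELECT cc.date AS theDate,
         SUM( CEIL( DaySeconds( b.startdate, b.enddate, cc.date ) / 60 )) AS Mins
  FROM calendar cc
  JOIN bookings b ON cc.date BETWEEN DATE(b.startdate) AND DATE(b.enddate)
  WHERE b.resourceID = 1
  GROUP BY cc.date
) sums ON c.date = sums.theDate
WHERE c.date BETWEEN '2008-02-01' AND '2008-02-10'
ORDER BY c.date;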
Not too hard so far, but time sum problems usually have more moving parts. In fact adding one simple
rule at least trebles complexity. For example, add the rule that no booking is allowed for time blocks
declared unavailable. Now we must subtract specified unavailable blocks from the calendar before
recording any bookings.
Can we subtract unavailable periods and sum booked periods in one query? Your first thought may be a
wish that MySQL implement the SQL DIFFERENCE operator. Failing that, you might consider a
DIFFERENCE workaround (see “Intersection and Difference” at
http://www.artfulsoftware.com/infotree/queries.php). But DIFFERENCE will not work for time
periods. Time periods aren’t atomic.
As usual for SQL time period arithmetic, the solution comes one slogging detail at a time:
1. Build a basic available-periods table for a date range. Call the table cal. Before we block out any
unavailable periods, each cal row specifies a whole day. Once blocks are entered, some days have
multiple rows.
DROP TABLE IF EXISTS cal;
CREATE TABLE cal( ID int PRIMARY KEY AUTO_INCREMENT, start DATETIME, end DATETIME);
INSERT INTO cal (start, end)
SELECT CAST(date AS DATETIME), CAST(ADDDATE(date,1) AS DATETIME) FROM calendar;
2. Create a table for unavailable periods. Call it uncal and for testing add a couple of unavailable
periods:
DROP TABLE IF EXISTS uncal;
CREATE TABLE uncal LIKE cal;
INSERT INTO uncal (start, end) VALUES
('2008-02-03 12:00', '2008-02-03 13:00'), ('2008-02-03 23:50', '2008-02-04 01:15');
3. For a full system we’d write uncal triggers to update the cal table whenever a blocked period is
added, edited or deleted. We don’t have space for that here. To illustrate the logic, we write a stored
procedure which removes one blocked time period from the cal table. The algorithm is:
(i) if the blocked period begins and ends on the same day, find the row whose start and end times
surround the block,
(ii) if no such row exists, the block is outside the table’s range, so do nothing. Otherwise update the
availability end time to the block start time, and insert a row with start time = block end time, and end
time = the found row’s original end time,
(iii) if the period spans more than a day, find the row with the nearest start time earlier than the block
start time, and if such a row exists, set that row’s end time to the block start time,
(iv) delete any rows whose start time is later than block start and whose end is earlier than block end,
(v) find the row with the earliest start time on the block end day, and set its start time to the block end
datetime:
DELIMITER |
DROP PROCEDURE IF EXISTS CalDel |
CREATE PROCEDURE CalDel( pStart datetime, pEnd datetime )
BEGIN
DECLARE vID int;
DECLARE vStart, vEnd, vNewEnd datetime;
IF DATE(pStart) = DATE(pEnd ) THEN
SELECT ID, start, end INTO vID,vStart,vEnd FROM cal WHERE start <= pStart AND end >= pEnd;
IF vID IS NOT NULL THEN
BEGIN
IF vStart < pStart THEN
UPDATE cal SET end=pStart WHERE ID=vID;
END IF;
IF vEnd > pEnd THEN
INSERT INTO cal (start,end) VALUES(pEnd,vEnd);
END IF;
END;
END IF;
ELSE
BEGIN
UPDATE cal SET end = pStart
WHERE start = (SELECT MAX(start) FROM cal WHERE start < pStart);
DELETE FROM cal WHERE start > pStart AND end < pEnd;
UPDATE cal SET start = pEND
WHERE start = (SELECT MIN(start) FROM cal WHERE DATE(start)=DATE(pEnd));
END;
END IF;
END;
|
DELIMITER ;
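To exercise CalDel(), we can apply the two test blocks from uncal by hand. In a full system the triggers mentioned above would issue these calls; this step is my illustration, not part of the original listing, and it rebuilds cal afterwards so the following steps still run against whole-day rows:

-- Carve each unavailable block out of the availability table and inspect:
CALL CalDel('2008-02-03 12:00', '2008-02-03 13:00');
CALL CalDel('2008-02-03 23:50', '2008-02-04 01:15');
SELECT start, end FROM cal WHERE DATE(start) = '2008-02-03' ORDER BY start;
-- Restore the whole-day rows before continuing with the article's steps:
DELETE FROM cal;
INSERT INTO cal (start, end)
SELECT CAST(date AS DATETIME), CAST(ADDDATE(date,1) AS DATETIME) FROM calendar;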
5. Write a function which, given available-period datetimes, booking start and end datetimes, and a given
date, returns usage for one booking on its start date if the given date is the same as the booking start
date, returns usage after the start date up to and including the given date if the given date is later than the
start date, and otherwise returns 0:
SET GLOBAL log_bin_trust_function_creators=1;
DELIMITER |
DROP FUNCTION IF EXISTS BookedSecs |
CREATE FUNCTION BookedSecs( cStart datetime, cEnd datetime, bStart datetime, bEnd datetime, theDate date )
RETURNS INT
BEGIN
DECLARE n, x int DEFAULT 0;
IF DATE(bEnd) > DATE(bStart) THEN
IF DATE(bStart) = theDate THEN
SET bEnd = CONCAT( ADDDATE(theDate,1), ' 00:00:00' );
ELSEIF DATE(bStart) > theDate THEN
RETURN 0;
ELSE
BEGIN
SET x = ( DATEDIFF( DATE(bEnd), DATE(bStart)) - 1 ) * 86400;
SET bStart = CAST(theDate AS DATETIME);
END;
END IF;
END IF;
IF bStart >= cStart AND bStart < cEnd THEN
IF bEnd <= cEnd THEN
SET n = UNIX_TIMESTAMP( bEnd ) - UNIX_TIMESTAMP( bStart );
ELSE
SET n = UNIX_TIMESTAMP( cEnd ) - UNIX_TIMESTAMP( bStart );
END IF;
ELSEIF bStart < cStart AND bEnd <= cEnd THEN
SET n = UNIX_TIMESTAMP( bEnd ) - UNIX_TIMESTAMP( cStart );
END IF;
RETURN x+n;
END ;
|
DELIMITER ;
6. Write a query using BookedSecs() to show that it returns booking period durations correctly:
SELECT
c.start AS PeriodStart, c.end AS PeriodEnd, b.startdate AS BookingStart, b.enddate AS BookingEnd,
CEIL(BookedSecs( c.start, c.end, b.startdate, b.enddate, DATE(b.startdate)) / 60) AS nToday,
IF( DATE(b.startdate) = DATE(b.enddate), 0,
CEIL(BookedSecs( c.start, c.end, b.startdate, b.enddate, DATE(b.enddate)) / 60)) AS nAfter
FROM cal c
JOIN bookings b ON DATE(b.startdate) = DATE(c.start) OR DATE(b.enddate) = DATE(c.start)
WHERE b.resourceID = 1
ORDER BY c.Start ;
+---------------------+---------------------+---------------------+---------------------+--------+--------+
| PeriodStart         | PeriodEnd           | BookingStart        | BookingEnd          | nToday | nAfter |
+---------------------+---------------------+---------------------+---------------------+--------+--------+
| 2008-02-03 00:00:00 | 2008-02-04 00:00:00 | 2008-02-03 17:05:00 | 2008-02-03 19:00:00 |    115 |      0 |
| 2008-02-04 00:00:00 | 2008-02-05 00:00:00 | 2008-02-04 17:05:00 | 2008-02-04 18:00:00 |     55 |      0 |
| 2008-02-04 00:00:00 | 2008-02-05 00:00:00 | 2008-02-04 19:30:00 | 2008-02-04 20:00:00 |     30 |      0 |
| 2008-02-05 00:00:00 | 2008-02-06 00:00:00 | 2008-02-05 23:05:00 | 2008-02-06 01:00:00 |     55 |      0 |
| 2008-02-06 00:00:00 | 2008-02-07 00:00:00 | 2008-02-05 23:05:00 | 2008-02-06 01:00:00 |      0 |     60 |
+---------------------+---------------------+---------------------+---------------------+--------+--------+
8. Join the calendar to the aggregate result to obtain the required query:
SELECT c.date, CEIL(IFNULL(sums.Mins,0)) AS Mins
FROM calendar c
LEFT JOIN (
SELECT
DATE( c.start ) AS theDate,
SUM(
BookedSecs( c.start, c.end, b.startdate, b.enddate, DATE(b.startdate)) / 60 +
IF( DATE(b.startdate) = DATE(b.enddate), 0,
BookedSecs( c.start, c.end, b.startdate, b.enddate, DATE(b.enddate)) / 60 )) AS Mins
FROM cal c
JOIN bookings b ON DATE(b.startdate) = DATE(c.start) OR DATE(b.enddate) = DATE(c.start)
WHERE b.resourceID = 1
GROUP BY theDate
) sums ON c.date = sums.theDate
WHERE c.date BETWEEN '2008-2-1' AND '2008-2-10'
ORDER BY c.date;
+------------+------+
| date       | Mins |
+------------+------+
| 2008-02-01 |    0 |
| 2008-02-02 |    0 |
| 2008-02-03 |  115 |
| 2008-02-04 |   85 |
| 2008-02-05 |   55 |
| 2008-02-06 |   60 |
| 2008-02-07 |    0 |
| 2008-02-08 |    0 |
| 2008-02-09 |    0 |
| 2008-02-10 |    0 |
+------------+------+
Once you get the hang of time period arithmetic, you can safely combine some steps. But until SQL
incorporates advanced time operators, time period query development will continue to be slow and
painstaking.
Looking for some blogs to read? Here are some good starting places:
http://planet.mysql.com
http://planetdrizzle.org
http://www.planetpostgresql.org
The Lab: The XtraBackup Program

Hot backups with XtraBackup can be run with … space to save the backup and the need for remote storage to satisfy disaster recovery requirements. Another use for streaming backups is the swift configuration of a server as a slave, or for development and testing.

While XtraBackup does not block connections from interacting with the database(s) being backed up, the backup itself can cause problems because of its use of server I/O. To help resolve this problem, XtraBackup can throttle the amount of I/O used by the backup program.

The addition of incremental backups is a great feature to reduce your storage needs. For example, many backup plans consist of a full backup once a week and then nightly incremental backups for the other six days. With a large database this can save a tremendous amount of space and yet give you the security of knowing that backups are being performed every night.

Installation of XtraBackup

XtraBackup is available as either source code or binaries. Binaries are available at http://www.percona.com/mysql/xtrabackup. Currently the binaries cover Red Hat Enterprise Linux versions four and five, a Debian package, FreeBSD, OS X and a generic GNU/Linux binary. All are 64-bit versions. Installation is as simple as installing the appropriate package, or unzipping the binary and copying the files to a good location.

Source compilation

Compiling the XtraBackup program is fairly easy, although a little different than a typical compilation. To begin you will need the source code for both XtraBackup and MySQL Server. The source code for XtraBackup is available at https://launchpad.net/percona-xtrabackup.
You also need to copy the XtraBackup source directory to the innobase directory in the MySQL source
code. Once this is done you can patch the innobase files with the included patch:
mordor ~/mysql-5.0.83] patch -p1 < fix_innodb_for_backup.patch
There are minimal prerequisites for the build. I needed the gcc-c++ and ncurses-devel rpm packages on a stock Red Hat 5.3 64-bit server installation. Once these are installed you can begin the build process:
mordor ~/mysql-5.0.83] ./configure; make
When this part of the process completes (which could take a while) you will need to finish up the
compilation by running make in the xtrabackup directory:
mordor ~/mysql-5.0.83] cd innobase/xtrabackup-0.8-src/
mordor ~/mysql-5.0.83/innobase/xtrabackup-0.8-src] make
Once the compilation process is completed successfully, you will have a binary executable
xtrabackup in this directory. In addition, there is the innobackupex script. Copy both of these to
the desired location on your system and you are ready.
Putting XtraBackup Into Use
There are two parts to any backup scenario: the backup and the restore. They are equally important! Do not forget the restoration process: you should always perform test restores. This is especially true if you are just getting started with XtraBackup.
Backup
The primary function of XtraBackup is to perform a backup. Doing so proves to be fairly easy. In the
following example the innobackupex script is used in order to show how XtraBackup can interact
with MyISAM tables. A username/password for access to the MySQL server must be specified along
with a directory in which to store the backups:
bash-3.2$ /usr/bin/innobackupex-1.5.1 --user=root --password=pa$$W0rd \
/backup/full_backup
innobackupex: Using mysql Ver 14.12 Distrib 5.0.45, for redhat-linux-gnu (x86_64)
using readline 5.0
innobackupex: Using mysql server version 5.0.45
With the innobackupex script, by default, the backup is created in a subdirectory that corresponds to
the date/time when the backup was executed in the specified backup directory. Notice that the backup
records the last checkpoint used by InnoDB. This could be used to create an incremental backup where
the incremental backup would begin from the point where this full backup ended. This checkpoint is also
stored in the file xtrabackup_checkpoints in the root of the specified backup directory.
Restore
You can create a backup by either running the innobackupex script or the xtrabackup command. Of course, you need to keep in mind that the xtrabackup command will ONLY back up InnoDB or XtraDB tables. In either case, in order to perform a restore, you must run xtrabackup a second time to “prepare” the backup files. Preparing the backup files is done by using the --prepare option with xtrabackup. This step recreates the ib_logfile* files. Once this step is complete, you are ready to move the backup files into place. The following shows a restore of the previous full backup using the innobackupex script. Once the restore is prepared it is moved into place using the --copy-back option:
target-dir = /backups/full_backups/2009-07-14_19-15-03
[notice (again)]
If you use binary log and don't use any hack of group commit,
the binary log position seems to be:
InnoDB: Last MySQL binlog file position 0 530, file name ./mysql-bin.000003
When performing a restore, it will be necessary to remove any old data files, and of course the MySQL daemon must be shut down before this process begins. Once the restore is done you should check the permissions of the restored directories, modify them as necessary, and then you should be able to start up the MySQL server. The following example shows the --copy-back option being used, which restores the backup to the datadir.
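As a hedged sketch of the whole restore sequence (the paths, and the exact option spellings for this version of XtraBackup, are my assumptions rather than output from the article's test system):

# Prepare the backup, recreating the ib_logfile* files:
xtrabackup --prepare --target-dir=/backups/full_backups/2009-07-14_19-15-03
# With mysqld stopped and the old data files removed,
# copy the prepared backup into the datadir:
innobackupex-1.5.1 --copy-back /backups/full_backups/2009-07-14_19-15-03
# Fix ownership before starting the server:
chown -R mysql:mysql /var/lib/mysql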
At this point you just start up the MySQL server and you have completed a full backup and restore. In
the example the database backup contained the sakila and employees sample databases:
If you run the innobackupex and xtrabackup programs as the same user who runs the MySQL
server daemon you should not have permissions issues. While this can take more planning up front, it
will create a smoother restore process.
Next issue we will cover the streaming, I/O throttling and incremental backup options for xtrabackup.
Until then, keep working on those lab exercises . . . midterms are coming!
Postgres 8.4
It's not every day that something old becomes new. For Open Source Database Magazine, it is both something new and something old. So what better way to ring in this era than to discuss one of the most popular open source databases, which also happens to be making a major new release. Yes, July 2009 marks the release of Postgres 8.4. Once again, the Postgres community has reached out to hundreds of developers, compiling close to 300 improvements to the system. Yes, there's something for everyone, but for now, let's take a look at some of the major improvements that Postgres 8.4 has in store.
CTEs and Window Functions
While much of the focus of 8.4 was on making administrators' lives easier, the most visible change for SQL-oriented developers is the introduction of Common Table Expressions (CTEs) and Windowing Aggregates. For those not familiar with them, CTEs provide several key features that are otherwise not easily reproduced, or that perform poorly:
1. They allow a derived subquery to be referenced several times within a query, without the need to execute that subquery several times.
2. They make it easier to group by columns based on function calls, expressions, or other expensive scenarios.
3. They work as a substitute for views in cases where storing a view definition as DDL is not required.
4. They allow for the creation of recursive queries.
The basic syntax of a CTE uses the SQL standard WITH syntax, which looks like this:
pagila=# WITH epic_films AS (
select
film_id,
array_agg(first_name ||' '||last_name) as featuring
from
film
join film_actor using (film_id)
join actor using (actor_id)
group by film_id
)
select
*
from
epic_films
where
array_upper(featuring,1) > 12 ;
 film_id | featuring
---------+------------------------------------------------------------
606 | {"HELEN VOIGHT","KEVIN BLOOM","TIM HACKMAN","GOLDIE BRODY","ANGELINA ASTAIRE","SPENCER
PECK","WALTER TORN","SIDNEY CROWE","GINA DEGENERES","RUSSELL BACALL","DAN STREEP","ROCK DUKAKIS","AUDREY
BAILEY"}
714 | {"JENNIFER DAVIS","LUCILLE TRACY","BURT DUKAKIS","REESE KILMER","CARMEN HUNT","JUDE
CRUISE","ANGELA HUDSON","SPENCER DEPP","HARRISON BALE","HARVEY HOPE","NICK DEGENERES","DEBBIE
AKROYD","THORA TEMPLE"}
508 | {"WOODY HOFFMAN","VAL BOLGER","REESE KILMER","JULIA BARRYMORE","MENA TEMPLE","CHRISTIAN
NEESON","BURT POSEY","SCARLETT DAMON","WALTER TORN","CAMERON ZELLWEGER","LUCILLE DEE","FAY
WINSLET","JAYNE NOLTE","MENA HOPPER","JULIA ZELLWEGER"}
146 | {"JOHNNY LOLLOBRIGIDA","LUCILLE TRACY","ELVIS MARX","SISSY SOBIESKI","VAL BOLGER","SUSAN
DAVIS","RUSSELL TEMPLE","AL GAR
LAND","NICK DEGENERES","OLYMPIA PFEIFFER","LISA MONROE","HUMPHREY GARLAND","ROCK DUKAKIS"}
249 | {"NICK WAHLBERG","JODIE DEGENERES","CARMEN HUNT","CAMERON WRAY","MICHELLE MCCONAUGHEY","SEAN
WILLIAMS","BEN WILLIS","GREG CHAPLIN","MORGAN HOPKINS","DARYL CRAWFORD","MORGAN WILLIAMS","IAN
TANDY","REESE WEST"}
87 | {"ED CHASE","JENNIFER DAVIS","UMA WOOD","FRED COSTNER","KIRSTEN PALTROW","SANDRA PECK","DAN
HARRIS","RAY JOHANSSON","KENNETH PESCI","CHRIS BRIDGES","WARREN JACKMAN","HUMPHREY GARLAND","AUDREY
BAILEY"}
188 | {"SISSY SOBIESKI","WOODY JOLIE","MEG HAWKE","RUSSELL BACALL","MORGAN MCDORMAND","ALBERT
NOLTE","CATE HARRIS","RUSSELL TEMPLE","VIVIEN BASINGER","HARVEY HOPE","WILL WILSON","ALAN
DREYFUSS","GENE MCKELLEN"}
(7 rows)
What we are doing here is pre-computing our movie data, getting the film id and the list of actors in each film, and then selecting from that set all of the films with more than 12 actors. Of course we didn't have to use a CTE for this; however, it does help separate the definition of our data from the selectivity of the query. It is also important to remember that the information derived in the WITH portion of the query is only calculated once. This becomes even more important if you want to do multiple joins to the CTE, or when you get into tricks like recursive querying.
Recursive queries are accomplished by adding the RECURSIVE keyword to the WITH statement. This addition allows the statement to refer back to its original inputs. This simple example from the Postgres documentation shows how it works:
WITH RECURSIVE t(n) AS (
VALUES (1)
UNION ALL
SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) FROM t;
 sum
------
 5050
(1 row)
Here we define our WITH input, but then refer back to it, recursing through all the values between 1 and 100 and summing them as we go. As you can imagine, this type of functionality makes dealing with hierarchical data much simpler than standard SQL does. In fact, it enables some operations that simply aren't possible in plain SQL without resorting to procedural code.
Of course that is only part of the picture. The other half is the introduction of Window Functions, which allow you to run functions or aggregates across a set of rows. That sounds nice, but what does it really mean? To get an idea of what you can do with a window function, let's first look at this bit of SQL, intended to produce a list of the top three customers for each store in our sample database.
pagila=# SELECT * FROM (
  select c1.first_name, c1.last_name, c1.store_id, p1.total,
    (select 1+count(*)
     from customer c2
     join (select customer_id, sum(amount) as total
           from only payment group by customer_id) p2 using (customer_id)
     where c2.store_id = c1.store_id and p2.total > p1.total) as rank
  from customer c1
  join (select customer_id, sum(amount) as total
        from only payment group by customer_id) p1 using (customer_id)
) x WHERE x.rank <= 3 ORDER BY x.store_id, x.rank;
As you can imagine, queries like this tend to be slow, both in execution time and in the time it takes to write them. Luckily, we now have window functions. Let's try this version instead:
pagila=# select * from (
with cte as (
select first_name, last_name, store_id, sum(amount) as total from payment join
customer using (customer_id) group by first_name, last_name, store_id
)
select first_name, last_name, store_id, total, rank() over (partition by store_id
order by total desc) from cte ) x where rank <= 3;
In this query, we first set up our CTE to derive the name, store, and purchase amounts for each of our
customers once, and then we apply the rank() function across those rows, filtering by store, to
determine our top 3.
Again, this is just one example; there are many other window functions beyond rank(), including row_number(), first_value() and last_value(), lead() and lag(), and several others. I encourage you to look through the Postgres documentation for more information.
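For instance, here is a quick sketch of my own (not from the article or the 8.4 docs) using row_number() against the same pagila payment table to pull each customer's three most recent payments:

pagila=# select customer_id, payment_date, amount
from (
  select customer_id, payment_date, amount,
         row_number() over (partition by customer_id
                            order by payment_date desc) as rn
  from payment
) x
where rn <= 3
order by customer_id, payment_date desc;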
The bottom line here is that the combination of CTEs, recursive queries, and window functions opens up a whole range of new applications which have not been available to mainstream open source databases until recently. And for those running commercial systems who may have used some of these features previously, it makes moving toward an open source solution that much more feasible.
Making Administration Easier
The other big area that was worked on was easing Postgres administration. While Postgres is fairly easy to administer in comparison to commercial systems, there are still a few areas where DBAs tend to run into trouble, and version 8.4 will really help to make things easier on them.
The most significant change in this area was the rewrite of Postgres' free-space-map implementation. Postgres, like other MVCC-based databases, is implemented so that when you update a row, the new row is written without removing the old row from the system, allowing others to select the old row until you commit your changes. On many systems, the commit action signals the database to clear the old row from disk, and possibly write the information out elsewhere. Postgres, on the other hand, has taken the approach that there is no need to issue those extra write operations at commit time. Instead it simply marks the row as no longer needed, and leaves the cleanup of these rows to a background process rather than putting it in the critical path of your updates. This background process is called vacuuming, and the thing that keeps track of what needs to be vacuumed is the free-space-map. Prior to 8.4, this map was kept in shared memory, which forced artificial limits on the amount of space that could be tracked, and when you ran out of space, vacuuming became very ineffective, leaving lots
of wasted space in your tables and indexes. These limits were defined by two configuration variables
that could only be modified with a restart of the database, something not always acceptable in today's
database environments. Worse, the variables were often difficult to size correctly, which made
everybody's life that much harder.
But all of that is gone in version 8.4. The new implementation no longer uses shared memory, so it can
be sized dynamically based on the amount of data in the system, meaning you no longer have to tune it
at all. It also completely removes the configuration variables. But wait, that's not all! In the old days,
when you vacuumed a table, you had to scan the entire table in order to clean up all the space. The
new implementation is much more efficient: it keeps track of which parts of your table have been
modified and scans only those pieces when the time comes for vacuuming. For folks running classic
applications like forum systems, where you have a large table of posts with actively changing new
records but also a large number of old, unchanging records, you will see dramatic increases in vacuum
performance.
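To see how much cleanup your tables are waiting on, the standard statistics views already expose dead-row counts; a query along these lines gives a quick picture (the table name in the VACUUM command is hypothetical):

```sql
-- Which tables are carrying the most dead rows, and when were they
-- last vacuumed? These columns come from the standard
-- pg_stat_user_tables statistics view.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Then vacuum a specific table by hand if needed (hypothetical name).
VACUUM VERBOSE forum_posts;
```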
Of course, Postgres has added many other new features for database administrators. These include
enhanced query and function execution statistics, improved deadlock information, the ability to kill
connections using SQL, improved statistics-monitoring performance, column-specific privileges, and
many others. The point remains the same: Postgres is getting more advanced, but also much simpler
to use, requiring very little maintenance for most setups.
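One of those items, killing connections using SQL, is worth a concrete sketch. It uses the new pg_terminate_backend() function; the user name below is hypothetical, and note that the 8.4 pg_stat_activity view names the process-id column procpid:

```sql
-- Terminate every idle session belonging to a hypothetical batch user.
-- pg_terminate_backend() is new in 8.4; in the 8.4 catalogs the
-- pg_stat_activity process-id column is named procpid.
SELECT pg_terminate_backend(procpid)
FROM pg_stat_activity
WHERE usename = 'batch_user'
  AND current_query = '<IDLE>';
```

Previously, cleaning up stray connections meant finding the backend process and signaling it from the operating system; now it can be done from any SQL session with sufficient privileges.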
Improved Upgrade Support
The last thing I want to highlight will not be an issue for new deployments, but is a nice improvement
for existing deployments. Postgres has a fantastic model for bugfix and support releases; they never
backport features, sticking strictly to the idea of only making changes in stable branches for the purpose
of bugfixes or security fixes. Upgrading a minor version of Postgres typically involves nothing more
than putting the new binaries on the system and restarting your database. Great for long term system
management.
However, as great as this is, the situation for major version upgrades (say 7.4 to 8.0, or 8.3 to 8.4) has
always had some pretty rough edges. Until now there have generally been two ways to do a
major version upgrade in Postgres: either do a logical dump and restore of the data, or set up cross-
version replication. While both approaches are possible, they have the downside of requiring quite a
bit of extra hardware to make the upgrade possible, and for significantly sized databases this
requirement was simply too much.
The first improvement in this area comes in the form of a multi-threaded pg_restore tool. Prior to
8.4, when you did a logical dump and restore, the pg_restore process was single-threaded,
loading one table at a time and then creating each index one after the other. Starting in Postgres 8.4,
you can split this process up, taking advantage of multiple spindles or CPUs to load tables in
parallel, greatly reducing the overall time needed to restore most databases.
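In practice that looks something like the following (the database names are hypothetical); the -j option to pg_restore is the new 8.4 piece, and it requires a dump in the custom archive format:

```shell
# Dump in the custom archive format, then restore with 4 parallel jobs.
# The -j/--jobs flag is new in the 8.4 pg_restore.
pg_dump -Fc -f mydb.dump mydb
pg_restore -j 4 -d mydb_new mydb.dump
```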
Of course for larger databases even this improvement isn't enough, so there has been renewed effort put
into an in-place migration tool, known as pg_migrator. The idea behind this tool is to allow database
administrators to upgrade their systems without having to move any of the data files on disk. What it
does behind the scenes is a little bit of black magic; basically dumping out the schema from your old
database, creating a new installation on disk, and then re-mapping all of your data files on disk into the
new system's database catalogs. Once it's done, it informs you of any additional steps you might need to
make, but otherwise you're ready to start up your database and push forward. Of course pg_migrator
gives you options to do a dry run, or to copy the data files if you want to keep a backup.
Conclusions
Really this is just the tip of the iceberg. There are more improvements in just about every area:
performance, tools, administration, scalability; the list goes on. If you want to find out more, check out a
much more detailed list in the release notes (http://www.postgresql.org/docs/8.4/interactive/release-8-
4.html), or better yet, head over to http://www.postgresql.org/download/, grab the latest download, and
play along.
Transaction Log
I hope you have enjoyed this inaugural issue of Open Source Database Magazine. The database market
in general is in some turmoil, with new technologies arising to compete in new markets. One example
is the Drizzle project, the brainchild of Brian Aker. According to the FAQ on the Drizzle website, the
"target for the project is web infrastructure backend and cloud components."
Cloud computing wasn't even commercially available before Amazon's web services arrived in 2005.
Now an open source database is poised to leverage this technology. The fall issue of OS DB Magazine
will include a feature article on Drizzle, and our first podcast (scheduled for release July the 24th) will
include an interview with Brian Aker.
It's a good time to be involved with open source databases. Companies are seeking to leverage open
source technology because of the reliability, the cost benefits, the ability to resolve issues themselves if
necessary and the general freedom inherent in open source software. This magazine is just one way you
can broaden your skillset. There is much to think about. As Bob Dylan once sang, "The times they are a-changin'."
Don't get run over by the change.
A special thanks to the contributing authors. The change in scope from "just MySQL" to all open source
databases took some work, and while it wasn't perfect, I have no complaints. It is people like our
contributing authors who help make the open source community a great place to be involved.
And a very big thank you for reading the magazine. I hope you enjoyed it.
Keith
bmurphy@paragoncs.com