You are on page 1of 38

pgpool: Features and

Development

Tatsuo Ishii
pgpool Global Development Group
SRA OSS, Inc. Japan
Agenda
● Developers
● History
● Existing pgpool project
● Ongoing pgpool-II project
● Demonstration
Who are we?
● pgpool Global Development Group
– Tatsuo Ishii(SRA OSS, Inc. Japan)
– Devrim Gunduz(Command Prompt, Inc.)
– Yoshiyuki Asaba(SRA OSS, Inc. Japan)
– Taiki Yamaguchi(SRA OSS, Inc. Japan)
– pgpoo-II development team
● In addition to Tatsuo, Yoshiyuki and Taiki:
– Tomoaki Sato, Yoshiharu Mori, Kaori Inaba (all from SRA
OSS, Inc. Japan)

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 3
Developers!
Tomoaki
Communication Kaori
manager Project manager
System DB

Taiki Yoshiharu
PCP Query rewriting
Query Cache Parallel execution
pgpool engin
developer

Yoshiyuki
Tatsuo Project leader
Enhance Our Boss pgpool
pgpool-I “Green Turtle” developer
pgpool
developer

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 4
pgpool-I: the history
● V0.1: Started as a personal project
(2003/6/27)
● V1.0: Synchronous replication (2004/3)
● V2.0: Load balance (2004/6)
● V3.0: pgpool Global Development
Group(2006/2)

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 5
Why pgpool?
● No general purpose connection pooling
software was available
– Java has its own, but PHP does not...
● No small to mid scale/light weight
synchronous replication tool was
available

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 6
pgpool process architecture

pgpool
parent process

pre-fork

pgpool PostgreSQL
child process backend process

pgpool.conf

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 7
pgpool functionality:
connection pooling
● Reduce connection overhead
● Limit maximum number of connections to
the PostgreSQL backend
– New incoming connections are queued in
the kernel if all pgpool processes are busy

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 8
How does connection pooling
work?(1)
pgpool

PostgreSQL
user1/db1

user1/db1
PostgreSQL
user1/db1

user1/db1
PostgreSQL

user1/db1
user1/db1 PostgreSQL

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 9
How does connection pooling
work?(2)
pgpool
user1/db1
PostgreSQL
user2/db2 user2/db2

PostgreSQL
user3/db3 user2/db2

user3/db3
PostgreSQL
user3/db3 user2/db2

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 10
pgpool understands
frontend/backend protocol
● Pgpool is transparent to both applications
and PostgreSQL
● Virtually no modifications to applications
are needed
● No special APIs are needed
● Can be used with existing language APIs
● Can be more efficient than libpq because
of smaller controlling granuality

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 11
Limitations of current
implementation
● resetting pooled connection may not be
perfect(need modification to applications)
– temp tables
– need help from PostgreSQL
● SSL is not supported
– fall back to non SSL mode
● no pg_hba.conf like IP based
authentication

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 12
pgpool functionality: replication
● Queries are duplicated and sent to
PostgreSQL servers

PostgreSQL
query

pgpool

query
PostgreSQL

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 13
Dead lock problem
master secondary
session 1 session 2 session 1 session 2
t
lock
wait
lock

lock
wait
lock

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 14
“Strict” mode in replication
to avoid deadlock problem
Query Query PostgreSQL
pgpool packet
PostgreSQL

Wait until PostgreSQL
pgpool something
returns PostgreSQL

PostgreSQL
pgpool Query
PostgreSQL

reply back PostgreSQL
pgpool Wait until
the result something PostgreSQL
returns
2006/07/09 Tronto Copyright(c) 2006 pgpool DG 15
pgpool functionality: load
balanace
● SELECT queries are sent to randomly
chosen backend
● The ratio for load balancing can be
changed

PostgreSQL

pgpool

PostgreSQL

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 16
Limitations of current
implementation in replication
● CURRENT_TIMESTAMP and server- dependent-
value-returning-functions cannot be replicated
● MD5 and crypt authentication are not supported
● Sequences and SERIAL needs table locking if
there are more than 1 connections
● Functions having side effects cannot be load
balanced

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 17
pgpool-II project
● Information-Technology Promotion
Agency, Japan (IPA: http://www.ipa.go.jp)
granted project
● Started in February 2006, expected to
release the first version in September
2006 under BSD license
● Features including parallel query and
enhancement to pgpool
● Successor to pgpool

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 18
Goal of pgpool-II project
● Implement parallel query processing
● Enhance pgpool
– Allow to have more than 2 DB nodes
– More precise control using shared memory
– Easy to manage
● Control port/protocol
● Detailed statistics on node status
● GUI management tool

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 19
pgpool-II architecture overview

Client
SQL Query
Parser Cache

Communication
pgpoolAdmin Query pgpool
Manager
Rewriting Catalog

shared Replication/load Parallel
PCP command
memory balance execution PostgreSQL
PCP lib manager engine engine

pgpool-II System DB

PostgreSQL
DB node
2006/07/09 Tronto Copyright(c) 2006 pgpool DG 20
pgpool-I and pgpool-II mode

replication
parallel query
load balance
fail over
virtually compatible with pgpool

pgpoo-I pgpoo-II
mode mode

1000 1000 1000 3000
2000 2000 2000 4000
3000 3000
4000 4000

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 21
Table partitioning control
CREATE TABLE dist_def (
dbname TEXT, -- database name
schema_name TEXT, --schema name
table_name TEXT, -- table name
col_name TEXT, -- key col name
col_list TEXT[], -- col names
type_list TEXT[], -- col types
dist_def_func TEXT, -- function name
PRIMARY KEY (dbname, schema_name, table_name)
);

INSERT INTO dist_def VALUES ('y-mori','public','accounts','aid',
ARRAY['aid','bid','abalance','filler'],
ARRAY['integer','integer','integer','character(84)'],'dist_def_accounts'
);

CREATE OR REPLACE FUNCTION dist_def_accounts (val ANYELEMENT)
RETURNS INTEGER AS '
SELECT CASE WHEN $1 >= 0 and $1 < 100001 THEN 0
WHEN $1 >= 100001 and $1 < 200001 THEN 1
WHEN $1 >= 200001 and $1 < 300000 THEN 2
END' LANGUAGE
2006/07/09 Tronto SQL; Copyright(c) 2006 pgpool DG 22
Simple parallel query
SELECT * FROM accounts
WHERE aid = 1000;
SELECT * FROM accounts
WHERE aid = 1000; PostgreSQL

pgpool

PostgreSQL

SELECT * FROM accounts
WHERE aid = 1000;

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 23
Complex query example
SELECT * FROM accounts SELECT * dblink('con',
WHERE aid = 1000 'SELECT pool_parallel('SELECT * FROM accounts
ORDER BY aid; WHERE aid = 1000')') AS foo(...)
ORDER BY aid;

pgpool
System DB
PostgreSQL

pgpool SELECT pool_parallel('SELECT * FROM accounts
WHERE aid = 1000')')

SELECT * FROM accounts
PostgreSQL WHERE aid = 1000;

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 24
DML
● INSERT recognize partition key value in a
query and INSERT into appropreate DB
node
● UPDATE/DELETE simply issues the same
query to all DB nodes

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 25
INSERT
call Syetm DB
INSERT INTO accounts(aid, bid) SELECT
VALUES(500,100); return dist_def_accounts(500);
DB node = 1

INSERT

DB node 0 DB node 1 DB node 2

aid = 0-499 aid = 500-999 aid = 1000-1499

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 26
Query Cache
● Caches query result
● Caches can be validated by pgpoolAdmin

CREATE TABLE pgpool_catalog.query_cache (
hash TEXT, -- query string(MD5 hashed)
query TEXT, -- query string
value bytea, -- query result(RowDescription and
DataRow packets)
dbname TEXT, -- database name
create_time TIMESTAMP WITH TIME ZONE, -- cache creation time
PRIMARY KEY(hash)
);

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 27
Using parallel query and
replication together

pgpool-II
with parallel query
Parallel query
layer

pgpool-II pgpool-II
with replication with replication
Replication
and load
balance
layer

PostgreSQL PostgreSQL PostgreSQL PostgreSQL

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 28
pgpoolAdmin
● Web based pgpool
management tool
● Apache/PHP/Smarty
● functions
– Stopping pgpool
– Switch over
– Monitoring connection pool status
– Monitoring process status
– Editing configuration file
– Query Cache management

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 29
Benchmarks
● Hitachi Blade Symphony BS1000
– 10 blades
– Xeon 3.0GHz x 1
– 1GB Mem
– UL320 SCSI Disks
● Cent OS 4.3
● PostgreSQL 8.1.4/pgbench
● Please note that these results are measured on
pre-alpha version of pgpool-II and maybe slightly
different from the future official release version
2006/07/09 Tronto Copyright(c) 2006 pgpool DG 30
Sequencial Scan case(mid size)
● scale factor = 90
PostgreSQL pgpool-II

1.3 (90M rows)
1.2
1.1 ● SELECT * FROM
1
0.9 accounts WHERE
0.8
0.7
aid = :aid
TPS

0.6
0.5 ● pgpool-II is 7 times
0.4
0.3 faster at the best
0.2
0.1
(32 concurrent
0
1 2 4 8 16 32
users)
concurrent users

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 31
Index Scan case(mid size)
● scale factor = 90
PostgreSQL pgpool-II

3750 (90M rows)
3500
3250
3000
● SELECT * FROM
2750
2500
accounts WHERE
2250
2000
aid = :aid
TPS

1750
1500
1250
● pgpool-II is 9 times
1000
750
faster at the best
500
250
(16 concurrent
0
1 2 4 8 16 32
users)
concurrent users

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 32
Index Scan case(large size)
● scale factor = 900
PostgreSQL pgpool-II

1300 (900M rows)
1200
1100 ● SELECT * FROM
1000
900 accounts WHERE
800
700
aid = :aid
TPS

600
500 ● pgpool-II is 18
400
300 times faster at the
200
100
best (32
0
1 2 4 8 16 32
concurrent users)
concurrent users

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 33
complex query case
● scale factor = 900
PostgreSQL pgpool-II

45
(900M rows)
40 ● SELECT a1.abalance
35
FROM accounts as a1
30
,accounts as a2
25
WHERE a1.aid = :aid1
TPS

20
and a2.aid = :aid2
15

10 ● pgpool-II is faster
5 only when at 8
0
1 2 4 8 concurrent users
concurrent users

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 34
pgpool-II restrictions
● SQL restrictions
– COPY is not supported
– SELECT INTO, INSERT INTO ... SELECT is not supported
– Extended protocol is not supported
– INSERT requires explicit values if target column is the key for data
patitioning
– UPDATE with WHERE clause including function call is not supported
– CREATE TABLE/ALTER TABLE requires pgpool restarting
● Transactions
– If a DB node fails during INSERT/UPDATE/DELETE, data consistency will
be broken
– SQL commands via dblink will be executed as separate transactions

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 35
Future plans(TODO)
● More intelligent query rewriting
● Remove SQL restrictions
● Employ two phase commit to keep consistency
among DB nodes. However this technique can
be used only for queries outside transaction
block since PREPARE TRANSACTION closes
current transaction block
● More intelligent cache validation

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 36
References
● pgpool page
– http://pgpool.projects.postgresql.org/
● pgpool-II page
– English page
● http://pgpool.projects.postgresql.org/pgpool-II/en/
– Japanese page
● http://pgpool.projects.postgresql.org/pgpool-II/ja/

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 37
Thank you!

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 38