Scaling PostgreSQL PgDay EU 2009

Scaling PostgreSQL Under Fire
Gavin M. Roy
Chief Technology Officer
myYearbook.com
pgDay Europe 2009

About myYearbook.com
2007 - 100M Page views per Month
2009 - 1.5B Page views per Month
Top 5 Social Network in the United States as measured by

Hitwise
Top 25 trafficked site in the United States as measured by

ComScore
99% Uptime
The 1am Phone Call
Growth is a double-edged sword.
Database Project Considerations
• Good Schema
• Is it designed to Scale?
• Is it designed to be Maintained?
• Good Hardware
• Will you have enough space for unexpected growth?
• Will it be fast enough to handled additional load?
• Will it be stable in production under load?

Database Project Considerations
• Good Planning
• What will you do when there is a failure?
• How long will it take to recover?
• What kind of failures will you have?
• How will you handle upgrades and downtime?

Internet Startup Growth Cycle
1. Prototype
2. Launch
3. Re-Engineer (Fix problems)
4. Add new functionality
5. Repeat Steps 3 and 4

Internet Startup Growth Cycle
• Steps 1 & 2
• Limited Budget
• Limited Time & Resources
• Steps 3 & 4
• Increased Budget
• Limited Time & Resources

“The best laid schemes o’ mice an’ men
Gang aft agley”
- Robert Burns, To a Mouse

Prepare for Growth
• Hardware
• CPU Horsepower based upon need
• Disk based upon need
• RAM based upon budget.
• Get 2
“The Cloud”
Plan for Growth Concurrency & Data Growth
Concurrency in PostgreSQL
• Each running PostgreSQL connection carries overhead
• More connections == Slower queries
• Pool your connections
• pgBouncer - light weight libevent based pooling daemon
• pgPool II - Does pooling and much, much more
• Language specific pooling

Table Partitioning
• Supported in PostgreSQL as of 8.1
• Excellent method for maintaining data
• Allows for removal of aged data without bloat
• Focused SELECTS while allowing ad-hoc SELECT across all partitions

Vertical Partitioning Data
• Isolate application data in different database servers
• Replicate common data needed for joins

Horizontally Partitioning Data
• Spread table data across multiple database servers
• Use a CRC or Hashing algorithm to determine server location of data
• Roll your own in your client code
• Use pl/Proxy
• Plan for growth, use multiple server slots per server

“Anything that can possibly go wrong, does.”
- Jack Sack
Recovering from Failures
• Daily backup is not enough
• Disaster recovery option
• Replicate data for failover and maintenance
• via Replication Tools like Londiste, Bucardo and Slony
• via Warm Standby (PITR Log Shipping)
• Hot standby? Maybe in 8.5

PostgreSQL is YeSQL Saying no to the naysayers
No[SQL] is Not an Option*
• Key/Value store databases are not new, they’re just trendy
• Facebook still has to use Memcached even though it developed the

Cassandra distributed key/value database.
• Same type of developers who jumped on MySQL because “it was fast”
• Data without schema has limited use
• Ad-hoc reporting is a key business value that is realistically unachievable in

non-structured data
• Find the balance of speed and simplicity with normalization

How to kill PostgreSQL performance.
Inadequate Hardware
• Not enough RAM
• Not enough disk bandwidth
• Depending on topology
• Not enough disks
• Not enough controllers
• Slow communication bus
• Over-saturated CPU
Lock Contention == Death
• Nothing brings a server to its knees faster than a long running Exclusive Lock
• Even high share count locks go hand-and-hand with slowness
• Reduce lock contention
• Use partitioning schemes
• Use concurrent operations for maintenance related activities

Bloat == Slower Death
• MVCC is both integral to PostgreSQL performance and impacts performance
• Enter Heap Only Tuples (HOT)
• Updates to non-indexed columns in rows can re-use the same tuple
• Dead tuples can be re-used
• Index Bloat
• Massive index bloat can occur in high-write transaction databases
• Address by concurrently reindex tables without locking*

“Knowledge is Power”
- Sir Francis Bacon

Reacting can be Proactive
• check_postgres.pl by Greg Mullane
• Nagios plugin
• Bloat
• Management activity such as last analyze and vacuum
• wal file count, txid wrap around, sequence exhaustion
• many other items it checks

Be Trendy
• Know your database behavior

over time
• Predict future issues and

behavior
• Identify issues as they occur
• Review impact of maintenance
• Know if your heap use exceeds

your index use
• Look for daily change

Trending and Analysis at myYearbook.com
• Cacti - http://www.cacti.net
• Posuta - http://code.google.com/p/posuta/
• pgFouine - http://pgfouine.projects.postgresql.org/
• Staplr - http://github.com/gmr/staplr
Fin
• Questions?
• Follow me on Twitter: http://twitter.com/Crad
• Leave feedback: http://2009.pgday.eu/feedback

Scaling PostgreSQL PgDay EU 2009

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Scaling PostgreSQL PgDay EU 2009

Uploaded by

Copyright:

Available Formats

Scaling PostgreSQL Under Fire

pgDay Europe 2009

2007 - 100M Page views per Month

2009 - 1.5B Page views per Month

Top 5 Social Network in the United States as measured by

Top 25 trafficked site in the United States as measured by

• Will you have enough space for unexpected growth?

• Will it be fast enough to handled additional load?

• Will it be stable in production under load?

• What will you do when there is a failure?

• How long will it take to recover?

• What kind of failures will you have?

• How will you handle upgrades and downtime?

3. Re-Engineer (Fix problems)

4. Add new functionality

5. Repeat Steps 3 and 4

• Limited Time & Resources

• Limited Time & Resources

- Robert Burns, To a Mouse

• CPU Horsepower based upon need

• Disk based upon need

• RAM based upon budget.

• Each running PostgreSQL connection carries overhead

• More connections == Slower queries

• Pool your connections

• pgBouncer - light weight libevent based pooling daemon

• pgPool II - Does pooling and much, much more

• Language specific pooling

• Supported in PostgreSQL as of 8.1

• Excellent method for maintaining data

• Allows for removal of aged data without bloat

• Focused SELECTS while allowing ad-hoc SELECT across all partitions

• Isolate application data in different database servers

• Replicate common data needed for joins

• Spread table data across multiple database servers

• Use a CRC or Hashing algorithm to determine server location of data

• Roll your own in your client code

• Plan for growth, use multiple server slots per server

• Daily backup is not enough

• Disaster recovery option

• Replicate data for failover and maintenance

• via Replication Tools like Londiste, Bucardo and Slony

• via Warm Standby (PITR Log Shipping)

• Hot standby? Maybe in 8.5

• Key/Value store databases are not new, they’re just trendy

• Facebook still has to use Memcached even though it developed the

• Data without schema has limited use

• Ad-hoc reporting is a key business value that is realistically unachievable in

• Find the balance of speed and simplicity with normalization

• Not enough RAM

• Not enough disk bandwidth

• Not enough disks

• Not enough controllers

• Slow communication bus

• Even high share count locks go hand-and-hand with slowness

• Reduce lock contention

• Use partitioning schemes

• Use concurrent operations for maintenance related activities

• MVCC is both integral to PostgreSQL performance and impacts performance

• Enter Heap Only Tuples (HOT)

• Updates to non-indexed columns in rows can re-use the same tuple

• Dead tuples can be re-used

• Massive index bloat can occur in high-write transaction databases

• Address by concurrently reindex tables without locking*