You are on page 1of 29

Scaling PostgreSQL Under Fire

Gavin M. Roy
Chief Technology Officer
myYearbook.com

pgDay Europe 2009


About myYearbook.com

2007 - 100M Page views per Month

2009 - 1.5B Page views per Month

Top 5 Social Network in the United States as measured by


Hitwise

Top 25 trafficked site in the United States as measured by


ComScore

99% Uptime
The 1am Phone Call
Growth is a double-edged sword.
Database Project Considerations

• Good Schema

• Is it designed to Scale?

• Is it designed to be Maintained?

• Good Hardware

• Will you have enough space for unexpected growth?

• Will it be fast enough to handled additional load?

• Will it be stable in production under load?


Database Project Considerations

• Good Planning

• What will you do when there is a failure?

• How long will it take to recover?

• What kind of failures will you have?

• How will you handle upgrades and downtime?


Internet Startup Growth Cycle

1. Prototype

2. Launch

3. Re-Engineer (Fix problems)

4. Add new functionality

5. Repeat Steps 3 and 4


Internet Startup Growth Cycle

• Steps 1 & 2

• Limited Budget

• Limited Time & Resources

• Steps 3 & 4

• Increased Budget

• Limited Time & Resources


“The best laid schemes o’ mice an’ men
Gang aft agley”

- Robert Burns, To a Mouse


Prepare for Growth

• Hardware

• CPU Horsepower based upon need

• Disk based upon need

• RAM based upon budget.

• Get 2
“The Cloud”
Plan for Growth Concurrency & Data Growth
Concurrency in PostgreSQL

• Each running PostgreSQL connection carries overhead

• More connections == Slower queries

• Pool your connections

• pgBouncer - light weight libevent based pooling daemon

• pgPool II - Does pooling and much, much more

• Language specific pooling


Table Partitioning

• Supported in PostgreSQL as of 8.1

• Excellent method for maintaining data

• Allows for removal of aged data without bloat

• Focused SELECTS while allowing ad-hoc SELECT across all partitions


Vertical Partitioning Data

• Isolate application data in different database servers

• Replicate common data needed for joins


Horizontally Partitioning Data

• Spread table data across multiple database servers

• Use a CRC or Hashing algorithm to determine server location of data

• Roll your own in your client code

• Use pl/Proxy

• Plan for growth, use multiple server slots per server


“Anything that can possibly go wrong, does.”

- Jack Sack
Recovering from Failures

• Daily backup is not enough

• Disaster recovery option

• Replicate data for failover and maintenance

• via Replication Tools like Londiste, Bucardo and Slony

• via Warm Standby (PITR Log Shipping)

• Hot standby? Maybe in 8.5


PostgreSQL is YeSQL Saying no to the naysayers
No[SQL] is Not an Option*

• Key/Value store databases are not new, they’re just trendy

• Facebook still has to use Memcached even though it developed the


Cassandra distributed key/value database.

• Same type of developers who jumped on MySQL because “it was fast”

• Data without schema has limited use

• Ad-hoc reporting is a key business value that is realistically unachievable in


non-structured data

• Find the balance of speed and simplicity with normalization


How to kill PostgreSQL performance.
Inadequate Hardware

• Not enough RAM

• Not enough disk bandwidth

• Depending on topology

• Not enough disks

• Not enough controllers

• Slow communication bus

• Over-saturated CPU
Lock Contention == Death

• Nothing brings a server to its knees faster than a long running Exclusive Lock

• Even high share count locks go hand-and-hand with slowness

• Reduce lock contention

• Use partitioning schemes

• Use concurrent operations for maintenance related activities


Bloat == Slower Death

• MVCC is both integral to PostgreSQL performance and impacts performance

• Enter Heap Only Tuples (HOT)

• Updates to non-indexed columns in rows can re-use the same tuple

• Dead tuples can be re-used

• Index Bloat

• Massive index bloat can occur in high-write transaction databases

• Address by concurrently reindex tables without locking*


“Knowledge is Power”

- Sir Francis Bacon


Reacting can be Proactive

• check_postgres.pl by Greg Mullane

• Nagios plugin

• Bloat

• Management activity such as last analyze and vacuum

• wal file count, txid wrap around, sequence exhaustion

• many other items it checks


Be Trendy

• Know your database behavior


over time

• Predict future issues and


behavior

• Identify issues as they occur

• Review impact of maintenance

• Know if your heap use exceeds


your index use

• Look for daily change


Trending and Analysis at myYearbook.com

• Cacti - http://www.cacti.net

• Posuta - http://code.google.com/p/posuta/

• pgFouine - http://pgfouine.projects.postgresql.org/

• Staplr - http://github.com/gmr/staplr
Fin

• Questions?

• Follow me on Twitter: http://twitter.com/Crad

• Leave feedback: http://2009.pgday.eu/feedback

You might also like