what happens after you’re scalable

capacity planning for LAMP

MySQL Conf and Expo April 2007

John Allspaw
Engineering Manager (Operations) at flickr (Yahoo!)

Yay!
• You’re scalable! (or not)
• Now you can simply add hardware as you need capacity.
• (right ?)
• But:
• How many servers ?

BUT, um, wait....
• How many databases ?
• How many webservers ?
• How much shared storage ?
• How many network switches ?
• What about caching ?
• How many CPUs in all of these ?
• How much RAM ?
• How many drives in each ?
• WHEN should we order all of these ?

some stats
• ~35M photos in squid cache (total)
• ~2M photos in squid’s RAM
• ~470M photos, 4 or 5 sizes of each
• 38k req/sec to memcached (12M objects)
• 2 PB raw storage (consumed ~1.5 TB on Sunday)

capacity

capacity doesn’t mean speed

capacity is for business

Buying:
• too much, too soon
• not enough, too late
• enough, for now

3 main parts
• Planning (what ? / why ? / when ?)
• Deployment (install/config/manage)
• Measurement (graph the world)

boring queueing theory
• Forced Flow Law: $X_i = V_i \times X_0$
• Little’s Law: $N = X \times R$
• Service Demand Law: $D_i = V_i \times S_i = U_i / X_0$
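To make the three laws concrete, here is a minimal worked sketch in Python. Every number is hypothetical and chosen only to keep the arithmetic easy to follow; the variable names are illustrative and map onto the symbols in the laws ($X_0$ system throughput, $V_i$ visits per request to resource $i$, $S_i$ service time per visit, $R$ response time).

```python
# Hypothetical measurements for one database server (illustrative only).
system_throughput = 200.0   # X_0: requests/sec completed by the whole system
visits_per_request = 7.5    # V_i: queries each request sends to this server
service_time = 0.0005       # S_i: seconds of service per query
response_time = 0.25        # R: average seconds a request spends in the system

# Forced Flow Law: X_i = V_i * X_0
server_throughput = visits_per_request * system_throughput

# Little's Law: N = X * R
requests_in_system = system_throughput * response_time

# Service Demand Law: D_i = V_i * S_i = U_i / X_0, so U_i = D_i * X_0
service_demand = visits_per_request * service_time
utilization = service_demand * system_throughput

print(f"server throughput: {server_throughput:.0f} queries/sec")
print(f"requests in system: {requests_in_system:.0f}")
print(f"server utilization: {utilization:.0%}")
```

With these made-up inputs the server handles about 1500 queries/sec at roughly 75% utilization, with about 50 requests in flight.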

my theory
• capacity planning math should be based on real things, not abstract ones.

predicting the

consumable

concurrent

considerations: social applications
• Have the ‘network effect’
• Exponential growth

considerations: social applications
• Event-related growth
• (press, news event, social trends, etc.)
• Examples: London bombing, holidays, tsunamis, etc.

What do you have NOW ?
• When will your current capacity be depleted or outgrown ?
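For a consumable resource such as storage, the "when" question can be roughed out with simple arithmetic. The sketch below assumes a constant daily growth rate, which is optimistic for a social application; the function name and the figures are hypothetical, not measurements from the talk.

```python
import math

def days_until_depleted(total_capacity, used, daily_growth):
    """Days left before `used` reaches `total_capacity`,
    assuming consumption stays constant at `daily_growth` per day."""
    if daily_growth <= 0:
        return math.inf  # nothing is being consumed, so it never runs out
    return max(total_capacity - used, 0) / daily_growth

# Hypothetical example, all values in TB: 2000 total, 1400 used, 1.5 per day.
print(days_until_depleted(total_capacity=2000.0, used=1400.0, daily_growth=1.5))
```

Subtract the lead time for ordering and installing hardware from the result to get a latest order date, and re-run the estimate whenever the growth rate changes.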

finding ceilings
• MySQL (disk IO ?)
• SQUID (disk IO ? or CPU ?)
• memcached (CPU ? or network ?)

forget benchmarks
• boring
• to use in capacity planning...not usually worth the time
• not representative of real load

test in production

what do you expect ?
• define what is acceptable
• examples:
• squid hits should take less than X milliseconds
• SQL queries less than Y milliseconds, and also keep up with replication
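Once "acceptable" is written down, it can be checked mechanically against measurements. This is a minimal sketch: the function names, the use of the average as the statistic, and the zero-lag default are all assumptions made for illustration, and the sample values are invented.

```python
from statistics import mean

def within_limit(samples_ms, limit_ms):
    """True if the average of the measured times is under the limit."""
    return mean(samples_ms) < limit_ms

def sql_acceptable(query_ms, limit_ms, slave_lag_seconds, max_lag_seconds=0):
    """Queries must be fast enough AND replication must keep up."""
    return within_limit(query_ms, limit_ms) and slave_lag_seconds <= max_lag_seconds

# Hypothetical samples: squid hit times, then SQL query times with replica lag.
print(within_limit([10, 20, 30], limit_ms=25))
print(sql_acceptable([5, 7], limit_ms=10, slave_lag_seconds=3))
```

The second check fails even though the queries are fast, because the replica is lagging; both conditions have to hold for the load to count as acceptable.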

measurement

accept the observer effect
• measurement is a necessity.
• it’s not optional.

http://ganglia.sf.n

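super simple graphing

#!/bin/sh
# Take two extended iostat samples 4 seconds apart; keep the tail of the output (the second sample).
/usr/bin/iostat -x 4 2 sda | grep -v '^$' | tail -4 > /tmp/disk-io.tmp
# Field 14 is %util in this iostat -x layout; the column position varies by sysstat version.
UTIL=`grep sda /tmp/disk-io.tmp | awk '{print $14}'`
# Publish the value to ganglia as the "diskutil" metric.
/usr/bin/gmetric -t uint16 -n diskutil -v$UTIL -u '%'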

memcached

what if you have graphs but no raw data ?
• GraphClick
• http://www.arizona-software.ch/applications/graphclick/en/

application usage
• Usage stats are just as important as server stats!
• Examples:
• # of user registrations
• # of photos uploaded every hour

not a straight line

another not straight line

but straight relationships!

measurement examples

queries

disk I/O

What we know now
• we can do at least 1500 qps (peak) without:
- slave lag
- unacceptable avg response time
- waiting on disk IO

MySQL capacity
• find ceilings of existing h/w
• tie app usage to server stats
• find ceiling:usage ratio
• do this again:
- regularly (monthly)
- when new features are released
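One way to tie app usage to server stats is to fit a straight line through paired observations and ask how much usage the known ceiling supports. The sketch below uses the 1500 qps ceiling from the earlier slide; the data points, the `fit_line` helper, and the choice of photo uploads per hour as the usage metric are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical paired observations: app usage vs. peak qps on one DB server.
uploads_per_hour = [1000, 2000, 3000, 4000]
peak_qps = [350, 600, 850, 1100]

slope, intercept = fit_line(uploads_per_hour, peak_qps)

QPS_CEILING = 1500  # measured ceiling for this hardware
max_uploads_per_hour = (QPS_CEILING - intercept) / slope

print(f"{slope:.2f} qps per upload/hour, baseline {intercept:.0f} qps")
print(f"ceiling reached at about {max_uploads_per_hour:.0f} uploads/hour")
```

Because the fit depends on how the application queries the database, it needs redoing on the same schedule as the ceiling measurement: regularly, and whenever a new feature ships.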

caching maximums

caching ceilings
squid, memcache
• working-set specific:
- tiny enough to all fit in memory ?
- some/more/all on disk ?
- watch LRU churn

churning full caches
• Ceilings at:
- LRU ref age small enough to affect hit ratio too much
- Request rate large enough to affect disk IO (to 100%)

squid

requests and hits

squid hit ratio

LRU reference age

hit response times

What we know now
• we can do at least 620 req/sec (peak) without:
- LRU affecting hit ratio
- unacceptable avg response time
- waiting too much on diskIO
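A per-server ceiling turns a traffic projection into a server count. The sketch below uses the 620 req/sec figure above; the projected peak, the function name, and the 80% headroom factor are assumptions added for illustration.

```python
import math

def servers_needed(projected_peak, per_server_ceiling, headroom=0.8):
    """Servers required to carry `projected_peak` req/sec while running
    each one at no more than `headroom` of its measured ceiling."""
    usable_per_server = per_server_ceiling * headroom
    return math.ceil(projected_peak / usable_per_server)

# Hypothetical projection: 5000 req/sec peak across the squid tier.
print(servers_needed(projected_peak=5000, per_server_ceiling=620))
```

Running below the ceiling leaves room for event-related spikes and for losing a server without pushing the remaining ones past their limit.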

not full caches
• (working set smaller than max size)
• request rate large enough to bring network or CPU to 100%

deployment

Automated Deploy Tools
• SystemImager/SystemConfigurator: http://wiki.systemimager.org
• CVSup: http://www.cvsup.org
• Subcon: http://code.google.com/p/subcon/

• http://flickr.com/photos/gaspi/62165296/
• http://flickr.com/photos/marksetchell/2796
• http://flickr.com/photos/sheeshoo/72709413
• http://flickr.com/photos/jaxxon/165559708/
• http://flickr.com/photos/bambooly/29863254
• http://flickr.com/photos/colloidfarl/81564
• http://flickr.com/photos/sparktography/754

questions ?