Management
for Web
Operations
John Allspaw
Operations
Engineering
the book I’m writing
???
Rules of Thumb
Planning/Forecasting
• security incidents
• real capacity problems*
* (should be the last thing you need to worry about)
Capacity != Performance
• Automated Stuff
• Scalable Metric Collection/Display
(apache requests)
(concurrent busy apache procs)
Metrics
App-level meets system-level
The End
Use real live production data to find ceilings
webserver!
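One way to read "use real live production data to find ceilings" is sketched below: pair an app-level metric with a system-level metric and take the highest app-level rate seen while the system metric stayed under a limit. The sample values and the `find_ceiling` helper are hypothetical, and the 85% CPU limit is borrowed from the Safety Factors slide.

```python
# Hypothetical samples: (photo requests/second, CPU %) pairs taken from
# production graphs. The numbers are invented for illustration.
SAMPLES = [(10, 22.0), (25, 48.0), (40, 71.0), (48, 84.0), (55, 93.0)]


def find_ceiling(samples, cpu_limit=85.0):
    """Return the highest request rate observed with CPU at or below cpu_limit."""
    under_limit = [rate for rate, cpu in samples if cpu <= cpu_limit]
    if not under_limit:
        raise ValueError("no samples at or below the CPU limit")
    return max(under_limit)


ceiling = find_ceiling(SAMPLES)
print(ceiling)
```

The point of the sketch is that the ceiling is expressed in the application's own unit (requests/second) and comes from observed traffic rather than from a synthetic benchmark.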
Safety Factors
what you have left
“safe”
ceiling
@85% CPU
(photo requests/second)
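The relationship between the ceiling, the "safe" ceiling, and "what you have left" can be sketched as simple arithmetic. The safety factor of 0.75 and the traffic numbers are assumptions chosen for illustration; the slides do not state a specific value.

```python
def headroom(ceiling, current_peak, safety_factor=0.75):
    """Return (safe_ceiling, what_you_have_left) in the ceiling's own units.

    safety_factor is an assumed fraction of the measured ceiling that is
    treated as safe to run at; pick it from your own operating experience.
    """
    safe_ceiling = ceiling * safety_factor
    return safe_ceiling, safe_ceiling - current_peak


# Hypothetical webserver: ceiling of 48 photo requests/second at 85% CPU,
# current daily peak of 30 requests/second.
safe, left = headroom(ceiling=48, current_peak=30)
print(safe, left)
```

A negative second value would mean the current peak is already past the safe ceiling.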
Forecasting
Fictional Example:
webservers
Forecasting
now
when is this?
what you have left
Use http://fityk.sf.net to
automate the curve-fit
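The "when is this?" question can be sketched with a plain least-squares line fit. This is a simplification swapped in for illustration: the slides point to fityk for the curve fit, and real growth may not be linear. The data points and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def day_ceiling_is_reached(days, peaks, ceiling):
    """Return the day on which the fitted line crosses the ceiling, or None."""
    slope, intercept = fit_line(days, peaks)
    if slope <= 0:
        return None  # usage is flat or shrinking, so the line never crosses
    return (ceiling - intercept) / slope


# Hypothetical weekly peak measurements (day number, peak requests/second).
days = [0, 7, 14, 21, 28]
peaks = [100, 114, 128, 142, 156]
print(day_ceiling_is_reached(days, peaks, ceiling=200))
```

Subtracting today's day number from the result gives the time left to order, rack, and deploy hardware.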
Forecasting
Fictional Example:
storage consumption
Forecasting Automation
(SAME)
Capacity Health
alert if higher
alert if lower
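The "alert if higher / alert if lower" idea can be sketched as a band check on a metric. The band limits and the metric being checked are assumptions; the slides do not say which values were used.

```python
def capacity_health(value, low, high):
    """Classify a metric against an expected band (low and high are assumed limits)."""
    if value > high:
        return "alert: higher than expected"
    if value < low:
        return "alert: lower than expected"
    return "ok"


# Hypothetical band for peak requests/second on one webserver.
for observed in (12, 30, 47):
    print(observed, capacity_health(observed, low=15, high=40))
```

Alerting on the low side matters as much as the high side: an unexpectedly quiet server can point to a load-balancing or application problem rather than spare capacity.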
4 cores
8 cores
to:
HP DL140 G3s   Watts    photos/min   rack
8              1036.8   1120         8U
!!! (75% faster, even)
3
.52
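A hardware comparison like this one comes down to throughput per Watt and per rack unit. The sketch below uses only the figures that survive on this slide, read as 1120 photos/min at 1036.8 Watts in 8U for the HP DL140 G3s. That reading is an interpretation of the table, and the figures for the hardware being replaced are not available here.

```python
# Figures as read from the slide (an interpretation, not confirmed elsewhere).
photos_per_min = 1120
watts = 1036.8
rack_units = 8

photos_per_min_per_watt = photos_per_min / watts
photos_per_min_per_rack_unit = photos_per_min / rack_units

print(f"{photos_per_min_per_watt:.2f} photos/min per Watt")
print(f"{photos_per_min_per_rack_unit:.0f} photos/min per rack unit")
```

Running the same two divisions for the old hardware would give the before/after comparison.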
running hot,
so add more
2nd Order Effects
(beware the wandering bottleneck)
www100
www118
dbcontacts3
admin1
admin2
Stupid Capacity Tricks
quick and dirty management
[root@netmon101 ~]# dsh -N group.of.servers
dsh> date
executing 'date'
www100: Mon Jun 23 14:14:53 UTC 2008
www118: Mon Jun 23 14:14:53 UTC 2008
dbcontacts3: Mon Jun 23 07:14:53 PDT 2008
admin1: Mon Jun 23 14:14:53 UTC 2008
admin2: Mon Jun 23 14:14:53 UTC 2008
dsh>
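Output like the session above is easy to check by machine. The sketch below parses "host: output" lines from a `date` run and reports hosts whose timezone differs from the majority. The `odd_timezones` helper is hypothetical and assumes the default `date` layout, where the zone is the second-to-last field.

```python
from collections import Counter

# The per-host lines from the session above.
DSH_OUTPUT = """\
dsh> date
executing 'date'
www100: Mon Jun 23 14:14:53 UTC 2008
www118: Mon Jun 23 14:14:53 UTC 2008
dbcontacts3: Mon Jun 23 07:14:53 PDT 2008
admin1: Mon Jun 23 14:14:53 UTC 2008
admin2: Mon Jun 23 14:14:53 UTC 2008
"""


def odd_timezones(dsh_output):
    """Return {host: zone} for hosts whose zone differs from the most common one."""
    zones = {}
    for line in dsh_output.splitlines():
        host, sep, rest = line.partition(": ")
        fields = rest.split()
        if not sep or len(fields) < 2:
            continue  # prompt or status line, not a "host: output" line
        zones[host] = fields[-2]  # default date output ends with "ZONE YEAR"
    if not zones:
        return {}
    majority, _count = Counter(zones.values()).most_common(1)[0]
    return {host: zone for host, zone in zones.items() if zone != majority}


print(odd_timezones(DSH_OUTPUT))
```

Here the check singles out dbcontacts3, the one host reporting PDT while the rest report UTC.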
Stupid Capacity Tricks
Turn Stuff OFF