performance tweaks and tools for linux

joe damato twitter: @joedamato blog: timetobleed.com

~12 hour flight +11 hour time change

and

side effects are..

nasty bugs
ejpphoto (flickr)

fatboyke (flickr)

code

memory bloat
37prime (flickr)

?

TOOLS

list open files

LSOF

lsof -nPp <pid>

-n Inhibits the conversion of network numbers to host names.
-P Inhibits the conversion of port numbers to names for network files.
lsof -nPp <pid>

FD   TYPE  NAME
cwd  DIR   /var/www/myapp
txt  REG   /usr/bin/ruby
mem  REG   /json-1.1.9/ext/json/ext/generator.so
mem  REG   /json-1.1.9/ext/json/ext/parser.so
mem  REG   /memcached-0.17.4/lib/rlibmemcached.so
mem  REG   /mysql-2.8.1/lib/mysql_api.so
0u   CHR   /dev/null
1w   REG   /usr/local/nginx/logs/error.log
2w   REG   /usr/local/nginx/logs/error.log
3u   IPv4  10.8.85.66:33326->10.8.85.68:3306 (ESTABLISHED)
10u  IPv4  10.8.85.66:33327->10.8.85.68:3306 (ESTABLISHED)
11u  IPv4  127.0.0.1:58273->127.0.0.1:11211 (ESTABLISHED)
12u  REG   /tmp/RackMultipart.28957.0
33u  IPv4  174.36.83.42:37466->69.63.180.21:80 (ESTABLISHED)
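a rough ruby sketch of the same idea, reading /proc directly (linux only; this illustrates what lsof reports, it is not lsof itself — every entry in /proc/<pid>/fd is a symlink from a descriptor number to whatever it refers to):

```ruby
# list this process's open file descriptors by resolving the
# /proc/self/fd symlinks; rescue covers fds that close mid-scan
fds = Dir.glob("/proc/self/fd/*").map do |entry|
  [File.basename(entry), File.readlink(entry)] rescue nil
end.compact

fds.each { |fd, target| puts "#{fd}\t#{target}" }
```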

trace system calls and signals

strace -cp <pid>
strace -ttTp <pid> -o <file>

STRACE

strace -cp <pid>
-c Count time, calls, and errors for each system call and report a summary on program exit.
-p pid Attach to the process with the process ID pid and begin tracing.
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.39    0.000064           0      1197       592 read
 34.65    0.000044           0       609           writev
 14.96    0.000019           0      1226           epoll_ctl
  0.00    0.000000           0         4           close
  0.00    0.000000           0         1           select
  0.00    0.000000           0         4           socket
  0.00    0.000000           0         4         4 connect
  0.00    0.000000           0      1057           epoll_wait
------ ----------- ----------- --------- --------- ----------------
100.00    0.000127                  4134       596 total

strace -ttTp <pid> -o <file>
-tt If given twice, the time printed will include the microseconds.
-T Show the time spent in system calls.
-o filename Write the trace output to the file filename rather than to stderr.

epoll_wait(9, {{EPOLLIN, {u32=68841296, u64=68841296}}}, 4096, 50) = 1 <0.033109>
accept(10, {sin_port=38313, sin_addr="127.0.0.1"}, [1226]) = 22 <0.000014>
fcntl(22, F_GETFL) = 0x2 (flags O_RDWR) <0.000007>
fcntl(22, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000008>
setsockopt(22, SOL_TCP, TCP_NODELAY, [1], 4) = 0 <0.000008>
accept(10, 0x7fff5d9c07d0, [1226]) = -1 EAGAIN <0.000014>
epoll_ctl(9, EPOLL_CTL_ADD, 22, {EPOLLIN, {u32=108750368, u64=108750368}}) = 0 <0.000009>
epoll_wait(9, {{EPOLLIN, {u32=108750368, u64=108750368}}}, 4096, 50) = 1 <0.000007>
read(22, "GET / HTTP/1.1\r"..., 16384) = 772 <0.000012>
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000007>
poll([{fd=5, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) <0.000008>
write(5, "1\0\0\0\0\0\0-\0\0\0\3SELECT * FROM `table`"..., 56) = 56 <0.000023>
read(5, "\25\1\0\1,\2\0x\234m"..., 16384) = 284 <1.300897>
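the <...> timings at the end of each line make slow calls easy to find mechanically; a small ruby sketch of that idea (the helper and regex are mine, not part of strace):

```ruby
# pull per-call timings out of strace -T style output and keep only
# the calls slower than a threshold
LINE = /^\s*(\w+)\(.*\)\s+=\s+\S+.*<([\d.]+)>/

def slow_calls(trace, threshold)
  trace.lines.filter_map do |l|
    m = LINE.match(l)
    [m[1], m[2].to_f] if m && m[2].to_f > threshold
  end
end

trace = <<~EOS
  write(5, "SELECT * FROM `table`"..., 56) = 56 <0.000023>
  read(5, "..."..., 16384) = 284 <1.300897>
EOS

p slow_calls(trace, 0.1)   # only the 1.3s read crosses the threshold
```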

http client connection read 772 bytes

read(22, "GET / HTTP/1.1\r"..., 16384) = 772 <0.0012>

incoming http request took 0.0012s

mysql connection write sql query to db
write(5, "SELECT * FROM `table`"..., 56) = 56 <0.0023>
read(5, "\25\1\0\1,\2\0x\234m"..., 16384) = 284 <1.30>

read query response

slow query

strace ruby to see some interesting things...

stracing ruby: SIGVTALRM
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0x1a) = 2207807 <0.000009>
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0x1a) = 0 <0.000009>
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0x1a) = 140734552062624 <0.000009>
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0x1a) = 140734552066688 <0.000009>
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0x1a) = 11333952 <0.000008>
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0x1a) = 0 <0.000009>
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0x1a) = 1 <0.000010>
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---

• ruby 1.8 uses signals to schedule its green threads
• process receives a SIGVTALRM signal every 10ms

stracing ruby: sigprocmask
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.326334           0   3568567           rt_sigprocmask
  0.00    0.000000           0         9           read
  0.00    0.000000           0        10           open
  0.00    0.000000           0        10           close
  0.00    0.000000           0         9           fstat
  0.00    0.000000           0        25           mmap
------ ----------- ----------- --------- --------- ----------------
100.00    0.326334                3568685         0 total

• debian/redhat compile ruby with --enable-pthread
• uses a native thread timer for SIGVTALRM
• causes excessive calls to sigprocmask: 30% slowdown!

dump traffic on a network

tcpdump -i eth0 -s 0 -nqA tcp dst port 3306

TCPDUMP

tcpdump -i <eth> -s <len> -nqA <expr> tcpdump -i <eth> -w <file> <expr>
-i <eth> Network interface.
-s <len> Snarf len bytes of data from each packet.
-n Don't convert addresses (host addresses, port numbers) to names.
-q Quiet output. Print less protocol information.

-A Print each packet (minus its link level header) in ASCII.
-w <file> Write the raw packets to file rather than printing them out.
<expr> libpcap expression, for example:
    tcp src port 80
    tcp dst port 3306

tcp dst port 80
19:52:20.216294 IP 24.203.197.27.40105 > 174.37.48.236.80: tcp 438
E...*.@.l.%&.....%0....POx..%s.oP.......
GET /poll_images/cld99erh0/logo.png HTTP/1.1
Accept: */*
Referer: http://apps.facebook.com/realpolls/?_fb_q=1

tcp dst port 3306
19:51:06.501632 IP 10.8.85.66.50443 > 10.8.85.68.3306: tcp 98
E..."K@.@.Yy .UB .UD.....z....L............ GZ.y3b..[......W....
SELECT * FROM `votes` WHERE (`poll_id` = 72621) LIMIT 1

tcpdump -w <file>

google's performance tools

CPUPROFILE=/tmp/myprof ./myapp
pprof ./myapp /tmp/myprof

PERFTOOLS

# download
wget http://google-perftools.googlecode.com/files/google-perftools-1.6.tar.gz
tar zxvf google-perftools-1.6.tar.gz

# compile
cd google-perftools-1.6
./configure --prefix=/opt
make
sudo make install

# setup: for linux
export LD_PRELOAD=/opt/lib/libprofiler.so

# setup: for osx
export DYLD_INSERT_LIBRARIES=/opt/lib/libprofiler.dylib

# profile
CPUPROFILE=/tmp/ruby.prof ruby -e' 5_000_000.times{ "hello world" } '

# report
pprof `which ruby` --text /tmp/ruby.prof

pprof ruby ruby.prof --text
Total: 103 samples
   103 100.0% rb_eval
   103 100.0% int_dotimes
    95  92.2% rb_yield_0
    52  50.5% rb_str_new3
    12  11.7% gc_sweep
    12  11.7% gc_mark
     3   2.9% obj_free

pprof ruby ruby.prof --gif

Profiling MRI
• 10% of production VM time spent in rb_str_sub_bang
• String#sub! called from Time.parse

return unless str.sub!(/\A(\d{1,2})/, '')
return unless str.sub!(/\A( \d|\d{1,2})/, '')
return unless str.sub!(/\A( \d|\d{1,2})/, '')
return unless str.sub!(/\A(\d{1,3})/, '')
return unless str.sub!(/\A(\d{1,2})/, '')
return unless str.sub!(/\A(\d{1,2})/, '')
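a miniature version of the technique Time.parse uses internally: chew matched tokens off the front of a copy of the string with destructive sub!. (illustration only; parse_date and its regexes are mine, not stdlib code.)

```ruby
# parse "YYYY-MM-DD" by repeatedly sub!-ing tokens off the front,
# the same destructive-parsing style that makes rb_str_sub_bang hot
def parse_date(str)
  str = str.dup
  return nil unless str.sub!(/\A(\d{4})-/, '')
  year = $1.to_i
  return nil unless str.sub!(/\A(\d{1,2})-/, '')
  month = $1.to_i
  return nil unless str.sub!(/\A(\d{1,2})/, '')
  day = $1.to_i
  [year, month, day]
end

p parse_date("2010-11-06")   # => [2010, 11, 6]
```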

Profiling EM + threads
Total: 3763 samples
  2764  73.5% catch_timer
   989  26.3% memcpy
     3   0.1% st_lookup
     2   0.1% rb_thread_schedule
     1   0.0% rb_eval
     1   0.0% rb_newobj
     1   0.0% rb_gc_force_recycle

• known issue: EM+threads = slow
• memcpy??
• thread context switches copy the stack w/ memcpy
• EM allocates huge buffer on the stack
• solution: move buffer to the heap

PERFTOOLS.RB
perftools for ruby code

pprof.rb /tmp/myrbprof
github.com/tmm1/perftools.rb

gem install perftools.rb

RUBYOPT="-r`gem which perftools | tail -1`" CPUPROFILE=/tmp/myrbprof ruby myapp.rb

pprof.rb /tmp/myrbprof --text
pprof.rb /tmp/myrbprof --gif > /tmp/myrbprof.gif

require 'sinatra'

get '/sleep' do
  sleep 0.25
  'done'
end

$ ab -c 1 -n 50 http://127.0.0.1:4567/compute $ ab -c 1 -n 50 http://127.0.0.1:4567/sleep

Sampling profiler:

• 232 samples total
• 83 samples were in /compute
• 118 samples had /compute on the stack but were in another function
• /compute accounts for 50% of process, but only 35% of time was in /compute itself
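the arithmetic behind those bullets, spelled out: "self" is the samples that ticked inside /compute itself, "cumulative" is the samples where /compute was anywhere on the stack.

```ruby
# self vs cumulative percentages for a sampling profiler
total     = 232
self_hits = 83     # ticks inside /compute itself
cum_hits  = 118    # ticks with /compute anywhere on the stack

self_pct = (self_hits * 100.0 / total).round(1)
cum_pct  = (cum_hits  * 100.0 / total).round(1)

puts "self: #{self_pct}%  cumulative: #{cum_pct}%"
```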

get '/compute' do
  proc{ |n|
    a,b = 0,1
    n.times{ a,b = b,a+b }
    b
  }.call(10_000)
  'done'
end

== Sinatra has ended his set (crowd applauds)
PROFILE: interrupts/evictions/bytes = 232/0/2152
Total: 232 samples
    83  35.8%  35.8%      118  50.9% Sinatra::Application#GET /compute
    56  24.1%  59.9%       56  24.1% garbage_collector
    35  15.1%  75.0%      113  48.7% Integer#times

CPUPROFILE_REALTIME=1 CPUPROFILE=app-rt.prof

CPUPROFILE=app.prof

redis-rb bottleneck

why is rubygems slow?

faster bundle install
• 23% spent in Gem::Version#<=>
• simple patch to rubygems improved overall install performance by 15%
• http://gist.github.com/458185

CPUPROFILE_OBJECTS=1 CPUPROFILE=app-objs.prof

• object allocation profiler mode built-in
• 1 sample = 1 object created
• Time parsing is both CPU and object allocation intensive
• using mysql2 moves this to C

trace library calls

ltrace -cp <pid>
ltrace -ttTp <pid> -o <file>

LTRACE

ltrace -c ruby threaded_em.rb
% time     seconds  usecs/call     calls function
------ ----------- ----------- --------- --------------------
 48.65   11.741295         617     19009 memcpy
 30.16    7.279634         831      8751 longjmp
  9.78    2.359889         135     17357 _setjmp
  8.91    2.150565         285      7540 malloc
  1.10    0.265946          20     13021 memset
  0.81    0.195272          19     10105 __ctype_b_loc
  0.35    0.084575          19      4361 strcmp
  0.19    0.046163          19      2377 strlen
  0.03    0.006272          23       265 realloc
------ ----------- ----------- --------- --------------------
100.00   24.134999                 82999 total

ltrace -ttT -e memcpy ruby threaded_em.rb
01:24:48.769408 --- SIGVTALRM (Virtual timer expired) ---
01:24:48.769616 memcpy(0x1216000, "", 1086328) = 0x1216000 <0.000578>
01:24:48.770555 memcpy(0x6e32670, "\240&\343v", 1086328) = 0x6e32670 <0.000418>
01:24:49.899414 --- SIGVTALRM (Virtual timer expired) ---
01:24:49.899490 memcpy(0x1320000, "", 1082584) = 0x1320000 <0.000628>
01:24:49.900474 memcpy(0x6e32670, "", 1086328) = 0x6e32670 <0.000479>

LTRACE/LIBDL
trace dlopen'd library calls

ltrace -F <conf> -bg -x <symbol> -p <pid>
github.com/ice799/ltrace/tree/libdl

ltrace -F <conf> -b -g -x <sym>
-b Ignore signals.
-g Ignore libraries linked at compile time.
-F <conf> Read prototypes from config file.
-x <sym> Trace calls to the function sym.

-F ltrace.conf
int mysql_real_query(addr,string,ulong);
void garbage_collect(void);
int memcached_set(addr,string,ulong,string,ulong);

ltrace -x garbage_collect
19:08:06.436926 garbage_collect() = <void> <0.221679>
19:08:15.329311 garbage_collect() = <void> <0.187546>
19:08:17.662149 garbage_collect() = <void> <0.199200>
19:08:20.486655 garbage_collect() = <void> <0.205864>
19:08:25.102302 garbage_collect() = <void> <0.214295>
19:08:35.552337 garbage_collect() = <void> <0.189172>

ltrace -x mysql_real_query
mysql_real_query(0x1c9e0500, "SET NAMES 'UTF8'", 16)       = 0 <0.000324>
mysql_real_query(0x1c9e0500, "SET SQL_AUTO_IS_NULL=0", 22) = 0 <0.000322>
mysql_real_query(0x19c7a500, "SELECT * FROM `users`", 21)  = 0 <1.206506>
mysql_real_query(0x1c9e0500, "COMMIT", 6)                  = 0 <0.000181>

ltrace -x memcached_set
memcached_set(0x15d46b80, "Status:33", 21, "\004\b", 366) = 0 <0.01116>
memcached_set(0x15d46b80, "Status:96", 21, "\004\b", 333) = 0 <0.00224>
memcached_set(0x15d46b80, "Status:57", 21, "\004\b", 298) = 0 <0.01850>
memcached_set(0x15d46b80, "Status:10", 21, "\004\b", 302) = 0 <0.00530>
memcached_set(0x15d46b80, "Status:67", 21, "\004\b", 318) = 0 <0.00291>
memcached_set(0x15d46b80, "Status:02", 21, "\004\b", 299) = 0 <0.00658>
memcached_set(0x15d46b80, "Status:34", 21, "\004\b", 264) = 0 <0.00243>

the GNU debugger

gdb <executable>
gdb attach <pid>

GDB

Debugging Ruby Segfaults
test_segv.rb:4: [BUG] Segmentation fault
ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-darwin9.7.0]

test_segv.rb:

    def test
      require 'segv'
      4.times do
        Dir.chdir '/tmp' do
          Hash.new{ segv }[0]
        end
      end
    end
    sleep 10
    test()

segv.c:

    #include "ruby.h"

    VALUE segv() {
      VALUE array[1];
      array[1000000] = NULL;
      return Qnil;
    }

    void Init_segv() {
      rb_define_method(rb_cObject, "segv", segv, 0);
    }

1. Attach to running process
$ ps aux | grep ruby
joe  23611  0.0  0.1  25424  7540 S  Dec01  0:00 ruby test_segv.rb

$ sudo gdb ruby 23611
Attaching to program: ruby, process 23611
0x00007fa5113c0c93 in nanosleep () from /lib/libc.so.6
(gdb) c
Continuing.

Program received signal SIGBUS, Bus error.
segv () at segv.c:7
7         array[1000000] = NULL;

2. Use a coredump
Process.setrlimit Process::RLIMIT_CORE, 300*1024*1024

$ sudo mkdir /cores
$ sudo chmod 777 /cores
$ sudo sysctl kernel.core_pattern=/cores/%e.core.%s.%p.%t
$ sudo gdb ruby /cores/ruby.core.6.23611.1259781224
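the Process.setrlimit call above raises the core size limit so a dump is actually written. a small sketch of the same API, lowering the soft limit (always allowed) and reading it back; raising it past the hard limit requires root:

```ruby
# query and set RLIMIT_CORE; getrlimit returns [soft, hard]
_soft, hard = Process.getrlimit(Process::RLIMIT_CORE)
Process.setrlimit(Process::RLIMIT_CORE, 0, hard)
puts "core soft limit: #{Process.getrlimit(Process::RLIMIT_CORE).first}"
```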

def test
  require 'segv'
  4.times do
    Dir.chdir '/tmp' do
      Hash.new{ segv }[0]
    end
  end
end
test()

(gdb) where
#0  segv () at segv.c:7
#1  0x000000000041f2be in call_cfunc () at eval.c:5727
...
#13 0x000000000043ba8c in rb_hash_default () at hash.c:521
...
#19 0x000000000043b92a in rb_hash_aref () at hash.c:429
...
#26 0x00000000004bb7bc in chdir_yield () at dir.c:728
#27 0x000000000041d8d7 in rb_ensure () at eval.c:5528
#28 0x00000000004bb93a in dir_s_chdir () at dir.c:816
...
#35 0x000000000041c444 in rb_yield () at eval.c:5142
#36 0x0000000000450690 in int_dotimes () at numeric.c:2834
...
#48 0x0000000000412a90 in ruby_run () at eval.c:1678
#49 0x000000000041014e in main () at main.c:48

gdb with MRI hooks

gem install gdb.rb
gdb.rb <pid>
github.com/tmm1/gdb.rb

GDB.RB

(gdb) ruby eval 1+2
3
(gdb) ruby eval Thread.current
#<Thread:0x1d630 run>
(gdb) ruby eval Thread.list.size
8

(gdb) ruby threads list
0x15890 main thread  THREAD_STOPPED   WAIT_JOIN(0x19ef4)
0x19ef4 thread       THREAD_STOPPED   WAIT_TIME(57.10s)
0x19e34 thread       THREAD_STOPPED   WAIT_FD(5)
0x19dc4 thread       THREAD_STOPPED   WAIT_NONE
0x19dc8 thread       THREAD_STOPPED   WAIT_NONE
0x19dcc thread       THREAD_STOPPED   WAIT_NONE
0x22668 thread       THREAD_STOPPED   WAIT_NONE
0x1d630 curr thread  THREAD_RUNNABLE  WAIT_NONE

(gdb) ruby objects
HEAPS      8
SLOTS      1686252
LIVE       893327 (52.98%)
FREE       792925 (47.02%)

scope        1641  (0.18%)
regexp       2255  (0.25%)
data         3539  (0.40%)
class        3680  (0.41%)
hash         6196  (0.69%)
object       8785  (0.98%)
array       13850  (1.55%)
string     105350 (11.79%)
node       742346 (83.10%)

(gdb) ruby objects strings
   140 u'lib'
   158 u'0'
   294 u'\n'
   619 u''

30503 unique strings
3187435 bytes
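a live process can also get similar per-type counts about itself, no gdb required, via ObjectSpace (ruby 1.9+; this complements gdb.rb rather than replacing it, since gdb.rb works on a hung or foreign process):

```ruby
# tally live objects by internal type tag and print the biggest five
counts = ObjectSpace.count_objects
counts.select { |k, _| k.to_s.start_with?("T_") }
      .sort_by { |_, v| -v }
      .first(5)
      .each { |type, n| puts "#{type}  #{n}" }
```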

def test
  require 'segv'
  4.times do
    Dir.chdir '/tmp' do
      Hash.new{ segv }[0]
    end
  end
end
test()

(gdb) ruby threads
0xa3e000 main curr thread THREAD_RUNNABLE WAIT_NONE
  node_vcall  segv     in test_segv.rb:5
  node_call   test     in test_segv.rb:5
  node_call   call     in test_segv.rb:5
  node_call   default  in test_segv.rb:5
  node_call   []       in test_segv.rb:5
  node_call   test     in test_segv.rb:4
  node_call   chdir    in test_segv.rb:4
  node_call   test     in test_segv.rb:3
  node_call   times    in test_segv.rb:3
  node_vcall  test     in test_segv.rb:9

rails_warden leak
(gdb) ruby objects classes
1197 MIME::Type
2657 NewRelic::MetricSpec
2719 TZInfo::TimezoneTransitionInfo
4124 Warden::Manager
4124 MethodOverrideForAll
4124 AccountMiddleware
4124 Rack::Cookies
4125 ActiveRecord::ConnectionAdapters::ConnectionManagement
4125 ActionController::Session::CookieStore
4125 ActionController::Failsafe
4125 ActionController::ParamsParser
4125 Rack::Lock
4125 ActionController::Dispatcher
4125 ActiveRecord::QueryCache
4125 ActiveSupport::MessageVerifier
4125 Rack::Head

middleware chain leaking per request

mongrel sleeper thread
0x16814c00 thread THREAD_STOPPED WAIT_TIME(0.47) 1522 bytes
  node_fcall  sleep       in lib/mongrel/configurator.rb:285
  node_fcall  run         in lib/mongrel/configurator.rb:285
  node_fcall  loop        in lib/mongrel/configurator.rb:285
  node_call   run         in lib/mongrel/configurator.rb:285
  node_call   initialize  in lib/mongrel/configurator.rb:285
  node_call   new         in lib/mongrel/configurator.rb:285
  node_call   run         in bin/mongrel_rails:128
  node_call   run         in lib/mongrel/command.rb:212
  node_call   run         in bin/mongrel_rails:281
  node_fcall  (unknown)   in bin/mongrel_rails:19

def run
  @listeners.each {|name,s| s.run }

  $mongrel_sleeper_thread = Thread.new { loop { sleep 1 } }
end

god memory leaks
(gdb) ruby objects arrays
elements  instances
   94310          3
   94311          3
   94314          2
   94316          1
5369 arrays
2863364 member elements

  43 God::Process
  43 God::Watch
  43 God::Driver
  43 God::DriverEventQueue
  43 God::Conditions::MemoryUsage
  43 God::Conditions::ProcessRunning
  43 God::Behaviors::CleanPidFile
  45 Process::Status
  86 God::Metric
 327 God::System::SlashProcPoller
 327 God::System::Process
 406 God::DriverEvent

many arrays with 90k+ elements!

5 separate god leaks fixed by Eric Lindvall with the help of gdb.rb!

BLEAK_HOUSE
ruby memory leak detector

ruby-bleak-house myapp.rb
bleak /tmp/bleak.<PID>.*.dump
github.com/fauna/bleak_house

BleakHouse
• installs a patched version of ruby: ruby-bleak-house
• unlike gdb.rb, see where objects were created (file:line)
• create multiple dumps over time with `kill -USR2 <pid>` and compare to find leaks

191691 total objects
Final heap size 191691 filled, 220961 free
Displaying top 20 most common line/class pairs
89513 __null__:__null__:__node__
41438 __null__:__null__:String
 2348 ruby/site_ruby/1.8/rubygems/specification.rb:557:Array
 1508 ruby/gems/1.8/specifications/gettext-1.9.gemspec:14:String
 1021 ruby/gems/1.8/specifications/heel-0.2.0.gemspec:14:String
  951 ruby/site_ruby/1.8/rubygems/version.rb:111:String
  935 ruby/site_ruby/1.8/rubygems/specification.rb:557:String
  834 ruby/site_ruby/1.8/rubygems/version.rb:146:Array

summarizes strace and lsof
wget http://aspersa.googlecode.com/svn/trunk/ioprofile

IOPROFILE

ioprofile captures one sample of lsof output, then runs strace for a specified amount of time. when strace finishes, the results are aggregated per file. below is an example that comes with ioprofile:

$ ioprofile t/samples/ioprofile-001.txt
     total      pread       read     pwrite      write filename
 10.094264  10.094264   0.000000   0.000000   0.000000 /data/data/abd_2dia/aia_227_228.ibd
  8.356632   8.356632   0.000000   0.000000   0.000000 /data/data/abd_2dia/aia_227_223.ibd
  0.048850   0.046989   0.000000   0.001861   0.000000 /data/data/abd/aia_instances.ibd
  0.035016   0.031001   0.000000   0.004015   0.000000 /data/data/abd/vo_difuus.ibd
  0.013360   0.000000   0.001723   0.000000   0.011637 /var/log/mysql/mysql-relay.002113
  0.008676   0.000000   0.000000   0.000000   0.008676 /data/data/master.info
  0.002060   0.000000   0.000000   0.002060   0.000000 /data/data/ibdata1
  0.001490   0.000000   0.000000   0.001490   0.000000 /data/data/ib_logfile1
  0.000555   0.000000   0.000000   0.000000   0.000555 /var/log/mysql/mysql-relay-log.info
  0.000141   0.000000   0.000000   0.000141   0.000000 /data/data/ib_logfile0
  0.000100   0.000000   0.000000   0.000100   0.000000 /data/data/abd/9fvus.ibd

-c CELL specify what to put in the cells of the output: 'times', 'count', or 'sizes'. below is an example of -c sizes:
$ ioprofile -c sizes t/samples/ioprofile-001.txt
    total     pread      read    pwrite     write filename
 90800128  90800128         0         0         0 /data/data/abd_2dia/aia_227_223.ibd
 52150272  52150272         0         0         0 /data/data/abd_2dia/aia_227_228.ibd
   999424         0         0    999424         0 /data/data/ibdata1
   638976    131072         0    507904         0 /data/data/abd/vo_difuus.ibd
   327680    114688         0    212992         0 /data/data/abd/aia_instances.ibd
   305263         0    149662         0    155601 /var/log/mysql/mysql-relay.002113
   217088         0         0    217088         0 /data/data/ib_logfile1
    22638         0         0         0     22638 /data/data/master.info
    16384         0         0     16384         0 /data/data/abd/9fvus.ibd
     1088         0         0         0      1088 /var/log/mysql/mysql-relay-log.info
      512         0         0       512         0 /data/data/ib_logfile0
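the core of what ioprofile does, sketched in ruby: join strace -T lines with an fd-to-filename map (from lsof) and total the time spent per file. FD_MAP and the trace below are made-up sample data, not ioprofile's actual code.

```ruby
# aggregate per-file I/O time from strace -T style lines
FD_MAP = { 5 => "/data/data/ibdata1", 22 => "client socket" }

trace = <<~EOS
  pread(5, ""..., 16384, 0) = 16384 <0.009000>
  read(22, ""..., 16384) = 772 <0.000012>
  pread(5, ""..., 16384, 16384) = 16384 <0.001000>
EOS

totals = Hash.new(0.0)
trace.each_line do |line|
  next unless line =~ /^\s*\w+\((\d+),.*<([\d.]+)>/
  totals[FD_MAP[$1.to_i]] += $2.to_f
end

totals.sort_by { |_, t| -t }.each { |file, t| puts "%10.6f  %s" % [t, file] }
```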

/proc
lots of interesting things hiding in /proc

/proc/[pid]/pagemap
• (as of linux 2.6.25)
• /proc/[pid]/pagemap - find out which physical frame each virtual page is mapped to, and a swap flag.
• /proc/kpagecount - stores number of times a particular physical page is mapped.
• /proc/kpageflags - flags about each page (locked, slab, dirty, writeback, ...)
• read more: Documentation/vm/pagemap.txt

/proc/[pid]/stack

• get a process’s stack trace

/proc/[pid]/status

• lots of small bits of information
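/proc/[pid]/status is plain "Key: value" text, so it is trivial to scrape (linux only; a hedged sketch, not a library):

```ruby
# turn /proc/self/status into a hash of its key/value fields
status = File.read("/proc/self/status")
fields = status.lines.map { |l| l.split(":", 2).map(&:strip) }.to_h

puts "name: #{fields["Name"]}  threads: #{fields["Threads"]}"
```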

/proc/sys/kernel/core_pattern

• tell the system where to output core dumps
• can also launch a user program when core dumps happen:
  echo "|/path/to/core_helper.rb %p %s %u %g" > /proc/sys/kernel/core_pattern
• /proc/[pid]/ files are kept in place until your handler exits, so the full state of the process at death may be inspected.

• http://gist.github.com/587443

and lots more:
/proc/meminfo
/proc/scsi/*
/proc/net/*
/proc/sys/net/ipv4/*
...

MONITOR RAID STATUS

• how do you know when a hard drive in your RAID array fails?
• turns out there are command line tools that most RAID vendors provide:

• adaptec - /usr/StorMan/arcconf getconfig 1 • 3ware - /usr/bin/tw_cli

snooki:/# /usr/StorMan/arcconf getconfig 1 AL
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
Controller Status                : Optimal
Channel description              : SAS/SATA
Controller Model                 : Adaptec 3405
Controller Serial Number         : 7C4911519E3
Physical Slot                    : 2
Temperature                      : 42 C/ 107 F (Normal)
Installed memory                 : 128 MB
Copyback                         : Disabled
Background consistency check     : Enabled
Automatic Failover               : Enabled
Global task priority             : High
Stayawake period                 : Disabled
Spinup limit internal drives     : 0
Spinup limit external drives     : 0
Defunct disk drive count         : 0
Logical devices/Failed/Degraded  : 1/0/0
--------------------------------------------------------
Controller Version Information
--------------------------------------------------------
BIOS                             : 5.2-0 (17304)
Firmware                         : 5.2-0 (17304)
Driver                           : 1.1-5 (2461)

• write a script to parse that
• run it with cron
• the ugly ruby script i use for adaptec: http://gist.github.com/643666
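a sketch of such a cron check: scrape the fields that matter out of arcconf-style output. the REPORT text is abbreviated sample data from the slide above; field names and thresholds are assumptions about this one controller, not a general-purpose monitor.

```ruby
# parse "Key : value" lines and decide whether to alert
REPORT = <<~EOS
  Controller Status                : Optimal
  Temperature                      : 42 C/ 107 F (Normal)
  Defunct disk drive count         : 0
  Logical devices/Failed/Degraded  : 1/0/0
EOS

fields  = REPORT.lines.map { |l| l.split(/\s+:\s+/, 2).map(&:strip) }.to_h
healthy = fields["Controller Status"] == "Optimal" &&
          fields["Defunct disk drive count"] == "0"

puts healthy ? "RAID OK" : "RAID DEGRADED -- page someone"
```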

more well known tools: iostat, vmstat, top, and free

LINUX TWEAKS

ADJUST TIMER FREQUENCY

CONFIG_HZ_100=y CONFIG_HZ=100
• Set the timer interrupt frequency.
• Fewer timer interrupts means processes run with fewer interruptions.
• Servers (without interactive software) should have a lower timer frequency.

CONNECTOR

CONFIG_CONNECTOR=y CONFIG_PROC_EVENTS=y

• the connector kernel module is useful for process monitoring.
• build or use a system like god to watch processes.
• when processes die, the kernel notifies you.
• you can restart/recover/etc.

TCP SEGMENTATION OFFLOADING

sudo ethtool -K eth1 tso on
• Allows the kernel to offload large packet segmentation to the network adapter.
• Frees the CPU to do more useful work.
• After running the command above, verify with:

[joe@timetobleed]% dmesg | tail -1
[892528.450378] 0000:04:00.1: eth1: TSO is Enabled

http://kerneltrap.org/node/397
Tx/Rx TCP file send long (bi-directional Rx/Tx):
  w/o TSO: 1500Mbps, 82% CPU
  w/  TSO: 1633Mbps, 75% CPU

Tx TCP file send long (Tx only):
  w/o TSO: 940Mbps, 40% CPU
  w/  TSO: 940Mbps, 19% CPU

INTEL I/OAT DMA ENGINE

CONFIG_DMADEVICES=y CONFIG_INTEL_IOATDMA=y CONFIG_DMA_ENGINE=y CONFIG_NET_DMA=y

• these options enable the Intel I/OAT DMA engine present in recent Xeon CPUs.
• increases throughput because the kernel can offload network data copying to the DMA engine.
• the CPU can do more useful work.
• statistics about savings can be found in sysfs: /sys/class/dma/

check if I/OAT is enabled
[joe@timetobleed]% dmesg | grep ioat
ioatdma 0000:00:08.0: setting latency timer to 64
ioatdma 0000:00:08.0: Intel(R) I/OAT DMA Engine found, 4 channels, device version 0x12, driver version 3.64
ioatdma 0000:00:08.0: irq 56 for MSI/MSI-X

DIRECT CACHE ACCESS

CONFIG_DCA=y
• I/OAT includes Direct Cache Access (DCA).
• DCA allows a driver to warm the CPU cache.
• Requires driver and device support.
• The Intel 10GbE driver (ixgbe) supports this feature.
• Must enable this feature in the BIOS.
• Some vendors hide the BIOS option, so you will need a hack to enable DCA.

DCA enable hack
get a hack to enable DCA on some vendor locked systems:
http://github.com/ice799/dca_force

read about the hack here:
http://timetobleed.com/enabling-bios-options-on-a-live-server-with-no-rebooting/

THROTTLE NIC INTERRUPTS

insmod e1000e.ko InterruptThrottleRate=1

• SOME drivers allow you to specify the interrupt throttling algorithm.
• e1000e is one of these drivers.
• Two dynamic throttling algorithms: dynamic (1) and dynamic conservative (3).
• The difference is the interrupt rate for "Lowest Latency" traffic.
• Algorithm 1 is more aggressive for this traffic class.
• Read the driver documentation for more information.
• Be careful to avoid receive livelock.

PROCESS AFFINITY

Process Affinity
• Linux allows you to set which CPU(s) a process may run on.
• For example, set PID 123 to CPUs 4-6:

# taskset -pc 4,5,6 123

IRQ AFFINITY

IRQ Affinity
• NIC, disk, etc. IRQ handlers can be set to execute on specific processors.
• The IRQ to CPU map can be found in: /proc/interrupts
• Individual IRQs may be set in the file: /proc/irq/[IRQ NUMBER]/smp_affinity

IRQ affinity
• Can pin IRQ handlers for devices to specific CPUs.
• Can then use taskset to pin important processes to other CPUs.
• The result: NIC and disk will not interrupt important processes running elsewhere.
• Can also help preserve CPU caches.

irqbalance
• http://www.irqbalance.org
• "irqbalance is a Linux* daemon that distributes interrupts over the processors and cores you have in your computer system."

OPROFILE

CONFIG_OPROFILE=y CONFIG_HAVE_OPROFILE=y

• oprofile is a system wide profiler that can profile both kernel and application level code.
• oprofile has a kernel driver which collects data from CPU registers (on x86 these are MSRs).
• oprofile can also annotate source code with performance information.
• this can help you find and fix performance problems.

LATENCYTOP

CONFIG_LATENCYTOP=y
• LatencyTOP helps you understand what caused the kernel to block a process.


TOOLS

TWEAKS

Thanks. (spasiba ?) @joedamato timetobleed.com