Debugging Ruby with MongoDB
Aman Gupta @tmm1

Ruby developers know...

Ruby is
fatboyke (flickr)

Ruby loves eating RAM

37prime (flickr)

ruby allocates memory from the OS
memory is broken up into slots
each slot holds one ruby object

a linked list called the ‘freelist’ points to all the empty slots on the ruby heap

when you need an object, it’s pulled off the freelist
if the freelist is empty, GC is run
GC finds non-reachable objects and adds them to the freelist
if the freelist is still empty (all slots were in use)
another heap is allocated
all the slots on the new heap are added to the freelist
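
The allocation path above can be sketched as a toy model in Ruby. This is only an illustration and not how MRI implements it in C: the ToyHeap class, its slot count, and the caller-supplied `live` list (standing in for what the GC finds reachable) are all assumptions made for the example.

```ruby
# Toy model of the slot/freelist scheme described above.
# Not MRI's real implementation: all names and sizes are made up.
class ToyHeap
  SLOTS_PER_HEAP = 4 # hypothetical; real heaps hold far more slots

  attr_reader :heaps, :gc_runs

  def initialize
    @slots    = [] # slot index -> stored object, or nil when empty
    @freelist = [] # indexes of empty slots
    @heaps    = 0
    @gc_runs  = 0
    add_heap
  end

  # live: slot indexes that are still reachable (stands in for marking)
  def allocate(obj, live)
    gc(live) if @freelist.empty?
    add_heap if @freelist.empty? # still empty: all slots were in use
    index = @freelist.shift
    @slots[index] = obj
    index
  end

  private

  def gc(live)
    @gc_runs += 1
    @slots.each_index do |i|
      next if @slots[i].nil? || live.include?(i)
      @slots[i] = nil
      @freelist << i # non-reachable slots go back on the freelist
    end
  end

  def add_heap
    @heaps += 1
    start = @slots.size
    SLOTS_PER_HEAP.times do |i|
      @slots << nil
      @freelist << start + i
    end
  end
end

heap = ToyHeap.new
live = []
4.times { |n| live << heap.allocate("obj#{n}", live) } # fills the first heap
live.delete(1)                                         # drop one reference
reused = heap.allocate("new", live)                    # GC runs and frees slot 1
```

Allocating again while every slot is still referenced makes the toy GC find nothing, so a second heap is added.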

antphotos (flickr)

turns out, Ruby’s GC is also one of the reasons it can be so slow

Matz’ Ruby Interpreter (MRI 1.8) has a...
john_lam (flickr)

Conservative
lifeisaprayer (flickr)

Stop the World
benimoto (flickr)

Mark and Sweep
michaelgoodin (flickr)

kiksbalayon (flickr)

Garbage Collector

• conservative: the VM hands out raw pointers to ruby objects
• stop the world: no ruby code can execute during GC
• mark and sweep: mark all objects in use, sweep away unmarked objects

more objects = longer GC
mckaysavage (flickr)

longer GC = less time to run your ruby code
kgrocki (flickr)

fewer objects = better performance
januskohl (flickr)

improve performance
1. remove unnecessary object allocations
   object allocations are not free
2. avoid leaked references
   not really memory ‘leaks’
   you’re holding a reference to an object you no longer need. GC sees the reference, so it keeps the object around

the GC follows references recursively, so a reference to classA will ‘leak’ all these objects
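
A small sketch of why one reference keeps a whole graph alive. The object graph below is hypothetical (only the name classA comes from the slide), and the traversal is a simplified stand-in for the mark phase, not MRI's code.

```ruby
require 'set'

# Hypothetical object graph: each key references the objects in its array.
REFS = {
  "root"   => ["classA"],
  "classA" => ["cache", "config"],
  "cache"  => ["entry1", "entry2"],
  "config" => [],
  "entry1" => [],
  "entry2" => [],
  "orphan" => ["entry2"],
}.freeze

# Mark everything reachable from the roots. Uses an explicit stack
# instead of recursion, but visits the same set of objects.
def mark(refs, roots)
  marked = Set.new
  stack  = roots.dup
  until stack.empty?
    obj = stack.pop
    next unless marked.add?(obj) # add? returns nil when already marked
    stack.concat(refs.fetch(obj, []))
  end
  marked
end

kept  = mark(REFS, ["root"])
swept = REFS.keys - kept.to_a # only objects nothing reachable points to
```

Here the single reference from root to classA keeps cache, config and both entries marked. Only orphan would be swept.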

let’s build a debugger
• step 1: collect data
  • list of all ruby objects in memory
• step 2: analyze data
  • group by type
  • group by file/line

version 1: collect data
• simple patch to ruby VM (300 lines of C)
• http://gist.github.com/73674
• simple text based output format

0x154750 @ -e:1 is OBJECT of type: T
0x15476c @ -e:1 is HASH which has data
0x154788 @ -e:1 is ARRAY of len: 0
0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi
0x1547dc @ -e:1 is STRING len: 1 and val: T
0x154814 @ -e:1 is CLASS named: T inherits from Object
0x154a98 @ -e:1 is STRING len: 2 and val: hi
0x154b40 @ -e:1 is OBJECT of type: Range

version 1: analyze data
$ wc -l /tmp/ruby.heap
 1571529 /tmp/ruby.heap

$ cat /tmp/ruby.heap | awk '{ print $3 }' | sort | uniq -c | sort -g | tail -1
 236840 memcached/memcached.rb:316

$ grep "memcached.rb:316" /tmp/ruby.heap | awk '{ print $5 }' | sort | uniq -c | sort -g | tail -5
  10948 ARRAY
  20355 OBJECT
  30744 DATA
  64952 HASH
 123290 STRING
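
The same counting can be sketched in Ruby instead of shell. The sample lines below are made up for illustration (the app.rb locations are not from a real dump), and count_by_field is a hypothetical helper that mirrors awk's 1-based $N fields.

```ruby
# Made-up sample in the version 1 text format:
# address @ file:line is DESCRIPTION
SAMPLE = <<~HEAP
  0x154750 @ -e:1 is OBJECT of type: T
  0x15476c @ app.rb:3 is HASH which has data
  0x154788 @ app.rb:3 is ARRAY of len: 0
  0x1547c0 @ app.rb:3 is STRING (SHARED) len: 2 and val: hi
  0x1547dc @ -e:1 is STRING len: 1 and val: T
HEAP

# Count lines per whitespace-separated field (1-based, like awk's $N).
# Returns [value, count] pairs sorted by ascending count, like
# `sort | uniq -c | sort -g`.
def count_by_field(text, field)
  text.each_line
      .map { |line| line.split[field - 1] }
      .compact
      .tally
      .sort_by { |_, count| count }
end

# Equivalent of grep: keep only lines mentioning a location.
def lines_at(text, location)
  text.each_line.select { |line| line.include?(location) }.join
end

top_location, top_count = count_by_field(SAMPLE, 3).last
types_at_top = count_by_field(lines_at(SAMPLE, top_location), 5)
```

For a 1.5 million line dump, reading the file with File.foreach rather than loading it as one string would likely be the better choice.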

version 1
• it works!
• but...
  • must patch and rebuild ruby binary
  • no information about references between objects
  • limited analysis via shell scripting

version 2 goals
• better data format
  • simple: one line of text per object
  • expressive: include all details about object contents and references
  • easy to use: easy to generate from C code & easy to consume from various scripting languages

equanimity (flickr)

version 2 is memprof
• no patches to ruby necessary
  • gem install memprof
  • require 'memprof'
  • Memprof.dump_all("/tmp/app.json")
• C extension for MRI ruby VM
  http://github.com/ice799/memprof
• uses libyajl to dump out all ruby objects as json

strings
Memprof.dump{ "hello" + "world" }

{
  "_id": "0x19c610",
  "file": "file.rb",
  "line": 2,
  "type": "string",
  "class": "0x1ba7f0",
  "class_name": "String",
  "length": 10,
  "data": "helloworld"
}

• _id: memory address of object
• file, line: file and line where string was created
• class: address of the class “String”
• length, data: length and contents of this string instance

arrays
Memprof.dump{ [ 1, :b, 2.2, "d" ] }

{
  "_id": "0x19c5c0",
  "class": "0x1b0d18",
  "class_name": "Array",
  "length": 4,
  "data": [ 1, ":b", "0x19c750", "0x19c598" ]
}

• integers and symbols are stored in the array itself
• floats and strings are separate ruby objects

hashes
Memprof.dump{ { :a => 1, "b" => 2.2 } }

{
  "_id": "0x19c598",
  "type": "hash",
  "class": "0x1af170",
  "class_name": "Hash",
  "default": null,
  "length": 2,
  "data": [
    [ ":a", 1 ],
    [ "0xc728", "0xc750" ]
  ]
}

• default: no default proc
• data: hash entries as key/value pairs

classes
Memprof.dump{
  class Hello
    @@var=1
    Const=2
    def world() end
  end
}

{
  "_id": "0x19c408",
  "type": "class",
  "name": "Hello",
  "super": "0x1bfa48",
  "super_name": "Object",
  "ivars": {
    "@@var": 1,
    "Const": 2
  },
  "methods": {
    "world": "0x19c318"
  }
}

• super: superclass object reference
• ivars: class variables and constants are stored in the instance variable table
• methods: references to method objects
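
In these records, references to other objects show up as address strings such as "0x19c750". A rough Ruby sketch for pulling them out of one parsed record follows. The references helper and the address pattern are assumptions for illustration, not part of memprof. One limitation: a string object whose contents happen to look like an address would be counted as a reference.

```ruby
require 'json'

ADDRESS = /\A0x\h+\z/ # assumed shape of an address string

# The array record from the slide above.
ARRAY_RECORD = JSON.parse(<<~RECORD)
  { "_id": "0x19c5c0", "class": "0x1b0d18", "class_name": "Array",
    "length": 4, "data": [ 1, ":b", "0x19c750", "0x19c598" ] }
RECORD

# Collect every address-like string value in a record,
# skipping the record's own "_id".
def references(record)
  found = []
  walk = lambda do |value|
    case value
    when Hash   then value.each_value(&walk)
    when Array  then value.each(&walk)
    when String then found << value if value.match?(ADDRESS)
    end
  end
  walk.call(record.reject { |key, _| key == "_id" })
  found.uniq
end

array_refs = references(ARRAY_RECORD)
```

The integer 1 and the symbol ":b" are skipped because they live in the array itself, which matches the note on the arrays slide.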

version 2: memprof.com a web-based heap visualizer and leak analyzer

built on...

$ mongoimport -d memprof -c rails --file /tmp/app.json $ mongo memprof

let’s run some queries.

how many objects?

thaths (flickr)

how many objects?
> db.rails.count() 809816

• ruby scripts create a lot of objects
• usually not a problem, but...
• MRI has a naïve stop-the-world mark/sweep GC
• fewer objects = faster GC = better performance

what types of objects?

brettlider (flickr)

what types of objects?
> db.rails.distinct('type')
['array', 'bignum', 'class', 'float', 'hash', 'module', 'node', 'object', 'regexp', 'string', ...]

mongodb: distinct
• distinct('type')
  list of types of objects
• distinct('file')
  list of source files
• distinct('class_name')
  list of instance class names
• optionally filter first
  distinct('name', {type:"class"})
  names of all defined classes

improve performance
with indexes
> db.rails.ensureIndex({'type':1})
> db.rails.ensureIndex(
    {'file':1},
    {background:true}
  )

mongodb: ensureIndex
• add an index on a field (if it doesn’t exist yet)
• improve performance of queries against common fields: type, class_name, super, file
• can index embedded field names
  ensureIndex('methods.add')
  find({'methods.add':{$exists:true}})
  find classes that define the method add

how many objs per type?

darrenhester (flickr)

how many objs per type?
> db.rails.group({
    initial: {count:0},
    key: {type:true},                           // group on type
    cond: {},
    reduce: function(obj, out) { out.count++ }  // increment count for each obj
  }).sort(function(a,b) {                       // sort results
    return a.count - b.count
  })

how many objs per type?
[ ...,
  {type: 'array', count: 7621},
  {type: 'string', count: 69139},
  {type: 'node', count: 365285}
]

lots of nodes
• nodes represent ruby code
• stored like any other ruby object
• makes ruby completely dynamic

mongodb: group
• cond: query to filter objects before grouping
• key: field(s) to group on
• initial: initial values for each group’s results
• reduce: aggregation function

mongodb: group
• by type or class
  • key: {type:1}
  • key: {class_name:1}
• by file & line
  • key: {file:1, line:1}
• by type in a specific file
  • cond: {file: "app.rb"},
    key: {file:1, line:1}
• by length of strings in a specific file
  • cond: {file:"app.rb",type:'string'},
    key: {length:1}
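
To make the four group arguments concrete, here is a rough Ruby equivalent that runs over an in-memory list instead of a collection. The group method and the dump lines are hypothetical stand-ins written for this example. They are not the MongoDB API, and cond only supports exact matches.

```ruby
require 'json'

# Made-up dump lines in the memprof JSON format (one object per line).
LINES = <<~DUMP
  {"_id":"0x1","type":"string","file":"app.rb","line":2,"length":5}
  {"_id":"0x2","type":"string","file":"app.rb","line":7,"length":2}
  {"_id":"0x3","type":"string","file":"lib.rb","line":1,"length":2}
  {"_id":"0x4","type":"array","file":"app.rb","line":7,"length":0}
  {"_id":"0x5","type":"node","file":"lib.rb","line":1}
  {"_id":"0x6","type":"node","file":"lib.rb","line":4}
DUMP

OBJECTS = LINES.each_line.map { |line| JSON.parse(line) }.freeze

# cond:    field => required value, applied before grouping
# key:     list of fields to group on
# initial: starting values for each group's result
# block:   the reduce step, called with (obj, out)
def group(objects, key:, cond: {}, initial: {})
  groups = {}
  objects.each do |obj|
    next unless cond.all? { |field, value| obj[field] == value }
    group_key = obj.values_at(*key)
    out = (groups[group_key] ||= key.zip(group_key).to_h.merge(initial))
    yield obj, out
  end
  groups.values
end

per_type = group(OBJECTS, key: ["type"], initial: { "count" => 0 }) { |_, out| out["count"] += 1 }
             .sort_by { |g| g["count"] }

types_in_app = group(OBJECTS, key: ["type"], cond: { "file" => "app.rb" },
                     initial: { "count" => 0 }) { |_, out| out["count"] += 1 }
```

The second call groups on type with a file filter, which is one way to read "by type in a specific file".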

what subclasses String?

davestfu (flickr)

what subclasses String?
> db.rails.find(
    {super_name:"String"},
    {name:1}               // select only name field
  )
{name: "ActiveSupport::SafeBuffer"}
{name: "ActiveSupport::StringInquirer"}
{name: "SQLite3::Blob"}
{name: "ActiveModel::Name"}
{name: "Arel::Attribute::Expressions"}
{name: "ActiveSupport::JSON::Variable"}

mongodb: find
• find({type:'string'})
  all strings
• find({type:{$ne:'string'}})
  everything except strings
• find({type:'string'}, {data:1})
  only select string’s data field

the largest objects?

http://body.builder.hu/imagebank/pictures/1088273777.jpg

the largest objects?
> db.rails.find(
    {type: {$in:['string','array','hash']} },
    {type:1,length:1}
  ).sort({length:-1}).limit(3)
{type: "string", length: 2308}
{type: "string", length: 1454}
{type: "string", length: 1238}

mongodb: sort, limit/skip
• sort({length:-1,file:1})
  sort by length desc, file asc
• limit(10)
  first 10 results
• skip(10).limit(10)
  second 10 results
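
For comparison, the same sort and paging semantics on a plain Ruby array. The records, the page size of 2, and the helper names are all made up for the example.

```ruby
# Made-up records with the two fields used in the sort above.
OBJECTS = [
  { "length" => 10, "file" => "a.rb" },
  { "length" => 30, "file" => "b.rb" },
  { "length" => 30, "file" => "a.rb" },
  { "length" => 5,  "file" => "c.rb" },
  { "length" => 20, "file" => "a.rb" },
].freeze

# Like sort({length:-1, file:1}): length descending, then file ascending.
def sorted(objects)
  objects.sort_by { |o| [-o["length"], o["file"]] }
end

# Like skip(n).limit(m) on an already sorted list.
def page(objects, skip:, limit:)
  objects.drop(skip).first(limit)
end

first_two  = page(sorted(OBJECTS), skip: 0, limit: 2)
second_two = page(sorted(OBJECTS), skip: 2, limit: 2)
```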

when were objs created?

zoutedrop (flickr)

when were objs created?
• useful to look at objects over time
• each obj has a timestamp of when it was created
• find minimum time, call it start_time
• create buckets for every minute of execution since start
• place objects into buckets

when were objs created?
> db.rails.mapReduce(function(){
    var secs = this.time - start_time;
    var mins_since_start = Math.floor(secs / 60);
    emit(mins_since_start, 1);
  }, function(key, vals){
    for(var i=0,sum=0; i<vals.length; sum += vals[i++]);
    return sum;
  }, {
    scope: {
      // start_time = min(time)
      start_time: db.rails.find().sort({time:1}).limit(1)[0].time
    }
  })
{result:"tmp.mr_1272615772_3"}

mongodb: mapReduce
• arguments
  • map: function that emits one or more key/value pairs given each object
  • reduce: function to return aggregate result, given key and list of values
  • scope: global variables to set for funcs
• results
  • stored in a temporary collection (tmp.mr_1272615772_3)

when were objs created?
> db.tmp.mr_1272615772_3.count()
12
script was running for 12 minutes

> db.tmp.mr_1272615772_3.find().sort({value:-1}).limit(1)
{_id: 8, value: 41231}
41k objects created 8 minutes after start
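
The bucketing can be checked by hand on a tiny data set. This Ruby sketch assumes timestamps are whole seconds, which may not match memprof's real units. The sample data and the objects_per_minute helper are made up for the example.

```ruby
# Made-up creation timestamps, assumed to be in seconds.
TIMED_OBJECTS = [1000, 1010, 1059, 1060, 1125, 1130, 1139, 1170, 1500]
                  .each_with_index
                  .map { |time, i| { "_id" => "0x#{i}", "time" => time } }
                  .freeze

# Bucket objects by whole minutes since the earliest timestamp.
# Returns { minute => object count }.
def objects_per_minute(objects)
  start_time = objects.map { |o| o["time"] }.min
  objects
    .group_by { |o| (o["time"] - start_time) / 60 } # integer division
    .transform_values(&:size)
end

per_minute = objects_per_minute(TIMED_OBJECTS)
busiest_minute, created = per_minute.max_by { |_, count| count }
```

Note the division by 60: taking the remainder instead would give the second within the minute, not the minute.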

references to this object?

jeffsmallwood (flickr)

references to this object?
ary = ["a","b","c"]
ary references “a”
“b” referenced by ary

• ruby makes it easy to “leak” references
• an object will stay around until all references to it are gone
• more objects = longer GC = bad performance
• must find references to fix leaks

references to this object?
• db.rails_refs.insert({ _id:"0xary", refs:["0xa","0xb","0xc"] })
  create references lookup table
• db.rails_refs.ensureIndex({refs:1})
  add ‘multikey’ index to refs array
• db.rails_refs.find({refs:"0xa"})
  efficiently lookup all objs holding a ref to 0xa

mongodb: multikeys
• indexes on array values create a ‘multikey’ index
• classic example: nested array of tags
  find({tags: "ruby"})
  find objs where obj.tags includes “ruby”
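
The idea behind the lookup can be sketched in plain Ruby as an inverted table: every value in each refs array points back to the object holding it. This is only an analogy for what the index makes fast, not how MongoDB stores multikey indexes. The table contents beyond "0xary" and the build_referrers helper are made up.

```ruby
# Hypothetical references table: object id => ids it references.
REFS_TABLE = {
  "0xary"  => ["0xa", "0xb", "0xc"],
  "0xhash" => ["0xa"],
  "0xa"    => [],
}.freeze

# Invert the table once so "who holds a ref to X?" is a single
# hash lookup instead of a scan over every refs array.
def build_referrers(refs_table)
  referrers = Hash.new { |hash, id| hash[id] = [] }
  refs_table.each do |holder, refs|
    refs.uniq.each { |id| referrers[id] << holder }
  end
  referrers
end

referrers    = build_referrers(REFS_TABLE)
holders_of_a = referrers["0xa"]
```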

version 2: memprof.com a web-based heap visualizer and leak analyzer

plugging a leak in rails3
• in dev mode, rails3 is leaking 10mb per request

let’s use memprof to find it!
# in environment.rb
require `gem which memprof/signal`.strip

plugging a leak in rails3
send the app some requests so it leaks
$ ab -c 1 -n 30 http://localhost:3000/

tell memprof to dump out the entire heap to json
$ memprof --pid <pid> --name <dump name> --key <api key>

2519 classes
30 copies of TestController
mongo query for all TestController classes

details for one copy of TestController

find references to object

“leak” is on line 178

holding references to all controllers

• In development mode, Rails reloads all your application code on every request
• ActionView::Partials::PartialRenderer is caching partials used by each controller as an optimization
• But.. it ends up holding a reference to every single reloaded version of those controllers

Questions?
Aman Gupta @tmm1
