Garbage Collection and the Ruby Heap
Joe Damato and Aman Gupta
@joedamato @tmm1
About Joe Damato
CMU/VMWare alum
memprof, ltrace-libdl, performance
improvements to REE
http://timetobleed.com
@joedamato
About Aman Gupta
San Francisco, CA
Ruby Hero 2009
EventMachine, amqp, REE, sinbook,
perftools.rb, gdb.rb
github.com/tmm1
@tmm1
Why Garbage Collection?
We use Ruby because it’s simple and elegant
the GC is designed to make your life easier
how is it easier? no more:
memory management
memory leaks

not convinced? let’s look at some C code...
C code vs Ruby code

C code:
    void func() {
        char *stack = "hello";
        char *heap = malloc(6);
        strncpy(heap, "world", 5);
        free(heap);
    }

    memory explicitly allocated on either the stack or the heap
    local variables usually live on the stack
    heap allocated memory must be free()d or it will leak

Ruby code:
    def func
      local = "hello"
      @instance = "world"
    end

    no concept of stack allocated variables
    even local variables live on the heap
    no way to explicitly free memory
Recap: Stack vs Heap

(animated diagram: bytes on stack vs bytes on heap, one step per slide)

1. func1() is called: 4 bytes on the stack
       void *data;
       func2();
2. func2() is called: 4 more bytes on the stack
       char *string = func3();
       free(string);
3. func3() is called: 12 more bytes on the stack
       char buffer[8];
       char *string = malloc(10);
       return string;
4. malloc(10) in func3() puts 10 bytes on the heap
5. func3() returns: its 12 bytes leave the stack, the 10 bytes stay on the heap
6. func2() calls free(string) and returns: the 10 bytes leave the heap, its 4 bytes leave the stack
7. only func1()'s 4 bytes are left on the stack
Ruby Objects (in MRI)
always allocated on the heap (even local variables)
fixed size structure
sizeof(struct RVALUE) = 40 bytes
“allocated” in gc.c’s rb_newobj()
“freed” in gc.c’s add_freelist() via garbage_collect()

let’s look at some code...
rb_newobj creates a new ruby object

VALUE
rb_newobj()
{
    VALUE obj;

    if (during_gc)
        rb_bug("allocation during GC");

    if (!freelist)                      /* force GC if freelist is empty */
        garbage_collect();

    obj = (VALUE)freelist;              /* pull object off the freelist */
    freelist = freelist->as.free.next;
    MEMZERO((void*)obj, RVALUE, 1);
    return obj;                         /* return new object */
}
add_freelist frees an existing object

static inline void
add_freelist(p)
    RVALUE *p;
{
    p->as.free.flags = 0;
    p->as.free.next = freelist;         /* add object to top of freelist */
    freelist = p;
}
The Ruby heap
The Ruby heap sort of resembles a slab allocator
Ruby allocates a slab by calling malloc
This space is carved up into fixed size slots for holding
Ruby objects
You can get an unused object from the Ruby heap by
calling rb_newobj
If there are no objects available, GC is run
If there are still no objects available, another slab is
created
Heaps on top of heaps
The Freelist
rb_newobj() tries to pull a free slot off the freelist
the freelist is a linked list across slots on the ruby heap
if the freelist is empty, GC is run
GC finds non-reachable objects and adds them to the freelist
if the freelist is still empty (all slots were in use), another heap is allocated
all the slots on the new heap are added to the freelist
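The allocation path described above can be sketched as a toy model in Ruby. This is only an illustration, not MRI's gc.c: the class name ToyHeap, the slab size, and the `reachable` list supplied by the caller (standing in for a real mark phase) are all assumptions.

```ruby
# Toy model of the freelist: slots are plain integers, slabs are arrays of slots.
# ToyHeap, slab_size and the caller-supplied `reachable` list are hypothetical.
class ToyHeap
  attr_reader :slabs, :freelist

  def initialize(slab_size)
    @slab_size = slab_size
    @slabs = []        # each slab is an array of slot ids
    @freelist = []     # stack of free slot ids (top = last element)
    @live = []         # slots currently handed out
    @next_id = 0
    add_slab
  end

  # Mirrors rb_newobj: run GC if the freelist is empty, grow if it is still empty.
  def new_object(reachable)
    garbage_collect(reachable) if @freelist.empty?
    add_slab if @freelist.empty?
    slot = @freelist.pop
    @live << slot
    slot
  end

  private

  # Add every slot of a fresh slab to the freelist.
  def add_slab
    slab = Array.new(@slab_size) { @next_id += 1 }
    @slabs << slab
    slab.each { |slot| add_freelist(slot) }
  end

  # Mirrors add_freelist: push the slot on top of the freelist.
  def add_freelist(slot)
    @freelist.push(slot)
  end

  # Stand-in for mark and sweep: anything not in `reachable` is freed.
  def garbage_collect(reachable)
    dead = @live - reachable
    @live -= dead
    dead.each { |slot| add_freelist(slot) }
  end
end

heap = ToyHeap.new(4)
kept = Array.new(4) { heap.new_object([]) }   # fills the first slab
heap.new_object(kept)                          # GC frees nothing, so a second slab is added
slabs_after_growth = heap.slabs.size

heap2 = ToyHeap.new(4)
4.times { heap2.new_object([]) }
heap2.new_object([])                           # GC reclaims all four slots, no new slab
slabs_after_reuse = heap2.slabs.size
```

The two runs differ only in what is still reachable when the freelist runs dry: holding on to every object forces a new slab, dropping them lets the same slab be reused.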
but what’s inside these slots...?
MRI Heap slots are RVALUEs

can be one of many different types of ruby objects (uses a C union)
union is called as, so you can do obj->as.string
union contains free section for unused slots
obj->as.free.next points to the next free slot for the freelist

typedef struct RVALUE {
    union {
        struct {
            unsigned long flags;
            struct RVALUE *next;
        } free;
        struct RBasic  basic;
        struct RObject object;
        struct RClass  klass;
        struct RFloat  flonum;
        struct RString string;
        struct RArray  array;
        struct RRegexp regexp;
        struct RHash   hash;
        struct RData   data;
        struct RStruct rstruct;
        struct RBignum bignum;
        struct RFile   file;
        struct RNode   node;
        struct RMatch  match;
        struct RVarmap varmap;
        struct SCOPE   scope;
    } as;
} RVALUE;
RBasic is a basic ruby object

struct RBasic {
    unsigned long flags;
    VALUE klass;
};

all objects have flags
flags == 0 means unused slot
flags contains information about the type of object (T_STRING, T_FLOAT, etc)

#define T_NONE   0x00
#define T_NIL    0x01
#define T_OBJECT 0x02
#define T_CLASS  0x03
#define T_ICLASS 0x04
#define T_MODULE 0x05
#define T_FLOAT  0x06
#define T_STRING 0x07
#define T_REGEXP 0x08
#define T_ARRAY  0x09
#define T_FIXNUM 0x0a
#define T_HASH   0x0b
#define T_STRUCT 0x0c
#define T_BIGNUM 0x0d
#define T_FILE   0x0e

#define T_TRUE   0x20
#define T_FALSE  0x21
#define T_DATA   0x22
#define T_MATCH  0x23
#define T_SYMBOL 0x24

#define T_BLKTAG 0x3b
#define T_UNDEF  0x3c
#define T_VARMAP 0x3d
#define T_SCOPE  0x3e
#define T_NODE   0x3f
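How a type tag might be read back out of flags can be sketched in Ruby. The tag values are taken from the #define list; the mask of 0x3f is an assumption based on the fact that every listed tag fits in the low 6 bits, and the helper name slot_type and the extra flag bit are made up for illustration.

```ruby
# Type tags copied from the #define list (subset).
T_NONE   = 0x00
T_FLOAT  = 0x06
T_STRING = 0x07
T_NODE   = 0x3f

# Assumption: the type lives in the low 6 bits, since every tag listed fits in 0x3f.
T_MASK = 0x3f

# Returns nil for an unused slot (flags == 0), otherwise the type tag.
def slot_type(flags)
  return nil if flags.zero?
  flags & T_MASK
end

# Hypothetical flags word: a string with one extra flag bit set above the type bits.
string_flags = T_STRING | (1 << 8)

puts slot_type(string_flags) == T_STRING
```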
RString is for String

struct RString {
    struct RBasic basic;
    long len;
    char *ptr;
    union {
        long capa;
        VALUE shared;
    } aux;
};

if obj->as.basic.flags contains T_STRING, you can interpret the slot as a RString
RString “extends” RBasic by including it and adding additional fields
slot for ruby object is fixed width, but obj->as.string.ptr points to variable sized memory on the heap holding the actual string data
strings can also point to another obj->as.string.aux.shared object instead of making a copy of the string data

10.times{"abc"} will use up 10 slots on the ruby heap, but they’ll all point to the same string “abc” on the heap
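The "10 slots" half of that claim can be observed from Ruby by comparing object ids. This sketch only shows that each evaluation of the literal yields a distinct object. Whether the underlying bytes are shared is an interpreter detail that plain Ruby cannot see and that may differ between versions. It assumes frozen string literals are not enabled for the file.

```ruby
# Each evaluation of the literal "abc" allocates a new String object (a new heap slot).
# Assumes frozen string literals are NOT enabled for this file.
strings = Array.new(10) { "abc" }

distinct_objects = strings.map(&:object_id).uniq.size   # one per evaluation
distinct_values  = strings.uniq.size                    # but only one distinct value

puts "#{distinct_objects} objects, #{distinct_values} distinct value"
```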
RClass is for Class/Module

struct RClass {
    struct RBasic basic;
    struct st_table *iv_tbl;
    struct st_table *m_tbl;
    VALUE super;
};

modules are just classes as far as MRI is concerned
classes contain a m_tbl
    contains pointers to method bodies
has a super class which is used in method lookup
also contain an iv_tbl
    actually holds instance vars, class vars and constants
RNode is for your code

ruby code is stored on the heap like any other object
allows code to be dynamically added and removed at runtime
MRI has over 130 different types of nodes
    including a NODE_NEWLINE for newline or semicolon in your codebase
nodes point to other objects created during code parse
    a literal in your code creates a NODE_LIT that points to a RString/RFloat/RRegexp
    strings are special: new slot with shared pointer used every time a string is evaluated
    floats/regexp/etc are created only once upfront during parse and reused during evaluation

enum node_type {
    NODE_METHOD,
    NODE_FBODY,
    NODE_CFUNC,
    NODE_SCOPE,
    NODE_BLOCK,
    NODE_IF,
    NODE_CASE,
    NODE_WHEN,
    NODE_WHILE,
    NODE_UNTIL,
    NODE_ITER,
    NODE_FOR,
    NODE_BREAK,
    NODE_NEXT,
    NODE_HASH,
    NODE_RETURN,
    NODE_STR,
    NODE_SPLAT,
    NODE_TO_ARY,
    NODE_CLASS,
    NODE_MODULE,
    NODE_SELF,
    NODE_NIL,
    NODE_TRUE,
    NODE_FALSE,
    NODE_DEFINED,
    NODE_NEWLINE,
    ...
};
And many more...
For details about hashes, arrays, floats, blocks,
fixnums, symbols and many other types of ruby objects
in MRI, see:
http://timetobleed.com/what-is-a-ruby-object-introducing-memprof-dump/

...so how are all these objects garbage collected?
Finding Garbage

MRI uses a conservative, stop the world, mark and sweep garbage collector.
conservative
MRI has a conservative GC
Raw pointers are handed to C
extensions
When scanning the Ruby
process stack it must assume
that anything that looks like a
pointer to a Ruby object is a
pointer to a Ruby object
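What "looks like a pointer" means can be sketched with a toy stack scan in Ruby. All addresses, the heap bounds, and the helper name are made up for illustration. Only the 40-byte slot size comes from the slides. A real collector scans raw machine memory, which plain Ruby cannot do.

```ruby
# Toy conservative stack scan. All addresses and names here are made up.
SLOT_SIZE  = 40                                   # sizeof(struct RVALUE) from the slides
HEAP_START = 0x1000
HEAP_SLOTS = 8
HEAP_END   = HEAP_START + SLOT_SIZE * HEAP_SLOTS  # exclusive upper bound

# A word "looks like" a pointer to a Ruby object if it falls inside the heap
# and lands exactly on a slot boundary.
def looks_like_object_pointer?(word)
  word >= HEAP_START && word < HEAP_END && ((word - HEAP_START) % SLOT_SIZE).zero?
end

# The "stack" is just raw machine words: integers, return addresses, real pointers...
stack_words = [42, 0x1000, 0x1028, 0x1029, 0xdeadbeef]

# Every matching word is treated as a root, even if it is really just an integer.
roots = stack_words.select { |w| looks_like_object_pointer?(w) }
```

The cost of being conservative is visible here: an integer that happens to equal 0x1028 keeps that slot alive, because the collector cannot tell it apart from a real pointer.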
stop the world

MRI’s GC stops the world
MRI uses a “big hammer” to
put the system into a quiescent
state
No Ruby code can run while
GC is running
mark and sweep

MRI’s GC is a naïve mark and
sweep collector
The collection cycle is broken up
into a mark phase and a sweep
phase
All objects still in use are marked
Any unmarked objects are swept
away
The mark phase
During the mark phase, the garbage collector walks the
entire object graph
starts at the root objects:
global variables, top level constants, threads, etc
follows all references recursively
Since raw pointers are handed out, GC needs to
examine everything:
CPU registers, program stack, and thread stacks
The sweep phase
The sweep phase is pretty simple
Walk the Ruby heap and add unmarked objects to the
freelist
Reset the mark flag for the next GC run
Must iterate over every slot on each slab of the heap
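The two phases can be sketched over a toy object graph in Ruby. This is a minimal illustration, not MRI's implementation: the Slot struct, the explicit roots array, and the recursive mark function are assumptions.

```ruby
# Toy mark and sweep over a small object graph. Names are hypothetical.
Slot = Struct.new(:name, :refs, :marked)

# Mark phase: follow references recursively from a root.
def mark(slot)
  return if slot.marked          # already visited; also stops cycles
  slot.marked = true
  slot.refs.each { |child| mark(child) }
end

# Sweep phase: visit every slot, collect the unmarked ones,
# and reset the mark flag on survivors for the next GC run.
def sweep(heap)
  freed = []
  heap.each do |slot|
    if slot.marked
      slot.marked = false
    else
      freed << slot.name
    end
  end
  freed
end

a = Slot.new(:a, [], false)
b = Slot.new(:b, [a], false)
c = Slot.new(:c, [], false)
d = Slot.new(:d, [c], false)
c.refs << d                      # c and d reference each other, but no root reaches them

heap  = [a, b, c, d]
roots = [b]

roots.each { |r| mark(r) }
freed = sweep(heap)
```

Note that the cycle between c and d is collected without any special handling, which is one thing mark and sweep gets for free compared with reference counting.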
MRI’s GC tradeoffs
MRI’s GC is very simple
The implementation is relatively
short and straightforward
however, the simple design of
the system makes more
advanced GC techniques
difficult or impossible to
implement without breaking
compatibility with C extensions
Alternative Approaches

Memory Management
    explicit (malloc/free)
    reference counting (python/perl/php)

GC Algorithms
    precise
    incremental / concurrent
    tri-color
    generational
    copying
Tri-Color GC
Tri-Color
Put objects into 3 different groups (colors)
Objects are moved from group to group as they are
scanned by the GC
GC can free the recyclable group
Avoids walking the entire object tree and Ruby heap
each GC cycle.
Moving/Copying GC

Used in conjunction with tri-color
Moving GC relocates reachable objects
Once an entire memory region (a slab) has no reachable
objects left, the entire region is freed
Makes other lower level optimizations possible
But MRI can’t move objects
To get the full benefit of tri-color, objects need to be
moved
MRI can’t move objects because objects are raw C
pointers
Can use write barriers, but not without either:
breaking binary compatibility
being really slow
Generational GC
Generational GC algorithms split objects into groups
based on their age
The key axiom of this algorithm is:
Freshly hatched objects are more likely to become
garbage than older objects that have been around
for a while
Any of the younger objects that are referenced by an
older object can get promoted to an older group
The younger group can then be destroyed
MRI can fake Generational
The “long life GC” patch attempts to do something
similar
“long life GC” moves RNode objects onto a separate
heap so they are not scanned in each mark and free
cycle
makes a big difference since code is a large part of
the Ruby heap
still can’t take full advantage because it can’t move
objects between generations

...fixing the GC is hard, can we make mark/sweep faster?
Tuning the GC
Ruby Enterprise Edition contains a GC tuning patch

We use:
RUBY_GC_MALLOC_LIMIT=60000000
RUBY_HEAP_MIN_SLOTS=500000
RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
RUBY_HEAP_SLOTS_INCREMENT=1
malloc_limit = 60MB

force garbage collection after every N bytes worth of calls to malloc or realloc
defaults to 8MB
high traffic ruby servers can easily allocate and free more than 8mb in a single request
gc.c’s ruby_xmalloc wrapper used by internal objects such as String, Array and Hash

void *
ruby_xmalloc(size)
    long size;
{
    void *mem;
    if (malloced > malloc_limit)
        garbage_collect();

    mem = malloc(size);
    malloced += size;

    return mem;
}
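The effect of raising malloc_limit can be sketched with a toy counter in Ruby. The class and method names are hypothetical, and resetting the counter after a collection is an assumption that the simplified ruby_xmalloc on the slide does not show.

```ruby
# Toy model of the malloc_limit trigger. Class and method names are hypothetical.
class MallocCounter
  attr_reader :gc_runs

  def initialize(malloc_limit)
    @malloc_limit = malloc_limit
    @malloced = 0
    @gc_runs = 0
  end

  # Mirrors the slide's ruby_xmalloc: check the counter, then account for the request.
  def xmalloc(size)
    garbage_collect if @malloced > @malloc_limit
    @malloced += size
  end

  private

  # Assumption: the counter is reset once a collection has run.
  def garbage_collect
    @gc_runs += 1
    @malloced = 0
  end
end

MB = 1024 * 1024
request = Array.new(30) { 1 * MB }          # one request allocating 30MB in 1MB chunks

default = MallocCounter.new(8 * MB)         # the 8MB default
tuned   = MallocCounter.new(60_000_000)     # RUBY_GC_MALLOC_LIMIT=60000000

request.each do |size|
  default.xmalloc(size)
  tuned.xmalloc(size)
end

puts "default: #{default.gc_runs} GC runs, tuned: #{tuned.gc_runs} GC runs"
```

Under these assumptions the default limit forces several collections within a single request, while the tuned limit lets the whole request through without one.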
HEAP_MIN_SLOTS = 500k

defaults to 10k
number of slots in the first slab
a new rails app boots up with almost 500k objects on the heap (mostly nodes)

(gdb) ruby objects nodes
     20996 NODE_CONST
     21620 NODE_SCOPE
     26329 NODE_LASGN
     26747 NODE_STR
     33178 NODE_METHOD
     40678 NODE_LIT
     79046 NODE_LVAR
     90646 NODE_NEWLINE
     95758 NODE_BLOCK
    107357 NODE_CALL
    150298 NODE_ARRAY
HEAP_SLOTS_GROWTH = 1

defaults to 1.8x
each new slab is almost twice as big as the last

normal growth:
    10k
    10k + 18k = 28k
    10k + 18k + 32.4k = 60.4k

tuned growth:
    500k
    500k + 500k = 1M
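The growth arithmetic can be worked through in a few lines of Ruby. The helper name slab_sizes is hypothetical, and truncating each slab to a whole number of slots is an assumption. With an exact 1.8x factor the third slab comes to 32,400 slots, for a running total of 60,400.

```ruby
# Slab sizes under a growth factor. Rational avoids floating point rounding.
# slab_sizes is a hypothetical helper; truncation to whole slots is an assumption.
def slab_sizes(first_slab, growth_factor, count)
  sizes = [first_slab]
  (count - 1).times { sizes << (sizes.last * growth_factor).to_i }
  sizes
end

default_slabs = slab_sizes(10_000, Rational(18, 10), 3)   # HEAP_MIN_SLOTS=10k, factor 1.8
tuned_slabs   = slab_sizes(500_000, 1, 2)                 # HEAP_MIN_SLOTS=500k, factor 1

default_total = default_slabs.sum
tuned_total   = tuned_slabs.sum

puts "default: #{default_slabs.inspect} => #{default_total} slots"
puts "tuned:   #{tuned_slabs.inspect} => #{tuned_total} slots"
```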

...do I need to tune my GC?
Measuring GC performance
You can use ltrace to measure (among other things) GC
performance
The system’s ltrace will work, but the output is noisy
git://github.com/ice799/ltrace.git
flags to quiet the output
libdl support
backtrace support
and more.
Measuring GC performance
ltrace -F ltrace.conf -ttTg -x garbage_collect ruby gc.rb

15:39:22.637185 garbage_collect() = <void> <0.002420>
15:39:22.650797 garbage_collect() = <void> <0.005480>
15:39:22.677607 garbage_collect() = <void> <0.012134>
15:39:22.729645 garbage_collect() = <void> <0.024849>
15:39:22.828402 garbage_collect() = <void> <0.048067>
15:39:23.007304 garbage_collect() = <void> <0.089344>
15:39:23.339801 garbage_collect() = <void> <0.163595>
15:39:23.929944 garbage_collect() = <void> <0.297686>

GC can get pretty slow, even after tuning...
...so let’s reduce the # of objects to mark and sweep
Ruby memory leaks
Not your classic memory leak
classic memory leak: call malloc, but never call free

These are reference leaks
object A holds references to objects B and C
the result is objects B and C (and their data) are never freed

As long as anyone is holding a reference to an object, that object
can not be freed
This dependency recurses
The leaked reference may be an object holding refs to other
objects, which hold references to other objects, which hold ...
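A reference leak of this kind can be reproduced in a few lines of Ruby. The class Request and the constant REGISTRY are hypothetical names. The sketch only shows that objects reachable from a long-lived structure survive a GC run.

```ruby
# A reference leak: nothing is "lost", a long-lived structure just keeps pointing at it.
# Request and REGISTRY are hypothetical names for illustration.
class Request
  def initialize
    @payload = "x" * 1024        # data kept alive along with the request
  end
end

REGISTRY = []                    # long-lived: reachable for the life of the process

50.times { REGISTRY << Request.new }

GC.start
leaked = ObjectSpace.each_object(Request).count   # still reachable, so never freed

puts "#{leaked} Request objects survived GC"

REGISTRY.clear                   # dropping the reference is what makes them collectable
```

Each Request also keeps its @payload string alive, which is the recursion described above: one leaked reference pins everything reachable from it.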
Ruby reference leaks
As long as someone,
somewhere is holding a
reference to this instance of
classA, all the objects in
this picture can not be
freed
This could add up to a lot
of memory very fast
How can we track down
these reference leaks?
gdb.rb: gdb hooks for REE
http://github.com/tmm1/gdb.rb

attach to a running REE process and inspect the heap
    number of nodes by type
    number of objects by class
    number of strings by content
    number of arrays/hash by size
uses gdb7 + python scripting
linux only

(gdb) ruby objects
    HEAPS        8
    SLOTS  1686252
    LIVE    893327 (52.98%)
    FREE    792925 (47.02%)

    scope     1641 (0.18%)
    regexp    2255 (0.25%)
    data      3539 (0.40%)
    class     3680 (0.41%)
    hash      6196 (0.69%)
    object    8785 (0.98%)
    array    13850 (1.55%)
    string  105350 (11.79%)
    node    742346 (83.10%)

(gdb) ruby objects strings
    140 u 'lib'
    158 u '0'
    294 u '\n'
    619 u ''

    30503 unique strings
    3187435 bytes
fixing a leak in rails_warden
(gdb) ruby objects classes
1197 MIME::Type
2657 NewRelic::MetricSpec
2719 TZInfo::TimezoneTransitionInfo
4124 Warden::Manager
4124 MethodOverrideForAll
4124 AccountMiddleware
4124 Rack::Cookies
4125 ActiveRecord::ConnectionAdapters::ConnectionManagement
4125 ActionController::Session::CookieStore
4125 ActionController::Failsafe
4125 ActionController::ParamsParser
4125 Rack::Lock
4125 ActionController::Dispatcher
4125 ActiveRecord::QueryCache
4125 ActiveSupport::MessageVerifier
4125 Rack::Head

middleware chain leaking per request
god memory leaks
(gdb) ruby objects classes
     43 God::Process
     43 God::Watch
     43 God::Driver
     43 God::DriverEventQueue
     43 God::Conditions::MemoryUsage
     43 God::Conditions::ProcessRunning
     43 God::Behaviors::CleanPidFile
     45 Process::Status
     86 God::Metric
    327 God::System::SlashProcPoller
    327 God::System::Process
    406 God::DriverEvent

(gdb) ruby objects arrays
    elements instances
       94310 3
       94311 3
       94314 2
       94316 1

    5369 arrays
    2863364 member elements

arrays with 94k+ elements!

useful, but you can’t tell where the objects came from...
bleak_house
191691 total objects
Final heap size 191691 filled, 220961 free
Displaying top 20 most common line/class pairs
89513 __null__:__null__:__node__
41438 __null__:__null__:String
2348 site_ruby/1.8/rubygems/specification.rb:557:Array
1508 gems/1.8/specifications/gettext-1.9.gemspec:14:String

http://github.com/fauna/bleak_house
installs a custom patched ruby
enables GC_DEBUG to track file/line in rb_newobj
increases size of RVALUE slots by 16 bytes
better than gdb.rb: you can see where the leaking
object was allocated
but, can’t run it in production without overhead
memprof
git://github.com/ice799/memprof.git

replacement for gdb.rb and bleak_house
requires no patches to the ruby VM
simply gem install and require 'memprof'

well, not yet; still a work in progress
mostly works on x86_64 linux
almost works on ruby 1.9
kind of works on osx
memprof under the hood
rewrites your Ruby binary in memory (while it’s running)
injects short trampolines for all calls to rb_newobj() and
add_freelist() to do tracking
uses libdwarf and libelf to access VM internals like the
ruby heap slabs
uses libyajl to dump out ruby objects as json

http://timetobleed.com/string-together-global-offset-tables-to-build-a-ruby-memory-profiler/
http://timetobleed.com/memprof-a-ruby-level-memory-profiler/
http://timetobleed.com/what-is-a-ruby-object-introducing-memprof-dump/
http://timetobleed.com/hot-patching-inlined-functions-with-x86_64-asm-metaprogramming/
http://timetobleed.com/rewrite-your-ruby-vm-at-runtime-to-hot-patch-useful-features/
plugging a leak in rails3
in dev mode, rails3 is leaking 10mb per request

let’s use memprof to find it!
# in environment.rb
require 'memprof'
Memprof.start
trap('USR2'){
  pid = Process.pid
  fork{
    # fork to prevent blocking the app
    Memprof.dump_all("#{pid}-#{Time.now.to_i}.json")
    exit!
  }
}
plugging a leak in rails3
send the app some requests so it leaks
$ ab -c 1 -n 50 http://localhost:3000/

tell memprof to dump out the entire heap to json
$ kill -USR2 3372

import the heap dump to mongodb
$ mongoimport -h localhost -d memprof --drop -c rails --file 3372-1266658113.json

connect to mongo
$ mongo localhost/memprof

count the number of objects
> db.rails.count()
809816
plugging a leak in rails3
find files with the most objects
> db.rails.group({ key:{file:true}, initial:{count:0},
reduce: function(d,o){ o.count++ } })

application_controller.rb is leaking.. let’s find that class
> db.rails.find({type:"class",name:"ApplicationController"}).count()
50

aha! one ApplicationController leaked per request
is it just ApplicationController?
> db.rails.find({type:"class",name:/Controller$/}).count()
250

nope!
plugging a leak in rails3
find one of the leaked controllers
> db.rails.findOne({type:"class",name:"AccountsController"})._id
0x3b56780

find out what’s referencing it
$ grep 0x3b56780 3372-1266658113.json

{"_id":"0x4a8e6d0","file":"actionpack-3.0.0.beta/lib/abstract_controller/localized_cache.rb","line":3,"type":"hash","length":21}

{"_id":"0x4c78540","file":"actionpack-3.0.0.beta/lib/action_controller/metal.rb","line":74,"type":"hash","length":21}

{"_id":"0x29be3b0","type":"hash","length":21}
plugging a leak in rails3
first two are leaks!
module ActionController
  class Metal < AbstractController::Base
    class ActionEndpoint
      @@endpoints = Hash.new {|h,k| h[k] = Hash.new {|sh,sk| sh[sk] = {} } }

module AbstractController
  class HashKey
    @hash_keys = Hash.new {|h,k| h[k] = Hash.new {|sh,sk| sh[sk] = {} } }

dev mode enables source reload, but globals holding
refs to old controllers!
figure out what the third leak is
$ grep 0x29be3b0 3372-1266658113.json
{"type":"class","name":"ActionView::Partials::PartialRenderer","ivars":{"PARTIAL_NAMES":"0x29be3b0"}}

module ActionView
  module Partials
    class PartialRenderer
      PARTIAL_NAMES = Hash.new {|h,k| h[k] = {} }
memprof
still a long and manual process, but memprof provides
all the data to make debugging memory issues
possible
coming soon: memprof.com
a web-based heap visualizer and leak analyzer
as a user, you simply:
gem install memprof
memprof MY_RAILS_APP_PID
visit http://memprof.com/c4e4d3eb0e18
see line numbers where your app is leaking
Questions?

Joe Damato Aman Gupta
@joedamato @tmm1
timetobleed.com github.com/tmm1