Descent into Darkness: Understanding your system’s binary interface is the only way out.

joe damato @joedamato timetobleed.com

About Joe Damato
• ex-vmware, cmu alumni
• memprof, ltrace libdl/libunwind patchset, ree/mri thread implementation rewrite
• http://timetobleed.com
• @joedamato

Only have 30 minutes...
welcome to flight school.

No clue why this was accepted.
This talk will have about 5 lines of Ruby code.

Before we get started

I need to introduce you to a good friend of mine...

This talk is about how being evil is totally awesome.

Don’t do any of this, ever.

The problem
• My ruby process is 700 megabytes. Why?

The problem
• It is very easy to leak references in your Ruby code.
• Leaking a reference to an object causes that object and all objects it references to stick around in memory.

The problem
• As long as someone, somewhere is holding a reference to this instance of classA, all the objects in this picture cannot be freed.
• This could add up to a lot of memory very fast.
• GC will scan each object every run to see if it is time to free the object.
• This could add up to a lot of CPU burned.

The problem
• But memory is cheap, who cares?
• Ruby’s GC is a naïve stop-the-world mark and sweep.
• The more objects that stick around in memory, the longer your GC runs take.
• The longer GC takes, the less time your app has to run Ruby code.

The problem
• Eliminate leaked references, reduce the length of your GC runs, run more of your Ruby application code.
• Cool. But how can you track down reference leaks?

Problem Requirements
• I don’t want to apply patches and rebuild Ruby.
• I want to gem install, require, and done.
• Anything else is too much work.

Luckily, we can turn to evil.

Verbiage
• amd64 is a CPU spec proposed by AMD as a way to add 64bit support to x86.
• Intel Architecture 64 (IA64) is a completely new 64bit instruction set.
• amd64 != IA64
• Intel then decided to adopt AMD’s 64bit spec.
• They did and called it IA-32e, EM64T, and finally Intel 64.
• Intel 64 ~= amd64

Verbiage
amd64 Intel64

Compilers generate code that uses the subset of the amd64 spec that both Intel and AMD comply with. That subset is usually called x86_64 or amd64.

WTF is an ABI?
• Application Binary Interface
“describes the low-level interface between a program and the operating system or another application.” (wikipedia)

WTF is an ABI?
• alignment
• calling conventions
• object file and library formats
• syscalls (how they work, where they live)

WTF is an ABI?
• System V ABI (271 pages)
• System V ABI AMD64 Architecture Processor Supplement (128 pages)
• System V ABI Intel386 Architecture Processor Supplement (377 pages)
• MIPS, ARM, PPC, and IA-64 too!

I brought copies of all three for everyone. We will now read them together.

No. But let’s blaze through the important pieces now.

Evil Devices
• nm - dump symbol table
• objdump - disassemble lots of different objects. Can do lots, lots more.
• readelf - dump information
• dwarfdump - dump debugging information

nm
% nm /usr/bin/ruby
000000000048ac90 t Balloc
0000000000491270 T Init_Array
0000000000497520 T Init_Bignum
000000000041dc80 T Init_Binding
000000000049d9b0 T Init_Comparable
000000000049de30 T Init_Dir
00000000004a1080 T Init_Enumerable
00000000004a3720 T Init_Enumerator
00000000004a4f30 T Init_Exception
000000000042c2d0 T Init_File
0000000000434b90 T Init_GC

(first column: symbol “value”; last column: symbol names)

objdump
% objdump -D /usr/bin/ruby

(screenshot of the output; callouts label the offsets, opcodes, instructions, and helpful metadata)

readelf
% readelf -a /usr/bin/ruby

This is a *tiny* subset of the data available

dwarfdump
% dwarfdump -a /usr/bin/ruby

Some friends
• Registers are important. They are small, fast pieces of memory on the CPU.
• Some registers have a specific job:
  • %rax - holds a return value
  • %rip - instruction pointer
• Can refer to pieces of registers.

%rax uncensored
%rax = 64 bits, 8 bytes, 1 quadword
%eax = 32 bits, 4 bytes, 1 dword (lower 32 bits of %rax)
%ax = 16 bits, 2 bytes, 1 word (lower 16 bits)
%ah = 8 bits, 1 byte, 1 halfword (upper 8 bits of %ax)
%al = 8 bits, 1 byte, 1 halfword (lower 8 bits)
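The register pieces above can be mimicked in plain C with masks and shifts. This is only a sketch: `reg_views` and `split_register` are made-up names for illustration, and the code works on an ordinary integer, not on the real %rax.

```c
#include <assert.h>
#include <stdint.h>

/* Views of a 64-bit value, named after the x86_64 sub-registers. */
struct reg_views {
    uint64_t rax; /* all 64 bits */
    uint32_t eax; /* lower 32 bits */
    uint16_t ax;  /* lower 16 bits */
    uint8_t  ah;  /* upper 8 bits of ax (bits 8-15) */
    uint8_t  al;  /* lower 8 bits */
};

/* Hypothetical helper: split a value the way the register names do. */
static struct reg_views split_register(uint64_t value)
{
    struct reg_views v;
    v.rax = value;
    v.eax = (uint32_t)(value & 0xffffffffu);
    v.ax  = (uint16_t)(value & 0xffffu);
    v.ah  = (uint8_t)((value >> 8) & 0xffu);
    v.al  = (uint8_t)(value & 0xffu);
    /* ah and al together make up ax */
    assert(v.ax == (uint16_t)((v.ah << 8) | v.al));
    return v;
}
```

For example, splitting 0x1122334455667788 gives eax = 0x55667788, ax = 0x7788, ah = 0x77 and al = 0x88.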

Some x86_64 asm notes
• Two different syntaxes: gas/att and intel.
• GDB disassembly is gas/att by default.
  • set disassembly-flavor intel
• objdump is gas/att by default
  • objdump -M intel
• I prefer gas/att.

Unless otherwise noted, everything will be in att/gas syntax.

Moving stuff
mov source, dest
mov $0,%rbx     # move immediate (0) to register
mov %rax,%rbx   # move rax into rbx (both operands must be the same size)

source and dest cannot both be memory.

Calling functions
• Lots of different ways to call functions.
• Two ways we care about (there are more):
callq *%rbx        # indirect absolute
callq 0xdeadbeef   # RIP relative with 32bit displacement

Calling convention (x86_64)
• function arguments from left to right live in: %rdi, %rsi, %rdx, %rcx, %r8, %r9
• that’s for INTEGER class items.
• Other stuff gets passed on the stack (like on i386).
• end of argument area must be aligned on a 16-byte boundary.
• registers can be caller or callee saved.

int again(int amount)
{
  int ret = 0;
  ret = amount + 150;
  return ret;
}

(the slides show the disassembly of this function in both intel and att/gas syntax; the annotations for each step follow)

• Save the old stack frame base pointer. Set the base pointer to the current stack pointer.
• *(rbp - 0x14) = amount; *(rbp - 0x4) = 0;
• eax = *(rbp - 0x14); eax = eax + 0x96; /* 0x96 = 150 :P */
• *(rbp - 0x4) = eax; /* not needed */ eax = *(rbp - 0x4); /* not needed */
• Restore the stack pointer and old base pointer. Return from the function.

ELF Objects
• ELF objects have headers
  • elf header (describes the elf object)
  • program headers (describes segments)
  • section headers (describes sections)
• memprof uses libelf to wander the elf object extracting information.
• the executable and each .so has its own set of data
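To make the elf header concrete, here is a rough sketch that pulls a few fields out of the first 64 bytes of a 64bit little endian ELF object. It reads raw bytes at the offsets defined by the ELF64 header layout and does not use libelf, which is what memprof uses; `elf_summary` and `summarize_elf` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A few fields from the 64-byte ELF64 header (little-endian objects only). */
struct elf_summary {
    int      is_elf64_le; /* 1 if the magic, class and byte order match */
    uint16_t type;        /* e_type: 2 = executable, 3 = shared object */
    uint16_t machine;     /* e_machine: 62 = x86_64 */
    uint16_t shnum;       /* e_shnum: number of section headers */
};

static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: returns an all-zero summary if buf is not ELF64 LE. */
static struct elf_summary summarize_elf(const unsigned char *buf, size_t len)
{
    struct elf_summary s = {0, 0, 0, 0};
    static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};

    assert(buf != NULL);
    if (len < 64 || memcmp(buf, magic, 4) != 0)
        return s;
    if (buf[4] != 2 || buf[5] != 1) /* EI_CLASS = 64bit, EI_DATA = little endian */
        return s;

    s.is_elf64_le = 1;
    s.type    = read_le16(buf + 16); /* e_type */
    s.machine = read_le16(buf + 18); /* e_machine */
    s.shnum   = read_le16(buf + 60); /* e_shnum */
    return s;
}
```

The section headers themselves (and names like .text or .plt) live further into the file; walking them is the part a library such as libelf takes care of.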

Sections that matter to memprof
• .text - code lives here
• .plt - stub code that helps to “resolve” absolute function addresses.
• .got.plt - absolute function addresses; used by .plt entries.

plt
• Procedure Linkage Table (plt) is used to find functions in shared libraries at runtime.
• Shared libraries are position independent and can be mapped anywhere in the address space.

Um, what does this have to do with Ruby?

The ingredients for evil
• we know the x86_64 ABI • we know how ELF objects work • we know ruby calls functions in the VM to
allocate and free objects (rb_newobj, add_freelist)

You won’t.
Let’s combine all of this knowledge and ... Rewrite the Ruby VM in memory while it is running.

Hook rb_newobj
• The Ruby VM calls rb_newobj to allocate a
new object.

• We’ll need to know when this happens so
we can track objects.

• Let’s scan the Ruby binary in memory and
rewrite all function calls to rb_newobj to call a handler function instead.

Hook rb_newobj
(objdump output)
412d16: e8 c1 36 02 00    callq 4363dc # <rb_newobj>
412d1b: .....

• 412d16 - address of this instruction
• e8 - call opcode*
• c1 36 02 00 - 32bit displacement to the target function from the next instruction.

Hook rb_newobj
(objdump output)
412d16: e8 c1 36 02 00    callq 4363dc # <rb_newobj>
412d1b: .....

(x86 is little endian)

412d1b + 000236c1 = 4363dc
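The arithmetic above can be written down as a few lines of C. This is a sketch that decodes the bytes from the objdump output in an ordinary buffer; `read_le32` and `call_target` are names made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Read 4 little-endian bytes as a signed 32-bit displacement. */
static int32_t read_le32(const unsigned char *p)
{
    uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)u;
}

/* Target of a 5-byte "call rel32" (opcode 0xe8) located at insn_addr. */
static uint64_t call_target(uint64_t insn_addr, const unsigned char insn[5])
{
    assert(insn[0] == 0xe8);
    uint64_t next_insn = insn_addr + 5; /* displacement is relative to this */
    return next_insn + (uint64_t)(int64_t)read_le32(insn + 1);
}
```

Feeding it the address 0x412d16 and the bytes e8 c1 36 02 00 yields 0x4363dc, the address of rb_newobj in the output above.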

Hook rb_newobj
Overwrite the displacement so that all calls to rb_newobj actually call a different function instead. It may look like this:

VALUE other_function()
{
  VALUE new_obj = rb_newobj();
  /* set up tracking of new_obj */
  return new_obj;
}
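Overwriting the displacement is the same arithmetic run backwards. The sketch below patches a copy of the instruction bytes in a plain buffer; `retarget_call` is a hypothetical helper, not memprof's code, and patching a live process would also require the page holding the code to be writable.

```c
#include <assert.h>
#include <stdint.h>

/* Rewrite a 5-byte "call rel32" at insn_addr so that it calls new_target. */
static void retarget_call(uint64_t insn_addr, unsigned char insn[5],
                          uint64_t new_target)
{
    assert(insn[0] == 0xe8);

    /* The displacement is measured from the instruction after the call. */
    int64_t disp = (int64_t)(new_target - (insn_addr + 5));
    assert(disp >= INT32_MIN && disp <= INT32_MAX); /* must fit in 32 bits */

    uint32_t u = (uint32_t)(int32_t)disp;
    insn[1] = (unsigned char)(u & 0xff);         /* little endian: low byte first */
    insn[2] = (unsigned char)((u >> 8) & 0xff);
    insn[3] = (unsigned char)((u >> 16) & 0xff);
    insn[4] = (unsigned char)((u >> 24) & 0xff);
}
```

The range check matters: a handler that lives more than 2GB away from the call site cannot be reached with a 32bit displacement.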

Doesn’t work for all
• That trick only works for Ruby built with --disable-shared (no libruby.so)
• Ruby built with --enable-shared (with libruby.so) doesn’t work like that.
• Code in libruby.so calls rb_newobj via the PLT.

How the plt works
• Initially, the .got.plt entry (0x7ffff7afd6e6) contains the address of the instruction after the jmp.
• An ID is stored and the rtld is invoked.
• rtld writes the address of rb_newobj (0x7ffff7b34ac0) to the .got.plt entry.
• Calls to the PLT entry jump immediately to rb_newobj now that .got.plt is filled in.
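The lazy lookup can be imitated with an ordinary function pointer. This is a simulation only: `got_entry`, `resolve_stub`, `real_function` and `call_via_plt` are invented stand-ins for the .got.plt slot, the rtld slow path, rb_newobj and the PLT entry. No real PLT or GOT is touched.

```c
#include <assert.h>

typedef int (*func_ptr)(void);

static int resolver_runs = 0;   /* how many times the fake rtld ran */
static int real_function(void); /* stands in for rb_newobj */
static int resolve_stub(void);  /* stands in for the PLT slow path */

/* Fake .got.plt slot: starts out pointing back at the slow path. */
static func_ptr got_entry = resolve_stub;

static int real_function(void)
{
    return 42;
}

/* Slow path: find the real function, fill in the slot, then call it. */
static int resolve_stub(void)
{
    resolver_runs++;
    got_entry = real_function;
    return got_entry();
}

/* Fake PLT entry: always an indirect jump through the slot. */
static int call_via_plt(void)
{
    assert(got_entry != 0);
    return got_entry();
}
```

The first call through `call_via_plt` pays for the lookup; every later call goes straight to `real_function` because the slot has been filled in.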

Hook the GOT
Redirect execution by overwriting the .got.plt entry for rb_newobj with a handler function instead.

Hook the GOT

VALUE other_function()
{
  VALUE new_obj = rb_newobj();
  /* set up tracking of new_obj */
  return new_obj;
}

.got.plt entry 0xdeadbeef

WAIT... other_function() calls rb_newobj(). Isn’t that an infinite loop? NO, it isn’t. other_function() lives in memprof.so, so its calls to rb_newobj() use the .plt/.got.plt in memprof.so. As long as we leave memprof.so’s own .got.plt unmodified, we’ll avoid an infinite loop.
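The "no infinite loop" argument can be checked with a toy model: one function pointer per shared object, and only libruby's gets overwritten. Everything here (`libruby_got`, `memprof_got`, `fake_rb_newobj`, `install_hook`) is a made-up stand-in for illustration, not memprof's implementation.

```c
#include <assert.h>

typedef int (*alloc_fn)(void);

static int tracked = 0; /* objects seen by the hook */

/* Stands in for rb_newobj. */
static int fake_rb_newobj(void)
{
    return 7;
}

/* One fake .got.plt slot per shared object. */
static alloc_fn libruby_got = fake_rb_newobj;
static alloc_fn memprof_got = fake_rb_newobj;

/* Handler living in "memprof.so": it calls through memprof's own slot. */
static int other_function(void)
{
    int new_obj = memprof_got();
    tracked++; /* set up tracking of new_obj */
    return new_obj;
}

/* Code in "libruby.so" allocating an object through its own slot. */
static int libruby_allocate(void)
{
    return libruby_got();
}

/* Redirect libruby's slot only; memprof's slot must stay untouched. */
static void install_hook(void)
{
    assert(memprof_got == fake_rb_newobj);
    libruby_got = other_function;
}
```

After `install_hook()`, `libruby_allocate()` still returns a new object, but it passes through `other_function` on the way. If `memprof_got` were overwritten too, `other_function` would call itself forever.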

Hook add_freelist
• We’re now tracking objects at the time of creation.
• In order to find leaks we need to track when objects get freed too.
• add_freelist is called in the VM when an object is freed.
• Why not just overwrite call instructions or hook the GOT?

Hook add_freelist
• Can’t because add_freelist is inlined:

static inline void
add_freelist(p)
    RVALUE *p;
{
  p->as.free.flags = 0;
  p->as.free.next = freelist;
  freelist = p;
}

• The compiler has the option of inserting the instructions of this function directly into the callers.
• If this happens, you won’t see any calls.

So... what now?
• Look carefully at the generated code:

static inline void
add_freelist(p)
    RVALUE *p;
{
  p->as.free.flags = 0;
  p->as.free.next = freelist;
  freelist = p;
}

• Notice that freelist gets updated.
• freelist has file level scope.
• hmmmm......

A (stupid) crazy idea
• freelist has file level scope, so it lives at some static address.
• add_freelist updates freelist, so...
• Why not search the binary for mov instructions that have freelist as the target!
• Overwrite that mov instruction with a call to our code!
• But... we have a problem.
• The system isn’t ready for a call instruction.
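A naive version of that search might look like the sketch below. It assumes the store is encoded as a 7-byte RIP-relative `mov %reg, disp32(%rip)` (REX.W prefix, opcode 0x89, ModRM with mod=00 and r/m=101) and scans a plain byte buffer. `find_mov_to` is a hypothetical name, and a byte-by-byte scan like this can match in the middle of other instructions, so treat it as an illustration rather than a disassembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int32_t read_disp32(const unsigned char *p)
{
    uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)u;
}

/*
 * Return the offset of the first "mov %reg, disp32(%rip)" in code whose
 * destination is target_addr, or -1 if there is none.
 * base_addr is the address that code[0] is mapped at.
 */
static long find_mov_to(const unsigned char *code, size_t len,
                        uint64_t base_addr, uint64_t target_addr)
{
    assert(code != NULL);
    for (size_t i = 0; i + 7 <= len; i++) {
        int rex_ok  = (code[i] == 0x48 || code[i] == 0x4c); /* REX.W (+ REX.R) */
        int is_mov  = (code[i + 1] == 0x89);                /* mov r64 -> r/m64 */
        int rip_rel = ((code[i + 2] & 0xc7) == 0x05);       /* mod=00, r/m=101 */
        if (!rex_ok || !is_mov || !rip_rel)
            continue;

        /* RIP-relative: displacement counts from the next instruction. */
        uint64_t next_insn = base_addr + i + 7;
        uint64_t dest = next_insn + (uint64_t)(int64_t)read_disp32(code + i + 3);
        if (dest == target_addr)
            return (long)i;
    }
    return -1;
}
```

Here `target_addr` would be the static address of freelist, which can be looked up in the symbol table of an unstripped binary.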

Isn’t ready? What?
• The 64bit ABI says that the stack must be aligned to a 16byte boundary after any/all arguments have been arranged.
• Since the overwrite is just some random mov, no way to guarantee that the stack is aligned.
• If we just plop in a call instruction, we won’t be able to arrange for arguments to get put in the right registers.
• Must save caller saved registers.
• So now what?

jmp
• Can use a jmp instruction.
• call saves a return address, jmp does not.
• Transfer execution to an assembly stub that sets the system up according to the ABI.
• then do the call to the C handler
• don’t forget to jmp back when handler is done!

(the slides show the disassembly around the instruction that updates the freelist, before and after patching; the annotations follow)

• This instruction updates the freelist and comes from add_freelist.
• Can’t overwrite it with a call instruction because the state of the system is not ready for a function call.
• The jmp instruction and its offset are 5 bytes wide. Can’t grow or shrink the binary, so insert 2 one byte NOPs.
• Callouts mark the address of the assembly stub (the jmp target) and the spot the stub must jump back to.
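The replacement bytes can be assembled like this. The sketch assumes the overwritten mov is 7 bytes wide (5 bytes of jmp plus the 2 NOPs mentioned above) and only fills a buffer; `build_jmp_patch` is a hypothetical helper, not memprof's code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Fill a 7-byte slot at insn_addr with "jmp rel32" to stub_addr followed by
 * two one-byte NOPs. Returns the address the stub must jump back to.
 */
static uint64_t build_jmp_patch(uint64_t insn_addr, uint64_t stub_addr,
                                unsigned char patch[7])
{
    /* The displacement is measured from the end of the 5-byte jmp. */
    int64_t disp = (int64_t)(stub_addr - (insn_addr + 5));
    assert(disp >= INT32_MIN && disp <= INT32_MAX); /* must fit in 32 bits */

    uint32_t u = (uint32_t)(int32_t)disp;
    patch[0] = 0xe9;                              /* jmp rel32 opcode */
    patch[1] = (unsigned char)(u & 0xff);         /* little endian displacement */
    patch[2] = (unsigned char)((u >> 8) & 0xff);
    patch[3] = (unsigned char)((u >> 16) & 0xff);
    patch[4] = (unsigned char)((u >> 24) & 0xff);
    patch[5] = 0x90;                              /* nop */
    patch[6] = 0x90;                              /* nop */

    return insn_addr + 7; /* first instruction after the patched slot */
}
```

Because the original mov is gone once the patch is written, the stub is also the place where the freelist update itself has to be carried out before jumping back.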

assembly stub*
*slightly abbreviated

void handler(VALUE freed_object)
{
  mark_object_freed(freed_object);
  return;
}

Sample Output
require 'memprof'
Memprof.start
require "stringio"
StringIO.new
Memprof.stats

(columns: object count, then file, line number, class name)

108 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:__node__
 14 test2.rb:3:String
  2 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:Class
  1 test2.rb:4:StringIO
  1 test2.rb:4:String
  1 test2.rb:3:Array
  1 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:Enumerable

Or just track a block

require 'memprof'
Memprof.start
Memprof.track('/tmp/file') { do_something }

Or dump the entire heap as JSON

require 'memprof'
Memprof.start
do_stuff
Memprof.dump_all('/tmp/file')

Middleware
• Use memprof as middleware
• Get per-request object count information

rails 3, environment.rb:

require 'memprof/middleware'
MyApp::Application.configure do
  config.middleware.use Memprof::Middleware
end

569 lib/ruby/1.8/yaml.rb:133:String
528 gems/sequel-3.9.0/lib/sequel/model/base.rb:393:__node__
522 gems/haml-2.2.20/lib/haml/precompiler.rb:545:String
522 gems/haml-2.2.20/lib/haml/helpers.rb:135:String
522 gems/haml-2.2.20/lib/haml/helpers.rb:135:ActiveSupport::SafeBuffer
507 gems/haml-2.2.20/lib/haml/precompiler.rb:317:String
488 gems/sequel-3.9.0/lib/sequel/adapters/mysql.rb:410:String
445 lib/ruby/1.8/yaml.rb:133:YAML::Syck::Node
432 gems/haml-2.2.20/lib/haml/precompiler.rb:566:String
406 gems/sequel-3.9.0/lib/sequel/model/base.rb:392:__node__

memprof.com

memprof limitations
• only works on amd64 linux and snow leopard
• only works with MRI and REE 1.8
• only works on binaries that are NOT STRIPPED.
• OSX System Ruby is NOT supported (yet).
• support for EY rubies is forthcoming - you will have to install -dbg packages, though.

More evil is brewing
• We have some crazy, scary, stupid ideas that we think you’ll love.
• Stay tuned to find out what they are.
• 1.9 support is one of the ideas.

Use RVM.
This would have been really hard to test on all the different Ruby binaries without RVM. Use it. Donate money. (Not my project). http://rvm.beginrescueend.com/

Get memprof
• This talk was about the memprof Ruby gem which is free and provides text output.
• memprof.com is separate and visualizes the output from the memprof gem.
• memprof.com is in alpha.
• github.com/ice799/memprof
• gem install memprof
• #memprof on irc.freenode.net

Special Thanks
• Aman Gupta (@tmm1) - web ui, json output, and much more
• Jake Douglas (@jakedouglas) - mach-o layer, bugfixes, and more.
• Brian Lopez (@brianmario) - because he’s cool.
• Brian Mitchell (@binary42) - for convincing me to do this by telling me I wouldn’t and was too scared.

Questions ?

@joedamato timetobleed.com github.com/ice799