
adventures in bug hunting

@joedamato http://timetobleed.com

whoami

http://timetobleed.com

@joedamato

http://boundary.com (make use of it)

first, a confession.

debugging > programming

before we get this horror show rolling


kernels, drivers, glibc, and everything else changes.

code snips will differ from what you are running on your machines.

some things are simplified in the interest of time.

bprobe

boundary IPFIX flow meter
collects flow data by sniffing packets with libpcap
also collects low level NIC data from the driver:

packets tx/rx
bytes tx/rx
ethernet collisions
ethernet errors

ethernet bonding (aka teaming)


combine a group of physical NICs (eth0, eth1, ...) into a single virtual device (bond0, bond1, ...).

different modes:
active-passive
round robin
link aggregation

ethernet bonding (aka teaming)

how does bonding work (on linux) ?

at a high level...
the bonding driver creates a virtual device.
when a packet is sent, the bonding driver figures out which physical NIC to transmit the packet on.
when a packet comes in, the NICs pass the incoming packet up for the higher layers of the network stack to figure out.
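
A minimal sketch of the transmit-side decision described above, as a toy Python model. It is not the Linux bonding driver; the `Bond` class, its `pick_tx_nic` method, and the two mode names are hypothetical and only illustrate "the bond device picks a physical NIC per outgoing packet".

```python
from itertools import count


class Bond:
    """Toy model of a bond device choosing a physical NIC for each outgoing packet."""

    def __init__(self, name, nics, mode="active-backup"):
        self.name = name
        self.nics = list(nics)                      # e.g. ["eth0", "eth1"]
        self.mode = mode
        self.link_up = {nic: True for nic in self.nics}
        self._counter = count()                     # only used by round-robin

    def pick_tx_nic(self):
        """Return the physical NIC the next outgoing packet would use."""
        usable = [nic for nic in self.nics if self.link_up[nic]]
        if not usable:
            raise RuntimeError("no usable NIC in " + self.name)
        if self.mode == "active-backup":
            return usable[0]                        # first healthy NIC carries everything
        if self.mode == "round-robin":
            return usable[next(self._counter) % len(usable)]
        raise ValueError("unknown mode: " + self.mode)
```

With `Bond("bond0", ["eth0", "eth1"])`, packets go out on eth0 until its link is marked down, at which point the model switches to eth1.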

bprobe and bonding


bprobe discovers bonded network interfaces.

uses libpcap to monitor the underlying physical NICs instead of bond devices.

detecting link failures, etc

everything was looking good until....

Bug was filed...


Debian Lenny, 64bit. Bonded ethernet interfaces. No incoming packets are showing up.

Step 0

Take a step back. Breathe. Do not break the computer.

Step 1
Examine our assumptions:
The packets are making it to the kernel.
The packets are being handed up from the kernel to libpcap.
libpcap doesn't lose any packets before bprobe examines them.
bprobe has some weird bug in it.

packets are making it to the kernel

watch -n 1 'cat /proc/net/dev'
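
The same check can be scripted instead of eyeballed. The sketch below parses the text of /proc/net/dev and compares receive counters between two snapshots. The `SAMPLE` text and its numbers are made up for illustration, and the helper names are hypothetical.

```python
# Illustrative snapshot only; real values come from open("/proc/net/dev").read()
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:   52000     400    0    0    0     0          0         0    31000     250    0    0    0     0       0          0
"""


def parse_proc_net_dev(text):
    """Map interface name -> rx/tx counters from the text of /proc/net/dev."""
    stats = {}
    for line in text.splitlines()[2:]:              # first two lines are headers
        if ":" not in line:
            continue
        name, fields = line.split(":", 1)
        cols = fields.split()
        stats[name.strip()] = {
            "rx_bytes": int(cols[0]),
            "rx_packets": int(cols[1]),
            "tx_bytes": int(cols[8]),
            "tx_packets": int(cols[9]),
        }
    return stats


def rx_packet_delta(before, after, iface):
    """How many packets the kernel counted as received between two snapshots."""
    return after[iface]["rx_packets"] - before[iface]["rx_packets"]
```

If the delta for the physical NIC grows while you ping the host, the packets are making it to the kernel.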

packets are making it to the kernel

packets are being handed up from the kernel

Peel some layers away


bprobe is really libpcap + packet analysis + output.

if this is a bug in the kernel or libpcap then other programs that use libpcap (like tcpdump) will also fail the same way.

so, do they?

tcpdump

bonded ethernet interfaces (on linux) are virtual devices created by combining other devices. for example:

bond0

eth0 eth2 eth4 ...

First, sniff bond0...


% sudo tcpdump -i bond0 dst 172.16.209.136 and proto 1
12:57:26.275660 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 54, length 64
12:57:27.275731 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 55, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Everything is cool.

Now eth0 (the active NIC in bond0)


% sudo tcpdump -i eth0 dst 172.16.209.136 and proto 1
^C
0 packets captured
2 packets received by filter
0 packets dropped by kernel

Everything is not cool.

incoming packets appear to be missing when sniffing the physical device.

(only on debian lenny)

outgoing packets show up regardless.

tcpdump mailing list

only way to figure out where they are getting lost is to follow them through the kernel.

Step 2

Let's start digging.

Steps 3-5
Dig until you see something you haven't seen before.
Read all of the code and understand it.
Go to step 2.

how are packets received?


packets come in from the wire.
a couple different ways for the kernel to know about new packets.
let's just look at the simple case: an interrupt is raised when a packet arrives.
both paths hand data up to the higher layers in similar ways.

e1000

e1000

netif_rx
queues packets up. another thread pulls packets off and processes them.

OK, but how does pcap find out about these packets?

a more fundamental question...


how does pcap actually work?

bprobe/tcpdump/etc (in userland)
libpcap (in userland)
packet protocol family (in the kernel)
network device agnostic layer (in the kernel)


bprobe/tcpdump/etc
(userland)

call pcap_open_live or pcap_create/pcap_activate to initialize libpcap.
call pcap_next_ex to get packets from libpcap.
examine the packets and do stuff.


libpcap (userland)
creates a socket of type PF_PACKET
two ways to get packets from the kernel:
one by one (slow)
via shared memory (fast)
libpcap tries to use the fast method
if it fails, it falls back to slow.
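
The "try fast, fall back to slow" behaviour can be sketched as a small pattern. This is a toy illustration and not libpcap's code: `RingUnsupported`, `open_capture`, and the two setup callables are hypothetical stand-ins for the shared-memory and one-by-one setup steps.

```python
class RingUnsupported(Exception):
    """Raised by the (hypothetical) fast setup when the shared-memory ring cannot be created."""


def open_capture(setup_ring, setup_one_by_one):
    """Try the shared-memory path first; fall back to one-by-one reads.

    Both arguments are callables that return some capture handle.
    Returns (method_name, handle) so the caller knows which path is in use.
    """
    try:
        return "shared-memory", setup_ring()
    except RingUnsupported:
        # The fast path is unavailable (for example, the library or kernel
        # is too old), so use the slower path that always works.
        return "one-by-one", setup_one_by_one()
```

The caller never has to know in advance which method the system supports, which is also why the slow path can end up in use without anyone noticing.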

libpcap creating PF_PACKET socket

the new fast way is being set up.

the new way of pulling packets out.

the old way is set up when the new way fails to initialize.

pull packets out from the kernel the old way.


PF_PACKET (kernel)
libpcap creates the PF_PACKET socket
the PF_PACKET code in the kernel (eventually) executes.
this code does some initialization and inserts a protocol hook...


network device agnostic layer

pulls packets off the backlog queue.
calls netif_receive_skb()
has some logic to determine who the real sender is when bonding is enabled.
passes the packet through the protocol hooks.

(run through all protocol blocks handing the packet over)

we now know the path packets take so they can be examined by pcap apps.


back to the bug


so, the bug was that packet sniffing physical NICs on bonded hosts was not revealing incoming packets.

what do we now know about our environment?

what would be the best place to look to track down this bug?

we know

assume the following setup


bond0

eth0 eth1 eth2

packet came in on eth0, thus:
skb->dev = eth0
skb->dev->master = bond0

we know

before
skb->dev = eth0

after
skb->dev = bond0
code returns eth0 as orig_dev

we know

LOOK

we know

Did you see it?

Bug

We overwrite the packet's device with the bond device.
The protocol hook check checks to see if the hook is for the device on the packet.
It isn't: we are sniffing eth0, but skb->dev was overwritten to bond0.
That's why if you sniff bond0 you see packets, but if you sniff eth0 you see nothing.
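
A toy Python model of the delivery check makes the failure easy to reproduce. This is not kernel code: `Skb`, `Hook`, `MASTER`, and `receive` are hypothetical names, and the `match_orig_dev` switch shows one plausible shape of a fix (also accepting hooks bound to the original device), which may differ from the actual patch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Skb:
    dev: str                       # device currently recorded on the packet
    payload: bytes = b""


@dataclass
class Hook:
    dev: Optional[str]             # None means "deliver packets from any device"
    seen: list = field(default_factory=list)


# Assumed setup from the slides: eth0, eth1, eth2 all belong to bond0.
MASTER = {"eth0": "bond0", "eth1": "bond0", "eth2": "bond0"}


def receive(skb, hooks, match_orig_dev):
    """Model of the receive path: rewrite the device, then run the protocol hooks."""
    orig_dev = skb.dev
    skb.dev = MASTER.get(skb.dev, skb.dev)      # overwritten with the bond device
    for hook in hooks:
        wanted = hook.dev is None or hook.dev == skb.dev
        if match_orig_dev:                      # possible fix: also honor the original device
            wanted = wanted or hook.dev == orig_dev
        if wanted:
            hook.seen.append(skb.payload)
```

With `match_orig_dev=False`, a hook bound to eth0 never sees a packet that arrived on eth0, while a hook bound to bond0 does. That matches the tcpdump results above.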

packets are being handed up from the kernel

YYYYYyyyyYYyYYyyYYy YYYYYYYYYYYYYYYyY eeeEEeeEEeEEEeEEEEee eeeEEeEEEeeEEeEeEEaA AaaaAaaAaAAaAaAAaa AaAAAAAaaaAAa!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

EASY FIX

YYYYYyyyyYYyYYyyYYy YYYYYYYYYYYYYYYyY eeeEEeeEEeEEEeEEEEee eeeEEeEEEeeEEeEeEEaA AaaaAaaAaAAaAaAAaa AaAAAAAaaaAAa!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

reboot and try the new kernel...

First, sniff bond0...


% sudo tcpdump -i bond0 dst 172.16.209.136 and proto 1
12:57:26.275660 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 54, length 64
12:57:27.275731 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 55, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Everything is cool.

Now eth0 (the active NIC in bond0)


% sudo tcpdump -i eth0 dst 172.16.209.136 and proto 1
^C
0 packets captured
2 packets received by filter
0 packets dropped by kernel

Everything is not cool.

NO


NEIN!

tcpdump/bprobe/other pcap apps STILL FAIL.

???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????

In real life I spent the next 4 days looking over the same kernel code, hundreds of times.

Every single day, from the moment I woke up (9am), I searched all day until I collapsed with exhaustion (3am).

I got so wound up in trying to get my fix working, I lost track of the process.

It was a miserable 4 days.

Until I realized...

Step 0

Take a step back. Breathe. Do not break the computer.

Step 1
Examine our assumptions:
The kernel code is still broken.
The incoming packets are being queued up for libpcap to pull out of PF_PACKET properly.
There probably isn't a bug in bprobe and tcpdump.

Step 2

Let's start digging.

Steps 3-5
Dig until you see something you haven't seen before.
Read all of the code and understand it.
Go to step 2.

verify my assumption
modify libpcap to verify that the kernel really is still broken

i used apt-get source to retrieve the official source for debian lenny's libpcap and i found something surprising.

old way of doing pcap


debian lenny's kernel supports the new way of getting packets out of the kernel via mmap.

but, debian lenny's libpcap is not new enough and therefore uses the old way to examine packets.

this also means that unless i statically link the libpcap version i want, my app will just perform worse on lenny.

reading a packet the old way

that if statement fails.


we are sniffing packets on a physical device BUT in the kernel we are changing the device a packet comes in on to the bond device (remember in netif_receive_skb?)

that if statement fails.

the index of the bond device is different from the index of the physical device we are sniffing

so this if statement evaluates to TRUE
libpcap returns without processing the packet.

why?
this code exists to prevent a race condition when sniffing packets the old way in some kernels.
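
The check described above can be modeled in a few lines. This is a Python sketch of the logic and not the libpcap source; `old_path_accepts` and the interface index numbers are hypothetical, chosen only to show why a handle bound to eth0 drops packets that the kernel labels with bond0's index.

```python
# Hypothetical interface indexes for illustration; real ones come from the kernel.
IFINDEX = {"eth0": 2, "bond0": 5}


def old_path_accepts(bound_ifindex, packet_ifindex):
    """Model of the interface check on the one-by-one read path.

    bound_ifindex is -1 when the capture handle is not tied to one interface.
    packet_ifindex is the interface index reported alongside the packet.
    """
    if bound_ifindex != -1 and packet_ifindex != bound_ifindex:
        # Guard against packets queued from other interfaces before the
        # socket was bound: discard anything not from "our" interface.
        return False
    return True
```

Sniffing eth0 means `bound_ifindex` is eth0's index, but the packet arrives labeled with bond0's index, so `old_path_accepts(IFINDEX["eth0"], IFINDEX["bond0"])` is false and the packet is silently skipped.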

solution
boot into our fixed debian lenny kernel.
download a version of libpcap that is newer and supports the mmap method for packet sniffing.
new method doesn't have this race condition and has better performance.
link bprobe/tcpdump/other pcap apps against it.

First, sniff bond0...


% sudo tcpdump -i bond0 dst 172.16.209.136 and proto 1
12:57:26.275660 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 54, length 64
12:57:27.275731 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 55, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Next, sniff eth0...


% sudo tcpdump -i eth0 dst 172.16.209.136 and proto 1
12:57:26.275660 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 54, length 64
12:57:27.275731 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 55, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

YYYYYyyyyYYyYYyyYYy YYYYYYYYYYYYYYYyY eeeEEeeEEeEEEeEEEEee eeeEEeEEEeeEEeEeEEaA AaaaAaaAaAAaAaAAaa AaAAAAAaaaAAa!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

summarize
kernel bug when overwriting the device the packet arrived on.

fixed this bug, but bprobe/tcpdump still failed.

libpcap bug when pulling packets out of the kernel the old way.

can avoid this bug and get better performance with a newer libpcap.

Step 0

Take a step back. Breathe. Do not break the computer.

Step 1-5
Examine your assumptions.
Start digging.
Keep going until you see something you haven't seen before.
Read all of the code and understand it.
Go to step 2.

Happy debugging!

questions?
twitter: @joedamato blog: http://timetobleed.com

if there is extra time...

a warmup bug

cool operating system.

no, not really.

but, people use it.

ipfix_reader

a test program
links against yajl because it generates JSON output
works on ubuntu, but not on centos5

TOO EASY, JOE.

but, wait.

here's another program that links fine to a lib in /usr/local/lib ON THE SAME SYSTEM.

W A T
We have 2 programs:
Both link against libraries in /usr/local/lib/
Only one works.
The broken program's library is in /usr/local/lib/

Step 0

Take a step back. Breathe. Do not break the computer.

Step 1
Examine our assumptions:
The programs and libraries are both 64bit.
/usr/local/lib/ is in the library search path

both programs and their libraries are 64bit.

program 1: ipfix_reader

program 2: bprobe

both programs and their libraries are 64bit.

/usr/local/lib/ is in the library search path

Let's check... ldconfig -p

/usr/local/lib/ is in the library search path

So... ipfix_reader doesn't work because /usr/local/lib is not in the search path.
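
A small helper can answer "is this directory in the linker cache?" from the output of `ldconfig -p`. This is a sketch: the `SAMPLE_LDCONFIG` text is illustrative and not taken from the centos5 box, and the function names are hypothetical.

```python
import os

# Illustrative output only; on a real system, capture the output of `ldconfig -p`.
SAMPLE_LDCONFIG = (
    "2 libs found in cache `/etc/ld.so.cache'\n"
    "\tlibz.so.1 (libc6,x86-64) => /usr/lib64/libz.so.1\n"
    "\tlibc.so.6 (libc6,x86-64) => /lib64/libc.so.6\n"
)


def cached_library_dirs(ldconfig_output):
    """Return the set of directories that hold at least one cached library."""
    dirs = set()
    for line in ldconfig_output.splitlines():
        if "=>" not in line:
            continue                                # header line, not a library entry
        path = line.split("=>", 1)[1].strip()
        dirs.add(os.path.dirname(path))
    return dirs


def dir_in_cache(ldconfig_output, directory):
    """True if any cached library lives in the given directory."""
    return os.path.normpath(directory) in cached_library_dirs(ldconfig_output)
```

A directory missing from this set means the dynamic linker will not find libraries there by default, unless something else (an environment variable, or a path stored in the binary) points it there.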

but... how can bprobe be working fine?

Strange
This is confusing. bprobe should fail. But, the shared libraries a particular binary
dynamically links to at runtime are built into the binary itself.

So....

Step 2

Let's start digging.

Steps 3-5
Dig until you see something you haven't seen before.
Read all of the code and understand it.
Go to step 2.

Let's take a look with readelf

(let's resize it)

rpath

ah ha!
bprobe works and can link because the binary is storing the library path inside of itself.
but, now there are 2 more questions:
how did the rpath tag get there?
why doesn't ipfix_reader have one?
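
To compare the two binaries without squinting at readelf output, the rpath entries can be pulled out of the text printed by `readelf -d`. A sketch, assuming the usual binutils layout for dynamic-section lines; the sample text and the helper name are illustrative and not copied from the slides.

```python
import re

# Matches lines such as:
#  0x000000000000000f (RPATH)              Library rpath: [/usr/local/lib]
_TAG = re.compile(r"\((RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[(.*)\]")

# Illustrative `readelf -d` output for a binary that carries an rpath.
SAMPLE_READELF = """\
Dynamic section at offset 0x1e28 contains 24 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libpcap.so.1]
 0x000000000000000f (RPATH)              Library rpath: [/usr/local/lib]
"""


def embedded_search_paths(readelf_dynamic_output):
    """Return library search directories stored inside the binary itself."""
    paths = []
    for line in readelf_dynamic_output.splitlines():
        match = _TAG.search(line)
        if match:
            # An rpath value may hold several directories separated by ':'
            paths.extend(p for p in match.group(2).split(":") if p)
    return paths
```

An empty list for one binary and `["/usr/local/lib"]` for the other would explain why only one of them finds its library.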

how did the rpath tag get there?

why doesn't ipfix_reader have rpath?

almost forgot...

a warmup bug feature