
Part 2: Advanced Static Analysis

Chapter 4: A Crash Course in x86 Disassembly
Chapter 5: IDA Pro
Chapter 6: Recognizing C Code Constructs in Assembly
Chapter 7: Analyzing Malicious Windows Programs

Chapter 4: A Crash Course in x86 Disassembly
How software works
The gcc compiler driver pre-processes, compiles, assembles, and links to generate an executable
 Links together object code (e.g. game.o) and static libraries (e.g. libc.a) to form the final executable
 Links in references to dynamic libraries for code loaded at load time (e.g. libc.so.1)
 The executable may still load additional dynamic libraries at run time
[Figure: compilation pipeline]
hello.c (program source) -> Pre-processor -> hello.i (modified source) -> Compiler -> hello.s (assembly code) -> Assembler -> hello.o (object code) -> Linker -> hello (executable code)
Executables
Various file formats
 Linux = Executable and Linkable Format (ELF)
 Windows = Portable Executable (PE)
ELF Object File Format
ELF header
 Magic number, type (.o, exec, .so), machine, byte ordering, etc.
Program header table
 Page size, virtual addresses of memory segments (sections), segment sizes, entry point
.text section
 Code
.data section
 Initialized (static) data
.bss section
 Uninitialized (static) data
 “Block Started by Symbol”
[Figure: file layout from offset 0: ELF header, program header table (required for executables), .text section, .data section, .bss section, .symtab, .rel.text, .rel.data, .debug, section header table (required for relocatables)]
ELF Object File Format (cont)
.rel.text section
 Relocation info for .text section
 Addresses of instructions that will need to be modified in the executable
 Instructions for modifying
.rel.data section
 Relocation info for .data section
 Addresses of pointer data that will need to be modified in the merged executable
.symtab section
 Symbol table
 Procedure and static variable names
 Section names and locations
.debug section
 Info for symbolic debugging (gcc -g)
PE (Portable Executable) file format
Windows file format for executables
Based on COFF Format
 Magic Numbers, Headers, Tables,
Directories, Sections
Example C Program

m.c
int e=7;

int main() {
  int r = a();
  exit(0);
}

a.c
extern int e;

int *ep=&e;
int x=15;
int y;

int a() {
  return *ep+x+y;
}
Merging Relocatable Object Files into an Executable Object File

Relocatable Object Files
 system code (.text), system data (.data)
 m.o: main() (.text), int e = 7 (.data)
 a.o: a() (.text), int *ep = &e and int x = 15 (.data), int y (.bss)
Executable Object File (from offset 0)
 headers
 .text: system code, main(), a(), more system code
 .data: system data, int e = 7, int *ep = &e, int x = 15
 .bss: int y
 .symtab
 .debug
Program execution
Operating system provides
 Protection and resource allocation
 Abstract view of resources (files, system calls)
 Virtual memory
 Uniform memory space abstraction for each process
 Gives the illusion that each process has entire
memory space
How does a program get loaded?
The operating system creates a new process.
 Including among other things, a virtual memory space
System loader
 Loads the executable file from the file system into the memory space
 Done via DMA (direct memory access)
 Executable contains code and statically linked libraries
 Executable in file system remains and can be executed again
 Loads dynamic shared objects/libraries into memory space
 Done via DMA from file system as with original executable
 Resolves addresses in code (using .rel.text and .rel.data
information) based on where code/data is loaded
 Starts a thread of execution running based on specified
entry point in ELF/PE header
Loading Executable Binaries
[Figure: the executable object file for example program p (ELF header, program header table, .text, .data, .bss, .symtab, .rel.text, .rel.data, .debug, section header table) is mapped into the process image]
Virtual addr   Process image
0x080483e0     init and shared lib segments
0x08048494     .text segment (r/o)
0x0804a010     .data segment (initialized r/w)
0x0804a3b0     .bss segment (uninitialized r/w)
Example: Linux virtual memory space (32-bit)
0xffffffff
  kernel virtual memory (code, data, heap, stack); memory invisible to user code
0xc0000000
  user stack (created at runtime); %esp (stack pointer) marks the top of the stack
  memory mapped region for shared libraries
0x40000000
  run-time heap (managed by malloc); brk marks the top of the heap
  read/write segment (.data, .bss), loaded from the executable file
  read-only segment (.init, .text, .rodata), loaded from the executable file
0x08048000
  unused
0

cat /proc/self/maps
Relocation
Virtual memory abstraction makes compilation and linking easy
 Compared to a single, shared real memory address space (e.g. original Mac)
 Linker statically binds all program code and data to absolute virtual addresses
 Linker decides entire memory layout at link time
 Example: DOS/Windows ".com" format is effectively a memory image
Issues
 Support dynamic libraries to avoid statically linking things like libc into all
processes.
 Dynamic libraries might want to be loaded at the same address!
 Need to support relative addressing and relocation again
 Want to support address-space layout randomization
 Security defense mechanism requiring everything to be relocatable
 What Meltdown/Spectre malware might attack first
More on relocation
Relocation in Windows PE (.exe) and Linux ELF
Requires position-independent code
 Compiler makes all jumps and branches relative to current
location or relative to a base register set at run-time
 Compiler labels any accesses to absolute addresses and has
loader rewrite them to their actual run-time values
 Compiler uses indirection and dynamically generated offset
tables to determine addresses
 Example: Procedure Link and Global Offset Tables in ELF
 GOT contains addresses where imported library calls are loaded at run-
time
 Library calls index GOT to determine location to jump to
 Note: Can be targeted by malware for hooks!
Program execution
[Figure: the CPU (EIP, registers, condition codes) sends addresses to memory (object code, program data, OS data, stack); data and instructions flow between the two]
Program-Visible State
 EIP - Instruction Pointer
 a.k.a. Program Counter
 Address of next instruction
 Register File
 Heavily used program data
 Condition Codes
 Store status information about most recent arithmetic operation
 Used for conditional branching
 Memory
 Byte addressable array
 Code, user data, OS data
 Includes stack used to support procedures
IA32 Register file
32-bit (bits 31-0)   16-bit (bits 15-0)   bits 15-8   bits 7-0
General purpose registers (mostly)
%eax                 %ax                  %ah         %al
%ecx                 %cx                  %ch         %cl
%edx                 %dx                  %dh         %dl
%ebx                 %bx                  %bh         %bl
%esi                 %si
%edi                 %di
Special purpose registers
%esp                 %sp                  Stack pointer
%ebp                 %bp                  Frame pointer
Registers
The processor operates on data in registers (usually)
 movl (%eax), %ecx
 Fetch data at address contained in %eax
 Store in register %ecx
 movl $array, %ecx
 Move address of variable array into %ecx
 Typically, data is loaded into registers, manipulated or used,
and then written back to memory
The IA32 architecture is "register poor"
 Few general purpose registers
 Source or destination operand is often memory locations
 Makes context-switching amongst processes easy (less
register-state to store)
Operand types
A typical instruction acts on 1 or more operands
 addl %ecx, %edx adds the contents of ecx to edx
Three general types of operands
 Immediate
 Like a C constant, but preceded by $
 e.g., $0x1F, $-533
 Encoded with 1, 2, or 4 bytes based on instruction
 Register: the value in one of the 8 integer registers
 Memory: a memory address
 There are many modes for addressing memory
Operand examples using mov
Source  Destination  Example              C Analog
Imm     Reg          movl $0x4,%eax       temp = 0x4;
Imm     Mem          movl $-147,(%eax)    *p = -147;
Reg     Reg          movl %eax,%edx       temp2 = temp1;
Reg     Mem          movl %eax,(%edx)     *p = temp;
Mem     Reg          movl (%eax),%edx     temp = *p;

Addressing Modes
Immediate and registers have only one mode
Memory on the other hand needs many (so that a load from memory can take a single instruction)
 Absolute
 specify the address of the data
 Indirect
 use the address held in a register
 Base + displacement
 use register plus a constant offset to calculate address
 Indexed
 Add contents of an index register
 Scaled index
 Add contents of an index register scaled by a constant
Addressing Modes
 Absolute
movl 0x08049000, %eax
 Indirect
movl (%edx), %eax
 Base + displacement
movl 8(%ebp), %eax
 Indexed
movl (%ecx, %edx), %eax

 Scaled Index
movl (%ecx, %edx, 4), %eax
x86 instructions
Rules
 Source operand can be memory, register or
constant
 Destination can be memory or register
 Only one of source and destination can be memory
 Source and destination must be same size
What’s the "l" for on the end?
movl 8(%ebp),%eax
It stands for “long” and is 32-bits
Size of the operands
Baggage from the days of 16-bit processors
For x86, x86_64
 8 bits is a byte (movb)
 16 bits is a word (movw)
 32 bits is a double or long word (movl)
 64 bits is a quad word (movq)
Global vs. Local variables
Global variables stored in either .data or .bss
section of process
Local variables stored on stack
 Which variables?
m.c
int e=7;

int main() {
  int r = a();
  exit(0);
}

a.c
extern int e;

int *ep=&e;
int x=15;
int y;

int a() {
  return *ep+x+y;
}
Global vs local: Which is which?

Program 1
int x = 1;
int y = 2;
void a()
{
  x = x+y;
  printf("Total = %d\n",x);
}
int main(){a();}

Program 2
void a()
{
  int x = 1;
  int y = 2;
  x = x+y;
  printf("Total = %d\n",x);
}
int main() {a();}

Listing A
080483c4 <a>:
80483c4: push %ebp
80483c5: mov %esp,%ebp
80483c7: sub $0x18,%esp
80483ca: movl $0x1,-0x8(%ebp)
80483d1: movl $0x2,-0x4(%ebp)
80483d8: mov -0x4(%ebp),%eax
80483db: add %eax,-0x8(%ebp)
80483de: mov -0x8(%ebp),%eax
80483e1: mov %eax,0x4(%esp)
80483e5: movl $0x80484f0,(%esp)
80483ec: call 80482dc <printf@plt>
80483f1: leave
80483f2: ret

Listing B
080483c4 <a>:
80483c4: push %ebp
80483c5: mov %esp,%ebp
80483c7: sub $0x8,%esp
80483ca: mov 0x804966c,%edx
80483d0: mov 0x8049670,%eax
80483d5: lea (%edx,%eax,1),%eax
80483d8: mov %eax,0x804966c
80483dd: mov 0x804966c,%eax
80483e2: mov %eax,0x4(%esp)
80483e6: movl $0x80484f0,(%esp)
80483ed: call 80482dc <printf@plt>
80483f2: leave
80483f3: ret
Arithmetic operations

void f(){
  int a = 0;
  int b = 1;
  a = a+11;
  a = a-b;
  a--;
  b++;
}
int main() { f();}

08048394 <f>:
8048394: pushl %ebp
8048395: movl %esp,%ebp
8048397: subl $0x10,%esp
804839a: movl $0x0,-0x8(%ebp)
80483a1: movl $0x1,-0x4(%ebp)
80483a8: addl $0xb,-0x8(%ebp)
80483ac: movl -0x4(%ebp),%eax
80483af: subl %eax,-0x8(%ebp)
80483b2: subl $0x1,-0x8(%ebp)
80483b6: addl $0x1,-0x4(%ebp)
80483ba: leave
80483bb: ret
Condition codes
The IA32 processor has a register called eflags
(extended flags)
Each bit is a flag, or condition code
CF Carry Flag SF Sign Flag
ZF Zero Flag OF Overflow Flag
As programmers, we don’t write to this register and
seldom read it directly
Flags are set or cleared by hardware on each
arithmetic/logical operation depending on the result of
an instruction
Conditional branches handled via EFLAGS
Condition codes (cont.)
Setting condition codes via compare instruction
cmpl b,a
 Computes a-b without setting destination
 CF set if carry (borrow) out of the most significant bit
 Used for unsigned comparisons
 ZF set if a == b
 SF set if (a-b) < 0
 OF set if two’s complement overflow
 (a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0)
 Byte and word versions cmpb, cmpw
Condition codes (cont.)
Setting condition codes via test instruction
testl b,a
 Computes a&b without setting destination
 Sets condition codes based on result
 Useful to have one of the operands be a mask
 Often used to test zero, positive
testl %eax, %eax
 ZF set when a&b == 0
 SF set when a&b < 0
 Byte and word versions testb, testw
if statements

void f(){
  int x = 1;
  int y = 2;
  if (x==y)
    printf("x equals y.\n");
  else
    printf("x is not equal to y.\n");
}
int main() { f();}

080483c4 <f>:
80483c4: pushl %ebp
80483c5: movl %esp,%ebp
80483c7: subl $0x18,%esp
80483ca: movl $0x1,-0x8(%ebp)
80483d1: movl $0x2,-0x4(%ebp)
80483d8: movl -0x8(%ebp),%eax
80483db: cmpl -0x4(%ebp),%eax
80483de: jne 80483ee <f+0x2a>
80483e0: movl $0x80484f0,(%esp)
80483e7: call 80482d8 <puts@plt>
80483ec: jmp 80483fa <f+0x36>
80483ee: movl $0x80484fc,(%esp)
80483f5: call 80482d8 <puts@plt>
80483fa: leave
80483fb: ret
if statements
Note: Microsoft (Intel-syntax) assembly reverses the operand order (destination first)

int a = 1, b = 3, c;
if (a > b)
  c = a;
else
  c = b;

mov dword ptr [ebp-4],1       ; store a = 1
mov dword ptr [ebp-8],3       ; store b = 3
mov eax,dword ptr [ebp-4]     ; move a into EAX register
cmp eax,dword ptr [ebp-8]     ; compare a with b (subtraction)
jle 00000036                  ; if (a<=b) jump to line 00000036
mov ecx,dword ptr [ebp-4]     ; else move a (1) into ECX register
mov dword ptr [ebp-0Ch],ecx   ; move ECX into c (12 bytes down)
jmp 0000003C                  ; unconditional jump to 0000003C
mov edx,dword ptr [ebp-8]     ; move b (3) into EDX register
mov dword ptr [ebp-0Ch],edx   ; move EDX into c (12 bytes down)
Loops
int factorial_do(int x)
{
int result = 1;
do {
result *= x;
x = x-1;
} while (x > 1);
return result;
}

factorial_do:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
movl $1, %eax
.L2:
imull %edx, %eax
decl %edx
cmpl $1, %edx
jg .L2
leave
ret
C switch statements

switch (x) {
case 1:
case 5:
code at L0
case 2:
case 3:
code at L1
default:
code at L2
}
C switch statements
Implementation options
 Series of conditionals
 cmpl (or testl) followed by je
 OK if few cases and large ranges of values
 Slow if many cases
 Jump table (example below)
 Lookup branch target from a table
 Possible with a small range of integer constants
GCC picks implementation based on structure
Example:
switch (x) {
  case 1:
  case 5:
    code at L0
  case 2:
  case 3:
    code at L1
  default:
    code at L2
}
Jump table at .L3 (one entry per value of x, starting at 0): .L2, .L0, .L1, .L1, .L2, .L0
1. init jump table at .L3
2. get address at .L3+4*x
3. jump to that address
Example int switch_eg(int x)
{
int result = x;
switch (x) {
case 100:
result *= 13;
break;

case 102:
result += 10;
/* Fall through */

case 103:
result += 11;
break;

case 104:
case 106:
result *= result;
break;

default:
result = 0;
}
return result;
}
Assembly for switch_eg:
  leal -100(%edx),%eax
  cmpl $6,%eax
  ja .L9
  jmp *.L10(,%eax,4)
  .p2align 4,,7
  .section .rodata
  .align 4
  .align 4
.L10:
  .long .L4
  .long .L9
  .long .L5
  .long .L6
  .long .L8
  .long .L9
  .long .L8
  .text
  .p2align 4,,7
.L4:
  leal (%edx,%edx,2),%eax
  leal (%edx,%eax,4),%edx
  jmp .L3
  .p2align 4,,7
.L5:
  addl $10,%edx
.L6:
  addl $11,%edx
  jmp .L3
  .p2align 4,,7
.L8:
  imull %edx,%edx
  jmp .L3
  .p2align 4,,7
.L9:
  xorl %edx,%edx
.L3:
  movl %edx,%eax
  leave
  ret

Key is jump table at .L10
Array of pointers to jump locations
Avoiding conditional branches
Modern CPUs with deep pipelines
 Instructions fetched far in advance of execution
 Mask the latency going to memory
Problem: What if you hit a conditional branch?
 Must predict which branch to take and speculatively
fetch/execute!
 Branch prediction in CPUs well-studied, fairly
effective (except when it's not… ) (1/2018)
 But, best to avoid conditional branching altogether
x86 REP prefixes
Loops require decrement, comparison, and conditional
branch for each iteration
 Incur branch prediction penalty and overhead even for trivial
loops
Repeat instruction prefixes (REP, REPE, REPNE)
 Inserted just before some string instructions (movsb, movsw, movsd, cmpsb, cmpsw, cmpsd)
 REP (repeat for fixed count)
 Direction flag (DF) cleared via cld and set via std
 esi and edi contain pointers to arguments
 ecx contains counts
 REPE (repeat while equal, i.e. while ZF is set), REPNE (repeat while not equal, i.e. while ZF is clear)
 Both also stop when ecx reaches zero
 Used in conjunction with cmpsb, cmpsw, cmpsd
x86 REP example
.data
source DWORD 20 DUP (?)
target DWORD 20 DUP (?)

.code
cld ; clear direction flag = forward
mov ecx, LENGTHOF source
mov esi, OFFSET source
mov edi, OFFSET target
rep movsd
x86 SCAS
Repeat a search until a condition is met
SCASB SCASW SCASD
 Search for a specific element in an array
 Search for the first element that does not
match a given value
x86 SCAS

.data

alpha BYTE "ABCDEFGH",0

.code
mov edi,OFFSET alpha
mov al,'F' ; search for 'F'
mov ecx,LENGTHOF alpha
cld
repne scasb ; repeat while not equal
jnz quit
dec edi ; EDI points to 'F'
x86-64 Conditionals
Conditional instruction execution
cmovXX src, dest
Move value from src to dest if condition XX holds
 No branching
 Conditional handled as operation within Execution Unit
 Added with P6 microarchitecture (PentiumPro onward)
 Must ensure gcc compiles with proper target to use
Example (y < x) ? (x) : (y)
movl 8(%ebp),%edx    # Get x
movl 12(%ebp),%eax   # rval=y
cmpl %edx, %eax      # rval:x
cmovll %edx,%eax     # If rval < x, rval=x

Performance
 14 cycles on all data
 More efficient than conditional branching (simple control flow)
 But overhead: both branches are evaluated
x86-64 conditional example

int absdiff(int x, int y)
{
  int result;
  if (x > y) {
    result = x-y;
  } else {
    result = y-x;
  }
  return result;
}

# x in %edi, y in %esi
absdiff:
  movl %edi, %eax    # eax = x
  movl %esi, %edx    # edx = y
  subl %esi, %eax    # eax = x-y
  subl %edi, %edx    # edx = y-x
  cmpl %esi, %edi    # x:y
  cmovle %edx, %eax  # eax=edx if <=
  ret
IA32 function calls
Handled based on calling convention used by
the processor and compiler for each language
 First, some data structures
IA32 Stack
Region of memory managed with stack discipline
Grows toward lower addresses
Register %esp indicates lowest stack address
 address of top element
[Figure: stack “bottom” at the highest address, stack “top” at the lowest; the stack pointer %esp points at the top and the stack grows down]
IA32 Stack Pushing
Pushing
 pushl Src
 Decrement %esp by 4
 Fetch operand at Src
 Write operand at address given by %esp
 e.g. pushl %eax is equivalent to
subl $4, %esp
movl %eax,(%esp)
[Figure: the stack pointer %esp moves down by 4 as the stack grows toward lower addresses]
IA32 Stack Popping
Popping
 popl Dest
 Read operand at address given by %esp
 Write to Dest
 Increment %esp by 4
 e.g. popl %eax is equivalent to
movl (%esp),%eax
addl $4,%esp
[Figure: the stack pointer %esp moves up by 4 as the top element is removed]
Stack Operation Examples
          Initially    After pushl %eax   After popl %edx
0x110
0x10c
0x108     123 (top)    123                123 (top)
0x104                  213 (top)          213
%eax      213          213                213
%edx      555          555                213
%esp      0x108        0x104              0x108
Procedure Control Flow
Procedure call:
 call label
 Push address of next instruction (after the call) on stack
 Jump to label
Procedure return:
 ret Pop address from stack into eip register
Procedure Call Example
804854e: e8 3d 06 00 00 call 8048b90 <main>
8048553: 50 next instruction

          Before call   After call 8048b90
0x110
0x10c
0x108     123           123
0x104                   0x8048553
%esp      0x108         0x104
%eip      0x804854e     0x8048b90

%eip is program counter


Procedure Return Example
8048e90: c3 ret

          Before ret    After ret
0x110
0x10c
0x108     123           123
0x104     0x8048553     0x8048553
%esp      0x104         0x108
%eip      0x8048e90     0x8048553

%eip is program counter


Procedure Control Flow
When procedure foo calls who:
  foo is the caller, who is the callee
 Control is transferred to the ‘callee’
When procedure returns
 Control is transferred back to the ‘caller’
Last-called, first-return (LIFO) order
 Naturally implemented via the stack
foo(…)
{
  • • •
  who();      <- call
  • • •
}
who(…)
{
  • • •
  amI();      <- call
  • • •
  amI();
  • • •
  ret
}
amI(…)
{
  • • •
  • • •
  ret
}
Procedure calls and stack frames
How does the ‘callee’ know where to return later?
 Return address placed in a well-known location on stack within a “stack frame”
How are arguments passed to the ‘callee’?
 Arguments placed in a well-known location on stack within a “stack frame”
Upon procedure invocation
 Stack frame created for the procedure
 Stack frame is pushed onto program stack
Upon procedure return
 Its frame is popped off of stack
 Caller’s stack frame is recovered
[Figure: from the stack bottom (higher addresses) in the direction of stack growth: foo’s stack frame, who’s stack frame, amI’s stack frame]
Call chain: foo => who => amI
Keeping track of stack frames
The stack pointer (%esp) moves around
 Can be changed within procedure
 Problem
 How can we consistently find our parameters?
 The base pointer (%ebp)
 Points to the base of our current stack frame
 Also called the frame pointer
 Within each function, %ebp stays constant
Most information on the stack is referenced
relative to the base pointer
 Base pointer setup is the programmer’s job
 Actually usually the compiler’s job
IA32/Linux Stack Frame
Caller Stack Frame
 Argument build for callee
 Return address
 Pushed by call instruction
Callee Stack Frame
 Old frame pointer (%ebp)
 Saved register context
 Local variables
 If can’t keep in registers
 Parameters for function about to be called
 “Argument build” for call
[Figure, high to low addresses: caller frame (..., arguments, return addr), then callee frame: old %ebp <- frame pointer (%ebp), saved registers + local variables, argument build <- stack pointer (%esp)]
swap
Calling swap from call_swap

int zip1 = 15213;
int zip2 = 91125;

void call_swap()
{
  swap(&zip1, &zip2);
}

void swap(int *xp, int *yp)
{
  int t0 = *xp;
  int t1 = *yp;
  *xp = t1;
  *yp = t0;
}

call_swap:
  • • •
  pushl $zip2   # Global Var
  pushl $zip1   # Global Var
  call swap
  • • •

Resulting stack (high to low addresses): ..., &zip2, &zip1, Rtn adr <- %esp
swap
void swap(int *xp, int *yp)
{
  int t0 = *xp;
  int t1 = *yp;
  *xp = t1;
  *yp = t0;
}

swap:
  # Setup
  pushl %ebp
  movl %esp,%ebp
  pushl %ebx
  # Body
  movl 12(%ebp),%ecx
  movl 8(%ebp),%edx
  movl (%ecx),%eax
  movl (%edx),%ebx
  movl %eax,(%edx)
  movl %ebx,(%ecx)
  # Finish
  movl -4(%ebp),%ebx
  movl %ebp,%esp
  popl %ebp
  ret
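swap Setup #1
Entering stack (high to low addresses): ..., &zip2, &zip1, Rtn adr <- %esp; %ebp still points into the caller's frame
Resulting stack after pushl %ebp: ..., yp, xp, Rtn adr, Old %ebp <- %esp; %ebp unchanged

swap:
  pushl %ebp        <- this step
  movl %esp,%ebp
  pushl %ebx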
swap Setup #2
Stack before instruction: ..., yp, xp, Rtn adr, Old %ebp <- %esp; %ebp still points into the caller's frame
Resulting stack after movl %esp,%ebp: ..., yp, xp, Rtn adr, Old %ebp <- %ebp and %esp

swap:
  pushl %ebp
  movl %esp,%ebp    <- this step
  pushl %ebx
swap Setup #3
Stack before instruction: ..., yp, xp, Rtn adr, Old %ebp <- %ebp and %esp
Resulting stack after pushl %ebx: ..., yp, xp, Rtn adr, Old %ebp <- %ebp, Old %ebx <- %esp

swap:
  pushl %ebp
  movl %esp,%ebp
  pushl %ebx        <- this step
Effect of swap Setup
Entering stack (high to low addresses): ..., &zip2, &zip1, Rtn adr <- %esp
Resulting stack:
Offset (relative to %ebp)   Contents
12                          yp
8                           xp
4                           Rtn adr
0                           Old %ebp   <- %ebp
-4                          Old %ebx   <- %esp

movl 12(%ebp),%ecx   # get yp
movl 8(%ebp),%edx    # get xp
. . .                # Body
swap Finish #1
swap’s stack before and after movl -4(%ebp),%ebx (the stack itself is unchanged; %ebx is reloaded from offset -4):
Offset   Contents
12       yp
8        xp
4        Rtn adr
0        Old %ebp   <- %ebp
-4       Old %ebx   <- %esp

movl -4(%ebp),%ebx   <- this step
movl %ebp,%esp
popl %ebp
ret
swap Finish #2
swap’s stack before movl %ebp,%esp: %ebp at offset 0 (Old %ebp), %esp at offset -4 (Old %ebx)
swap’s stack after movl %ebp,%esp: %ebp and %esp both at offset 0 (Old %ebp)
Offset   Contents
12       yp
8        xp
4        Rtn adr
0        Old %ebp

movl -4(%ebp),%ebx
movl %ebp,%esp       <- this step
popl %ebp
ret
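swap Finish #3
swap’s stack before popl %ebp: %ebp and %esp both at offset 0 (Old %ebp)
swap’s stack after popl %ebp: %ebp points back into the caller's frame, %esp at offset 4 (Rtn adr)
Offset   Contents
12       yp
8        xp
4        Rtn adr

movl -4(%ebp),%ebx
movl %ebp,%esp
popl %ebp            <- this step
ret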
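swap Finish #4
swap’s stack before ret: ..., yp (offset 12), xp (offset 8), Rtn adr (offset 4) <- %esp; %ebp points into the caller's frame
Exiting stack after ret: ..., &zip2, &zip1 <- %esp; %ebp points into the caller's frame

movl -4(%ebp),%ebx
movl %ebp,%esp
popl %ebp
ret                  <- this step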
swap
void swap(int *xp, int *yp)
{
  int t0 = *xp;
  int t1 = *yp;
  *xp = t1;
  *yp = t0;
}

swap:
  # Setup
  pushl %ebp            # Save old %ebp of caller frame
  movl %esp,%ebp        # Set new %ebp for callee (current) frame
  pushl %ebx            # Save state of %ebx register from caller
  # Body
  movl 12(%ebp),%ecx    # Retrieve parameter yp from caller frame
  movl 8(%ebp),%edx     # Retrieve parameter xp from caller frame
  movl (%ecx),%eax      # Perform swap
  movl (%edx),%ebx
  movl %eax,(%edx)
  movl %ebx,(%ecx)
  # Finish
  movl -4(%ebp),%ebx    # Restore the state of caller’s %ebx register
  movl %ebp,%esp        # Set stack pointer to bottom of callee frame (%ebp)
  popl %ebp             # Restore %ebp to original state
  ret                   # Pop return address from stack to %eip

The movl %ebp,%esp / popl %ebp pair is equivalent to a single leave instruction


Local variables
Where are they in relation to ebp?
 Stored "above" %ebp (at lower addresses)
 e.g. -8(%ebp) , -12(%ebp) etc.
How are they preserved if the current function
calls another function?
 Compiler updates %esp beyond local variables
before issuing "call"
 e.g. subl $64,%esp
What happens to them when the current function
returns?
 Are lost (i.e. no longer valid)
Register Saving Conventions
Can Registers be Used for Temporary
Storage?
Conventions
 “Caller Save”
 Caller saves temporary in its frame before calling
 “Callee Save”
 Callee saves temporary in its frame before using
IA32 Register Usage
Integer Registers
 Two have special uses
 %ebp, %esp
 Three managed as callee-save
 %ebx, %esi, %edi
 Old values saved on stack prior to using
 Three managed as caller-save
 %eax, %edx, %ecx
 Do what you please, but expect any callee to do so, as well
 Return value in %eax
[Figure: caller-save temporaries %eax, %edx, %ecx; callee-save temporaries %ebx, %esi, %edi; special %esp, %ebp]
Function pointers
Pointers in C can also point to code locations
 Function pointers
 Store and pass references to code
Some uses
 Dynamic “late-binding” of functions
 Dynamically “set” a random number generator
 Replace large switch statements for implementing dynamic event
handlers
 Example: dynamically setting behavior of GUI buttons
 Emulating “virtual functions” and polymorphism from OOP
 qsort() with user-supplied callback function for comparison
 man qsort
 Operating on lists of elements
 multiplication, addition, min/max, etc.

Malware leverages this to execute its own code


Function pointers example

#include <sys/time.h>
#include <stdio.h>
void fp1(int i) { printf("Even %d\n",i); }
void fp2(int i) { printf("Odd %d\n",i); }

int main(int argc, char **argv) {
  void (*fp)(int);
  int i = argc;
  if (argc%2)
    fp=fp2;
  else
    fp=fp1;
  fp(i);
}

main:
  leal 4(%esp), %ecx
  andl $-16, %esp
  pushl -4(%ecx)
  pushl %ebp
  movl %esp, %ebp
  pushl %ecx
  subl $4, %esp
  movl (%ecx), %eax
  movl $fp2, %edx
  testb $1, %al
  jne .L4
  movl $fp1, %edx
.L4:
  movl %eax, (%esp)
  call *%edx
  addl $4, %esp
  popl %ecx
  popl %ebp
  leal -4(%ecx), %esp
  ret

mashimaro % ./funcp a
Even 2
mashimaro % ./funcp a b
Odd 3
mashimaro %
Uses in operating system
Interrupt descriptor table
 Pointers to interrupt handler functions
 IDTR x86 register points to IDT
System services descriptor table
 Pointers to system call functions
Import address table
 Pointers to imported library calls
Malware attacks all of these
Disassembly
Calling convention quirks
 Multiple C conventions
 fastcall vs. stdcall vs. cdecl
 Differ in caller-save/callee-save, parameter passing
 Shortcuts
 Omission of ebp for simple calls
 C++
 Use of ecx as this pointer
 Use of vtables to implement methods (virtual function table)
 Function prologue for frame setup varies
 Dummy bytes in WinXP SP2 to support hot patching (detours)
 Windows structured exception handling (FS register)
 Linked list of functions stored in exception frames on stack for error
handling
Windows disassembly
Largely the same with small modifications
 Size of operands (i.e. dword) specified (not in
operator suffix)
 Reverse ordering of operands
Windows disassembly example

for(int i=0;i<5;i++)
{
  printf("Hello");
}

0000 mov ecx, 5
0003 push aHello
0009 call printf
000E loop 00000003h
0014 ...

if(x == 256)
{
  printf("Yes");
}
else
{
  printf("No");
}

0000 cmp ecx, 100h
0003 jnz 001Bh
0009 push aYes
000F call printf
0015 jmp 0027h
001B push aNo
0021 call printf
0027 ...
Chapter 5: IDA Pro
Tools for disassembling
IDA Pro, IDA Pro Free
 Disassembler
 Execution graph
 Cross-referencing
 Searching
 Function analysis
 Function and variable labeling
Tools for disassembling
objdump
 objdump -d <object_file>
 Analyzes bit pattern of series of instructions
 Produces approximate rendition of assembly code
 Can be run on either executable or relocatable (.o) file
radare2
 Open-source IDA alternative for Linux
 r2 on linuxlab machines
 If you find a need to learn it...
 https://github.com/pwntester/cheatsheets/blob/master/radare2.md
 https://zachgrace.com/cheat_sheets/radare2.html
hopper ($)
Tools for disassembling
gdb Debugger
 gdb a.out
 Disassemble procedure
 layout asm
 disass main
 x/13b main
 Examine the 13 bytes starting at main
IDA Pro walkthrough
IDA Pro intro
https://vimeo.com/203657826
In-class exercise
Lab 5-1, Lab 6-1, Lab 6-2
Chapter 7: Analyzing Malicious
Windows Programs
Types
Hungarian notation
 Type prefix precedes name
 word (w) = 16 bit value
 double word (dw) = dword = 32 bit value
 dwSize = A variable that holds a 32-bit value
 Handles (H)
 A reference to an object (HMODULE, HKEY)
 HWND = A handle to a window
 Long Pointer (LP)
 LPBYTE = Long pointer to a byte
 Callback
 e.g. InternetSetStatusCallback registers a function to be called if Internet connectivity changes
File system functions
Malware often hits file system
 CreateFile, ReadFile, WriteFile
 Memory mapping calls: CreateFileMapping,
MapViewOfFile
 Trickiness
 Alternate Data Streams (special file data that does not
show up in directory listing)
 \Device\PhysicalMemory (accesses memory directly
for clandestine storage of code/data)
 Access eventually shut off from user space, but can still access
via rogue drivers from kernel space
 \\.\ (accesses devices via Win32 device namespace
e.g. \\.\PhysicalDisk1)
Registry functions
Registry stores OS and program configuration
information
 HKEY_LOCAL_MACHINE (HKLM) – Settings
global to the machine
 HKEY_CURRENT_USER (HKCU) – Settings for
current user
 regedit tool for examining values
 Functions: RegOpenKeyEx, RegSetValueEx,
RegGetValue (Listing 7-1, p. 141-142)
Networking functions
Berkeley sockets API
 socket, bind, listen, accept,
connect, recv, send
 Listing 7-3, p. 144

WinINet API
 Bytes over HTTP instead of socket
 InternetOpen, InternetOpenURL,
InternetReadFile
Dynamically-linked libraries (DLLs)
Used in 3 ways by malware
 Store malicious code in standard DLL or
custom one
 Inject into a process via a LoadLibrary call
 Leverage standard Windows DLLs to interact
with OS
 Leverage third-party DLLs (e.g. Firefox DLL) to
avoid re-implementing functions
Process functions
Execute code outside of current process
 CreateProcess
 Listing 7-4, p. 148
Hijack execution of current process
 Injecting code via debugger or DLLs
Kill processes
 (e.g. anti-virus, Zone Alarm, etc.)
Threading functions
Windows threads share same memory space but
have separate registers and stack
 Used by Malware to insert a malicious DLL into a
process's address space
 CreateThread with address of LoadLibrary as start
address
 Also used to remotely control a process
 Two threads created
 One takes network input and sends to process stdin (via
WriteFile)
 One takes process stdout (via ReadFile) and sends to
network
 Listing 7-6, 7-7, 7-8, p. 150-151
Service functions
Service processes run in the background
 Scheduled and run by Windows service
manager without user input
 Common calls
 OpenSCManager, CreateService, StartService
 Allows malware to maintain persistence
 Types

 WIN32_SHARE_PROCESS = service code lives in a DLL and runs in a process shared with other services (e.g. svchost.exe)
 WIN32_OWN_PROCESS = service runs as an independent process
 KERNEL_DRIVER = loads code into kernel
COM functions
Microsoft Component Object Model
 Interface standard that allows software components
to call each other
 OleInitialize, CoInitializeEx to begin use

Navigate function in IWebBrowser2 interface
 Used with CoCreateInstance to launch browser

 CLSID = class identifier, IID = interface identifier to

identify COM object


 Listing 7-11 (create), 7-12 (call), p. 155-156
COM
Example
 Malware implemented as COM server within web
browser via Browser Helper Objects
 Used to monitor traffic going through browser
without creating a separate process that can be
detected
 Can be detected via its need to export calls required by
COM servers
 DllCanUnloadNow, DllGetClassObject, DllInstall,
DllRegisterServer, DllUnregisterServer
 Browser Helper Objects are not supported in Microsoft Edge
 Win10 now allows you to blacklist OLE types in documents
Exception handling
Allow program to handle exceptional
conditions during program execution
 Windows Structured Exception Handling
 Exception handling information stored on stack
before function invocation
 Listing 7-13, p. 157
 Thrown to caller's frame if not handled
 Used by malware to hijack execution
 Handler address replaced by address to injected
malicious code
 Adversary then triggers exception
Kernel-mode malware
Windows API calls (kernel32.dll)
 Typically call into underlying Native API (ntdll.dll)
 Code in ntdll then transfers to kernel
(ntoskrnl.exe) via INT 0x2E, SYSENTER, SYSCALL
 Figure 7-3, p. 159
 Malware often calls ntdll directly to avoid detection
via interposition of security programs between
kernel32.dll and ntdll.dll
 Example: Windows API (ReadFile, WriteFile) versus
Native API (NtReadFile, NtWriteFile)
 Figure 7-4, p. 160
Kernel-mode malware
Other Native API calls

NtQuerySystemInformation,
NtQueryInformationProcess,
NtQueryInformationThread,
NtQueryInformationFile, NtQueryInformationKey
 Can also carry “Zw” prefix
 NtContinue
 Used to return from an exception
 Location to return is specified in exception context,
but can be modified to transfer execution in
nefarious ways
Kernel-mode malware
Legitimate programs typically do not use
NativeAPI exclusively
Programs that are native applications (as specified in the subsystem field of the PE header) are likely malicious
In-class exercise
Lab 7-2
Extra
Run-time data structures
Process functions
Kill anti-virus, zone-alarm, firewall processes
More code snippets
Registry modifications for disabling task manager and changing browser default page

Registry keys
HKEY_CURRENT_USER\Software\Policies\Microsoft\Internet Explorer\Control Panel
HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Policies\System
HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Main
HKEY_CURRENT_USER\Software\Yahoo\pager\View\YMSGR_buzz
HKEY_CURRENT_USER\Software\Yahoo\pager\View\YMSGR_Launchcast

Registry values
Homepage
DisableRegistryTools
Start Page
content url
DisableTaskMgr
More code snippets
New variants
 Download worm update files and register them as services
 regsvr32 MSINET.OCX
 Internet Transfer ActiveX Control

 Check for updates


Disassembly example

int main(int argc, char **argv)
{
  WSADATA wsa;
  SOCKET s;
  struct sockaddr_in name;
  unsigned char buf[256];

  // Initialize Winsock
  if(WSAStartup(MAKEWORD(1,1),&wsa))
    return 1;

  // Create Socket
  s = socket(AF_INET,SOCK_STREAM,0);
  if(INVALID_SOCKET == s)
    goto Error_Cleanup;

  name.sin_family = AF_INET;
  name.sin_port = htons(PORT_NUMBER);
  name.sin_addr.S_un.S_addr = htonl(INADDR_ANY);

  // Bind Socket To Local Port
  if(SOCKET_ERROR == bind(s,(struct sockaddr*)&name,sizeof(name)))
    goto Error_Cleanup;

  // Set Backlog parameters
  if(SOCKET_ERROR == listen(s,1))
    goto Error_Cleanup;

push ebp
mov ebp, esp
sub esp, 2A8h
lea eax, [ebp+0FFFFFE70h]
push eax
push 101h
call 4012BEh
test eax, eax
jz 401028h
mov eax, 1
jmp 40116Fh
push 0
push 1
push 2
call 4012B8h
mov dword ptr [ebp+0FFFFFE6Ch], eax
cmp dword ptr [ebp+0FFFFFE6Ch], byte 0FFh
jnz 401047h
jmp 401165h
mov word ptr [ebp+0FFFFFE5Ch], 2
push 800h
call 4012B2h
mov word ptr [ebp+0FFFFFE5Eh], ax
push 0
call 4012ACh
mov dword ptr [ebp+0FFFFFE60h], eax
push 10h
lea ecx, [ebp+0FFFFFE5Ch]
push ecx
mov edx, [ebp+0FFFFFE6Ch]
push edx
call 4012A6h
cmp eax, byte 0FFh
jnz 40108Dh
jmp 401165h
push 1
mov eax, [ebp+0FFFFFE6Ch]
push eax
call 4012A0h
cmp eax, byte 0FFh
jnz 4010A5h
jmp 401165h
