You are on page 1of 55

Software vulnerabilities:

memory corruption
Classifying vulnerabilities
• Memory corruption (safety)
– Buffer Overflow
– Integer Overflow
– Stack Overflow
– Use after free/ Double free
– Null pointer dereference
• Exploiting memory corruption to control program
– Format string attacks
– Stack buffer overflow (or stack-based buffer overflow)
– Heap buffer overflow (or heap-based buffer overflow)
– Overflowing on non “control-data” memory
• Resulting attacks
– Code injection on the stack (e.g., shellcodes)
– Code reuse techniques (Return-to-libc, Return-Oriented
Programming, Jump-Oriented Programming)
Buffer Overflows
• A buffer overflow occurs any time a program attempts
to store data beyond the boundaries of a buffer,
overwriting the adjacent memory locations
• Originates from mistakes done while writing code
– unfamiliarity with language
– Boundary or arithmetic errors
• Mostly C / C++ programs
• Addressed by languages with automatic memory
management: dynamic bounds checks (e.g., Java) or
automatic resizing of buffers (e.g., Perl)
– Still native libraries that are written in C (e.g., JNI)
Buffer Overflows: Basics

char src[]=“ABCDEFGHI”;
char tmp_buff[5];
int password_checked;

strcpy(tmp_buff, src);
Buffer Overflows: After strcpy()

char src[]=“ABCDEFGHI”;
char tmp_buff[5];
int password_checked;

strcpy(tmp_buff, src);
Buffer Overflows: usual suspects
• String manipulation functions that don't
properly check string length
– gets(), strcpy(), …
• Copy with incorrect parameters
– memcpy() …
• Incorrect computation of required memory
length
– zero-sized malloc()s …
Preventing Buffer Overflows
• Safe usage of copy primitives using libc
methods
– For instance, limit amount of data to copy
– strncpy(tmp_buff, src, sizeof(tmp_buf))
• Compiler extension if array length can be
computed
• Annotations (e.g. Deputy, CCured…)
Integer overflows
• Unsigned ints
– 32 bits: range from 0 to 2^32-1 (4,294,967,295)
– Overflow: 4,294,967,295 + 1 = 0
• Signed ints
– 32 bits: range from − (2^31) to 2^31 − 1 (2,147,483,647)
– Overflow: 2,147,483,647+2 = -2,147,483,648
• Some languages (Java, Ada) throw exceptions, many don't
• Attacker can supply large values used in :
– Computation of the size of a buffer allocation (malloc)
– Array access (in particular the bound checks)
• Solution: always check overflow before critical operation!
More about Memory Layout
• Text
– Also called code segment
• Data
– Global initialized data
• BSS
– Global uninitialized data
• Stack
– Local variables
– Also used to store function
environments and parameters
during calls (stack frames)
– LIFO
– Multi-threading : multiple stacks
• Heap
– Dynamically allocated variables
– Reserved through calloc() and
malloc()
Program Memory Stack

a,b, ptr

ptr points to
the memory
here
y

x
Order of the function arguments
in stack
Function Call Stack
void f(int a, int b)
{
int x;
}
void main()
{
f(1,2);
printf("hello world");
}
Stack Layout for Function Call Chain
main()

foo()

bar()
Stack Overflow
• Stack Overflows can be caused by:
– Recursive calls
– Reentrant interrupts
– Allocations on the stack
• Large
• Controlled by the attacker
• Alloca(), char array[function_parameter]
Stack Overflow
• This should never happen ?

• Detection : guard page (Unix)


– Problem when allocation is too large
• the stack pointer can “jump over” the guard page
• Prevention: Not so easy
– Abstract interpretation [Regehr05]
• Works as long as control flow can be determined statically
– Forbid / bound recursion
– Forbid / limit allocation on the stack
• Problem with micro-controllers
– No MMU (Memory Management Unit)
Stack Buffer Overflow / Overrun
• A special case of buffer overflow
– Exploitation of existing vulnerability in order to
modify control flow
• The overflowed buffer is allocated on the
stack, typically corrupts return address
• First wide-scale exploitation: Morris’ worm
(1989) in Unix’s fingerd
Vulnerable Program

● Reading 300 bytes of data from


badfile.

● Storing the file contents into a str


variable of size 400 bytes.

● Calling foo function with str as an


argument.

Note : Badfile is created by the user and


hence the contents are in control of the
user.
Vulnerable Program
Consequences of Buffer Overflow

Overwriting return address with some random address can


point to :

• Invalid instruction
• Non-existing address
• Access violation
• Attacker’s code Malicious code to gain access
How to Run Malicious Code
More on Modifying return address
• Not always easy to know the
exact address

• Trampolines :
– jmp %esp at a fixed address

• NOP sled
– A long sequence of NOP
– Followed by the shellcode
– Jumping anywhere in there
leads to the shellcode
Exploiting Stack Buffer Overflows
• Corrupting the Control Flow
– Return address in a stack frame
– Function pointer
– Exception handler
– Global Offset Table (GOT)
– C++ objects

• Corrupting the Program State (other stack frames)


– Variable that stores authentication status
– Modify stored password
– Pointer value
Protections: Canaries
• Objective : Detect unexpected
modifications of values on the stack (e.g.,
return address)
• Inserting a known value on the stack
– the “canary”
• Compiler extension - inserts code that:
– Adds a random value after the return
address
– Checks canary value before using return
address
• Limitations
– Canaries can be guessed or obtained
through memory leaks
– Canary copy needs to be stored in secure
place that might also be corrupted
Protections: Non Executable Memories
(Writable XOR Executable / NX)
• Executable and writable memories regions
– Allows to directly write instructions (in a buffer) and execute them,
e.g., from the stack.
– Makes stack-based buffer overflows easy to exploit

• W XOR X (A.k.a NX / XN / DEP / PAX )


– OS splits memory regions – no region is both writable and executable
• Most systems now support a form of W XOR X:
– Was complex on x86 (segmentation)
• HW support for it since ~2004
• Many names for the same feature :
– AMD : Enhanced Virus Protection,
– ARM : XN eXecute Never
• Defeated by Return-to-libc attacks / Return Oriented programming
Protections: ASLR (Address
Space Layout Randomization)
• Randomizes base addresses of memory segments
– Addresses change at every execution
• Randomize start or base address of:
– Program code
– Libraries code
– Heap/Stack/Data regions
• Objectives:
– Difficult to guess the stack address in the memory.
– Difficult to guess %ebp address and address of the malicious code
• Many programs have problems with that
– Sometimes rewriting part of them is necessary, notably when they contain
specially crafted assembly code
– Code needs to be position independent for program randomization
• Limitations
– Memory leaks used to learn memory layout
– Address space / system limitations (e.g., page boundaries) may allow brute-
force probes
Address Space Layout
Randomization
Address Space Layout
Randomization : Working

1
2

3
Evolution of machine code level
attacks
Code injection DEP/NX Code reuse

code code
R-X R-X

static data static data


R-X R--

dynamic data dynamic data


(stack/heap) (stack/heap)
RWX RW-
exec. code ctrl. data

Attacker Control flow Control Indirect


controlled vulnerability transfer use of data
Return-to-libc
• NX makes it impossible to inject our own code and
execute it.
– No memory regions that are write and execute
• Idea : Reuse existing code
– “Fortunately” libc loaded at a constant address
– Divert control flow of exploited program into libc code
– “Load” parameters on the stack
– No code injection required: Jump to a known address
• exec(), system(), printf()
• For example:
– Exec(“/bin/sh”)
Return-Oriented Programming (ROP)
• return-into-libc seems limited and easy to defeat
– Attacker cannot execute arbitrary code
– Attacker relies on contents of libc

• This perception is false: Return-Oriented


Programming & Jump-Oriented Programming
– A special case of return-into-libc
– Arbitrary attacker computation and behavior (given
any sufficiently large codebase to draw on)
ROP: Approach
• Most directly inspired by Borrowed code
chunks [Krahmer 2005]
– Find short sequences of instructions that allow to
perform some given operations
– Termed Gadgets
– “Chain” them together using “ret”

• JOP attack = use jmp instead of ret


Return-Oriented Programming
Stack Code Actions

0xb8800000:
0xb8800030 pop eax eax = 1
ret
0x00400000 ... ebx = 2
0xb8800010 0xb8800010: eax += ebx
0xb8800020 pop ebx
ret ebx = 0x400000
0x00000002 ...
*ebx = eax
0xb8800010 0xb8800020:
add eax, ebx
0x00000001
ret
0xb8800000 ...
esp
0xb8800030:
mov [ebx], eax
ret

6/4/2014 Vasilis Pappas - Columbia University 33


ROP: Approach
• A Turing complete set of gadgets allows to perform
arbitrary computation
– Exploits are not straight-line limited
– Showed to work on most architectures
– Equivalent to having a virtual machine/interpreter
• Calls no functions at all
– can’t be defeated by removing functions like system()
– Must know the memory map (no ASLR)
– Need to find interesting gadgets and to chain them in a
given order
• Specific compilers (e.g. ROPC)
– Automation techniques to find those sequences of code
– Satisfiability Modulo Theories (SMT) Solvers
ROP: consequences & protection
• Malicious code detection cannot be limited to
executable memory regions
– Return oriented rootkits / malicious code…
– Even non executable memories needs to be verified
• ROP defeated by ASLR
– chaining returns needs to know addresses in advance
• Blind ROP
– It is possible to learn where are the gadgets, brute
force and monitor side effects
– Stack learning overwrite a byte at a time and
bruteforce it.
Heap Buffer Overflows
• The heap is the pool of memory used for dynamic allocations at
runtime
– malloc() grabs memory on the heap
– free() releases memory on the heap
• Blocks of data are stored in a doubly linked list
typedef struct __HeapHdr__ {
struct __HeapHdr__ *next;
struct __HeapHdr__ *prev;
unsigned int size;
unsigned int used;
// Usable data area starts here
} HeapHdr_t;
Heap Buffer Overflows
• next/prev pointers are stored after the data
– Overflow: overwrite the prev/next pointers (headers)
• Freeing a chunk = update double linked list
– This allows one arbitrary write at an arbitrary address
(red is attacker controlled), e.g. function pointer
– FD = hdr -> next
– BK = hdr -> prev
– FD->prev = BK
– BK->next = FD
• Detection is simple:
– Test if ( hdr->prev-> next == hdr) otherwise attack!
– canaries
Heap Overflow Exploitation
• Direct attacks: modify function pointer
– Simple overflow to the pointer location
• Often indirect attacks on the stack return address
– Fill headers with the address of the return address on the stack
– The next malloc/free operation will modify the return address at
will
• Heap spraying:
– Exploits contiguous chunk placement (e.g., browser, PDF, Flash)
– Fill up an entire chunk with NOP sled + payload and spray it
repeatedly into the heap
• Can be very complex
– Need to predict heap layout, control program state
– Otherwise lead program in a state where it is exploitable
Software exploitation: the bigger
perspective
• Software Fault Injection
– Software built for one purpose, but attacker misuses the
software for another purpose
– Notably through specifically crafted inputs
– Any Turing machine can be exploited

• Hardware Fault Injection


– Don’t forget that software runs within hardware
– Perturbating the execution environment during code execution
(laser, power supply glitch, clock glitch)
– Cosmic/Gamma rays lead to random errors (bit flips)
– Particular memory access patterns lead to bit errors in DRAM
Race Conditions
• Parallel execution of tasks
– multi-process or multi-threaded environment
– multi-user
– tasks can interact with each other
• Three properties are necessary for a race condition to exist:
– Concurrency: There must be at least two control flows
executing concurrently.
– Shared Object: A shared race object must be accessed by both
of the concurrent flows.
– State Change: At least one of the control flows must alter the
state of the object of a race
• Results of tasks depend on the relative timing of events
– Non-deterministic behavior
Race Conditions: Basics
• Programmer views a set of operations as atomic
– In reality, atomicity is not enforced
– Scheduler can interrupt a process at any time
– Even more likely if there is a blocking system call
• Attacker can take advantage of this discrepancy
• Race condition vulnerabilities typically arise when:
– checking for a given privilege, and
– exercising that privilege
• Race conditions are eliminated by making conflicting
operations mutually exclusive
TOC(T)TOU: Time-Of-Check-
(To)-Time-Of-Use
• Check – Establish some precondition
(invariant), e.g., access permission
• Use – Operate on the object assuming that
the invariant is still valid
• Can occur in any concurrent system:
– shared memory (or address space)
– file system
– signals
Shared Memory
• Sharing of memory between tasks can lead to races
– Threads share the entire memory space
– Processes may share memory mapped regions
• Use synchronization primitives:
– locking, semaphores
– Java:
• synchronized classes and methods (Monitor model)
• Atomic types (java.util.concurrent.atomic.AtomicInteger, etc.)
• Avoid shared memory:
– use message-passing model
– still need to get the synchronization right!
Shared Memory Race: Example

Looks atomic (1 line of code!)


• It's not!

Simple race:
• 2 threads read count
• both write count+1
• missed 1 increment
UNIX File System Security
• Access control: user should only be able to
access a file if he has the permission to do so
• But what if user is running as setuid-root?
– E.g., a printing program is usually setuid-root in
order to access the printer device
• Runs “as if” the user had root privileges
– But a root user can access any file!
– How does the printing program know that the user
has the right to read (and print) any given file?
• UNIX has a special access() system call
slide 45
Unix File System: Access/Open Race

fprintf(stderr, "Permission denied,


cannot open %s.\n", file);
Races on temporary files
• Similar issues as with regular files
– commonly opened in /tmp or /var/tmp
– creating files in /tmp requires no special permissions
– often guessable file name

• A possible attack:
– guess the tmp file name: "/tmp/tmp0001"
– ln -s /etc/target /tmp/tmp0001
– victim program will create file /etc/target for you,
when it tries to create the temporary file!
– if first guess doesn't work, try 1 million times
Races on temporary files
Unix File System: Script Execve Race
• Filename redirection
– soft links again
• Setuid Scripts
– execve() system call invokes seteuid() call prior to executing
program
– A: program is a script, so command interpreter is loaded first
– B: program interpreter (with root privileges) is invoked on script
name
• attacker can replace script content between step A and B
• Setuid not allowed on scripts on most platforms!
– Some work-arounds
Unix File System: Script Execve Race
Threaded programs: Use-after-free
• Thread #3 gives access to protected resources
Window of Vulnerability
• Window of vulnerability can be very short
– race condition problems are difficult to find with testing
– difficult to reproduce and debug
• Myths about race conditions
– "races are hard to exploit”
– "races cannot be exploited reliably"
– "only 1 chance in 10000 that the attack will work!"
• Attackers can often find ways to beat the odds!
– Repeated attempts
– Attacker can try to slow down the victim machine/process to
improve the odds (high load, computational complexity)
– Attacker can run the attack many times in parallel to increase
the probability that the attacking process will be scheduled by
the processor at the right moment
Slow file lookups
• Deeply nested directory structure:
– d/d/d/d/d/d/d/....../d/file.txt

• To resolve this file name, the OS must:


– look for directory named d in current working directory
– look for directory named d in that directory
– ...
– look for file named file.text in final directory

• Limit to length of a file name:


– MAXPATHLENGTH (4096 on Linux)
– Max depth of ~2000
Making It Slower: File System Maze
• Combine deeply nested directory structure with
chain of symbolic links
– MAXPATHLENGTH limits length of file parameter to a
single system call (e.g, open, access)
– But parts of a file name can themselves be links
– Length of link chain limited by kernel parameter
• 40 on Linux box
• Total file system lookups:
– follow 40 chains...
– ...each with 2000 nested directories
– 80000 lookups!
File System Maze
Prevention and Detection
• Prevention: many solutions depending on actual race
– OS specific solutions: ID or filename related
– Forking: delegate operations to separate process with EUID
(effective UID)
– Locking: suppress race, but slows down process
– Hardness amplification: Reduce success probability of
attacker (k-races, pseudo-atomic transactions)
• Detection:
– Static analysis with pattern matching
– Static analysis with model checking (MOPS, RacerX,
rccjava)
– Dynamic Analysis (Eraser)

You might also like