
Secure Coding

Lecture 2: Injections and Buffer Overflows
Benny Pinkas

Injections
OWASP Top 10
A1 – Injection
A2 – Broken Authentication and Session Management
A3 – Cross-Site Scripting (XSS)
A4 – Insecure Direct Object References
A5 – Security Misconfiguration
A6 – Sensitive Data Exposure
A7 – Missing Function Level Access Control
A8 – Cross-Site Request Forgery (CSRF)
A9 – Using Known Vulnerable Components
A10 – Unvalidated Redirects and Forwards
(Source: OWASP Top 10 Web Application Security Risks, 2013-2018)
Injections
• Injection
– Injection flaws, such as SQL, OS, and LDAP injection,
occur when untrusted data is sent to an interpreter
as part of a command or query.
– The attacker's hostile data can trick the interpreter
into executing unintended commands or accessing
data without proper authorization.
Injections
• Attackers
– Anyone who can send untrusted data to the
system, including external users, internal users,
and administrators.
• How? (easy)
– Attacker sends simple text-based attacks that
exploit the syntax of the targeted interpreter.
– The application sends this untrusted data to an
interpreter.
SQL Injections
SQL
• SQL (Structured Query Language)
– a computer language used to create, retrieve,
update and delete data from relational database
management systems.

– SQL has been standardized by both ANSI and ISO.


SQL Basics
• Tables are composed of rows and columns
EmployeeID  Position  Salary  Benefits
10          Manager   60000   15000
20          Manager   75000   12000
30          Staff     42000   10000
40          Staff     50000   12000

• SELECT Employeeid,Position FROM Employeetable WHERE Salary>=50000;
SQL database queries with PHP
(the wrong way)
• Sample PHP
$recipient = $_POST['recipient'];    // gets input from user
$sql = "SELECT PersonID FROM Person WHERE Username='$recipient'";
$rs = $db->executeQuery($sql);
• Problem
– What if ‘recipient’ is a malicious string that
changes the meaning of the query?
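The problem can be made concrete with a minimal C sketch that builds the same query string by pasting raw input between quotes. The function name build_query_unsafe and the sample inputs are assumptions made for illustration; the sketch only constructs the string and does not talk to a database.

```c
#include <stdio.h>
#include <stddef.h>

/* Builds the query the same way the PHP sample does: the raw input is
 * pasted between two quote characters. Returns the number of
 * characters written, or -1 if the query did not fit in `out`.
 * (Hypothetical helper, for illustration only.) */
int build_query_unsafe(char *out, size_t out_size, const char *recipient)
{
    int n = snprintf(out, out_size,
                     "SELECT PersonID FROM Person WHERE Username='%s'",
                     recipient);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```

With the input bob the result is ... WHERE Username='bob'. With the input x' OR '1'='1 the result is ... WHERE Username='x' OR '1'='1', so the quote supplied by the user ends the string early and the rest of the input becomes part of the SQL condition.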
Example: buggy login page (ASP)

set ok = execute( "SELECT * FROM Users
        WHERE user='" & form("user") & "'
        AND pwd='" & form("pwd") & "'" );

if not ok.EOF
    login success
else fail;
What does this code do? Is this exploitable?

Slide taken from John Mitchell


[Figure: Normal Query. The user enters a username and password in the web browser (client); the web server sends the query SELECT * FROM Users WHERE user='me' AND pwd='1234' to the DB.]

Slide taken from John Mitchell


Bad input
• Suppose user = " ' or 1=1 -- " (URL encoded)
• Then the script does:
ok = execute( SELECT …
WHERE user= ' ' or 1=1 -- … )
– The "--" causes the rest of the line to be ignored.
– Now ok.EOF is always false and login succeeds.
• The bad news: easy login to many sites this way.

Slide taken from John Mitchell


Even worse
• Suppose user =
" ' ; DROP TABLE Users -- "
• Then the script does:
ok = execute( SELECT …
WHERE user= ' ' ; DROP TABLE Users … )
• Deletes user table
– Similarly: attacker can add users, reset pwds, etc.

Slide taken from John Mitchell


Even worse …
• Suppose user =
' ; exec cmdshell 'net user badguy badpwd' / ADD --
• Then the script does:
ok = execute( SELECT …
WHERE username= ' ' ; exec … )
• If the SQL server context runs as admin, the attacker gets an
account on the DB server.

Slide taken from John Mitchell


Little Bobby Tables lives in Poland
SQL injection against speed traps
CardSystems Attack
• CardSystems
– credit card payment processing company
– SQL injection attack in June 2005
– put out of business

• The Attack
– 263,000 credit card #s stolen from database
– credit card #s stored unencrypted
– 43 million credit card #s exposed
Slide taken from John Mitchell
Blind SQL Injection
• Blind SQL injection asks the database true or false
questions and determines the answer based on the
response.
• This attack is often used when the web application is
configured to show only generic error messages, but
has not mitigated the code that is vulnerable to SQL
injection.
• Blind SQL injection works even when the database
does not output data to the web page.
• The attacker steals data by asking the database a
series of true or false questions.
Blind SQL Injection
1  if ASCII(SUBSTRING(username,1,1)) = 64 waitfor delay '0:0:5'
2  if ASCII(SUBSTRING(username,1,1)) = 64 waitfor delay '0:0:5'

The delay occurs only when the guess matches. If the first letter of
the username is A (65), this guess of 64 returns immediately, while a
guess of 65 causes a 5 second delay.
Blind SQL Injection
1  if ASCII(SUBSTRING(username,1,1)) = 65 waitfor delay '0:0:5'
2  if ASCII(SUBSTRING(username,1,1)) = 65 waitfor delay '0:0:5'

By timing responses, the attacker learns
about the database one letter at a time.
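The letter-by-letter process can be sketched as a loop over true/false questions. In this rough C model the function delayed() is a hypothetical stand-in for one timed request: it returns true exactly when the injected condition holds, which a real attacker would observe as the 5 second delay. The secret is passed in only so that the simulation has something to answer from; no database or network is involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one timed request: true when
 * ASCII(SUBSTRING(username,pos,1)) = guess holds, i.e. when the
 * response would be delayed. `pos` is 1-based, as in SQL. */
static bool delayed(const char *secret, size_t pos, int guess)
{
    return (unsigned char)secret[pos - 1] == guess;
}

/* Recovers up to out_size-1 characters of `secret`, one true/false
 * question at a time. Returns the number of characters recovered. */
size_t extract_blind(const char *secret, size_t secret_len,
                     char *out, size_t out_size)
{
    size_t n = 0;
    if (out_size == 0)
        return 0;
    for (size_t pos = 1; pos <= secret_len && n + 1 < out_size; pos++) {
        int found = -1;
        for (int guess = 32; guess < 127; guess++) {  /* printable ASCII */
            if (delayed(secret, pos, guess)) {
                found = guess;
                break;
            }
        }
        if (found < 0)
            break;          /* character outside the tested range */
        out[n++] = (char)found;
    }
    out[n] = '\0';
    return n;
}
```

Each character costs at most 95 questions in this sketch; a binary search over the character code (asking "greater than" instead of "equal to") would need about 7.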
http://bobby-tables.com
• A lot of information on how to prevent SQL
injections in many languages
Preventing SQL Injections
Two approaches
• A patch
– Sanitizing (or “escaping”) user input.
– Much easier to deploy. Need only filter user input without
changing how queries to the database are written.
– Very hard to do right.
• A thorough solution
– Prepared statements (aka parameterized queries, or stored
procedures).
– Requires rewriting the code that accesses the SQL database.
Preventing SQL Injections: escaping
• Escaping is used in programming to allow special characters
(like ", ', %, \, etc.) in strings so that they can be part of a
string, and are not misinterpreted as something else.
– For example: echo 'Lunch break - It\'s Great!'; prints
Lunch break - It's Great!
• We can use this method to preprocess the SQL input, and
escape all special characters in it, such as quotes.
• Therefore the quotes in the input will not terminate the
command, but will be part of the string that is searched for.
Preventing SQL Injections: escaping
• Just escaping quotes does not help. For example:
– Quotes can be legitimate inputs (Mark O'Leary, ז'וז'ו).
– In MySQL a quote ' can be escaped as '' or as \'
– Many other ways
– Suppose that the programmer decides to look for quotes in the
input, and add another quote before them.
– The attacker uses the input: \'; DROP TABLE users;
– Escaping changes it to \''; DROP TABLE users;
– MySQL runs SELECT * FROM customers WHERE name = '\''; DROP TABLE
users; ';
– '\'' is interpreted as a string containing a single quote: \' is an
escaped quote, and the quote after it closes the string. Everything
that follows is run as a command chosen by the attacker.
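The failure of the naive filter can be reproduced with a short C sketch. The function escape_quotes_naive is a hypothetical implementation of the rule "add another quote before every quote", written only to show what the attacker's input turns into; it is not a recommended sanitizer.

```c
#include <stddef.h>

/* The naive filter described above: put an extra quote in front of
 * every quote in the input and leave everything else (including
 * backslashes) untouched. Returns 0 on success, -1 if `out` is too
 * small. (Hypothetical helper, for illustration only.) */
int escape_quotes_naive(const char *in, char *out, size_t out_size)
{
    size_t j = 0;
    for (size_t i = 0; in[i] != '\0'; i++) {
        size_t need = (in[i] == '\'') ? 2 : 1;
        if (j + need >= out_size)   /* keep room for the terminator */
            return -1;
        if (in[i] == '\'')
            out[j++] = '\'';
        out[j++] = in[i];
    }
    if (j >= out_size)
        return -1;
    out[j] = '\0';
    return 0;
}
```

A legitimate name such as O'Leary becomes O''Leary, as intended. The input \'; DROP TABLE users; becomes \''; DROP TABLE users;, where the attacker's backslash consumes the first quote and the quote added by the filter closes the string.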
Preventing SQL Injections: parameterized queries
• Never build SQL commands yourself!
– Prepared statements (aka parameterized queries, or
stored procedures) force the developer to first define
all SQL code, and then pass each parameter to the
query.
– This coding style allows the database to distinguish
between code and data, regardless of what user input
is supplied.
Preventing SQL Injections: parameterized queries
• Instead of the old way of parsing a string (containing SQL
code and user input) at run time, prepared statements
(parameterized queries) have the programmer define all SQL
code. User input is only used as a parameter to queries.
• The parameters can only store a value of the given type
and not an arbitrary SQL fragment. Therefore an SQL
injection would simply be treated as a strange (and
probably invalid) parameter value.
Preventing SQL Injections: parameterized queries
• An attacker is unable to change the intent of a query.
– In the example below, if an attacker enters a NAME which is
equal to tom' or '1'='1, the parameterized query would instead
look for a name which literally matches the entire string
tom' or '1'='1.

"SELECT * FROM Widget WHERE Name = @NAME"

The name is passed as a parameter to the SQL command, as a
data object.
command.Parameters.Add("@NAME", SqlDbType.Text).Value = name
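The idea that a parameter is only ever compared as data can be illustrated with a toy C model. This is not a database API: the in-memory table widget_names and the function count_widgets_named are assumptions invented for the illustration, and a real program would use the prepared-statement interface of its database library instead.

```c
#include <stddef.h>
#include <string.h>

/* Toy in-memory stand-in for the Widget table (hypothetical data). */
static const char *const widget_names[] = { "tom", "anna", "li" };

/* Models WHERE Name = @NAME: the parameter is compared byte for byte
 * with each stored name and is never parsed as SQL.
 * Returns the number of matching rows. */
size_t count_widgets_named(const char *name_param)
{
    size_t matches = 0;
    size_t rows = sizeof widget_names / sizeof widget_names[0];
    for (size_t i = 0; i < rows; i++) {
        if (strcmp(widget_names[i], name_param) == 0)
            matches++;
    }
    return matches;
}
```

In this model the value tom matches one row, while tom' or '1'='1 matches none, because no stored name is literally equal to that string.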
Preventing SQL injections
• It is hard to do escaping well
• It is better to use parameterized queries

• To minimize the potential damage of a successful SQL
injection attack, minimize the privileges assigned to every
database account.
• Do not assign admin type access rights to application
accounts.
• Accounts that only need read access should only be granted
read access to the tables, or portions of tables, that they need
access to.
Buffer overflows
Buffer overflows
• A buffer overflow occurs when
– information is written into the memory allocated to a
variable in the stack
– but the size of this information exceeds what was
allocated at compile time.

Buffer overflows
• Extremely common bug.
– First major exploit: 1988 Internet Worm. (exploited Unix fingerd).

• 15 years later: about 50% of all CERT advisories:
– 1998: 9 out of 13
– 2001: 14 out of 37
– 2003: 13 out of 28
• Often lead to total compromise of host.

• Developing buffer overflow attacks:
– Locate buffer overflow within an application
– Design an exploit
What is needed
• Understanding C functions and the stack.
• Some familiarity with machine code.
• Know how system calls are made.

• Attacker needs to know which CPU and OS are running
on the target machine.
– Our examples are for x86 running Linux.
– Details vary slightly between CPUs and OSs:
• Little endian vs. big endian (x86 vs. Motorola)
• Stack Frame structure (Linux vs. Windows)
• Stack growth direction.
How do computers work?
• The processor executes instructions (add, sub, jump,
etc.)
• How a process runs
– Instructions are loaded into memory, then fetched and
executed one-by-one by the processor.
– An instruction pointer points to the instruction that is
currently executed.
– Each process thinks that it has 4 GB (2^32 bytes) of virtual
memory (even though the physical memory may be used by other
processes)
Memory organization
• Process memory is divided into three regions:
– The text region is fixed by the program and includes code
(instructions) and read-only data. It is normally marked
read-only and cannot be written to.
– The data region contains static variables.
– The stack is an abstract data type: the last object placed on
the stack will be the first object removed (LIFO, operated on
using PUSH and POP instructions)
[Figure: memory layout, from higher memory addresses to lower memory addresses: stack, data, text.]
Stack usage
• Modern computers are designed for high-level
languages, where programs use procedures or
functions.
– Computation is sequential: each invocation of a procedure
is initiated by a specific procedure.
– A procedure call alters the flow of control like a jump
instruction
– Unlike a jump, when finished performing its task, a
function returns control to the instruction following the
call.
– This high-level abstraction is implemented using the stack.
– The stack is also used to dynamically allocate the local
variables, pass parameters to functions, and return values.
X86 (Intel) calling convention
void f() {
    int v1, v2;
    g(7, 8, v1);
}
void g(int a1, int a2, int a3) {
    char buf[16];
    …
}
• The instruction pointer, eip, points to the current
instruction.
• When calling g(), we must remember the address from
which the call was made.
• In current processors, this address is saved in the stack.
• The information that is stored when calling a function
is called the Activation Record (AR).
The Stack Region
• In X86 the stack grows downwards (from high
addresses to lower ones).
• A special register, esp, stores the last address that is
used by the stack.
• All data in locations >= esp belongs to the stack.

• (In HP-RISC the stack grows upwards. This does not
add a lot in terms of security.)
A comment about 64 bit
• The stack stores return addresses and local variables.
• In 32 bit x86, it also stores the arguments sent to
functions.
• In 64 bit x86, the arguments are sent in special
registers. This complicates the attacks that we will
show, but does not prevent them (since the
parameters must somehow be delivered to memory).

x86 architecture
• Any instruction can write to memory.
• Instruction pointer eip (32 bit long)
• 8 general purpose registers,
eax,ebx,ecx,edx,esi,edi,esp,ebp.
• eax contains returned value from function calls
• All other registers must be recovered after
returning from a call.

Calling a function: arguments (parameters)

• The calling function must put the arguments
in the stack before the call. Possible orders of
pushing these arguments:
– a1, a2, a3
– a3, a2, a1

– The popular order in Unix is a3,a2,a1. Namely, the
arguments are pushed in reverse order.
– This is harder for the calling function.

– This order is easier for the called function, since
the first argument is closest to esp, etc.
– Also, if the number of arguments can vary, as in printf, the
called function reads the first arg, and identifies how many
arguments there are.
Calling a function: eip
• The address of the instruction following the
call is pushed to the stack.
• The eip is set to the first instruction in the
called function.

• (When returning from the call, eip will be set
to the value stored in the stack.)
Calling a function: local variables
• g() defines a local array of 16 chars.
• The compiler assigns this memory location on
the stack.
• Namely sub $16, %esp

– (Convention: $ refers to a constant, % refers to a register.)

Returning from a function
• The calling function, f, must remove the three
arguments it put on the stack.
– I.e., run the instruction add $12, %esp
• The called function, g, might have put many
items on the stack.
– In order to save on bookkeeping, when called it
sets ebp←esp, and before a return sets esp←ebp.
– Using ebp like this is popular, since arguments are
at positive offsets from ebp, and local vars are
at negative offsets from ebp.
Returning from a function
• In other words:
• When g begins to run (“procedure prolog”)
– push %ebp (to save the current value of ebp)
– mov %esp, %ebp
– sub $16, %esp (for a local var of length 16)
• Before g returns it does (leave cmd, “proc epilog”)
– mov %ebp, %esp (restore stack head)
– pop %ebp
– ret

A simple example
C program:
void function(int a, int b, int c)
{
    char buffer1[5];
    char buffer2[10];
}
void main() {
    function(1,2,3);
}

assembler code of main:
push $3
push $2
push $1
call function
(the call command pushes the return address onto the stack)

assembler code of function:
push %ebp
mov %esp,%ebp
sub $20,%esp

Memory is referenced in words (4 bytes). The length of each
variable is rounded up to a whole number of words.
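The constant 20 in sub $20,%esp follows from this rounding, as the short C sketch below works out. The helper names are made up for the illustration, and the arithmetic assumes the simple word-rounding rule stated on the slide; a modern compiler may choose different padding and alignment.

```c
#include <stddef.h>

#define WORD_SIZE 4u   /* bytes per word on 32-bit x86 */

/* Rounds a variable's size up to a whole number of words. */
size_t round_up_to_word(size_t bytes)
{
    return (bytes + WORD_SIZE - 1) / WORD_SIZE * WORD_SIZE;
}

/* Stack space reserved for buffer1[5] and buffer2[10]:
 * 5 rounds up to 8, 10 rounds up to 12, and 8 + 12 = 20. */
size_t frame_bytes_for_example(void)
{
    return round_up_to_word(5) + round_up_to_word(10);
}
```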
Stack contents after function call

bottom of                                                top of
memory                                                   memory

          buffer2      buffer1    ebp    ret    a     b     c
<---    [          ][          ][     ][     ][    ][    ][    ]

(ebp: previous frame pointer)
Example: Buffer overflow causes segmentation fault

void function(char *str) {
    char buffer[16];
    strcpy(buffer,str);
}

void main() {
    char large_string[256];
    int i;
    for(i=0;i<255;i++)
        large_string[i] = 'A';
    large_string[255] = '\0';
    function(large_string);
}

• strcpy() is copying the contents of large_string[] into
buffer[].
• The 240 bytes that follow buffer[] in the stack are
overwritten, almost all of them with 'A' = 0x41.
• In particular, the return address is changed. It will be
0x41414141, which is illegal.
• The result is a segmentation fault.
Stack contents after function call

bottom of                                     top of
memory                                        memory

          buffer       ebp    ret    large string
<---    [16 bytes  ][      ][     ][              ]

(ebp: previous frame pointer)
Example: buffer overflow changes execution

void function(int a, int b, int c) {
    char buffer1[5];
    char buffer2[10];
    int *ret;
    ret = (int *)(buffer1 + 12);
    (*ret) += 8;
}
void main() {
    int x;
    x = 0;
    function(1,2,3);
    x = 1;
    printf("%d\n",x);
}

• The return address is 12 bytes higher than buffer1.
• The line (*ret) += 8; adds two words to the return
address.
• The line x = 1; in main() is therefore not executed.
Another example
• Suppose a Web server contains this function
void func(char *str) {
    char buf[126];      // Allocate local buffer (126 bytes reserved on stack)
    strcpy(buf,str);    // Copy argument into local buffer
}
• When this function is invoked, a new frame with local variables
is pushed onto the stack

<--- Stack grows this way
[ buf ][ ebp ][ ret addr ][ str ][ Frame of the calling function ]   Top of stack
(buf: local variables; ebp: pointer to previous frame; str: arguments)
What If Buffer is Overstuffed?
• Memory pointed to by str is copied onto stack…
void func(char *str) {
    char buf[126];
    strcpy(buf,str);    // strcpy does NOT check whether the string
                        // at *str contains more than 126 characters
}
• If a string longer than 126 bytes is copied into buffer, it will
overwrite adjacent stack locations

[ buf ][ overflow ][ str ][ Frame of the calling function ]   Top of stack
(the overflow covers the position of the return address: this will be
interpreted as return address!)
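For contrast, here is a minimal sketch of the same function with an explicit length check before the copy. The names copy_checked and func_checked are hypothetical, and rejecting oversized input is only one possible policy (truncating is another).

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if it fits, including the terminating
 * null byte. Returns 0 on success, -1 if src is too long (dst is
 * then left as an empty string). */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return -1;
    size_t len = strlen(src);
    if (len >= dst_size) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);
    return 0;
}

/* The function from the slide, rewritten to reject oversized input
 * instead of overwriting adjacent stack locations. */
int func_checked(const char *str)
{
    char buf[126];
    return copy_checked(buf, sizeof buf, str);
}
```

A 299-character string passed to func_checked is refused with -1, and nothing beyond buf is written.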

Executing Attack Code
• Suppose attacker manages to set buffer to contain attacker-created
string
– For example, *str contains a string received from the network as input to
some network service daemon

[ code ][ ret ][ str ][ Frame of the calling function ]   Top of stack
– code: attacker puts actual assembly instructions into his input string,
e.g., binary code of execve("/bin/sh")
– ret: in the overflow, a pointer back into the buffer appears in the
location where the system expects to find return address

• When function exits, code in the buffer will be executed
Buffer Overflow Issues
• Executable attack code is stored on stack, inside the
buffer containing attacker’s string

• Overflow portion of the buffer must contain correct
address of attack code in the RET position
– The value in the RET position must point to the beginning of
attack assembly code in the buffer
• Otherwise application will crash with segmentation
violation
– Attacker must correctly guess in which stack position his
buffer will be when the function is called.
How can we find the right code for an attack?
• The code will change the return address to
run the shell.
• We can place the code we want to run in the
buffer.
– Write C code
– Compile it to assembly, find the important
part
– Handle errors to exit gracefully
– Use JMP and CALL to use indirect addressing:
no need to guess what the actual memory
address is.
– Make sure there are no null bytes in the
code (strcpy).
– The result is only 46 bytes long!

#include <stdio.h>
#include <unistd.h>
void main() {
    char *name[2];
    name[0] = "/bin/sh";
    name[1] = NULL;
    execve(name[0], name, NULL);
}
Shellcode
• The attacker’s code must not contain any null
byte (or otherwise strcpy terminates since this is
an end of string)
• “Shellcode”: code that does not contain null
bytes, or is only alphanumeric, or is even
legitimate English sentences.
– Was developed and can be downloaded on the net.
– Can be rather small (and then open a port and
download additional code).

Which return address to use?
• Problem: attacker must find at what address the buffer
(and thus our code) will be.
– For every program, the stack will start at the same address.
– Most programs push only a few KB into the stack.
– Can try to guess exact location, and write a program to go over
these guesses.
• NOP sled
– A simpler solution: pad the beginning of the buffer (before the
exploit program) with NOP operations.
– If the return address points inside the NOPs, the exploit runs.
– Success probability improves as the buffer size increases.
– Typically, a NOP sled of a few KB.
Another example
int bar (int val1) {
    int val2;
    foo (a_function_pointer);
}

int foo (void (*funcp)()) {
    char* ptr = point_to_an_array;
    char buf[128];
    gets (buf);
    strncpy(ptr, buf, 8);
    (*funcp)();
}

Stack layout (higher addresses first):
    val1
    val2
    arguments (funcp)
    return address        <- most popular target
    ebp
    pointer var (ptr)
    buffer (buf)
The stack grows downward and strings grow upward, so an overflow
of buf contaminates the memory above it.
Attack #1: Return Address
int foo (void (*funcp)()) {
    char* ptr = point_to_an_array;
    char buf[128];
    gets (buf);
    strncpy(ptr, buf, 8);
    (*funcp)();
}

① Change the return address to point to the attack code. After
the function returns, control is transferred to the attack code
② … or return-to-libc: use existing instructions in the code
segment such as system(), exec(), etc. as the attack code
[Figure: stack layout, higher addresses first: args (funcp), return address, ebp, pointer var (ptr), buffer (buf). Arrow ① points from the return address to the attack code. Arrow ② points to system() with "/bin/sh": set stack pointers to return to a dangerous library function.]
Attack #2: Two attacks on pointer Variables
int foo (void (*funcp)()) {
    char* ptr = point_to_an_array;
    char buf[128];
    gets (buf);
    strncpy(ptr, buf, 8);
    (*funcp)();
}

① Change a function pointer to point to the attack code
② Any memory (e.g. variable), even not in the stack, can be
modified by the statement that stores a value into the
compromised pointer
strncpy(ptr, buf, 8);
[Figure: stack layout, higher addresses first: args (funcp), return address, ebp, pointer var (ptr), buffer (buf). Label ① marks the function pointer and the attack code. Label ② marks the pointer var (ptr) and the Global Offset Table.]
Attack #3: Frame Pointer
int foo (void (*funcp)()) {
    char* ptr = point_to_an_array;
    char buf[128];
    gets (buf);
    strncpy(ptr, buf, 8);
    (*funcp)();
}

① Change the caller's saved frame pointer to point to
attacker-controlled memory. Caller's return address will be
read from this memory.
[Figure: stack layout, higher addresses first: the caller's return address and ebp, then args (funcp), return address, ebp, pointer var (ptr), buffer (buf), with the attack code shown alongside.]
Problems enabling this attack
• Stack grows in one direction (downward)
while strings and arrays grow in the other
direction (upward).
• No range checking: strcpy does not check
input size
– strcpy(buf, str) simply copies memory contents
into buf starting from *str until "\0" is
encountered, ignoring the size of area allocated to
buf.
Some unsafe C lib functions
• strcpy (char *dest, const char *src)
• strcat (char *dest, const char *src)
• sprintf (char *str, const char *format, … )
– Write a string into an array
– Operates on null terminated strings and does not check for overflow
of the receiving string.
• gets (char *s)
– reads a line from stdin into a buffer until either a terminating newline
or EOF.
• scanf ( const char *format, … )
– A similar problem
• Other programming constructs are also vulnerable.
– Given a program source, use grep to find these vulnerabilities.
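Two of the functions above have standard bounded counterparts, fgets() for gets() and snprintf() for sprintf(). The sketch below wraps them; the wrapper names read_line and format_user and the "user=%s" format are assumptions chosen for illustration.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <stddef.h>

/* Bounded replacement for gets(): reads at most size-1 characters
 * and strips the trailing newline, if one was read.
 * Returns 0 on success, -1 on EOF or error. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (size == 0 || size > INT_MAX || fgets(buf, (int)size, in) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Bounded replacement for sprintf(): returns 0 on success, -1 if
 * the formatted text did not fit and was truncated. */
int format_user(char *out, size_t size, const char *user)
{
    int n = snprintf(out, size, "user=%s", user);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Both wrappers take the size of the destination as an argument, which is exactly the information that gets() and sprintf() never receive.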

Methods of finding buffer overflows
• For example, to find overflows in a web server:
– Run web server on local machine.
– Issue requests with long tags.
All long tags end with "$$$$$".
– If web server crashes,
search core dump for "$$$$$" to find
overflow location.
• Some automated tools exist (e.g. eEye Retina).
• Then use disassemblers and debuggers (e.g. IDA-Pro) to
construct exploit.
Other types of overflow attacks
• Integer overflows:
void func(int a, int v) {
    int buf[128];
    buf[a] = v;
}
– Problem: a can point to 'ret-addr' on stack.
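A minimal sketch of the missing check: the index is validated against the array length before it is used. The name store_checked and the constant BUF_LEN are assumptions made for the illustration.

```c
#include <stddef.h>

#define BUF_LEN 128

/* Stores v at index a only if a is a valid index of buf.
 * Returns 0 on success, -1 if a is out of range. */
int store_checked(int buf[BUF_LEN], int a, int v)
{
    if (a < 0 || a >= BUF_LEN)
        return -1;
    buf[a] = v;
    return 0;
}
```

Because a is a signed int, the lower bound matters as well: a negative index would otherwise address memory below the start of buf.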

Preventing overflow attacks
• Main problem:
– strcpy(), strcat(), sprintf() have no range
checking.
– “Safe” versions strncpy(), strncat() are
misleading
• strncpy() will not write a string terminator if the source
string does not contain a string terminator within the
specified number of bytes.

• Defenses:
– Rewrite all legacy code to be safe, and be careful
when writing new code…
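The strncpy() pitfall mentioned above can be avoided by writing the terminator explicitly. The sketch below is one possible wrapper, with the hypothetical name copy_terminated; it truncates oversized input and reports that it did so.

```c
#include <stddef.h>
#include <string.h>

/* Like strncpy(dst, src, dst_size), but always terminates dst.
 * Returns 0 if src fit completely, -1 if it was truncated or if
 * dst_size is 0. */
int copy_terminated(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return -1;
    strncpy(dst, src, dst_size - 1);   /* leave room for the terminator */
    dst[dst_size - 1] = '\0';          /* strncpy may not have written one */
    return strlen(src) < dst_size ? 0 : -1;
}
```

With a 4-byte destination, a plain strncpy(dst, "abcdefgh", 4) writes the four bytes abcd and no terminator, whereas copy_terminated stores abc followed by a null byte and returns -1.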
