
23 JUNE 2019 / TECHNICAL

A Guide To x86 Assembly

Assembly language (or assembler) is any low-level programming language in which there is a very strong correspondence between the program's statements and the architecture's machine code instructions. Some of you may know it from your computer science courses, where you were expected to read lots of ones and zeros.
What exactly is a low-level programming language?
A low-level programming language is a programming language that provides little to no abstraction from the computer's instruction set architecture. Its instructions, commands and functions map closely to the processor's instruction set. The word "low" means there is little to no abstraction between the language and the processor.

Second-generation programming languages are assembly languages: programs are written in assembly, and an assembler turns them into machine code to be executed. Machine code is the only language a computer can understand without any further processing. Below is an example of machine code in hexadecimal form for generating Fibonacci series terms.

X86 machine code


Just as we can't really understand machine code, a computer processor can't really understand our language. That's where assembly language comes into play.

Assembler + Linker

The assembler and the linker together act as a translator: the assembler converts the assembly mnemonics we provide into machine code, and the linker combines the result into a program we can then execute.

What are Mnemonics?


In programming, a mnemonic is a name assigned to a machine function or an abbreviation for

an operation. Each mnemonic represents a low level machine instruction or opcode in

assembly. add, mul, lea, cmp, and je are examples of mnemonics.
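To make the link between a mnemonic and its machine code concrete, here is a minimal Python sketch that encodes one instruction form, `mov reg, imm32`, by hand. The function name `encode_mov_imm32` and the `REG_CODE` table are made up for this illustration; a real assembler handles far more cases.

```python
import struct

# Register numbers used inside x86 instruction encodings.
REG_CODE = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}

def encode_mov_imm32(reg: str, value: int) -> bytes:
    """Encode `mov reg, imm32` (hypothetical helper for this sketch).

    The opcode byte is 0xB8 plus the register number, followed by the
    32-bit immediate stored least significant byte first.
    """
    opcode = 0xB8 + REG_CODE[reg]
    return bytes([opcode]) + struct.pack("<I", value & 0xFFFFFFFF)

print(encode_mov_imm32("eax", 4).hex(" "))  # b8 04 00 00 00
print(encode_mov_imm32("ecx", 5).hex(" "))  # b9 05 00 00 00
```

The mnemonic `mov` is only a readable name: what the processor actually sees is the five bytes printed above.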

What Are Registers?


Registers are small, fast storage locations inside the processor. In assembly programming they can be thought of as the global variables we use in higher-level programming languages for general operations.

Some different types of registers:

General purpose - EAX, EBX, ESP, EBP

Segment - CS, DS

Control - EIP

General Purpose Registers


These are some of the general purpose registers in the x86 architecture. Each of the above registers can store 32 bits of data. Take EAX as an example: the lower half of EAX is called AX and holds 16 bits of data. AX is further divided into two parts, AH and AL, each 8 bits in size. The same goes for EBX, ECX and EDX.

EAX - Accumulator register - used for storing operands and result data

EBX - Base register - points to data

ECX - Counter register - used for loop operations
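The way AX, AH and AL overlap with EAX can be modelled with a few bit masks. The sketch below is a toy Python model, not how a CPU is implemented; the class name `Reg32` and its method names are invented for the example.

```python
class Reg32:
    """Toy model of a 32-bit register such as EAX (hypothetical class)."""

    def __init__(self, value=0):
        self.value = value & 0xFFFFFFFF

    def low16(self):
        # AX: the lower 16 bits of EAX
        return self.value & 0xFFFF

    def high8(self):
        # AH: bits 8..15 of EAX (the upper half of AX)
        return (self.value >> 8) & 0xFF

    def low8(self):
        # AL: bits 0..7 of EAX (the lower half of AX)
        return self.value & 0xFF

    def set_low8(self, byte):
        # Writing AL changes only the lowest 8 bits of EAX.
        self.value = (self.value & 0xFFFFFF00) | (byte & 0xFF)

eax = Reg32(0x12345678)
print(hex(eax.low16()), hex(eax.high8()), hex(eax.low8()))  # 0x5678 0x56 0x78
eax.set_low8(0xFF)
print(hex(eax.value))  # 0x123456ff
```

AX, AH and AL are not separate storage: they are views onto the same 32 bits, which is why writing AL changes the value seen through EAX.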


Unlike the registers we saw before, ESP, EBP, ESI and EDI cannot be divided into 8-bit parts. Only their lower 16 bits can be addressed separately, as SP, BP, SI and DI.

Registers in a CPU are limited, so you can't use them to store larger chunks of data, and that's where memory comes into play. Data can be stored in memory in a stack data structure, and the ESP register points to the top of that stack at any time.

Consider a stack that currently contains only the integer value 2, so 2 is at the top of the stack. ESP would point to that value, and in the same way EBP points to the base of the stack (more precisely, of the current stack frame).

What doesn't fit in registers lives in memory
Memory is accessed either with loads and stores at addresses as if it were a big array, or

through PUSH and POP operations on a stack.
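The PUSH and POP behaviour described above can be sketched in Python as a toy stack. The `Stack` class and the 64-byte memory size are assumptions for illustration only. The one x86-specific detail modelled here is that the stack grows toward lower addresses, so a push decreases ESP by 4 and a pop increases it by 4.

```python
import struct

class Stack:
    """Toy x86-style stack: memory is a bytearray, esp is an index into it."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.esp = size  # empty stack: ESP sits just past the end of memory

    def push(self, value):
        self.esp -= 4  # the stack grows toward lower addresses
        struct.pack_into("<I", self.mem, self.esp, value & 0xFFFFFFFF)

    def pop(self):
        (value,) = struct.unpack_from("<I", self.mem, self.esp)
        self.esp += 4
        return value

stack = Stack()
stack.push(2)
print(stack.esp)    # 60
stack.push(7)
print(stack.pop())  # 7
print(stack.pop())  # 2
print(stack.esp)    # 64
```

After pushing the value 2, ESP points at the four bytes holding it, which matches the picture of ESP always pointing to the top of the stack.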

In the general memory hierarchy of a computer, registers sit at the top: they are faster than everything below them but also the smallest. Moving down the hierarchy, storage size increases while speed decreases.

How are data types stored in memory?


There are several ways in which multibyte data types can be stored in memory. The two most common are little endian and big endian.

Little-endian storage is used by Intel (x86) processors. Arm processors, which are common in mobile devices where battery and power consumption play an important role, can operate in either byte order but are usually run in little-endian mode as well. Big endian is the order used for network protocols and by some other architectures.


Storing 0x01234567

The above image shows how 0x01234567 would be stored in memory. In big endian the bytes are stored in the order they are written, most significant byte first. In little endian the bytes are stored in reverse order, least significant byte first: from 0x01234567, 67 is written first, then 45, then 23 and finally 01.

A simpler explanation for storing CAFEBABE
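The two byte orders are easy to check in Python with `int.to_bytes`. This is only a small sketch, and the helper name `layout` is made up for the example.

```python
def layout(value: int, byteorder: str) -> str:
    """Return the 4 bytes of `value` in memory order, as hex (hypothetical helper)."""
    return value.to_bytes(4, byteorder).hex(" ")

print(layout(0x01234567, "big"))     # 01 23 45 67
print(layout(0x01234567, "little"))  # 67 45 23 01
print(layout(0xCAFEBABE, "big"))     # ca fe ba be
print(layout(0xCAFEBABE, "little"))  # be ba fe ca

# Reading the bytes back with the same byte order recovers the value.
raw = (0x01234567).to_bytes(4, "little")
print(hex(int.from_bytes(raw, "little")))  # 0x1234567
```

Only the order of the bytes changes between the two layouts; the bits inside each byte stay the same.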


Let's talk about Memory Segments!

Text

Contains the instructions of the program

Data

Contains initialized data for the program, e.g. message strings

BSS

Contains all uninitialized global variables

Hello World
The above image shows the structure of a Hello World program in assembly.

The entry point of the program is a global symbol called _start, and program execution starts from there. The text section contains the instructions to print the message and exit the program. The data section contains the message string "Hello World!", which is used by the print instruction in the text section.

One of the Most important Registers : EIP


As we discussed before, an assembly program is executed one instruction at a time, and the instructions are written in order.

_start:

1. mov ecx, 5
2. mov edx, 5
3. cmp ecx, edx

In the assembly program above, execution starts at the symbol _start.

EIP points to the next instruction to execute.

Before the 1st instruction, "mov ecx, 5", is executed, EIP holds the address of that instruction. Once it has been fetched, EIP is advanced by the length of that instruction in bytes, so it now points to the second instruction. Program execution flows this way, so as an attacker, if we want to take control of the program, we should manipulate the value of EIP.

Just like if/else statements in higher-level programming languages, assembly provides mnemonics to control the flow of a program, but let's first understand some basic mnemonics of assembly.
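As a rough illustration of how EIP moves through the three instructions above, the Python sketch below walks a list of instructions and adds each one's encoded size to EIP. The byte lengths (5, 5 and 2) are the usual 32-bit encodings of these instructions, the start address 0 is arbitrary, and `trace_eip` is a name invented for this example.

```python
# (instruction text, encoded length in bytes).
# The lengths are assumptions of this sketch: the usual 32-bit encodings.
PROGRAM = [
    ("mov ecx, 5", 5),
    ("mov edx, 5", 5),
    ("cmp ecx, edx", 2),
]

def trace_eip(program, start=0):
    """Return (EIP before each instruction, final EIP) for straight-line code."""
    eip = start
    trace = []
    for text, length in program:
        trace.append((eip, text))
        eip += length  # EIP advances by the instruction's size in bytes
    return trace, eip

trace, final_eip = trace_eip(PROGRAM)
for address, text in trace:
    print(address, text)
# 0 mov ecx, 5
# 5 mov edx, 5
# 10 cmp ecx, edx
print(final_eip)  # 12
```

The addresses step by 5, 5 and 2 bytes, not by one per instruction.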


mov, add, cmp, jmp and je are just some of the many mnemonics a processor provides, and most of them are pretty much self-explanatory. Let's discuss the jmp instruction.

jmp - it's like the goto statement in C: it jumps to the specified location unconditionally.

Consider the code below.

1. mov ecx, 5
2. mov edx, 5
3. jmp 5
4. mov ecx, 6
5. cmp ecx, edx
6. je function
7. function:

In the snippet above, the 1st and 2nd instructions are executed one after another, leaving 5 in both ecx and edx. Then the jmp 5 instruction is encountered, so control is transferred directly to instruction number 5, and instruction number 4 is never executed.

Now let's look at the cmp instruction. After executing the 3rd instruction, execution comes to the 5th instruction.

cmp ecx, edx

This compares ecx and edx by subtracting one from the other, without storing the result. If the result of the subtraction is zero, the values stored in ecx and edx are the same, so the zero flag is set to one, indicating that the result is zero.

Now a je instruction is encountered.

je checks the zero flag set by the previously executed instruction. je simply means "jump if equal": if ecx and edx are equal, je redirects the flow to the function: label.
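The walkthrough above can be replayed with a tiny interpreter. This Python sketch is a simplified model under stated assumptions: jump targets are the 1-based instruction numbers from the listing rather than real addresses, only the zero flag is tracked, and the function name `run` and the tuple format are invented for the example.

```python
def run(program):
    """Execute (op, *args) tuples; jump targets are 1-based instruction numbers."""
    regs = {"ecx": 0, "edx": 0}
    zero_flag = False
    executed = []
    eip = 1  # number of the next instruction to execute
    while eip <= len(program):
        op, *args = program[eip - 1]
        executed.append(eip)
        eip += 1  # default: fall through to the next instruction
        if op == "mov":
            dst, imm = args
            regs[dst] = imm
        elif op == "cmp":
            a, b = args
            # cmp subtracts and keeps only the flags, not the result
            zero_flag = (regs[a] - regs[b]) == 0
        elif op == "jmp":
            eip = args[0]  # unconditional jump
        elif op == "je":
            if zero_flag:  # jump only if the last comparison was equal
                eip = args[0]
    return regs, zero_flag, executed

PROGRAM = [
    ("mov", "ecx", 5),      # 1
    ("mov", "edx", 5),      # 2
    ("jmp", 5),             # 3
    ("mov", "ecx", 6),      # 4, skipped by the jmp
    ("cmp", "ecx", "edx"),  # 5
    ("je", 7),              # 6, `function` is the label at number 7
]

regs, zero_flag, executed = run(PROGRAM)
print(executed)         # [1, 2, 3, 5, 6]
print(regs, zero_flag)  # {'ecx': 5, 'edx': 5} True
```

Instruction 4 never appears in the executed list, and the zero flag is set by the time je runs, so the jump to `function` is taken.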

Computers contain a layered structure


Level 3

Application-level libraries

Level 2

System-level libraries

Level 1

Operating system

Level 0

Bare metal

The OS contains libraries and drivers


In order to interact with the OS we have to use a system call (syscall). The operating system offers services to the applications running on it, and these services are accessible through system calls: opening files, mapping memory, reading directory contents, and so on. All these actions require interaction with the hardware (the hard drive, the memory management unit) and are managed by the OS.

Every Linux system call has a number, so it can be referenced by that number when making the call in assembly. On 32-bit x86 Linux, for example:

EXIT - 1

WRITE - 4

How do system calls work?


The image below gives brief information on how system calls work.
System call management

A user space program requests a system call by raising an interrupt. That interrupt is looked up in the interrupt handler table, which invokes the system call handler, which in turn invokes the specific system call. There are mainly two ways of invoking a system call:

int 0x80; and

SYSENTER

Every syscall takes some arguments, so before executing a syscall we need our parameters ready in registers.

EAX contains the syscall number and the rest of the registers contain the other arguments. We can get details about a specific syscall by visiting its man page on Linux with "man 2 (syscall name)".

e.g. man 2 write


So, for the write syscall, we store the syscall number, which is 4, in EAX and the file descriptor in EBX. ECX must point to the string we want to print, and EDX must contain the number of bytes to print. After storing all that we simply invoke the interrupt with int 0x80.
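The same three arguments (file descriptor, buffer, length) show up when write is called from a higher-level language. The Python sketch below uses `os.write`, which reaches the kernel's write system call through the C library. The wrapper name `sys_write` is invented for this example, and it writes to a pipe instead of the terminal so the result can be read back.

```python
import os

def sys_write(fd: int, buf: bytes, count: int) -> int:
    """write(fd, buf, count), a hypothetical wrapper for this sketch.

    In the 32-bit assembly version, fd goes in EBX, the buffer address
    in ECX, count in EDX, and syscall number 4 in EAX.
    """
    return os.write(fd, buf[:count])

message = b"Hello World!"
read_end, write_end = os.pipe()
written = sys_write(write_end, message, len(message))
os.close(write_end)
print(written)                # 12
print(os.read(read_end, 64))  # b'Hello World!'
os.close(read_end)
```

Passing file descriptor 1 instead of the pipe would print the message to standard output, which is what the assembly program does with EBX set to 1.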


The exit syscall works the same way: its syscall number, 1, goes in EAX and the exit status goes in EBX.

Now let's try writing our first program, printing Hello World! in assembly.

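global _start

section .text

_start:
    mov eax, 0x4        ; syscall number of write
    mov ebx, 0x1        ; file descriptor 1 (stdout)
    mov ecx, message    ; pointer to the string
    mov edx, 12         ; number of bytes to print
    int 0x80            ; invoke the syscall

    mov eax, 0x1        ; syscall number of exit
    mov ebx, 0x5        ; exit status 5
    int 0x80            ; invoke the syscall

section .data

message: db "Hello World!" ; define byte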


We first declared _start as a global symbol,

then started the text section with section .text.

To execute write, we moved the syscall number of write, which is 4, into eax,

then the file descriptor 1 into ebx,

then the pointer to message, from our .data section, into ecx.

We need to print 12 bytes, so we moved 12 into edx,

then raised the interrupt, which prints Hello World!

Next we moved 1 into eax, which is the syscall number of exit.

We want to exit with status code 5, so we moved 5 into ebx,

then used int 0x80 to execute it.

In the data section,

message: db "Hello World!" means we are defining message as a sequence of bytes (db stands for "define byte") holding "Hello World!".

It seems we have successfully written our first program in X86 assembly!

Thanks for reading, ask me questions by messaging me directly on twitter!

twitter : @malav_vyas1 | github : github.com/malavyas | web : malavvyas.tk


A special thanks to these creators for blogs and videos helping this article: Security

Tube, LiveOverflow and 0x00sec.

The awesome image used in this article is called Dino ASCII and was created by Alexandra Hanson.
