Introduction to Assembly Language

2nd Semester SY 2009-2010 Benjie A. Pabroa

What is Assembly Language?

"High"-level languages such as BASIC, FORTRAN, Pascal, Lisp, APL, etc. are designed to ease the strain of programming by providing the user with a set of somewhat sophisticated operations that are easily accessed.

Assembly as a Low-level Language
The lesson we derive is this: a very low-level language might be very flexible and efficient (in terms of speed and memory use), but might be very difficult to program in, since no sophisticated operations are provided and since the programmer must understand in detail the operation of the computer.
Assembly language is essentially the lowest possible level of language.

Built-in Features
◦ the ability to read the values stored at various "memory locations",
◦ the ability to write a new value into a memory location,
◦ the ability to do integer arithmetic of limited precision (add, subtract, multiply, divide),
◦ the ability to do logical operations (or, and, not, xor),
◦ and the ability to "jump" to programs stored at various locations in the computer's memory.

Features not included
◦ the ability to perform graphics,
◦ the ability to access files,
◦ and the ability to directly perform floating-point arithmetic.

Assembly vs High Level Lang

FORTRAN code to average together the N numbers stored in the array X(I):

      INTEGER*2 I,X(N)
      INTEGER*4 AVG
      . . .
C     AVERAGE THE ARRAY X, STORING THE RESULT AS AVG:
      AVG=0
      DO 10 I=1,N
10    AVG=AVG+X(I)
      AVG=AVG/N
      . . .

The same computation in assembly language:

         mov cx,n              ; cx is used as the loop counter. It starts at N
                               ; and counts down to zero.
         mov dx,0              ; the dx register stores the two most significant
                               ; bytes of the running sum
         mov ax,0              ; use ax to store the least significant bytes
         mov si,offset x       ; use the si register to point to the currently
                               ; accessed element X(I), starting with I=0
addloop: add ax,word ptr [si]  ; add X(I) to the two least
                               ; significant bytes of AVG
         adc dx,0              ; add the "carry" into the two
                               ; most significant bytes of AVG
         add si,2              ; move si to point to X(I+1)
         loop addloop          ; decrement cx and loop again
                               ; if not zero
         div n                 ; divides AVG by N
         mov avg,ax            ; save the result as AVG

Writing the assembly version required intimate knowledge of how the variables x, n, and avg were stored in memory.
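The reason the assembly loop needs both ax and dx can be modeled in a few lines of Python. This is only a sketch under stated assumptions: the function name average_16bit and the sample values are made up for illustration, and the array elements are treated as unsigned 16-bit words.

```python
def average_16bit(x):
    """Model of the addloop: sum 16-bit words into the dx:ax pair, then divide."""
    ax = 0  # low 16 bits of the running sum
    dx = 0  # high 16 bits of the running sum
    for word in x:
        total = ax + word
        carry = total >> 16          # 1 if the 16-bit addition overflowed, else 0
        ax = total & 0xFFFF          # add ax,word ptr [si]
        dx = (dx + carry) & 0xFFFF   # adc dx,0
    return ((dx << 16) | ax) // len(x)   # div n: quotient of dx:ax divided by N


# Hypothetical sample data: the sum (150000) does not fit in 16 bits.
values = [60000, 50000, 40000]
print(average_16bit(values))
```

Without the dx half, the running sum would wrap around at 65536 and the average would come out wrong.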

PC System Architecture

Microprocessor
◦ Reads instructions from the memory and executes them
    Accesses memory
    Does arithmetic and logical operations
    Performs other services as well

◦ 1971: Intel's 4004 was the first microprocessor, a 4-bit CPU (like the one from CS231) that fit all on one chip.
◦ 1978: The 8086 was one of the earliest 16-bit processors.
◦ 1981: IBM uses the 8088 in their little PC project.
◦ 1989: The 80486 includes a floating-point unit in the same chip as the main processor, and uses RISC-based implementation ideas like pipelining for greatly increased performance.
◦ 1997: The Pentium II is superscalar, supports multiprocessing, and includes special instructions for multimedia applications.
◦ 2002: The Pentium 4 runs at insane clock rates (3.06 GHz), implements extended multimedia instructions and has a large on-chip cache.

PC System Architecture..

Memory
◦ Stores instructions (program) or data
◦ It appears as a sequence of locations (or addresses)
    Each address stores a byte
◦ Types:
    ROM
        Stored bytes may only be read by the CPU
        Cannot be changed
    RAM
        Stored bytes may be both read and written (changed)
        Volatile: all data will be lost after shutdown
    Both types are random access

The Process of Assembly

Assembly language is a compiled language
◦ Source code must first be created with a text-editor program
◦ Then the source code will be compiled
◦ Assembly language compilers => assemblers

Auxiliary Programs
◦ First: Text editor (source code editor)
◦ Second: Assembler
    Assembles source code to generate object code in the process.
◦ Third: Linker
    Combines object code modules created by the assembler.
◦ Fourth: Loader
    Built into the operating system and is never explicitly executed. Takes the "relocatable" code created by the linker, "loads" it into memory at the lowest available location, then runs it.
◦ Fifth: Debugger
    Environment for running and testing assembly language programs.

The Process of Assembly..

[Figure: the process of assembly. Labels: Source Code, Assembler, Other Object Code 1, Other Object Code 2, RAM]

DOS and Simple File Operation

DOS
◦ Provides the environment in which programs run.
◦ Provides a set of helpful utility functions
    These must be understood in order to create programs in DOS

Making an assembly Source Code

You can use the edit command in DOS or just use the notepad.

[Figure: CPU block diagram. Labels: AH/AL, BH/BL, CH/CL, DH/DL; SP, BP, SI, DI; CS, DS, SS, ES; Bus Control Unit; ALU; CU; Flag Register; Instruction Pointer]

CPU Registers

Assembly language
◦ Thought goes into the use of the computer memory and the CPU registers

Register
◦ Like a memory location in that it can store a byte (or word) value.
◦ Has no address in memory; it is not part of the computer memory (it is built into the CPU)

Importance of Registers in Assembly Prog.
◦ Instructions using registers are preferred over instructions operating on values stored at memory locations.
◦ Instructions tend to be shorter (less room to store in memory)
◦ Register-oriented instructions operate faster than memory-oriented instructions
    Since the computer hardware can access a register much faster than a memory location.

CPU Registers (8086 family)
AX    The Accumulator
BX    The Pointer Register
CX    The Loop Counter
DX    Used for multiplication and division
SI    The "Source" string index register
DI    The "Destination" string index register
BP    Used for passing arguments on the stack
SP    The stack pointer
IP    The instruction pointer
CS    The "code segment" register
DS    The "data segment" register
SS    The "stack segment" register
ES    The "extra segment" register
FLAG  The flag register

Segment Registers
CS  Code Segment   16-bit number that points to the active code segment
DS  Data Segment   16-bit number that points to the active data segment
SS  Stack Segment  16-bit number that points to the active stack segment
ES  Extra Segment  16-bit number that points to the active extra segment

Pointer Registers
IP  Instruction Pointer  16-bit number that points to the offset of the next instruction
SP  Stack Pointer        16-bit number that points to the offset that the stack is using
BP  Base Pointer         used to pass data to and from the stack

General Purpose Registers
AX  Accumulator Register  mostly used for calculations and for input/output
BX  Base Register         the only one of these four that can be used as an index register
CX  Count Register        used for the loop instruction
DX  Data Register         input/output and used by multiply and divide

Index Registers
SI  Source Index       used by string operations as source
DI  Destination Index  used by string operations as destination

CPU registers
◦ AX, BX, CX, & DX are more flexible than the others
    Can be used as word registers (16-bit values)
    Or as pairs of byte registers (8-bit values)
◦ A general purpose register can be "split" into a high byte and a low byte
    AX = AH + AL
    BX = BH + BL
    CX = CH + CL
    DX = DH + DL
◦ Ex: DX = 1234h, then DH = 12h and DL = 34h
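The split can be checked with a little bit arithmetic. The Python sketch below is only an illustration; the helper names split_word and join_bytes are made up here and are not part of any assembler.

```python
def split_word(word):
    """Return the (high byte, low byte) halves of a 16-bit register value."""
    return (word >> 8) & 0xFF, word & 0xFF


def join_bytes(high, low):
    """Rebuild the 16-bit value from its two byte halves."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


dh, dl = split_word(0x1234)            # DX = 1234h
print(f"DH = {dh:02X}h, DL = {dl:02X}h")   # prints "DH = 12h, DL = 34h"
print(f"DX = {join_bytes(dh, dl):04X}h")   # prints "DX = 1234h"
```

The "+" in AX = AH + AL therefore means "placed side by side", not arithmetic addition.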

Flag Registers
Consists of 9 status bits (flags)
Called flags because each one can be either
◦ SET (1)
◦ NOT SET (0)

Abr.  Name             bit nº  Description
OF    Overflow Flag    11      indicates an overflow when set
DF    Direction Flag   10      used for string operations to check direction
IF    Interrupt Flag   9       if set, interrupts are enabled, else disabled
TF    Trap Flag        8       if set, CPU can work in single step mode
SF    Sign Flag        7       if set, resulting number of calculation is negative
ZF    Zero Flag        6       if set, resulting number of calculation is zero
AF    Auxiliary Carry  4       a second carry flag: the carry out of the lower 4 bits
PF    Parity Flag      2       indicates even or odd parity
CF    Carry Flag       0       contains the carry out of the left-most bit after calculations
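Given the bit numbers in the table, a flags value can be decoded by testing one bit at a time. The following Python sketch is only an illustration: the names FLAG_BITS and set_flags and the sample value 0241h are made up for this example.

```python
# Bit positions taken from the flag table.
FLAG_BITS = {"OF": 11, "DF": 10, "IF": 9, "TF": 8, "SF": 7,
             "ZF": 6, "AF": 4, "PF": 2, "CF": 0}


def set_flags(flags_word):
    """Return the abbreviations of all flags that are SET (1) in flags_word."""
    return [name for name, bit in FLAG_BITS.items() if flags_word & (1 << bit)]


# 0241h has bits 9, 6 and 0 set.
print(set_flags(0x0241))   # prints ['IF', 'ZF', 'CF']
```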

Test it

Do you want to see all these registers and flags?
◦ Go to DOS
◦ Type debug
◦ Type "r"
◦ Then you'll see all the registers and some abbreviations for the flags.
◦ Type "q" to quit again.

Memory Segmentation

How DOS uses memory
◦ databus = 16-bit
    It can move and store 16 bits (1 word = 2 bytes) at a time.
◦ If the processor stores 1 word (16 bits), it stores the bytes in reverse order in memory.
    1234h (word) ---> memory: 34h (byte) 12h (byte)
    Memory value: 78h 56h ---> derived value: 5678h
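The reversed byte order can be reproduced in a few lines of Python. This is a sketch for illustration only; store_word and load_word are made-up helper names.

```python
def store_word(word):
    """Bytes of a 16-bit word in the order they are stored in memory (low byte first)."""
    return bytes([word & 0xFF, (word >> 8) & 0xFF])


def load_word(memory):
    """Rebuild the 16-bit word from two bytes read from memory."""
    return memory[0] | (memory[1] << 8)


print(store_word(0x1234).hex())                    # prints "3412"
print(f"{load_word(bytes([0x78, 0x56])):04X}h")    # prints "5678h"
```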

Memory Segmentation..

Computer divides its memory into segments
◦ Standard in DOS
◦ Segments are 64KB big and have a number
◦ These numbers are stored in the segment registers (see above).
◦ Three main segments are the code, data and stack segment
    They overlap each other almost completely
    Try typing d in debug
    4576:0100 -> memory address, where 4576 is the segment number and 0100 is the offset

Memory Segmentation..

Segments overlap
◦ The address 0000:0010 = 0001:0000
◦ Therefore, segments start at paragraph boundaries
    A paragraph = 16 bytes
    So a segment starts at an address divisible by 16
◦ 0000:0010 => 0h:10h => 0:16
    Memory location (linear): (0*16)+16 = 0+16 = 16
◦ 0001:0000 => 1h:0h => 1:0
    Memory location (linear): (1*16)+0 = 16+0 = 16
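The same calculation works for any segment:offset pair. The Python sketch below is for illustration only; linear_address is a made-up name.

```python
def linear_address(segment, offset):
    """Linear memory location of segment:offset (one segment step = one 16-byte paragraph)."""
    return segment * 16 + offset


print(linear_address(0x0000, 0x0010))   # prints 16
print(linear_address(0x0001, 0x0000))   # prints 16
```

Both pairs name the same byte, which is why segments are said to overlap.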

.model small .stack .data message db "Hello world, I'm learning Assembly !!!", "\$"

My First Program

  

.code

main proc  mov ax,seg message  mov ds,ax
     

mov ah,09 lea dx,message int 21h

mov ax,4c00h  int 21h main endp end main

Names

Identifiers
◦ An identifier is a name you apply to items in your program. The two types of identifiers are "name", which refers to the address of a data item, and "label", which refers to the address of an instruction. The same rules apply to names and labels.

Statements
◦ A program is made of a set of statements. There are two types of statements: "instructions" such as MOV and LEA, and "directives", which tell the assembler to perform a specific action, like ".model small" or ".code".

Here's the general format of a statement:
identifier - operation - operand(s) - comment
◦ The identifier is the name as explained above.
◦ The operation is an instruction like MOV.
◦ The operands provide information for the operation to act on, like MOV (operation) AX,BX (operands).
◦ The comment is a line of text you can add as a comment; everything the assembler sees after a ";" is ignored.

Example
◦ MOV AX,BX ;this is a MOV instruction

How to Assemble

The source code must be assembled by an assembler and then linked by the linker. Available assemblers:
◦ A86
◦ MASM
◦ TASM (we will use this one)

Install TASM
Then use tasm.exe and tlink.exe

• To assemble, type the following on the command prompt:
    – cd c:\tasm\bin
    – tasm <filename/path of the source code>
        Example: tasm c:\first.asm
• To link, type:
    – tlink <filename/path of the object code>
• To run, call the .exe on the command prompt
    – Example in our program (First.asm)

.model small
.stack
.data
message db "Hello world, I'm learning Assembly !!!", "$"
.code
main proc
    mov ax,seg message
    mov ds,ax
    mov ah,09
    lea dx,message
    int 21h
    mov ax,4c00h
    int 21h
main endp
end main

Dissecting Code

.model small
◦ Lines that start with a "." are used to provide the assembler with information.
◦ The word(s) behind it say what kind of info.
    In this case it just tells the assembler that the program is small and doesn't need a lot of memory. I'll get back to this later.

.stack
◦ This one tells the assembler that the "stack" segment starts here.
    The stack is used to store temporary data.

.data
◦ Indicates that the data segment starts here and that the stack segment ends there.


Dissecting Code..

.code
◦ Indicates that the code segment starts there and the data segment ends there.

main proc
◦ Code must be in procedures, just like in C or any other language.
◦ This indicates that a procedure called main starts here.
◦ endp states that the procedure is finished.
◦ end main tells the assembler that the program is finished.
◦ It also tells the assembler where to start in the program.
    At the procedure called main in this case.

message db "xxxx"
◦ DB means Define Byte and so it does.
◦ In the data segment it defines a couple of bytes.
◦ These bytes contain the information between the quotes.
◦ "message" is a name to identify this byte-string.
◦ It's called an "identifier".

Memory space for variables
◦ DB (Byte, 8 bit)
◦ DW (Word, 16 bit)
◦ DD (Doubleword, 32 bit)
◦ Example:
    foo db 27            ; by default all numbers are decimal
    bar dw 3e1h          ; appending an "h" means hexadecimal
    real_fat_rat dd ?    ; "?" means "don't care about the value"
◦ Variable name
    Address can't be changed
    Value can be changed
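To get a feel for how much room each directive reserves, the three example variables can be laid out as bytes in Python. This is only a sketch: the zero used for real_fat_rat is a stand-in chosen here, since "?" leaves the real value undefined, and the assumption is that the three variables sit one after another in the data segment.

```python
import struct

foo = struct.pack("<B", 27)            # db: 1 byte
bar = struct.pack("<H", 0x3E1)         # dw: 2 bytes, low byte first
real_fat_rat = struct.pack("<I", 0)    # dd: 4 bytes; 0 is only a stand-in for "?"

data_segment = foo + bar + real_fat_rat
print(len(foo), len(bar), len(real_fat_rat))          # prints "1 2 4"
print(" ".join(f"{b:02x}" for b in data_segment))     # prints "1b e1 03 00 00 00 00"
```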


Dissecting Code..

mov ax, seg message
◦ AX is a register.
    You use registers all the time, so that's why you had to know about them before.
◦ MOV is an instruction that moves data.
    It can have a few "operands".
    Here the operands are AX and seg message.
◦ seg message can be seen as a number.
    It's the number of the segment "message" is in (the data segment).
    We have to know this number, so we can load the DS register with it.
    Else we can't get to the byte-string in memory.
    We need to know WHERE the byte-string is located in memory.
◦ The number is loaded in the AX register.
    MOV always moves data to the operand left of the comma and from the operand right of the comma.

The MOV Instruction

Syntax:
◦ MOV destination, source

Allows you to move data into and out of the registers
◦ Destination: either a register or a memory location
◦ Source: can be a register, a memory location, or a numeric value

Memory-to-memory transfer NOT ALLOWED

Code we wrote earlier

foo db 27            ; by default all numbers are decimal
bar dw 3e1h          ; appending an "h" means hexadecimal
real_fat_rat dd ?    ; "?" means "don't care about the value"

Notice the size of the source and destination (they must match in reg-reg, mem-reg, and reg-mem transfers):

mov ax,bar     ; load the word-size register ax with
               ; the word value stored at location bar.
mov dl,foo     ; load the byte-size register dl with
               ; the byte value stored at location foo.
mov bx,ax      ; load the word-size register bx with
               ; the word value in ax.
mov bl,ch      ; load the byte-size register bl with
               ; the byte value in ch.
mov bar,si     ; store the value in the word-size
               ; register si at the memory location
               ; labelled "bar".
mov foo,dh     ; store the byte value in the register
               ; dh at memory location foo.

A constant must be consistent with the destination:

mov ax,5       ; store the word 5 in the ax register.
mov al,5       ; store the byte 5 in the al register.
mov bar,5      ; store the word 5 at location bar.
mov foo,5      ; store the byte 5 at location foo.

Illegal Move Statements
◦ MOV AL, 3172
◦ MOV foo, 3172

Why are the statements above illegal?
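One way to approach the question is to compare the constant with the size of the destination: AL is a byte register and foo was defined with db, so both hold 8 bits. The Python sketch below is only an illustration; fits is a made-up helper and it looks at unsigned values only.

```python
def fits(value, bits):
    """True if an unsigned constant can be stored in a destination of the given width."""
    return 0 <= value < (1 << bits)


print(fits(3172, 8))    # prints False: 3172 is larger than 255
print(fits(3172, 16))   # prints True: it would fit in ax or bar
print(fits(5, 8))       # prints True: mov al,5 is fine
```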


Dissecting Code..

mov ds,ax
◦ Here it moves the number in the AX register (the number of the data segment) into the DS register.
◦ We have to load the DS register this way (with two instructions).
◦ Just typing "mov ds,seg message" isn't possible.

mov ah, 09
◦ MOV again. This time it loads the AH register with the constant value nine.

lea dx, message
◦ This instruction stores the offset of the byte-string message within the data segment into the DX register.
◦ This offset is the second thing we need to know when we want to know where "message" is in memory.
◦ So now we have DS:DX.


Dissecting Code..

int 21h
◦ This instruction causes an interrupt.
◦ The processor calls a routine somewhere in memory.
◦ 21h tells the processor what kind of routine, in this case a DOS routine.
◦ For now assume that INT just calls a procedure from DOS.
◦ The procedure looks at the AH register to find out what it has to do.
◦ In this example the value 9 in the AH register indicates that the procedure should write a byte-string to the screen.

mov ax, 4c00h
◦ Loads the AX register with the constant value 4c00h.

int 21h
◦ This time the AH register contains the value 4ch (AX=4c00h), and to the DOS procedure that means "exit program".
◦ The value of AL is used as an "exit code"; 00h means "No error".
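The way the DOS procedure picks its job from AH can be imitated with a toy dispatcher in Python. This is a loose model for illustration, not how DOS is implemented: the function int21h, its arguments and its return values are made up here, only functions 09h and 4Ch are modeled, and the data segment is represented by a bytes object. Function 09h is taken to print up to the "$" that closes the message.

```python
def int21h(ax, memory, dx):
    """Toy model of the DOS routine: look at AH to decide what to do."""
    ah, al = (ax >> 8) & 0xFF, ax & 0xFF
    if ah == 0x09:
        end = memory.index(b"$", dx)     # the string ends at the "$" marker
        return ("print", memory[dx:end].decode("ascii"))
    if ah == 0x4C:
        return ("exit", al)              # AL is the exit code
    raise NotImplementedError(f"function {ah:02X}h is not modeled")


data = b"Hello world, I'm learning Assembly !!!$"
print(int21h(0x0900, data, 0))   # AH = 09h: write the string at offset DX
print(int21h(0x4C00, data, 0))   # AH = 4Ch, AL = 00h: exit, no error
```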

After running:
◦ Go to DOS and load FIRST.exe into debug (type "debug FIRST.exe").
◦ Type d -> display some addresses
◦ Type u -> you will see something like:

Segment Number & Offset   Machine Code   Instruction
0F77:0000                 B8790F         MOV AX,0F79
0F77:0003                 8ED8           MOV DS,AX
0F77:0005                 B409           MOV AH,09

Looking at the first line, 0F77:0000 B8790F MOV AX,0F79:
◦ originally: mov ax, seg message
◦ B8 -> mov ax
◦ 790F -> the number 0F79, stored low byte first
◦ It means that the data is stored in the segment with number 0F79
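The three machine code bytes can be taken apart by hand or, as a quick check, in Python. The sketch below only handles this one instruction form, and the variable names are made up for illustration.

```python
code = bytes.fromhex("B8790F")          # the machine code shown by debug

opcode = code[0]                        # B8: "mov ax, <16-bit number>"
immediate = code[1] | (code[2] << 8)    # 79 0F is stored low byte first

print(f"{opcode:02X}h {immediate:04X}h")   # prints "B8h 0F79h"
```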

The other instruction, lea dx,message, turned into mov dx,0.
◦ So that means that the offset of the byte-string is 0 --> 0F79:0000.
◦ Try to type d 0F79:0000
◦ Calculating the other address
    We subtract 2 from the segment number: 0F79 - 2 = 0F77
    A difference of 2 in the segment number = 2 paragraphs = 32 bytes (0002:0000 = 0000:0020)
    The other address is 0F77:0020
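That both addresses name the same byte can be confirmed by converting each one to a linear location. A small Python check, with a made-up helper name linear:

```python
def linear(segment, offset):
    """Linear location of segment:offset; a segment step is 16 bytes (shift left by 4 bits)."""
    return (segment << 4) + offset


a = linear(0x0F79, 0x0000)
b = linear(0x0F77, 0x0020)
print(f"{a:05X}h {b:05X}h {a == b}")   # prints "0F790h 0F790h True"
```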

The Stack
◦ The stack is a place where data is temporarily stored
◦ The SS and SP registers point to that place like this: SS:SP
    So the SS register is the segment and the SP register contains the offset
◦ There are a few instructions that make use of the stack
    PUSH - push a value on the stack
    POP - retrieve that value from the stack

Example:

    MOV AX,1234H
    PUSH AX
    MOV AH,09
    INT 21H
    POP AX

◦ The final value of AX will be 1234h.
    First we load 1234h into AX, then we push that value to the stack.
    We now store 9 in AH, so AX will be 0934h, and execute an INT.
    Then we pop the AX register: we retrieve the pushed value from the stack.
    So AX contains 1234h again.

Another example:

    MOV AX, 1234H
    MOV BX, 5678H
    PUSH AX
    POP BX

◦ We pushed AX to the stack
◦ and we popped that value into BX.
◦ What are the final values of AX and BX?

Creating a stack
◦ It is easily done with the .stack directive, which will create a stack of 1024 bytes.
◦ The stack uses a LIFO system (Last In First Out)

LIFO example:

    MOV AX,1234H
    MOV BX,5678H
    PUSH AX
    PUSH BX
    POP AX
    POP BX

First the value 1234h was pushed, and after that the value 5678h was pushed to the stack. According to LIFO, 5678h comes off first, so AX will pop that value and BX will pop the next. What are the values of AX and BX?
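One way to check an answer is to replay the six instructions with a Python list standing in for the stack. This is only a sketch: a list models the LIFO order but not SS:SP or memory.

```python
stack = []
ax, bx = 0x1234, 0x5678   # MOV AX,1234H / MOV BX,5678H

stack.append(ax)   # PUSH AX
stack.append(bx)   # PUSH BX
ax = stack.pop()   # POP AX: takes the value pushed last
bx = stack.pop()   # POP BX: takes the value pushed first

print(f"AX = {ax:04X}h, BX = {bx:04X}h")
```

Pushing two registers and popping them in the same order swaps their contents.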

How does the stack look in memory?
◦ It "grows" downwards in memory.
◦ When you push a word (2 bytes), for example, SP is decreased by two and the word is stored at SS:SP.
◦ So in the beginning SP points to the top of the stack and (if you don't pay attention) it can grow so far downwards in memory that it overwrites the program's code or data.
◦ A major system crash is the result.
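The downward growth can be made visible with a small Python model of the stack segment. This is a sketch under stated assumptions: the class name Stack8086 is made up, the stack is 1024 bytes with SP starting just past its end, and unlike the real CPU the model refuses to push once the stack is full.

```python
class Stack8086:
    """Toy model of a stack segment: SP starts at the top and moves down."""

    def __init__(self, size=1024):
        self.memory = bytearray(size)
        self.sp = size                      # SP points just past the top of the stack

    def push(self, word):
        if self.sp < 2:
            raise OverflowError("stack is full")    # the real CPU would not stop here
        self.sp -= 2                                # SP is decreased by two ...
        self.memory[self.sp] = word & 0xFF          # ... then the word is stored,
        self.memory[self.sp + 1] = (word >> 8) & 0xFF   # low byte first

    def pop(self):
        word = self.memory[self.sp] | (self.memory[self.sp + 1] << 8)
        self.sp += 2                                # SP moves back up
        return word


s = Stack8086()
s.push(0x1234)
print(s.sp)                  # prints 1022
print(hex(s.pop()), s.sp)    # prints "0x1234 1024"
```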

Congratulations!!

If you fully understand this stuff (registers, flags, segments, stack, names, etc.) you may, from now on, call yourself a

"Level 0 Assembly Coder"