Professional Documents
Culture Documents
Xiuzhen Cheng
Encoding Assembly Code
cheng@gwu.edu
1
MIPS Architecture Assembly Variables: Registers (1/4)
MIPS – semiconductor company that built Unlike HLL like C or Java, assembly cannot use
one of the first commercial RISC variables
architectures Why not? Keep Hardware Simple
We will study the MIPS architecture in Assembly Operands are registers
some detail in this class limited number of special locations built directly into the hardware
Why MIPS instead of Intel 80x86? operations can only be performed on these!
MIPS is simple, elegant. Don’t want to get bogged Benefit: Since registers are directly in hardware, they
down in gritty details. are very fast
MIPS widely used in embedded apps, x86 little (faster than 1 billionth of a second)
used in embedded, and more embedded
computers than PCs
By convention, each register also has a name to make In C (and most High Level Languages) variables
it easier to code declared first and given a type
For now: Example:
$16 - $23 Î $s0 - $s7 int fahr, celsius;
(correspond to C variables) char a, b, c, d, e;
$8 - $15 Î $t0 - $t7 Each variable can ONLY represent a value of the type
(correspond to temporary variables) it was declared as (cannot mix and match int and
Later will explain other 16 register names char variables).
In general, use names to make your code more In Assembly Language, the registers have no type;
readable operation determines how register contents are
treated
2
Comments in Assembly Assembly Instructions
Another way to make your code more readable: In assembly language, each statement (called an
comments! Instruction), executes exactly one of a short list of
Hash (#) is used for MIPS comments simple commands
anything from hash mark to end of line is a comment and will Unlike in C (and most other High Level Languages),
be ignored each line of assembly code contains at most 1
Note: Different from C. instruction
C comments have format Instructions are related to operations (=, +, -, *, /) in
/* comment */
C or Java
so they can span many lines
Ok, enough already…gimme my MIPS!
MIPS Addition and Subtraction (1/4) Addition and Subtraction of Integers (2/4)
Syntax of Instructions: Addition in Assembly
1 2,3,4
Example: add $s0,$s1,$s2 (in MIPS)
where:
Equivalent to: a = b + c (in C)
1) operation by name
where MIPS registers $s0,$s1,$s2 are associated with C
2) operand getting result (“destination”) variables a, b, c
3) 1st operand for operation (“source1”)
Subtraction in Assembly
4) 2nd operand for operation (“source2”)
Example: sub $s3,$s4,$s5 (in MIPS)
Syntax is rigid: Equivalent to: d = e - f (in C)
1 operator, 3 operands where MIPS registers $s3,$s4,$s5 are associated with C
Why? Keep Hardware simple via regularity variables d, e, f
Addition and Subtraction of Integers (3/4) Addition and Subtraction of Integers (4/4)
How do the following C statement? How do we do this?
a = b + c + d - e; f = (g + h) - (i + j);
Break into multiple instructions Use intermediate temporary register
add $t0, $s1, $s2 # temp = b + c add $t0,$s1,$s2 # temp = g + h
add $t0, $t0, $s3 # temp = temp + d add $t1,$s3,$s4 # temp = i + j
sub $s0, $t0, $s4 # a = temp - e sub $s0,$t0,$t1 # f=(g+h)-(i+j)
3
Register Zero Immediates
One particular immediate, the number zero (0),
appears very often in code. Immediates are numerical constants.
So we define register zero ($0 or $zero) to always They appear often in code, so there are special
have the value 0; eg instructions for them.
add $s0,$s1,$zero (in MIPS) Add Immediate:
f = g (in C) addi $s0,$s1,10 (in MIPS)
where MIPS registers $s0,$s1 are associated with C f = g + 10 (in C)
variables f, g where MIPS registers $s0,$s1 are associated with C
variables f, g
defined in hardware, so an instruction
add $zero,$zero,$s0 Syntax similar to add instruction, except that last
will not do anything! argument is a number instead of a register.
4
Data Transfer: Memory to Reg (1/4) Data Transfer: Memory to Reg (2/4)
To transfer a word of data, we need to specify two To specify a memory address to copy from, specify
things: two things:
Register: specify this by # ($0 - $31) or symbolic name ($s0,…, A register containing a pointer to memory
$t0, …)
A numerical offset (in bytes)
Memory address: more difficult
The desired memory address is the sum of these
Think of memory as a single one-dimensional array, so we
can address it simply by supplying a pointer to a memory two values.
address. Example: 8($t0)
Other times, we want to be able to offset from this pointer. specifies the memory address pointed to by the value in $t0,
plus 8 bytes
Remember: “Load FROM memory”
Data Transfer: Memory to Reg (3/4) Data Transfer: Memory to Reg (4/4)
Load Instruction Syntax: Data flow
1 2,3(4)
where Example: lw $t0,12($s0)
1) operation name This instruction will take the pointer in $s0, add 12 bytes to it, and
then load the value from the memory pointed to by this calculated
2) register that will receive value sum into register $t0
3) numerical offset in bytes Notes:
4) register containing pointer to memory $s0 is called the base register
MIPS Instruction Name: 12 is called the offset
lw (meaning Load Word, so 32 bits offset is generally used in accessing elements of array or structure:
or one word are loaded at a time) base reg points to beginning of array or structure
Also want to store from register into memory Key Concept: A register can hold any 32-bit value.
Store instruction syntax is identical to Load’s That value can be a (signed) int, an unsigned
MIPS Instruction Name: int, a pointer (memory address), and so on
sw (meaning Store Word, so 32 bits or one word are loaded at a
If you write add $t2,$t1,$t0
time)
Data flow then $t0 and $t1
Example: sw $t0,12($s0)
better contain values
This instruction will take the pointer in $s0, add 12 bytes to it, and then If you write lw $t2,0($t0)
store the value from register $t0 into that memory address then $t0 better contain a pointer
Remember: “Store INTO memory” Don’t mix these up!
5
Addressing: Byte vs. word Compilation with Memory
Every word in memory has an address, similar to an What offset in lw to select A[5] in C?
index in an array 4x5=20 to select A[5]: byte v. word
Early computers numbered words like C numbers Compile by hand using registers:
elements of an array: g = h + A[5];
Memory[0], Memory[1], Memory[2], …
g: $s1, h: $s2, $s3:base address of A
Called the “address” of a word 1st transfer from memory to register:
Computers needed to access 8-bit bytes as well as lw $t0,20($s3) # $t0 gets A[5]
words (4 bytes/word) Add 20 to $s3 to select A[5], put into $t0
6
C Decisions: if Statements MIPS Decision Instructions
2 kinds of if statements in C Decision instruction in MIPS:
if (condition) clause beq register1, register2, L1
if (condition) clause1 else clause2 beq is “Branch if (registers are) equal”
Rearrange 2nd if into following: Same meaning as (using C):
if (register1==register2) goto L1
if (condition) goto L1;
clause2; Complementary MIPS decision instruction
goto L2; bne register1, register2, L1
L1: clause1; bne is “Branch if (registers are) not equal”
L2: Same meaning as (using C):
Not as elegant as if-else, but same meaning if (register1!=register2) goto L1
Called conditional branches
7
Loading, Storing bytes 1/2 Loading, Storing bytes 2/2
In addition to word data transfers What do with other 24 bits in the 32 bit register?
(lw, sw), MIPS has byte data transfers: lb: sign extends to fill upper 24 bits
load byte: lb
store byte: sb
xxxx xxxx xxxx xxxx xxxx xxxx xzzz zzzz
same format as lw, sw
byte
…is copied to “sign-extend” loaded
This bit
• Normally don't want to sign extend chars
• MIPS instruction that doesn’t sign extend
when loading bytes:
load byte unsigned: lbu
Reminder: Overflow occurs when there is a Some languages detect overflow (Ada), some don’t (C)
mistake in arithmetic due to the limited precision in MIPS solution is 2 kinds of arithmetic instructions to
computers. recognize 2 choices:
add (add), add immediate (addi) and subtract (sub) cause
Example (4-bit unsigned numbers): overflow to be detected
+15 1111 add unsigned (addu), add immediate unsigned (addiu) and
subtract unsigned (subu) do not cause overflow detection
+3 0011
+18 10010 Compiler selects appropriate arithmetic
MIPS C compilers produce
But we don’t have room for 5-bit solution, so the solution addu, addiu, subu
would be 0010, which is +2, and wrong.
8
Loops in C/Assembly (3/3) Inequalities in MIPS (1/3)
There are three types of loops in C: Until now, we’ve only tested equalities
while (== and != in C). General programs need to test < and
do… while > as well.
for Create a MIPS Inequality Instruction:
Each can be rewritten as either of the other two, so “Set on Less Than”
the method used in the previous example can be Syntax: slt reg1,reg2,reg3
applied to while and for loops as well. Meaning:
Key Concept: Though there are multiple ways of if (reg2 < reg3)
reg1 = 1; reg1 = (reg2 < reg3);
writing a loop in MIPS, the key to decision making is
else reg1 = 0;
conditional branch
In computereeze, “set” means “set to 1”,
“reset” means “set to 0”.
9
MIPS Signed vs. Unsigned – diff meanings! Example: The C Switch Statement (1/3)
MIPS Signed v. Unsigned is an “overloaded” Choose among four alternatives depending on
term whether k has the value 0, 1, 2 or 3. Compile this C
Do/Don't sign extend code:
(lb, lbu)
Don't overflow switch (k) {
(addu, addiu, subu, multu, divu) case 0: f=i+j; break; /* k=0 */
case 1: f=g+h; break; /* k=1 */
Do signed/unsigned compare case 2: f=g–h; break; /* k=2 */
(slt, slti/sltu, sltiu) case 3: f=i–j; break; /* k=3 */
}
Example: The C Switch Statement (2/3) Example: The C Switch Statement (3/3)
Final compiled MIPS code:
This is complicated, so simplify.
bne $s5,$0,L1 # branch k!=0
Rewrite it as a chain of if-else statements, which add $s0,$s3,$s4 #k==0 so f=i+j
we already know how to compile: j Exit # end of case so Exit
if(k==0) f=i+j; L1: addi $t0,$s5,-1 # $t0=k-1
else if(k==1) f=g+h; bne $t0,$0,L2 # branch k!=1
else if(k==2) f=g–h; add $s0,$s1,$s2 #k==1 so f=g+h
else if(k==3) f=i–j; j Exit # end of case so Exit
L2: addi $t0,$s5,-2 # $t0=k-2
Use this mapping: bne $t0,$0,L3 # branch k!=2
f:$s0, g:$s1, h:$s2, sub $s0,$s1,$s2 #k==2 so f=g-h
i:$s3, j:$s4, k:$s5 j Exit # end of case so Exit
L3: addi $t0,$s5,-3 # $t0=k-3
bne $t0,$0,Exit # branch k!=3
sub $s0,$s3,$s4 #k==3 so f=i-j
Exit:
10
Logical Operators (1/3) Logical Operators (2/3)
Logical Instruction Syntax:
Two basic logical operators: 1 2,3,4
where
AND: outputs 1 only if both inputs are 1
1) operation name
OR: outputs 1 if at least one input is 1 2) register that will receive value
Truth Table: standard table listing all possible 3) first operand (register)
combinations of inputs and resultant output for 4) second operand (register) or
immediate (numerical constant)
each. E.g.,
A B A AND B A OR B In general, can define them to accept > 2 inputs,
but in the case of MIPS assembly, these accept
0 0 0 0 exactly 2 inputs and produce 1 output
0 1 0 1 Again, rigid syntax, simpler hardware
1 0 0 1
1 1 1 1
Uses for Logical Operators (2/3) Uses for Logical Operators (3/3)
The second bitstring in the example is called a Similarly, note that oring a bit with 1 produces a 1
mask. It is used to isolate the rightmost 12 bits of at the output while oring a bit with 0 produces the
the first bitstring by masking out the rest of the original bit.
string (e.g. setting it to all 0s).
This can be used to force certain bits of a string to
Thus, the and operator can be used to set certain 1s.
portions of a bitstring to 0s, while leaving the rest For example, if $t0 contains 0x12345678, then after this
alone. instruction:
In particular, if the first bitstring in the above example were in ori $t0, $t0, 0xFFFF
$t0, then the following instruction would mask it: … $t0 contains 0x1234FFFF (e.g. the high-order 16 bits are
andi $t0,$t0,0xFFF untouched, while the low-order 16 bits are forced to 1s).
11
Shift Instructions (1/4) Shift Instructions (2/4)
Move (shift) all the bits in a word to the left or right Shift Instruction Syntax:
by a number of bits. 1 2,3,4
where
Example: shift right by 8 bits 1) operation name
0001 0010 0011 0100 0101 0110 0111 1000 2) register that will receive value
3) first operand (register)
4) shift amount (constant < 32)
MIPS shift instructions:
1. sll (shift left logical): shifts left and fills emptied bits with 0s
2. srl (shift right logical): shifts right and fills emptied bits with 0s
3. sra (shift right arithmetic): shifts right and fills emptied bits by sign
0000 0000 0001 0010 0011 0100 0101 0110 extending
Example: shift left by 8 bits
0001 0010 0011 0100 0101 0110 0111 1000
12
Big Idea: Stored-Program Concept Consequence #1: Everything Addressed
Computers built on 2 key principles: Since all instructions and data are stored in memory as
1) Instructions are represented as numbers. numbers, everything has a memory address:
instructions, data words
2) Therefore, entire programs can be stored in memory
both branches and jumps use these
to be read or written just like numbers (data).
C pointers are just memory addresses: they can point to
Simplifies SW/HW of computer systems: anything in memory
Memory technology for data also used for programs Unconstrained use of addresses can lead to nasty bugs; up to you in
C; limits in Java
One register keeps address of instruction being
executed: “Program Counter” (PC)
Basically a pointer to memory: Intel calls it Instruction Address
Pointer, a better name
13
R-Format Instructions (1/5) R-Format Instructions (2/5)
Define “fields” of the following number of bits each: What do these field integer values tell us?
6 + 5 + 5 + 5 + 5 + 6 = 32 opcode: partially specifies what instruction it is
Note: This number is equal to 0 for all R-Format instructions.
6 5 5 5 5 6 funct: combined with opcode, this number exactly specifies the
instruction
For simplicity, each field has a name: Question: Why aren’t opcode and funct a single 12-bit field?
Answer: We’ll answer this later.
opcode rs rt rd shamt funct
14
R-Format Example (2/2) I-Format Instructions (1/4)
What about instructions with immediates?
MIPS Instruction: 5-bit field only represents numbers up to the value 31:
add $8,$9,$10 immediates may be much larger than this
Ideally, MIPS would have only one instruction format (for
Decimal number per field representation: simplicity): unfortunately, we need to compromise
15
I-Format Example (2/2) In conclusion…
takes 16-bit immediate and puts these bits in the upper half (high Now each I-format instruction has only a 16-bit immediate.
order half) of the specified register Wouldn’t it be nice if the assembler would do this for us
sets lower half to 0s automatically? (later)
16
Branches: PC-Relative Addressing (1/5) Branches: PC-Relative Addressing (2/5)
Use I-Format How do we usually use branches?
Answer: if-else, while, for
opcode rs rt immediate Loops are generally small: typically up to 50 instructions
Function calls and unconditional jumps are done using jump
opcode specifies beq v. bne instructions (j and jal), not the branches.
rs and rt specify registers to compare Conclusion: may want to branch to anywhere in
What can immediate specify? memory, but a branch often changes PC by a small
Immediate is only 16 bits amount
PC (Program Counter) has byte address of current instruction
being executed;
32-bit pointer to memory
So immediate cannot specify entire address to branch to.
17
Branch Example (2/3) Branch Example (3/3)
MIPS Code: MIPS Code:
Loop: beq $9,$0,End Loop: beq $9,$0,End
addi $8,$8,$10 addi $8,$8,$10
addi $9,$9,-1 addi $9,$9,-1
j Loop j Loop
End: End:
Immediate Field:
Number of instructions to add to (or subtract from) the PC,
starting at the instruction following the branch.
In beq case, immediate = 3 decimal representation:
4 9 0 3
binary representation:
Does the value in branch field change if we move For branches, we assumed that we won’t want to
the code? branch too far, so we can specify change in PC.
What do we do if destination is > 215 instructions For general jumps (j and jal), we may jump to
away from branch? anywhere in memory.
Since it’s limited to ± 215 instructions, doesn’t this Ideally, we could specify a 32-bit memory address
generate lots of extra MIPS instructions? to jump to.
Unfortunately, we can’t fit both a 6-bit opcode and
a 32-bit address into a single 32-bit word, so we
compromise.
18
J-Format Instructions (4/5) J-Format Instructions (5/5)
Now specify 28 bits of a 32-bit address Summary:
Where do we get the other 4 bits? New PC = { PC[31..28], target address, 00 }
By definition, take the 4 highest order bits from the PC.
Understand where each part came from!
Technically, this means that we cannot jump to anywhere in
memory, but it’s adequate 99.9999…% of the time, since Note: { , , } means concatenation
programs aren’t that long { 4 bits , 26 bits , 2 bits } = 32 bit address
only if straddle a 256 MB boundary { 1010, 11111111111111111111111111, 00 } =
If we absolutely need to specify a 32-bit address, we can always 10101111111111111111111111111100
put it in a register and use the jr instruction.
PC-relative addressing
Name Fields Comments The address is the sum of the PC and a constant in the instruction. I-Type
Field size 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits All MIPS instructions 32 bits Eg: beq $t2, $t3, 25 # if ($t2==$t3), goto PC+4+100
R-format op rs rt rd shamt funct Arithmetic instruction format
Pseudodirect addressing
I-format op rs rt address/immediate Transfer, branch, imm. format The 26-bit constant is logically shifted left 2 positions to get 28 bits. Then
J-format op target address Jump instruction format
the upper 4 bits of PC+4 is concatenated with this 28 bits to get the new
PC address. J-type, e. g., j 2500
19
Decoding Machine Language Decoding Example (1/7)
How do we convert 1s and 0s to C code? Here are six machine language instructions in
Machine language ⇒ Assembly language ⇒ C? hexadecimal:
For each 32 bits: 00001025hex
Look at opcode: 0 means R-Format, 2 or 3 mean J-Format, 0005402Ahex
11000003hex
otherwise I-Format. 00441020hex
Use instruction type to determine which fields exist. 20A5FFFFhex
08100001hex
Write out MIPS assembly code, converting each field to name,
register number/name, or decimal/hex number. Let the first instruction be at address 4,194,304ten
Logically convert this MIPS code into valid C code. Always (0x00400000hex).
possible? Unique? Next step: convert hex to binary
20
Decoding Example (6/7) Decoding Example (7/7)
21
Example Pseudoinstructions Example Pseudoinstructions
In conclusion Questions?
Disassembly is simple and starts by decoding
opcode field.
Be creative, efficient when authoring C
Assembler expands real instruction set with
pseudoinstructions
Only hardware implemented instructions can be converted to
raw binary
Assembler’s job to do conversion
Assembler uses reserved register $at
pseudoinstructions make it much easier to write MIPS
22