Chapter 2

Instructions: Language p of the Computer

§2.1 Intro oduction

Instruction Set
„

„

The repertoire of instructions of a co pute computer Different computers have different instruction sets
„

But with many aspects in common

„

Early computers had very simple instruction sets
„

Simplified p implementation p

„

Many modern computers also have simple instruction sets
Chapter 2 — Instructions: Language of the Computer — 2

The MIPS Instruction Set
„ „

„

Used as the example throughout the book Stanford MIPS commercialized by MIPS Technologies (www.mips.com) Large share of embedded core market
„

Applications in consumer electronics, network/storage equipment, cameras, printers, … See MIPS Reference Data tear-out card, and A Appendixes di B and dE

„

Typical of many modern ISAs
„

Chapter 2 — Instructions: Language of the Computer — 3

three operands „ Two sources and one destination „ „ add a.2 Ope erations of f the Comp puter Hard dware Arithmetic Operations „ Add and subtract. c # a gets b + c All arithmetic operations have this form Design g Principle 1: Simplicity y favours regularity „ „ Regularity g y makes implementation p simpler p Simplicity enables higher performance at lower cost Chapter 2 — Instructions: Language of the Computer — 4 . b.§2.

t1 i. i j sub f. h add t1.t1 Chapter 2 — Instructions: Language of the Computer — 5 . t0.(i + j). t1 # temp t0 = g + h # temp t1 = i + j # f = t0 .Arithmetic Example „ C code: f = (g ( + h) . j) „ Compiled p MIPS code: add t0. g.

3 Ope erands of t the Compu uter Hardw ware Register Operands „ „ Arithmetic instructions use register operands MIPS has a 32 × 32-bit register file „ „ „ Use for frequently accessed data N b d0t Numbered to 31 32-bit data called a “word” $t0. $s1.f. …. $t1. $s7 for saved variables c. $t9 for temporary values $s0. main memory: millions of locations „ Assembler names „ „ „ Design Principle 2: Smaller is faster „ Chapter 2 — Instructions: Language of the Computer — 6 . ….§2.

$s4 „ C Compiled il d MIPS code: d add $t0.Register Operand Example „ C code: f = (g + h) . „ f. ….(i + j). $t1 Chapter 2 — Instructions: Language of the Computer — 7 . $t0. $s1. $t1 $s3. …. $s2 add dd $t1. $ 3 $s4 $ 4 sub $s0. j in $s0.

f. Little Endian: least-significant byte at least address Chapter 2 — Instructions: Language of the Computer — 8 „ To apply arithmetic operations „ „ „ Memory is byte addressed „ „ Words are aligned in memory „ „ MIPS is Big Endian „ „ . structures. dynamic data Load values from memory into registers Store result from register to memory Each address identifies an 8-bit byte Address must be a multiple of 4 Most-significant byte at least address of a word c.Memory Operands „ Main memory used for composite data „ Arrays.

$s1 $s2. „ g in $s1.Memory Operand Example 1 „ C code: g = h + A[8]. base address of A in $s3 „ C Compiled il d MIPS code: d „ Index 8 requires offset of 32 „ 4 bytes per word lw $t0. $s2 $t0 offset # load word base register Chapter 2 — Instructions: Language of the Computer — 9 . h in $s2. 32($s3) add $s1.

32($s3) # load word add $t0. $t0 48($s3) # store word „ Chapter 2 — Instructions: Language of the Computer — 10 . „ h in $s2. base address of A in $s3 „ C Compiled il d MIPS code: d Index 8 requires offset of 32 lw $t0. $s2. $t0 sw $t0.Memory Operand Example 2 „ C code: A[12] = h + A[8].

Memory „ „ Registers are faster to access than e oy memory Operating on memory data requires loads and stores „ More instructions to be executed „ Compiler must use registers for variables as much as possible „ „ Only y spill p to memory y for less frequently q y used variables Register optimization is important! Chapter 2 — Instructions: Language of the Computer — 11 .Registers vs.

$s3 4 „ No subtract immediate instruction „ J Just use a negative i constant addi $s2. $s3 $s3. $s1.Immediate Operands „ Constant data specified in an instruction addi $s3. -1 „ Design D i P Principle i i l 3 3: Make M k the th common case fast „ „ Small constants are common Immediate operand avoids a load instruction Chapter 2 — Instructions: Language of the Computer — 12 .

g. $s1. $zero „ Useful for common operations „ Chapter 2 — Instructions: Language of the Computer — 13 ..The Constant Zero „ MIPS register 0 ($zero) is the constant 0 „ Cannot be overwritten E. move between E b registers i add $t2.

96 . 95 Chapter 2 — Instructions: Language of the Computer — 14 .294.§2.295 .4 Sign ned and U Unsigned N Numbers Unsigned Binary Integers „ Given an n-bit number x = x n−1 2 n −1 + x n−2 2 n−2 + L + x1 2 + x 0 2 1 0 „ „ Range: 0 to +2n – 1 Example „ 0000 0000 0000 0000 0000 0000 0000 10112 = 0 + … + 1×23 + 0×22 +1×21 +1×20 = 0 + … + 8 + 0 + 2 + 1 = 1110 „ Using 32 bits „ 0 to o +4.967. 9 .

483.6 Chapter 2 — Instructions: Language of the Computer — 15 .2s-Complement Signed Integers „ Given an n-bit number x = − x n−1 2 n −1 + x n−2 2 n−2 + L + x1 2 + x 0 2 1 0 „ „ Range: –2 2n – 1 to +2n – 1 – 1 Example „ 1111 1111 1111 1111 1111 1111 1111 11002 = –1×231 + 1×230 + … + 1×22 +0×21 +0×20 = –2.648 . 83.147.147.483.647 .483.6 8 to o +2.483. . 83.147.644 = –410 „ Using 32 bits „ –2. .648 + 2.147.

2s-Complement Signed Integers „ Bit 31 is sign bit „ „ 1 for negative numbers 0 for non-negative numbers „ „ „ –(–2n – 1) can’t be represented Non-negative numbers have the same unsigned and 2s-complement representation Some specific numbers „ „ „ „ 0: 0000 0000 … 0000 –1: 1111 1111 … 1111 Most-negative: 1000 0000 … 0000 Most-positive: 0111 1111 … 1111 Chapter 2 — Instructions: Language of the Computer — 16 .

0 0→1 x + x = 1111..Signed Negation „ Complement and add 1 „ Complement means 1 → 0.1112 = −1 x + 1 = −x „ Example: negate +2 „ „ +2 2 = 0000 0000 … 00102 –2 = 1111 1111 … 11012 + 1 = 1111 1111 … 11102 Chapter 2 — Instructions: Language of the Computer — 17 ..

f. unsigned values: extend with 0s +2: 0000 0010 => 0000 0000 0000 0010 –2: 1111 1110 => 1111 1111 1111 1110 Chapter 2 — Instructions: Language of the Computer — 18 „ In MIPS instruction set „ „ „ „ Replicate the sign bit to the left „ „ Examples: a p es 8 8-bit b t to 16-bit 6 bt „ „ . lh: extend loaded byte/halfword / f beq.Sign Extension „ Representing a number using more bits „ Preserve the numeric value addi: extend immediate value lb. bne: extend the displacement c.

register numbers.§2.5 Rep presenting Instructio ons in the C Computer Representing Instructions „ Instructions are encoded in binary „ Called machine code Encoded as 32-bit instruction words Small number of formats encoding operation code (opcode). … Regularity! $t0 $ 0 – $t7 $ are reg’s ’ 8 – 15 1 $t8 – $t9 are reg’s 24 – 25 $s0 – $s7 are reg’s reg s 16 – 23 „ MIPS instructions „ „ „ „ Register numbers „ „ „ Chapter 2 — Instructions: Language of the Computer — 19 .

MIPS R-format Instructions op 6 bit bits rs 5 bit bits rt 5 bit bits rd 5 bit bits shamt 5 bit bits funct 6 bit bits „ Instruction fields „ „ „ „ „ „ op: operation code (opcode) rs: first source register number rt: second source register number rd: destination register number shamt: shift amount (00000 for now) funct: function code ( (extends opcode) ) Chapter 2 — Instructions: Language of the Computer — 20 .

$ . $s1. .R-format Example op 6 bit bits rs 5 bit bits rt 5 bit bits rd 5 bit bits shamt 5 bit bits funct 6 bit bits add $ $t0. $s2 $ special 0 000000 $s1 17 10001 $s2 18 10010 $t0 8 01000 0 0 00000 add 32 100000 000000100011001001000000001000002 = 0232402016 Chapter 2 — Instructions: Language of the Computer — 21 .

Hexadecimal „ Base 16 „ „ Compact representation of bit strings 4 bits per hex digit 0000 0001 0010 0011 4 5 6 7 0100 0101 0110 0111 8 9 a b 1000 1001 1010 1011 c d e f 1100 1101 1110 1111 0 1 2 3 „ Example: eca8 6420 „ 1110 1100 1010 1000 0110 0100 0010 0000 Chapter 2 — Instructions: Language of the Computer — 22 .

but allow 32-bit instructions uniformly Keep formats as similar as possible Chapter 2 — Instructions: Language of the Computer — 23 .MIPS I-format Instructions op 6 bit bits rs 5 bit bits rt 5 bit bits constant or address 16 bit bits „ Immediate arithmetic and load/store instructions „ „ „ rt: t destination d ti ti or source register i t number b Constant: –215 to +215 – 1 Address: dd ess o offset set added to base add address ess in rs s „ Design Principle 4: Good design demands good compromises „ „ Different formats complicate decoding.

Stored Program Computers The BIG Picture „ „ „ Instructions represented in binary.. compilers. just like data Instructions and data stored in memory P Programs can operate t on programs „ e g compilers e.g. linkers linkers. … „ Binary compatibility allows compiled programs g to work on different computers „ Standardized ISAs Chapter 2 — Instructions: Language of the Computer — 24 .

ori i nor „ Useful for extracting and inserting groups of bits in a word Chapter 2 — Instructions: Language of the Computer — 25 .§2. andi or.6 Logical Opera ations Logical Operations „ Instructions for bitwise manipulation Operation Shift left Shift right Bitwise AND Bitwise OR Bitwise NOT C << >> & | ~ Java << >>> & | ~ MIPS sll srl and.

Shift Operations op 6 bits rs 5 bits rt 5 bits rd 5 bits shamt 5 bits funct 6 bits „ „ shamt: how many positions to shift Shift left logical „ „ Shift left and fill with 0 bits sll by i bits multiplies by 2i Shift right and fill with 0 bits srl by i bits divides by 2i (unsigned only) Chapter 2 — Instructions: Language of the Computer — 26 „ Shift right logical „ „ .

$t1.AND Operations „ Useful to mask bits in a word „ Select some bits. $t2 $t2 $t1 $t0 0000 0000 0000 0000 0000 1101 1100 0000 0000 0000 0000 0000 0011 1100 0000 0000 0000 0000 0000 0000 0000 1100 0000 0000 Chapter 2 — Instructions: Language of the Computer — 27 . bits clear others to 0 and $t0.

$t1. 1 leave others unchanged or $t0.OR Operations „ Useful to include bits in a word „ Set some bits to 1. $t2 $t2 $t1 $t0 0000 0000 0000 0000 0000 1101 1100 0000 0000 0000 0000 0000 0011 1100 0000 0000 0000 0000 0000 0000 0011 1101 1100 0000 Chapter 2 — Instructions: Language of the Computer — 28 .

$zero $t1 $ $t0 0000 0000 0000 0000 0011 1100 0000 0000 1111 1111 1111 1111 1100 0011 1111 1111 Chapter 2 — Instructions: Language of the Computer — 29 .NOT Operations „ Useful to invert bits in a word „ Change 0 to 1 1. and 1 to 0 a NOR b == NOT ( a OR b ) Register 0: always read as zero „ MIPS has NOR 3-operand instruction „ nor $t0. $t1.

if (rs != rt) branch to instruction labeled L1.7 Instructions fo or Making Decisions s Conditional Operations „ Branch to a labeled instruction if a condition co d t o is st true ue „ Otherwise. continue sequentially if (rs == rt) branch to instruction labeled L1. L1 „ „ bne rs. L1 „ „ j L1 „ Chapter 2 — Instructions: Language of the Computer — 30 .§2. unconditional jump to instruction labeled L1 „ beq rs. rt. rt.

g … in $s0. $s4. $s0 $s1 $s1. else f = g-h. Else $s0. … $s3. „ f g.Compiling If Statements „ C code: if (i (i==j) j) f = g+h. $s1. $s2 Exit $s0. $s1. $s2 Assembler calculates addresses Chapter 2 — Instructions: Language of the Computer — 31 „ Compiled MIPS code: bne add j Else: sub Exit: … . f.

$t1 $t0.Compiling Loop Statements „ C code: while (save[i] == k) i += 1. „ i in $s3. $s3. Exit $s3. address of save in $s6 $t1. $s3 1 „ Compiled MIPS code: Loop: sll add lw bne addi j Exit: … Chapter 2 — Instructions: Language of the Computer — 32 . $t1. $t0. $s3 Loop $s3. k in $s5. $t1 $s6 0($t1) $s5. 2 $t1.

Basic Blocks „ A basic block is a sequence of instructions with „ „ No embedded branches (except at end) No branch targets (except at beginning) „ „ A compiler il id identifies tifi b basic i blocks for optimization An advanced processor can accelerate execution of basic blocks Chapter 2 — Instructions: Language of the Computer — 33 .

slt $t0. bne Chapter 2 — Instructions: Language of the Computer — 34 . rt „ „ slti rt. $zero. $s1. else rt = 0. rs. if (rs < constant) rt = 1. $s2 bne $t0. else l rd d=0 0. if ( (rs < rt) ) rd d=1 1.More Conditional Operations „ Set result to 1 if a condition is true „ Otherwise set to 0 Otherwise. rs. L # if ($s1 < $s2) # branch to L „ slt rd. constant „ „ Use in combination with beq q.

requiring a slower clock All instructions penalized! „ „ beq and b d bne b are the th common case This is a good design compromise Chapter 2 — Instructions: Language of the Computer — 35 . ≥.Branch Instruction Design „ „ Why not blt. ≠ „ „ Combining with branch involves more work per instruction instruction. … slower than = =. bge. etc? Hardware for < <.

$s0. $s0.967.295 > +1 ⇒ $t0 = 0 Chapter 2 — Instructions: Language of the Computer — 36 .Signed vs. $s1 # signed „ –1 < +1 ⇒ $t0 = 1 „ sltu $t0.294. sltui Example „ „ „ $s0 = 1111 1111 1111 1111 1111 1111 1111 1111 $s1 = 0000 0000 0000 0000 0000 0000 0000 0001 slt $t0. Unsigned „ „ „ Signed comparison: slt. slti Unsigned comparison: sltu. $s1 „ # unsigned +4.

8 Sup pporting Pr rocedures in Compu uter Hardw ware Procedure Calling „ Steps required 1. Place parameters in registers Transfer control to procedure Acquire storage for procedure Perform procedure’s operations Pl Place result lt in i register i t for f caller ll Return to place of call Chapter 2 — Instructions: Language of the Computer — 37 . 6.§2. 5 5. 3 3. 4. 1 2.

$ .Register Usage „ „ „ $a0 – $a3: arguments (reg’s 4 – 7) $v0.$ $v1: result values ( (reg’s g 2 and 3) ) $t0 – $t9: temporaries „ Can be overwritten by callee Must be saved/restored by callee „ $s0 – $s7: saved „ „ „ „ „ $gp: global $ l b l pointer i t f for static t ti d data t ( (reg 28) $sp: stack pointer (reg 29) $f frame $fp: f pointer i t (reg ( 30) $ra: return address (reg 31) Chapter 2 — Instructions: Language of the Computer — 38 .

g. for case/switch statements Chapter 2 — Instructions: Language of the Computer — 39 .Procedure Call Instructions „ Procedure call: jump and link jal ProcedureLabel „ Address of following instruction put in $ra „ Jumps to target address „ Procedure return: jump register jr $ra „ Copies $ra to program counter „ Can also be used for computed jumps „ e..

need to save $s0 on stack) ) „ Result in $v0 Chapter 2 — Instructions: Language of the Computer — 40 . …. $a3 „ f in $s0 ( (hence. ….( (i + j). } „ Arguments g. h i.Leaf Procedure Example „ C code: int leaf_example leaf example (int g. return f. f = (g + h) ) . g h. j in $a0. i j) { int f.

$sp 4 jr $ra Save $s0 on stack Procedure body Result Restore $s0 Return Chapter 2 — Instructions: Language of the Computer — 41 . $zero lw $s0. $t1 $ add $v0. 0($sp) addi $sp $sp. 0($sp) add dd $t0. $a2. $a3 sub $ $s0. $ .Leaf Procedure Example „ MIPS code: leaf_example: leaf example: addi $sp. $sp. $ 0 $a1 $ 1 add $t1. -4 sw $s0. $t0 $a0. $sp. . $s0. $t0.

caller needs to save on the stack: „ „ Its return It t address dd Any arguments and temporaries needed after the call „ Restore from the stack after the call Chapter 2 — Instructions: Language of the Computer — 42 .Non-Leaf Procedures „ „ Procedures that call other procedures For nested call call.

Non-Leaf Procedure Example „ C code: int fact (int n) { if ( (n < 1) ) return etu f. else return n * fact(n . } „ Argument n in $a0 „ Result in $v0 Chapter 2 — Instructions: Language of the Computer — 43 .1). .

-1 0($sp) ($ p) 4($sp) $sp. result is 1 pop 2 items from stack and d return else decrement n recursive call restore original g n and return address pop 2 items from stack multiply to get result and return Chapter 2 — Instructions: Language of the Computer — 44 . L1 $zero. 8 $a0. $v0. $ . $v0 # # # # # # # # # # # # # # adjust stack for 2 items save return address save argument test for n < 1 if so. 8 $a0. $sp. $t0. $t0. $a0. $ra $sp. fact $a0. -8 4($sp) 0($sp) $a0. 1 $zero. 1 $sp.Non-Leaf Procedure Example „ MIPS code: fact: addi sw sw slti beq addi addi j jr L1: addi jal lw lw addi mul jr $sp. $ra. $sp. $ra. $ra $ $a0. $v0.

Local Data on the Stack „ Local data allocated by y callee „ e. C automatic variables U db Used by some compilers il t to manage stack t k storage t Chapter 2 — Instructions: Language of the Computer — 45 „ Procedure frame (activation record) „ .g..

Memory Layout
„ „

Text: program code Static data: global g variables
„

„

e.g., static variables in C, constant arrays and strings $gp initialized to address allowing ±offsets into this segment E.g., E g malloc in C C, new in Java

„

Dynamic data: heap
„

„

Stack: automatic storage
Chapter 2 — Instructions: Language of the Computer — 46

§2.9 Com mmunicatin ng with Pe eople

Character Data
„

Byte-encoded character sets
„

ASCII: 128 characters
„

95 graphic, 33 control ASCII, +96 more graphic characters

„

Latin-1: 256 characters
„

„

Unicode: 32 32-bit bit character set
„ „ „

Used in Java, C++ wide characters, … M t of Most f the th world’s ld’ alphabets, l h b t plus l symbols b l UTF-8, UTF-16: variable-length encodings
Chapter 2 — Instructions: Language of the Computer — 47

Byte/Halfword Operations
„ „

Could use bitwise operations MIPS byte/halfword load/store
„

String processing is a common case
lh rt, offset(rs) ff ( ) lhu lh rt, offset(rs) ff ( ) sh h rt, offset(rs) ff ( )

lb rt, offset(rs) ff ( )
„

Sign extend to 32 bits in rt Zero extend to 32 bits in rt Store just rightmost byte/halfword
Chapter 2 — Instructions: Language of the Computer — 48

lb rt, offset(rs) lbu ff ( )
„

sb b rt, offset(rs) ff ( )
„

String Copy Example „ C code (naïve): Null-terminated Null terminated string void strcpy (char x[]. while ((x[i]=y[i])!='\0') (( [ ] y[ ]) \ ) i += 1. } „ Addresses of x. $a1 „ i in $s0 „ Chapter 2 — Instructions: Language of the Computer — 49 . y in $a0. i = 0. char y[]) { int i.

$ 2 $s0. $t1. $s0. $t3. $t2. $a1 0($t1) $s0. $t2. $s0. $zero $s0. L1 $s0.String Copy Example „ MIPS code: strcpy: addi sw add L1: add lbu add sb b beq addi j L2: lw addi jr $sp. $a0 0($t3) $zero. $t2. $ . 1 0($sp) ($ p) $sp. -4 0($sp) $zero. 4 # # # # # # # # # # # # # adjust stack for 1 item save $s0 i = 0 addr of y[i] in $t1 $t2 = y[i] addr of x[i] in $t3 x[i] = y[i] exit i loop l if y[i] [i] == 0 i = i + 1 next iteration of loop restore saved $s0 $ pop 1 item from stack and return Chapter 2 — Instructions: Language of the Computer — 50 . $sp. $ L2 2 $s0. $ra $sp.

10 MIPS Addres ssing for 3 32-Bit Imm mediates and Addres sses 32-bit Constants „ Most constants are small „ 16 bit immediate is sufficient 16-bit „ For the occasional 32-bit constant l i rt. 61 ori $s0. $s0. .§2. . 2304 0000 0000 0111 1101 0000 1001 0000 0000 Chapter 2 — Instructions: Language of the Computer — 51 . constant lui „ „ Copies 16-bit constant to left 16 bits of rt Clears right 16 bits of rt to 0 0000 0000 0111 1101 0000 0000 0000 0000 lhi $s0.

Branch Addressing „ Branch instructions specify „ Opcode two registers. registers target address F Forward d or backward b k d op 6 bits „ Most branch targets are near branch „ rs 5 bits rt 5 bits constant or address 16 bits „ PC relative addressing PC-relative „ „ Target address = PC + offset × 4 PC already l d incremented i t d by b 4b by thi this ti time Chapter 2 — Instructions: Language of the Computer — 52 . Opcode.

Jump Addressing „ Jump (j and jal) targets could be anywhere in text segment „ Encode full address in instruction op 6 bits address 26 bits „ (Pseudo)Direct jump addressing „ Target address = PC31…28 : (address × 4) Chapter 2 — Instructions: Language of the Computer — 53 .

$s3. Exit 80012 80016 80020 80024 addi $s3. $s6 $t0. 2 $t1. 1 j Exit: … Loop Chapter 2 — Instructions: Language of the Computer — 54 . $s3. $s5. 0($t1) 80000 80004 80008 0 0 35 5 8 2 0 9 9 8 19 19 22 8 21 19 20000 9 9 4 0 0 2 1 0 32 Loop: sll add lw bne $t0.Target Addressing Example „ Loop code from earlier example „ Assume Loop at location 80000 $t1. $t1.

$ 0 $ 1 L1 ↓ b bne $s0.$s1. offset assembler rewrites the code Example b beq $s0. $ 0 $ 1 L2 2 j L1 L2: … Chapter 2 — Instructions: Language of the Computer — 55 .Branching Far Away „ „ If branch target is too far to encode with 16-bit offset.$s1.

Addressing Mode Summary Chapter 2 — Instructions: Language of the Computer — 56 .

.11 Parallelism a and Instruc ctions: Syn nchronizat tion Synchronization „ Two processors sharing an area of memory „ „ P1 writes.§2.g. then P2 reads Data race if P1 and P2 don’t synchronize „ Result depends of order of accesses „ Hardware support required „ „ Atomic read/write memory operation No other access to the location allowed between the read and write E. atomic swap of register ↔ memory Or an atomic pair of instructions „ Could be a single instruction „ „ Chapter 2 — Instructions: Language of the Computer — 57 .

$t1 . offset(rs) Store conditional: sc rt.0($s1) $t0.$s4 $t1.load linked .Synchronization in MIPS „ „ Load linked: ll rt.$zero.branch store fails .0($s1) $t0 0($s1) $t0.copy exchange value .try $s4.put load value in $s4 Chapter 2 — Instructions: Language of the Computer — 58 .$zero.store conditional . .$zero. offset(rs) ( ) „ Succeeds if location not changed since the ll „ Returns 1 in rt Returns 0 in rt „ Fails if location is changed „ „ Example: p atomic swap p( (to test/set lock variable) ) try: add ll sc beq add $t0.

12 Tra anslating a and Startin ng a Progr ram Translation and Startup Many compilers produce object modules directly Static linking Chapter 2 — Instructions: Language of the Computer — 59 .§2.

$zero. $t0. $t1 bne $at. $t1 blt $t0. L → slt $at.Assembler Pseudoinstructions „ „ Most assembler instructions represent machine instructions one-to-one Pseudoinstructions: figments of the assembler’s assembler s imagination → add $t0. L „ $ t( $at (register i t 1) 1): assembler bl t temporary Chapter 2 — Instructions: Language of the Computer — 60 . $t1. $t1 move $t0. $zero.

Producing an Object Module „ „ Assembler (or compiler) translates program into machine instructions Provides information for building a complete program from the pieces „ „ „ „ „ „ Header: H d d described ib d contents t t of f object bj t module d l Text segment: translated instructions Static data segment: data allocated for the life of the program Relocation info: for contents that depend on absolute location of loaded program Symbol table: global definitions and external refs Debug info: for associating with source code Chapter 2 — Instructions: Language of the Computer — 61 .

Merges segments 1 2. Resolve labels (determine their addresses) 3 Patch location-dependent 3.Linking Object Modules „ Produces an executable image 1. no need to do this Program can be loaded into absolute location in virtual memory space Chapter 2 — Instructions: Language of the Computer — 62 . location dependent and external refs „ Could leave location dependencies for fi i b fixing by a relocating l ti l loader d „ „ But with virtual memory.

Read header to determine segment sizes 1 2. $gp) 6 Jump 6. Initialize registers (including $sp. Set up arguments on stack 4 5. do exit syscall Chapter 2 — Instructions: Language of the Computer — 63 . … and calls main When main returns returns.Loading a Program „ Load from image file on disk into memory 1. J to t startup t t routine ti „ „ Copies arguments to $a0. $fp. „ Or set page table entries so they can be faulted in 4. Create virtual address space 3 Copy text and initialized data into memory 3.

Dynamic Linking „ Only link/load library procedure when it is called „ „ „ Requires procedure code to be relocatable Avoids image bloat caused by static linking of all (transitively) referenced libraries Automatically picks up new library versions Chapter 2 — Instructions: Language of the Computer — 64 .

Lazy Linkage Indirection table Stub: Loads routine ID. Jump to linker/loader Linker/loader code Dynamically mapped code Chapter 2 — Instructions: Language of the Computer — 65 .

Starting Java Applications Simple Si l portable t bl instruction set for the JVM Compiles bytecodes of “hot” methods into native code for host machine Interprets bytecodes Chapter 2 — Instructions: Language of the Computer — 66 .

k in $a1.13 A C Sort Exa ample to P Put It All To ogether C Sort Example „ „ Illustrates use of assembly instructions o a C bubb bubble e so sort t function u ct o for Swap procedure (leaf) void swap(int v[]. [k] v[k] = v[k+1].§2. t temp = v[k]. } v in $a0. int k) { int temp. v[k+1] [ ] = temp. temp in $t0 „ Chapter 2 — Instructions: Language of the Computer — 67 . p.

0($t1) # $t0 (temp) = v[k] lw $t2. 4($t1) # $t2 = v[k+1] sw $t2. 4($t1) # v[k+1] = $t0 (temp) jr $ra j # return to calling g routine Chapter 2 — Instructions: Language of the Computer — 68 . $a0. 0($t1) # v[k] = $t2 (v[k+1]) sw $t0. $t1 # $t1 = v+(k*4) # (address of v[k]) lw $t0.The Procedure Swap swap: sll $t1. 2 # $t1 = k * 4 add $t1. $a1.

k in $a1.The Sort Procedure in C „ Non-leaf (calls swap) void sort (int v[].j). j. i += 1) ) { for (j = i – 1. int n) { int i. i < n. . . } } } v in $a0. j -= 1) { swap(v. i in $s0. j >= 0 && v[j] > v[j + 1]. for ( (i = 0. . . . j in $s1 Chapter 2 — Instructions: Language of the Computer — 69 „ .

$t4. $s3 $t0. $s0. $s1. –1 for2tst $s0 $s0. $s1. $s0 $s0. $s1. exit2 $a0. $s0 1 for1tst # # # # # # # # # # # # # # # # # # # # # save $a0 into $s2 save $a1 into $s3 i = 0 $t0 = 0 if $s0 ≥ $s3 (i ≥ n) go to exit1 if $s0 ≥ $s3 (i ≥ n) j = i – 1 $t0 = 1 if $s1 < 0 (j < 0) go to exit2 if $s1 < 0 (j < 0) $t1 = j * 4 $t2 = v + (j * 4) $t3 = v[j] $t4 = v[j + 1] $t0 = 0 if $t4 ≥ $t3 go to exit2 if $t4 ≥ $t3 1st param of swap is v (old $a0) 2nd param of swap is j call swap procedure j –= 1 jump to test of inner loop i += + 1 jump to test of outer loop Move params Outer loop p Inner loop Pass params & call Inner loop Outer loop Chapter 2 — Instructions: Language of the Computer — 70 . 2 $t2.The Procedure Body move move move for1tst: slt beq addi for2tst: slti bne sll add lw lw slt beq move move jal addi j exit2: addi j $s2. exit1 $s1. $s0. $zero. $zero $t0 $t0. $t4 4($t2) $t0. 0($t2) $t4. $s2 $a1. –1 $t0. 0 $t0. $a1 $s0. $t1 $t3. $s2. $t3 $t0. $a0 $s3. $zero exit2 $t1. $zero. $t0 $zero. $a1 $s1 swap $s1.

The Full Procedure
sort: addi $sp,$sp, –20 sw $ra, 16($sp) sw $s3,12($sp) sw $s2, 8($sp) sw $s1, 4($sp) sw $s0, 0($sp) … … exit1: lw $s0, 0($sp) lw $s1, 4($sp) lw $s2, 8($sp) lw $s3,12($sp) lw $ra,16($sp) addi $sp,$sp, 20 jr $ra # # # # # # # # # # # # # # make room on stack for 5 registers save $ra on stack save $s3 on stack save $s2 on stack save $s1 on stack save $s0 on stack procedure body restore $s0 from stack restore $s1 from stack restore $s2 from stack restore $s3 from stack restore $ra from stack restore stack pointer return to calling routine

Chapter 2 — Instructions: Language of the Computer — 71

Effect of Compiler Optimization
Compiled with gcc for Pentium 4 under Linux
3 2.5 2 1.5 1 0.5 0 none 180000 160000 140000 120000 100000 80000 60000 40000 20000 0 none O1 O2 O3 2 1.5 1 0.5 0 O1 O2 O3 none O1 O2 O3

Relative Performance

140000 120000 100000 80000 60000 40000 20000 0 none

Instruction count

O1

O2

O3

Clock Cycles

CPI

Chapter 2 — Instructions: Language of the Computer — 72

Effect of Language and Algorithm
3 2.5 2 15 1.5 1 0.5 0 C/none C/O1 C/O2 C/O3 Java/int Java/JIT

Bubblesort Relative Performance

2.5 2 1.5 1 0.5 0 C/none

Quicksort Relative Performance

C/O1

C/O2

C/O3

Java/int

Java/JIT

3000 2500 2000 1500 1000 500 0 C/none

Quicksort vs. Bubblesort Speedup

C/O1

C/O2

C/O3

Java/int

Java/JIT

Chapter 2 — Instructions: Language of the Computer — 73

Lessons Learnt „ „ „ Instruction count and CPI are not good performance indicators in isolation Compiler optimizations are sensitive to the algorithm Java/JIT compiled code is significantly f t than faster th JVM interpreted i t t d „ Comparable to optimized C in some cases „ Nothing can fix a dumb algorithm! Chapter 2 — Instructions: Language of the Computer — 74 .

14 Arr rays versu us Pointers s Arrays vs.§2. Pointers „ Array indexing involves „ „ Multiplying index by element size Adding to array base address „ Pointers P i t correspond d directly di tl t to memory addresses „ Can avoid indexing complexity Chapter 2 — Instructions: Language of the Computer — 75 .

$t0.loop1 # if (…) # goto loop1 clear2(int *array.$a0.0($t0) # Memory[p] = 0 addi $t0.Example: Clearing and Array clear1(int array[].$t1 # i = 0 # $t1 = i * 4 # $t2 = y[ ] # &array[i] sw $zero. int size) { int *p. int size) { int i.$zero loop1: sll $t1.$a1.$zero.$zero. p < &array[size].$t2 # $t3 = #(p<&array[size]) bne $t3. } move $t0.$t0.4 # p = p + 4 slt $t3. for (p = &array[0]. for (i = 0.$t0.$a1 # $t3 = # (i < size) bne $t3.2 add $t2.$a0. p = p + 1) *p = 0.$t0.loop2 # if (…) # goto loop2 Chapter 2 — Instructions: Language of the Computer — 76 .$a0 # p = & array[0] sll $t1.$t0. i += 1) array[i] = 0. i < size. 0($t2) # array[i] = 0 addi $t0.2 # $t1 = size * 4 add $t2. } move $t0.1 # i = i + 1 slt $t3.$t1 # $t2 = y[ ] # &array[size] loop2: sw $zero.

Comparison of Array vs.f. Ptr „ „ Multiply “strength reduced” to shift Array version requires shift to be inside loop „ „ Part P t of f index i d calculation l l ti f for i incremented t di c. incrementing pointer „ Compiler can achieve same effect as manual use of pointers „ „ Induction variable elimination Better to make program clearer and safer Chapter 2 — Instructions: Language of the Computer — 77 .

16 Re eal Stuff: ARM Instru uctions ARM & MIPS Similarities „ „ ARM: the most popular embedded core Similar basic set of instructions to MIPS ARM MIPS 1985 32 bits 32-bit flat Aligned 3 31 × 32-bit Memory mapped 1985 32 bits 32-bit flat Aligned 9 15 × 32-bit Memory mapped Date announced Instruction size Address space Data alignment Data addressing modes Registers Input/output Chapter 2 — Instructions: Language of the Computer — 78 .§2.

zero. carry. overflow Compare instructions to set condition codes without keeping the result Top 4 bits of instruction word: condition value C avoid Can id b branches h over single i l i instructions t ti „ Each instruction can be conditional „ „ Chapter 2 — Instructions: Language of the Computer — 79 .Compare and Branch in ARM „ Uses condition codes for result of an arithmetic/logical instruction „ „ Negative.

Instruction Encoding Chapter 2 — Instructions: Language of the Computer — 80 .

17 Re eal Stuff: x x86 Instruc ctions The Intel x86 ISA „ Evolution with backward compatibility „ 8080 (1974): 8-bit 8 bit microprocessor „ Accumulator. MMU „ „ 80386 (1985): 32-bit extension (now IA-32) „ „ Chapter 2 — Instructions: Language of the Computer — 81 .§2. plus 3 index-register pairs Complex instruction set (CISC) Adds FP instructions and register stack Segmented memory mapping and protection Additional addressing modes and operations Paged memory mapping as well as segments „ 8086 (1978): 16-bit extension to 8080 „ „ 8087 (1980): floating-point coprocessor „ „ 80286 (1982): 24-bit addresses.

The Intel x86 ISA „ Further evolution… „ i486 (1989): pipelined. The Pentium Chronicles) Added SSE (Streaming SIMD Extensions) and associated registers New microarchitecture Added SSE2 instructions Chapter 2 — Instructions: Language of the Computer — 82 „ Pentium (1993): superscalar. on-chip caches and FPU „ C Compatible tibl competitors: tit AMD AMD. i … Later versions added MMX (Multi-Media eXtension) instructions The infamous FDIV bug New microarchitecture (see Colwell Colwell. Pentium II (1997) „ „ Pentium III (1999) „ „ Pentium 4 (2001) „ „ . C Cyrix. 64-bit datapath „ „ „ Pentium Pro (1995).

more instructions „ I t l Core Intel C (2006) „ „ AMD64 (announced 2007): SSE5 instructions „ „ Advanced Vector Extension (announced 2008) „ „ If Intel didn’t extend with compatibility.The Intel x86 ISA „ And further… „ „ AMD64 (2003): extended architecture to 64 bits EM64T – Extended E d dM Memory 64 T Technology h l (2004) „ „ AMD64 adopted by Intel (with refinements) Added SSE3 instructions Added SSE4 instructions. virtual machine support Intel declined to follow. its competitors would! „ Technical elegance ≠ market success Chapter 2 — Instructions: Language of the Computer — 83 . instead… Longer SSE registers.

Basic x86 Registers Chapter 2 — Instructions: Language of the Computer — 84 .

1.Basic x86 Addressing Modes „ Two operands per instruction Source/dest operand Register Register R i t Register Memory Memory Second source operand Register Immediate M Memory Register Immediate „ Memory addressing modes „ „ „ „ Address in register Address = Rbase + displacement Address = Rbase + 2scale × Rindex (scale = 0. or 3) l × R Add Address = Rbase + 2scale di l index + displacement Chapter 2 — Instructions: Language of the Computer — 85 . 2.

x86 Instruction Encoding „ Variable length encoding „ „ Postfix bytes specify addressing mode Prefix bytes modify operation „ Operand length. locking. … Chapter 2 — Instructions: Language of the Computer — 86 . repetition.

Implementing IA-32 „ Complex instruction set makes implementation difficult „ Hardware translates instructions to simpler microoperations „ „ Simple instructions: 1–1 Complex p instructions: 1–many y „ „ Microengine similar to RISC Market share makes this economically viable Compilers avoid complex instructions Chapter 2 — Instructions: Language of the Computer — 87 „ Comparable performance to RISC „ .

including simple ones „ Compilers are good at making fast code from simple instructions But modern compilers are better at dealing with modern ode processors p ocesso s More lines of code ⇒ more errors and less productivity „ Use assembly code for high performance „ „ Chapter 2 — Instructions: Language of the Computer — 88 .18 Fa allacies and d Pitfalls Fallacies „ Powerful instruction ⇒ higher performance „ „ Fewer instructions required But complex instructions are hard to implement „ May slow down all instructions.§2.

Fallacies „ Backward compatibility ⇒ instruction set doesn’t doesn t change „ But they do accrete more instructions x86 instruction set Chapter 2 — Instructions: Language of the Computer — 89 .

not by 1! „ Keeping a pointer to an a automatic tomatic variable ariable after procedure returns „ „ e..g. passing i pointer i t b back k via i an argument t Pointer becomes invalid when stack popped Chapter 2 — Instructions: Language of the Computer — 90 .Pitfalls „ Sequential words are not at sequential addresses „ Increment by 4.

Simplicity favors regularity Smaller is faster Make a et the e co common o case fast ast Good design demands good compromises „ Layers of software/hardware „ Compiler. hardware c. x86 „ MIPS: typical of RISC ISAs „ Chapter 2 — Instructions: Language of the Computer — 91 . 3 4. 1 2. assembler. 3.19 Co oncluding R Remarks Concluding Remarks „ Design principles 1.§2.f.

slt.Concluding Remarks „ Measure MIPS instruction executions in benchmark be c a p programs og a s „ „ Consider making the common case fast Consider compromises p MIPS examples add. sub. ori. bne. lh. sll. addi lw. andi. nor. Branch Jump Chapter 2 — Instructions: Language of the Computer — 92 . lui and. sw. sb. lhu. jr. or. lbu. slti. lb. sltiu j. jal SPEC2006 Int 16% 35% 12% 34% 2% SPEC2006 FP 48% 36% 4% 8% 0% Instruction class Arithmetic Data transfer Logical g Cond. srl beq.

Sign up to vote on this title
UsefulNot useful