# Chapter 2

Instructions: Language p of the Computer

§2.1 Intro oduction

Instruction Set




The repertoire of instructions of a co pute computer Different computers have different instruction sets


But with many aspects in common



Early computers had very simple instruction sets


Simplified p implementation p



Many modern computers also have simple instruction sets
Chapter 2 — Instructions: Language of the Computer — 2

The MIPS Instruction Set
 



Used as the example throughout the book Stanford MIPS commercialized by MIPS Technologies (www.mips.com) Large share of embedded core market


Applications in consumer electronics, network/storage equipment, cameras, printers, … See MIPS Reference Data tear-out card, and A Appendixes di B and dE



Typical of many modern ISAs


Chapter 2 — Instructions: Language of the Computer — 3

three operands  Two sources and one destination   add a.2 Ope erations of f the Comp puter Hard dware Arithmetic Operations  Add and subtract. c # a gets b + c All arithmetic operations have this form Design g Principle 1: Simplicity y favours regularity   Regularity g y makes implementation p simpler p Simplicity enables higher performance at lower cost Chapter 2 — Instructions: Language of the Computer — 4 . b.§2.

t1 i. i j sub f. h add t1.t1 Chapter 2 — Instructions: Language of the Computer — 5 . t0.(i + j). t1 # temp t0 = g + h # temp t1 = i + j # f = t0 .Arithmetic Example  C code: f = (g ( + h) . j)  Compiled p MIPS code: add t0. g.

3 Ope erands of t the Compu uter Hardw ware Register Operands   Arithmetic instructions use register operands MIPS has a 32 × 32-bit register file    Use for frequently accessed data N b d0t Numbered to 31 32-bit data called a “word” \$t0. \$s1.f. …. \$t1. \$s7 for saved variables c. \$t9 for temporary values \$s0. main memory: millions of locations  Assembler names    Design Principle 2: Smaller is faster  Chapter 2 — Instructions: Language of the Computer — 6 . ….§2.

\$s4  C Compiled il d MIPS code: d add \$t0.Register Operand Example  C code: f = (g + h) .  f. ….(i + j). \$t1 Chapter 2 — Instructions: Language of the Computer — 7 . \$t0. \$s1. \$t1 \$s3. …. \$s2 add dd \$t1. \$ 3 \$s4 \$ 4 sub \$s0. j in \$s0.

f. Little Endian: least-significant byte at least address Chapter 2 — Instructions: Language of the Computer — 8  To apply arithmetic operations    Memory is byte addressed   Words are aligned in memory   MIPS is Big Endian   . structures. dynamic data Load values from memory into registers Store result from register to memory Each address identifies an 8-bit byte Address must be a multiple of 4 Most-significant byte at least address of a word c.Memory Operands  Main memory used for composite data  Arrays.

\$s1 \$s2.  g in \$s1.Memory Operand Example 1  C code: g = h + A[8]. base address of A in \$s3  C Compiled il d MIPS code: d  Index 8 requires offset of 32  4 bytes per word lw \$t0. \$s2 \$t0 offset # load word base register Chapter 2 — Instructions: Language of the Computer — 9 . h in \$s2. 32(\$s3) add \$s1.

32(\$s3) # load word add \$t0. \$t0 48(\$s3) # store word  Chapter 2 — Instructions: Language of the Computer — 10 .  h in \$s2. base address of A in \$s3  C Compiled il d MIPS code: d Index 8 requires offset of 32 lw \$t0. \$s2. \$t0 sw \$t0.Memory Operand Example 2  C code: A[12] = h + A[8].

Memory   Registers are faster to access than e oy memory Operating on memory data requires loads and stores  More instructions to be executed  Compiler must use registers for variables as much as possible   Only y spill p to memory y for less frequently q y used variables Register optimization is important! Chapter 2 — Instructions: Language of the Computer — 11 .Registers vs.

\$s3 4  No subtract immediate instruction  J Just use a negative i constant addi \$s2. \$s3 \$s3. \$s1.Immediate Operands  Constant data specified in an instruction addi \$s3. -1  Design D i P Principle i i l 3 3: Make M k the th common case fast   Small constants are common Immediate operand avoids a load instruction Chapter 2 — Instructions: Language of the Computer — 12 .

g. \$s1. \$zero  Useful for common operations  Chapter 2 — Instructions: Language of the Computer — 13 ..The Constant Zero  MIPS register 0 (\$zero) is the constant 0  Cannot be overwritten E. move between E b registers i add \$t2.

96 . 95 Chapter 2 — Instructions: Language of the Computer — 14 .294.§2.295 .4 Sign ned and U Unsigned N Numbers Unsigned Binary Integers  Given an n-bit number x = x n−1 2 n −1 + x n−2 2 n−2 + L + x1 2 + x 0 2 1 0   Range: 0 to +2n – 1 Example  0000 0000 0000 0000 0000 0000 0000 10112 = 0 + … + 1×23 + 0×22 +1×21 +1×20 = 0 + … + 8 + 0 + 2 + 1 = 1110  Using 32 bits  0 to o +4.967. 9 .

483.6 Chapter 2 — Instructions: Language of the Computer — 15 .2s-Complement Signed Integers  Given an n-bit number x = − x n−1 2 n −1 + x n−2 2 n−2 + L + x1 2 + x 0 2 1 0   Range: –2 2n – 1 to +2n – 1 – 1 Example  1111 1111 1111 1111 1111 1111 1111 11002 = –1×231 + 1×230 + … + 1×22 +0×21 +0×20 = –2.648 . 83.147.147.483.647 .483.6 8 to o +2.483. . 83.147.644 = –410  Using 32 bits  –2. .648 + 2.147.

2s-Complement Signed Integers  Bit 31 is sign bit   1 for negative numbers 0 for non-negative numbers    –(–2n – 1) can’t be represented Non-negative numbers have the same unsigned and 2s-complement representation Some specific numbers     0: 0000 0000 … 0000 –1: 1111 1111 … 1111 Most-negative: 1000 0000 … 0000 Most-positive: 0111 1111 … 1111 Chapter 2 — Instructions: Language of the Computer — 16 .

0 0→1 x + x = 1111..Signed Negation  Complement and add 1  Complement means 1 → 0.1112 = −1 x + 1 = −x  Example: negate +2   +2 2 = 0000 0000 … 00102 –2 = 1111 1111 … 11012 + 1 = 1111 1111 … 11102 Chapter 2 — Instructions: Language of the Computer — 17 ..

f. unsigned values: extend with 0s +2: 0000 0010 => 0000 0000 0000 0010 –2: 1111 1110 => 1111 1111 1111 1110 Chapter 2 — Instructions: Language of the Computer — 18  In MIPS instruction set     Replicate the sign bit to the left   Examples: a p es 8 8-bit b t to 16-bit 6 bt   . lh: extend loaded byte/halfword / f beq.Sign Extension  Representing a number using more bits  Preserve the numeric value addi: extend immediate value lb. bne: extend the displacement c.

register numbers.§2.5 Rep presenting Instructio ons in the C Computer Representing Instructions  Instructions are encoded in binary  Called machine code Encoded as 32-bit instruction words Small number of formats encoding operation code (opcode). … Regularity! \$t0 \$ 0 – \$t7 \$ are reg’s ’ 8 – 15 1 \$t8 – \$t9 are reg’s 24 – 25 \$s0 – \$s7 are reg’s reg s 16 – 23  MIPS instructions     Register numbers    Chapter 2 — Instructions: Language of the Computer — 19 .

MIPS R-format Instructions op 6 bit bits rs 5 bit bits rt 5 bit bits rd 5 bit bits shamt 5 bit bits funct 6 bit bits  Instruction fields       op: operation code (opcode) rs: first source register number rt: second source register number rd: destination register number shamt: shift amount (00000 for now) funct: function code ( (extends opcode) ) Chapter 2 — Instructions: Language of the Computer — 20 .

\$ . \$s1. .R-format Example op 6 bit bits rs 5 bit bits rt 5 bit bits rd 5 bit bits shamt 5 bit bits funct 6 bit bits add \$ \$t0. \$s2 \$ special 0 000000 \$s1 17 10001 \$s2 18 10010 \$t0 8 01000 0 0 00000 add 32 100000 000000100011001001000000001000002 = 0232402016 Chapter 2 — Instructions: Language of the Computer — 21 .

Hexadecimal  Base 16   Compact representation of bit strings 4 bits per hex digit 0000 0001 0010 0011 4 5 6 7 0100 0101 0110 0111 8 9 a b 1000 1001 1010 1011 c d e f 1100 1101 1110 1111 0 1 2 3  Example: eca8 6420  1110 1100 1010 1000 0110 0100 0010 0000 Chapter 2 — Instructions: Language of the Computer — 22 .

but allow 32-bit instructions uniformly Keep formats as similar as possible Chapter 2 — Instructions: Language of the Computer — 23 .MIPS I-format Instructions op 6 bit bits rs 5 bit bits rt 5 bit bits constant or address 16 bit bits  Immediate arithmetic and load/store instructions    rt: t destination d ti ti or source register i t number b Constant: –215 to +215 – 1 Address: dd ess o offset set added to base add address ess in rs s  Design Principle 4: Good design demands good compromises   Different formats complicate decoding.

Stored Program Computers The BIG Picture    Instructions represented in binary.. compilers. just like data Instructions and data stored in memory P Programs can operate t on programs  e g compilers e.g. linkers linkers. …  Binary compatibility allows compiled programs g to work on different computers  Standardized ISAs Chapter 2 — Instructions: Language of the Computer — 24 .

ori i nor  Useful for extracting and inserting groups of bits in a word Chapter 2 — Instructions: Language of the Computer — 25 .§2. andi or.6 Logical Opera ations Logical Operations  Instructions for bitwise manipulation Operation Shift left Shift right Bitwise AND Bitwise OR Bitwise NOT C << >> & | ~ Java << >>> & | ~ MIPS sll srl and.

Shift Operations op 6 bits rs 5 bits rt 5 bits rd 5 bits shamt 5 bits funct 6 bits   shamt: how many positions to shift Shift left logical   Shift left and fill with 0 bits sll by i bits multiplies by 2i Shift right and fill with 0 bits srl by i bits divides by 2i (unsigned only) Chapter 2 — Instructions: Language of the Computer — 26  Shift right logical   .

\$t1.AND Operations  Useful to mask bits in a word  Select some bits. \$t2 \$t2 \$t1 \$t0 0000 0000 0000 0000 0000 1101 1100 0000 0000 0000 0000 0000 0011 1100 0000 0000 0000 0000 0000 0000 0000 1100 0000 0000 Chapter 2 — Instructions: Language of the Computer — 27 . bits clear others to 0 and \$t0.

\$t1. 1 leave others unchanged or \$t0.OR Operations  Useful to include bits in a word  Set some bits to 1. \$t2 \$t2 \$t1 \$t0 0000 0000 0000 0000 0000 1101 1100 0000 0000 0000 0000 0000 0011 1100 0000 0000 0000 0000 0000 0000 0011 1101 1100 0000 Chapter 2 — Instructions: Language of the Computer — 28 .

\$zero \$t1 \$ \$t0 0000 0000 0000 0000 0011 1100 0000 0000 1111 1111 1111 1111 1100 0011 1111 1111 Chapter 2 — Instructions: Language of the Computer — 29 .NOT Operations  Useful to invert bits in a word  Change 0 to 1 1. and 1 to 0 a NOR b == NOT ( a OR b ) Register 0: always read as zero  MIPS has NOR 3-operand instruction  nor \$t0. \$t1.

if (rs != rt) branch to instruction labeled L1.7 Instructions fo or Making Decisions s Conditional Operations  Branch to a labeled instruction if a condition co d t o is st true ue  Otherwise. continue sequentially if (rs == rt) branch to instruction labeled L1. L1   bne rs. L1   j L1  Chapter 2 — Instructions: Language of the Computer — 30 .§2. unconditional jump to instruction labeled L1  beq rs. rt. rt.

g … in \$s0. \$s4. \$s0 \$s1 \$s1. else f = g-h. Else \$s0. … \$s3.  f g.Compiling If Statements  C code: if (i (i==j) j) f = g+h. \$s1. \$s2 Exit \$s0. \$s1. \$s2 Assembler calculates addresses Chapter 2 — Instructions: Language of the Computer — 31  Compiled MIPS code: bne add j Else: sub Exit: … . f.

\$t1 \$t0.Compiling Loop Statements  C code: while (save[i] == k) i += 1.  i in \$s3. \$s3. Exit \$s3. address of save in \$s6 \$t1. \$s3 1  Compiled MIPS code: Loop: sll add lw bne addi j Exit: … Chapter 2 — Instructions: Language of the Computer — 32 . \$t1. \$t0. \$s3 Loop \$s3. k in \$s5. \$t1 \$s6 0(\$t1) \$s5. 2 \$t1.

Basic Blocks  A basic block is a sequence of instructions with   No embedded branches (except at end) No branch targets (except at beginning)   A compiler il id identifies tifi b basic i blocks for optimization An advanced processor can accelerate execution of basic blocks Chapter 2 — Instructions: Language of the Computer — 33 .

slt \$t0. bne Chapter 2 — Instructions: Language of the Computer — 34 . rt   slti rt. \$zero. \$s1. else rt = 0. rs. if (rs < constant) rt = 1. \$s2 bne \$t0. else l rd d=0 0. if ( (rs < rt) ) rd d=1 1.More Conditional Operations  Set result to 1 if a condition is true  Otherwise set to 0 Otherwise. rs. L # if (\$s1 < \$s2) # branch to L  slt rd. constant   Use in combination with beq q.

requiring a slower clock All instructions penalized!   beq and b d bne b are the th common case This is a good design compromise Chapter 2 — Instructions: Language of the Computer — 35 . ≥.Branch Instruction Design   Why not blt. ≠   Combining with branch involves more work per instruction instruction. … slower than = =. bge. etc? Hardware for < <.

\$s0. \$s0.967.295 > +1 ⇒ \$t0 = 0 Chapter 2 — Instructions: Language of the Computer — 36 .Signed vs. \$s1 # signed  –1 < +1 ⇒ \$t0 = 1  sltu \$t0.294. sltui Example    \$s0 = 1111 1111 1111 1111 1111 1111 1111 1111 \$s1 = 0000 0000 0000 0000 0000 0000 0000 0001 slt \$t0. Unsigned    Signed comparison: slt. slti Unsigned comparison: sltu. \$s1  # unsigned +4.

8 Sup pporting Pr rocedures in Compu uter Hardw ware Procedure Calling  Steps required 1. Place parameters in registers Transfer control to procedure Acquire storage for procedure Perform procedure’s operations Pl Place result lt in i register i t for f caller ll Return to place of call Chapter 2 — Instructions: Language of the Computer — 37 . 6.§2. 5 5. 3 3. 4. 1 2.

\$ .Register Usage    \$a0 – \$a3: arguments (reg’s 4 – 7) \$v0.\$ \$v1: result values ( (reg’s g 2 and 3) ) \$t0 – \$t9: temporaries  Can be overwritten by callee Must be saved/restored by callee  \$s0 – \$s7: saved      \$gp: global \$ l b l pointer i t f for static t ti d data t ( (reg 28) \$sp: stack pointer (reg 29) \$f frame \$fp: f pointer i t (reg ( 30) \$ra: return address (reg 31) Chapter 2 — Instructions: Language of the Computer — 38 .

g. for case/switch statements Chapter 2 — Instructions: Language of the Computer — 39 .Procedure Call Instructions  Procedure call: jump and link jal ProcedureLabel  Address of following instruction put in \$ra  Jumps to target address  Procedure return: jump register jr \$ra  Copies \$ra to program counter  Can also be used for computed jumps  e..

need to save \$s0 on stack) )  Result in \$v0 Chapter 2 — Instructions: Language of the Computer — 40 . …. \$a3  f in \$s0 ( (hence. ….( (i + j). }  Arguments g. h i.Leaf Procedure Example  C code: int leaf_example leaf example (int g. return f. f = (g + h) ) . g h. j in \$a0. i j) { int f.

\$sp 4 jr \$ra Save \$s0 on stack Procedure body Result Restore \$s0 Return Chapter 2 — Instructions: Language of the Computer — 41 . \$zero lw \$s0. \$t1 \$ add \$v0. 0(\$sp) addi \$sp \$sp. 0(\$sp) add dd \$t0. \$a2. \$a3 sub \$ \$s0. \$ .Leaf Procedure Example  MIPS code: leaf_example: leaf example: addi \$sp. \$sp. \$ 0 \$a1 \$ 1 add \$t1. -4 sw \$s0. \$t0 \$a0. \$sp. . \$s0. \$t0.

caller needs to save on the stack:   Its return It t address dd Any arguments and temporaries needed after the call  Restore from the stack after the call Chapter 2 — Instructions: Language of the Computer — 42 .Non-Leaf Procedures   Procedures that call other procedures For nested call call.

Non-Leaf Procedure Example  C code: int fact (int n) { if ( (n < 1) ) return etu f. else return n * fact(n . }  Argument n in \$a0  Result in \$v0 Chapter 2 — Instructions: Language of the Computer — 43 .1). .

-1 0(\$sp) (\$ p) 4(\$sp) \$sp. result is 1 pop 2 items from stack and d return else decrement n recursive call restore original g n and return address pop 2 items from stack multiply to get result and return Chapter 2 — Instructions: Language of the Computer — 44 . L1 \$zero. 8 \$a0. \$v0. \$ . \$v0 # # # # # # # # # # # # # # adjust stack for 2 items save return address save argument test for n < 1 if so. 8 \$a0. \$sp. \$t0. \$t0. \$a0. \$ra \$sp. fact \$a0. -8 4(\$sp) 0(\$sp) \$a0. 1 \$zero. 1 \$sp.Non-Leaf Procedure Example  MIPS code: fact: addi sw sw slti beq addi addi j jr L1: addi jal lw lw addi mul jr \$sp. \$ra. \$sp. \$ra. \$ra \$ \$a0. \$v0.

Local Data on the Stack  Local data allocated by y callee  e. C automatic variables U db Used by some compilers il t to manage stack t k storage t Chapter 2 — Instructions: Language of the Computer — 45  Procedure frame (activation record)  .g..

Memory Layout
 

Text: program code Static data: global g variables




e.g., static variables in C, constant arrays and strings \$gp initialized to address allowing ±offsets into this segment E.g., E g malloc in C C, new in Java



Dynamic data: heap




Stack: automatic storage
Chapter 2 — Instructions: Language of the Computer — 46

§2.9 Com mmunicatin ng with Pe eople

Character Data


Byte-encoded character sets


ASCII: 128 characters


95 graphic, 33 control ASCII, +96 more graphic characters



Latin-1: 256 characters




Unicode: 32 32-bit bit character set
  

Used in Java, C++ wide characters, … M t of Most f the th world’s ld’ alphabets, l h b t plus l symbols b l UTF-8, UTF-16: variable-length encodings
Chapter 2 — Instructions: Language of the Computer — 47

Byte/Halfword Operations
 

Could use bitwise operations MIPS byte/halfword load/store


String processing is a common case
lh rt, offset(rs) ff ( ) lhu lh rt, offset(rs) ff ( ) sh h rt, offset(rs) ff ( )

lb rt, offset(rs) ff ( )


Sign extend to 32 bits in rt Zero extend to 32 bits in rt Store just rightmost byte/halfword
Chapter 2 — Instructions: Language of the Computer — 48

lb rt, offset(rs) lbu ff ( )


sb b rt, offset(rs) ff ( )


String Copy Example  C code (naïve): Null-terminated Null terminated string void strcpy (char x[]. while ((x[i]=y[i])!='\0') (( [ ] y[ ]) \ ) i += 1. }  Addresses of x. \$a1  i in \$s0  Chapter 2 — Instructions: Language of the Computer — 49 . y in \$a0. i = 0. char y[]) { int i.

\$ 2 \$s0. \$t1. \$s0. \$t3. \$t2. \$a1 0(\$t1) \$s0. \$t2. \$s0. \$zero \$s0. L1 \$s0.String Copy Example  MIPS code: strcpy: addi sw add L1: add lbu add sb b beq addi j L2: lw addi jr \$sp. \$a0 0(\$t3) \$zero. \$t2. \$ . 1 0(\$sp) (\$ p) \$sp. -4 0(\$sp) \$zero. 4 # # # # # # # # # # # # # adjust stack for 1 item save \$s0 i = 0 addr of y[i] in \$t1 \$t2 = y[i] addr of x[i] in \$t3 x[i] = y[i] exit i loop l if y[i] [i] == 0 i = i + 1 next iteration of loop restore saved \$s0 \$ pop 1 item from stack and return Chapter 2 — Instructions: Language of the Computer — 50 . \$sp. \$ L2 2 \$s0. \$ra \$sp.

10 MIPS Addres ssing for 3 32-Bit Imm mediates and Addres sses 32-bit Constants  Most constants are small  16 bit immediate is sufficient 16-bit  For the occasional 32-bit constant l i rt. 61 ori \$s0. \$s0. .§2. . 2304 0000 0000 0111 1101 0000 1001 0000 0000 Chapter 2 — Instructions: Language of the Computer — 51 . constant lui   Copies 16-bit constant to left 16 bits of rt Clears right 16 bits of rt to 0 0000 0000 0111 1101 0000 0000 0000 0000 lhi \$s0.

Branch Addressing  Branch instructions specify  Opcode two registers. registers target address F Forward d or backward b k d op 6 bits  Most branch targets are near branch  rs 5 bits rt 5 bits constant or address 16 bits  PC relative addressing PC-relative   Target address = PC + offset × 4 PC already l d incremented i t d by b 4b by thi this ti time Chapter 2 — Instructions: Language of the Computer — 52 . Opcode.

Jump Addressing  Jump (j and jal) targets could be anywhere in text segment  Encode full address in instruction op 6 bits address 26 bits  (Pseudo)Direct jump addressing  Target address = PC31…28 : (address × 4) Chapter 2 — Instructions: Language of the Computer — 53 .

\$s3. Exit 80012 80016 80020 80024 addi \$s3. \$s6 \$t0. 2 \$t1. 1 j Exit: … Loop Chapter 2 — Instructions: Language of the Computer — 54 . \$s3. \$s5. 0(\$t1) 80000 80004 80008 0 0 35 5 8 2 0 9 9 8 19 19 22 8 21 19 20000 9 9 4 0 0 2 1 0 32 Loop: sll add lw bne \$t0.Target Addressing Example  Loop code from earlier example  Assume Loop at location 80000 \$t1. \$t1.

\$ 0 \$ 1 L1 ↓ b bne \$s0.\$s1. offset assembler rewrites the code Example b beq \$s0. \$ 0 \$ 1 L2 2 j L1 L2: … Chapter 2 — Instructions: Language of the Computer — 55 .Branching Far Away   If branch target is too far to encode with 16-bit offset.\$s1.

Addressing Mode Summary Chapter 2 — Instructions: Language of the Computer — 56 .

.11 Parallelism a and Instruc ctions: Syn nchronizat tion Synchronization  Two processors sharing an area of memory   P1 writes.§2.g. then P2 reads Data race if P1 and P2 don’t synchronize  Result depends of order of accesses  Hardware support required   Atomic read/write memory operation No other access to the location allowed between the read and write E. atomic swap of register ↔ memory Or an atomic pair of instructions  Could be a single instruction   Chapter 2 — Instructions: Language of the Computer — 57 .

\$t1 . offset(rs) Store conditional: sc rt.0(\$s1) \$t0.\$s4 \$t1.load linked .Synchronization in MIPS   Load linked: ll rt.\$zero.branch store fails .0(\$s1) \$t0 0(\$s1) \$t0.copy exchange value .try \$s4.put load value in \$s4 Chapter 2 — Instructions: Language of the Computer — 58 .\$zero.store conditional . .\$zero. offset(rs) ( )  Succeeds if location not changed since the ll  Returns 1 in rt Returns 0 in rt  Fails if location is changed   Example: p atomic swap p( (to test/set lock variable) ) try: add ll sc beq add \$t0.

12 Tra anslating a and Startin ng a Progr ram Translation and Startup Many compilers produce object modules directly Static linking Chapter 2 — Instructions: Language of the Computer — 59 .§2.

\$zero. \$t0. \$t1 bne \$at. \$t1 blt \$t0. L → slt \$at.Assembler Pseudoinstructions   Most assembler instructions represent machine instructions one-to-one Pseudoinstructions: figments of the assembler’s assembler s imagination → add \$t0. L  \$ t( \$at (register i t 1) 1): assembler bl t temporary Chapter 2 — Instructions: Language of the Computer — 60 . \$t1. \$t1 move \$t0. \$zero.

Producing an Object Module   Assembler (or compiler) translates program into machine instructions Provides information for building a complete program from the pieces       Header: H d d described ib d contents t t of f object bj t module d l Text segment: translated instructions Static data segment: data allocated for the life of the program Relocation info: for contents that depend on absolute location of loaded program Symbol table: global definitions and external refs Debug info: for associating with source code Chapter 2 — Instructions: Language of the Computer — 61 .

Merges segments 1 2. Resolve labels (determine their addresses) 3 Patch location-dependent 3.Linking Object Modules  Produces an executable image 1. no need to do this Program can be loaded into absolute location in virtual memory space Chapter 2 — Instructions: Language of the Computer — 62 . location dependent and external refs  Could leave location dependencies for fi i b fixing by a relocating l ti l loader d   But with virtual memory.

Read header to determine segment sizes 1 2. \$gp) 6 Jump 6. Initialize registers (including \$sp. Set up arguments on stack 4 5. do exit syscall Chapter 2 — Instructions: Language of the Computer — 63 . … and calls main When main returns returns.Loading a Program  Load from image file on disk into memory 1. J to t startup t t routine ti   Copies arguments to \$a0. \$fp.  Or set page table entries so they can be faulted in 4. Create virtual address space 3 Copy text and initialized data into memory 3.

Dynamic Linking  Only link/load library procedure when it is called    Requires procedure code to be relocatable Avoids image bloat caused by static linking of all (transitively) referenced libraries Automatically picks up new library versions Chapter 2 — Instructions: Language of the Computer — 64 .

Starting Java Applications Simple Si l portable t bl instruction set for the JVM Compiles bytecodes of “hot” methods into native code for host machine Interprets bytecodes Chapter 2 — Instructions: Language of the Computer — 66 .

k in \$a1.13 A C Sort Exa ample to P Put It All To ogether C Sort Example   Illustrates use of assembly instructions o a C bubb bubble e so sort t function u ct o for Swap procedure (leaf) void swap(int v[]. [k] v[k] = v[k+1].§2. t temp = v[k]. } v in \$a0. int k) { int temp. v[k+1] [ ] = temp. temp in \$t0  Chapter 2 — Instructions: Language of the Computer — 67 . p.

0(\$t1) # \$t0 (temp) = v[k] lw \$t2. 4(\$t1) # \$t2 = v[k+1] sw \$t2. 4(\$t1) # v[k+1] = \$t0 (temp) jr \$ra j # return to calling g routine Chapter 2 — Instructions: Language of the Computer — 68 . \$a0. 0(\$t1) # v[k] = \$t2 (v[k+1]) sw \$t0. \$t1 # \$t1 = v+(k*4) # (address of v[k]) lw \$t0.The Procedure Swap swap: sll \$t1. 2 # \$t1 = k * 4 add \$t1. \$a1.

k in \$a1.The Sort Procedure in C  Non-leaf (calls swap) void sort (int v[].j). j. i += 1) ) { for (j = i – 1. int n) { int i. i < n. . . } } } v in \$a0. j -= 1) { swap(v. i in \$s0. j >= 0 && v[j] > v[j + 1]. for ( (i = 0. . . . j in \$s1 Chapter 2 — Instructions: Language of the Computer — 69  .

\$t4. \$s3 \$t0. \$s0. \$s1. –1 for2tst \$s0 \$s0. \$s1. \$s0 \$s0. \$s1. exit2 \$a0. \$s0 1 for1tst # # # # # # # # # # # # # # # # # # # # # save \$a0 into \$s2 save \$a1 into \$s3 i = 0 \$t0 = 0 if \$s0 ≥ \$s3 (i ≥ n) go to exit1 if \$s0 ≥ \$s3 (i ≥ n) j = i – 1 \$t0 = 1 if \$s1 < 0 (j < 0) go to exit2 if \$s1 < 0 (j < 0) \$t1 = j * 4 \$t2 = v + (j * 4) \$t3 = v[j] \$t4 = v[j + 1] \$t0 = 0 if \$t4 ≥ \$t3 go to exit2 if \$t4 ≥ \$t3 1st param of swap is v (old \$a0) 2nd param of swap is j call swap procedure j –= 1 jump to test of inner loop i += + 1 jump to test of outer loop Move params Outer loop p Inner loop Pass params & call Inner loop Outer loop Chapter 2 — Instructions: Language of the Computer — 70 . 2 \$t2.The Procedure Body move move move for1tst: slt beq addi for2tst: slti bne sll add lw lw slt beq move move jal addi j exit2: addi j \$s2. exit1 \$s1. \$s0. \$zero. \$zero \$t0 \$t0. \$t4 4(\$t2) \$t0. 0(\$t2) \$t4. \$s2 \$a1. –1 \$t0. 0 \$t0. \$a1 \$s0. \$t1 \$t3. \$s2. \$t3 \$t0. \$a0 \$s3. \$zero exit2 \$t1. \$zero. \$t0 \$zero. \$a1 \$s1 swap \$s1.

The Full Procedure
sort: addi \$sp,\$sp, –20 sw \$ra, 16(\$sp) sw \$s3,12(\$sp) sw \$s2, 8(\$sp) sw \$s1, 4(\$sp) sw \$s0, 0(\$sp) … … exit1: lw \$s0, 0(\$sp) lw \$s1, 4(\$sp) lw \$s2, 8(\$sp) lw \$s3,12(\$sp) lw \$ra,16(\$sp) addi \$sp,\$sp, 20 jr \$ra # # # # # # # # # # # # # # make room on stack for 5 registers save \$ra on stack save \$s3 on stack save \$s2 on stack save \$s1 on stack save \$s0 on stack procedure body restore \$s0 from stack restore \$s1 from stack restore \$s2 from stack restore \$s3 from stack restore \$ra from stack restore stack pointer return to calling routine

Chapter 2 — Instructions: Language of the Computer — 71

Effect of Compiler Optimization
Compiled with gcc for Pentium 4 under Linux
3 2.5 2 1.5 1 0.5 0 none 180000 160000 140000 120000 100000 80000 60000 40000 20000 0 none O1 O2 O3 2 1.5 1 0.5 0 O1 O2 O3 none O1 O2 O3

Relative Performance

140000 120000 100000 80000 60000 40000 20000 0 none

Instruction count

O1

O2

O3

Clock Cycles

CPI

Chapter 2 — Instructions: Language of the Computer — 72

Effect of Language and Algorithm
3 2.5 2 15 1.5 1 0.5 0 C/none C/O1 C/O2 C/O3 Java/int Java/JIT

Bubblesort Relative Performance

2.5 2 1.5 1 0.5 0 C/none

Quicksort Relative Performance

C/O1

C/O2

C/O3

Java/int

Java/JIT

3000 2500 2000 1500 1000 500 0 C/none

Quicksort vs. Bubblesort Speedup

C/O1

C/O2

C/O3

Java/int

Java/JIT

Chapter 2 — Instructions: Language of the Computer — 73

Lessons Learnt    Instruction count and CPI are not good performance indicators in isolation Compiler optimizations are sensitive to the algorithm Java/JIT compiled code is significantly f t than faster th JVM interpreted i t t d  Comparable to optimized C in some cases  Nothing can fix a dumb algorithm! Chapter 2 — Instructions: Language of the Computer — 74 .

14 Arr rays versu us Pointers s Arrays vs.§2. Pointers  Array indexing involves   Multiplying index by element size Adding to array base address  Pointers P i t correspond d directly di tl t to memory addresses  Can avoid indexing complexity Chapter 2 — Instructions: Language of the Computer — 75 .

\$t0.loop1 # if (…) # goto loop1 clear2(int *array.\$a0.0(\$t0) # Memory[p] = 0 addi \$t0.Example: Clearing and Array clear1(int array[].\$t1 # i = 0 # \$t1 = i * 4 # \$t2 = y[ ] # &array[i] sw \$zero. int size) { int *p. int size) { int i.\$zero loop1: sll \$t1.\$a1.\$zero.\$zero. p < &array[size].\$t2 # \$t3 = #(p<&array[size]) bne \$t3. } move \$t0.\$t0.4 # p = p + 4 slt \$t3. for (p = &array[0]. for (i = 0.\$t0.\$a1 # \$t3 = # (i < size) bne \$t3.2 add \$t2.\$a0. p = p + 1) *p = 0.\$t0.loop2 # if (…) # goto loop2 Chapter 2 — Instructions: Language of the Computer — 76 .\$a0 # p = & array[0] sll \$t1.\$t0. i += 1) array[i] = 0. i < size. 0(\$t2) # array[i] = 0 addi \$t0.2 # \$t1 = size * 4 add \$t2. } move \$t0.1 # i = i + 1 slt \$t3.\$t1 # \$t2 = y[ ] # &array[size] loop2: sw \$zero.

Comparison of Array vs.f. Ptr   Multiply “strength reduced” to shift Array version requires shift to be inside loop   Part P t of f index i d calculation l l ti f for i incremented t di c. incrementing pointer  Compiler can achieve same effect as manual use of pointers   Induction variable elimination Better to make program clearer and safer Chapter 2 — Instructions: Language of the Computer — 77 .

16 Re eal Stuff: ARM Instru uctions ARM & MIPS Similarities   ARM: the most popular embedded core Similar basic set of instructions to MIPS ARM MIPS 1985 32 bits 32-bit flat Aligned 3 31 × 32-bit Memory mapped 1985 32 bits 32-bit flat Aligned 9 15 × 32-bit Memory mapped Date announced Instruction size Address space Data alignment Data addressing modes Registers Input/output Chapter 2 — Instructions: Language of the Computer — 78 .§2.

zero. carry. overflow Compare instructions to set condition codes without keeping the result Top 4 bits of instruction word: condition value C avoid Can id b branches h over single i l i instructions t ti  Each instruction can be conditional   Chapter 2 — Instructions: Language of the Computer — 79 .Compare and Branch in ARM  Uses condition codes for result of an arithmetic/logical instruction   Negative.

Instruction Encoding Chapter 2 — Instructions: Language of the Computer — 80 .

17 Re eal Stuff: x x86 Instruc ctions The Intel x86 ISA  Evolution with backward compatibility  8080 (1974): 8-bit 8 bit microprocessor  Accumulator. MMU   80386 (1985): 32-bit extension (now IA-32)   Chapter 2 — Instructions: Language of the Computer — 81 .§2. plus 3 index-register pairs Complex instruction set (CISC) Adds FP instructions and register stack Segmented memory mapping and protection Additional addressing modes and operations Paged memory mapping as well as segments  8086 (1978): 16-bit extension to 8080   8087 (1980): floating-point coprocessor   80286 (1982): 24-bit addresses.

The Intel x86 ISA  Further evolution…  i486 (1989): pipelined. The Pentium Chronicles) Added SSE (Streaming SIMD Extensions) and associated registers New microarchitecture Added SSE2 instructions Chapter 2 — Instructions: Language of the Computer — 82  Pentium (1993): superscalar. on-chip caches and FPU  C Compatible tibl competitors: tit AMD AMD. i … Later versions added MMX (Multi-Media eXtension) instructions The infamous FDIV bug New microarchitecture (see Colwell Colwell. Pentium II (1997)   Pentium III (1999)   Pentium 4 (2001)   . C Cyrix. 64-bit datapath    Pentium Pro (1995).

more instructions  I t l Core Intel C (2006)   AMD64 (announced 2007): SSE5 instructions   Advanced Vector Extension (announced 2008)   If Intel didn’t extend with compatibility.The Intel x86 ISA  And further…   AMD64 (2003): extended architecture to 64 bits EM64T – Extended E d dM Memory 64 T Technology h l (2004)   AMD64 adopted by Intel (with refinements) Added SSE3 instructions Added SSE4 instructions. virtual machine support Intel declined to follow. its competitors would!  Technical elegance ≠ market success Chapter 2 — Instructions: Language of the Computer — 83 . instead… Longer SSE registers.

Basic x86 Registers Chapter 2 — Instructions: Language of the Computer — 84 .

1.Basic x86 Addressing Modes  Two operands per instruction Source/dest operand Register Register R i t Register Memory Memory Second source operand Register Immediate M Memory Register Immediate  Memory addressing modes     Address in register Address = Rbase + displacement Address = Rbase + 2scale × Rindex (scale = 0. or 3) l × R Add Address = Rbase + 2scale di l index + displacement Chapter 2 — Instructions: Language of the Computer — 85 . 2.

x86 Instruction Encoding  Variable length encoding   Postfix bytes specify addressing mode Prefix bytes modify operation  Operand length. locking. … Chapter 2 — Instructions: Language of the Computer — 86 . repetition.

Implementing IA-32  Complex instruction set makes implementation difficult  Hardware translates instructions to simpler microoperations   Simple instructions: 1–1 Complex p instructions: 1–many y   Microengine similar to RISC Market share makes this economically viable Compilers avoid complex instructions Chapter 2 — Instructions: Language of the Computer — 87  Comparable performance to RISC  .

including simple ones  Compilers are good at making fast code from simple instructions But modern compilers are better at dealing with modern ode processors p ocesso s More lines of code ⇒ more errors and less productivity  Use assembly code for high performance   Chapter 2 — Instructions: Language of the Computer — 88 .18 Fa allacies and d Pitfalls Fallacies  Powerful instruction ⇒ higher performance   Fewer instructions required But complex instructions are hard to implement  May slow down all instructions.§2.

Fallacies  Backward compatibility ⇒ instruction set doesn’t doesn t change  But they do accrete more instructions x86 instruction set Chapter 2 — Instructions: Language of the Computer — 89 .

not by 1!  Keeping a pointer to an a automatic tomatic variable ariable after procedure returns   e..g. passing i pointer i t b back k via i an argument t Pointer becomes invalid when stack popped Chapter 2 — Instructions: Language of the Computer — 90 .Pitfalls  Sequential words are not at sequential addresses  Increment by 4.

Simplicity favors regularity Smaller is faster Make a et the e co common o case fast ast Good design demands good compromises  Layers of software/hardware  Compiler. hardware c. x86  MIPS: typical of RISC ISAs  Chapter 2 — Instructions: Language of the Computer — 91 . 3 4. 1 2. assembler. 3.19 Co oncluding R Remarks Concluding Remarks  Design principles 1.§2.f.

slt.Concluding Remarks  Measure MIPS instruction executions in benchmark be c a p programs og a s   Consider making the common case fast Consider compromises p MIPS examples add. sub. ori. bne. lh. sll. addi lw. andi. nor. Branch Jump Chapter 2 — Instructions: Language of the Computer — 92 . lui and. sw. sb. lhu. jr. or. lbu. slti. lb. sltiu j. jal SPEC2006 Int 16% 35% 12% 34% 2% SPEC2006 FP 48% 36% 4% 8% 0% Instruction class Arithmetic Data transfer Logical g Cond. srl beq.