Two Key Principles of Machine Design

1. Instructions are represented as numbers and, as such, are indistinguishable from data

2. Programs are stored in alterable memory (that can be read or written to) just like data

Stored-program concept

- Programs can be shipped as files of binary numbers: binary compatibility
- Computers can inherit ready-made software provided they are compatible with an existing ISA, which leads industry to align around a small number of ISAs

[Figure: memory holding the accounting program (machine code), the C compiler (machine code), payroll data, and the source code in C for the accounting program]

CSE431 Chapter 2.1

Irwin, PSU, 2008

MIPS-32 ISA

- Instruction Categories
  - Computational
  - Load/Store
  - Jump and Branch
  - Floating Point (coprocessor)
  - Memory Management
  - Special

- Registers: R0 - R31, PC, HI, LO

- 3 Instruction Formats: all 32 bits wide

    R format:  op | rs | rt | rd | sa | funct
    I format:  op | rs | rt | immediate
    J format:  op | jump target

MIPS (RISC) Design Principles

- Simplicity favors regularity
  - fixed size instructions
  - small number of instruction formats
  - opcode always the first 6 bits

- Smaller is faster
  - limited instruction set
  - limited number of registers in register file
  - limited number of addressing modes

- Make the common case fast
  - arithmetic operands from the register file (load-store machine)
  - allow instructions to contain immediate operands

- Good design demands good compromises
  - three instruction formats

MIPS Arithmetic Instructions

- MIPS assembly language arithmetic statement

    add $t0, $s1, $s2
    sub $t0, $s1, $s2

- Each arithmetic instruction performs one operation
- Each specifies exactly three operands that are all contained in the datapath's register file ($t0, $s1, $s2)

    destination ← source1 op source2

- Instruction Format (R format), shown for the sub statement

    op   rs   rt   rd   shamt  funct
    0    17   18   8    0      0x22


MIPS Instruction Fields

- MIPS fields are given names to make them easier to refer to

    op | rs | rt | rd | shamt | funct

    op     6-bits   opcode that specifies the operation
    rs     5-bits   register file address of the first source operand
    rt     5-bits   register file address of the second source operand
    rd     5-bits   register file address of the result's destination
    shamt  5-bits   shift amount (for shift instructions)
    funct  6-bits   function code augmenting the opcode
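
The field widths above are enough to assemble an R-format word by hand. The sketch below is a minimal illustration, not part of the original slides: `encode_r` is a hypothetical helper name, and the register numbers come from the sub $t0, $s1, $s2 example (rs=17, rt=18, rd=8, funct=0x22).

```python
# Hypothetical helper (not from the slides): pack the six R-format fields
# into one 32-bit MIPS instruction word, most significant field first.
def encode_r(op, rs, rt, rd, shamt, funct):
    fields = [(op, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


# sub $t0, $s1, $s2: op=0, rs=17 ($s1), rt=18 ($s2), rd=8 ($t0), shamt=0, funct=0x22
word = encode_r(0, 17, 18, 8, 0, 0x22)
print(f"0x{word:08x}")
```

Packing the fields left to right mirrors the slide's layout, so the opcode lands in the top 6 bits and funct in the bottom 6.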

MIPS Register File

- Holds thirty-two 32-bit registers
  - Two read ports and
  - One write port

- Registers are
  - Faster than main memory
    - But register files with more locations are slower (e.g., a 64 word file could be as much as 50% slower than a 32 word file)
    - Read/write port increase impacts speed quadratically
  - Easier for a compiler to use
    - e.g., (A*B) - (C*D) - (E*F) can do multiplies in any order vs. stack
  - Can hold variables so that
    - code density improves (since registers are named with fewer bits than a memory location)

[Figure: Register File with 32 locations of 32 bits each; 5-bit src1 addr, src2 addr, and dst addr inputs; 32-bit write data input; write control; 32-bit src1 data and src2 data outputs]

Aside: MIPS Register Convention

    Name         Register Number   Usage                    Preserve on call?
    $zero        0                 constant 0 (hardware)    n.a.
    $at          1                 reserved for assembler   n.a.
    $v0 - $v1    2-3               returned values          no
    $a0 - $a3    4-7               arguments                no
    $t0 - $t7    8-15              temporaries              no
    $s0 - $s7    16-23             saved values             yes
    $t8 - $t9    24-25             temporaries              no
    $gp          28                global pointer           yes
    $sp          29                stack pointer            yes
    $fp          30                frame pointer            yes
    $ra          31                return addr (hardware)   yes

MIPS Memory Access Instructions

- MIPS has two basic data transfer instructions for accessing memory

    lw $t0, 4($s3)    #load word from memory
    sw $t0, 8($s3)    #store word to memory

- The data is loaded into (lw) or stored from (sw) a register in the register file (a 5 bit address)
- The memory address (a 32 bit address) is formed by adding the contents of the base address register to the offset value
  - A 16-bit field, meaning access is limited to memory locations within a region of ±2^13 or 8,192 words (±2^15 or 32,768 bytes) of the address in the base register

Machine Language - Load Instruction

- Load/Store Instruction Format (I format):

    lw $t0, 24($s3)

    op   rs   rt   offset
    35   19   8    24 (decimal)

    24 (decimal) + $s3 =

        . . . 0001 1000
      + . . . 1001 0100
        . . . 1010 1100 = 0x120040ac

[Figure: memory with word addresses (hex) 0x00000000, 0x00000004, 0x00000008, 0x0000000c, ..., 0x12004094 (the address in $s3), 0x120040ac (the data word loaded into $t0), ..., 0xffffffff]
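
The lw example can be checked numerically. The following sketch is illustrative only: the helper names `encode_i`, `sign_extend16`, and `effective_address` are assumptions, and the value 0x12004094 for $s3 is taken from the slide's figure.

```python
# Hypothetical helpers (names are not from the slides).
def encode_i(op, rs, rt, imm):
    """Pack op (6 bits), rs (5), rt (5) and a 16-bit immediate into one word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base, imm):
    """Base register contents plus the sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend16(imm)) & 0xFFFFFFFF


# lw $t0, 24($s3): op=35, rs=19 ($s3), rt=8 ($t0), offset=24
print(hex(encode_i(35, 19, 8, 24)))
# With $s3 = 0x12004094 as in the figure, the load reads address 0x120040ac.
print(hex(effective_address(0x12004094, 24)))
```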

Byte Addresses

- Since 8-bit bytes are so useful, most architectures address individual bytes in memory
  - Alignment restriction: the memory address of a word must be on natural word boundaries (a multiple of 4 in MIPS-32)

- Big Endian: leftmost byte is word address
    IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
- Little Endian: rightmost byte is word address
    Intel 80x86, DEC Vax, DEC Alpha (Windows NT)

                                 msb           lsb
    little endian byte numbers:   3    2    1    0
    big endian byte numbers:      0    1    2    3
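
The two byte orders and the alignment rule can be demonstrated with Python's built-in `int.to_bytes`. This is a small sketch: the sample word 0x0A0B0C0D and the helper `is_word_aligned` are illustrative choices, not taken from the slides.

```python
word = 0x0A0B0C0D  # illustrative value; 0x0A is the most significant byte

# Big endian: byte 0 (the word address) holds the leftmost, most significant byte.
big = word.to_bytes(4, byteorder="big")
# Little endian: byte 0 (the word address) holds the rightmost, least significant byte.
little = word.to_bytes(4, byteorder="little")

print(big.hex(), little.hex())


def is_word_aligned(address):
    """A MIPS-32 word address must be a multiple of 4."""
    return address % 4 == 0
```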

Aside: Loading and Storing Bytes

- MIPS provides special instructions to move bytes

    lb $t0, 1($s3)    #load byte from memory
    sb $t0, 6($s3)    #store byte to memory

    op     rs   rt   offset
    0x28   19   8    16 bit offset

- What 8 bits get loaded and stored?
  - load byte places the byte from memory in the rightmost 8 bits of the destination register
    - what happens to the other bits in the register?
  - store byte takes the byte from the rightmost 8 bits of a register and writes it to a byte in memory
    - what happens to the other bits in the memory word?
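
One way to explore the two questions above is a small model of byte loads and stores. The sketch assumes the standard MIPS behaviour: lb sign-extends the byte into the upper 24 bits of the register, and sb leaves the other bytes of the memory word unchanged. The function names and the sample memory contents are hypothetical.

```python
def lb(memory, address):
    """Load one byte and sign-extend it to a 32-bit register value."""
    byte = memory[address]
    if byte & 0x80:                # sign bit set: fill the upper 24 bits with ones
        return byte | 0xFFFFFF00
    return byte                    # sign bit clear: upper 24 bits are zero


def sb(memory, address, reg):
    """Store only the rightmost 8 bits of reg; neighbouring bytes are untouched."""
    memory[address] = reg & 0xFF


memory = bytearray([0x11, 0xF0, 0x33, 0x44])  # illustrative memory contents
t0 = lb(memory, 1)                            # 0xF0 has its sign bit set
sb(memory, 2, 0x12345678)                     # only 0x78 is written
print(hex(t0), memory.hex())
```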

MIPS Immediate Instructions

- Small constants are used often in typical code
- Possible approaches?
  - put "typical constants" in memory and load them
  - create hard-wired registers (like $zero) for constants like 1
  - have special instructions that contain constants!

    addi $sp, $sp, 4     #$sp = $sp + 4
    slti $t0, $s2, 15    #$t0 = 1 if $s2<15

- Machine format (I format), shown for the slti statement:

    op     rs   rt   immediate
    0x0A   18   8    0x0F

- The constant is kept inside the instruction itself!
  - Immediate format limits values to the range +2^15-1 to -2^15

Aside: How About Larger Constants?

- We'd also like to be able to load a 32 bit constant into a register; for this we must use two instructions
- a new "load upper immediate" instruction

    lui $t0, 1010101010101010

    op   rs   rt   immediate
    15   0    8    1010101010101010 (binary)

- Then must get the lower order bits right, use

    ori $t0, $t0, 1010101010101010

      1010101010101010 0000000000000000    ($t0 after lui)
      0000000000000000 1010101010101010    (ori immediate)
      1010101010101010 1010101010101010    ($t0 after ori)
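
The two-instruction sequence can be traced with ordinary integer operations. This is a minimal sketch; the Python functions `lui` and `ori` only model the register results and are not an assembler.

```python
def lui(imm16):
    """Load upper immediate: place the 16-bit constant in the upper half, zero the lower half."""
    return (imm16 & 0xFFFF) << 16


def ori(reg, imm16):
    """OR immediate: the 16-bit constant is zero-extended before the OR."""
    return reg | (imm16 & 0xFFFF)


t0 = lui(0b1010101010101010)       # upper 16 bits set, lower 16 bits zero
t0 = ori(t0, 0b1010101010101010)   # fill in the lower 16 bits
print(f"{t0:032b}")
```

Because ori zero-extends its immediate, it cannot disturb the upper half that lui just wrote.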

Review: Unsigned Binary Representation

    Hex          Binary   Decimal
    0x00000000   0…0000   0
    0x00000001   0…0001   1
    0x00000002   0…0010   2
    0x00000003   0…0011   3
    0x00000004   0…0100   4
    0x00000005   0…0101   5
    0x00000006   0…0110   6
    0x00000007   0…0111   7
    0x00000008   0…1000   8
    0x00000009   0…1001   9
    …
    0xFFFFFFFC   1…1100   2^32 - 4
    0xFFFFFFFD   1…1101   2^32 - 3
    0xFFFFFFFE   1…1110   2^32 - 2
    0xFFFFFFFF   1…1111   2^32 - 1

    bit weight:     2^31  2^30  2^29  . . .  2^3  2^2  2^1  2^0
    bit position:   31    30    29    . . .  3    2    1    0

    1 1 1 . . . 1 1 1 1  =  1 0 0 0 . . . 0 0 0 0  -  1  =  2^32 - 1

Review: Signed Binary Representation

    2's complement binary   decimal
    1000                    -8      (-2^3)
    1001                    -7      (-(2^3 - 1))
    1010                    -6
    1011                    -5
    1100                    -4
    1101                    -3
    1110                    -2
    1111                    -1
    0000                     0
    0001                     1
    0010                     2
    0011                     3
    0100                     4
    0101                     5
    0110                     6
    0111                     7      (2^3 - 1)

- To negate: complement all the bits and add a 1
    1010 (-6): complement all the bits gives 0101, and add a 1 gives 0110 (6)
    0101 (5): complement all the bits gives 1010, and add a 1 gives 1011 (-5)
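
The complement-and-add-one rule can be sketched for the 4-bit values in the table. The function names and the default width of 4 bits are illustrative assumptions.

```python
def negate(value, bits=4):
    """Two's complement negation: complement all the bits, then add 1."""
    mask = (1 << bits) - 1
    return ((~value & mask) + 1) & mask


def to_signed(value, bits=4):
    """Read a bit pattern as a signed two's complement number."""
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


for pattern in (0b1010, 0b0101):
    result = negate(pattern)
    print(f"{pattern:04b} ({to_signed(pattern)}) -> {result:04b} ({to_signed(result)})")
```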

MIPS Shift Operations

- Need operations to pack and unpack 8-bit characters into 32-bit words
- Shifts move all the bits in a word left or right

    sll $t2, $s0, 8    #$t2 = $s0 << 8 bits
    srl $t2, $s0, 8    #$t2 = $s0 >> 8 bits

- Instruction Format (R format), shown for the sll statement (rs is unused)

    op   rs   rt   rd   shamt  funct
    0    -    16   10   8      0x00

- Such shifts are called logical because they fill with zeros
  - Notice that a 5-bit shamt field is enough to shift a 32-bit value 2^5 - 1 or 31 bit positions
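
Packing and unpacking characters with logical shifts can be sketched as follows. The helpers `pack` and `unpack`, the four-character ASCII input, and the choice to put the first character in the most significant byte are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


def sll(reg, shamt):
    """Shift left logical: zeros enter on the right, bits leaving bit 31 are lost."""
    return (reg << shamt) & MASK32


def srl(reg, shamt):
    """Shift right logical: zeros enter on the left."""
    return (reg & MASK32) >> shamt


def pack(chars):
    """Pack four 8-bit characters into one 32-bit word, first character leftmost."""
    word = 0
    for ch in chars:
        word = sll(word, 8) | (ord(ch) & 0xFF)
    return word


def unpack(word):
    """Recover the four characters by shifting each byte down and masking."""
    return "".join(chr(srl(word, shift) & 0xFF) for shift in (24, 16, 8, 0))


packed = pack("MIPS")
print(hex(packed), unpack(packed))
```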

MIPS Logical Operations

- There are a number of bit-wise logical operations in the MIPS ISA

    and $t0, $t1, $t2    #$t0 = $t1 & $t2
    or  $t0, $t1, $t2    #$t0 = $t1 | $t2
    nor $t0, $t1, $t2    #$t0 = not($t1 | $t2)

- Instruction Format (R format), shown for the and statement

    op   rs   rt   rd   shamt  funct
    0    9    10   8    0      0x24

    andi $t0, $t1, 0xFF00    #$t0 = $t1 & ff00
    ori  $t0, $t1, 0xFF00    #$t0 = $t1 | ff00

- Instruction Format (I format), shown for the ori statement

    op     rs   rt   immediate
    0x0D   9    8    0xFF00

MIPS Control Flow Instructions

- MIPS conditional branch instructions:

    bne $s0, $s1, Lbl    #go to Lbl if $s0≠$s1
    beq $s0, $s1, Lbl    #go to Lbl if $s0=$s1

  - Ex: if (i==j) h = i + j;

          bne $s0, $s1, Lbl1
          add $s3, $s0, $s1
    Lbl1: ...

- Instruction Format (I format), shown for the bne statement:

    op     rs   rt   offset
    0x05   16   17   16 bit offset

- How is the branch destination address specified?

Specifying Branch Destinations

- Use a register (like in lw and sw) added to the 16-bit offset
  - which register? Instruction Address Register (the PC)
    - its use is automatically implied by instruction
    - PC gets updated (PC+4) during the fetch cycle so that it holds the address of the next instruction
  - limits the branch distance to -2^15 to +2^15-1 (word) instructions from the (instruction after the) branch instruction, but most branches are local anyway

[Figure: the 16-bit offset from the low order 16 bits of the branch instruction is sign-extended, has 00 appended, and is added to PC+4 to form the 32-bit branch dst address]
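
The address calculation in the figure can be written out directly. In this sketch `branch_target` is a hypothetical helper and the PC values are illustrative; the offset counts instructions relative to the instruction after the branch.

```python
def branch_target(branch_pc, offset16):
    """Target = (PC + 4) + (sign-extended 16-bit offset with 00 appended)."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:            # negative offset in two's complement
        offset -= 0x10000
    return (branch_pc + 4 + (offset << 2)) & 0xFFFFFFFF


# A branch at 0x00400000 with offset 3 skips ahead three instructions
# past the one that follows the branch.
print(hex(branch_target(0x00400000, 3)))
# An offset of -1 (0xFFFF) branches back to the branch instruction itself.
print(hex(branch_target(0x00400010, 0xFFFF)))
```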

In Support of Branch Instructions

- We have beq, bne, but what about other kinds of branches (e.g., branch-if-less-than)? For this, we need yet another instruction, slt
- Set on less than instruction:

    slt $t0, $s0, $s1    # if $s0 < $s1 then
                         # $t0 = 1 else
                         # $t0 = 0

- Instruction format (R format):

    op   rs   rt   rd   shamt  funct
    0    16   17   8    0      0x2a

- Alternate versions of slt

    slti  $t0, $s0, 25     # if $s0 < 25 then $t0=1 ...
    sltu  $t0, $s0, $s1    # if $s0 < $s1 then $t0=1 ...
    sltiu $t0, $s0, 25     # if $s0 < 25 then $t0=1 ...

Aside: More Branch Instructions

- Can use slt, beq, bne, and the fixed value of 0 in register $zero to create other conditions
  - less than: blt $s1, $s2, Label

      slt $at, $s1, $s2        #$at set to 1 if
      bne $at, $zero, Label    #$s1 < $s2

  - less than or equal to: ble $s1, $s2, Label
  - greater than: bgt $s1, $s2, Label
  - greater than or equal to: bge $s1, $s2, Label

- Such branches are included in the instruction set as pseudo instructions, recognized (and expanded) by the assembler
  - It's why the assembler needs a reserved register ($at)

Bounds Check Shortcut

- Treating signed numbers as if they were unsigned gives a low cost way of checking if 0 ≤ x < y (index out of bounds for arrays)

    sltu $t0, $s1, $t2      # $t0 = 0 if
                            # $s1 >= $t2 (max)
                            # or $s1 < 0 (min)
    beq  $t0, $zero, IOOB   # go to IOOB if
                            # $t0 = 0

- The key is that negative integers in two's complement look like large numbers in unsigned notation. Thus, an unsigned comparison of x < y also checks if x is negative as well as if x is less than y.
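
The shortcut is easy to check in a model where registers are 32-bit patterns. The sketch below assumes a hypothetical `index_in_bounds` helper built on a Python model of sltu; Python integers are unbounded, so the mask stands in for the 32-bit register width.

```python
MASK32 = 0xFFFFFFFF


def sltu(a, b):
    """Set on less than unsigned: compare the 32-bit patterns as unsigned numbers."""
    return 1 if (a & MASK32) < (b & MASK32) else 0


def index_in_bounds(index, length):
    """One unsigned comparison checks both 0 <= index and index < length."""
    return sltu(index, length) == 1


for index in (0, 9, 10, -1):
    print(index, index_in_bounds(index, 10))
```

An index of -1 becomes 0xFFFFFFFF under the mask, which is larger than any valid array length, so the single comparison rejects it.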

Other Control Flow Instructions

- MIPS also has an unconditional branch instruction or jump instruction:

    j label    #go to label

- Instruction Format (J Format):

    op     address
    0x02   26-bit address

[Figure: the 32-bit jump address is formed by concatenating the upper 4 bits of the PC, the 26-bit address from the low order 26 bits of the jump instruction, and 00]
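
The concatenation in the figure can be sketched as below. The helper name `jump_target` and the sample addresses are illustrative, and the sketch assumes the upper 4 bits are taken from the already incremented PC (PC+4).

```python
def jump_target(jump_pc, address26):
    """Target = upper 4 bits of (PC + 4) || 26-bit address || 00."""
    upper = (jump_pc + 4) & 0xF0000000           # keep only the top 4 bits
    return upper | ((address26 & 0x03FFFFFF) << 2)


# A jump at 0x00400000 whose 26-bit field is 0x0100004 lands at word 0x00400010.
print(hex(jump_target(0x00400000, 0x0100004)))
```

Since only 26 bits come from the instruction, a jump can reach any word inside the current 256 MB region selected by the PC's upper 4 bits.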

Aside: Branching Far Away

- What if the branch destination is further away than can be captured in 16 bits?
- The assembler comes to the rescue: it inserts an unconditional jump to the branch target and inverts the condition

        beq $s0, $s1, L1

  becomes

        bne $s0, $s1, L2
        j   L1
    L2:

Instructions for Accessing Procedures

- MIPS procedure call instruction:

    jal ProcedureAddress    #jump and link

- Saves PC+4 in register $ra to have a link to the next instruction for the procedure return
- Machine format (J format):

    op     address
    0x03   26 bit address

- Then can do procedure return with a

    jr $ra    #return

- Instruction format (R format):

    op   rs   rt   rd   shamt  funct
    0    31   -    -    -      0x08

Six Steps in Execution of a Procedure

1. Main routine (caller) places parameters in a place where the procedure (callee) can access them
   - $a0 - $a3: four argument registers
2. Caller transfers control to the callee
3. Callee acquires the storage resources needed
4. Callee performs the desired task
5. Callee places the result value in a place where the caller can access it
   - $v0 - $v1: two value registers for result values
6. Callee returns control to the caller
   - $ra: one return address register to return to the point of origin

Aside: Spilling Registers

- What if the callee needs to use more registers than allocated to argument and return values?
  - callee uses a stack, a last-in-first-out queue

- One of the general registers, $sp ($29), is used to address the stack (which "grows" from high address to low address)
  - add data onto the stack (push)
      $sp = $sp - 4
      data on stack at new $sp
  - remove data from the stack (pop)
      data from stack at $sp
      $sp = $sp + 4

[Figure: stack in memory between high addr and low addr, with $sp pointing to the top of stack]
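
The push and pop sequences can be modelled with a dictionary standing in for memory. This is a sketch under stated assumptions: the class name `Stack`, the dictionary representation, and the starting $sp value of 0x7ffffffc are illustrative choices.

```python
class Stack:
    """Toy model of a MIPS stack that grows from high addresses to low addresses."""

    def __init__(self, top=0x7FFFFFFC):
        self.sp = top        # models $sp
        self.memory = {}     # word address -> value

    def push(self, value):
        self.sp -= 4                     # $sp = $sp - 4
        self.memory[self.sp] = value     # data on stack at new $sp

    def pop(self):
        value = self.memory[self.sp]     # data from stack at $sp
        self.sp += 4                     # $sp = $sp + 4
        return value


stack = Stack()
stack.push(0x11)
stack.push(0x22)
print(hex(stack.sp), hex(stack.pop()), hex(stack.pop()), hex(stack.sp))
```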

Aside: Allocating Space on the Stack

- The segment of the stack containing a procedure's saved registers and local variables is its procedure frame (aka activation record)
  - The frame pointer ($fp) points to the first word of the frame of a procedure, providing a stable "base" register for the procedure
    - $fp is initialized using $sp on a call and $sp is restored using $fp on a return

[Figure: procedure frame from high addr to low addr: $fp at the top of the frame, then saved argument regs (if any), saved return addr, saved local regs (if any), local arrays & structures (if any), with $sp at the bottom]

Aside: Allocating Space on the Heap

- Static data segment for constants and other static variables (e.g., arrays)
- Dynamic data segment (aka heap) for structures that grow and shrink (e.g., linked lists)
  - Allocate space on the heap with malloc() and free it with free() in C

[Figure: memory layout from high to low addresses]

    0x7ffffffc   $sp   Stack
                       Dynamic data (heap)
    0x10008000   $gp   Static data
    0x10000000
    0x00400000   PC    Text (Your code)
    0x00000000         Reserved

MIPS Instruction Classes Distribution

- Frequency of MIPS instruction classes for SPEC2006

    Instruction Class   Frequency
                        Integer   Ft. Pt.
    Arithmetic          16%       48%
    Data transfer       35%       36%
    Logical             12%       4%
    Cond. Branch        34%       8%
    Jump                2%        0%

Atomic Exchange Support

- Need hardware support for synchronization mechanisms to avoid data races where the results of the program can change depending on how events happen to occur
  - Two memory accesses from different threads to the same location, and at least one is a write

- Atomic exchange (atomic swap): interchanges a value in a register for a value in memory atomically, i.e., as one operation (instruction)
  - Implementing an atomic exchange would require both a memory read and a memory write in a single, uninterruptible instruction. An alternative is to have a pair of specially configured instructions

    ll $t1, 0($s1)    #load linked
    sc $t0, 0($s1)    #store conditional

Atomic Exchange with ll and sc

- If the contents of the memory location specified by the ll are changed before the sc to the same address occurs, the sc fails (returns a zero)

    try: add $t0, $zero, $s4    #$t0=$s4 (exchange value)
         ll  $t1, 0($s1)        #load memory value to $t1
         sc  $t0, 0($s1)        #try to store exchange
                                #value to memory, if fail
                                #$t0 will be 0
         beq $t0, $zero, try    #try again on failure
         add $s4, $zero, $t1    #load value in $s4

- If the value in memory between the ll and the sc instructions changes, then sc returns a 0 in $t0, causing the code sequence to try again.
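
The retry loop can be explored with a toy, single-threaded model of ll and sc. Everything here is an assumption made for illustration: the class `LinkedMemory`, its single `link` slot, and the `store` method that stands in for a write from another thread. Real hardware tracks the link per processor.

```python
class LinkedMemory:
    """Toy model: one link slot remembers the address of the last ll."""

    def __init__(self):
        self.words = {}
        self.link = None  # address watched since the last ll, if any

    def ll(self, address):
        """Load linked: read the word and start watching its address."""
        self.link = address
        return self.words.get(address, 0)

    def store(self, address, value):
        """An ordinary store (e.g. from another thread); breaks a link on that address."""
        self.words[address] = value
        if self.link == address:
            self.link = None

    def sc(self, address, value):
        """Store conditional: store and return 1 if the link is intact, else return 0."""
        if self.link != address:
            return 0
        self.words[address] = value
        self.link = None
        return 1


def atomic_exchange(memory, address, new_value):
    """Mirror of the try: loop, repeating until the sc succeeds."""
    while True:
        old = memory.ll(address)
        if memory.sc(address, new_value):
            return old


mem = LinkedMemory()
mem.store(0x100, 7)
print(atomic_exchange(mem, 0x100, 9), mem.words[0x100])
```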

The C Code Translation Hierarchy

    C program
      -> compiler  -> assembly code
      -> assembler -> object code
      -> linker (combines object code with library routines) -> machine code executable
      -> loader    -> memory

Compiler Benefits

- Comparing performance for bubble (exchange) sort
  - To sort 100,000 words with the array initialized to random values on a Pentium 4 with a 3.06 GHz clock rate, a 533 MHz system bus, with 2 GB of DDR SDRAM, using Linux version 2.4.20

    gcc opt         Relative performance   Clock cycles (M)   Instr count (M)   CPI
    None            1.00                   158,615            114,938           1.38
    O1 (medium)     2.37                   66,990             37,470            1.79
    O2 (full)       2.38                   66,521             39,993            1.66
    O3 (proc mig)   2.41                   65,747             44,993            1.46

- The unoptimized code has the best CPI, the O1 version has the lowest instruction count, but the O3 version is the fastest. Why?
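
The CPI and relative performance columns follow from the cycle and instruction counts. The sketch below recomputes them from the table's values (in millions); the dictionary layout and function names are illustrative. Because all four runs use the same clock rate, relative performance reduces to a ratio of clock cycles.

```python
# (clock cycles in millions, instruction count in millions) per gcc optimization level
runs = {
    "None": (158_615, 114_938),
    "O1": (66_990, 37_470),
    "O2": (66_521, 39_993),
    "O3": (65_747, 44_993),
}


def cpi(cycles, instructions):
    """Clock cycles per instruction."""
    return cycles / instructions


def relative_performance(name, baseline="None"):
    """Same clock rate for every run, so speedup = baseline cycles / cycles."""
    return runs[baseline][0] / runs[name][0]


for name, (cycles, instructions) in runs.items():
    print(f"{name:5s} CPI={cpi(cycles, instructions):.2f} "
          f"relative performance={relative_performance(name):.2f}")
```

The numbers show why no single column decides the winner: execution time depends on the product of instruction count and CPI, and O3 has the smallest product.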

The Java Code Translation Hierarchy

    Java program
      -> compiler -> Class files (Java bytecodes)
      -> Java Virtual Machine, together with Java library routines (machine code)
      -> Just In Time (JIT) compiler -> Compiled Java methods (machine code)

Sorting in C versus Java

- Comparing performance for two sort algorithms in C and Java
  - The JVM/JIT is Sun/Hotspot version 1.3.1/1.3.1

                                         Relative performance   Speedup
    Language   Method         Opt       Bubble     Quick        quick vs bubble
    C          Compiler       None      1.00       1.00         2468
    C          Compiler       O1        2.37       1.50         1562
    C          Compiler       O2        2.38       1.50         1555
    C          Compiler       O3        2.41       1.91         1955
    Java       Interpreted              0.12       0.05         1050
    Java       JIT compiler             2.13       0.29         338

- Observations?

Addressing Modes Illustrated

1. Register addressing
     op | rs | rt | rd | funct
     The operand is a word in a register.
2. Base (displacement) addressing
     op | rs | rt | offset
     The operand is a word or byte in memory, at base register + offset.
3. Immediate addressing
     op | rs | rt | operand
4. PC-relative addressing
     op | rs | rt | offset
     The branch destination instruction is in memory, at Program Counter (PC) + offset.
5. Pseudo-direct addressing
     op | jump address
     The jump destination instruction is in memory, at jump address || Program Counter (PC).

MIPS Organization So Far

[Figure: processor and memory.
 Processor: a Register File of 32 registers ($zero - $ra), 32 bits each, with 5-bit src1 addr, src2 addr, and dst addr inputs, a 32-bit write data input, and 32-bit src1 data and src2 data outputs feeding a 32-bit ALU; a PC with one adder for PC = PC+4 and one adder for the branch offset; Fetch, Decode, and Exec steps.
 Memory: 2^30 words of 32 bits, with a 32-bit read/write addr, 32-bit read data, and 32-bit write data; byte addresses are big Endian (bytes 0 1 2 3, then 4 5 6 7); word addresses (binary) 0…0000, 0…0100, 0…1000, 0…1100, ..., 1…1100]