ADVANCED COMPUTER ARCHITECTURE
BY DR. RADWA M. TAWFEEK
MIPS PROCESSOR
MIPS BASICS

• Memory Data types (also aligned)


• Bytes -- 8 bits
• Half words -- 16 bits
• Words -- 32 bits
• Memory is denoted “M” (e.g., M[0x10] is the byte at address 0x10)

• Registers
• 32 4-byte registers in the “register file”
• Denoted “R” (e.g., R[2] is register 2)

• Instructions
• 4 bytes (32 bits)
• 4-byte aligned (i.e., they start at addresses that are a multiple of 4 -- 0x0000, 0x0004, etc.)
• Instructions operate on memory and registers
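
The alignment rule above can be sketched as a simple check. This is a minimal illustration, assuming natural alignment (an item's address is a multiple of its size); the names SIZES and is_aligned are hypothetical and not from the slides.

```python
# Sizes in bytes of the three memory data types listed above.
SIZES = {"byte": 1, "half": 2, "word": 4}


def is_aligned(address: int, kind: str) -> bool:
    """Return True if `address` is a multiple of the size of `kind`."""
    return address % SIZES[kind] == 0


# Instructions are 4 bytes, so they follow the same rule as words.
print(is_aligned(0x0004, "word"))  # multiple of 4
print(is_aligned(0x0006, "word"))  # not a multiple of 4
print(is_aligned(0x0006, "half"))  # multiple of 2
```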
MIPS REGISTER FILE

• Operands of arithmetic instructions must be from a special location contained in the datapath’s register file
• Thirty-two 32-bit registers (2^5 = 32 locations, so each register address is 5 bits)
• Two read ports
• One write port
• Registers are fast
  - Smaller is faster & make the common case fast
• Registers improve code density
  - Since registers are named with fewer bits than a memory location
• Register addresses are indicated by using $

[Figure: Register File with 5-bit src1 addr, src2 addr, and dst addr inputs, a 32-bit write data input, and 32-bit src1 data and src2 data outputs]
THE MIPS REGISTER FILE

• All registers are the same


• Where a register is needed any register will work

• By convention, we use them for particular tasks


• Argument passing
• Temporaries, etc.

• $zero is the “zero register”


• It is always zero.
• Writes to it have no effect
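
The register file behavior described so far (32 registers, two read ports, one write port, a hard-wired zero register) can be modeled in a few lines. This is only a behavioral sketch; the class and method names are hypothetical, and real hardware is of course not implemented this way.

```python
class RegisterFile:
    """Thirty-two 32-bit registers; register 0 always reads as zero."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, src1: int, src2: int):
        """Two read ports: return the contents of both source registers."""
        return self.regs[src1], self.regs[src2]

    def write(self, dst: int, value: int) -> None:
        """One write port: a write to register 0 has no effect."""
        if dst != 0:
            self.regs[dst] = value & 0xFFFFFFFF  # keep only 32 bits


rf = RegisterFile()
rf.write(0, 123)  # ignored, $zero stays zero
rf.write(8, 7)    # register 8 now holds 7
print(rf.read(0, 8))
```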
MIPS ARITHMETIC INSTRUCTION
• MIPS assembly language arithmetic statement
add $t0, $s1, $s2
sub $t0, $s1, $s2
• Each arithmetic instruction performs only one operation
• Each arithmetic instruction specifies exactly three operands
destination ← source1 op source2
• Operand order is fixed (the destination is specified first)

• The operands are contained in the datapath’s register file ($t0, $s1, $s2)
MIPS INSTRUCTION FORMATS
MIPS INSTRUCTION SET (1)

OpC values are in hex; “0 & 20” means opcode 0 with function code 20.

Category                    Instr                    OpC      Example                  Meaning
Arithmetic (R & I format)   add                      0 & 20   add  $s1, $s2, $s3       $s1 = $s2 + $s3
                            subtract                 0 & 22   sub  $s1, $s2, $s3       $s1 = $s2 - $s3
                            add immediate            8        addi $s1, $s2, 4         $s1 = $s2 + 4
                            shift left logical       0 & 00   sll  $s1, $s2, 4         $s1 = $s2 << 4
                            shift right logical      0 & 02   srl  $s1, $s2, 4         $s1 = $s2 >> 4 (fill with zeros)
                            shift right arithmetic   0 & 03   sra  $s1, $s2, 4         $s1 = $s2 >> 4 (fill with sign bit)
                            and                      0 & 24   and  $s1, $s2, $s3       $s1 = $s2 & $s3
                            or                       0 & 25   or   $s1, $s2, $s3       $s1 = $s2 | $s3
                            nor                      0 & 27   nor  $s1, $s2, $s3       $s1 = not ($s2 | $s3)
                            and immediate            c        andi $s1, $s2, 0xff00    $s1 = $s2 & 0xff00
                            or immediate             d        ori  $s1, $s2, 0xff00    $s1 = $s2 | 0xff00
                            load upper immediate     f        lui  $s1, 0xffff         $s1 = 0xffff0000
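
To make the OpC column concrete, the sketch below packs two rows of the table into 32-bit instruction words. It assumes the standard MIPS field layouts (R format: op, rs, rt, rd, shamt, funct; I format: op, rs, rt, 16-bit immediate) and the standard register numbering; the helper names encode_r and encode_i are hypothetical.

```python
# Standard MIPS register numbers for the names used in the table.
REG = {"$zero": 0, "$t0": 8, "$t1": 9, "$s1": 17, "$s2": 18, "$s3": 19}


def encode_r(funct, rd, rs, rt, shamt=0):
    """R format: op(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (REG[rs] << 21) | (REG[rt] << 16) | (REG[rd] << 11) | (shamt << 6) | funct


def encode_i(opcode, rt, rs, imm):
    """I format: op(6) | rs(5) | rt(5) | immediate(16)."""
    return (opcode << 26) | (REG[rs] << 21) | (REG[rt] << 16) | (imm & 0xFFFF)


add_word = encode_r(0x20, "$s1", "$s2", "$s3")  # add  $s1, $s2, $s3
addi_word = encode_i(0x08, "$s1", "$s2", 4)     # addi $s1, $s2, 4
print(f"{add_word:08x}")
print(f"{addi_word:08x}")
```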
MIPS INSTRUCTION SET (2)

Category                      Instr                        OpC      Example                Meaning
Data transfer (I format)      load word                    23       lw   $s1, 100($s2)     $s1 = Memory($s2+100)
                              store word                   2b       sw   $s1, 100($s2)     Memory($s2+100) = $s1
                              load byte                    20       lb   $s1, 101($s2)     $s1 = Memory($s2+101)
                              store byte                   28       sb   $s1, 101($s2)     Memory($s2+101) = $s1
                              load half                    21       lh   $s1, 102($s2)     $s1 = Memory($s2+102)
                              store half                   29       sh   $s1, 102($s2)     Memory($s2+102) = $s1
Cond. branch (I & R format)   br on equal                  4        beq  $s1, $s2, L       if ($s1==$s2) go to L
                              br on not equal              5        bne  $s1, $s2, L       if ($s1 !=$s2) go to L
                              set on less than immediate   a        slti $s1, $s2, 100     if ($s2<100) $s1=1; else $s1=0
                              set on less than             0 & 2a   slt  $s1, $s2, $s3     if ($s2<$s3) $s1=1; else $s1=0
Uncond. jump                  jump                         2        j    2500              go to 10000
                              jump register                0 & 08   jr   $t1               go to $t1
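
The “j 2500 goes to 10000” row follows from the fact that jump and branch fields count words, not bytes. A minimal sketch of the address arithmetic, assuming the standard MIPS rules (jump: top 4 bits of PC+4 combined with the 26-bit field shifted left by 2; branch: PC+4 plus the word offset shifted left by 2). The function names are hypothetical.

```python
def jump_target(pc, address_field):
    """j: top 4 bits of PC+4, then the 26-bit field shifted left by 2."""
    return ((pc + 4) & 0xF0000000) | ((address_field & 0x03FFFFFF) << 2)


def branch_target(pc, offset_words):
    """beq/bne: PC+4 plus the signed word offset shifted left by 2."""
    return (pc + 4 + (offset_words << 2)) & 0xFFFFFFFF


print(jump_target(0x00000000, 2500))  # word address 2500 is byte address 10000
print(branch_target(4, 2))            # branch at address 4 with offset 2
```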
MIPS DATAPATH AND CONTROL
PERFORMANCE ISSUES

• Unfortunately, though simple, the single cycle approach is not used because it is very slow
• Clock cycle must have the same length for every instruction
• Longest delay determines clock period

What is the longest (slowest) path (slowest instruction)?


INSTRUCTION CRITICAL PATHS

• Calculate cycle time assuming negligible delays (for muxes, control unit, sign extend, PC access, shift left 2, wires, setup and hold times) except:
  • Instruction and Data Memory (4 ns)
  • ALU and adders (2 ns)
  • Register File access (reads or writes) (1 ns)

Delays in ns:

Instr.    I Mem   Reg Rd   ALU Op   D Mem   Reg Wr   Total
R-type    4       1        2                1        8
load      4       1        2        4       1        12
store     4       1        2        4                11
beq       4       1        2                         7
jump      4                                          4
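
The totals in the table can be reproduced by summing the delays of the units each instruction class uses. The sketch below does that and picks the longest path as the single-cycle clock period; the dictionary names are hypothetical.

```python
# Unit delays in ns, as given on the slide.
DELAY_NS = {"I Mem": 4, "Reg Rd": 1, "ALU Op": 2, "D Mem": 4, "Reg Wr": 1}

# Units on the critical path of each instruction class (from the table).
USES = {
    "R-type": ["I Mem", "Reg Rd", "ALU Op", "Reg Wr"],
    "load": ["I Mem", "Reg Rd", "ALU Op", "D Mem", "Reg Wr"],
    "store": ["I Mem", "Reg Rd", "ALU Op", "D Mem"],
    "beq": ["I Mem", "Reg Rd", "ALU Op"],
    "jump": ["I Mem"],
}

totals = {instr: sum(DELAY_NS[u] for u in units) for instr, units in USES.items()}

# In a single-cycle design the slowest instruction sets the clock period.
cycle_time_ns = max(totals.values())
slowest = max(totals, key=totals.get)

print(totals)
print(slowest, cycle_time_ns)
```

Under these delays the load instruction is the slowest path, so every instruction pays for a 12 ns cycle.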
SINGLE CYCLE DISADVANTAGES & ADVANTAGES

• Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
  • Especially problematic for more complex instructions like floating point multiply

[Figure: two clock cycles of equal length; lw fills Cycle 1, while sw finishes before the end of Cycle 2 and the rest of that cycle is wasted]

• May be wasteful of area, since some functional units (e.g., adders) must be duplicated because they cannot be shared during a clock cycle

but

• It is simple and easy to understand
MIPS PIPELINE

• Five stages, one step per stage


1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register
PIPELINE PERFORMANCE

• Assume time for stages is


• 100ps for register read or write
• 200ps for other stages

• Compare pipelined datapath with single-cycle datapath
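
The comparison can be worked through numerically. The sketch below assumes the slowest instruction (a load) uses all five stages, that register read happens in ID and register write in WB, and that the pipelined clock equals the slowest stage; the function names are hypothetical.

```python
# Stage times in ps: 100 ps for register read (ID) or write (WB), 200 ps otherwise.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

single_cycle_clock = sum(STAGE_PS.values())  # one long cycle covers all stages
pipelined_clock = max(STAGE_PS.values())     # every stage gets the slowest stage's time


def single_cycle_time(n):
    """Total time in ps for n instructions on the single-cycle datapath."""
    return n * single_cycle_clock


def pipelined_time(n):
    """Total time in ps for n instructions: fill the pipeline, then one per cycle."""
    return (len(STAGE_PS) + n - 1) * pipelined_clock


for n in (3, 1_000_000):
    print(n, single_cycle_time(n), pipelined_time(n))
```

With these numbers the clock period drops from 800 ps to 200 ps, so for long instruction streams the improvement approaches 4x rather than 5x, because the stages are not balanced.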


PIPELINE PERFORMANCE
PIPELINE SPEEDUP

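• If all stages are balanced
  • i.e., all take the same time
  • Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
• If not balanced, speedup is less
• Speedup due to increased throughput
  • Latency (time for each instruction) does not decrease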
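
For the balanced case, a short calculation shows how the speedup approaches the number of stages as more instructions are executed. This is a sketch under the idealized assumption of no stalls; the function names are hypothetical.

```python
def pipelined_cycles(n, stages):
    """Cycles to run n instructions: (stages - 1) to fill, then one per cycle."""
    return stages + n - 1


def speedup(n, stages):
    """Ratio of unpipelined to pipelined time, with equal stage times."""
    return (n * stages) / pipelined_cycles(n, stages)


for n in (1, 10, 1000):
    print(n, round(speedup(n, 5), 2))
```

A single instruction gains nothing (its latency is still five stage times); the gain comes only from overlapping many instructions.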
PIPELINED DATAPATH
SINGLE CYCLE, MULTIPLE CYCLE, VS. PIPELINE

Single Cycle Implementation:

Cycle     1        2
lw        lw
sw                 sw, then Waste

Multiple Cycle Implementation:

Cycle     1        2     3      4     5     6        7     8      9     10
lw        IFetch   Dec   Exec   Mem   WB
sw                                          IFetch   Dec   Exec   Mem
R-type                                                                  IFetch

Pipeline Implementation (pipeline clock same as multi-cycle clock):

lw        IFetch   Dec      Exec     Mem    WB
sw                 IFetch   Dec      Exec   Mem    WB
R-type                      IFetch   Dec    Exec   Mem   WB


PIPELINE DATAPATH (1)
PIPELINE DATAPATH (2)
PIPELINE DATAPATH (3)
PIPELINE DATAPATH (4)
PIPELINE DATAPATH (5)
PIPELINE DATAPATH (6)
PIPELINE DATAPATH (7)
MIPS PIPELINE CONTROL PATH MODIFICATIONS

• All control signals can be determined during Decode
  • and held in the state registers between pipeline stages

[Figure: pipelined datapath (PC, Instruction Memory, Register File, Sign Extend, ALU, Data Memory) with a Control unit in the decode stage and the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]
WHY PIPELINE? FOR PERFORMANCE!
[Figure: pipeline diagram; time (clock cycles) runs left to right, instruction order runs top to bottom]

Inst 0    IM   Reg  ALU  DM   Reg
Inst 1         IM   Reg  ALU  DM   Reg
Inst 2              IM   Reg  ALU  DM   Reg
Inst 3                   IM   Reg  ALU  DM   Reg
Inst 4                        IM   Reg  ALU  DM   Reg

• Time to fill the pipeline: the initial cycles before Inst 0 completes
• Once the pipeline is full, one instruction is completed every cycle, so CPI = 1


HAZARDS

• Situations that prevent starting the next instruction in the next cycle
• Structure hazards
• A required resource is busy

• Data hazard
• Need to wait for previous instruction to complete its data read/write

• Control hazard
• Deciding on control action depends on previous instruction

• Can always resolve hazards by waiting


• pipeline control must detect the hazard
• and take action to resolve hazards
STRUCTURE HAZARDS

• Conflict for use of a resource


• In MIPS pipeline with a single memory
• Load/store requires data access
• Instruction fetch would have to stall for that cycle
• Would cause a pipeline “bubble”

• Hence, pipelined datapaths require separate instruction/data memories


• Or separate instruction/data caches
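
The single-memory conflict can be located mechanically. The sketch below assumes one instruction is fetched per cycle with no stalls, so instruction i (counting from 0) is in IF during cycle i+1 and in MEM during cycle i+4; the function name and the example program are hypothetical.

```python
def memory_conflicts(program):
    """Find cycles where a load/store's MEM stage and a later fetch both need memory.

    Returns a list of (cycle, memory_instr_index, fetched_instr_index).
    """
    conflicts = []
    for i, op in enumerate(program):
        if op in ("lw", "sw"):
            fetched = i + 3  # the instruction in IF while instruction i is in MEM
            if fetched < len(program):
                conflicts.append((i + 4, i, fetched))
    return conflicts


# A load followed by four instructions that do not access data memory.
print(memory_conflicts(["lw", "add", "sub", "or", "and"]))
```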
A SINGLE MEMORY WOULD BE A STRUCTURAL HAZARD
[Figure: pipeline diagram; time (clock cycles) runs left to right, instruction order runs top to bottom]

lw        Mem  Reg  ALU  Mem  Reg
Inst 1         Mem  Reg  ALU  Mem  Reg
Inst 2              Mem  Reg  ALU  Mem  Reg
Inst 3                   Mem  Reg  ALU  Mem  Reg
Inst 4                        Mem  Reg  ALU  Mem  Reg

• In the fourth cycle, lw is reading data from memory while Inst 3 is reading its instruction from memory
• Can fix with separate instr and data memories


HOW ABOUT REGISTER FILE ACCESS?
[Figure: pipeline diagram; time (clock cycles) runs left to right, instruction order runs top to bottom]

add $1,       IM   Reg  ALU  DM   Reg
Inst 1             IM   Reg  ALU  DM   Reg
Inst 2                  IM   Reg  ALU  DM   Reg
add $2,$1,                   IM   Reg  ALU  DM   Reg

• Fix register file access hazard by doing reads in the second half of the cycle and writes in the first half
• Figure labels: the clock edge that controls register writing, and the clock edge that controls loading of pipeline state registers
DATA HAZARD

• An instruction depends on completion of data access by a previous instruction
  • Read after write (RAW)
  • Write after read (WAR): not common in simple pipeline
  • Write after write (WAW): not common in simple pipeline
REGISTER USAGE CAN CAUSE DATA HAZARDS

• Dependencies backward in time cause hazards

add $1,        IM   Reg  ALU  DM   Reg
sub $4,$1,$5        IM   Reg  ALU  DM   Reg
and $6,$1,$7             IM   Reg  ALU  DM   Reg
or  $8,$1,$9                  IM   Reg  ALU  DM   Reg
xor $4,$1,$5                       IM   Reg  ALU  DM   Reg

• Read after write data hazard (RAW)
ONE WAY TO “FIX” A DATA HAZARD

• Can fix data hazard by waiting: stall

add $1,        IM   Reg  ALU  DM   Reg
stall
stall
sub $4,$1,$5                  IM   Reg  ALU  DM   Reg
and $6,$1,$7                       IM   Reg  ALU  DM   Reg
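
The RAW dependencies in this instruction sequence, and the two stall cycles needed when there is no forwarding, can be found with a small scan. The sketch assumes the register file writes in the first half of a cycle and reads in the second half, so only consumers one or two instructions behind the producer are affected. The slides truncate the sources of the first add, so `$2` and `$3` below are placeholders; the function names are hypothetical.

```python
# (op, dest, src1, src2) with register numbers.
program = [
    ("add", 1, 2, 3),  # add $1,$2,$3 (sources assumed; truncated on the slide)
    ("sub", 4, 1, 5),  # sub $4,$1,$5
    ("and", 6, 1, 7),  # and $6,$1,$7
    ("or", 8, 1, 9),   # or  $8,$1,$9
    ("xor", 4, 1, 5),  # xor $4,$1,$5
]


def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index) pairs that read a value too early."""
    hazards = []
    for i, (_, dest, _, _) in enumerate(program):
        for j in range(i + 1, min(i + 1 + window, len(program))):
            _, _, src1, src2 = program[j]
            if dest != 0 and dest in (src1, src2):
                hazards.append((i, j))
    return hazards


def stalls_for_distance(distance):
    """Stall cycles so the consumer's register read lines up with the producer's WB."""
    return max(0, 3 - distance)


print(raw_hazards(program))
print(stalls_for_distance(1))
```

The sub and the and both read `$1` too early; the or reads it in the same cycle it is written, and the xor reads it afterwards.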
DATA HAZARDS

• An instruction depends on
completion of data access by a
previous instruction
ANOTHER WAY TO “FIX” A DATA HAZARD

• Fix data hazards by forwarding results as soon as they are available to where they are needed

add $1,        IM   Reg  ALU  DM   Reg
sub $4,$1,$5        IM   Reg  ALU  DM   Reg
and $6,$1,$7             IM   Reg  ALU  DM   Reg
or  $8,$1,$9                  IM   Reg  ALU  DM   Reg
xor $4,$1,$5                       IM   Reg  ALU  DM   Reg
FORWARDING (BYPASSING)

• Use result when it is computed


• Don’t wait for it to be stored in a register
• Requires extra connections in the datapath
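
The “extra connections” are controlled by a forwarding unit that compares register numbers held in the pipeline registers. The sketch below follows the commonly taught conditions for one ALU operand; the function name, the dictionary keys, and the returned labels are hypothetical, not taken from the slides.

```python
def forward_select(id_ex_rs, ex_mem, mem_wb):
    """Pick where the ALU operand for register `id_ex_rs` should come from.

    ex_mem and mem_wb are dicts with 'reg_write' (bool) and 'rd' (register number)
    describing the instructions one and two stages ahead.
    """
    # The most recent result (EX/MEM) takes priority over the older one.
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == id_ex_rs:
        return "EX/MEM"
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == id_ex_rs:
        return "MEM/WB"
    return "REGFILE"


# sub $4,$1,$5 in EX while the add that writes $1 is one stage ahead.
print(forward_select(1, {"reg_write": True, "rd": 1}, {"reg_write": False, "rd": 0}))
```

The check against register 0 matters because writes to `$zero` have no effect, so its value must never be forwarded.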
FORWARDING WITH LOAD-USE DATA HAZARDS

lw  $1,4($2)   IM   Reg  ALU  DM   Reg
sub $4,$1,$5        IM   Reg  ALU  DM   Reg
and $6,$1,$7             IM   Reg  ALU  DM   Reg
or  $8,$1,$9                  IM   Reg  ALU  DM   Reg
xor $4,$1,$5                       IM   Reg  ALU  DM   Reg

• Will still need one stall cycle even with forwarding


LOAD-USE DATA HAZARD

• Can’t always avoid stalls by forwarding


• If value not computed when needed
• Can’t forward backward in time!
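
Detecting the load-use case takes a single comparison in the decode stage: the instruction ahead is a load, and its destination matches one of the sources being decoded. A minimal sketch of that condition; the function and parameter names are hypothetical.

```python
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True if the instruction in ID must wait one cycle for a load ahead of it.

    id_ex_mem_read: the instruction now in EX is a load
    id_ex_rt:       the register that load will write
    if_id_rs/rt:    source registers of the instruction now in ID
    """
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)


# lw $1,4($2) followed immediately by sub $4,$1,$5
print(load_use_stall(True, 1, 1, 5))
# add $1,... followed by sub $4,$1,$5: not a load, forwarding is enough
print(load_use_stall(False, 1, 1, 5))
```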
CODE SCHEDULING TO AVOID STALLS

• Reorder code to avoid use of load result in the next instruction


• C code for A = B + E; C = B + F;
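
The effect of reordering can be checked by counting load-use stalls before and after. The MIPS sequences below are one plausible translation of the C code, assuming all variables sit at fixed offsets from `$t0`; the register choices, offsets, and helper names are illustrative assumptions, not taken from the slides.

```python
def parse(line):
    """Return (op, dest, sources) for the small lw/sw/add subset used here."""
    op, rest = line.split(None, 1)
    args = [a.strip() for a in rest.split(",")]
    if op in ("lw", "sw"):
        base = args[1][args[1].index("(") + 1:-1]  # register inside the parentheses
        if op == "lw":
            return op, args[0], [base]
        return op, None, [args[0], base]           # sw reads both registers
    return op, args[0], args[1:]                   # add rd, rs, rt


def load_use_stalls(code):
    """Count instructions that use a load result in the very next slot."""
    stalls = 0
    for prev, cur in zip(code, code[1:]):
        p_op, p_dest, _ = parse(prev)
        _, _, c_srcs = parse(cur)
        if p_op == "lw" and p_dest in c_srcs:
            stalls += 1
    return stalls


def total_cycles(code, stages=5):
    """Cycles on a 5-stage pipeline with forwarding: fill + one per instruction + stalls."""
    return len(code) + (stages - 1) + load_use_stalls(code)


original = [
    "lw $t1, 0($t0)",     # B
    "lw $t2, 4($t0)",     # E
    "add $t3, $t1, $t2",  # A = B + E (uses $t2 right after its load)
    "sw $t3, 12($t0)",
    "lw $t4, 8($t0)",     # F
    "add $t5, $t1, $t4",  # C = B + F (uses $t4 right after its load)
    "sw $t5, 16($t0)",
]
reordered = [
    "lw $t1, 0($t0)",
    "lw $t2, 4($t0)",
    "lw $t4, 8($t0)",     # third load moved up to fill the slot
    "add $t3, $t1, $t2",
    "sw $t3, 12($t0)",
    "add $t5, $t1, $t4",
    "sw $t5, 16($t0)",
]

print(load_use_stalls(original), total_cycles(original))
print(load_use_stalls(reordered), total_cycles(reordered))
```

The counter treats a store of a just-loaded value as a stall too, which is a conservative simplification.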
CONTROL HAZARDS

• Branch determines flow of control


• Fetching next instruction depends on branch outcome
• Pipeline can’t always fetch correct instruction
• Still working on ID stage of branch

• In MIPS pipeline
• Need to compare registers and compute target early in the pipeline
• Add hardware to do it in ID stage
JUMPS INCUR ONE STALL

• Jumps not decoded until ID, so one flush is needed
• Fix jump hazard by waiting: flush

j              IM   Reg  ALU  DM   Reg
flush               IM   Reg  ALU  DM   Reg
j target                 IM   Reg  ALU  DM   Reg

• Fortunately, jumps are very infrequent


BRANCHES CAUSE CONTROL HAZARDS

• Dependencies backward in time cause hazards

beq            IM   Reg  ALU  DM   Reg
lw                  IM   Reg  ALU  DM   Reg
Inst 3                   IM   Reg  ALU  DM   Reg
Inst 4                        IM   Reg  ALU  DM   Reg
r
ONE WAY TO “FIX” A BRANCH CONTROL HAZARD

• Fix branch hazard by waiting (flush), but this affects CPI

beq            IM   Reg  ALU  DM   Reg
flush               IM   Reg  ALU  DM   Reg
flush                    IM   Reg  ALU  DM   Reg
flush                         IM   Reg  ALU  DM   Reg
beq target                         IM   Reg  ALU  DM   Reg
Inst 3                                  IM   Reg  ALU  DM
ANOTHER WAY TO “FIX” A BRANCH CONTROL HAZARD

• Move branch decision hardware back to as early in the pipeline as possible, i.e., during the decode cycle
• Fix branch hazard by waiting: flush

beq            IM   Reg  ALU  DM   Reg
flush               IM   Reg  ALU  DM   Reg
beq target               IM   Reg  ALU  DM   Reg
Inst 3                        IM   Reg  ALU  DM
STALL ON BRANCH

• Wait until branch outcome determined before fetching next instruction


BRANCH PREDICTION

• Longer pipelines can’t readily determine branch outcome early


• Stall penalty becomes unacceptable

• Predict outcome of branch


• Only stall if prediction is wrong

• In MIPS pipeline
• Can predict branches not taken
• Fetch instruction after branch, with no delay
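
The cost of each approach shows up in the CPI. The sketch below compares always flushing with a three-cycle penalty, always flushing with a one-cycle penalty (branch decided in ID), and predicting not taken with a one-cycle penalty only when the branch is taken. The branch frequency and taken rate are made-up illustrative values, and the function name is hypothetical.

```python
def cpi(branch_fraction, penalty_cycles, penalized_fraction=1.0):
    """Base CPI of 1 plus the average branch penalty per instruction."""
    return 1.0 + branch_fraction * penalized_fraction * penalty_cycles


BRANCH_FRACTION = 0.20  # assumed: 20% of instructions are branches
TAKEN_FRACTION = 0.60   # assumed: 60% of branches are taken

stall_late = cpi(BRANCH_FRACTION, 3)                         # three flushes per branch
stall_in_id = cpi(BRANCH_FRACTION, 1)                        # one flush per branch
predict_not_taken = cpi(BRANCH_FRACTION, 1, TAKEN_FRACTION)  # flush only when taken

print(round(stall_late, 2), round(stall_in_id, 2), round(predict_not_taken, 2))
```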
YET ANOTHER WAY TO “FIX” A CONTROL HAZARD

• Predict branches are always not taken, and take corrective action when wrong (i.e., taken)
• Branch decision hardware moved to the decode cycle

4   beq $1,$2,2    IM   Reg  ALU  DM   Reg
8   sub $4,$1,$5        flush
16  and $6,$1,$7             IM   Reg  ALU  DM   Reg
20  or  $8,$1,$9                  IM   Reg  ALU  DM   Reg
r
MORE-REALISTIC BRANCH PREDICTION

• Static branch prediction


• Based on typical branch behavior
• Example: loop and if-statement branches
• Predict backward branches taken
• Predict forward branches not taken

• Dynamic branch prediction


• Hardware measures actual branch behavior
• e.g., record recent history of each branch
• Assume future behavior will continue the trend
• When wrong, stall while re-fetching, and update history
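
Both ideas can be sketched briefly. The static rule compares the branch address with its target. For the dynamic case, the slides only say that recent history is recorded, so the 2-bit saturating counter below is one common scheme chosen for illustration, not necessarily the one intended; all names are hypothetical.

```python
def static_predict(branch_pc, target_pc):
    """Static rule: backward branches (loops) taken, forward branches not taken."""
    return target_pc < branch_pc


class TwoBitPredictor:
    """Per-branch 2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counters = {}  # branch PC -> counter value in 0..3

    def predict(self, pc):
        return self.counters.get(pc, 1) >= 2

    def update(self, pc, taken):
        c = self.counters.get(pc, 1)
        self.counters[pc] = min(3, c + 1) if taken else max(0, c - 1)


def mispredictions(predictor, pc, outcomes):
    """Replay a sequence of actual outcomes and count wrong predictions."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong


# A loop branch taken 9 times and then falling through, executed twice.
loop_outcomes = ([True] * 9 + [False]) * 2
print(mispredictions(TwoBitPredictor(), 0x40, loop_outcomes))
print(static_predict(branch_pc=0x40, target_pc=0x20))
```

With two bits of history, the single not-taken outcome at the loop exit does not flip the prediction for the next run of the loop.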
