Advanced Computer Architecture: BY Dr. Radwa M. Tawfeek

MIPS PROCESSOR
MIPS BASICS
• Memory data types (also aligned)
  • Bytes -- 8 bits
  • Half words -- 16 bits
  • Words -- 32 bits
• Memory is denoted “M” (e.g., M[0x10] is the byte at address 0x10)
• Registers
  • 32 4-byte registers in the “register file”
  • Denoted “R” (e.g., R[2] is register 2)
• Instructions
  • 4 bytes (32 bits)
  • 4-byte aligned (i.e., they start at addresses that are a multiple of 4 -- 0x0000, 0x0004, etc.)
  • Instructions operate on memory and registers
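A minimal sketch of the alignment rule, assuming natural alignment (an n-byte item starts at an address that is a multiple of n), which is the rule stated above for 4-byte instructions. `is_aligned` and `SIZES` are illustrative names, not part of any MIPS toolchain.

```python
def is_aligned(address: int, size: int) -> bool:
    """True if an access of `size` bytes at `address` is naturally aligned."""
    # Natural alignment: the address is a multiple of the access size
    return address % size == 0


# Access sizes from the list above, in bytes
SIZES = {"byte": 1, "half word": 2, "word": 4}

for addr in (0x0000, 0x0004, 0x0006, 0x0011):
    print(hex(addr), {name: is_aligned(addr, size) for name, size in SIZES.items()})
```

For example, 0x0006 is a valid half-word address but not a valid word or instruction address.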
MIPS REGISTER FILE
• Operands of arithmetic instructions must come from a special location contained in the datapath’s register file
• Thirty-two 32-bit registers
  • Two read ports
  • One write port
• Registers are:
  • Fast
    - Smaller is faster & make the common case fast
  • Improve code density
    - Since registers are named with fewer bits than a memory location
• Register addresses are indicated by using $
[Figure: Register File with three 5-bit address inputs (src1 addr, src2 addr, dst addr; 2^5 = 32 locations), a 32-bit write data input, and 32-bit src1 data and src2 data outputs; each register is 32 bits wide.]
THE MIPS REGISTER FILE
• All registers are the same
  • Where a register is needed, any register will work
• By convention, we use them for particular tasks
  • Argument passing
  • Temporaries, etc.
• $zero is the “zero register”
  • It is always zero
  • Writes to it have no effect
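The register-file properties described here (thirty-two 32-bit registers, two read ports, one write port, and a hard-wired $zero) can be modelled in a few lines. This is a hypothetical software model for illustration only; `RegisterFile`, `read` and `write` are invented names, and real hardware performs both reads in parallel rather than through a function call.

```python
class RegisterFile:
    """Toy model: 32 registers of 32 bits, two read ports, one write port.

    Register 0 ($zero) always reads as zero; writes to it are ignored.
    """

    NUM_REGS = 32          # selected by 5-bit register addresses
    MASK = 0xFFFFFFFF      # keep stored values to 32 bits

    def __init__(self) -> None:
        self.regs = [0] * self.NUM_REGS

    def read(self, src1: int, src2: int) -> tuple[int, int]:
        # Two read ports: both source operands are read in one access
        return self.regs[src1], self.regs[src2]

    def write(self, dst: int, value: int) -> None:
        # One write port; writes to $zero have no effect
        if dst != 0:
            self.regs[dst] = value & self.MASK


rf = RegisterFile()
rf.write(0, 123)      # ignored: $zero stays 0
rf.write(8, 42)       # register 8 is $t0 by convention
print(rf.read(0, 8))  # (0, 42)
```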
MIPS ARITHMETIC INSTRUCTION
• MIPS assembly language arithmetic statements:
    add $t0, $s1, $s2
    sub $t0, $s1, $s2
• Each arithmetic instruction performs only one operation
• Each arithmetic instruction specifies exactly three operands:
    destination ← source1 op source2
• Operand order is fixed (the destination is specified first)
• The operands are contained in the datapath’s register file ($t0, $s1, $s2)
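To show how `add $t0, $s1, $s2` becomes a 32-bit instruction word, here is a sketch that packs the fields of the standard MIPS R format (opcode, rs, rt, rd, shamt, funct; 6, 5, 5, 5, 5 and 6 bits). The funct values 0x20 and 0x22 are the ones listed for add and sub in the instruction-set table; the register numbers ($t0 = 8, $s1 = 17, $s2 = 18) follow the usual MIPS naming convention. `encode_r` is an illustrative helper, not an assembler API.

```python
# Register numbers under the usual MIPS naming convention
REG = {"$zero": 0, "$t0": 8, "$s1": 17, "$s2": 18, "$s3": 19}

# funct field (hex) for R-format instructions, whose opcode is 0
FUNCT = {"add": 0x20, "sub": 0x22}


def encode_r(op: str, rd: str, rs: str, rt: str, shamt: int = 0) -> int:
    """Pack `op rd, rs, rt` into a 32-bit R-format word.

    Fields, high bits to low: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6).
    """
    opcode = 0
    return ((opcode << 26) | (REG[rs] << 21) | (REG[rt] << 16)
            | (REG[rd] << 11) | (shamt << 6) | FUNCT[op])


# add $t0, $s1, $s2: destination first, then source1, source2
print(format(encode_r("add", rd="$t0", rs="$s1", rt="$s2"), "#010x"))  # 0x02324020
print(format(encode_r("sub", rd="$t0", rs="$s1", rt="$s2"), "#010x"))  # 0x02324022
```

Note that the assembly order (destination first) differs from the field order in the word, where rd comes after rs and rt.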
MIPS INSTRUCTION FORMATS
MIPS INSTRUCTION SET (1)
OpC values are hexadecimal; "0 & xx" means opcode 0 with funct field xx (R format).

Category                   Instr                   OpC     Example                Meaning
Arithmetic (R & I format)  add                     0 & 20  add $s1, $s2, $s3      $s1 = $s2 + $s3
                           subtract                0 & 22  sub $s1, $s2, $s3      $s1 = $s2 - $s3
                           add immediate           8       addi $s1, $s2, 4       $s1 = $s2 + 4
                           shift left logical      0 & 00  sll $s1, $s2, 4        $s1 = $s2 << 4
                           shift right logical     0 & 02  srl $s1, $s2, 4        $s1 = $s2 >> 4 (fill with zeros)
                           shift right arithmetic  0 & 03  sra $s1, $s2, 4        $s1 = $s2 >> 4 (fill with sign bit)
                           and                     0 & 24  and $s1, $s2, $s3      $s1 = $s2 & $s3
                           or                      0 & 25  or $s1, $s2, $s3       $s1 = $s2 | $s3
                           nor                     0 & 27  nor $s1, $s2, $s3      $s1 = not ($s2 | $s3)
                           and immediate           c       andi $s1, $s2, 0xff00  $s1 = $s2 & 0xff00
                           or immediate            d       ori $s1, $s2, 0xff00   $s1 = $s2 | 0xff00
                           load upper immediate    f       lui $s1, 0xffff        $s1 = 0xffff0000
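The meanings in the table can be checked with a small sketch that operates on 32-bit patterns held in Python integers. It assumes the standard MIPS behaviour that `andi` and `ori` zero-extend their 16-bit immediate; the helper names mirror the mnemonics but are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # registers hold 32-bit patterns


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def sll(x: int, sh: int) -> int:
    return (x << sh) & MASK32             # bits shifted out are lost; zeros enter


def srl(x: int, sh: int) -> int:
    return (x & MASK32) >> sh             # fill with zeros


def sra(x: int, sh: int) -> int:
    return (to_signed(x) >> sh) & MASK32  # Python's >> on a negative int copies the sign


def nor(a: int, b: int) -> int:
    return ~(a | b) & MASK32              # not (a | b), kept to 32 bits


def andi(x: int, imm16: int) -> int:
    return x & (imm16 & 0xFFFF)           # immediate is zero-extended


def ori(x: int, imm16: int) -> int:
    return (x | (imm16 & 0xFFFF)) & MASK32


def lui(imm16: int) -> int:
    return (imm16 & 0xFFFF) << 16         # upper half = immediate, lower half = 0


x = 0xF0000010
for name, value in [("sll", sll(x, 4)), ("srl", srl(x, 4)),
                    ("sra", sra(x, 4)), ("lui", lui(0xFFFF))]:
    print(name, format(value, "#010x"))
# sll 0x00000100
# srl 0x0f000001
# sra 0xff000001
# lui 0xffff0000
```

The srl and sra results differ only in the top four bits: srl brings in zeros, sra copies the sign bit (1 here).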
MIPS INSTRUCTION SET (2)
Category                     Instr                       OpC     Example             Meaning
Data transfer (I format)     load word                   23      lw $s1, 100($s2)    $s1 = Memory($s2+100)
                             store word                  2b      sw $s1, 100($s2)    Memory($s2+100) = $s1
                             load byte                   20      lb $s1, 101($s2)    $s1 = Memory($s2+101)
                             store byte                  28      sb $s1, 101($s2)    Memory($s2+101) = $s1
                             load half                   21      lh $s1, 102($s2)    $s1 = Memory($s2+102)
                             store half                  29      sh $s1, 102($s2)    Memory($s2+102) = $s1
Cond. branch (I & R format)  br on equal                 4       beq $s1, $s2, L     if ($s1 == $s2) go to L
                             br on not equal             5       bne $s1, $s2, L     if ($s1 != $s2) go to L
                             set on less than immediate  a       slti $s1, $s2, 100  if ($s2 < 100) $s1 = 1; else $s1 = 0
                             set on less than            0 & 2a  slt $s1, $s2, $s3   if ($s2 < $s3) $s1 = 1; else $s1 = 0
Uncond. jump                 jump                        2       j 2500              go to 10000
                             jump register               0 & 08  jr $t1              go to $t1
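A sketch of the address arithmetic behind this table, assuming standard MIPS rules: loads and stores add a sign-extended 16-bit offset to the base register; `j` treats its 26-bit field as a word address (shifted left 2, with the upper 4 bits taken from PC + 4); `beq`/`bne` add a sign-extended word offset to PC + 4. This reproduces the table's `j 2500` meaning "go to 10000". Function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of `imm` as a signed value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base: int, offset16: int) -> int:
    """lw/sw/lb/sb/lh/sh: base register + sign-extended 16-bit offset."""
    return (base + sign_extend16(offset16)) & MASK32


def jump_target(pc: int, target26: int) -> int:
    """j: 26-bit word address shifted left 2, upper 4 bits from PC + 4."""
    return ((pc + 4) & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)


def branch_target(pc: int, offset16: int) -> int:
    """beq/bne (taken): PC + 4 + (sign-extended offset << 2)."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & MASK32


print(effective_address(0x1000, 100))  # 4196: lw $s1, 100($s2) with $s2 = 0x1000
print(jump_target(0, 2500))            # 10000: j 2500
print(branch_target(4, 2))             # 16: beq at address 4 with offset 2
```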
MIPS DATAPATH AND CONTROL
PERFORMANCE ISSUES
• Unfortunately, though simple, the single-cycle approach is not used because it is very slow
• Clock cycle must have the same length for every instruction
  • Longest delay determines clock period
What is the longest (slowest) path (slowest instruction)?

INSTRUCTION CRITICAL PATHS
• Calculate cycle time assuming negligible delays (for muxes, control unit, sign extend, PC access, shift left 2, wires, setup and hold times) except:
  • Instruction and Data Memory (4 ns)
  • ALU and adders (2 ns)
  • Register File access (reads or writes) (1 ns)

Instr.   I Mem  Reg Rd  ALU Op  D Mem  Reg Wr  Total (ns)
R-type   4      1       2              1       8
load     4      1       2       4      1       12
store    4      1       2       4              11
beq      4      1       2                      7
jump     4                                     4
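The table can be reproduced by summing the delays of the units on each instruction's path and taking the maximum as the single-cycle clock period. The sketch below uses only the delays given above and treats everything else as zero, as the slide assumes.

```python
# Component delays from the slide (ns); all other delays are taken as zero
I_MEM = D_MEM = 4
ALU = 2
REG = 1

# Units on each instruction class's path, in order
paths = {
    "R-type": [I_MEM, REG, ALU, REG],
    "load":   [I_MEM, REG, ALU, D_MEM, REG],
    "store":  [I_MEM, REG, ALU, D_MEM],
    "beq":    [I_MEM, REG, ALU],
    "jump":   [I_MEM],
}

totals = {name: sum(units) for name, units in paths.items()}
slowest = max(totals, key=totals.get)   # the slowest instruction sets the clock
cycle_time = totals[slowest]

print(totals)               # {'R-type': 8, 'load': 12, 'store': 11, 'beq': 7, 'jump': 4}
print(slowest, cycle_time)  # load 12
```

So every instruction, even a 4 ns jump, takes a 12 ns cycle in the single-cycle design.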
SINGLE CYCLE DISADVANTAGES & ADVANTAGES
• Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
  • Especially problematic for more complex instructions like floating-point multiply
[Figure: single-cycle timing; lw occupies Cycle 1 and sw occupies Cycle 2, and the part of Cycle 2 that sw does not need is wasted.]
• May be wasteful of area, since some functional units (e.g., adders) must be duplicated because they cannot be shared during a clock cycle
but
• It is simple and easy to understand
MIPS PIPELINE
• Five stages, one step per stage

1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register
PIPELINE PERFORMANCE
• Assume time for stages is
  • 100ps for register read or write
  • 200ps for other stages
• Compare pipelined datapath with single-cycle datapath
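A sketch of the comparison under the stage times above, assuming (as in the usual textbook version of this example) that lw uses all five stages, sw skips WB, R-format instructions skip MEM, and beq completes after EX. The single-cycle clock must fit the slowest instruction, while the pipelined clock only has to fit the slowest stage.

```python
# Stage latencies in ps, per the assumptions above
IF, ID, EX, MEM, WB = 200, 100, 200, 200, 100  # ID reads registers, WB writes them

# Stages each instruction class uses (assumed)
use = {
    "lw":       [IF, ID, EX, MEM, WB],
    "sw":       [IF, ID, EX, MEM],
    "R-format": [IF, ID, EX, WB],
    "beq":      [IF, ID, EX],
}

single_cycle = {k: sum(v) for k, v in use.items()}
single_cycle_clock = max(single_cycle.values())  # slowest instruction sets the clock
pipelined_clock = max(IF, ID, EX, MEM, WB)       # slowest stage sets the clock

print(single_cycle)        # {'lw': 800, 'sw': 700, 'R-format': 600, 'beq': 500}
print(single_cycle_clock)  # 800
print(pipelined_clock)     # 200
```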

PIPELINE SPEEDUP
• If all stages are balanced
  • i.e., all take the same time
• If not balanced, speedup is less
• Speedup due to increased throughput
  • Latency (time for each instruction) does not decrease
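For balanced stages, the speedup is usually expressed by the following relation (ideal case, ignoring pipeline-register overhead and hazards):

```latex
\[
\text{Time between instructions}_{\text{pipelined}}
  = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}
\]
```

With the stage times assumed on the pipeline performance slide (200 ps for IF, EX and MEM, 100 ps for register read and write), the single-cycle time is 800 ps, so five balanced stages would ideally give 800/5 = 160 ps between instructions. The slowest stage takes 200 ps, however, so the speedup is about 800/200 = 4 rather than 5: an instance of "if not balanced, speedup is less".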
PIPELINED DATAPATH
SINGLE CYCLE, MULTIPLE CYCLE, VS. PIPELINE
Single Cycle Implementation:
  Cycle 1: lw; Cycle 2: sw (the part of the cycle that sw does not need is wasted)
Multiple Cycle Implementation (Cycles 1 to 10):
  lw:     IFetch Dec Exec Mem WB   (Cycles 1 to 5)
  sw:     IFetch Dec Exec Mem      (Cycles 6 to 9)
  R-type: IFetch                   (Cycle 10)
Pipeline Implementation (pipeline clock same as multi-cycle clock):
  lw      IFetch Dec    Exec   Mem    WB
  sw             IFetch Dec    Exec   Mem    WB
  R-type                IFetch Dec    Exec   Mem    WB
PIPELINE DATAPATH (1)
MIPS PIPELINE CONTROL PATH MODIFICATIONS
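• All control signals can be determined during Decode
  • and held in the state registers between pipeline stages
[Figure: pipelined datapath with control. The IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers separate the stages; the Control unit decodes the instruction held in IF/ID, and its signals travel along ID/EX, EX/MEM and MEM/WB. Components: PC, Instruction Memory, Add (PC + 4), Add with Shift left 2 (branch target), Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data, Read Data 1, Read Data 2), Sign Extend (16 to 32 bits), ALU, Data Memory (Address, Write Data, Read Data).]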
WHY PIPELINE? FOR PERFORMANCE!
[Figure: Inst 0 through Inst 4 (instruction order) against time in clock cycles; each instruction passes through IM, Reg, ALU, DM, Reg, starting one cycle after the previous one.]
• Once the pipeline is full, one instruction is completed every cycle, so CPI = 1
• Time to fill the pipeline: the first cycles, before the first instruction completes
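The fill time and the CPI = 1 claim can be checked with a small calculation, assuming an ideal five-stage pipeline with no stalls: the first instruction needs five cycles, and each later one completes one cycle after its predecessor. The helper names are illustrative.

```python
def pipeline_cycles(n_instructions: int, n_stages: int = 5) -> int:
    """Cycles to finish n instructions on an ideal pipeline (no stalls):
    n_stages - 1 cycles to fill, then one completion per cycle."""
    return (n_stages - 1) + n_instructions


def cpi(n_instructions: int, n_stages: int = 5) -> float:
    return pipeline_cycles(n_instructions, n_stages) / n_instructions


print(pipeline_cycles(5))        # 9 cycles for Inst 0 to Inst 4
print(round(cpi(1_000_000), 6))  # 1.000004
```

For long instruction streams the fill time is amortized, which is why the CPI approaches 1.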

HAZARDS
• Situations that prevent starting the next instruction in the next cycle
• Structure hazards
  • A required resource is busy
• Data hazard
  • Need to wait for previous instruction to complete its data read/write
• Control hazard
  • Deciding on control action depends on previous instruction
• Can always resolve hazards by waiting
  • Pipeline control must detect the hazard
  • and take action to resolve hazards
STRUCTURE HAZARDS
• Conflict for use of a resource
• In MIPS pipeline with a single memory
  • Load/store requires data access
  • Instruction fetch would have to stall for that cycle
    • Would cause a pipeline “bubble”
• Hence, pipelined datapaths require separate instruction/data memories
  • Or separate instruction/data caches
A SINGLE MEMORY WOULD BE A STRUCTURAL HAZARD
[Figure: lw, Inst 1, Inst 2, Inst 3 and Inst 4 over time (clock cycles), each using Mem, Reg, ALU, Mem, Reg. In the cycle where lw is reading data from memory, Inst 3 is reading its instruction from the same memory.]
• Can fix with separate instruction and data memories
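A sketch of how the conflict in the figure arises, assuming one instruction starts per cycle (cycles numbered from 0) and a single memory shared by fetches and data accesses. The instruction started in cycle i reaches MEM in cycle i + 3, exactly when the instruction started in cycle i + 3 is being fetched. `memory_conflicts` is a hypothetical helper.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def memory_conflicts(program: list[str], data_ops: set[str]) -> list[tuple[int, int, int]]:
    """Return (cycle, accessing_index, fetching_index) for every cycle in which a
    single shared memory is needed for a data access and a fetch at once.

    Instruction i is fetched in cycle i (no stalls) and is in MEM in cycle i + 3.
    """
    mem_offset = STAGES.index("MEM")  # 3
    conflicts = []
    for i, op in enumerate(program):
        if op in data_ops:
            cycle = i + mem_offset
            fetching = cycle          # index of the instruction fetched in that cycle
            if fetching < len(program):
                conflicts.append((cycle, i, fetching))
    return conflicts


program = ["lw", "Inst 1", "Inst 2", "Inst 3", "Inst 4"]
# lw (index 0) reads data in cycle 3, while Inst 3 is being fetched
print(memory_conflicts(program, {"lw", "sw"}))  # [(3, 0, 3)]
```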

HOW ABOUT REGISTER FILE ACCESS?
[Figure: the add $1 instruction (remaining operands not shown), followed by Inst 1, another instruction, and add $2,$1, over time (clock cycles), each using IM, Reg, ALU, DM, Reg. The register write of add $1 and the register read of add $2,$1 fall in the same cycle; one clock edge controls register writing and another controls loading of the pipeline state registers.]
• Fix the register file access hazard by doing reads in the second half of the cycle and writes in the first half
DATA HAZARD
• An instruction depends on completion of data access by a previous instruction
  • Read after write (RAW)
  • Write after read (WAR)
  • Write after write (WAW)
  • WAR and WAW are not common in a simple pipeline
REGISTER USAGE CAN CAUSE DATA HAZARDS
• Dependencies backward in time cause hazards
[Figure: the add $1 instruction followed by sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5, each using IM, Reg, ALU, DM, Reg. The later instructions read $1, which add produces.]
• Read after write data hazard (RAW)
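A sketch of RAW hazard detection for the sequence in the figure, assuming no forwarding and the register file that writes in the first half of a cycle and reads in the second half. Under that assumption a producer three or more instructions earlier has already been written back, so only consumers one or two instructions later are hazards. The `Instr` record is a hypothetical representation; the source operands of add are not shown on the slide, so they are left empty.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dst: Optional[int]      # register written, or None
    srcs: tuple[int, ...]   # registers read


def raw_hazards(program: list[Instr], window: int = 2) -> list[int]:
    """Indices of instructions that read a register written by one of the
    previous `window` instructions ($zero is never a hazard)."""
    hazards = []
    for i, ins in enumerate(program):
        for d in range(1, window + 1):
            if i - d < 0:
                break
            prev = program[i - d]
            if prev.dst not in (None, 0) and prev.dst in ins.srcs:
                hazards.append(i)
                break
    return hazards


program = [
    Instr("add $1", 1, ()),          # sources not shown on the slide
    Instr("sub $4,$1,$5", 4, (1, 5)),
    Instr("and $6,$1,$7", 6, (1, 7)),
    Instr("or $8,$1,$9", 8, (1, 9)),
    Instr("xor $4,$1,$5", 4, (1, 5)),
]
print(raw_hazards(program))  # [1, 2]: sub and and; or and xor read $1 after write-back
```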
ONE WAY TO “FIX” A DATA HAZARD
• Can fix data hazard by waiting: stall
[Figure: the add $1 instruction followed by two stall cycles, then sub $4,$1,$5 and and $6,$1,$7, each using IM, Reg, ALU, DM, Reg.]
DATA HAZARDS
• An instruction depends on completion of data access by a previous instruction
ANOTHER WAY TO “FIX” A DATA HAZARD
• Fix data hazards by forwarding results as soon as they are available to where they are needed
[Figure: the add $1 instruction followed by sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5 with no stalls, each using IM, Reg, ALU, DM, Reg; the value of $1 is forwarded to the later instructions.]
FORWARDING (BYPASSING)
• Use result when it is computed
  • Don’t wait for it to be stored in a register
  • Requires extra connections in the datapath
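A sketch of the forwarding decision for one ALU source operand, written in the style of the usual textbook forwarding-unit conditions. It assumes the EX/MEM and MEM/WB pipeline registers carry a RegWrite control bit and the destination register number (the `PipeReg` fields are hypothetical names). The most recent result wins, and $zero is never forwarded because it always reads as zero.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """Fields of a pipeline register that the forwarding check needs (hypothetical)."""
    reg_write: bool  # will this instruction write a register?
    rd: int          # its destination register number


def forward_select(src: int, ex_mem: PipeReg, mem_wb: PipeReg) -> str:
    """Choose where the ALU input for source register `src` comes from."""
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return "EX/MEM"          # result computed one cycle ago (newest)
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return "MEM/WB"          # result computed two cycles ago
    return "register file"       # no forwarding needed


# sub $4,$1,$5 in EX while add $1 is in MEM: take $1 from EX/MEM
print(forward_select(1, PipeReg(True, 1), PipeReg(False, 0)))  # EX/MEM
# and $6,$1,$7 in EX while sub is in MEM and add $1 is in WB: take $1 from MEM/WB
print(forward_select(1, PipeReg(True, 4), PipeReg(True, 1)))   # MEM/WB
```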
FORWARDING WITH LOAD-USE DATA HAZARDS
[Figure: lw $1,4($2) followed by sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5, each using IM, Reg, ALU, DM, Reg.]
• Will still need one stall cycle even with forwarding: the loaded value is available only at the end of the DM stage, but sub needs it at the start of its ALU stage in that same cycle

LOAD-USE DATA HAZARD
• Can’t always avoid stalls by forwarding
  • If value not computed when needed
  • Can’t forward backward in time!
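A sketch of the load-use check that triggers the one-cycle stall, assuming the usual textbook condition: the instruction in EX is a load (it reads data memory) and its destination register, the rt field of a load, matches a source register of the instruction in ID. Parameter names are hypothetical.

```python
def load_use_stall(id_ex_mem_read: bool, id_ex_rt: int,
                   if_id_rs: int, if_id_rt: int) -> bool:
    """Stall the instruction in ID for one cycle if the load now in EX
    writes a register that the instruction in ID reads."""
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)


# lw $1,4($2) in EX (rt = 1), sub $4,$1,$5 in ID (rs = 1, rt = 5): stall
print(load_use_stall(True, 1, 1, 5))   # True
# an instruction in EX that is not a load never causes this stall
print(load_use_stall(False, 1, 1, 5))  # False
```

After the one-cycle stall, the loaded value can be forwarded from MEM/WB as usual.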
CODE SCHEDULING TO AVOID STALLS
• Reorder code to avoid use of load result in the next instruction

• C code for A = B + E; C = B + F;
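The slide's code listing is not reproduced here, so this sketch assumes a common MIPS translation of A = B + E; C = B + F, with B, E and F loaded from offsets 0, 4 and 8 of $t0 and A and C stored at offsets 12 and 16 (a hypothetical memory layout). It counts one stall for each instruction that uses a load result immediately, assuming full forwarding and a five-stage pipeline, and shows that moving the third load up removes both stalls.

```python
def count_load_use_stalls(code):
    """Each entry is (op, dst, srcs). With forwarding, only a load followed
    immediately by a reader of its destination costs a (one-cycle) stall."""
    stalls = 0
    for prev, cur in zip(code, code[1:]):
        op, dst, _ = prev
        if op == "lw" and dst in cur[2]:
            stalls += 1
    return stalls


def total_cycles(code, n_stages=5):
    return (n_stages - 1) + len(code) + count_load_use_stalls(code)


original = [
    ("lw",  "$t1", ("$t0",)),         # lw  $t1, 0($t0)    B
    ("lw",  "$t2", ("$t0",)),         # lw  $t2, 4($t0)    E
    ("add", "$t3", ("$t1", "$t2")),   # uses $t2 right after its load: stall
    ("sw",  None,  ("$t3", "$t0")),   # sw  $t3, 12($t0)   A
    ("lw",  "$t4", ("$t0",)),         # lw  $t4, 8($t0)    F
    ("add", "$t5", ("$t1", "$t4")),   # uses $t4 right after its load: stall
    ("sw",  None,  ("$t5", "$t0")),   # sw  $t5, 16($t0)   C
]

# Hoist the third load above the first add
reordered = [original[0], original[1], original[4],
             original[2], original[3], original[5], original[6]]

print(count_load_use_stalls(original), total_cycles(original))    # 2 13
print(count_load_use_stalls(reordered), total_cycles(reordered))  # 0 11
```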
CONTROL HAZARDS
• Branch determines flow of control
  • Fetching next instruction depends on branch outcome
  • Pipeline can’t always fetch correct instruction
    • Still working on ID stage of branch
• In MIPS pipeline
  • Need to compare registers and compute target early in the pipeline
  • Add hardware to do it in ID stage
JUMPS INCUR ONE STALL
• Jumps are not decoded until ID, so one flush is needed
[Figure: j followed by one flushed instruction and then the j target instruction, each using IM, Reg, ALU, DM, Reg.]
• Fix jump hazard by waiting: flush
• Fortunately, jumps are very infrequent

BRANCHES CAUSE CONTROL HAZARDS
• Dependencies backward in time cause hazards
[Figure: beq followed by lw, another instruction, and Inst 4, each using IM, Reg, ALU, DM, Reg; the instructions fetched after beq depend on its outcome.]
ONE WAY TO “FIX” A BRANCH CONTROL HAZARD
• Fix branch hazard by waiting: flush, but this affects CPI
[Figure: beq followed by three flushed instructions, then the beq target and Inst 3, each using IM, Reg, ALU, DM, Reg.]
ANOTHER WAY TO “FIX” A BRANCH CONTROL HAZARD
• Move branch decision hardware back to as early in the pipeline as possible, i.e., during the decode cycle
• Fix branch hazard by waiting: flush
[Figure: beq followed by one flushed instruction, then the beq target and Inst 3, each using IM, Reg, ALU, DM, Reg.]
STALL ON BRANCH
• Wait until branch outcome determined before fetching next instruction

BRANCH PREDICTION
• Longer pipelines can’t readily determine branch outcome early
  • Stall penalty becomes unacceptable
• Predict outcome of branch
  • Only stall if prediction is wrong
• In MIPS pipeline
  • Can predict branches not taken
  • Fetch instruction after branch, with no delay
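A back-of-the-envelope sketch comparing stalling on every branch with predicting not taken, assuming an ideal base CPI of 1 and a 1-cycle penalty (the branch outcome is known in ID). The branch frequency and taken fraction below are hypothetical workload numbers, not measurements.

```python
def cpi_stall_on_branch(branch_freq: float, penalty: int = 1) -> float:
    """Every branch waits `penalty` cycles for its outcome."""
    return 1 + branch_freq * penalty


def cpi_predict_not_taken(branch_freq: float, taken_frac: float, penalty: int = 1) -> float:
    """Only taken branches (the mispredictions) pay the penalty."""
    return 1 + branch_freq * taken_frac * penalty


# Hypothetical workload: 20% branches, 60% of them taken
print(round(cpi_stall_on_branch(0.20), 3))          # 1.2
print(round(cpi_predict_not_taken(0.20, 0.60), 3))  # 1.12
```

The larger the penalty (as in longer pipelines), the more a correct prediction is worth.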
YET ANOTHER WAY TO “FIX” A CONTROL HAZARD
• “Predict” branches are always not taken, and take corrective action when wrong (i.e., taken)
[Figure: with the branch decision hardware moved to the decode cycle, the instructions at addresses 4 (beq $1,$2,2), 8 (sub $4,$1,$5), 16 (and $6,$1,$7) and 20 (or $8,$1,$9) flow through IM, Reg, ALU, DM, Reg. The branch is taken, so sub is flushed and execution continues at the branch target, address 16 (4 + 4 + 2 × 4).]
MORE-REALISTIC BRANCH PREDICTION
• Static branch prediction
  • Based on typical branch behavior
  • Example: loop and if-statement branches
    • Predict backward branches taken
    • Predict forward branches not taken
• Dynamic branch prediction
  • Hardware measures actual branch behavior
    • e.g., record recent history of each branch
  • Assume future behavior will continue the trend
    • When wrong, stall while re-fetching, and update history
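Both kinds of prediction can be sketched briefly. The static rule follows the slide (backward taken, forward not taken). For the dynamic case the slide only says that hardware records recent branch history; this sketch swaps in one common scheme, a 2-bit saturating counter per branch, which changes its prediction only after two consecutive mispredictions. The branch address and outcome pattern are hypothetical.

```python
def static_predict(branch_pc: int, target: int) -> bool:
    """Backward branches (loops) predicted taken; forward branches not taken."""
    return target <= branch_pc


class TwoBitPredictor:
    """2-bit saturating counter per branch: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, initial: int = 2) -> None:
        self.initial = initial
        self.counters: dict[int, int] = {}   # branch address -> counter

    def predict(self, pc: int) -> bool:
        return self.counters.get(pc, self.initial) >= 2

    def update(self, pc: int, taken: bool) -> None:
        c = self.counters.get(pc, self.initial)
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)


# Hypothetical loop branch: taken 3 times then not taken, over 3 passes
outcomes = ([True] * 3 + [False]) * 3
bp = TwoBitPredictor()
misses = 0
for taken in outcomes:
    if bp.predict(0x40) != taken:
        misses += 1
    bp.update(0x40, taken)

print(static_predict(0x40, 0x20), static_predict(0x40, 0x60))  # True False
print(misses)  # 3
```

With this pattern the counter mispredicts only the final, not-taken iteration of each pass (3 misses in 12 branches); a single history bit would also mispredict the first iteration of every later pass.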
