Professional Documents
Culture Documents
Advanced
Microarchitecture
Advanced Microarchitecture
• Deep Pipelining
• Microoperations
• Branch Prediction
• Superscalar Processors
• Out of Order Processors
• Register Renaming
• SIMD
• Multithreading
• Multiprocessors
– Cost 250
Time (ps)
200
Tc
150
Instruction
Time
5 6 7 8 9 10 11 12
N: # of pipeline stages
Without μops, would need 2nd write port on the register file
Done:
Done:
Done:
Only mispredicts last branch of loop
144 Digital Design & Computer Architecture Microarchitecture
Chapter 7: Microarchitecture
CLK
PC RD A1
A A2
A3 RD1
RD4
ALUs
A4 A1 RD1
Instruction A5 Register A2 RD2
A6 File RD2
Memory RD5 Data
WD3 Memory
WD6
WD1
WD2
Time (cycles)
s0
lw s7
lw s7, 40(s0) 40 +
RF t1 DM RF
IM
add s8
add s8, t1, t2 t2 +
s1
sub s9
sub s9, s1, s3 s3 -
RF s3 DM RF
IM
and s10
and s10, s3, t4 t4 &
s1
or s11
or s11, s1, t5 t5 |
RF s2 DM RF
IM
sw s5
sw s5, 80(s2) 80
Time (cycles)
s0
lw s8
lw s8, 40(s0) 40 +
RF t5 DM RF
IM
or s11
or s11, t5, t6 t6 |
RAW
s11
sw s7
sw s7, 80(s11) 80 +
RF DM RF
IM
two cycle latency
RAW
between load and
use of s8 s8
add s9
add s9, s8, t1 t1 +
RF t2 DM RF
WAR IM
sub s8
sub s8, t2, t3 t3 -
RAW
s4
and s10
and s10, s4, s8 s8 &
RF DM RF
IM
Time (cycles)
s0
lw s8
lw s8, 40(s0) 40 +
RF t5 DM RF
IM
or s11
or s11, t5, t6 t6 |
RAW
s11
sw s7
sw s7, 80(s11) 80 +
RF DM RF
IM
two cycle latency
RAW
between load and
use of R8 s8
add s9
add s9, s8, t1 t1 +
RF t2 DM RF
WAR IM
sub s8
sub s8, t2, t3 t3 -
RAW
s4
and s10
and s10, s4, s8 s8 &
RF DM RF
IM
Time (cycles)
s0
lw s8
lw s8, 40(s0) 40 +
RF t2 DM RF
IM
sub r0
sub r0, t2, t3 t3 -
RAW s8
s9
add
add s9, s8, t1 t1 +
RF s11 DM RF
IM
sw s7
sw s7, 80(s11) 80 +
a7 a6 a5 a4 a3 a2 a1 a0 D0
+ b7 b6 b5 b4 b3 b2 b1 b0 D1
a7 + b7 a6 + b6 a5 + b5 a4 + b4 a3 + b3 a2 + b2 a1 + b1 a0 + b0 D2
Multithreading &
Multiprocessors
Advanced Architecture Techniques
• Multithreading
– Wordprocessor: thread for typing, spell checking,
printing
• Multiprocessors
– Multiple processors (cores) on a single chip