Professional Documents
Culture Documents
EEE415 Week06 Micro Architecture
EEE415 Week06 Micro Architecture
Week 6
1
DRAFT 7/13/2023
2
DRAFT 7/13/2023
3
DRAFT 7/13/2023
10
4
DRAFT 7/13/2023
11
12
5
DRAFT 7/13/2023
13
14
6
DRAFT 7/13/2023
15
16
7
DRAFT 7/13/2023
17
18
8
DRAFT 7/13/2023
19
20
9
DRAFT 7/13/2023
21
22
10
DRAFT 7/13/2023
23
24
11
DRAFT 7/13/2023
Multicycle Datapath: B
Calculate branch target address:
BTA = (ExtImm) + (PC+8)
ExtImm = Imm24 << 2 and sign-extended
25
26
12
DRAFT 7/13/2023
Multicycle Control
27
28
13
DRAFT 7/13/2023
Decoder
29
30
14
DRAFT 7/13/2023
31
32
15
DRAFT 7/13/2023
Decoder
33
34
16
DRAFT 7/13/2023
35
36
17
DRAFT 7/13/2023
37
38
18
DRAFT 7/13/2023
39
40
19
DRAFT 7/13/2023
41
42
20
DRAFT 7/13/2023
Multicycle Control
43
44
21
DRAFT 7/13/2023
45
• ExecuteI/ExecuteR state:
CondEx
CondEx asserts
ALUFlags generated
Cond3:0
CLK • ALUWB state:
[3:2] Flags3:2 Flags updated
Conditi on
CondEx changes
Check
FlagWrite1:0
[1]
46
22
DRAFT 7/13/2023
47
48
23
DRAFT 7/13/2023
Week 6
49
Practice!!
50
24
DRAFT 7/13/2023
51
52
25
DRAFT 7/13/2023
53
54
26
DRAFT 7/13/2023
55
56
27
DRAFT 7/13/2023
Decoder Conditional
Logic
57
58
28
DRAFT 7/13/2023
59
60
29
DRAFT 7/13/2023
61
62
30
DRAFT 7/13/2023
63
64
31
DRAFT 7/13/2023
65
Execution Time = ?
66
32
DRAFT 7/13/2023
67
68
33
DRAFT 7/13/2023
Flags
ALUFlags
RegSrc
0 1 CLK CLK
CLK
19:16
Instr
ALU
0 3:0 A RD
Instruction 0 RA2
A2 RD2 0 SrcB Data
Memory 1
15:12 1 Memory
A3 Register WriteData
WD
4 WD3 File
PCPlus8 1
R15
+
PCPlus4 0
+
4
23:0
Extend ExtImm
Result
69
Single-Cycle Control
70
34
DRAFT 7/13/2023
tdec RegWrite
Flags
tpcq_PC ALUFlags
tmem
tmem tmux CLK tRFread tALU
RegSrc
0 1 CLK
CLK
19:16
Instr
ALU
0 3:0 A RD
Instruction 0 RA2
A2 RD2 0 SrcB Data
Memory 1
15:12
tRFsetup A3 Register
WD3 File
1
WriteData
Memory
WD tmux
4 1
PCPlus8
R15
+
PCPlus4 0
+
4
23:0
Extend ExtImm
Result
71
72
35
DRAFT 7/13/2023
73
74
36
DRAFT 7/13/2023
75
76
37
DRAFT 7/13/2023
77
78
38
DRAFT 7/13/2023
79
80
39
DRAFT 7/13/2023
81
82
40
DRAFT 7/13/2023
Week 6
83
What Is Pipelining?
Engine
Chassis Paint
Time Time
A A
B B
C C
84
41
DRAFT 7/13/2023
What Is Pipelining?
• We arrange for the different phases of
execution to be overlapped. We aim to
exploit “temporal” parallelism.
• How are latency and throughput affected?
85
86
42
DRAFT 7/13/2023
Pipelining
C T
Clock period = T + C
Combinational
logic Now, if we create two pipeline
stages:
clk
C C C Clock period = T/2 + C
T/2 T/2
If C is small, our clock
frequency has almost doubled.
Hence, our throughput will
clk double, too.
Pipelining
register
87
88
43
DRAFT 7/13/2023
89
Instr
Pipelined
Fetch Dec Execute Memory Wr
1 Read
Instruction Reg ALU Read / Write Reg
Fetch Dec Execute Memory Wr
2 Read
Instruction Reg
ALU Read / Write Reg
Fetch Dec Execute Memory Wr
3 Read
Instruction Reg
ALU Read / Write Reg
(b)
90
44
DRAFT 7/13/2023
Time (cycles)
R0
LDR DM R2
LDR R2, [R0, #40] IM RF 40 + RF
R9
ADD DM R3
ADD R3, R9, R10 IM RF R10 + RF
R1
SUB DM R4
SUB R4, R1, R5 IM RF R5 - RF
R12
AND DM R5
AND R5, R12, R13 IM RF R13 & RF
R1
STR DM R6
STR R6, [R1, #20] IM RF 20 + RF
R11
ORR DM R7
ORR R7, R11, #42 IM RF 42 | RF
91
0 3:0 A RD
Instruction 0 RA2
A2 RD2 0 SrcB Data
Memory 1
15:12 1 Memory
A3 Register WriteData
WD
4 WD3 File
PCPlus8 1
R15
+
PCPlus4 0
+
4
23:0
Extend ExtImm
Result
(a)
Pipelined
CLK CLK CLK CLK
CLK
CLK
CLK
InstrF
InstrD
19:16
0 RA1D WE3 SrcAE WE
1 PC' PCF A1 RD1
A RD 15 1 ALUResultE ReadDataW
0
ALU
3:0
0 RA2D A RD
Instruction A2 RD2 0 SrcBE
Memory 1 Data
15:12 WA3D 1 Memory
A3 Register WriteDataE
WD
WD3 File
4 1
PCPlus8
R15 ALUOutM ALUOutW
+
PCPlus4F 0
+
4
23:0
Extend ExtImmE
ResultW
92
45
DRAFT 7/13/2023
InstrF
InstrD
19:16
0 RA1D WE3 SrcAE WE
1 PC' PCF A1 RD1
A RD 15 1 ALUResultE ReadDat aW
ALU
0 3:0 A RD
Instruction 0 RA2D
A2 RD2 0 SrcBE Data
Memory 1
1 Memory
A3 Register WriteDataE
PCPlus8 WD
WD3 File
4 1
R15 ALUOutM ALUOutW
+
PCPlus4F 0
+
ResultW
93
InstrD
19:16
0 RA1D WE3 SrcAE WE
1 PC' PCF A1 RD1
A RD 15 1 ALUResultE ReadDat aW
ALU
0 3:0 A RD
Instruction 0 RA2D
A2 RD2 0 SrcBE Data
Memory 1
1 Memory
A3 Register WriteDataE
PCPlus8 WD
WD3 File
4 1
R15 ALUOutM ALUOutW
+
PCPlus4F 0
+
ResultW
InstrD
19:16
0 RA1D WE3 SrcAE WE
1 PC' PCF A1 RD1
A RD 15 1 ALUResultE ReadDat aW
ALU
0 3:0 A RD
Instruction 0 RA2D
A2 RD2 0 SrcBE Data
Memory 1
1 Memory
A3 Register WriteDataE
WD
WD3 File
1
R15 ALUOutM ALUOutW
PCPlus4F 0
+
94
46
DRAFT 7/13/2023
Flags'
PCSrcD PCSrcE PCSrcM PCSrcW
Control
RegWriteD RegWriteE RegWriteM RegWriteW
Unit
MemtoRegD MemtoRegE
MemtoRegM MemtoRegW
27:26 MemWriteD MemWriteE MemWriteM
Op
CondExE
25:20 ALUControlD ALUControlE
Funct
15:12
Rd BranchD BranchE
ALUSrcD ALUSrcE
FlagWriteD FlagWriteE
ImmSrcD
31:28 CondE Cond
Unit
RegSrcD
CLK CLK FlagsE
0 1 CLK
CLK
InstrF
InstrD
19:16
0 WE3 ALUFlags WE
RA1D SrcAE
1 PC' PCF A1 RD1
A RD 15 1 ALUResultE ReadDat aW
ALU
0 3:0 A RD
Instruction 0 RA2D
A2 RD2 0 SrcBE Data
Memory 1
1 Memory
A3 Register WriteDataE
WD
WD3 File
ExtImmE
1
R15 ALUOutM ALUOutW
PCPlus4F 0
+
23:0 Extend
PCPlus8D
ResultW
95
Pipeline Hazards
• When an instruction depends on result from instruction that
hasn’t completed
• Types:
• Datahazard: register value not yet written back to register file
• Control hazard: next instruction not decided yet (caused by branch)
96
47
DRAFT 7/13/2023
Data Hazard
1 2 3 4 5 6 7 8
Time (cycles)
R4
ADD DM R1
ADD R1, R4, R5 IM RF R5 + RF
R1
AND DM R8
AND R8, R1, R3 IM RF R3 & RF
R6
ORR DM R9
ORR R9, R6, R1 IM RF R1 | RF
R1
SUB DM R10
SUB R10, R1, R7 IM RF R7 - RF
97
98
48
DRAFT 7/13/2023
Time (cycles)
R4
ADD DM R1
ADD R1, R4, R5 IM RF R5 + RF
NOP DM
NOP IM RF RF
NOP DM
NOP IM RF RF
R1
AND DM R8
AND R8, R1, R3 IM RF R3 & RF
R6
ORR DM R9
ORR R9, R6, R1 IM RF R1 | RF
R1
SUB DM R10
SUB R10, R1, R7 IM RF R7 - RF
99
Data Forwarding
1 2 3 4 5 6 7 8
Time (cycles)
R4
ADD DM R1
ADD R1, R4, R5 IM RF R5 + RF
R1
AND DM R8
AND R8, R1, R3 IM RF R3 & RF
R6
ORR DM R9
ORR R9, R6, R1 IM RF R1 | RF
R1
SUB DM R10
SUB R10, R1, R7 IM RF R7 - RF
100
49
DRAFT 7/13/2023
Data Forwarding
1 2 3 4 5 6 7 8
Time (cycles)
R4
ADD DM R1
ADD R1, R4, R5 IM RF R5 + RF
R1
AND DM R8
AND R8, R1, R3 IM RF R3 & RF
R6
ORR DM R9
ORR R9, R6, R1 IM RF R1 | RF
R1
SUB DM R10
SUB R10, R1, R7 IM RF R7 - RF
101
Data Forwarding
CLK CLK CLK
Flags'
InstrD
19:16
0 WE3 ALUFlags WE
RA1D SrcAE
1 PC' PCF A1 RD1 00
A RD 15 1 01 ALUResultE
10 ReadDat aW
ALU
0 3:0 A RD
Instruction 0 RA2D
A2 RD2 00 0 SrcBE Data
Memory 1 01
10 1 Memory
A3 Register WriteDataE
WD
File
ExtImmE
WD3
1
R15 ALUOutM ALUOutW
PCPlus4F 0
+
PCPlus8D
Result W
RegWriteW
ForwardAE
ForwardBE
RegWriteM
Match
Hazard Unit
102
50
DRAFT 7/13/2023
Data Forwarding
• Execute stage register matches Memory stage register?
Match_1E_M = (RA1E == WA3M)
Match_2E_M = (RA2E == WA3M)
• Execute stage register matches Writeback stage register?
Match_1E_W = (RA1E == WA3W)
Match_2E_W = (RA2E == WA3W)
• If it matches, forward result:
103
Data Forwarding
• Execute stage register matches Memory stage register?
Match_1E_M = (RA1E == WA3M)
Match_2E_M = (RA2E == WA3M)
• Execute stage register matches Writeback stage register?
Match_1E_W = (RA1E == WA3W)
Match_2E_W = (RA2E == WA3W)
• If it matches, forward result:
104
51
DRAFT 7/13/2023
Stalling
1 2 3 4 5 6 7 8
Time (cycles)
R4
LDR DM R1
LDR R1, [R4, #40] IM RF 40 + RF
Trouble!
R1
AND DM R8
AND R8, R1, R3 IM RF R3 & RF
R6
ORR DM R9
ORR R9, R6, R1 IM RF R1 | RF
R1
SUB DM R10
SUB R10, R1, R7 IM RF R7 - RF
105
Stalling
1 2 3 4 5 6 7 8 9
Time (cycles)
R4
LDR DM R1
LDR R1, [R4, #40] IM RF 40 + RF
R1 R1
AND DM R8
AND R8, R1, R3 IM RF R3 RF R3 & RF
R6
ORR ORR DM R9
ORR R9, R6, R1 IM IM RF R1 | RF
Stall R1
SUB DM R10
SUB R10, R1, R7 IM RF R7 - RF
106
52
DRAFT 7/13/2023
Stalling Hardware
CLK CLK CLK
Flags'
PCSrcD PCSrcE PCSrcM PCSrcW
Control
RegWriteD RegWriteE RegWriteM RegWriteW
Unit
MemtoRegD MemtoRegE
MemtoRegM MemtoRegW
27:26 MemWriteD MemWriteE MemWriteM
Op
CondExE
25:20 ALUControlD ALUControlE
Funct
15:12
Rd BranchD BranchE
ALUSrcD ALUSrcE
FlagWriteD FlagWriteE
ImmSrcD
31:28 CondE Cond
Unit
RegSrcD
CLK CLK FlagsE
0 1 CLK
CLK
InstrF
InstrD
19:16
0 WE3 ALUFlags WE
RA1D SrcAE
1 PC' PCF A1 RD1 00
A RD 15 1 01 ALUResultE
10 ReadDat aW
ALU
0
EN
3:0
0 A RD
Instructi on RA2D
A2 RD2 00 0 SrcBE Data
Memory 1 01
10 1 Memory
A3 Register WriteDataE
WD
File
ExtImmE
WD3
1
R15 ALUOutM ALUOutW
PCPlus4F 0
+
CLR
Exten d
EN
23:0
PCPlus8D
ResultW
MemtoRegE
RegWriteW
ForwardAE
ForwardBE
RegWriteM
FlushD
FlushE
Match
St allD
StallF
Hazard Unit
107
Stalling Logic
• Is either of the source register in the Decode stage the same as
the one being written in the Execute stage?
Match_12D_E = (RA1D == WA3E) + (RA2D == WA3E)
108
53
DRAFT 7/13/2023
Control Hazards
• B:
• branch not determined until the Writeback stage of
pipeline
• Instructions after branch fetched before branch occurs
• These 4 instructions must be flushed if branch happens
109
Control Hazards
1 2 3 4 5 6 7 8 9 10
Time (cycles)
B DM
20 B 3C IM RF RF
R1
AND DM
24 AND R8, R1, R3 IM RF R3 & RF
R6 Flush
ORR DM
28 ORR R9, R6, R1 IM RF R1 | RF these
instructions
R1
SUB DM
2C SUB R10, R1, R7 IM RF R7 - RF
R1
SUB DM
30 SUB R11, R1, R8 IM RF R8 - RF
34 ...
...
R3
ADD DM R12
64 ADD R12, R3, R4 IM RF R4 RF
+
110
54
DRAFT 7/13/2023
111
InstrD
1
19:16
0 WE3 ALUFlags WE
RA1D SrcAE
0 PC' PCF A1 RD1 00
0 A RD 15 1 01 ALUResultE
10 Rea dDat aW
ALU
1
EN
3:0
0 A RD
Instructi on RA2D
A2 RD2 00 0 SrcBE Data
Memory 1 01
10 1 Memory
A3 Register WriteDataE
WD
WD3 File
Ext ImmE
1
R15 ALUOutM ALUOutW
PCPlus4F 0
+
CLR
EN
23:0 Extend
PCPlus8D
Result W
MemtoRegE
Reg WriteW
ForwardAE
ForwardBE
RegWriteM
FlushD
FlushE
Match
StallD
StallF
Hazard Unit
112
55
DRAFT 7/13/2023
Time (cycles)
B DM
20 B 3C IM RF RF
R1
AND DM
24 AND R8, R1, R3 IM RF R3 & RF
Flush
R6 these
ORR DM instructions
28 ORR R9, R6, R1 IM RF R1 | RF
34 ...
...
R3
ADD DM R12
64 ADD R12, R3, R4 IM RF R4 RF
+
Dr. Sajid Muhaimin Choudhury 113
EEE 415 - Department of EEE, BUET Digital Design and Computer Architecture: ARM® Edition © 2015
113
114
56
DRAFT 7/13/2023
Flags'
PCSrcD PCSrcE PCSrcM PCSrcW
Control
RegWriteD RegWriteE RegWriteM RegWriteW
Unit
MemtoRegD MemtoRegE
MemtoRegM MemtoRegW
27:26 MemWriteD MemWriteE MemWriteM
Op
CondExE
25:20 ALUControlD ALUControlE
Funct
15:12
Rd BranchD BranchE
ALUSrcD ALUSrcE
FlagWriteD FlagWriteE
ImmSrcD
31:28 CondE Cond
Unit
RegSrcD
CLK CLK FlagsE
0 1 CLK
CLK
InstrF
InstrD
1
19:16
0 WE3 ALUFlags WE
RA1D SrcAE
0 PC' PCF A1 RD1 00
0 A RD 15 1 01 ALUResultE
10 ReadDataW
ALU
1
EN
3:0
0 A RD
Instruction RA2D
A2 RD2 00 0 SrcBE Data
Memory 1 01
10 1 Memory
A3 Register WriteDataE
WD
File
ExtImmE
WD3
1
R15 ALUOutM ALUOutW
PCPlus4F 0
+
CLR
EN
23:0 Extend
PCPlus8D
ResultW
MemtoRegE
RegWriteW
ForwardAE
ForwardBE
RegWriteM
FlushD
FlushE
Match
StallD
St allF
Hazard Unit
115
116
57
DRAFT 7/13/2023
117
Pipelined Performance
• Pipelined processor critical path:
Tc3 = max [
tpcq + tmem + tsetup Fetch
2(tRFread + tsetup ) Decode
tpcq + 2tmux + tALU + tsetup Execute
tpcq + tmem + tsetup Memory
2(tpcq + tmux + tRFwrite) ] Writeback
118
58
DRAFT 7/13/2023
119
120
59
DRAFT 7/13/2023
121
Execution
Time Speedup
Processor (seconds) (single-cycle as baseline)
Single-cycle 84 1
Multicycle 140 0.6
Pipelined 36.9 2.28
122
60