Ch 04 Computer Organization and Design, 5e
4.1
(CPI)
MIPS
(datapath) (control unit)
4 3
MIPS
MIPS
(lw)(sw)
add sub AND OR slt
(beq)(j)
CPI
4 4
1. (PC)
2.
4 5
(
)
4.1 MIPS
4 6
4.1 MIPS
4 7
4.1 MIPS
(PC)
(
)()(
)ALU
ALU
ALU
ALU
(PC +4
)(PC
4)(bus)
4 8
4.2 4.1
4 9
4.2 MIPS
4 10
4.2 MIPS
PC (PC+4
)AND
ALU Zero
ALU (
)()
ALU
()
()
ALU
4 11
4.2
(combinational)
ALU
(state elements)
4 12
4.2
(sequential)
4 13
(clocking methodology)
(edge-triggered)
4.3
4 14
4.3
()
()
4 15
4.4
(feedback)
4 16
4.4
(race)
4 17
4.3
(datapath elements)
4.6 4.5
PC
R (2.18)
4 18
2.18 MIPS
4 19
4.6
(PC)
4 20
4.7 R ALU
ALU
4 21
4.7 R ALU ALU
Read register
(
)
5 32
ALU 4 ALU
ALU Zero (branches)ALU
(overflow) 4.9 (exceptions)
4 22
4.8 4.7 ALU
16
32 ()
(write enable)
4 23
4.3
MIPS
(beq)
4 24
4.9 ALU
PC 16 ()2
4 25
4.9 ALU
PC 16 ()
2
2
00₂
16
ALU
ZeroPC PC
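The branch datapath fragments above (16-bit offset, sign extension, shift left by 2, PC + 4, ALU Zero) can be sketched as follows. This is a minimal illustration, not the slides' own material; the function names `sign_extend16`, `branch_target`, and `next_pc` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc: int, offset16: int) -> int:
    """Branch target = (PC + 4) + (sign-extended offset shifted left by 2)."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & MASK32

def next_pc(pc: int, offset16: int, zero: bool) -> int:
    """For beq: take the branch target when the ALU Zero output is set, else PC + 4."""
    return branch_target(pc, offset16) if zero else (pc + 4) & MASK32
```

For example, `branch_target(0x40, 7)` yields `0x60`, since the offset counts words relative to PC + 4.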
4 26
MIPS (delayed)
(4.8 )
beq
4 27
4 28
MIPS 4.11
ALU
4 29
4.11 MIPS
4.6 4.9 4.10
()
4 30
4.4
(lw) (sw)
(beq) add sub AND OR set on less than
(j)
4 31
ALU
4 32
ALU
ALU
R ALU 6
(funct)
(AND OR subtract add set on less than)
()
ALU
4 33
ALU
(funct)
ALUOp 2
ALUOp
(00) beq (01)
(10)
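The two-level decoding outlined above (a 2-bit ALUOp of 00 for loads and stores, 01 for beq, and 10 for R-type instructions, which then consult the funct field) can be sketched as below. The 4-bit ALU control values and funct codes are the standard MIPS ones; the Python names are hypothetical.

```python
# 4-bit ALU control inputs for the five operations used in this chapter.
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# funct field (instruction bits 5:0) of the R-type instructions covered here.
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # sub
    0b100100: ALU_AND,  # and
    0b100101: ALU_OR,   # or
    0b101010: ALU_SLT,  # slt
}

def alu_control(alu_op: int, funct: int = 0) -> int:
    """Map (ALUOp, funct) to the 4-bit ALU control input."""
    if alu_op == 0b00:          # lw / sw: add base register and offset
        return ALU_ADD
    if alu_op == 0b01:          # beq: subtract, then test the Zero output
        return ALU_SUB
    if alu_op == 0b10:          # R-type: the funct field selects the operation
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError(f"unsupported ALUOp: {alu_op:02b}")
```

Keeping the main control unit down to a 2-bit ALUOp means only this small block needs to look at the funct field.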
4 34
ALU
4 35
ALU
4 36
4.14
Op Op[5 : 0]
R (beq)
25 : 21
20 : 16 rs rt
25 :
21 (rs)
(beq)16
(offset)15 : 0
4 37
4.14 (R-)
4 38
4.14 (R-)
(jump)(a)R-
(opcode)rs
rt rdrs rt (source)rd (destination)
ALU ALU (funct)
R- add sub and or slt
(shamt)(b)
(opcode = 35₁₀) (opcode = 43₁₀) rs
(base register)16
rt rt
(c) (opcode = 4₁₀) rs rt 16
PC+4
4 39
20 : 16
(rt)15 : 11 (rd)
rt rd
4.15 ALU
4 40
4.15 4.12
ALU PC
PC
PC
4 41
4.16
4 42
4.16
1 (asserted)
1
(deasserted) 0
4 43
4.17
4 44
4.17
6
1 (RegDst
ALUSrc MemtoReg)
(RegWrite MemRead MemWrite)
1 (Branch)2
ALU(ALUOp)AND
Branch ALU Zero AND
PC PCSrc
4 45
4.18
4 46
4.18
R (add sub and or slt)
rs rt rd ALUSrc
RegDst R-
(RegWrite=1)Branch
0 PC PC+4 ALU Zero
1 PC R-ALUOp
10ALU (funct field)
lw sw ALUSrc ALUOp
MemRead MemWrite
RegDst RegWrite rt
R rs rt ALU
ALUOp (ALU =01)
RegWrite 0 MemtoReg
Write data
MemtoReg X
RegWrite 0 RegDst X
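The control-line settings described above can be collected into a small lookup table. This sketch covers only the four instruction classes discussed (R-format, lw, sw, beq); `None` stands for a don't-care (X), and the names `CONTROL`, `OPCODES`, and `main_control` are hypothetical.

```python
X = None  # don't-care: the signal's value does not affect the result

CONTROL = {
    "R-format": dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                     MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    "lw":       dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                     MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    "sw":       dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0,
                     MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    "beq":      dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0,
                     MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

# Opcode field (bits 31:26) in decimal for each instruction class.
OPCODES = {0: "R-format", 35: "lw", 43: "sw", 4: "beq"}

def main_control(opcode: int) -> dict:
    """Return the control-line settings for a 6-bit opcode."""
    return CONTROL[OPCODES[opcode]]
```

For sw and beq, RegWrite is 0, so nothing is written to the register file and RegDst and MemtoReg are don't-cares.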
4 47
4.19 R
4 48
PC
ALU (5 :0 )
ALU
ALU 15:11
($t1)
4 49
4.19
add
$t1,$t2,
$t3 R
4 50
4.20
PC
ALU
16
ALU
20:16
4 51
4.20
4 52
4.20
4 53
4.21 beq
PC
$t1 $t2
ALU PC+4
16
()
ALU Zero
PC
4 54
4.21
(branch-on-equal)
4 55
4.21 (branch-on-equal)
ALU Zero
(PC)
4 56
4.22
(jump)
4 57
4.22
4 58
4.22
(Op [5 :0] 31 : 26
)
RegWrite
Op5Op2
R R
lw sw beq MIPS
4 59
4.24 4.17
4 60
4.24
(jump)
4 61
4.24 (jump)
()
jump
26 2 00
PC+4 4 32
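The jump address formation noted above (26-bit field, shifted left by 2 so the low bits are 00, joined with the upper 4 bits of PC + 4 to give 32 bits) can be sketched as follows. The function name `jump_target` is hypothetical.

```python
def jump_target(pc: int, instruction: int) -> int:
    """Form the 32-bit jump address from a j instruction word."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    addr26 = instruction & 0x03FFFFFF          # instruction bits 25:0
    # Upper 4 bits come from PC + 4; the lower 28 bits are addr26 followed by 00.
    return (pc_plus_4 & 0xF0000000) | (addr26 << 2)
```

Because the top 4 bits come from PC + 4, a jump can only reach targets within the current 256 MB region.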
4 62
CPI 1
4 63
(pipelining)
4 64
4.5
(pipelining)
(overlapped)
4.25
(throughput)
4 65
4.5
MIPS
1. (fetch)
2. (decode)MIPS
3. (execute)
4. (memory)
5. (write back)
MIPS
4 66
4.5
(speed-up)
(overhead)
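The speed-up idea can be made concrete with a small timing model. This is a sketch under simplifying assumptions that are not taken from the slides: all stages take the same time, there are no hazards, and the stage count and stage time in the usage note are illustrative. The function name `total_time_ps` is hypothetical.

```python
def total_time_ps(n_instructions: int, stage_ps: int, stages: int,
                  pipelined: bool) -> int:
    """Total execution time in picoseconds under an idealized model."""
    if pipelined:
        # The first instruction needs `stages` cycles to fill the pipeline;
        # each later instruction completes one cycle after the previous one.
        return (stages + n_instructions - 1) * stage_ps
    # Without pipelining, each instruction occupies the whole datapath in turn.
    return n_instructions * stages * stage_ps
```

With 5 stages of 200 ps, three instructions take 3000 ps unpipelined and 1400 ps pipelined. As the instruction count grows, the ratio approaches the number of stages, but fill time and unbalanced stages (the overhead) keep the real speed-up below that bound.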
4 67
MIPS
MIPS
MIPS
MIPS
4 68
(hazards)
(structural hazard)
(data hazards)
(stall)
(dependence)
4 69
ALU
(forwarding)(bypassing)
4.29
4 70
4.29
add EX sub EX
sub $s0
4 71
(data hazards)
(load-use data hazard) 4.30
(pipeline stall)
(bubble)
(control hazard)
4 72
4.30 R
4.7
4 73
(control hazard)
(predict)
4 74
(control hazard)
(untaken)
(branch prediction)
(dynamic)
4 75
(delayed decision)
MIPS
4 76
(parallelism)
4 77
CPI
4 78
(pointer)
4 79
4 80
4.6
4.33 4.4
5
5
1. IF
2. ID
3. EX
4. MEM
5. WB
4 81
4.33 4.4
(4.17)
4 82
4.33 4.4 (4.17)
(PC)ALU
(write-back)(
4 83
4.6
PC PC
MEM
4.34
4.27
4 84
4.34 4.33
4 85
4.34 4.33
4.28 4.30
4.33
IM
Reg
(ID)
(ID)(WB)
ID
WB
4 86
4.6
4.35 (pipeline registers)
4 87
4.35 4.33
4 88
4.35 4.33
(pipeline registers)
IF/ID
IF/ID 64
32 32
PC 128 97 64
4 89
4.6
4.36 4.38
ALU
(275)
4 90
4.36 IF
ID
4.35
4 91
4.36 IF ID4.35
4.28 4.2
16
ID/EX
4 92
4.37 EX4.35
EX/MEM
4 93
4.38 MEM
WB
4.35
EX/MEM
MEM/WB
MEM/WB
4.41
4 94
4.6
IF
4.41
4 95
4.41
MEM/WB ID
MEM/WB
5
4 96
4.34 (multiple-clock-cycle pipeline diagram)
4.36 4.40
4 97
4.46
4 98
4.46 4.41
4.4 PC ALU
EX 6 ALU
ID/EX 6
6 ID/EX
4 99
4.47 4.49
4.47 4.12
ALU ALUOp R
4 100
4.48 4.16
ALU (ALUOp)4.47
(asserted)1
0 PCSrc 4.46
AND Branch ALU Zero 1 PCSrc1
0beq Branch PCSrc 0
4 101
4.49 4.18
4 102
PC
IF/ID ID/EX EX/MEM MEM/WB
1.
2.
3.
RegDst ALUOp ALUSrc
4. Branch
MemRead MemWrite
4 103
5. MemtoReg
ALU
RegWrite
4.50
4 104
4.50
EX EX/MEM
MEM MEM/WB
WB
4 105
4.51 4.51
4.46
ID/EX
4 106
4.7
sub $2, $1,$3 # register $2 written by sub
and $12,$2,$5 # 1st operand ($2) depends on sub
or $13,$6,$2 # 2nd operand ($2) depends on sub
$2 10
20
4.52
$2
4 107
4.52
4 108
4.52
CC 1
$2
$2
(
4 109
4.7
4.5 EX
and or
EX
4 110
4.7
ID/EX.RegisterRs
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
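The four hazard conditions above can be turned into a forwarding-unit sketch. It adds the usual qualifications: the earlier instruction must actually write a register (RegWrite), its destination must not be $0, and the more recent result in EX/MEM takes priority over MEM/WB. The function name and the 2-bit return encoding (00 = register file, 10 = EX/MEM, 01 = MEM/WB) are assumptions for illustration.

```python
def forwarding_unit(id_ex_rs: int, id_ex_rt: int,
                    ex_mem_rd: int, ex_mem_regwrite: bool,
                    mem_wb_rd: int, mem_wb_regwrite: bool) -> tuple:
    """Return (ForwardA, ForwardB) mux selects for the two ALU operands."""
    def select(src: int) -> int:
        # Conditions 1a/1b: EX hazard, forward the prior ALU result from EX/MEM.
        if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src:
            return 0b10
        # Conditions 2a/2b: MEM hazard, forward from MEM/WB (only if no EX hazard).
        if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src:
            return 0b01
        return 0b00  # no hazard: use the value read from the register file

    return select(id_ex_rs), select(id_ex_rt)
```

For `and $12,$2,$5` in EX while `sub $2,$1,$3` is in MEM, `forwarding_unit(2, 5, 2, True, 0, False)` returns `(0b10, 0b00)`: the first ALU operand is taken from EX/MEM.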
4 111
4.7
RegWrite
$0
0
ALU
4.54 ALU
4.55 EX
ALU
4 112
4.54 ALU
ID/EX.RegisterRt
slt
4 113
4.54 ALU
4 114
4.55 4.54
ALU (signed immediate)
4 115
4.7
EX
1. EX
4 116
4.7
WB MEM
ALU
MEM (
)
4 117
4.7
2. MEM
4 118
MEM
4 119
4.58
4.58
(and)
4 120
4.58
4 121
(hazard detection unit)
EX
ID
4 122
ID IF
nops
ID/EX EX MEM WB
0
RegWrite MemWrite 0
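The load-use check performed by the hazard detection unit can be sketched as below: when the instruction in EX is a load whose destination (rt) matches a source register of the instruction in ID, the pipeline stalls one cycle and the control values entering ID/EX are forced to 0, which turns that slot into a nop. The function names are hypothetical.

```python
def load_use_stall(id_ex_memread: bool, id_ex_rt: int,
                   if_id_rs: int, if_id_rt: int) -> bool:
    """True when the instruction in ID must wait for a load that is in EX."""
    return bool(id_ex_memread and id_ex_rt in (if_id_rs, if_id_rt))

def bubble(controls: dict) -> dict:
    """Deassert every control line so the slot does nothing (a nop)."""
    return {name: 0 for name in controls}
```

During the stall the PC and the IF/ID register must also keep their current values, so the same instructions are fetched and decoded again in the next cycle.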
4.59
4 123
4.59
4 124
4.59
and nop4
and 2 3
EX 5 (4)
OR 3 ID
5 (4)
4 125
4.60
4 126
4.60
4 127
4.60
ID EX (
)
4 128
4 129
4.8
4.61
(control hazard) (branch hazard)
4 130
4.61
4 131
4.61
(40, 44, )
MEM (beq 4 )
beq 72 lw
(4.31
)
4 132
4 133
IF/ID PC
EX ID
(beq)ID
4 134
ID
ALU/MEM MEM/WB
ALU
IF
IF/ID
0
4 135
(dynamic branch prediction)
(branch prediction buffer)
(branch history table)
(1-bit prediction scheme)
(2-bit prediction schemes)
4 136
4.63
(finite-state machine) 4.63
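A 2-bit prediction scheme can be sketched as a small state machine. Note the assumption: this version is a saturating counter, one common way to build such a predictor, and its exact transitions may differ from the state diagram in Figure 4.63. The property it shares with any 2-bit scheme is that, starting from a strongly biased state, a prediction must be wrong twice before it changes. The class name is hypothetical.

```python
class TwoBitPredictor:
    """Saturating 2-bit counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state: int = 0) -> None:
        self.state = state

    def predict(self) -> bool:
        """Return True if the branch is predicted taken."""
        return self.state >= 2

    def update(self, taken: bool) -> None:
        """Move one step toward the observed outcome, saturating at 0 and 3."""
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)
```

For a loop branch that is taken many times and then falls through once, a predictor in state 3 mispredicts only the exit, whereas a 1-bit scheme would also mispredict the first iteration the next time the loop runs.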
4 137
(branch delay slot)
4.64
4 138
4.64
4 139
4.64
(a)
(b)(c)(a)
(b)(c)$s1add
($s1)(b)
(b)
(c)
(b)(c)sub
OKOK
$t4
4 140
(branch target buffer)PC
(correlating predictor)
4 141
(tournament predictors)
(conditional move)
ARMv7
4 142
4.65 4.65
4.57
ALUSrc
4.51
4 143
4.9
(exceptions)
(interrupts)
MIPS
4 144
MIPS
add $1, $2, $1
(exception program counter, EPC)
4 145
MIPS
MIPS (
(cause register))
(vectored interrupts)
8000 0180₁₆
4 146
MIPS
MIPS
EPC
(Cause)
5
10
12
4 147
add
ID.Flush EX.Flush 8000 0180₁₆ PC
PC
4.66
4 148
4.66
4 149
4.66
PC (
8000 0180₁₆) Cause
EPC 8000 0180₁₆
ALU
4 150
(EPC, exception program counter)(
+4
4)4.66
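The exception sequence described above can be sketched as follows: the address + 4 is saved in EPC, the reason is recorded in Cause, and the PC is set to 8000 0180₁₆. The handler then subtracts 4 to recover the address of the offending instruction. The Cause codes 10 (undefined instruction) and 12 (arithmetic overflow) are the encodings assumed here, and the dictionary-based processor state and function names are hypothetical.

```python
HANDLER_ADDRESS = 0x8000_0180
CAUSE_UNDEFINED_INSTRUCTION = 10   # assumed encoding
CAUSE_ARITHMETIC_OVERFLOW = 12     # assumed encoding

def take_exception(state: dict, faulting_pc: int, cause: int) -> dict:
    """Return the processor state after an exception is raised."""
    new_state = dict(state)
    new_state["EPC"] = (faulting_pc + 4) & 0xFFFFFFFF  # address + 4 is saved
    new_state["Cause"] = cause
    new_state["PC"] = HANDLER_ADDRESS                  # fetch continues at the handler
    return new_state

def faulting_address(state: dict) -> int:
    """What the handler computes first: EPC - 4."""
    return (state["EPC"] - 4) & 0xFFFFFFFF
```

In the pipelined datapath the overflowing instruction and those behind it are also flushed, which this sketch does not model.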
4 151
4 152
(imprecise interrupts)
(imprecise exceptions)
(precise interrupts)
(precise exceptions)
4 153
4.10
(instruction-level parallelism, ILP)
(multiple issue)
CPI 1 IPC
(instructions per clock cycle) 1
4 154
4.10
(static multiple issue)
(dynamic multiple issue)
1. (issue slots)
2.
4 155
ILP
4 156
(issue packet)
(Very Long Instruction Word, VLIW)
4 157
MIPS
(two-issue) MIPS
ALU
VLIW
64
ALU
nop
4 158
MIPS
4.68
4.68
ALU
4 159
MIPS
nops
4 160
MIPS
4 161
MIPS
4 162
4.70
4 163
MIPS
(loop unrolling)
4.70 MIPS
nops
4 164
4 165
()
lw add
sw addi bne 4.71
(register renaming)
(antidependence)(name dependence)
4 166
14 12
8
CPI 8/14=0.57
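The CPI figure above follows directly from the counts given: 14 instructions complete in 8 clock cycles. A short check, with hypothetical variable names:

```python
instructions = 14   # instructions in the unrolled loop body
clock_cycles = 8    # cycles needed on the two-issue pipeline

cpi = clock_cycles / instructions   # cycles per instruction
ipc = instructions / clock_cycles   # instructions per clock cycle

print(f"CPI = {cpi:.2f}, IPC = {ipc:.2f}")
```

The IPC of 1.75 is below the two-issue peak of 2 because not every issue slot holds a useful instruction.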
4 167
nops $s1 16
$s1 4 8 12
4 168
(superscalar)
(dynamic pipeline scheduling)
4 169
(commit unit)
4.72
4 170
4.72
(retirement)(graduation)
4 171
(reservation stations)
4 172
(reorder buffer)
(out-of-order execution)
(in-order commit)
4 173
4 174
4 175
()
4 176
(watt)
4.73
4 177
4.73 Intel Sun
Pentium 4 Pentium 4
4 178
4.11 ARM Cortex-A8 Intel Core i7
4.74
ARM
Cortex-A8
Intel
Core i7 920
4 179
ARM Cortex-A8
4.75
12
512
4096 8
13
4 180
4.75 A8
12
(Address Generation Unit, AGU)(Branch Target
Buffer, BTB)(Global History Buffer, GHB)
(Return (Address) Stack, RS)
4 181
ARM Cortex-A8
4.76 A8 SPEC2000
CPI
A8 (
)
4 182
4.76 ARM Cortex A8 Minnespec
SPEC2000 CPI
CPI ()
4 183
Intel Core i7 920
14
()
Intel x86
(micro-operations)()MIPS
4 184
Intel Core i7 920
(microarchitecture)
(register renaming)
(reorder buffer)
(architectural registers)
4.77 Core i7
4 185
4.77 Core i7
14
17
48 32
RISC
4 186
Intel Core i7 920
x86
15
16
16
18
4 187
Intel Core i7 920
x86
x86 x86
(microcode)
(loop stream detection) 28 256
4 188
Intel Core i7 920
i7 36
4 189
(macroop fusion)
x86
(microfusion)
load/ALU ALU /store
(
)
4 190
Intel Core i7 920
4.78 Intel Core i7 SPEC2006
CPI
4 191
Intel Core i7 920
4.79
4.79 Intel
Core i7 920
SPEC2006
4 192
4.12
DGEMM
4.80 3.23
4 193
4.80
DGEMM
C
C
x86
AVX (
3.23)
4.81
()
for
4 194
4.12
4.81
vbroadcastsd
3.24 AVX 4.81
17
12 24
4 195
4.12
4.82
3.21 DGEMM
8.8
4 196
4.82 DGEMM 32×32
3.21
9
4 197
4.14
4 198
4.15
(instruction latency)
(CPI)
4 199
4.15
1990
1980
60%
()
Amdahl
4 200