
4

4.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8
4.9
4.10
4.11 ARM Cortex-A8 and Intel Core i7
4.12
4.14
4.15

4.1


(CPI)




MIPS
(datapath) (control unit)

MIPS
MIPS
(lw)(sw)
add sub AND OR slt
(beq)(j)



CPI



1. (PC)

2.





(
)

4.1 MIPS



(PC)

(
)()(
)ALU
ALU
ALU
ALU
(PC + 4) (PC 4) (bus)


4.2 4.1




4.2 MIPS


PC (PC+4) AND
ALU Zero

ALU (
)()
ALU
()
()
ALU

4.2


(combinational)
ALU
(state elements)






(sequential)


(clocking methodology)


(edge-triggered)

4.3


()
()





4.4

(feedback)

(race)

4.3
(datapath elements)
4.6 4.5
PC

R (2.18)

2.18 MIPS

4.6
(PC)


4.7 R ALU
ALU


Read register

(
)

5 32
ALU 4 ALU
ALU Zero (branches)ALU
(overflow) 4.9 (exceptions)

4.8 4.7 ALU

16
32 ()
(write enable)

MIPS
(beq)

4.9 ALU
PC 16 ()2

2
00₂

16
ALU
Zero PC PC
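The branch-target arithmetic named in the Figure 4.9 caption (a 16-bit offset, sign extension, a shift left by 2, an add to PC + 4, and selection by the ALU Zero output) can be sketched in Python. This is a minimal illustration under the usual MIPS conventions, not a model of the actual datapath; the function names `sign_extend16`, `branch_target`, and `next_pc` are invented for this sketch.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, offset16):
    """(PC + 4) + (sign-extended offset << 2), wrapped to 32 bits."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF


def next_pc(pc, offset16, branch, zero):
    """Pick the branch target only when Branch AND Zero are both asserted."""
    if branch and zero:
        return branch_target(pc, offset16)
    return (pc + 4) & 0xFFFFFFFF


# A beq at 0x100 with offset 2: taken goes to 0x10C, not taken to 0x104.
print(hex(next_pc(0x100, 2, True, True)), hex(next_pc(0x100, 2, True, False)))
```

Shifting by 2 rather than multiplying by 4 mirrors the hardware: the two low-order bits of a word-aligned target are always 00₂.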


MIPS (delayed)



(4.8 )

beq










MIPS 4.11



ALU

4.11 MIPS
4.6 4.9 4.10
()

4.4
(lw) (sw)
(beq) add sub AND OR set on less than
(j)

ALU


ALU

R ALU 6
(funct)
(AND OR subtract add set on less than)
()
ALU


(funct)
ALUOp 2

ALUOp
(00) beq (01)
(10)


4.12 ALU ALUOp R-


ALUOp
ALUOp 00 01 ALU
XXXXXX (don't care) ALUOp
10 ALU


4.13 4 ALU ((operation))


ALUOp ALU
ALUOp 11
1X X1 10 01 (F5 F4) 10 XX
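The mapping from the 2-bit ALUOp and the 6-bit funct field to the 4-bit ALU control input can be sketched as a small lookup. The surviving text gives only the ALUOp values (00, 01, 10) and the operation names; the 4-bit encodings and funct codes below are the conventional MIPS textbook values and should be read as assumptions, as should the names `alu_control` and `FUNCT_TO_ALU`.

```python
# Assumed 4-bit ALU control encodings (conventional MIPS textbook values).
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# Assumed funct codes for the R-format instructions add, sub, and, or, slt.
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,
    0b100010: ALU_SUB,
    0b100100: ALU_AND,
    0b100101: ALU_OR,
    0b101010: ALU_SLT,
}


def alu_control(alu_op, funct):
    """Return the 4-bit ALU control value for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:        # lw / sw: add for the address; funct is don't care
        return ALU_ADD
    if alu_op == 0b01:        # beq: subtract to compare; funct is don't care
        return ALU_SUB
    if alu_op == 0b10:        # R-format: operation chosen by the funct field
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError("ALUOp 11 is not used")


print(format(alu_control(0b10, 0b101010), "04b"))  # slt
```

Because ALUOp never takes the value 11, a hardware truth table can treat several input bits as don't cares, which is what keeps the real control logic small.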



4.14

Op Op[5:0]
R (beq)
25:21
20:16 rs rt
25:21 (rs)
(beq) 16
(offset) 15:0

4.14 (R-)


(jump) (a) R-
(opcode) rs
rt rd rs rt (source) rd (destination)
ALU ALU (funct)
R- add sub and or slt
(shamt) (b)
(opcode = 35₁₀) (opcode = 43₁₀) rs
(base register) 16
rt rt
(c) (opcode = 4₁₀) rs rt 16
PC+4
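The field positions listed for Figure 4.14 (opcode in bits 31:26, rs in 25:21, rt in 20:16, rd in 15:11, funct in 5:0, and the 16-bit offset in 15:0) can be checked with a short decoder. This is only a sketch; the function name `decode`, the dictionary layout, and the shamt position (bits 10:6, the standard MIPS placement) are assumptions not stated in the surviving text.

```python
def decode(instruction):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op": (instruction >> 26) & 0x3F,     # bits 31:26
        "rs": (instruction >> 21) & 0x1F,     # bits 25:21
        "rt": (instruction >> 16) & 0x1F,     # bits 20:16
        "rd": (instruction >> 11) & 0x1F,     # bits 15:11 (R-format only)
        "shamt": (instruction >> 6) & 0x1F,   # bits 10:6  (assumed position)
        "funct": instruction & 0x3F,          # bits 5:0   (R-format only)
        "imm": instruction & 0xFFFF,          # bits 15:0  (lw, sw, beq)
    }


# add $t1,$t2,$t3 encodes as 0x014B4820 ($t1=9, $t2=10, $t3=11).
print(decode(0x014B4820))
```

The same word is decoded every way at once; the control unit then decides which fields are meaningful, for example whether the destination register is rt (load) or rd (R-format).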


20:16 (rt) 15:11 (rd)
rt rd

4.15 ALU

4.15 4.12
ALU PC
PC
PC

4.16


4.16
1 (asserted)
1
(deasserted) 0



4.17










4.17
6
1 (RegDst ALUSrc MemtoReg)
(RegWrite MemRead MemWrite)
1 (Branch) 2
ALU (ALUOp) AND
Branch ALU Zero AND
PC PCSrc


4.18

4.18
R (add sub and or slt)
rs rt rd ALUSrc
RegDst R-
(RegWrite=1) Branch
0 PC PC+4 ALU Zero
1 PC R- ALUOp
10 ALU (funct field)
lw sw ALUSrc ALUOp
MemRead MemWrite
RegDst RegWrite rt
R rs rt ALU
ALUOp (ALU =01)
RegWrite 0 MemtoReg
Write data
MemtoReg X
RegWrite 0 RegDst X
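The control-line settings summarized for Figure 4.18 can be written as a lookup keyed by opcode. The sketch below follows the signal names used in the text (RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp) and the opcodes 0, 35, 43 and 4 from Figure 4.14; the table values are the standard single-cycle MIPS settings, reproduced here as an assumption because the original table did not survive extraction. `None` stands for a don't-care (X), and `main_control` and `pc_src` are names invented for this sketch.

```python
R_FORMAT, LW, SW, BEQ = 0, 35, 43, 4
X = None  # don't care

CONTROL_TABLE = {
    R_FORMAT: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    LW:       dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    SW:       dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    BEQ:      dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}


def main_control(opcode):
    """Return the control-line settings for a 6-bit opcode."""
    return CONTROL_TABLE[opcode]


def pc_src(branch, zero):
    """PCSrc is the AND of the Branch control line and the ALU Zero output."""
    return int(bool(branch) and bool(zero))


print(main_control(LW))
```

RegDst and MemtoReg are don't cares for sw and beq because RegWrite is 0, so nothing selected by those multiplexors is ever written to the register file.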






4.19 R


PC


ALU (5:0)
ALU

ALU 15:11
($t1)

4.19


add $t1,$t2,$t3 R












4.20


PC

ALU
16
ALU

20:16



4.21 beq
PC
$t1 $t2
ALU PC+4
16
()
ALU Zero
PC

4.21

(branch-on-equal)

4.21 (branch-on-equal)

ALU Zero
(PC)


4.22

(jump)


(Op[5:0] 31:26)
RegWrite

Op5Op2
R R
lw sw beq MIPS


4.24 4.17

4.24

(jump)

4.24 (jump)

()

jump
26 2 00
PC+4 4 32
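The jump-address construction outlined above (the 26-bit field from the instruction, two low-order 00 bits from a shift left by 2, and the upper 4 bits of PC + 4, giving a 32-bit address) can be sketched as follows. The function name `jump_target` is an assumption made for illustration.

```python
def jump_target(pc, target26):
    """Concatenate the upper 4 bits of PC + 4 with the 26-bit field shifted left by 2."""
    upper4 = (pc + 4) & 0xF0000000          # bits 31:28 of PC + 4
    low28 = (target26 & 0x03FFFFFF) << 2    # 26-bit field followed by 00
    return upper4 | low28


print(hex(jump_target(0x90000000, 1)))
```

Unlike a branch, nothing is added here: the jump replaces the low 28 bits of the PC outright, so it can only reach targets inside the current 256 MB region.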





CPI 1



(pipelining)


4.5
(pipelining)
(overlapped)

4.25


(throughput)

MIPS
1. (fetch)
2. (decode)MIPS

3. (execute)
4. (memory)
5. (write back)
MIPS


(speed-up)

(overhead)




MIPS

MIPS
MIPS


MIPS



(hazards)

(structural hazard)


(data hazards)
(stall)

(dependence)


ALU


(forwarding)(bypassing)
4.29

add EX sub EX
sub $s0


(data hazards)

(load-use data hazard) 4.30
(pipeline stall)
(bubble)
(control hazard)


4.30 R

4.7


(control hazard)






(predict)




(control hazard)




(untaken)
(branch prediction)

(dynamic)




(delayed decision)
MIPS





(parallelism)



CPI









(pointer)


4.6
4.33 4.4

5
5
1. IF
2. ID
3. EX
4. MEM
5. WB
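How instructions overlap across the five stages listed above can be shown with a small multiple-clock-cycle diagram generator. This is a sketch of an ideal pipeline with no stalls or hazards, which is an assumption; the names `STAGES` and `pipeline_diagram` are invented for the example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions):
    """Return one row per instruction; column c holds the stage occupied in cycle c + 1."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name       # instruction i enters IF in cycle i + 1
        rows.append(row)
    return rows


for row in pipeline_diagram(3):
    print(" ".join(f"{cell:>3}" for cell in row))
```

Three instructions finish in 7 cycles instead of 15, which is the throughput gain pipelining aims for; each individual instruction still takes five cycles.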


(4.17)

4.33 4.4 (4.17)

(PC) ALU
(write-back)





PC PC
MEM


4.34
4.27

4.34 4.33


4.28 4.30

4.33
IM
Reg
(ID)

(ID)(WB)
ID
WB



4.35 (pipeline registers)

4.35 4.33

(pipeline registers)
IF/ID

IF/ID 64
32 32
PC 128 97 64

4.36 4.38




ALU

(275)



4.36 IF
ID


4.35


4.36 IF ID 4.35

4.28 4.2

16
ID/EX

4.37 EX 4.35

EX/MEM

4.38 MEM
WB


4.35


EX/MEM



MEM/WB

MEM/WB




4.41

IF


4.41

MEM/WB ID
MEM/WB
5



4.34 (multiple-clock-cycle pipeline diagram)
4.36 4.40




4.46

4.46 4.41
4.4 PC ALU
EX 6 ALU
ID/EX 6
6 ID/EX


4.47 4.49

4.47 4.12
ALU ALUOp R

4.48 4.16
ALU (ALUOp) 4.47
(asserted) 1
0 PCSrc 4.46
AND Branch ALU Zero 1 PCSrc 1
0 beq Branch PCSrc 0
4.49 4.18


PC
IF/ID ID/EX EX/MEM MEM/WB


1.
2.
3. RegDst ALUOp ALUSrc
4. Branch MemRead MemWrite
5. MemtoReg ALU RegWrite





4.50

EX EX/MEM
MEM MEM/WB
WB

4.51 4.51



4.46






ID/EX







4.7

sub $2, $1,$3 # register $2 is written by sub
and $12,$2,$5 # 1st operand ($2) depends on sub
or $13,$6,$2 # 2nd operand ($2) depends on sub
$2 10
20
4.52
$2

CC 1
$2
$2
(






4.5 EX

and or
EX





ID/EX.RegisterRs

1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
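The four hazard conditions above can be turned into a forwarding decision for each ALU operand. The sketch below also applies the two guards the text mentions (the producing instruction must assert RegWrite, and its destination must not be register $0) and gives the EX/MEM result priority over MEM/WB because it is the more recent value. The 2-bit return codes (10 for EX/MEM, 01 for MEM/WB, 00 for the register file) follow the usual ForwardA/ForwardB convention and are an assumption, as is the function name `forward_select`.

```python
def forward_select(id_ex_reg, ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Choose the source for one ALU operand (call once with Rs, once with Rt)."""
    # Conditions 1a / 1b: the instruction now in MEM produces this operand.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_reg:
        return 0b10
    # Conditions 2a / 2b: the instruction now in WB produces it, and no newer
    # value is available from EX/MEM (that case already returned above).
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_reg:
        return 0b01
    return 0b00  # no hazard: use the value read from the register file


# "and $12,$2,$5" in EX while "sub $2,$1,$3" is in MEM: forward $2 from EX/MEM.
print(format(forward_select(2, True, 2, False, 0), "02b"))
```

Checking EX/MEM first matters when both pipeline registers hold a write to the same register, since only the newer result is the correct one.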


RegWrite
$0
0
ALU

4.54 ALU

4.55 EX
ALU

ID/EX.RegisterRt

slt

4.55 4.54
ALU (signed immediate)

EX
1. EX

WB MEM
ALU

MEM (
)

2. MEM


MEM






4.58

4.58
(and)

4.58



(hazard detection unit)


EX
ID


ID IF

nops

ID/EX EX MEM WB
0
RegWrite MemWrite 0
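The hazard detection unit described above can be sketched in two parts: a test for the load-use case, and the "bubble" that results from forcing the control fields of ID/EX to 0 so the stalled slot behaves as a nop. The condition used here (the instruction in EX is a load whose destination Rt matches either source register of the instruction in ID) is the standard MIPS formulation and is an assumption to that extent; the function names are invented for this sketch.

```python
def load_use_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs the result of a load still in EX."""
    return bool(id_ex_memread) and id_ex_rt in (if_id_rs, if_id_rt)


def bubble(control):
    """Deassert every control line, turning the instruction slot into a nop."""
    return {name: 0 for name in control}


# Illustrative case: a load into $2 followed immediately by "and $4,$2,$5".
if load_use_stall(True, 2, 2, 5):
    print(bubble({"RegWrite": 1, "MemWrite": 0, "MemRead": 0}))
```

On a stall the PC and the IF/ID register must also be held, so the instructions in IF and ID repeat their stages in the next cycle; that part is not modeled here.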

4.59

and nop4
and 2 3
EX 5 (4)
OR 3 ID
5 (4)


4.60


ID EX (
)

4.8
4.61

(control hazard) (branch hazard)

(40, 44, )
MEM (beq 4 )

beq 72 lw (4.31)







IF/ID PC
EX ID
(beq)ID



ID

ALU/MEM MEM/WB
ALU




IF
IF/ID
0




(dynamic branch prediction)
(branch prediction buffer)
(branch history table)
(1-bit prediction scheme)

(2-bit prediction schemes)


4.63
(finite-state machine) 4.63
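A 2-bit prediction scheme can be sketched as a saturating counter: a prediction must be wrong twice in a row before it flips. This is one common realization and may differ in detail from the state machine drawn in Figure 4.63; the class name, the state numbering (0 to 3, with 2 and 3 predicting taken), and the helper `count_mispredictions` are assumptions made for illustration.

```python
class TwoBitPredictor:
    """Saturating 2-bit counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=3):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(predictor, outcomes):
    """Run the predictor over a sequence of actual outcomes (True = taken)."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch taken 9 times and then not taken, executed twice.
outcomes = ([True] * 9 + [False]) * 2
print(count_mispredictions(TwoBitPredictor(), outcomes))
```

With this counter the loop-closing branch is mispredicted only on each loop exit; a 1-bit scheme would also mispredict the first iteration of the next pass.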
















(branch delay slot)

4.64


(a)
(b) (c) (a)
(b) (c) $s1 add
($s1) (b)

(b)
(c)
(b) (c) sub
OK OK
$t4





(branch target buffer)PC




(correlating predictor)


(tournament predictors)



(conditional move)
ARMv7


4.65 4.65









4.57

ALUSrc

4.51

4.9


(exceptions)
(interrupts)
MIPS

MIPS


add $1, $2, $1


(exception
program counter, EPC)


MIPS

MIPS (
(cause register))

(vectored interrupts)



8000 0180₁₆

MIPS
MIPS
EPC


(Cause)
5
10
12




add

ID.Flush EX.Flush 8000 0180₁₆ PC
PC
4.66

PC (8000 0180₁₆) Cause
EPC 8000 0180₁₆
ALU
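The register updates performed when an exception is taken can be sketched as below: the Cause register records the reason, EPC records where to resume, and the PC is redirected to the handler at 8000 0180₁₆. The cause codes 10 (undefined instruction) and 12 (arithmetic overflow) and the convention of saving the faulting address + 4 in EPC are assumptions based on the standard MIPS textbook treatment, since the explanatory text did not survive; the function names are invented for this sketch.

```python
EXCEPTION_VECTOR = 0x80000180

# Assumed Cause-field codes for the two exception sources discussed.
CAUSE_UNDEFINED_INSTRUCTION = 10
CAUSE_ARITHMETIC_OVERFLOW = 12


def take_exception(pc, cause_code):
    """Return the register values after an exception at the instruction at pc."""
    return {
        "EPC": (pc + 4) & 0xFFFFFFFF,   # assumed: address + 4 is what gets saved
        "Cause": cause_code,
        "PC": EXCEPTION_VECTOR,         # fetch continues at the handler
    }


def faulting_address(epc):
    """A handler recovers the offending instruction's address by subtracting 4."""
    return (epc - 4) & 0xFFFFFFFF


state = take_exception(0x4C, CAUSE_ARITHMETIC_OVERFLOW)
print(hex(state["EPC"]), hex(state["PC"]), hex(faulting_address(state["EPC"])))
```

In the pipelined datapath the same event also flushes the instructions behind the faulting one, which this register-level sketch leaves out.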



(EPC, exception program counter) (+4 4) 4.66











(imprecise interrupts)
(imprecise exceptions)
(precise interrupts)
(precise exceptions)

4.10
(instruction-level parallelism, ILP)



(multiple issue)
CPI 1 IPC (instructions per clock cycle) 1



(static multiple issue)
(dynamic multiple issue)


1. (issue slots)
2.


ILP












(issue packet)
(Very Long Instruction Word, VLIW)

MIPS
(two-issue) MIPS
ALU


VLIW

64
ALU


nop
MIPS
4.68

4.68
ALU

MIPS

nops





MIPS







MIPS

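Loop: lw $t0, 0($s1) # $t0 = array element at address $s1
addu $t0,$t0, $s2 # add the scalar in $s2
sw $t0, 0($s1) # store the result back
addi $s1,$s1,-4 # decrement the pointer by 4
bne $s1,$zero,Loop # branch while $s1 != 0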


4.70

CPI 0.5 0.8


IPC 2.0 1.25 CPI IPC
nops nops
CPI

MIPS

(loop unrolling)

4.70 MIPS
nops


()

lw add
sw addi bne 4.71

(register renaming)
(antidependence)(name dependence)


14 12
8
CPI 8/14=0.57
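The CPI and IPC figures quoted for the two-issue examples are reciprocals of each other and can be checked directly. In the sketch below, the 14 instructions in 8 cycles come from the unrolled-loop figures above; the 5 instructions in 4 cycles for the original loop are an assumption chosen to match the quoted CPI of 0.8 and IPC of 1.25. The helper names `cpi` and `ipc` are invented for this example.

```python
def cpi(cycles, instructions):
    """Clock cycles per instruction."""
    return cycles / instructions


def ipc(cycles, instructions):
    """Instructions per clock cycle, the reciprocal of CPI."""
    return instructions / cycles


# Original loop on the two-issue pipeline (assumed: 5 instructions in 4 cycles).
print(cpi(4, 5), ipc(4, 5))

# Unrolled loop: 14 instructions in 8 cycles.
print(round(cpi(8, 14), 2), ipc(8, 14))
```

The ideal for a two-issue machine is a CPI of 0.5 (IPC of 2.0), so unrolling moves the loop from 0.8 toward that bound, at the cost of extra registers and code size.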


4.71 4.70 MIPS

nops $s1 16
$s1 4 8 12

(superscalar)





(dynamic pipeline scheduling)







(commit unit)
4.72

(retirement)(graduation)




(reservation stations)





(reorder buffer)



(out-of-order execution)

(in-order commit)

















()





(watt)
4.73

4.73 Intel Sun
Pentium 4 Pentium 4

4.11 ARM Cortex-A8 Intel Core i7

4.74
ARM
Cortex-A8
Intel
Core i7 920

ARM Cortex-A8
4.75

12
512
4096 8

13





4.75 A8
12
(Address Generation Unit, AGU)(Branch Target
Buffer, BTB)(Global History Buffer, GHB)
(Return (Address) Stack, RS)

4.76 A8 SPEC2000
CPI
A8 (
)

4.76 ARM Cortex A8 Minnespec
SPEC2000 CPI

CPI ()

Intel Core i7 920
14
()
Intel x86
(micro-operations)()MIPS



(microarchitecture)

(register renaming)
(reorder buffer)
(architectural registers)
4.77 Core i7


14

17

48 32




RISC

x86

15
16
16
18

x86

x86 x86

(microcode)

(loop stream detection) 28 256





i7 36







(macroop fusion)
x86

(microfusion)
load/ALU ALU/store
(
)

4.78 Intel Core i7 SPEC2006
CPI

4.78 Intel Core i7 920


SPEC2006
CPI

4.79

4.79 Intel
Core i7 920

SPEC2006





4.12

DGEMM


4.80 3.23

4.80
DGEMM
C
C

x86

AVX (3.23)

4.81

()
for


4.81
vbroadcastsd
3.24 AVX 4.81
17

12 24


4.82
3.21 DGEMM
8.8

4.82 DGEMM 32×32
3.21
9

4.14







4.15



(instruction latency)

(CPI)

1990
1980
60%


()

Amdahl
