
a.
3 colors × 1 byte = 3 bytes per pixel
1280 × 1024 × 3 = 3,932,160 bytes

b.
100 Mbit/s = (100 × 10^6) / 8 bytes/s = 125 × 10^5 bytes/s
time = 3,932,160 / (125 × 10^5) = 0.3145728 s
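The frame-size and transfer-time arithmetic can be checked with a short script. This is a minimal sketch: the function names are illustrative and not part of the original solution, and the inputs are the values used in the working above.

```python
# Inputs taken from the worked numbers: 1280 x 1024 pixels,
# 3 colors x 1 byte per pixel, 100 Mbit/s link.
WIDTH, HEIGHT = 1280, 1024
BYTES_PER_PIXEL = 3
LINK_BITS_PER_SECOND = 100e6


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one frame in bytes."""
    return width * height * bytes_per_pixel


def transfer_seconds(n_bytes: int, bits_per_second: float) -> float:
    """Time to send n_bytes over a link rated in bits per second."""
    bytes_per_second = bits_per_second / 8
    return n_bytes / bytes_per_second


size = frame_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)
print(size)
print(transfer_seconds(size, LINK_BITS_PER_SECOND))
```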

a. IPS = Instructions / second = (cycles / second) / (cycles / instruction) = (Clock rate) / CPI

IPS1 = (3 × 10^9) / 1.5 = 2 × 10^9
IPS2 = (2.5 × 10^9) / 1.0 = 2.5 × 10^9
IPS3 = (4 × 10^9) / 2.2 = 1.818 × 10^9

b. cycles = (clock rate) × (time), instructions = IPS × time

cycles1 = 3 × 10^9 × 10 = 3 × 10^10
instructions1 = 2 × 10^9 × 10 = 2 × 10^10
c. CPI = 1.2 × 1.5 = 1.8
clock rate = (Instructions × CPI) / time = (2 × 10^10 × 1.8) / 7 = 5.14 × 10^9 Hz
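The relations IPS = clock rate / CPI and clock rate = instructions × CPI / time can be sketched in code. The function names are illustrative, and the clock rates and CPIs for the second and third processors are assumptions inferred from the results in the working, not values stated there.

```python
def ips(clock_hz: float, cpi: float) -> float:
    """Instructions per second for a given clock rate and CPI."""
    return clock_hz / cpi


def required_clock_hz(instructions: float, cpi: float, seconds: float) -> float:
    """Clock rate needed to run `instructions` at `cpi` in `seconds`."""
    return instructions * cpi / seconds


# (clock rate in Hz, CPI); the P2 and P3 entries are assumed values.
processors = {"P1": (3e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4e9, 2.2)}

for name, (clock_hz, cpi) in processors.items():
    print(name, ips(clock_hz, cpi))

# 2e10 instructions, CPI raised by 20% (1.5 -> 1.8), time cut from 10 s to 7 s.
print(required_clock_hz(2e10, 1.2 * 1.5, 7))
```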

a.
CPI_P1 = 1 × 0.1 + 2 × 0.2 + 3 × 0.5 + 3 × 0.2 = 2.6
CPI_P2 = 2

b.
cycles_P1 = 2.6 × 10^6
cycles_P2 = 2 × 10^6





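CPI = CPU time / (instructions × cycle time)

a.
CPI_A = 1.1 / (1 × 10^9 × 1 × 10^-9) = 1.1
CPI_B = 1.5 / (1.2 × 10^9 × 1 × 10^-9) = 1.25

b. clock rate = (CPI × Instructions) / CPU time, with the CPU time the same for both:
f_A / f_B = (1.1 × 1 × 10^9) / (1.25 × 1.2 × 10^9) = 0.73

c.
CPU time_C = 1.1 × 6 × 10^8 × 10^-9 = 0.66 s
Performance_C / Performance_A = CPU_A / CPU_C = 1.1 / 0.66 = 1.67
Performance_C / Performance_B = CPU_B / CPU_C = 1.5 / 0.66 = 2.27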

(2)
percentage (static / total):
Pentium 4: 10 / 100 × 100% = 10%
i5: 30 / 70 × 100% = 42.86%

ratio (static / dynamic):
Pentium 4: 10 / 90 = 0.11
i5: 30 / 40 = 0.75

(1) P_d = C V^2 f
Pentium 4: 90 = C × 1.25^2 × 3.6 × 10^9, C = 1.6 × 10^-8 F
i5: 40 = C × 0.9^2 × 3.4 × 10^9, C = 1.4524 × 10^-8 F

(3) P_s = I V
P_i = V_i I_i + C V_i^2 f
P_f = V_f I_f + C V_f^2 f
I_f = (P_f − C V_f^2 f) / V_f
I_i / I_f = ((P_i − C V_i^2 f) / V_i) / ((P_f − C V_f^2 f) / V_f) = 1



execution time = clock cycles / clock rate
clock cycles = Σ (instructions × CPI)

①
C_p1 = 2.56 × 10^9 × 1 + 1.28 × 10^9 × 12 + 256 × 10^6 × 5 = 19.2 × 10^9
t_p1 = (19.2 × 10^9) / (2 × 10^9) = 9.6 s

C_p2 = (2.56 × 10^9) / (0.7 × 2) × 1 + (1.28 × 10^9) / (0.7 × 2) × 12 + 256 × 10^6 × 5 = 14.08 × 10^9
t_p2 = 7.04 s

C_p4 = 7.68 × 10^9, t_p4 = 3.84 s
C_p8 = 4.48 × 10^9, t_p8 = 2.24 s

②
C_p1 = 19.2 × 10^9 + 2.56 × 10^9 = 21.76 × 10^9
t_p1 = (21.76 × 10^9) / (2 × 10^9) = 10.88 s

③
C_p4 = 7.68 × 10^9
2.56 × 10^9 + 1.28 × 10^9 × x + 256 × 10^6 × 5 = 7.68 × 10^9  ⇒  x = 3
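The cycle counts for 1, 2, 4 and 8 processors follow one pattern, which the sketch below makes explicit. It assumes, as the working appears to, that arithmetic and load/store instruction counts are divided by 0.7 × p when p > 1 while the branch count stays fixed. All names are illustrative.

```python
CLOCK_HZ = 2e9
COUNTS = {"arith": 2.56e9, "load_store": 1.28e9, "branch": 256e6}
CPI = {"arith": 1, "load_store": 12, "branch": 5}


def clock_cycles(p: int, cpi: dict = CPI) -> float:
    """Cycles per processor when the program runs on p processors."""
    # Assumption: parallelizable work shrinks by 0.7 * p for p > 1.
    scale = 1.0 if p == 1 else 0.7 * p
    parallel = (COUNTS["arith"] * cpi["arith"]
                + COUNTS["load_store"] * cpi["load_store"]) / scale
    return parallel + COUNTS["branch"] * cpi["branch"]


def exec_time(p: int, cpi: dict = CPI) -> float:
    """Execution time in seconds on p processors."""
    return clock_cycles(p, cpi) / CLOCK_HZ


for p in (1, 2, 4, 8):
    print(p, clock_cycles(p), exec_time(p))

# Load/store CPI at which one processor matches the 4-processor cycle count.
other = COUNTS["arith"] * CPI["arith"] + COUNTS["branch"] * CPI["branch"]
print((clock_cycles(4) - other) / COUNTS["load_store"])
```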

$/Die = ($/wafer) / (Dies per wafer × Yield)
Yield = 1 / (1 + Defects per area × Die area / 2)^2
Die area = Wafer area / Die count

①
DA_1 = π (7.5)^2 / 84 = 2.104 cm^2
Y_1 = 1 / (1 + 0.020 × 2.104 / 2)^2 = 0.9592
DA_2 = π (10)^2 / 100 = 3.14 cm^2
Y_2 = 1 / (1 + 0.031 × 3.14 / 2)^2 = 0.9093

②
$/Die_1 = 12 / (84 × 0.9592) = 0.1489
$/Die_2 = 15 / (100 × 0.9093) = 0.1650







③
DA_1 = π (7.5)^2 / (84 × 1.1) = 1.912 cm^2
Y_1 = 1 / (1 + 1.15 × 0.020 × 1.912 / 2)^2 = 0.9574
DA_2 = π (10)^2 / (100 × 1.1) = 2.855 cm^2
Y_2 = 1 / (1 + 1.15 × 0.031 × 2.855 / 2)^2 = 0.9055


Y = 1 / (1 + DR × DA / 2)^2  ⇒  DR = (1 / √Y − 1) × 2 / DA
DA = 200 mm^2 = 2 cm^2
DR_original = (1 / √0.92 − 1) × 2 / 2 = 0.043
DR_improved = (1 / √0.95 − 1) × 2 / 2 = 0.026
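The yield, cost-per-die and defect-rate formulas can be collected into a few helper functions. This is a sketch only: the function names are illustrative, and the wafer costs (12 and 15), die counts, defect densities and the yields 0.92 and 0.95 are assumed to be the inputs behind the numbers in the working.

```python
import math


def die_area(wafer_diameter_cm: float, dies: float) -> float:
    """Die area in cm^2, taken as wafer area divided by die count."""
    return math.pi * (wafer_diameter_cm / 2) ** 2 / dies


def die_yield(defects_per_cm2: float, area_cm2: float) -> float:
    """Yield = 1 / (1 + defects per area * die area / 2)^2."""
    return 1 / (1 + defects_per_cm2 * area_cm2 / 2) ** 2


def cost_per_die(wafer_cost: float, dies: float, yield_: float) -> float:
    return wafer_cost / (dies * yield_)


def defect_rate(yield_: float, area_cm2: float) -> float:
    """Invert the yield formula for defects per unit area."""
    return 2 * (1 / math.sqrt(yield_) - 1) / area_cm2


a1 = die_area(15, 84)
y1 = die_yield(0.020, a1)
print(a1, y1, cost_per_die(12, 84, y1))

a2 = die_area(20, 100)
y2 = die_yield(0.031, a2)
print(a2, y2, cost_per_die(15, 100, y2))

print(defect_rate(0.92, 2.0), defect_rate(0.95, 2.0))
```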

① CPI = Execution time / (#Instructions × Clock cycle time)
= 750 / (2.389 × 10^12 × 0.333 × 10^-9) = 0.94

SPEC ratio = Reference time / Execution time = 9650 / 750 = 12.87

CPU time = CPI × #Instructions × Clock cycle time

New CPU time = 1.1 × old CPU time = 1.1 × 750 = 825 s

New CPU time = 1.1 × 1.05 × 750 = 866.25 s

SPEC ratio = 9650 / 866.25 = 11.14


CPI = (700 × 4 × 10^9) / (0.85 × 2.389 × 10^12) = 1.38

SPEC ratio = 9650 / 700 = 13.8

CPI change = 1.38 / 0.94 − 1 = 0.468
CR change = 4 / 3 − 1 = 0.333
Not similar, since the instruction count has also been reduced.

1 − 700 / 750 = 0.07 = 7%

#Instructions = (New execution time × New CR) / CPI
= (0.9 × 960 × 4 × 10^9) / 1.61 ≈ 2147 × 10^9

CR = (CPI × #Instructions) / Execution time
= (1.61 × 2147 × 10^9) / (0.9 × 960) = 4 GHz

CR = (0.85 × 1.61 × 2147 × 10^9) / (0.8 × 960) = 3.83 GHz
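The CPI and SPEC-ratio formulas used in this problem are easy to check numerically. The sketch below uses illustrative function names and the benchmark figures that appear in the working (750 s execution time, 2.389 × 10^12 instructions, 0.333 ns cycle time, 9650 s reference time).

```python
def cpi(exec_time_s: float, instructions: float, cycle_time_s: float) -> float:
    """CPI = execution time / (instruction count * clock cycle time)."""
    return exec_time_s / (instructions * cycle_time_s)


def spec_ratio(reference_time_s: float, exec_time_s: float) -> float:
    """SPEC ratio = reference time / execution time."""
    return reference_time_s / exec_time_s


print(cpi(750, 2.389e12, 0.333e-9))
print(spec_ratio(9650, 750))

# 10% more instructions and 5% higher CPI scale the CPU time multiplicatively.
new_time = 1.1 * 1.05 * 750
print(new_time, spec_ratio(9650, new_time))
```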











CPU time = (#Instructions × CPI) / CR

①
CPU time_1 = (5 × 10^9 × 0.9) / (4 × 10^9) = 1.125 s
CPU time_2 = (1 × 10^9 × 0.75) / (3 × 10^9) = 0.25 s

②
CPU time_1 = (1 × 10^9 × 0.9) / (4 × 10^9) = 0.225 s
CPU time_1 = CPU time_2:
0.225 = (X × 0.75) / (3 × 10^9)
X = (0.225 × 3 × 10^9) / 0.75 = 0.9 × 10^9

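③ IPS = CR / CPI, MIPS = CR / (CPI × 10^6)
MIPS_1 = (4 × 10^9) / (0.9 × 10^6) = 4444
MIPS_2 = (3 × 10^9) / (0.75 × 10^6) = 4000

MFLOPS = (0.4 × #Instructions) / (CPU time × 10^6)
MFLOPS_1 = (0.4 × 5 × 10^9) / (1.125 × 10^6) = 1.778 × 10^3
MFLOPS_2 = (0.4 × 1 × 10^9) / (0.25 × 10^6) = 1.6 × 10^3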
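A short sketch of the CPU time, MIPS and MFLOPS calculations follows. Function names are illustrative. Note that MFLOPS divides the floating-point operation count by the execution time in seconds (0.25 s for the second processor), not by the CPI; the 40% floating-point fraction is taken from the 0.4 factor in the working.

```python
def cpu_time(instructions: float, cpi: float, clock_hz: float) -> float:
    return instructions * cpi / clock_hz


def mips(clock_hz: float, cpi: float) -> float:
    """Millions of instructions per second."""
    return clock_hz / (cpi * 1e6)


def mflops(fp_fraction: float, instructions: float, seconds: float) -> float:
    """Millions of floating-point operations per second."""
    return fp_fraction * instructions / (seconds * 1e6)


t1 = cpu_time(5e9, 0.9, 4e9)
t2 = cpu_time(1e9, 0.75, 3e9)
print(t1, t2)
print(mips(4e9, 0.9), mips(3e9, 0.75))
print(mflops(0.4, 5e9, t1), mflops(0.4, 1e9, t2))
```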

t_int = 250 − 70 − 85 − 40 = 55 s

① reduced t = 0.2 × 70 = 14 s
CPU time = 250 − 0.2 × 70 = 236 s

② 250 × 0.8 = 200 s
t_int = 55 − 50 = 5 s

③ 250 × 0.8 = 200 s
Without branch: 70 + 85 + 55 = 210 > 200, so it is impossible.



① CPU time = Σ (#Instructions × CPI) / CR
= (50 × 10^6 × 1 + 110 × 10^6 × 1 + 80 × 10^6 × 4 + 16 × 10^6 × 2) / (2 × 10^9) = 0.256 s

0.256 / 2 = (50 × 10^6 × x + 110 × 10^6 + 320 × 10^6 + 32 × 10^6) / (2 × 10^9)
x = −4.12, impossible

② 0.128 = (50 × 10^6 + 110 × 10^6 + 80 × 10^6 × x + 32 × 10^6) / (2 × 10^9)
x = 0.8

③ CPU time = (50 × 10^6 × 0.6 + 110 × 10^6 × 0.6 + 320 × 10^6 × 0.7 + 32 × 10^6 × 0.7) / (2 × 10^9) = 0.1712 s
(0.256 − 0.1712) / 0.256 × 100% = 33.125%
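The three parts of this problem all evaluate the same weighted sum, so a small helper makes the reasoning easier to follow. This is a sketch with illustrative names; the instruction mix and CPIs are the ones appearing in the working (FP 50M at CPI 1, INT 110M at CPI 1, load/store 80M at CPI 4, branch 16M at CPI 2, 2 GHz clock).

```python
CLOCK_HZ = 2e9
# class -> (instruction count, CPI)
MIX = {"fp": (50e6, 1), "int": (110e6, 1), "ls": (80e6, 4), "branch": (16e6, 2)}


def cpu_time(mix: dict, clock_hz: float = CLOCK_HZ) -> float:
    return sum(n * cpi for n, cpi in mix.values()) / clock_hz


def required_cpi(mix: dict, cls: str, target_s: float,
                 clock_hz: float = CLOCK_HZ) -> float:
    """CPI that class `cls` would need for the program to take target_s."""
    other_cycles = sum(n * cpi for k, (n, cpi) in mix.items() if k != cls)
    return (target_s * clock_hz - other_cycles) / mix[cls][0]


base = cpu_time(MIX)
print(base)
print(required_cpi(MIX, "fp", base / 2))  # negative, so not achievable
print(required_cpi(MIX, "ls", base / 2))

# INT and FP CPI reduced by 40%, load/store and branch CPI by 30%.
improved = {"fp": (50e6, 0.6), "int": (110e6, 0.6),
            "ls": (80e6, 4 * 0.7), "branch": (16e6, 2 * 0.7)}
print(cpu_time(improved), (base - cpu_time(improved)) / base * 100)
```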

ideal speedup = t_1 / (t_1 / p) = p
real speedup = t_1 / t_p

p = 2:
ideal = 100 / 50 = 2
real = 100 / 54 = 1.852

Speedups are relative to using a single processor with CPU time 100 s.





sub  x5, x28, x29    // i − j
slli x5, x5, 3       // × 8
add  x5, x10, x5     // x5 → &A[i − j]
ld   x6, 0(x5)       // x6 = A[i − j]
sd   x6, 64(x11)     // B[8]

B[g] = A[f] + A[f+1]

Little-endian: 12 ef cd ab
Big-endian: ab cd ef 12
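The two byte orders of the word 0xabcdef12 can be confirmed with Python's built-in integer-to-bytes conversion. This is only an illustration of the byte layout, listed from the lowest address upward.

```python
value = 0xABCDEF12

# bytes are listed from the lowest address to the highest
little = value.to_bytes(4, "little").hex(" ")
big = value.to_bytes(4, "big").hex(" ")

print("little-endian:", little)
print("big-endian:   ", big)
```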

slli x28, x28, 3      // i × 8
add  x30, x28, x10    // &A[i]
ld   x30, 0(x30)      // A[i]
slli x29, x29, 3      // j × 8
add  x31, x29, x10    // &A[j]
ld   x31, 0(x31)      // A[j]
add  x30, x30, x31    // A[i] + A[j]
sd   x30, 64(x11)     // B[8]

// &A[0]
// A[1] = &A
// &A
// f = &A + &A



0x5000000000000000, overflow
0xB000000000000000, no overflow
0xD000000000000000, overflow

No overflow when:
−2^63 − 1 < 128 + x6 < 2^63  ⇒  −2^63 − 129 < x6 < 2^63 − 128
−2^63 − 1 < 128 − x6 < 2^63  ⇒  −2^63 + 128 < x6 < 2^63 + 129
−2^63 − 1 < x6 − 128 < 2^63  ⇒  −2^63 + 127 < x6 < 2^63 + 128
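Signed 64-bit overflow can be modeled by doing the arithmetic exactly and then checking whether the result fits in the range −2^63 to 2^63 − 1. The sketch below does that. The register contents x5 = 0x8000000000000000 and x6 = 0xD000000000000000 are an assumption: they are not shown in the working but reproduce the three hexadecimal results listed there. Function names are illustrative.

```python
MASK64 = (1 << 64) - 1


def to_signed(bits: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement integer."""
    bits &= MASK64
    return bits - (1 << 64) if bits >> 63 else bits


def _wrap(exact: int):
    """Return (64-bit pattern, overflow flag) for an exact signed result."""
    overflow = not (-(1 << 63) <= exact < (1 << 63))
    return exact & MASK64, overflow


def add64(a_bits: int, b_bits: int):
    return _wrap(to_signed(a_bits) + to_signed(b_bits))


def sub64(a_bits: int, b_bits: int):
    return _wrap(to_signed(a_bits) - to_signed(b_bits))


X5 = 0x8000000000000000  # assumed register value
X6 = 0xD000000000000000  # assumed register value

r_add, o_add = add64(X5, X6)
r_sub, o_sub = sub64(X5, X6)
r_chain, o_second = add64(r_add, X5)  # add x30,x5,x6 then add x30,x30,x5

print(hex(r_add), o_add)
print(hex(r_sub), o_sub)
print(hex(r_chain), o_add or o_second)
```

In the chained case the second addition does not overflow by itself; the sequence counts as overflowing because its first addition already did.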

Type  imm[11:5]  rs2    rs1    funct3  imm[4:0]  opcode
S     0000001    00101  11110  011     00000     0100011

0x025F3023
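The S-type word can be reassembled from its fields to confirm the hexadecimal value. The helper below is a sketch with an illustrative name. The field values (immediate 32, rs2 = x5, rs1 = x30, funct3 = 011, opcode = 0100011) are obtained by decoding 0x025F3023 under the standard RISC-V S-type layout, which corresponds to `sd x5, 32(x30)`.

```python
def encode_s_type(imm: int, rs2: int, rs1: int, funct3: int, opcode: int) -> int:
    """Pack RISC-V S-type fields into a 32-bit instruction word."""
    imm &= 0xFFF                      # 12-bit immediate
    return (((imm >> 5) & 0x7F) << 25   # imm[11:5]
            | (rs2 & 0x1F) << 20
            | (rs1 & 0x1F) << 15
            | (funct3 & 0x7) << 12
            | (imm & 0x1F) << 7         # imm[4:0]
            | (opcode & 0x7F))


word = encode_s_type(imm=32, rs2=5, rs1=30, funct3=0b011, opcode=0b0100011)
print(f"0x{word:08X}")
```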

32 = 2^5, 128 = 2^7

First, increasing the size of each instruction field consumes more memory and makes the overall code size bigger.
Second, using more registers would reduce register spilling, which should allow more complex operations to be implemented in one instruction instead of requiring multiple instructions. In that case, the code size could decrease.


addi x7, x0, 63       // create bit mask of 111111
slli x7, x7, 11       // shift the mask to cover bits 11 to 16
and  x30, x5, x7      // extract bits from x5
slli x7, x7, 15       // move the mask to bits 26 to 31
xori x7, x7, -1       // NOT operation, 0b000000111…
and  x6, x6, x7       // clear bits 26 to 31 of x6
slli x30, x30, 15     // move the extracted field to bits 26 to 31
or   x6, x6, x30
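The mask-and-shift sequence can be mirrored step by step in Python to see what it does to the two registers. This is a sketch under the assumption that the intent is to copy bits 16..11 of x5 into bits 31..26 of x6, leaving the other bits of x6 unchanged. The function name is illustrative.

```python
MASK64 = (1 << 64) - 1


def replace_bits_31_26(x5: int, x6: int) -> int:
    """Copy bits 16..11 of x5 into bits 31..26 of x6 (64-bit registers)."""
    mask = 63 << 11                  # addi + slli: ones in bits 16..11
    field = x5 & mask                # and: extract the field from x5
    mask = (mask << 15) & MASK64     # slli: ones in bits 31..26
    mask = ~mask & MASK64            # xori with -1: invert the mask
    x6 &= mask                       # and: clear bits 31..26 of x6
    return x6 | ((field << 15) & MASK64)  # slli + or: insert the field


print(hex(replace_bits_31_26(0x3F << 11, 0)))
print(hex(replace_bits_31_26(0, MASK64)))
print(hex(replace_bits_31_26(0b101010 << 11, 0xFFFFFFFF)))
```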

xori x5, x6, -1

B: PC − 2^12 ~ PC + 2^12 − 2; jal: PC − 2^20 ~ PC + 2^20 − 2

①
−2^20 = −(100000)_16, 2^20 − 2 = (FFFFE)_16
(20000000 − 100000)_16 = (1FF00000)_16
(20000000 + FFFFE)_16 = (200FFFFE)_16
[1FF00000, 200FFFFE]

②
−2^12 = −(1000)_16, 2^12 − 2 = (FFE)_16
[1FFFF000, 20000FFE]
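The reachable address ranges follow from the width of the PC-relative offset. In RISC-V the offset of `jal` is a 21-bit signed value and that of a conditional branch is a 13-bit signed value, and in both the lowest bit is always 0, so the largest positive offset is 2^(n−1) − 2 rather than 2^(n−1) − 1. The sketch below assumes PC = 0x20000000, as in the working, and uses an illustrative function name.

```python
def pc_relative_range(pc: int, offset_bits: int):
    """Lowest and highest target for a signed, even, offset_bits-wide offset."""
    half = 1 << (offset_bits - 1)
    return pc - half, pc + half - 2


PC = 0x20000000

jal_lo, jal_hi = pc_relative_range(PC, 21)
br_lo, br_hi = pc_relative_range(PC, 13)

print(f"jal:    [0x{jal_lo:08X}, 0x{jal_hi:08X}]")
print(f"branch: [0x{br_lo:08X}, 0x{br_hi:08X}]")
```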

addi x7, x0, 0          // initialize i = 0
LOOP1:
bge  x7, x5, END1       // continue while i < a
addi x30, x10, 0        // x30 = &D[0]
addi x29, x0, 0         // j = 0
LOOP2:
bge  x29, x6, END2      // continue while j < b
add  x31, x7, x29       // x31 = i + j
sd   x31, 0(x30)        // *x30 = i + j → D[4*j] = i + j
addi x30, x30, 32       // x30 = &D[4*(j+1)]
addi x29, x29, 1        // j++
jal  x0, LOOP2
END2:
addi x7, x7, 1          // i++
jal  x0, LOOP1
END1:

1 + 12 × 10 + 1 = 122

int i;
int result = 0;
int x29 = 100;
do {
    i += *MemArray;     // x7 = Mem[0]; i += Mem[0]
    MemArray++;         // next element, Mem[1]
    result++;
} while (result < x29);

Rewrite:
addi x29, x10, 800      // &M[100]
LOOP:
ld   x7, 0(x10)
add  x5, x5, x7
addi x10, x10, 8
blt  x10, x29, LOOP



fib:
beq  x10, x0, done      // n = 0
addi x5, x0, 1
beq  x10, x5, done      // n = 1
addi sp, sp, -16        // allocate 2 words of stack space
sd   ra, 0(sp)          // save ra
sd   x10, 8(sp)         // save current n
addi x10, x10, -1
jal  ra, fib            // fib(n − 1)
ld   x5, 8(sp)          // load old n
sd   x10, 8(sp)         // push fib(n − 1) onto the stack
addi x10, x5, -2
jal  ra, fib            // call fib(n − 2)
ld   x5, 8(sp)          // x5 = fib(n − 1)
add  x10, x10, x5       // x10 = fib(n − 1) + fib(n − 2)
ld   ra, 0(sp)
addi sp, sp, 16
done:
jalr x0, 0(x1)

f:
addi sp, sp, -16
sd   ra, 0(sp)
add  x5, x12, x13       // x5 = c + d
sd   x5, 8(sp)          // save c + d
jal  ra, g              // x10 = g(a, b)
ld   x11, 8(sp)         // x11 = c + d
jal  ra, g              // x10 = g(g(a, b), c + d), tail call
ld   ra, 0(sp)
addi sp, sp, 16
jalr x0, 0(x1)

Optimization (tail call): drop the second jal ra, g, keep ld ra, 0(sp) and addi sp, sp, 16, and replace the final jalr x0, 0(x1) with jal x0, g.