Problem 1 A) Considering The Number of Instructions Here To Be A Constant A

             P1        P2        P3
Clock rate   3.0 GHz   2.5 GHz   4.0 GHz
CPI          1.5       1.0       2.2
Problem 1
a) Let the number of instructions be a constant a.
To compare the performance of the processors, we start from the CPU time of each:
$$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$
$$\Rightarrow \frac{\text{Instruction count}}{\text{CPU time}} = \frac{\text{Clock rate}}{\text{CPI}}$$
$$\Rightarrow \text{Instructions per second} = \frac{\text{Clock rate}}{\text{CPI}}$$
Processor P1:
$$\text{Instructions per second (P1)} = \frac{3.0 \times 10^{9}}{1.5} = 2.0 \times 10^{9} \text{ (instructions/s)}$$
Processor P2:
$$\text{Instructions per second (P2)} = \frac{2.5 \times 10^{9}}{1.0} = 2.5 \times 10^{9} \text{ (instructions/s)}$$
Processor P3:
$$\text{Instructions per second (P3)} = \frac{4.0 \times 10^{9}}{2.2} = 1.82 \times 10^{9} \text{ (instructions/s)}$$
In the same amount of time (1 second), P2 processes the greatest number of
instructions among the three processors. Hence, P2 has the highest
performance.
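The comparison in part a) can be checked with a short script. This is only a sketch: the dictionary layout and the function name `instructions_per_second` are illustrative choices, while the clock rates and CPI values come from the table at the top of Problem 1.

```python
# Clock rates (Hz) and CPI values from the Problem 1 table.
processors = {
    "P1": {"clock_rate": 3.0e9, "cpi": 1.5},
    "P2": {"clock_rate": 2.5e9, "cpi": 1.0},
    "P3": {"clock_rate": 4.0e9, "cpi": 2.2},
}


def instructions_per_second(clock_rate, cpi):
    """Instructions per second = clock rate / CPI."""
    return clock_rate / cpi


rates = {name: instructions_per_second(**p) for name, p in processors.items()}
fastest = max(rates, key=rates.get)

for name, rate in rates.items():
    print(f"{name}: {rate:.3g} instructions/s")
print("Highest performance:", fastest)
```

Because the instruction count is the same constant for all three processors, ranking by instructions per second is equivalent to ranking by CPU time.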
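b) From the CPU time formula above, the number of instructions is
$$\text{Instruction count} = \frac{\text{CPU time} \times \text{Clock rate}}{\text{CPI}}$$
Number of instructions executed by each processor in a CPU time of 10 s:
$$\text{Instruction count}_1 = \frac{10 \times 3.0 \times 10^{9}}{1.5} = 2.0 \times 10^{10} \text{ (instructions)}$$
$$\text{Instruction count}_2 = \frac{10 \times 2.5 \times 10^{9}}{1.0} = 2.5 \times 10^{10} \text{ (instructions)}$$
$$\text{Instruction count}_3 = \frac{10 \times 4.0 \times 10^{9}}{2.2} = 1.82 \times 10^{10} \text{ (instructions)}$$
Formula for the number of clock cycles:
$$\text{Clock cycles} = \text{CPI} \times \text{Instruction count}$$
Processor P1:
$$\text{Clock cycles}_1 = 1.5 \times 2.0 \times 10^{10} = 3.0 \times 10^{10} \text{ (cycles)}$$
Processor P2:
$$\text{Clock cycles}_2 = 1.0 \times 2.5 \times 10^{10} = 2.5 \times 10^{10} \text{ (cycles)}$$
Processor P3:
$$\text{Clock cycles}_3 = 2.2 \times 1.82 \times 10^{10} = 4.0 \times 10^{10} \text{ (cycles)}$$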
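c)
$$\text{CPI}' = 1.2\,\text{CPI}, \qquad \text{CPU time}' = 0.7\,\text{CPU time}$$
$$\text{CPU time}' = \frac{\text{Instruction count} \times \text{CPI}'}{\text{Clock rate}'}$$
$$\Rightarrow 0.7\,\text{CPU time} = \frac{\text{Instruction count} \times 1.2\,\text{CPI}}{\text{Clock rate}'}$$
Taking the ratio of CPU time to CPU time':
$$\frac{\text{CPU time}}{\text{CPU time}'} = \frac{\text{CPI} \times \text{Clock rate}'}{\text{CPI}' \times \text{Clock rate}} \;\Rightarrow\; \frac{1}{0.7} = \frac{1 \times \text{Clock rate}'}{1.2 \times \text{Clock rate}}$$
$$\Rightarrow \text{Clock rate}' = \frac{12}{7}\,\text{Clock rate}$$
Processor P1:
$$\text{Clock rate}' = \frac{12}{7} \times 3.0 = 5.14 \text{ (GHz)}$$
Processor P2:
$$\text{Clock rate}' = \frac{12}{7} \times 2.5 = 4.29 \text{ (GHz)}$$
Processor P3:
$$\text{Clock rate}' = \frac{12}{7} \times 4.0 = 6.86 \text{ (GHz)}$$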
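The scaling in part c) reduces to multiplying each clock rate by 1.2 / 0.7 = 12/7. A minimal sketch of that arithmetic follows; the function name `required_clock_rate` and its parameter names are illustrative assumptions.

```python
def required_clock_rate(clock_rate_ghz, cpi_factor=1.2, time_factor=0.7):
    """Clock rate needed when CPI scales by cpi_factor and CPU time by time_factor.

    From CPU time = instruction count * CPI / clock rate, with the
    instruction count unchanged.
    """
    return clock_rate_ghz * cpi_factor / time_factor


# Original clock rates in GHz from the Problem 1 table.
original = {"P1": 3.0, "P2": 2.5, "P3": 4.0}
new_rates = {name: required_clock_rate(f) for name, f in original.items()}

for name, f in new_rates.items():
    print(f"{name}: {f:.2f} GHz")
```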
Problem 2:
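Class A: $10^{6} \times 10\% = 10^{5}$ (instructions)
Class B: $10^{6} \times 20\% = 2 \times 10^{5}$ (instructions)
Class C: $10^{6} \times 50\% = 5 \times 10^{5}$ (instructions)
Class D: $10^{6} \times 20\% = 2 \times 10^{5}$ (instructions)
b)
$$\text{global CPI} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction count}_i}{\text{Instruction count}} \right)$$
$$\text{global CPI (1)} = \frac{(10^{5} \times 1) + (2 \times 10^{5} \times 2) + (5 \times 10^{5} \times 3) + (2 \times 10^{5} \times 3)}{10^{6}} = 2.6$$
$$\text{global CPI (2)} = \frac{(10^{5} \times 2) + (2 \times 10^{5} \times 2) + (5 \times 10^{5} \times 2) + (2 \times 10^{5} \times 2)}{10^{6}} = 2.0$$
a)
$$\text{CPU time} = \frac{\text{Instruction count} \times \text{global CPI}}{\text{Clock rate}}$$
$$\text{CPU time (1)} = \frac{10^{6} \times 2.6}{2.5 \times 10^{9}} = 1.04 \times 10^{-3} \text{ (s)}$$
$$\text{CPU time (2)} = \frac{10^{6} \times 2.0}{3.0 \times 10^{9}} = 0.67 \times 10^{-3} \text{ (s)}$$
Hence, the second implementation is faster.
c)
$$\text{Clock cycles} = \text{CPI} \times \text{Instruction count}$$
$$\text{Clock cycles (1)} = 2.6 \times 10^{6} \text{ (cycles)}$$
$$\text{Clock cycles (2)} = 2.0 \times 10^{6} \text{ (cycles)}$$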
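The weighted-average CPI and the resulting CPU times can be sketched as below. Note the assumptions: the per-class CPI values for classes C and D are not legible in the worked solution, so the values used here (3 and 3 for implementation 1, 2 and 2 for implementation 2) are assumed so that they reproduce the stated global CPIs of 2.6 and 2.0. All names are illustrative.

```python
INSTRUCTION_COUNT = 1.0e6
# Fraction of the instructions in each class (10%, 20%, 50%, 20%).
CLASS_SHARE = {"A": 0.10, "B": 0.20, "C": 0.50, "D": 0.20}

# ASSUMPTION: CPI values for classes C and D are chosen to reproduce the
# global CPIs 2.6 and 2.0 stated in the solution.
IMPLEMENTATIONS = {
    1: {"clock_rate": 2.5e9, "cpi": {"A": 1, "B": 2, "C": 3, "D": 3}},
    2: {"clock_rate": 3.0e9, "cpi": {"A": 2, "B": 2, "C": 2, "D": 2}},
}


def global_cpi(cpi_by_class):
    """Average CPI weighted by each class's share of the instructions."""
    return sum(cpi_by_class[c] * CLASS_SHARE[c] for c in CLASS_SHARE)


def cpu_time(impl):
    """CPU time = instruction count * global CPI / clock rate."""
    return INSTRUCTION_COUNT * global_cpi(impl["cpi"]) / impl["clock_rate"]


for number, impl in IMPLEMENTATIONS.items():
    print(number, round(global_cpi(impl["cpi"]), 2), f"{cpu_time(impl):.3e} s")
```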
Problem 3:
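a)
$$\text{CPI} = \frac{\text{CPU time}}{\text{Clock cycle time} \times \text{Instruction count}}$$
$$\text{CPI (A)} = \frac{1.1}{10^{9} \times 10^{-9}} = 1.1$$
$$\text{CPI (B)} = \frac{1.5}{1.2 \times 10^{9} \times 10^{-9}} = 1.25$$
b)
$$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$
$$\Rightarrow \frac{\text{CPU time (A)}}{\text{CPU time (B)}} = \frac{\text{Instruction count (A)}}{\text{Instruction count (B)}} \times \frac{\text{CPI (A)}}{\text{CPI (B)}} \times \frac{\text{Clock rate (B)}}{\text{Clock rate (A)}}$$
$$\Rightarrow \frac{\text{Clock rate (A)}}{\text{Clock rate (B)}} = \frac{\text{Instruction count (A)}}{\text{Instruction count (B)}} \times \frac{\text{CPI (A)}}{\text{CPI (B)}} \times \frac{\text{CPU time (B)}}{\text{CPU time (A)}}$$
With equal CPU times on the two processors:
$$\frac{\text{Clock rate (A)}}{\text{Clock rate (B)}} = \frac{10^{9}}{1.2 \times 10^{9}} \times \frac{1.1}{1.25} \times 1 \;\Rightarrow\; \text{Clock rate (A)} = 0.73\,\text{Clock rate (B)}$$
Hence, the clock of the processor running compiler B's code is 1/0.73 = 1.36 times
faster than the clock of the processor running compiler A's code.
c)
$$\text{CPU time (new compiler)} = \text{CPI} \times \text{Instruction count} \times \text{Clock cycle time} = 1.1 \times 6 \times 10^{8} \times 10^{-9} = 0.66 \text{ (s)}$$
(Clock cycle time $= 10^{-9}$ s because the processor is the same.)
$$\frac{\text{CPU time (A)}}{\text{CPU time (new compiler)}} = \frac{1.1}{0.66} = 1.67$$
$$\frac{\text{CPU time (B)}}{\text{CPU time (new compiler)}} = \frac{1.5}{0.66} = 2.27$$
Therefore, on that processor the new compiler's code runs 1.67 times faster than
compiler A's code and 2.27 times faster than compiler B's code.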
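The three parts of Problem 3 can be reproduced numerically. This is a sketch; the variable and function names are illustrative, and the input values (instruction counts, execution times, 1 ns cycle time, new compiler with 6 x 10^8 instructions at CPI 1.1) are the ones used in the calculations above.

```python
CYCLE_TIME = 1e-9  # seconds per clock cycle

compilers = {
    "A": {"instructions": 1.0e9, "time": 1.1},
    "B": {"instructions": 1.2e9, "time": 1.5},
}


def cpi(time, instructions, cycle_time=CYCLE_TIME):
    """CPI = CPU time / (clock cycle time * instruction count)."""
    return time / (cycle_time * instructions)


cpi_a = cpi(**compilers["A"])
cpi_b = cpi(**compilers["B"])

# Part b: equal CPU times, so the clock-rate ratio equals the cycle-count ratio.
clock_ratio_a_over_b = (compilers["A"]["instructions"] * cpi_a) / (
    compilers["B"]["instructions"] * cpi_b
)

# Part c: new compiler on the same processor.
new_time = 1.1 * 6.0e8 * CYCLE_TIME
speedup_vs_a = compilers["A"]["time"] / new_time
speedup_vs_b = compilers["B"]["time"] / new_time

print(cpi_a, cpi_b, clock_ratio_a_over_b, new_time, speedup_vs_a, speedup_vs_b)
```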
Problem 4:
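$$\text{Dynamic power} = \text{Capacitive load} \times \text{Voltage}^{2} \times \text{Frequency}$$
$$\Rightarrow \text{Capacitive load} = \frac{\text{Dynamic power}}{\text{Voltage}^{2} \times \text{Frequency}}$$
a)
$$\text{Capacitive load (Pentium 4 Prescott)} = \frac{90}{1.25^{2} \times 3.6 \times 10^{9}} = 1.6 \times 10^{-8} \text{ (F)}$$
$$\text{Capacitive load (Core i5 Ivy Bridge)} = \frac{40}{0.9^{2} \times 3.4 \times 10^{9}} = 1.45 \times 10^{-8} \text{ (F)}$$
b)
$$\%\,\text{static power} = \frac{\text{static power}}{\text{dynamic power} + \text{static power}}$$
$$\text{Pentium 4 Prescott: } \%\,\text{static power} = \frac{10}{90 + 10} = 10\%$$
$$\text{Core i5 Ivy Bridge: } \%\,\text{static power} = \frac{30}{40 + 30} = 42.86\%$$
Ratio of static power to dynamic power:
$$\text{Pentium 4 Prescott: } \frac{10}{90} = \frac{1}{9} = 0.11$$
$$\text{Core i5 Ivy Bridge: } \frac{30}{40} = \frac{3}{4} = 0.75$$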
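Parts a) and b) follow directly from the dynamic power formula as written in this solution (without a 1/2 factor). The sketch below recomputes them; the dictionary keys and function names are illustrative assumptions.

```python
chips = {
    "Pentium 4 Prescott": {"freq": 3.6e9, "volt": 1.25, "static": 10.0, "dynamic": 90.0},
    "Core i5 Ivy Bridge": {"freq": 3.4e9, "volt": 0.9, "static": 30.0, "dynamic": 40.0},
}


def capacitive_load(chip):
    """C = dynamic power / (V^2 * F), following the formula used in this solution."""
    return chip["dynamic"] / (chip["volt"] ** 2 * chip["freq"])


def static_fraction(chip):
    """Static power as a fraction of total (dynamic + static) power."""
    return chip["static"] / (chip["dynamic"] + chip["static"])


def static_to_dynamic(chip):
    """Ratio of static power to dynamic power."""
    return chip["static"] / chip["dynamic"]


for name, chip in chips.items():
    print(
        f"{name}: C = {capacitive_load(chip):.3g} F, "
        f"static share = {static_fraction(chip):.2%}, "
        f"static/dynamic = {static_to_dynamic(chip):.2f}"
    )
```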
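c)
$$\text{total power} = \text{dynamic power} + \text{static power} = \text{Capacitive load} \times \text{Voltage}^{2} \times \text{Frequency} + \text{Voltage} \times \text{leakage current}$$
$$\Rightarrow \text{leakage current} = \frac{\text{total power} - \text{Capacitive load} \times \text{Voltage}^{2} \times \text{Frequency}}{\text{Voltage}}$$
After the total power is reduced by 10%:
$$\text{leakage current}' = \frac{\text{total power}' - \text{Capacitive load} \times (\text{Voltage}')^{2} \times \text{Frequency}}{\text{Voltage}'}$$
And the leakage current is unchanged:
$$\frac{\text{total power} - \text{Capacitive load} \times \text{Voltage}^{2} \times \text{Frequency}}{\text{Voltage}} = \frac{\text{total power}' - \text{Capacitive load} \times (\text{Voltage}')^{2} \times \text{Frequency}}{\text{Voltage}'}$$
Pentium 4 Prescott:
$$\frac{(90 + 10) - 1.6 \times 10^{-8} \times 1.25^{2} \times 3.6 \times 10^{9}}{1.25} = \frac{(90 + 10) \times 0.9 - 1.6 \times 10^{-8} \times (\text{Voltage}')^{2} \times 3.6 \times 10^{9}}{\text{Voltage}'}$$
$$\Rightarrow 8 = \frac{90 - 57.6\,(\text{Voltage}')^{2}}{\text{Voltage}'}$$
$$\Rightarrow 57.6\,(\text{Voltage}')^{2} + 8\,\text{Voltage}' - 90 = 0$$
$$\Rightarrow \text{Voltage}' = 1.182 \text{ (V)}$$
Percentage of voltage reduced:
$$\frac{\text{Voltage} - \text{Voltage}'}{\text{Voltage}} = \frac{1.25 - 1.182}{1.25} = 5.44\%$$
Core i5 Ivy Bridge:
$$\frac{(40 + 30) - 1.45 \times 10^{-8} \times 0.9^{2} \times 3.4 \times 10^{9}}{0.9} = \frac{(40 + 30) \times 0.9 - 1.45 \times 10^{-8} \times (\text{Voltage}')^{2} \times 3.4 \times 10^{9}}{\text{Voltage}'}$$
Using the unrounded capacitive load, Capacitive load × Frequency = 40 / 0.9² ≈ 49.4:
$$\Rightarrow 33.3 = \frac{63 - 49.4\,(\text{Voltage}')^{2}}{\text{Voltage}'}$$
$$\Rightarrow 49.4\,(\text{Voltage}')^{2} + 33.3\,\text{Voltage}' - 63 = 0$$
$$\Rightarrow \text{Voltage}' = 0.841 \text{ (V)}$$
Percentage of voltage reduced:
$$\frac{\text{Voltage} - \text{Voltage}'}{\text{Voltage}} = \frac{0.9 - 0.841}{0.9} \approx 6.5\%$$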
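The quadratic in part c) can be solved directly. The following sketch assumes, as the derivation does, that the frequency and the leakage current stay fixed while the voltage changes; the function name `reduced_voltage` and its parameters are illustrative.

```python
import math


def reduced_voltage(dynamic, static, volt, reduction=0.10):
    """Voltage giving a total power of (1 - reduction) * original total.

    Assumes constant frequency and constant leakage current, so the new
    voltage v solves: cf * v**2 + leakage * v - target = 0.
    """
    leakage = static / volt      # leakage current, I = P_static / V
    cf = dynamic / volt ** 2     # capacitive load * frequency
    target = (dynamic + static) * (1.0 - reduction)
    discriminant = leakage ** 2 + 4.0 * cf * target
    # Only the positive root is physically meaningful.
    return (-leakage + math.sqrt(discriminant)) / (2.0 * cf)


v_p4 = reduced_voltage(dynamic=90.0, static=10.0, volt=1.25)
v_i5 = reduced_voltage(dynamic=40.0, static=30.0, volt=0.9)

print(f"Pentium 4 Prescott: {v_p4:.3f} V")
print(f"Core i5 Ivy Bridge: {v_i5:.3f} V")
```

Working from the dynamic power rather than a rounded capacitive load avoids carrying rounding error into the quadratic's coefficients.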
Problem 5:
a)
We have the equation:
$$\text{clock cycles} = \text{number of instructions} \times \text{CPI}$$
Because there are three types of instructions:
$$\text{clock cycles} = \sum_{i=1}^{3} (\text{number of instructions of type } i) \times \text{CPI}_i$$
Hence, for a single processor, we have:
$$\text{clock cycles} = (2.56 \times 10^{9}) \times 1 + (1.28 \times 10^{9}) \times 12 + (256 \times 10^{6}) \times 5 = 1.92 \times 10^{10}$$
Then,
$$\text{execution time} = \frac{\text{clock cycles}}{\text{clock rate}} = \frac{1.92 \times 10^{10}}{2 \times 10^{9}} = 9.6 \text{ (s)}$$
Let p be the number of processors (p > 1). We have:
$$\text{clock cycles} = \frac{2.56 \times 10^{9}}{0.7p} \times 1 + \frac{1.28 \times 10^{9}}{0.7p} \times 12 + 256 \times 10^{6} \times 5 = \frac{2.56 \times 10^{10}}{p} + 1.28 \times 10^{9}$$
hence,
$$\text{execution time} = \frac{\text{clock cycles}}{\text{clock rate}} = \frac{\frac{2.56 \times 10^{10}}{p} + 1.28 \times 10^{9}}{2 \times 10^{9}} = \frac{12.8}{p} + 0.64$$
Finally, we tabulate the results:
p                                    1      2      4      8
execution time in seconds            9.6    7.04   3.84   2.24
speed-up (relative to 1 processor)   1      1.36   2.5    4.29
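The table in part a) can be regenerated with a small function. This is a sketch under the model used above (arithmetic and load/store instruction counts divided by 0.7p for p > 1, branch count unchanged); the names `COUNTS`, `CPI`, and `execution_time` are illustrative.

```python
CLOCK_RATE = 2e9  # Hz
COUNTS = {"arith": 2.56e9, "load_store": 1.28e9, "branch": 256e6}
CPI = {"arith": 1, "load_store": 12, "branch": 5}


def execution_time(p, cpi=CPI):
    """Execution time in seconds on p processors.

    For p > 1 the arithmetic and load/store counts per processor are
    divided by 0.7 * p; the branch count is not parallelized.
    """
    scale = 1.0 if p == 1 else 0.7 * p
    parallel = COUNTS["arith"] * cpi["arith"] + COUNTS["load_store"] * cpi["load_store"]
    serial = COUNTS["branch"] * cpi["branch"]
    return (parallel / scale + serial) / CLOCK_RATE


base = execution_time(1)
for p in (1, 2, 4, 8):
    t = execution_time(p)
    print(f"p={p}: time={t:.2f} s, speed-up={base / t:.2f}")
```

Passing a different `cpi` dictionary (for example with the arithmetic CPI set to 2) reuses the same function for the variation in part b).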
b)
With the CPI of the arithmetic instructions doubled, for one processor we have:
$$\text{clock cycles} = (2.56 \times 10^{9}) \times 2 + (1.28 \times 10^{9}) \times 12 + (256 \times 10^{6}) \times 5 = 2.18 \times 10^{10}$$
$$\text{execution time} = \frac{\text{clock cycles}}{\text{clock rate}} = \frac{2.18 \times 10^{10}}{2 \times 10^{9}} = 10.9 \text{ (s)}$$
Let p be the number of processors (p > 1). We have:
$$\text{clock cycles} = \frac{2.56 \times 10^{9}}{0.7p} \times 2 + \frac{1.28 \times 10^{9}}{0.7p} \times 12 + 256 \times 10^{6} \times 5 = \frac{2.93 \times 10^{10}}{p} + 1.28 \times 10^{9}$$
hence,
$$\text{execution time} = \frac{\text{clock cycles}}{\text{clock rate}} = \frac{\frac{2.93 \times 10^{10}}{p} + 1.28 \times 10^{9}}{2 \times 10^{9}} = \frac{14.63}{p} + 0.64$$
Finally, we tabulate the results:
p                                        1      2      4      8
execution time in seconds                10.9   7.95   4.30   2.47
ratio to the execution time in part a)   1.13   1.13   1.12   1.10
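c)
This means that the execution time on one processor (with the reduced CPI_2) must
equal the execution time on four processors with the original CPI values. So we have:
$$\text{execution time}_{\text{new}} = 3.84 \text{ (s)}$$
Because the clock rate remains unchanged, and
$$\text{execution time}_{\text{new}} = \frac{\text{clock cycles}_{\text{new}}}{\text{clock rate}}$$
we have:
$$\frac{\text{clock cycles}_{\text{new}}}{2 \text{ GHz}} = 3.84 \text{ (s)} \;\Rightarrow\; \text{clock cycles}_{\text{new}} = 2 \times 10^{9} \times 3.84 = 7.68 \times 10^{9}$$
Then,
$$\text{clock cycles}_{\text{new}} = (2.56 \times 10^{9}) \times 1 + (1.28 \times 10^{9}) \times \text{CPI}_{2,\text{new}} + (256 \times 10^{6}) \times 5 = 3.84 \times 10^{9} + 1.28 \times 10^{9} \times \text{CPI}_{2,\text{new}} = 7.68 \times 10^{9}$$
Hence,
$$\text{CPI}_{2,\text{new}} = \frac{7.68 \times 10^{9} - 3.84 \times 10^{9}}{1.28 \times 10^{9}} = 3$$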
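The last step of part c) is a single linear equation in the new load/store CPI. A minimal sketch with illustrative variable names:

```python
CLOCK_RATE = 2e9        # Hz
TARGET_TIME = 3.84      # seconds, the four-processor time from part a)

ARITH, LOAD_STORE, BRANCH = 2.56e9, 1.28e9, 256e6   # instruction counts
CPI_ARITH, CPI_BRANCH = 1, 5                         # unchanged CPI values

# Cycle budget that gives the target time on a single processor.
target_cycles = TARGET_TIME * CLOCK_RATE

# Subtract the cycles spent on arithmetic and branch instructions,
# then divide what is left among the load/store instructions.
new_load_store_cpi = (
    target_cycles - ARITH * CPI_ARITH - BRANCH * CPI_BRANCH
) / LOAD_STORE

print(round(new_load_store_cpi, 2))
```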
Problem 6:
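a)
First, we obtain the die areas:
$$\text{Die area}_1 = \frac{\text{Wafer area}_1}{\text{Die count}_1} = \frac{\pi (7.5)^{2}}{84} = 2.104 \text{ cm}^{2}$$
$$\text{Die area}_2 = \frac{\pi (10)^{2}}{100} = 3.14 \text{ cm}^{2}$$
Plug into the yield equation:
$$\text{Yield}_1 = \frac{1}{\left(1 + \text{Defect rate}_1 \times \frac{\text{Die area}_1}{2}\right)^{2}} = \frac{1}{(1 + 0.020 \times 0.5 \times 2.104)^{2}} = 0.96$$
$$\text{Yield}_2 = \frac{1}{(1 + 0.031 \times 0.5 \times 3.14)^{2}} = 0.91$$
b)
Cost per die:
$$\text{Cost per die}_1 = \frac{\text{Cost per wafer}_1}{\text{Dies per wafer}_1 \times \text{Yield}_1} = \frac{12}{84 \times 0.96} = 0.149$$
$$\text{Cost per die}_2 = \frac{15}{100 \times 0.91} = 0.165$$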
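c)
=> The number of dies per wafer is increased by 10%:
$$\text{Die area}_1 = \frac{\text{Wafer area}_1}{\text{Die count}_1 \times 1.1} = \frac{\pi (7.5)^{2}}{84 \times 1.1} = 1.91 \text{ cm}^{2}$$
$$\text{Die area}_2 = \frac{\pi (10)^{2}}{110} = 2.86 \text{ cm}^{2}$$
=> The defects per unit area increase by 15%, and the yield uses the new die areas:
$$\text{Yield}_1 = \frac{1}{\left(1 + \text{Defect rate}_1 \times 1.15 \times \frac{\text{Die area}_1}{2}\right)^{2}} = \frac{1}{(1 + 0.020 \times 1.15 \times 0.5 \times 1.91)^{2}} = 0.96$$
$$\text{Yield}_2 = \frac{1}{(1 + 0.031 \times 1.15 \times 0.5 \times 2.86)^{2}} = 0.91$$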
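d) The die area is 200 mm² = 2 cm².
The yield is then given by:
$$\text{Yield} = \frac{1}{\left(1 + \text{Defect rate} \times \frac{2}{2}\right)^{2}} = \frac{1}{(1 + \text{Defect rate})^{2}}$$
Solving for the defect rate, we have:
$$\text{Defect rate} = \frac{1}{\sqrt{\text{Yield}}} - 1$$
$$\text{Previous: Defect rate} = \frac{1}{\sqrt{0.92}} - 1 = 0.043 \text{ defects/cm}^{2}$$
$$\text{New: Defect rate} = \frac{1}{\sqrt{0.95}} - 1 = 0.026 \text{ defects/cm}^{2}$$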
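The die area, yield, cost, and defect-rate formulas of Problem 6 can be collected into a few helper functions. This is a sketch; the function names are illustrative, and the wafer radii (7.5 cm and 10 cm), die counts, defect rates, and wafer costs (12 and 15) are the ones appearing in the calculations above.

```python
import math


def die_area(radius_cm, dies):
    """Die area in cm^2 = wafer area / number of dies."""
    return math.pi * radius_cm ** 2 / dies


def die_yield(defects_per_cm2, area_cm2):
    """Yield = 1 / (1 + defect rate * die area / 2)^2."""
    return 1.0 / (1.0 + defects_per_cm2 * area_cm2 / 2.0) ** 2


def cost_per_die(wafer_cost, dies, y):
    """Cost per die = wafer cost / (dies per wafer * yield)."""
    return wafer_cost / (dies * y)


def defect_rate_for_yield(y, area_cm2):
    """Invert the yield formula for the defect rate."""
    return (1.0 / math.sqrt(y) - 1.0) * 2.0 / area_cm2


area1, area2 = die_area(7.5, 84), die_area(10.0, 100)
yield1, yield2 = die_yield(0.020, area1), die_yield(0.031, area2)
cost1, cost2 = cost_per_die(12, 84, yield1), cost_per_die(15, 100, yield2)

print(round(area1, 3), round(area2, 2))
print(round(yield1, 2), round(yield2, 2))
print(round(cost1, 3), round(cost2, 3))
print(round(defect_rate_for_yield(0.92, 2.0), 3), round(defect_rate_for_yield(0.95, 2.0), 3))
```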
Problem 7.
Instruction count   Execution time   Reference time
2.389E12            750 s            9650 s
a.
- The clock cycle time is 0.333 ns; find the CPI.
$$\text{CPI} = \frac{\text{Execution time}}{\text{Instruction count} \times \text{Clock cycle time}} = \frac{750}{(2.389 \times 10^{12}) \times (0.333 \times 10^{-9})} = 0.94$$
b.
$$\text{SPEC ratio} = \frac{\text{Reference time}}{\text{Execution time}} = \frac{9650}{750} = 12.87$$
c.
$$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$
=> CPU time is proportional to the instruction count, so increasing the instruction
count by 10% without affecting the clock rate or the CPI increases the CPU time by 10%.
d.
- CPU time after increasing the instruction count by 10% and the CPI by 5%:
$$\text{CPU time} = \frac{(1.1 \times \text{Instruction count}) \times (1.05 \times \text{CPI})}{\text{Clock rate}} = 1.155 \times \text{CPU time (old)}$$
So the CPU time increases by 15.5%.

e.
$$\text{SPEC ratio} = \frac{\text{Reference time}}{\text{CPU time}}$$
$$\frac{\text{SPEC ratio (after)}}{\text{SPEC ratio (before)}} = \frac{\text{CPU time (before)}}{\text{CPU time (after)}} = \frac{1}{1.155} = 0.866$$
So the SPEC ratio decreases by about 13.4%.
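Parts a to e can be checked with a few lines of arithmetic. This is a sketch with illustrative variable names; the inputs are the instruction count, execution time, reference time, and 0.333 ns clock cycle given above.

```python
INSTRUCTION_COUNT = 2.389e12
EXECUTION_TIME = 750.0      # seconds
REFERENCE_TIME = 9650.0     # seconds
CYCLE_TIME = 0.333e-9       # seconds

# a. CPI from execution time, instruction count, and cycle time.
cpi = EXECUTION_TIME / (INSTRUCTION_COUNT * CYCLE_TIME)

# b. SPEC ratio is dimensionless: reference time over execution time.
spec_ratio = REFERENCE_TIME / EXECUTION_TIME

# d. 10% more instructions and 5% higher CPI multiply the CPU time.
time_factor = 1.10 * 1.05

# e. SPEC ratio scales inversely with CPU time.
spec_factor = 1.0 / time_factor

print(round(cpi, 2), round(spec_ratio, 2), round(time_factor, 3), round(spec_factor, 3))
```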

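f.
$$\text{CPI} = \frac{\text{CPU time} \times \text{Clock rate}}{\text{Instruction count}} = \frac{700 \times 4 \times 10^{9}}{0.85 \times 2389 \times 10^{9}} = 1.38$$
g.
AMD version   Clock rate (GHz)   CPI
Before        3                  0.94
After         4                  1.38
- The clock rate ratio between the two versions:
$$\frac{\text{Clock rate (after)}}{\text{Clock rate (before)}} = \frac{4}{3} = 1.33$$
- The CPI ratio between the two versions (from the unrounded CPI values):
$$\frac{\text{CPI (after)}}{\text{CPI (before)}} = \frac{1.38}{0.94} \approx 1.46$$
=> The increase in CPI differs from the increase in clock rate because the instruction
count also changed: it was reduced by 15%, so the CPU time was reduced, but by a
smaller percentage.
h.
- The reduction in CPU time:
$$\frac{\text{CPU time (after)}}{\text{CPU time (before)}} = \frac{700}{750} = 0.933$$
so the CPU time is reduced by 6.7%.
i.
Clock rate (GHz)   CPI    Instruction count   CPU time (ns)
4                  1.61   ?                   960
- CPU time after a 10% reduction: 0.9 × 960 = 864 ns
$$\text{Instruction count} = \frac{\text{CPU time} \times \text{Clock rate}}{\text{CPI}} = \frac{864 \times 10^{-9} \times 4 \times 10^{9}}{1.61} \approx 2147$$
j.
$$\text{Clock rate} = \frac{\text{Instruction count} \times \text{CPI}}{\text{CPU time}}$$
- To reduce the CPU time by 10% (to 0.9 of its old value) with the same instruction
count and CPI, the clock rate must increase to
$$\frac{1}{0.9} \times \text{Clock rate (old)} = \frac{1}{0.9} \times 3 \text{ GHz} = 3.33 \text{ GHz}$$
k.
- The CPI is reduced by 15%: CPI (new) = 0.85 CPI (old)
- The CPU time is reduced by 20%: CPU time (new) = 0.8 CPU time (old)
$$\text{New clock rate} = \frac{\text{Instruction count} \times (0.85\,\text{CPI})}{0.8\,\text{CPU time}} = \frac{0.85}{0.8} \times \text{Clock rate (old)} = \frac{0.85}{0.8} \times 3 = 3.1875 \text{ GHz}$$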
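The scaling arguments in parts f, j, and k all come from CPU time = instruction count × CPI / clock rate. A brief sketch of those three calculations, with illustrative names:

```python
OLD_CLOCK_GHZ = 3.0
OLD_INSTRUCTION_COUNT = 2.389e12

# f. New version: 4 GHz clock, 15% fewer instructions, 700 s execution time.
new_cpi = (700.0 * 4.0e9) / (0.85 * OLD_INSTRUCTION_COUNT)

# j. Same instruction count and CPI, CPU time scaled to 0.9 of the old value.
clock_for_10_percent_faster = OLD_CLOCK_GHZ / 0.9


def scaled_clock_rate(old_clock_ghz, cpi_factor, time_factor):
    """New clock rate when CPI and CPU time are scaled and the instruction count is fixed."""
    return old_clock_ghz * cpi_factor / time_factor


# k. CPI scaled to 0.85, CPU time scaled to 0.8.
clock_for_part_k = scaled_clock_rate(OLD_CLOCK_GHZ, cpi_factor=0.85, time_factor=0.8)

print(round(new_cpi, 2), round(clock_for_10_percent_faster, 2), clock_for_part_k)
```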
Problem 8.
     Clock rate (GHz)   Instruction count (E9)   CPI
P1   4                  5                        0.9
P2   3                  1                        0.75
a.
$$\text{Execution time (P1)} = \frac{5 \times 10^{9} \times 0.9}{4 \times 10^{9}} = 1.125 \text{ s}$$
$$\text{Execution time (P2)} = \frac{1 \times 10^{9} \times 0.75}{3 \times 10^{9}} = 0.25 \text{ s}$$
=> The fallacy does not hold: although Processor 1 has a higher clock rate than
Processor 2, its execution time is longer than that of Processor 2.
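b.
- The execution time of Processor 1 to process 1.0E9 instructions:
$$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} = \frac{1.0 \times 10^{9} \times 0.9}{4 \times 10^{9}} = 0.225 \text{ s}$$
- The number of instructions that Processor 2 can process in 0.225 s:
$$\text{Instruction count} = \frac{\text{CPU time} \times \text{Clock rate}}{\text{CPI}} = \frac{0.225 \times 3 \times 10^{9}}{0.75} = 9 \times 10^{8} \text{ instructions}$$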
c.
Calculate the millions of instructions per second (MIPS) of the two processors:
$$\text{MIPS (Processor 1)} = \frac{4 \times 10^{9}}{0.9 \times 10^{6}} = 4444.44$$
$$\text{MIPS (Processor 2)} = \frac{3 \times 10^{9}}{0.75 \times 10^{6}} = 4000$$
From part (a) of this problem, the performance ratio of the two processors is:
$$\frac{\text{Performance (P1)}}{\text{Performance (P2)}} = \frac{\text{Execution time (P2)}}{\text{Execution time (P1)}} = \frac{0.25}{1.125} = 0.22$$
So Processor 1's performance is lower than Processor 2's performance.
=> Although Processor 1 has the larger MIPS, we determined in part (a) that
Processor 2 has the better performance.
d.
$$\text{MFLOPS} = \frac{\text{Number of FP operations}}{\text{Execution time} \times 10^{6}}$$
$$\text{MFLOPS (Processor 1)} = \frac{40\% \times 5 \times 10^{9}}{1.125 \times 10^{6}} = 1.78 \times 10^{3}$$
$$\text{MFLOPS (Processor 2)} = \frac{40\% \times 1 \times 10^{9}}{0.25 \times 10^{6}} = 1.6 \times 10^{3}$$
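The execution time, MIPS, and MFLOPS figures of Problem 8 can be recomputed as follows. This is a sketch; the function names are illustrative, and the 40% floating-point share is the one used in part d.

```python
procs = {
    "P1": {"clock": 4e9, "count": 5e9, "cpi": 0.9},
    "P2": {"clock": 3e9, "count": 1e9, "cpi": 0.75},
}


def exec_time(count, cpi, clock):
    """Execution time = instruction count * CPI / clock rate."""
    return count * cpi / clock


def mips(clock, cpi):
    """MIPS = clock rate / (CPI * 10^6)."""
    return clock / (cpi * 1e6)


def mflops(fp_fraction, count, time):
    """MFLOPS = number of FP operations / (execution time * 10^6)."""
    return fp_fraction * count / (time * 1e6)


times = {n: exec_time(p["count"], p["cpi"], p["clock"]) for n, p in procs.items()}

# Part b: time for P1 to run 1.0E9 instructions, and what P2 does in that time.
p1_time_1e9 = exec_time(1.0e9, procs["P1"]["cpi"], procs["P1"]["clock"])
p2_count = p1_time_1e9 * procs["P2"]["clock"] / procs["P2"]["cpi"]

for n, p in procs.items():
    print(n, times[n], round(mips(p["clock"], p["cpi"]), 2),
          round(mflops(0.4, p["count"], times[n])))
print(p1_time_1e9, p2_count)
```

The output illustrates the point of parts a and c: P1 has both the higher clock rate and the higher MIPS, yet the longer execution time on its own workload.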
Problem 9:
a)
New time spent on FP operations:
(1 - 0.2) x 70 = 56 (s)
Total time reduced by:
70 - 56 = 14 (s)
or (14 : 250) x 100 = 5.6%
b)
The total time is reduced by 20% ⇒ 250 x (1 - 0.2) = 200 (s)
Then, the time available for INT operations is: 200 - 70 - 85 - 40 = 5 (s)
while the time originally needed is: 250 - 70 - 85 - 40 = 55 (s)
Hence, the time for INT operations is reduced by: ((55 - 5) : 55) x 100 = 91%
c)
Assume that the time spent on branch operations is eliminated entirely.
The execution time is then: 55 + 70 + 85 = 210 (s)
So the reduction is: 1 - (210 : 250) = 0.16 = 16%
Hence, the total time cannot be reduced by 20% by decreasing only the time of
branch operations.
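The three parts of Problem 9 are simple bookkeeping on the time breakdown. A sketch with illustrative variable names, using the 250 s total and the 70 s, 85 s, and 40 s components from above:

```python
TOTAL, FP, LOAD_STORE, BRANCH = 250.0, 70.0, 85.0, 40.0
INT = TOTAL - FP - LOAD_STORE - BRANCH   # remaining time is INT: 55 s

# a) FP time reduced by 20%.
saved = 0.2 * FP
total_reduction_a = saved / TOTAL

# b) Total time reduced by 20% by changing only the INT time.
target_total = 0.8 * TOTAL
new_int = target_total - FP - LOAD_STORE - BRANCH
int_reduction = (INT - new_int) / INT

# c) Best case for branches: their time drops to zero.
max_branch_reduction = BRANCH / TOTAL
can_reach_20_percent = max_branch_reduction >= 0.20

print(f"a) {total_reduction_a:.1%}")
print(f"b) {int_reduction:.0%}")
print(f"c) {max_branch_reduction:.0%}, reachable: {can_reach_20_percent}")
```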
Problem 10:
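a)
The program executes:
$50 \times 10^{6}$ FP instructions
$110 \times 10^{6}$ INT instructions
$80 \times 10^{6}$ L/S instructions
$16 \times 10^{6}$ branch instructions
$$\text{execution time} = \frac{\sum_{i=1}^{4} (\text{number of instructions})_i \times \text{CPI}_i}{\text{clock rate}}$$
$$= \frac{50 \times 10^{6} \times 1 + 110 \times 10^{6} \times 1 + 80 \times 10^{6} \times 4 + 16 \times 10^{6} \times 2}{2 \times 10^{9}} = 0.256 \text{ (s)}$$
b)
We want the program to run two times faster
⇒ the execution time = 0.256 / 2 = 0.128 (s)
$$\text{execution time} = \frac{\sum_{i=1}^{4} (\text{number of instructions})_i \times \text{new CPI}_i}{\text{clock rate}}$$
Solve for the new CPI of the FP instructions (the $50 \times 10^{6}$ term), with the
other CPI values unchanged:
$$\text{CPI}_{\text{FP}} = \frac{256 \times 10^{6} - 462 \times 10^{6}}{50 \times 10^{6}} = -4.12 \text{ (not possible)}$$
Therefore, it is impossible for the program to run two times faster.
c)
The CPI of INT and FP instructions is reduced by 40%:
$$\text{CPI}_{\text{INT}} = (1 - 0.4) \times 1 = 0.6$$
$$\text{CPI}_{\text{FP}} = (1 - 0.4) \times 1 = 0.6$$
The CPI of L/S and branch instructions is reduced by 30%:
$$\text{CPI}_{\text{L/S}} = (1 - 0.3) \times 4 = 2.8$$
$$\text{CPI}_{\text{BRANCH}} = (1 - 0.3) \times 2 = 1.4$$
Then,
$$\text{execution time} = \frac{\sum_{i=1}^{4} (\text{number of instructions})_i \times \text{CPI}_i}{\text{clock rate}}$$
$$= \frac{50 \times 10^{6} \times 0.6 + 110 \times 10^{6} \times 0.6 + 80 \times 10^{6} \times 2.8 + 16 \times 10^{6} \times 1.4}{2 \times 10^{9}} = 0.1712 \text{ (s)}$$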
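The three parts of Problem 10 reuse one execution-time formula with different CPI tables. The sketch below assumes the 2 GHz clock rate implied by the 0.256 s result; all names are illustrative.

```python
CLOCK_RATE = 2e9  # Hz (assumed from the 0.256 s result above)
COUNTS = {"FP": 50e6, "INT": 110e6, "LS": 80e6, "BRANCH": 16e6}
BASE_CPI = {"FP": 1, "INT": 1, "LS": 4, "BRANCH": 2}


def execution_time(cpi):
    """Sum of (instruction count * CPI) over all classes, divided by the clock rate."""
    return sum(COUNTS[k] * cpi[k] for k in COUNTS) / CLOCK_RATE


# a) Baseline execution time.
base_time = execution_time(BASE_CPI)

# b) FP CPI needed to halve the time with every other CPI unchanged.
other_cycles = sum(COUNTS[k] * BASE_CPI[k] for k in COUNTS if k != "FP")
required_fp_cpi = (base_time / 2 * CLOCK_RATE - other_cycles) / COUNTS["FP"]

# c) INT and FP CPI reduced by 40%, L/S and branch CPI reduced by 30%.
improved_cpi = {"FP": 0.6, "INT": 0.6, "LS": 2.8, "BRANCH": 1.4}
improved_time = execution_time(improved_cpi)

print(base_time, required_fp_cpi, improved_time)
```

A negative `required_fp_cpi` confirms part b): the non-FP instructions alone already take more cycles than the halved time allows.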
