21020512 - Mai Ngọc Duy

MÔN HỌC: KIẾN TRÚC MÁY TÍNH Họ và tên: Mai Ngọc Duy
MÃ HỌC PHẦN: INT2212 42 Lớp: K66 CACLC2

MSV: 21020512
Bài tập về nhà Tuần 2
***
* * * *
Exercise 2.1:
We use the formula: Execution time = Instruction count × Clock cycles per instruction × Clock cycle
time.
Execution-time1 = (10 × 1 + 3×2 + 4×3 + 3×4) × 1/2GHz = 39 × 1/2GHz = 19.5 ns
Execution-time2 = (20 × 1 + 3×2 + 2×3 + 1×4) × 1/2GHz = 36 × 1/2GHz = 18 ns
Now we calculate the MIPS:

1
MIPS1 = ≈ 51.28 MIPS
19.5 ns
1
MIPS2 = ≈ 55.56 MIPS
18 ns
 The second optimization results in faster program.
Exercise 2.2:
30
Fraction of CPU which gets enhanced = = 0.30
100
50
Fraction of CPU in I/O activity = = 0.50
100
Speed up for enhancement given = 10
Therefore, the overall speed up is = 1 / (0.50 + (0.30/10)) = 1.889.
Exercise 2.3:
a) To combine rates of execution one must use a weighted harmonic mean (WHM).
The WHM is defined as
1
WHM = n
W
∑ Ri
i=1 i
where Wi and Ri are the weights and rate for program i within the suite respectively.
One must not use an arithmetic mean or a simple harmonic mean, as these do not lead to a
correct computation of the overall execution rate for the collection of programs. Each weight W is
defined as the ratio of the amount of work performed in program i to the total amount of work
performed in the whole suite of programs.
Thus, if program i executes Ni floating-point operations then, assuming m programs in total;
Ni
Wi= m
or,
∑Nj
j=1
N 1+ N 2+ N 3+ …+ N m
WHM = or,
t 1 +t 2+t 3 +…+t m
t 1 r 1 +t 2 r 2 +t 3 r 3 +…+ t m r m
WHM =
t 1+ t 2 +t 3+ …+t m
m
We know that T =∑ t j
j=1
Using the above two equations, we can say that

t1 t t
WHM = r 1+ 2 r 2 +…+ m r m
T T T
b) For this example, WHM = (0.1 × 1000 + 0.55 × 110 + 0.2 × 500 + 0.1 × 200 + 0.05 × 75), which is
284.25 MFLOPS.
c) Consider P2 in isolation. If all other programs run at rate rꝏ then:

R2
lim ( WHM )=
r→∞ W2
Thus the upper bound on the benchmark performance is simply R2/W2. We therefore have 284:25 =
0:55 = 516:82 mflops.
Exercise 2.4:
The overall speed up we can achieve:
Execution time(old ) ( 0.4 ×15 )+(1−0.4)
Overall speed up= = =6.6
Execution time(new) 0.4+(1−0.4)
Exercise 2.5:
a)
- The design constraints are: The constraints are that total system cost must be ≥$50k and ≤$100k, and
total system power consumption must be <50kW.
- The quantity or quantities to maximize is/are: Since the goal of the product is to be a cost-
performance leader, we’d better try to maximize its cost-performance!
- The quantity or quantities to minimize is/are: System cost should be minimized, within the ≥$50k
constraint.
b) (i) the constraints that our design must satisfy:

Constraints on total system cost: Csys = nchipsCX ≥ Csys,minCsys = nchipsCX ≤ Csys,max
Constraints on total system power: Psys = nchipsPX ≤ Psys,max
(ii) the quantity that we should be maximizing in our design (figure of merit),
Maximize system cost-performance: CPsys = Tsys/Csys = TXnchips/CXnchips = TX/CX.
(iii) the quantity that we should be minimizing in our design (figure of demerit)
Minimize system cost (within the minimum cost constraint): Csys = nchipsCX = Csys,min/CX×CX
(This goal is secondary to minimizing the cost per unit of throughput, C X/TX, or maximizing cost-
performance)
c) (i)
CPsys,A =TA/CA = 80 w.u./s/$2.75k = 29.09 w.u./s/$1k
CPsys,B =TB/CB = 50 w.u./s/$1.6k = 31.25 w.u./s/$1k
-> Chip B gives better cost-performance (31¼ work units per second per $1,000 of cost, versus only
29.09 for chip A).
(ii) Since we want to minimize system cost C sys,B = nchipsCB, we must minimize nchips, within the
constraint that nchipsCB ≥ $50k. Thus, the optimal (minimum) value of nchips is given by
nchips,opt = $50k/CB = $50k/$1.6k = 31.25 = 32.
(Since 32×50W = 1.6kW < 50 kW, the power constraint is met.)
(iii) If we had used chip A, then we would have had

nchips,opt = $50k/CA = $50k/$2,750 = 18.18 = 19, and Csys,A = 19×$2,750 = $52,250.
With chip B, we have Csys,B = 32×$1,600 = $51,200.
So chip B also lets us achieve a design with lower total cost that still meets the $50,000 minimum cost
constraint.
Exercise 2.6:
Amdahl’s law is based on a fixed workload, where the problem size is fixed regardless of the machine
size. Gustafson’s law is based on a scaled workload, where the problem size is increased with the
machine size so that the solution time is the same for sequential and parallel executions.
Exercise 2.7:
a) Say Program P1 consists of n x86 instructions, and hence 1.5 × n MIPS instructions. Computer A
operates at 2.5 GHz, i.e. it takes 0.4ns per clock. So the time it takes to execute P1 is 0.4ns/clock × 2
clocks/instructions × 1.5 n instructions = 1.2 n ns. Computer B operates at 3 GHz, i.e. 0.333ns per
clock, so it executes P1 in 0.333 × 3 × n = n ns. So Computer B is 1.2 times faster.
b) Going by similar calculations, A executes P2 in 0.4 × 1 × 1.5 n = 0.6 × n ns while B executes it in

0.333 × 2 × n = 0.666 n ns hence A is 1.11 times faster.
Exercise 2.8:
a) CPI = 0.1 × 50 + 0.9 × 1 = 5.9
b) Time spent just doing divides:

0.1 * 50 × 100 / 5.9 = 84.75%
c) Divide would now take 25 cycles, so

CPI = 0.1 × 25 + 0.9 × 1 = 3.4. Speed up = 5.9/3.4 = 1.735x
d) Divide would now take 10 cycles, so

CPI = 0.1 × 10 + 0.9 × 1 = 1.9. Speed up = 5.9/1.9 = 3.105x
e) Divide would now take 5 cycles, so

CPI = 0.1 × 5 + 0.9 × 1 = 1.4. Speed up = 5.9/1.4 = 4.214x
f) Divide would now take 1 cycle, so

CPI = 0.1 × 1 + 0.9 × 1 = 1. Speed up = 5.9/1 = 5.9x
g) Divide would now take 0 cycles, so

CPI = 0.1 × 0 + 0.9 × 1 = 0.9. Speed up = 5.9/.9 = 6.55x
Exercise 2.9:
Divider execution time = 200 × 60% = 120 (ms)

Others’ execution time = 200 – 120 = 80 (ms)
Total execution time after speed-up = 200 / 2.5 = 80 (ms)
In order to have speed-up of 2.5 on this system, the divider execution time should be Zero, which
means the divider speed-up should be infinity, which is not possible.
Exercise 2.10:
Original Approach 1 Approach 2

Swap 2x FP 4x
Add 10% 1 1 1
Subtract 10% 1 1 1
Integer Multiply 7.5% 3 3 3
Swap 20% 2 1 2
Floating point 2.25% 10 10 2.5
Operation
Others 50.25% 1.5 1.5 1.5
Average CPI (Original) = 10%×1 + 10%×1+ 7.5%×3 + 20%×2 + 2.25%×10 + 50.25% × 1.5 =
180.375%
Average CPI (Approach1) = 10%×1 + 10%×1+ 7.5%×3 + 20%×1 + 2.25%×10 + 50.25% × 1.5 =
160.375%
Average CPI (Approach2) = 10%×1 + 10%×1+ 7.5%×3 + 20%×2 + 2.25%×2.5 + 50.25% × 1.5 =
163.5%
It is clear that Approach 1 with swapping twice as fast is a better choice.

21020512 - Mai Ngọc Duy

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

21020512 - Mai Ngọc Duy

Uploaded by

Copyright:

Available Formats

MÔN HỌC: KIẾN TRÚC MÁY TÍNH Họ và tên: Mai Ngọc Duy

MÃ HỌC PHẦN: INT2212 42 Lớp: K66 CACLC2

Now we calculate the MIPS:

Using the above two equations, we can say that

c) Consider P2 in isolation. If all other programs run at rate rꝏ then:

b) (i) the constraints that our design must satisfy:

(iii) If we had used chip A, then we would have had

b) Going by similar calculations, A executes P2 in 0.4 × 1 × 1.5 n = 0.6 × n ns while B executes it in

b) Time spent just doing divides:

c) Divide would now take 25 cycles, so

d) Divide would now take 10 cycles, so

e) Divide would now take 5 cycles, so

f) Divide would now take 1 cycle, so

g) Divide would now take 0 cycles, so

Divider execution time = 200 × 60% = 120 (ms)

Original Approach 1 Approach 2

You might also like