
Problem 1.

[4] <§1.6> Consider three different processors P1, P2, and P3 executing the same
instruction set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock
rate and a CPI of 1.0. P3 has a 4.0 GHz clock rate and has a CPI of 2.2.

a. Which processor has the highest performance expressed in instructions per
second?

b. If the processors each execute a program in 10 seconds, find the number of
cycles and the number of instructions.

c. We are trying to reduce the execution time by 30%, but this leads to an increase
of 20% in the CPI. What clock rate should we have to get this time reduction?

Step-by-step solution
Step 1/7
Consider three different processors P1, P2, and P3 that execute the same
instruction set, with the following clock rates and CPIs:

Table 1: Processors with clock rate and CPI.

Processor   Clock rate   CPI
P1          3.0 GHz      1.5
P2          2.5 GHz      1.0
P3          4.0 GHz      2.2

Here, CPI denotes the clock cycles per instruction.

Step 2/7
a.

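The performance of each processor, in instructions per second, is calculated by
using the following formula:

$\text{Performance} = \dfrac{\text{Clock rate}}{\text{CPI}}$

For processor P1:

$\text{Performance}_{P1} = \dfrac{3 \times 10^{9}}{1.5} = 2.0 \times 10^{9}$

Thus, the performance of processor P1 is $2.0 \times 10^{9}$ instructions per second.

For processor P2:

$\text{Performance}_{P2} = \dfrac{2.5 \times 10^{9}}{1.0} = 2.5 \times 10^{9}$

Thus, the performance of processor P2 is $2.5 \times 10^{9}$ instructions per second.

For processor P3:

$\text{Performance}_{P3} = \dfrac{4 \times 10^{9}}{2.2} \approx 1.82 \times 10^{9}$

Thus, the performance of processor P3 is approximately $1.82 \times 10^{9}$ instructions per second.

Performance here is measured in instructions per second, so the processor with the
largest value performs best. Among the 3 processors, P2 executes the most
instructions per second, even though P3 has the highest clock rate.

Thus, the processor P2 has the highest performance expressed in
instructions per second.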
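
The comparison in part a can be checked with a short script. This is a minimal sketch: the dictionary layout and the function name `instructions_per_second` are illustrative choices, and only the clock rates and CPIs come from the problem statement.

```python
# Clock rates (Hz) and CPIs from the problem statement.
processors = {
    "P1": {"clock_hz": 3.0e9, "cpi": 1.5},
    "P2": {"clock_hz": 2.5e9, "cpi": 1.0},
    "P3": {"clock_hz": 4.0e9, "cpi": 2.2},
}


def instructions_per_second(clock_hz, cpi):
    """Instruction throughput = clock rate / CPI."""
    return clock_hz / cpi


ips = {
    name: instructions_per_second(p["clock_hz"], p["cpi"])
    for name, p in processors.items()
}
fastest = max(ips, key=ips.get)

for name, value in ips.items():
    print(f"{name}: {value:.3e} instructions/s")
print("Highest throughput:", fastest)
```

The processor with the highest clock rate is not necessarily the one with the highest throughput, because CPI divides the clock rate.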

Step 3/7
b.

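Assume that the CPU time for executing the program on each processor is 10 seconds.

The number of cycles and the number of instructions for each processor are
calculated by using the following formulae:

$\text{Clock cycles} = \text{CPU time} \times \text{Clock rate}$

$\text{Instructions} = \dfrac{\text{Clock cycles}}{\text{CPI}}$

For processor P1:

$\text{Clock cycles} = 10 \times 3 \times 10^{9} = 3 \times 10^{10}$

Thus, the number of cycles for processor P1 is $3 \times 10^{10}$.

$\text{Instructions} = \dfrac{3 \times 10^{10}}{1.5} = 2 \times 10^{10}$

Thus, the number of instructions for processor P1 is $2 \times 10^{10}$.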

For processor P2:

Step 4/7
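$\text{Clock cycles} = 10 \times 2.5 \times 10^{9} = 2.5 \times 10^{10}$

Thus, the number of cycles for processor P2 is $2.5 \times 10^{10}$.

$\text{Instructions} = \dfrac{2.5 \times 10^{10}}{1.0} = 2.5 \times 10^{10}$

Thus, the number of instructions for processor P2 is $2.5 \times 10^{10}$.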

For processor P3:


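$\text{Clock cycles} = 10 \times 4 \times 10^{9} = 4 \times 10^{10}$

Thus, the number of cycles for processor P3 is $4 \times 10^{10}$.

$\text{Instructions} = \dfrac{4 \times 10^{10}}{2.2} \approx 1.82 \times 10^{10}$

Thus, the number of instructions for processor P3 is approximately $1.82 \times 10^{10}$.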
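
The cycle and instruction counts for part b follow from two multiplications, sketched below. The helper name `cycles_and_instructions` and the `specs` dictionary are illustrative assumptions; the inputs are the clock rates, CPIs, and 10 s run time given in the problem.

```python
def cycles_and_instructions(clock_hz, cpi, cpu_time_s):
    """Return (clock cycles, instructions) for a run of cpu_time_s seconds."""
    cycles = cpu_time_s * clock_hz   # cycles = time x clock rate
    instructions = cycles / cpi      # instructions = cycles / CPI
    return cycles, instructions


# (clock rate in Hz, CPI) for each processor, from the problem statement.
specs = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}

results = {
    name: cycles_and_instructions(clock_hz, cpi, 10.0)
    for name, (clock_hz, cpi) in specs.items()
}

for name, (cycles, instructions) in results.items():
    print(f"{name}: {cycles:.3e} cycles, {instructions:.3e} instructions")
```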

Step 5/7
c.

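Consider the old CPU time of 10 seconds.

Now, calculate the new CPU time as follows:

The time is decreased by 30%.

$\text{New CPU time} = 10 \times (1 - 0.30) = 7 \text{ s}$

So, the new CPU time is 7 s.

CPI is increased by 20%.

So, $\text{New CPI} = 1.2 \times \text{Old CPI}$.

Calculate the clock rate needed to get the time reduction by using the following formula:

$\text{Clock rate} = \dfrac{\text{Instructions} \times \text{New CPI}}{\text{New CPU time}}$

Calculate the number of cycles and the number of instructions of each processor for
the original 10 s run by using the following formulae:

$\text{Clock cycles} = \text{CPU time} \times \text{Clock rate}$

$\text{Instructions} = \dfrac{\text{Clock cycles}}{\text{CPI}}$

For processor P1:

$\text{Clock cycles} = 10 \times 3 \times 10^{9} = 3 \times 10^{10}$

Thus, the number of cycles for processor P1 is $3 \times 10^{10}$.

$\text{Instructions} = \dfrac{3 \times 10^{10}}{1.5} = 2 \times 10^{10}$

Thus, the number of instructions for processor P1 is $2 \times 10^{10}$.

$\text{Clock rate} = \dfrac{2 \times 10^{10} \times (1.2 \times 1.5)}{7} \approx 5.14 \times 10^{9} \text{ Hz}$

Thus, the clock rate for processor P1 is approximately 5.14 GHz.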


Step 6/7
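For processor P2:

$\text{Clock cycles} = 10 \times 2.5 \times 10^{9} = 2.5 \times 10^{10}$

Thus, the number of cycles for processor P2 is $2.5 \times 10^{10}$.

$\text{Instructions} = \dfrac{2.5 \times 10^{10}}{1.0} = 2.5 \times 10^{10}$

Thus, the number of instructions for processor P2 is $2.5 \times 10^{10}$.

$\text{Clock rate} = \dfrac{2.5 \times 10^{10} \times (1.2 \times 1.0)}{7} \approx 4.29 \times 10^{9} \text{ Hz}$

Thus, the clock rate for processor P2 is approximately 4.29 GHz.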

Step 7/7
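For processor P3:

$\text{Clock cycles} = 10 \times 4 \times 10^{9} = 4 \times 10^{10}$

Thus, the number of cycles for processor P3 is $4 \times 10^{10}$.

$\text{Instructions} = \dfrac{4 \times 10^{10}}{2.2} \approx 1.82 \times 10^{10}$

Thus, the number of instructions for processor P3 is approximately $1.82 \times 10^{10}$.

$\text{Clock rate} = \dfrac{1.82 \times 10^{10} \times (1.2 \times 2.2)}{7} \approx 6.86 \times 10^{9} \text{ Hz}$

Thus, the clock rate for processor P3 is approximately 6.86 GHz.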
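
Because the instruction count does not change, part c reduces to scaling each original clock rate: from time = instructions × CPI / clock rate, the new clock rate equals the old one multiplied by 1.2 / 0.7. The sketch below applies that factor; the function name `required_clock_hz` and its parameter names are illustrative assumptions.

```python
def required_clock_hz(old_clock_hz, time_factor=0.7, cpi_factor=1.2):
    """Clock rate needed when CPU time scales by time_factor and CPI by cpi_factor.

    time = instructions * CPI / clock, with the instruction count held constant,
    so new_clock = old_clock * cpi_factor / time_factor.
    """
    return old_clock_hz * cpi_factor / time_factor


# Original clock rates (Hz) from the problem statement.
old_clocks = {"P1": 3.0e9, "P2": 2.5e9, "P3": 4.0e9}
new_clocks = {name: required_clock_hz(f) for name, f in old_clocks.items()}

for name, f in new_clocks.items():
    print(f"{name}: {f / 1e9:.2f} GHz")
```

The result does not depend on the original CPI, since the CPI appears on both sides of the ratio and cancels.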

Problem 1.7

[15] <§1.6> Compilers can have a profound impact on the performance of an
application. Assume that for a program, compiler A results in a dynamic instruction
count of 1.0E9 and has an execution time of 1.1 s, while compiler B results in a
dynamic instruction count of 1.2E9 and an execution time of 1.5 s.

a. Find the average CPI for each program given that the processor has a clock cycle
time of 1 ns.

b. Assume the compiled programs run on two different processors. If the execution
times on the two processors are the same, how much faster is the clock of the
processor running compiler A’s code versus the clock of the processor running
compiler B’s code?

c. A new compiler is developed that uses only 6.0E8 instructions and has an
average CPI of 1.1. What is the speedup of using this new compiler versus using
compiler A or B on the original processor?

Step-by-step solution
Step 1/5
Consider the two compilers A and B. The execution time and dynamic instruction
count for each compiler are as follows:

Compiler   Execution Time (s)   Instruction Count
A          1.1                  1.0E9
B          1.5                  1.2E9

The clock cycle time of the processor is 1 ns ($10^{-9}$ s).

Step 2/5
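To calculate the CPI for each compiler, use the following formula:

$\text{CPI} = \dfrac{\text{Execution time}}{\text{Instruction count} \times \text{Clock cycle time}}$

For compiler A:

CPI for compiler A when the cycle time is 1 ns is as follows:

$\text{CPI}_A = \dfrac{1.1}{1.0 \times 10^{9} \times 1 \times 10^{-9}} = 1.1$

Thus, the average CPI for compiler A is 1.1.

For compiler B:

CPI for compiler B when the cycle time is 1 ns is as follows:

$\text{CPI}_B = \dfrac{1.5}{1.2 \times 10^{9} \times 1 \times 10^{-9}} = 1.25$

Thus, the average CPI for compiler B is 1.25.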
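
The two CPI values can be reproduced with a one-line function. This is a sketch only; `average_cpi` is an illustrative name, and the inputs are the execution times, instruction counts, and 1 ns cycle time from the problem statement.

```python
def average_cpi(exec_time_s, instruction_count, cycle_time_s):
    """CPI = execution time / (instruction count x clock cycle time)."""
    return exec_time_s / (instruction_count * cycle_time_s)


CYCLE_TIME_S = 1e-9  # 1 ns clock cycle

cpi_a = average_cpi(1.1, 1.0e9, CYCLE_TIME_S)
cpi_b = average_cpi(1.5, 1.2e9, CYCLE_TIME_S)

print(f"CPI for compiler A: {cpi_a:.2f}")
print(f"CPI for compiler B: {cpi_b:.2f}")
```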

Step 3/5
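Consider that the execution times on the two different processors are the same.

The formula to calculate the execution time of a processor is as follows:

$\text{Execution time} = \dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$

If the execution times on the two processors are the same, then we have,

$\dfrac{I_A \times \text{CPI}_A}{f_A} = \dfrac{I_B \times \text{CPI}_B}{f_B}$

$\dfrac{f_B}{f_A} = \dfrac{I_B \times \text{CPI}_B}{I_A \times \text{CPI}_A} = \dfrac{1.2 \times 10^{9} \times 1.25}{1.0 \times 10^{9} \times 1.1} \approx 1.36$

Thus, the clock of the processor running compiler B’s code is 1.36 times faster
than the clock of the processor running compiler A’s code. Equivalently, the clock
for compiler A’s code runs at about 0.73 times the rate of the clock for compiler B’s code.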
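
The direction of the clock ratio is easy to get backwards, so it helps to compute it explicitly. With equal execution times, the program that needs more total cycles (instruction count × CPI) must run on the faster clock. The sketch below uses the illustrative name `clock_ratio_b_over_a`, with the CPI values 1.1 and 1.25 from part a.

```python
def clock_ratio_b_over_a(ic_a, cpi_a, ic_b, cpi_b):
    """Return f_B / f_A when both programs take the same execution time.

    Equal times mean ic_a * cpi_a / f_a == ic_b * cpi_b / f_b,
    so f_b / f_a == (ic_b * cpi_b) / (ic_a * cpi_a).
    """
    return (ic_b * cpi_b) / (ic_a * cpi_a)


ratio_b_over_a = clock_ratio_b_over_a(1.0e9, 1.1, 1.2e9, 1.25)
ratio_a_over_b = 1.0 / ratio_b_over_a

print(f"f_B / f_A = {ratio_b_over_a:.2f}")
print(f"f_A / f_B = {ratio_a_over_b:.2f}")
```

Compiler B's code needs 1.5E9 cycles against 1.1E9 for compiler A's code, so the processor running B's code is the one that needs the faster clock.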

Step 4/5
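Consider a new compiler.

The number of instructions for the new compiler = 6.0E8 = $6.0 \times 10^{8}$

The average CPI for the new compiler = 1.1

The execution time for the new compiler on the original processor (1 ns cycle time) is:

$\text{Execution time}_{new} = 6.0 \times 10^{8} \times 1.1 \times 1 \times 10^{-9} = 0.66 \text{ s}$

The speedup is the ratio of the old execution time to the new execution time.

For compiler A:

Execution time for compiler A = 1.1 seconds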

Step 5/5
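$\text{Speedup}_A = \dfrac{1.1}{6.0 \times 10^{8} \times 1.1 \times 1 \times 10^{-9}} = \dfrac{1.1}{0.66} \approx 1.67$

Thus, the speedup of using the new compiler versus compiler A is 1.67.

For compiler B:

Execution time for compiler B = 1.5 seconds

$\text{Speedup}_B = \dfrac{1.5}{0.66} \approx 2.27$

Thus, the speedup of using the new compiler versus compiler B is 2.27.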
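
The speedups for part c can be verified as follows. This is a minimal sketch with illustrative names (`execution_time_s`, `speedup`); it assumes the new compiler's code runs on the original processor with its 1 ns cycle time.

```python
def execution_time_s(instruction_count, cpi, cycle_time_s):
    """Execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


def speedup(old_time_s, new_time_s):
    """Speedup = old execution time / new execution time."""
    return old_time_s / new_time_s


new_time = execution_time_s(6.0e8, 1.1, 1e-9)  # new compiler, 1 ns cycle
speedup_vs_a = speedup(1.1, new_time)
speedup_vs_b = speedup(1.5, new_time)

print(f"New execution time: {new_time:.2f} s")
print(f"Speedup vs A: {speedup_vs_a:.2f}")
print(f"Speedup vs B: {speedup_vs_b:.2f}")
```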

Problem 1.6

[20] <§1.6> Consider two different implementations of the same instruction set
architecture. The instructions can be divided into four classes according to their CPI
(classes A, B, C, and D). P1 with a clock rate of 2.5 GHz and CPIs of 1, 2, 3, and 3,
and P2 with a clock rate of 3 GHz and CPIs of 2, 2, 2, and 2. Given a program with a
dynamic instruction count of 1.0E6 instructions divided into classes as follows: 10%
class A, 20% class B, 50% class C, and 20% class D, which is faster: P1 or P2?

a. What is the global CPI for each implementation?

b. Find the clock cycles required in both cases.


Step-by-step solution
Step 1/7
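Consider the four classes of instructions A, B, C, and D. The clock rate and CPI of
each implementation are given in the table below:

Processor   Clock rate   CPI (A)   CPI (B)   CPI (C)   CPI (D)
P1          2.5 GHz      1         2         3         3
P2          3 GHz        2         2         2         2

Assume the number of instructions (I) = 1.0E6 = $10^{6}$.

Consider the classes as follows:

Class A: 10% of I = $1 \times 10^{5}$ instructions
Class B: 20% of I = $2 \times 10^{5}$ instructions
Class C: 50% of I = $5 \times 10^{5}$ instructions
Class D: 20% of I = $2 \times 10^{5}$ instructions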

Step 2/7
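Calculate the CPU time for processor P1:

$\text{CPU time} = \dfrac{\sum_i (\text{CPI}_i \times I_i)}{\text{Clock rate}}$

$\text{CPU time}_{P1} = \dfrac{(1 \times 10^{5} \times 1) + (2 \times 10^{5} \times 2) + (5 \times 10^{5} \times 3) + (2 \times 10^{5} \times 3)}{2.5 \times 10^{9}} = \dfrac{2.6 \times 10^{6}}{2.5 \times 10^{9}} = 1.04 \times 10^{-3} \text{ s}$

Therefore, the CPU time of processor P1 = 1.04 milliseconds … (1)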

Step 3/7
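Calculate the CPU time for processor P2:

$\text{CPU time}_{P2} = \dfrac{(1 \times 10^{5} \times 2) + (2 \times 10^{5} \times 2) + (5 \times 10^{5} \times 2) + (2 \times 10^{5} \times 2)}{3 \times 10^{9}} = \dfrac{2 \times 10^{6}}{3 \times 10^{9}} \approx 0.67 \times 10^{-3} \text{ s}$

Therefore, the CPU time of processor P2 = 0.67 milliseconds … (2)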

The performance is inversely proportional to CPU Time. The Processor taking least
CPU time performs better.

From equations (1) and (2), it is observed that processor P2 is faster than
processor P1.

Step 4/7
a)

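Calculate the global CPI for processor P1:

$\text{CPI} = \dfrac{\text{CPU time} \times \text{Clock rate}}{\text{Number of instructions}}$

Processor P1, clock rate = $2.5 \times 10^{9}$ Hz and number of instructions = $10^{6}$

From equation (1), the CPU time of processor P1 = $1.04 \times 10^{-3}$ s

$\text{CPI}_{P1} = \dfrac{1.04 \times 10^{-3} \times 2.5 \times 10^{9}}{10^{6}} = 2.6$

Therefore, the global CPI of processor P1 = 2.6 … (3)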

Step 5/7
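Calculate the global CPI for processor P2:

Processor P2, clock rate = $3 \times 10^{9}$ Hz and number of instructions = $10^{6}$

From equation (2), the CPU time of processor P2 = $\dfrac{2 \times 10^{6}}{3 \times 10^{9}}$ s $\approx 0.67 \times 10^{-3}$ s

$\text{CPI}_{P2} = \dfrac{2 \times 10^{6}}{3 \times 10^{9}} \times \dfrac{3 \times 10^{9}}{10^{6}} = 2.0$

Substituting the rounded value 0.67 ms gives 2.01, which is a rounding artifact. Every
instruction class on P2 has a CPI of 2, so the global CPI is exactly 2.0.

Therefore, the global CPI of processor P2 = 2.0 … (4)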

Step 6/7
b)

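Calculate the number of clock cycles for Processor P1:

$\text{Clock cycles}_{P1} = \text{CPI}_{P1} \times I = 2.6 \times 10^{6}$

Therefore, the number of clock cycles of Processor P1 = $2.6 \times 10^{6}$.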

Step 7/7
Calculate the number of clock cycles for Processor P2:

$\text{Clock cycles}_{P2} = \text{CPI}_{P2} \times I = 2.0 \times 10^{6}$

Therefore, the number of clock cycles of Processor P2 = $2.0 \times 10^{6}$.
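
The global CPI is a weighted average of the per-class CPIs, with the instruction mix as weights. The sketch below computes it directly from the mix, which avoids the rounding that arises when working back from a rounded CPU time. The names `MIX`, `global_cpi`, and `summarize` are illustrative assumptions.

```python
# Instruction mix for classes A, B, C, D (fractions of the dynamic count).
MIX = (0.10, 0.20, 0.50, 0.20)
INSTRUCTIONS = 1.0e6


def global_cpi(class_cpis, mix=MIX):
    """Weighted-average CPI over the instruction classes."""
    return sum(fraction * cpi for fraction, cpi in zip(mix, class_cpis))


def summarize(class_cpis, clock_hz):
    """Return (global CPI, clock cycles, CPU time in seconds)."""
    cpi = global_cpi(class_cpis)
    cycles = cpi * INSTRUCTIONS
    return cpi, cycles, cycles / clock_hz


p1 = summarize((1, 2, 3, 3), 2.5e9)
p2 = summarize((2, 2, 2, 2), 3.0e9)

for name, (cpi, cycles, time_s) in (("P1", p1), ("P2", p2)):
    print(f"{name}: CPI={cpi:.2f}, cycles={cycles:.3e}, time={time_s * 1e3:.3f} ms")
```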

[5] <§1.2> The eight great ideas in computer architecture are similar to ideas from
other fields. Match the eight ideas from computer architecture, “Design for Moore’s
Law,” “Use Abstraction to Simplify Design,” “Make the Common Case Fast,”
“Performance via Parallelism,” “Performance via Pipelining,” “Performance via
Prediction,” “Hierarchy of Memories,” and “Dependability via Redundancy” to the
following ideas from other fields:

a. Assembly lines in automobile manufacturing

b. Suspension bridge cables

c. Aircraft and marine navigation systems that incorporate wind information

d. Express elevators in buildings

e. Library reserve desk

f. Increasing the gate area on a CMOS transistor to decrease its switching time

g. Adding electromagnetic aircraft catapults (which are electrically powered as
opposed to current steam-powered models), allowed by the increased power
generation offered by the new reactor technology

h. Building self-driving cars whose control systems partially rely on existing sensor
systems already installed into the base vehicle, such as lane departure systems and
smart cruise control systems

Step-by-step solution
Step 1/1
There are 8 great ideas in computer architecture that are similar to ideas from other
fields. These ideas are matched with the ideas from other fields as follows:

a. Assembly lines in automobile manufacturing use the idea of pipelining. Each
station performs one stage of the work, and as soon as a car moves on to the next
stage, the station starts on the following car.

Therefore, the statement ‘a’ matches with the term “Performance via Pipelining”.

b. Suspension bridge cables resemble the Performance via Parallelism idea because
each cable holds part of the total weight of the bridge. This is similar to multiple
processes performing work at the same time.

Therefore, the statement ‘b’ matches with the term “Performance via Parallelism”.

c. Aircraft and marine navigation systems incorporate wind information, which is
similar to the idea of performance via prediction because they use wind speed
predictions to generate better routes.

Therefore, the statement ‘c’ matches with the term “Performance via Prediction”.


d. Express elevators in a building do not stop at every floor. Instead, they travel
quickly to the most common destinations, such as the terrace of a building or the
floor that is visited most often.

Therefore, the statement ‘d’ matches with the term “Make the Common Case Fast”.

e. Hierarchy of Memories follows the same idea as a library reserve desk. A library
reserve desk holds the items that students need most frequently. Course teachers
or university staff members place these books or papers there. Similarly, the cache
memory, which is part of the hierarchy of memories, keeps the most frequently used
data close at hand.

Therefore, the statement ‘e’ matches with the term “Hierarchy of Memories”.

f. Increasing the gate area on a CMOS transistor decreases its switching time. This
solution treats the additional gate area as redundant capacity that allows the CMOS
transistor to be more dependable.

Therefore, the statement ‘f’ matches with the term “Dependability via
Redundancy”.

g. Moore’s Law predicts the increase in technology and performance over time.
Adding electromagnetic aircraft catapults is possible because of technological
advancement in power generation. Designing for such anticipated advancements
resembles Design for Moore’s Law.

Therefore, the statement ‘g’ matches with the term “Design for Moore’s Law”.

h. Abstraction means hiding the internal workings and using already built systems.
Building self-driving cars whose control systems partially rely on existing
sensor systems, such as a lane departure system and a smart cruise control system,
is an example of abstraction.

Therefore, the statement ‘h’ matches with the term “Use Abstraction to Simplify
Design”.

Another pitfall cited in Section 1.10 is expecting to improve the overall performance
of a computer by improving only one aspect of the computer. Consider a computer
running a program that requires 250 s, with 70 s spent executing FP instructions, 85
s spent executing L/S instructions, and 40 s spent executing branch instructions.

[5] <§1.10> By how much is the total time reduced if the time for FP operations is
reduced by 20%?

Step-by-step solution
Step 1/1
Consider the following data:

The CPU times for the program running on the computer are given as follows:

Total execution time = 250 s
Time spent executing FP instructions = 70 s
Time spent executing L/S instructions = 85 s
Time spent executing branch instructions = 40 s

Calculate the FP instructions CPU time when the time for FP operations is reduced by 20%:

$\text{New FP time} = 70 \times (1 - 0.20) = 56 \text{ s}$

The time saved on the FP instructions is the difference between the original FP
time and the reduced FP time:

$\text{Time saved} = 70 - 56 = 14 \text{ s}$

The total time after reducing the FP time is,

$\text{New total time} = 250 - 14 = 236 \text{ s}$

Calculate the total reduced time:

The total reduced time is the difference between the total time taken before the
reduction and the total time taken after the reduction.

$\text{Total reduced time} = 250 - 236 = 14 \text{ s}$

The percentage by which the total time is reduced is calculated as follows:

$\dfrac{14}{250} \times 100 = 5.6\%$

Hence, reducing the time for FP operations by 20% reduces the total time by 14 s, or 5.6%.
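
The same calculation applies to any single component of the run time, so it is convenient to express it as a function. This is a sketch under the assumption that only the FP portion changes and all other times stay fixed; `total_after_reduction` is an illustrative name.

```python
def total_after_reduction(total_s, part_s, reduction):
    """Total time after shrinking one component of the run time.

    reduction is the fractional decrease of that component (0.20 means 20%).
    """
    return total_s - part_s * reduction


TOTAL_S = 250.0  # total execution time from the problem statement
FP_S = 70.0      # time spent in FP instructions

new_total = total_after_reduction(TOTAL_S, FP_S, 0.20)
saved = TOTAL_S - new_total
percent_saved = 100.0 * saved / TOTAL_S

print(f"New total time: {new_total:.1f} s")
print(f"Time saved: {saved:.1f} s ({percent_saved:.1f}%)")
```

A 20% improvement in FP time yields a much smaller improvement overall, because FP instructions account for only 70 s of the 250 s.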

[5] <§1.8> When a program is adapted to run on multiple processors in a
multiprocessor system, the execution time on each processor is comprised of
computing time and the overhead time required for locked critical sections and/or to
send data from one processor to another.

Assume a program requires t = 100 s of execution time on one processor. When
run on p processors, each processor requires t/p s, as well as an additional 4 s of
overhead, irrespective of the number of processors. Compute the per-processor
execution time for 2, 4, 8, 16, 32, 64, and 128 processors. For each case, list the
corresponding speedup relative to a single processor and the ratio between actual
speedup versus ideal speedup (speedup if there was no overhead).

Step-by-step solution
Step 1/4
Consider the following data:

Execution time on one processor = 100 sec

Let the execution time be calculated for 2, 4, 8, 16, 32, 64 and 128 processors.

Step 2/4
The following table shows the calculation of execution time and total time:

Processors (p)   Execution time t/p (s)   Total time t/p + 4 (s)
1                100                      100
2                50                       54
4                25                       29
8                12.5                     16.5
16               6.25                     10.25
32               3.125                    7.125
64               1.5625                   5.5625
128              0.78125                  4.78125

Step 3/4
The relative speedup is the ratio of the execution time on one processor to the
total time on p processors, that is, 100 / (t/p + 4). The following table shows the
calculation of relative speedup:

Processors (p)   Execution time t/p (s)   Total time t/p + 4 (s)   Relative speedup
1                100                      100                      1.00
2                50                       54                       1.85
4                25                       29                       3.45
8                12.5                     16.5                     6.06
16               6.25                     10.25                    9.76
32               3.125                    7.125                    14.04
64               1.5625                   5.5625                   17.98
128              0.78125                  4.78125                  20.92

Step 4/4
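The following table shows the calculation of the ratio of actual speedup to ideal
speedup. With no overhead, the ideal speedup on p processors is p, so the ratio is
the relative speedup divided by p:

Processors (p)   Execution time t/p (s)   Total time t/p + 4 (s)   Relative speedup   Actual / ideal speedup
1                100                      100                      1.00               1.00
2                50                       54                       1.85               0.93
4                25                       29                       3.45               0.86
8                12.5                     16.5                     6.06               0.76
16               6.25                     10.25                    9.76               0.61
32               3.125                    7.125                    14.04              0.44
64               1.5625                   5.5625                   17.98              0.28
128              0.78125                  4.78125                  20.92              0.16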
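
The table can be regenerated with a short loop. This is a minimal sketch: the function name `speedup_table` and the tuple layout are illustrative assumptions, while t = 100 s, the 4 s overhead, and the processor counts come from the problem statement.

```python
def speedup_table(t=100.0, overhead=4.0, counts=(2, 4, 8, 16, 32, 64, 128)):
    """Return rows of (p, per-processor time, speedup, actual/ideal ratio)."""
    rows = []
    for p in counts:
        time_p = t / p + overhead   # compute time plus fixed overhead
        speedup = t / time_p        # relative to the single-processor run
        ratio = speedup / p         # ideal speedup on p processors is p
        rows.append((p, time_p, speedup, ratio))
    return rows


rows = speedup_table()
print(f"{'p':>4} {'time (s)':>10} {'speedup':>8} {'actual/ideal':>13}")
for p, time_p, speedup, ratio in rows:
    print(f"{p:>4} {time_p:>10.5f} {speedup:>8.2f} {ratio:>13.2f}")
```

Because the 4 s overhead is fixed, the speedup can never exceed 100 / 4 = 25, however many processors are added.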
