
Principles Of Computer Architecture-Assignment 1

Rahul Sidhardha
U03024960
1. One major difference between servers and embedded processors lies in their intended use. Servers are deployed in data centers and enterprise environments, providing computing power and storage for a wide range of applications and services, whereas embedded processors are found in devices integrated into other products, such as automobiles, household appliances, and medical equipment.
Another key distinction is their architecture. Servers are usually designed with a general-purpose approach, enabling them to handle a wide range of tasks. In contrast, embedded processors are designed for a specific purpose, tuning their performance for a particular task or set of tasks.

2. From the figure in slide 44:

Energy cost of a 32-bit DRAM read = 640
Energy cost of a 32-bit SRAM read = 5
Energy cost of a 32-bit add = 0.1

Ratio of the energy cost of a 32-bit DRAM read to a 32-bit add
= 640 / 0.1 = 6400
Ratio of the energy cost of a 32-bit DRAM read to a 32-bit SRAM read
= 640 / 5 = 128
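The two ratios above are simple divisions; a quick sketch using the per-operation energy costs read off the slide (the picojoule unit is an assumption, the ratios are unit-independent):

```python
# Per-operation energy costs taken from the slide; the pJ unit is an assumption.
E_DRAM_READ = 640.0  # 32-bit DRAM read
E_SRAM_READ = 5.0    # 32-bit SRAM read
E_ADD = 0.1          # 32-bit add

# A DRAM read costs vastly more energy than an add or an SRAM read.
print(E_DRAM_READ / E_ADD)        # ratio of DRAM read to add
print(E_DRAM_READ / E_SRAM_READ)  # ratio of DRAM read to SRAM read
```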

3. (a) True
Even if some components are made 10 times faster, the slowest component still limits the system's performance, because the system's overall execution time is dominated by whatever has not been sped up. So, regardless of whether a few parts are made 10x quicker, the overall performance of the system will still be restricted by the slowest part.

(b) False
Amdahl's Law remains fundamental in computer science and parallel computing. It shows how the non-parallelizable portions of a program bound the achievable speedup. Even though parallel computing is more common now, Amdahl's Law is still an important guideline for improving programs and systems.

(c) False
Operating clock frequency alone is not a reliable metric for measuring the performance of a processor. Factors such as the microarchitecture, cache size, and instruction set also impact performance. Simply having a higher clock frequency does not guarantee better performance if other aspects are not optimized, so multiple factors must be considered when evaluating a processor's performance.

(d) True
In the future, performance improvement will depend on parallelizing programs rather than just adding more cores to a chip. Having more cores can help, but the software must be optimized to make the most of those cores. By distributing tasks effectively, we can achieve major performance gains; the key to future performance improvements is parallelization.

4 (a)
By removing the 32-bit floating-point multiplier from the architecture, the CPU can be made more efficient for programs that do not require floating-point operations. This reduces complexity and shrinks the CPU, decreasing cost and enhancing performance for those specific applications. The CPU can then concentrate on efficiently handling the required 32-bit adder and integer multiplier operations.

(b)
The processor can be made more cost- and energy-efficient if it is optimized for applications that do not require floating-point calculations. Removing the 32-bit floating-point multiplier simplifies and shrinks the CPU. The simpler design consumes less power and is therefore more energy-efficient, and it may also cut the processor's production and total cost. As an outcome, the optimization not only improves performance but also conserves both energy and money.

5. (a)
Latency is the time taken to complete a task (for example, for a message to travel across a network), while throughput is the rate at which tasks are completed or data is transmitted. The faster the system, the higher the throughput.
Given a set of 10 tasks, each taking 1 second on the processor, and a processor that can run only one task at a time:
Latency for finishing all 10 tasks = 10 × 1 = 10 seconds.
Since it completes only 1 task at a time, the throughput is 1 task per second.

(b) Now suppose the processor can execute two tasks in parallel.
Latency for finishing 10 tasks = 10/2 = 5 seconds.
It completes 2 tasks at a time, so the throughput is 2 tasks per second.
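The serial and parallel cases above follow one pattern; a small sketch (assuming the tasks are fully independent, so `width` of them can run at once):

```python
import math

def latency_and_throughput(num_tasks, task_time_s, width):
    """Time to finish all tasks, and steady-state task completion rate,
    assuming `width` tasks can run fully in parallel (a modeling assumption)."""
    latency = math.ceil(num_tasks / width) * task_time_s
    throughput = width / task_time_s  # tasks finished per second
    return latency, throughput

print(latency_and_throughput(10, 1.0, 1))  # serial case: (10.0, 1.0)
print(latency_and_throughput(10, 1.0, 2))  # two in parallel: (5.0, 2.0)
```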

6 (1.8)
(a) Optimizing the processor for uses that do not require floating-point calculations makes it more cost- and energy-efficient: the 32-bit floating-point multiplier can be eliminated, simplifying and shrinking the CPU, which in turn reduces power usage. If we execute at the current speed and turn the system off as soon as the computation is complete, the system draws no power while idle, so 50% of the energy is saved.

(b) Executing the necessary code at the current speed and turning the system off when the computation is complete already saves a significant amount of energy, since the system consumes no additional energy during idle periods. Reducing the voltage and frequency to half contributes further savings, because it lowers the power consumption of the system. By optimizing both the execution strategy and the voltage/frequency settings, energy savings in a real-time application can be maximized.
For a fixed computation, the switching energy is
E = C·V²
so E ∝ V².
The total energy depends on the voltage, not on the frequency (the frequency only changes how long the computation takes). Halving the voltage gives
E ∝ (V/2)² = V²/4
From this we can say that the energy decreases by a factor of 4 when the voltage and frequency are reduced to half.
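The voltage-scaling argument can be checked numerically; this sketch assumes the simple E = C·V² switching-energy model used above, with C and the number of switching events held fixed:

```python
def energy_ratio(v_new, v_old, c=1.0):
    """Ratio of switching energies E = C * V^2 for a fixed computation.
    Frequency cancels out: it changes power and run time, not total energy."""
    return (c * v_new ** 2) / (c * v_old ** 2)

print(energy_ratio(0.5, 1.0))  # 0.25 -> halving the voltage cuts energy 4x
```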

1.12
(b) Performing encryption on the quad-core system's hardware is 20 times faster than doing it in software, with a 2% increase in power usage. We want an overall speedup of 2.
By Amdahl's Law:
Speedup_overall = 1 / [(1 − Fraction_enhanced) + (Fraction_enhanced / Speedup_enhanced)]
Given that Speedup_overall = 2 and Speedup_enhanced = 20, let Fraction_enhanced = F. Then:
1 / ((1 − F) + F/20) = 2
20 / ((20 − 20F) + F) = 2
20 / (20 − 19F) = 2
40 − 38F = 20
F = 20/38 = 10/19 ≈ 0.526
So encryption must account for 52.6% of the original execution time to achieve an overall speedup of 2.
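The algebra above can be checked by solving Amdahl's Law for the fraction in closed form (a small sketch, not part of the original solution):

```python
def fraction_for_speedup(overall, enhanced):
    """Solve 1/((1 - F) + F/enhanced) = overall for F (Amdahl's Law)."""
    return (1 - 1 / overall) / (1 - 1 / enhanced)

F = fraction_for_speedup(overall=2, enhanced=20)
print(F)  # 10/19 ~ 0.526 -> encryption must be ~52.6% of the time
```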
(c) We have Fraction_enhanced = 52.6%. The fraction of the new execution time spent on encryption is the enhanced portion's sped-up time divided by the total new time:
= (Fraction_enhanced / Speedup_enhanced) / [(1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
= (0.526 / 20) / [(1 − 0.526) + 0.526/20]
= 0.526 / (9.48 + 0.526)        (multiplying numerator and denominator by 20)
= 0.526 / 10.006
≈ 0.0526
Thus about 5.26% of the execution time is spent on encryption after the enhancement, when the overall speedup is 2.
(d) The percentage of time spent on encryption is 50%, and 90% of the encryption operations can be parallelized across the crypto units (each unit being 20 times faster than software).
Speedup with 2 units:
Speedup = 1 / (0.5 + 0.5 × (0.1/20 + 0.9/(2 × 20)))
= 1 / (0.5 + 0.0137)
= 1 / 0.5137
≈ 1.947
Thus, the speedup with 2 units is about 1.947.
Speedup with 4 units:
Speedup = 1 / (0.5 + 0.5 × (0.1/20 + 0.9/(4 × 20)))
= 1 / (0.5 + 0.0081)
= 1 / 0.5081
≈ 1.968
Thus, the speedup with 4 units is about 1.968.
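The two speedups in part (d) come from one model; this sketch encodes the assumed breakdown (half the time is encryption; 90% of the encryption parallelizes over n crypto units, each 20x software speed):

```python
def speedup_with_units(n, frac_enc=0.5, frac_par=0.9, unit_speedup=20):
    """Amdahl-style model assumed in part (d): the serial 10% of the
    encryption runs on one unit, the rest spreads across n units."""
    enc_time = (1 - frac_par) / unit_speedup + frac_par / (n * unit_speedup)
    return 1 / ((1 - frac_enc) + frac_enc * enc_time)

print(speedup_with_units(2))  # ~1.95 with two units
print(speedup_with_units(4))  # ~1.97 with four units
```

Note the diminishing returns: doubling the crypto units barely moves the overall speedup, because the non-encryption half dominates.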
(1.13)
(a) The enhancement improves the enhanced mode of execution by a factor of 10, and the enhanced mode is in use 50% of the time, measured as a fraction of the new (enhanced) execution time.
Take the new execution time as 1 unit: 0.5 is spent in normal mode and 0.5 in fast mode. Without the enhancement, the fast-mode portion would have taken 10 times longer, so
Execution_time_old = 0.5 + 0.5 × 10 = 0.5 + 5 = 5.5
Speedup = Execution_time_old / Execution_time_new = 5.5 / 1 = 5.5
Thus, the speedup we get from fast mode is 5.5.
(b)
We must find the percentage of the original execution time that has been converted to fast mode. By Amdahl's Law:
Speedup_overall = 1 / ((1 − Fraction_enhanced) + (Fraction_enhanced / Speedup_enhanced))
With the fraction enhanced equal to f/100 (f in percent) and Speedup_enhanced = 10, substituting into Amdahl's Law:
1 / ((1 − f/100) + (f/100)/10) = 5.5
1 / ((1 − f/100) + f/1000) = 5.5
1000 / (1000 − 9f) = 5.5
1000 = 5500 − 49.5f
49.5f = 4500
f = 4500 / 49.5 ≈ 90.90%
Finally, about 90.9% of the original execution time was converted to fast mode.
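Both parts of 1.13 can be verified with a short sketch, working in units of the new execution time and inverting Amdahl's Law for the back-solve:

```python
def speedup_from_new_fraction(frac_fast_new, mode_speedup):
    """Enhanced mode occupies `frac_fast_new` of the NEW time; re-expanding
    the fast portion gives the old time, which equals the speedup (new = 1)."""
    return (1 - frac_fast_new) + frac_fast_new * mode_speedup

s = speedup_from_new_fraction(0.5, 10)
print(s)  # 5.5

# Fraction of the ORIGINAL time converted to fast mode (Amdahl inverted):
f = (1 - 1 / s) / (1 - 1 / 10)
print(f)  # ~0.909 -> about 90.9% of the original execution time
```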

1.16
(a) Amdahl's Law may be used to determine the speedup with N processors when 80% of the program can be parallelized, without accounting for communication costs. According to Amdahl's Law, the fraction of the application that cannot be parallelized determines the maximum speedup that may be obtained.
Speedup = Execution_time_old / Execution_time_new
= 1 / ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
With Fraction_enhanced = 0.8 and N processors:
Speedup = 1 / [(1 − 0.8) + 0.8/N]
= 1 / (0.2 + 0.8/N)
(b)
With Fraction_enhanced = 0.8, eight processors, and a communication overhead of 0.5% of the original execution time per processor, the overall speedup is
Speedup = 1 / ((1 − 0.8) + 0.8/8 + 0.005 × 8)
= 1 / (0.2 + 0.1 + 0.04)
= 1 / 0.34
≈ 2.94
Thus, the overall speedup with eight processors is 2.94.
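Both 1.16 results come from one formula; a sketch, with the 0.5%-per-processor communication cost modeled as an additive term in the denominator, as in the calculation above:

```python
def amdahl_speedup(n, frac_par=0.8, comm_per_proc=0.0):
    """Speedup on n processors; `comm_per_proc` is a fixed fraction of the
    original time added per processor for communication."""
    return 1 / ((1 - frac_par) + frac_par / n + comm_per_proc * n)

print(amdahl_speedup(8))                       # ideal: 1/(0.2 + 0.1) ~ 3.33
print(amdahl_speedup(8, comm_per_proc=0.005))  # with overhead: ~2.94
```

The overhead term grows with n, so past some point adding processors actually reduces the speedup.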

7
(A.1)
We must find the effective CPI, clock cycle time, and CPU time for ASTAR and BZIP.
The effective CPI is the sum over instruction categories of (category frequency × clock cycles for that category):
Effective CPI = Σ (instruction category frequency × clock cycles for the category)

ASTAR
From the given details:
Effective CPI = (0.46 × 1) + (0.02 × 3) + (0.06 × 3) + (0.28 × 5) + (0.18 × 5)
= 0.46 + 0.06 + 0.18 + 1.4 + 0.9
= 3
Thus the effective CPI is 3 cycles.
The clock cycle time is the inverse of the clock frequency:
Clock cycle time = 1 / 10⁹ s = 1 ns
CPU time = Instruction count × cycles per instruction × clock cycle time
Instruction count = 1000
Cycles per instruction = 3
CPU time = 1000 × 3 × (1 / 10⁹)
= 3 × 10⁻⁶ s
= 3 microseconds

BZIP
Effective CPI = (0.54 × 1) + (0.2 × 5) + (0.07 × 3) + (0.01 × 3) + (0.11 × 5)
= 0.54 + 1.0 + 0.21 + 0.03 + 0.55
= 2.33
Thus the effective CPI for BZIP is 2.33 cycles.
Instruction count = 1000
Cycles per instruction = 2.33
Clock cycle time = 1 / clock frequency = 1 / 10⁹ s
CPU time = 1000 × 2.33 × (1 / 10⁹)
= 2.33 × 10⁻⁶ s
= 2.33 microseconds

ALU
Effective CPI = (0.5 × 1) + (0.2 × 5) + (0.1 × 5) + (0.2 × 3)
= 0.5 + 1.0 + 0.5 + 0.6
= 2.6 cycles
Instruction count = 1000
Cycles per instruction = 2.6
Clock cycle time = 1 / clock frequency = 1 / 10⁹ s
CPU time = 1000 × 2.6 × (1 / 10⁹)
= 2.6 × 10⁻⁶ s
= 2.6 microseconds
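The three CPI and CPU-time calculations share one pattern; a sketch using the ASTAR mix (the instruction count of 1000 and the 1 GHz clock are taken from the working above):

```python
def effective_cpi(mix):
    """Weighted-average CPI from (frequency, cycles-per-category) pairs."""
    return sum(freq * cycles for freq, cycles in mix)

astar = [(0.46, 1), (0.02, 3), (0.06, 3), (0.28, 5), (0.18, 5)]
cpi = effective_cpi(astar)       # ~3.0 cycles
cpu_time = 1000 * cpi * 1e-9     # instruction count * CPI * cycle time (1 ns)
print(cpi, cpu_time)             # ~3.0 cycles, ~3e-06 s (3 microseconds)
```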

(A.9)
The problem gives three classes of instructions:
a) 3 two-address instructions
b) 63 one-address instructions
c) 45 zero-address instructions
Each address field is 6 bits wide.

(a) 3 two-address instructions
The opcode needs 2 bits, since 2² = 4 ≥ 3.
Each instruction has two 6-bit address fields, so the address bits total 2 × 6 = 12 bits.
Total bits = 12 + 2 = 14 bits, so these instructions can be encoded.

(b) 63 one-address instructions
The opcode needs 6 bits, since 2⁶ = 64 ≥ 63.
One 6-bit address field gives 1 × 6 = 6 address bits.
Total bits = 6 + 6 = 12 bits, so these instructions can be encoded.
(c) 45 zero-address instructions
The opcode needs 6 bits, since 2⁶ = 64 ≥ 45.
There are no address fields, so the address bits total 0 × 6 = 0 bits.
Total bits = 0 + 6 = 6 bits, so these instructions can be encoded.
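The bit counts in (a)-(c) can be checked with a short sketch (the 6-bit address-field width is taken from the problem statement):

```python
import math

def encoding_bits(num_opcodes, num_addresses, addr_bits=6):
    """Smallest opcode field that covers num_opcodes, plus one 6-bit field
    per address (the 6-bit field width comes from the problem statement)."""
    opcode_bits = max(1, math.ceil(math.log2(num_opcodes)))
    return opcode_bits + num_addresses * addr_bits

print(encoding_bits(3, 2))   # 14 bits for the two-address instructions
print(encoding_bits(63, 1))  # 12 bits for the one-address instructions
print(encoding_bits(45, 0))  # 6 bits for the zero-address instructions
```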
(A.19)

It is given that:
A ← B + C
B ← A + C
D ← A − B
For each of the four instruction set architecture classes, the following table lists the assembly language code and the corresponding number of memory accesses.
Stack        Load-store           Memory-memory   Accumulator

Push B       Load R1, B           Add B, C, A     Load B
Push C       Load R2, C           Add A, C, B     Add C
Add          Add R3, R1, R2       Sub A, B, D     Store A
Pop A        Store R3, A                          Add C
Push A       Add R4, R3, R2                       Store B
Push C       Store R4, B                          Load A
Add          Subtract R5, R3, R4                  Subtract B
Pop B        Store R5, D                          Store D
Push A
Push B
Subtract
Pop D

Memory accesses: Stack = 9, Load-store = 5, Memory-memory = 9, Accumulator = 8

You might also like