High Performance Computing - CS 3010 - MID SEM Question by Subhasis Dash With Solution

Sample Question Format
KIIT Deemed to be University

Online Mid Semester Examination(Autumn Semester-2020)
Subject Name & Code: High Performance Computing / (CS 3010)

Applicable to Courses:
Full Marks=20 Time:1 Hour
SECTION-A(Answer All Questions. All questions carry 2 Marks)
Time:20 Minutes (5×2=10 Marks)
Question No | Question Type (MCQ/SAT) | Question | Answer Key (if MCQ) | CO Mapping
Q.No:1(a) MCQ Question -1 on concept 1 (Answer key: B) CO - 2
Consider an instruction pipeline with four stages that take 6 nsec, 2 nsec, 10 nsec, and 4 nsec respectively. The delay of an inter-stage register of the pipeline is 1 nsec. What is the approximate speedup of the pipeline in the steady state under ideal conditions as compared to the corresponding non-pipelined implementation?
A) 4
B) 2
C) 2.5
D) 3.5
Ans. Speedup = (6+2+10+4) / (10+1) = 22 / 11 = 2
So option B is correct: (B) 2
SAT Question -2 on concept 1 CO - 2
The time required for instruction fetch (IF) is 5 ns, instruction decode (ID) is 3 ns, instruction execution (EXE) is 4 ns, operand fetch (MEM) is 6 ns and write back (WB) is 2 ns. The pipeline latches add a delay of 1 ns per stage. Assuming that there is no stall, what is the speedup of the pipelined processor with respect to the non-pipelined processor when executing 200 instructions?
Ans. Speedup = [(5+3+4+6+2)*200] / [(5+200-1)*(6+1)] = 4000/1428 = 2.8011
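The calculation can be checked with a short sketch. The function name `pipeline_speedup` and its parameters are illustrative assumptions, not part of the question paper; the sketch assumes the pipeline clock equals the slowest stage plus the latch delay and that k + n - 1 cycles are needed for n instructions.

```python
def pipeline_speedup(stage_delays_ns, latch_delay_ns, n_instructions):
    """Speedup of a k-stage pipeline over the non-pipelined design (no stalls)."""
    k = len(stage_delays_ns)
    # Non-pipelined: every instruction passes through all stages back to back.
    non_pipelined_ns = sum(stage_delays_ns) * n_instructions
    # Pipelined: the clock is set by the slowest stage plus the latch delay.
    cycle_ns = max(stage_delays_ns) + latch_delay_ns
    pipelined_ns = (k + n_instructions - 1) * cycle_ns
    return non_pipelined_ns / pipelined_ns


speedup = pipeline_speedup([5, 3, 4, 6, 2], 1, 200)
print(round(speedup, 4))  # 4000 / 1428
```

As the instruction count grows, the result tends toward the steady-state ratio (sum of stage delays) / (slowest stage + latch delay) used in Question 1.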
SAT Question -3 on concept 1 CO - 2
Consider an un-pipelined processor that takes 10 clock cycles for dependent instructions and other operations but 20 clock cycles for memory operations. Assume the relative frequencies are 70% for dependent and other operations and 30% for memory operations. Ignore the effect of dependent instructions. If the cycle time is 6 ns and the pipeline overhead is 2 ns, calculate the speedup due to pipelining.
Ans. CPI un-pipelined = (10 x 0.7) + (20 x 0.3) = 13
Avg. execution time per instruction, un-pipelined = 13 x 6 = 78 ns
Avg. execution time per instruction, pipelined = 6 + 2 = 8 ns
Speedup = 78 / 8 = 9.75
SAT Question -4 on concept 1 CO - 2
In a pipelined system, each data-dependent instruction causes 5 stall cycles. If 30% of the instructions are data reference instructions, the pipeline operates with a clock cycle of 40 nanoseconds, and the speedup factor is 20, find the number of stages in the pipeline.
Ans. Non-pipelined time per instruction = n x 40 ns
Pipelined time per instruction = [1 + (0.3 x 5)] x 40 = 2.5 x 40 = 100 ns
20 = [n x 40] / 100, so n = 50 stages
Q.No:1(b) SAT Question -1 on concept 2 CO1-1,2
A program is executed for 1 sec on a processor with a clock cycle time of 50 ns and a throughput of 15×10^6 instructions per second. What is the CPI for the program?
Ans. Clock cycle time CT = 50 ns
15×10^6 instructions take 1 sec
1 instruction takes 1 / (15×10^6) sec = 66.67 ns
1 clock cycle takes 50 ns
CPI = [1 / (15×10^6) sec] / 50 ns = 66.67 / 50 = 1.33
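A minimal sketch of the same division, assuming CPI is the time per instruction divided by the clock cycle time. The name `cpi_from_throughput` is hypothetical.

```python
def cpi_from_throughput(instructions_per_second, cycle_time_s):
    """CPI = (time per instruction) / (time per clock cycle)."""
    time_per_instruction_s = 1.0 / instructions_per_second
    return time_per_instruction_s / cycle_time_s


cpi = cpi_from_throughput(15e6, 50e-9)
print(round(cpi, 2))
```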
SAT Question -2 on concept 2 CO1-1,2
Processor P1 has a clock rate of 2 GHz and executes 20×10^9 instructions in 7 seconds. Processor P2 has a clock rate of 1.5 GHz and executes 30×10^9 instructions in 7 seconds. Find the IPC of each processor. Find the clock rate of P2 that reduces its execution time to that of P1.
Ans.
For P1
7 = (20×10^9 × CPI) / (2×10^9)
CPI = 0.7, IPC = 10/7 = 1.43
For P2
7 = (30×10^9 × CPI) / (1.5×10^9)
CPI = 7/20 = 0.35, IPC = 20/7 = 2.86
The execution times of P1 and P2 are already equal, so the clock rate of P2 remains as it is, 1.5 GHz.
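The relation execution time = (instruction count × CPI) / clock rate can be rearranged and evaluated as in the sketch below. The helper `cpi_and_ipc` is an illustrative name, not something defined in the paper.

```python
def cpi_and_ipc(exec_time_s, clock_rate_hz, instruction_count):
    """Rearranges time = instructions * CPI / clock_rate to recover CPI and IPC."""
    total_cycles = exec_time_s * clock_rate_hz
    cpi = total_cycles / instruction_count
    return cpi, 1.0 / cpi


p1_cpi, p1_ipc = cpi_and_ipc(7, 2e9, 20e9)
p2_cpi, p2_ipc = cpi_and_ipc(7, 1.5e9, 30e9)
print(round(p1_cpi, 2), round(p1_ipc, 2))
print(round(p2_cpi, 2), round(p2_ipc, 2))
```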
SAT Question -3 on concept 2 CO1-1,2
Consider a program that consists of 50 ALU, 30 LOAD and 20 STORE instructions. ALU, LOAD and STORE instructions take 1, 4 and 4 cycles each. Find the CPI. The clock frequency is then increased by 25%, which implies a CPI increase of 50% for ALU instructions and 25% for LOAD instructions, while the remaining instructions execute with the same CPI. Find the new CPI.
Ans.
CPI = (0.5×1 + 0.3×4 + 0.2×4) = 2.5
New CPI = (0.5×1.5 + 0.3×5 + 0.2×4) = 3.05
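A small sketch of the weighted-CPI calculation, using the instruction counts from the question (so LOAD carries a weight of 30 out of 100 instructions in both cases). The names `weighted_cpi`, `counts`, `base_cycles` and `new_cycles` are assumptions made for illustration.

```python
def weighted_cpi(counts, cycles):
    """Average cycles per instruction for a given instruction mix."""
    total_instructions = sum(counts.values())
    total_cycles = sum(counts[kind] * cycles[kind] for kind in counts)
    return total_cycles / total_instructions


counts = {"ALU": 50, "LOAD": 30, "STORE": 20}
base_cycles = {"ALU": 1, "LOAD": 4, "STORE": 4}
# ALU CPI grows by 50%, LOAD CPI by 25%, STORE CPI is unchanged.
new_cycles = {"ALU": 1 * 1.5, "LOAD": 4 * 1.25, "STORE": 4}

base_cpi = weighted_cpi(counts, base_cycles)
new_cpi = weighted_cpi(counts, new_cycles)
print(base_cpi, new_cpi)
```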
SAT Question -4 on concept 2 CO1-1,2
Processors P1, P2 and P3 execute the same program, and P1 takes 10 seconds. The performance of P2 is 10% higher than that of P1 and the performance of P3 is 10 times that of P1. Find the execution times of P2 and P3.
Ans.
P2/P1 = E1/E2 = 1.1
E2 = 10/1.1 = 9.09 sec
P3/P1 = E1/E3 = 10
E3 = 10/10 = 1 sec
Q.No:1(c) SAT Question -1 on concept 3 CO - 2
Given an un-pipelined processor with a 10 ns cycle time and pipeline latches with 0.5 ns latency, what is the average instruction processing time of a five-stage instruction pipeline for 32 instructions if conditional branch instructions occur as follows: I2, I5, I7, I25, I27?
Ans.
1.75
SAT Question -2 on concept 3 CO - 2
Consider the following MIPS assembly code:
LOAD R1, 10(R2)
ADD R7, R1, R5
SUB R8, R7, R6
MUL R6, R4, R8
Identify each dependency by type and list the two instructions involved.
Ans. LOAD & ADD → (R1 - True),
ADD & SUB → (R7 - True),
SUB & MUL → (R8 - True),
SUB & MUL → (R6 - Anti)
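The dependencies listed in the answer can be found mechanically by comparing the destination and source registers of every instruction pair. The sketch below is a simplified illustration: the tuple layout and the name `find_dependencies` are assumptions, and it does not filter out pairs whose register is redefined by an instruction in between.

```python
def find_dependencies(program):
    """program: list of (name, dest, sources). Returns (earlier, later, register, kind)."""
    deps = []
    for i, (name_i, dest_i, srcs_i) in enumerate(program):
        for name_j, dest_j, srcs_j in program[i + 1:]:
            if dest_i in srcs_j:   # later instruction reads what the earlier one writes
                deps.append((name_i, name_j, dest_i, "true (RAW)"))
            if dest_j in srcs_i:   # later instruction overwrites what the earlier one reads
                deps.append((name_i, name_j, dest_j, "anti (WAR)"))
            if dest_i == dest_j:   # both write the same register
                deps.append((name_i, name_j, dest_i, "output (WAW)"))
    return deps


program = [
    ("LOAD", "R1", ["R2"]),
    ("ADD", "R7", ["R1", "R5"]),
    ("SUB", "R8", ["R7", "R6"]),
    ("MUL", "R6", ["R4", "R8"]),
]
for dep in find_dependencies(program):
    print(dep)
```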
SAT Question -3 on concept 3 CO – 1,2
What is the maximum number of clock cycles required to execute a program that contains 150 machine instructions in a 5-stage pipeline where each stage takes one clock cycle? Assume 20% of the instructions are load/store instructions and each data transfer instruction creates one stall cycle.
Ans.
Pipeline cycles without stalls = (n-1) + k = 149 + 5 = 154 clock cycles
Load/store instructions = 20% of 150 = 30 instructions, each adding 1 stall cycle
Stall cycles for 30 instructions = 30 clock cycles
So the maximum number of clock cycles required = 154 + 30 = 184 clock cycles
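The same count as a short sketch, assuming every load/store instruction contributes its stall cycle (the worst case the question asks for). The function name and parameters are hypothetical.

```python
def total_cycles(n_instructions, n_stages, stall_fraction, stalls_per_instruction=1):
    """Worst-case cycle count: ideal pipeline cycles plus one stall per stalling instruction."""
    ideal_cycles = (n_instructions - 1) + n_stages
    stalling_instructions = round(n_instructions * stall_fraction)
    return ideal_cycles + stalling_instructions * stalls_per_instruction


print(total_cycles(150, 5, 0.20))  # 154 + 30 = 184
```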
Consider a pipelined processor with an ideal CPI of 1.0. Assume that 40% of the instructions are conditional branch instructions and 5% are unconditional branch instructions in the instruction mix of a benchmark program. The branch prediction accuracy of the pipeline is 50% and each misprediction causes 2 stall cycles. Find the depth of the pipeline needed to achieve a speedup of 6 times with respect to a non-pipelined processor.
Ans.
- Speedup = Number of pipeline stages (K) / (1 + stall cycles per instruction)
- 6 = K / (1 + 0.4 * (1/2) * 2 + 0.05 * 2)
- K = 6 * (1 + 0.4 + 0.1) = 6 * 1.5
- K = 9
So a 9-stage pipeline is needed.
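Rearranging Speedup = K / (1 + stall cycles per instruction) for K gives the sketch below. The helper name `stages_for_speedup` is an assumption; as in the answer, every unconditional branch is charged 2 stall cycles.

```python
def stages_for_speedup(target_speedup, stall_cycles_per_instruction):
    """K = speedup * (1 + stall cycles per instruction), assuming an ideal CPI of 1."""
    return target_speedup * (1 + stall_cycles_per_instruction)


# 40% conditional branches, half of them mispredicted, 2 stall cycles each.
conditional_stalls = 0.40 * (1 - 0.50) * 2
# 5% unconditional branches, each charged 2 stall cycles as in the answer.
unconditional_stalls = 0.05 * 2

k = stages_for_speedup(6, conditional_stalls + unconditional_stalls)
print(round(k))
```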
Q.No:1(d) SAT Question -1 on concept 4 CO – 2
How is instruction rescheduling / reordering used to prevent data hazards? Give an example.
Ans. With instruction rescheduling / reordering, independent instructions are placed between two dependent instructions without affecting the results, which helps to avoid stalls. (1 Mark)
Example: (1 Mark)
Before Instruction Reordering      After Instruction Reordering
1. ADD R1, R2, R3                  1. ADD R1, R2, R3
2. SUB R4, R1, R5                  2. XOR R8, R6, R7
3. XOR R8, R6, R7                  3. AND R9, R10, R11
4. AND R9, R10, R11                4. SUB R4, R1, R5
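SAT Question -2 on concept 4 CO – 2
What is a pipeline interlock? Give an example.
Ans. LOAD R1, 10(R2)
SUB R4, R1, R5
The LOAD instruction has a latency that cannot be eliminated by operand forwarding alone, because the loaded value is not available until the end of the MEM stage. The hardware that detects this hazard and stalls the dependent SUB until the value is available is called a pipeline interlock; it preserves correct execution.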
SAT Question -3 on concept 4 CO – 2
Differentiate between the predicted-taken and predicted-untaken schemes used to prevent control hazards.
Ans. Predicted taken: Treat every branch as taken. (1 Mark)
Predicted untaken: Execute successor instructions in sequence as if there is no branch. (1 Mark)
SAT Question -4 on concept 4 CO - 2
What is a structural hazard? What techniques are used to resolve structural hazards in a pipelined processor?
Ans. Structural hazard: Arises from resource conflicts among instructions executing concurrently. (1 Mark)
Techniques used to resolve structural hazards: (1 Mark)
1. Separate data and instruction caches that can be accessed simultaneously in the same cycle.
2. Memory interleaving
Q.No:1(e) SAT Question -1 on concept 5 CO - 1
Why does the execution time of a CISC instruction vary from instruction to instruction?
Ans.
CISC has more addressing modes, which affects instruction length.
Instructions use memory operands, which affects the time for fetching.
So variable instruction length and the use of memory operands are the main reasons for different execution times for different instructions.
Because CISC uses a microprogrammed control unit, this is also one of the reasons for variation in instruction execution time.
MCQ Question -2 on concept 5 (Answer key: A) CO – 2
The speedup gained in an ideal pipeline is equal to the number of pipeline stages if
A) N is very much greater than K
B) K is very much greater than N
C) Both K and N are equal
D) None of the above
(where N is the number of instructions and K is the number of pipeline stages)
Ans. A) N is very much greater than K
SAT Question -3 on concept 5 CO - 1
How many classes of computers are there in Flynn’s classification? Mention which one is used as an array processor and which one is used as a multiprocessor.
Ans. Four: SISD, SIMD, MISD, MIMD (write the full names)
Array processor → SIMD
Multiprocessor → MIMD
SAT Question -4 on concept 5 CO – 2
“RAR is a hazard”. Justify.
Ans. No (1 Mark)
(give your justification) (1 Mark)
SECTION-B(Answer Any One Question. Each Question carries 10 Marks)
Time: 30 Minutes (1×10=10 Marks)
Question No | Question | CO Mapping
Q.No:2 A. What is Amdahl’s Law? Assume that 30% of the instructions are data transfer instructions, 40% are ALU instructions and the rest are control instructions. Data transfer, ALU and control instructions take 6, 4 and 7 clock cycles respectively. Find the CPI of the machine. If newer hardware speeds up the ALU instructions by a factor of 3, find the overall speedup of the machine. (CO - 1)
[5 Marks]
Ans.
Amdahl’s Law: (2.5 Mark)
– Performance improvement gained from using some faster mode of execution is
limited by the amount of time the enhancement is actually used
– FRACTION ENHANCED :-Fraction of the computation time in the original machine
that can use the enhancement
– It is always less than or equal to 1
– SPEEDUP ENHANCED:-Improvement gained by enhancement, that is how much
faster the task would run if the enhanced mode is used for entire program.
– It is always greater than 1
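CPI = 0.3 * 6 + 0.4 * 4 + 0.3 * 7 = 5.5 (2.5 Mark)
Fraction Enhanced = fraction of execution time spent in ALU instructions = (0.4 * 4) / 5.5 = 0.2909
Speedup Enhanced = 3
Overall speed-up = 1 / [(1 - 0.2909) + (0.2909 / 3)] = 1.2406
Cross-check: new CPI = 0.3 * 6 + 0.4 * (4 / 3) + 0.3 * 7 = 4.4333, and 5.5 / 4.4333 = 1.2406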
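The following sketch evaluates Amdahl's formula for this question. Since Fraction Enhanced is defined above as a fraction of computation time, the sketch weights the ALU instructions by their cycle count; plugging in the raw instruction frequency 0.4 instead gives 1.3636. The names `amdahl_speedup` and `mix` are illustrative assumptions.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / [(1 - f) + f / s]."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# (instruction frequency, cycles per instruction) for each class
mix = {"data": (0.3, 6), "alu": (0.4, 4), "control": (0.3, 7)}
cpi = sum(freq * cycles for freq, cycles in mix.values())

# Fraction of execution time spent in ALU instructions.
alu_time_fraction = (0.4 * 4) / cpi
overall = amdahl_speedup(alu_time_fraction, 3)

# Cross-check: ratio of old CPI to new CPI with ALU instructions 3x faster.
new_cpi = 0.3 * 6 + 0.4 * (4 / 3) + 0.3 * 7
print(round(cpi, 2), round(overall, 4), round(cpi / new_cpi, 4))

# Using the instruction frequency 0.4 directly as the fraction:
print(round(amdahl_speedup(0.4, 3), 4))
```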
B. Differentiate between an array processor and a multiprocessor with diagrams. (CO - 1)
[5 Marks]
Ans. Explain Array Processor with diagram (2.5 Mark)
Explain Multiprocessor with diagram (2.5 Mark)
Q.No:3 A. Derive the overall speedup given by Amdahl’s law. Suppose a program runs in 100 seconds on a computer, with multiply operations responsible for 80 seconds of this time. How much do I have to improve the speed of multiplication if I want my program to run five times faster? (CO - 1)
[5 Marks]
Ans.
Amdahl’s law derivation (2.5 Mark)
Speedup gained = 1 / [(1 - f_e) + f_e / S_e]
As per Amdahl’s law,
Execution time after improvement = (100 - 80) + 80/n (2.5 Mark)
To run 5 times faster, the new execution time must be 100/5 = 20 seconds, so the equation is
20 = 20 + 80/n
0 = 80/n
No finite n satisfies this, so there is no amount by which we can enhance multiplication to achieve a fivefold increase in performance.
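The infeasibility can also be seen by solving for n directly. In this sketch (function name and return convention are assumptions), `None` signals that the unaffected part alone already uses up the whole time budget.

```python
def required_enhancement(total_s, enhanced_s, target_speedup):
    """Speedup needed on the enhanced part, or None if the target is unreachable."""
    target_time_s = total_s / target_speedup
    unaffected_s = total_s - enhanced_s
    # Time budget left for the part being enhanced.
    remaining_s = target_time_s - unaffected_s
    if remaining_s <= 0:
        return None
    return enhanced_s / remaining_s


print(required_enhancement(100, 80, 5))  # None
print(required_enhancement(100, 80, 4))  # 16.0
```

A fourfold overall speedup is still reachable (multiplication must become 16 times faster), but a fivefold one is not, because the remaining 20 seconds are untouched by the enhancement.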
B. Differentiate between two-bit prediction and one-bit prediction schemes with examples. (CO – 3)
[5 Marks]
Ans. One-bit prediction scheme with example (2.5 Mark)
- Use the result from the last time this instruction executed.
- Let initial value = NT; actual outcomes of the branch are T, T, T, T, T, NT
- Predictions are: NT, T, T, T, T, T
- 2 wrong (the first and the last), 4 correct = 66.7% accuracy
Two-bit prediction scheme with example (2.5 Mark)
- Change the prediction only after two consecutive mispredictions
- Let initial value = NT; actual outcomes of the branch are NT, T, NT, T
- Predictions are: NT, NT, NT, NT
- 2 wrong (the second and the fourth), 2 correct = 50% accuracy
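Both examples can be replayed with a small simulation. The two-bit scheme is modelled here as a saturating counter starting in the strongly-not-taken state, which is one common realisation and an assumption on top of the answer; all function names are illustrative.

```python
def one_bit(outcomes, initial="NT"):
    """Predict whatever the branch did the last time it executed."""
    state, predictions = initial, []
    for actual in outcomes:
        predictions.append(state)
        state = actual
    return predictions


def two_bit(outcomes, counter=0):
    """Saturating counter 0..3: states 0-1 predict NT, states 2-3 predict T."""
    predictions = []
    for actual in outcomes:
        predictions.append("T" if counter >= 2 else "NT")
        counter = min(counter + 1, 3) if actual == "T" else max(counter - 1, 0)
    return predictions


def accuracy(predictions, outcomes):
    correct = sum(p == a for p, a in zip(predictions, outcomes))
    return correct / len(outcomes)


seq1 = ["T", "T", "T", "T", "T", "NT"]
seq2 = ["NT", "T", "NT", "T"]
print(one_bit(seq1), accuracy(one_bit(seq1), seq1))
print(two_bit(seq2), accuracy(two_bit(seq2), seq2))
```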
Q.No:4 A. Find the total number of clock cycles required to execute the following instructions without and with operand forwarding. (CO – 2)
LD R1, 0(R2)
DADDIU R1,R1,#1
SD R1, 0(R2)
DADDIU R2,R2,#4
DSUB R4,R3,R2
[5 Marks]
Ans.
Without operand forwarding = 16 CYCLES
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
LD IF ID E M W
DADD IF ID S S E M W
SD IF ID S S E M W
DADD IF ID E M W
DSUB IF ID S S E M W
With operand forwarding =10 CYCLES
1 2 3 4 5 6 7 8 9 10 11
LD IF ID E M W
DADD IF ID S E M W
SD IF ID E M W
DADD IF ID E M W
DSUB IF ID E M W
B. Explain the different ways to schedule an instruction in the branch delay slot to prevent control hazards. (CO – 2)
[5 Marks]
Ans. ( 5 Marks )
(i) From before

(ii) From target
(iii) From fall-through
Give examples from each
Q.No:5 A. A five-stage pipelined processor has the stages IF, ID, EXE, MEM and WB. The IF, ID, MEM and WB stages take 1 clock cycle each for any instruction. The EXE stage takes 1 clock cycle for LOAD, ADD and SUB instructions and 2 clock cycles for MUL and DIV instructions. (CO – 2)
Consider the following instructions:-
LOAD R3, 9(R2)
DIV R1, R3, R4
ADD R5, R1, R6
SUB R7, R1, R8
MUL R9, R1, R10
For the above sequence of instructions, find out total number of
clock cycles required to complete the execution, without operand
forwarding?
[5 Marks]
Ans.
Inst. \ Clock cycle   1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
LOAD R3, 9(R2)        IF    ID    EXE   MEM   WB
DIV R1, R3, R4              IF    STALL STALL ID    EXE   EXE   MEM   WB
ADD R5, R1, R6                                IF    STALL STALL STALL ID    EXE   MEM   WB
SUB R7, R1, R8                                                        IF    ID    EXE   MEM   WB
MUL R9, R1, R10                                                             IF    ID    EXE   EXE   MEM   WB
Total number of clock cycles is 15, with 5 stalls.
B. Explain the R, I and J MIPS instruction formats with one example of each type. (CO - 1)
[5 Marks]
ANS.
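A minimal sketch of the three 32-bit formats, assuming the standard MIPS32 field layout: R-type is opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6), I-type is opcode(6) rs(5) rt(5) immediate(16), and J-type is opcode(6) target(26). The helper names and the chosen example instructions are illustrative and not taken from the paper.

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    """R-type: register-to-register operations such as add."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(opcode, rs, rt, imm):
    """I-type: loads, stores, branches and immediate arithmetic."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(opcode, target_address):
    """J-type: jumps; the 26-bit field holds the word address (byte address >> 2)."""
    return (opcode << 26) | ((target_address >> 2) & 0x03FFFFFF)


T0, T1, T2 = 8, 9, 10  # register numbers of $t0, $t1, $t2

print(f"{encode_r(0, T1, T2, T0, 0, 0x20):08X}")  # add $t0, $t1, $t2
print(f"{encode_i(0x23, T1, T0, 4):08X}")         # lw  $t0, 4($t1)
print(f"{encode_j(0x02, 0x00400000):08X}")        # j   0x00400000
```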
Q.No:6 A. “Amdahl’s Law quantifies the overall performance gain due to an improvement in a part of a computation.” Justify and prove the statement for overall speedup. (CO - 1)
Suppose that we are considering an enhancement to the processor of a server system used for web serving. The new CPU is 10 times faster on computation in the web serving application than the original processor. Assuming that the original CPU is busy with computation 40% of the time and is waiting for I/O 60% of the time, what is the overall speedup gained by incorporating the enhancement?
[5 Marks]
Ans.
Amdahl’s Law: (2.5 Marks)
– Performance improvement gained from using some faster mode of execution is
limited by the amount of time the enhancement is actually used
– FRACTION ENHANCED :-Fraction of the computation time in the original machine
that can use the enhancement
– It is always less than or equal to 1
– SPEEDUP ENHANCED:-Improvement gained by enhancement that is how much
faster the task would run if the enhanced mode is used for entire program.
– It is always greater than 1
Overall Speedup = 1 / [(1 - 0.4) + (0.4 / 10)] = 1.5625 (2.5 Marks)
B. In a five-stage pipeline (IF, ID, EX, MEM, WB), ADD, SUB and LOAD take one clock cycle to execute and MUL takes three clock cycles. For the sequence (CO - 2)
ADD R2, R1, R0
LOAD R7, 10(R3)
MUL R4, R7, R2
SUB R6, R5, R4
calculate the number of clock cycles required using operand forwarding.
[5 Marks]
Ans.
Inst. \ Clock cycle   1     2     3     4     5     6     7     8     9     10    11
ADD R2, R1, R0        IF    ID    EXE   MEM   WB
LOAD R7, 10(R3)             IF    ID    EXE   MEM   WB
MUL R4, R7, R2                    IF    ID    STALL EXE   EXE   EXE   MEM   WB
SUB R6, R5, R4                          IF    STALL ID    STALL STALL EXE   MEM   WB
Total number of clock cycles is 11, with 3 stall cycles (cycles 5, 7 and 8).
Controller of Examinations
