
TEST-1 SOLUTIONS

Subject: Advanced Computer Architecture


PART-1
Answer any one full question.
1) Give Flynn's classification of various computer architectures. Clearly explain the features
of each with conceptual diagrams. (10 Marks)
Sol: Michael Flynn introduced a classification of various computer architectures based on
notions of instruction and data streams. They are
1. SISD (single instruction stream over a single data stream)
2. SIMD (single instruction stream over multiple data streams)
3. MIMD (multiple instruction streams over multiple data streams)
4. MISD (multiple instruction streams over a single data stream)
1. SISD (single instruction stream over a single data stream):
Conventional sequential machines are called SISD computers as shown in Fig 1a.
CU = control unit
PU = processing unit
MU = memory unit
IS = instruction stream
DS = data stream
Fig 1a: SISD uniprocessor architecture
2. SIMD (single instruction stream over multiple data streams):
Vector computers equipped with scalar and vector hardware are called SIMD
computers as shown in Fig 1b.
[Figure labels: CU, PE, LM, IS, DS; program loaded from host; data sets loaded from host]
PE = processing element
LM = local memory
Fig 1b: SIMD architecture (with distributed memory)
3. MIMD (multiple instruction streams over multiple data streams):
The term parallel computers is reserved for MIMD machines, as shown in Fig 1c.
Fig 1c: MIMD architecture (with shared memory)
4. MISD (multiple instruction streams over a single data stream):
MISD machines are modeled in Fig 1d.
The same data stream flows through an array of processors executing different
instruction streams.
Fig 1d: MISD architecture (the systolic array)
Of the four machine models, most parallel computers assume the MIMD model
for general-purpose computations.
The SIMD and MISD models are more suitable for special-purpose computations.
Therefore MIMD is the most popular model, SIMD next, and MISD is the
least popular model.
2) a) A 40 MHz processor was supposed to execute 200000 instructions with the following
instruction mix and CPI needed for each instruction type.
Instruction type      CPI   Instruction count
Integer arithmetic    2     60%
Data transfer         4     18%
Floating point        6     12%
Control transfer      5     10%
Determine the effective CPI, MIPS rate and execution time.
(5 Marks)
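Sol:
CPI = (no. of clock cycles / no. of instructions)
$\text{Effective CPI} = \dfrac{(60/100 \times I_c) \times 2 + (18/100 \times I_c) \times 4 + (12/100 \times I_c) \times 6 + (10/100 \times I_c) \times 5}{I_c}$
Effective CPI = 3.14 clock cycles/instruction.
$\text{MIPS rate} = \dfrac{I_c}{T \times 10^6} = \dfrac{I_c}{I_c \times \text{CPI} \times \tau \times 10^6} = \dfrac{1}{3.14 \times \frac{1}{40 \times 10^6} \times 10^6}$
MIPS rate = 12.7388 MIPS.
Execution time:
$T = I_c \times \text{CPI} \times \tau = 200 \times 10^3 \times 3.14 \times \dfrac{1}{40 \times 10^6}$
T = 15.7 msec.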
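The arithmetic above can be cross-checked with a short Python sketch. It assumes the instruction mix, the 40 MHz clock and the 200000-instruction count stated in the question; the function and variable names are illustrative only.

```python
# Instruction mix from the question: type -> (CPI, fraction of all instructions).
MIX = {
    "integer arithmetic": (2, 0.60),
    "data transfer": (4, 0.18),
    "floating point": (6, 0.12),
    "control transfer": (5, 0.10),
}
CLOCK_HZ = 40e6            # 40 MHz clock, so the cycle time is 1 / CLOCK_HZ
INSTRUCTION_COUNT = 200_000


def effective_cpi(mix):
    """Weighted average of the per-type CPI values."""
    return sum(cpi * fraction for cpi, fraction in mix.values())


def mips_rate(clock_hz, cpi):
    """MIPS = Ic / (T x 10^6), which simplifies to f / (CPI x 10^6)."""
    return clock_hz / (cpi * 1e6)


def execution_time(instruction_count, cpi, clock_hz):
    """T = Ic x CPI x cycle time, in seconds."""
    return instruction_count * cpi / clock_hz


cpi = effective_cpi(MIX)
mips = mips_rate(CLOCK_HZ, cpi)
seconds = execution_time(INSTRUCTION_COUNT, cpi, CLOCK_HZ)
print(f"CPI = {cpi:.2f}, MIPS = {mips:.4f}, T = {seconds * 1e3:.1f} ms")
```

Note that the instruction count cancels out of the MIPS rate, which is why only the clock rate and the effective CPI appear in `mips_rate`.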
2) b) Differentiate between implicit and explicit parallelism with a neat sketch.
(5 Marks)
Sol: Implicit parallelism:
An implicit approach uses a conventional language, such as C, Fortran, Lisp or Pascal,
to write the source program.
The sequentially coded source program is translated into parallel object code by a
parallelizing compiler.
As shown in Fig 5 (a), this compiler must be able to detect parallelism and assign
target machine resources.
This compiler approach has been applied in programming shared memory
multiprocessors.
This approach requires less effort on the part of the programmer.
[Figure: Programmer -> Source code written in sequential languages C, Fortran, Lisp or Pascal -> Parallelizing compiler -> Parallel object code -> Execution by runtime system]
Fig 5 (a): Implicit parallelism
Explicit parallelism:
This approach, as shown in Fig 5 (b), requires more effort by the programmer to
develop a source program.
Parallelism is explicitly specified in the user program.
This will significantly reduce the burden on the compiler to detect parallelism.
Instead the compiler needs to preserve parallelism and, where possible, assign target
machine resources.
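The difference in programming style can be sketched in Python. This is only an illustration under stated assumptions: Python has no parallelizing compiler, so the first function merely shows the sequential form that such a compiler would have to analyze, and the names `square`, `sequential_squares` and `explicit_squares` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """A small independent unit of work."""
    return x * x


def sequential_squares(values):
    # Implicit style: ordinary sequential code. Any parallelism would have
    # to be detected by a parallelizing compiler, not by the programmer.
    result = []
    for v in values:
        result.append(square(v))
    return result


def explicit_squares(values, workers=4):
    # Explicit style: the programmer states that the calls are independent
    # by handing them to a pool of workers. map() keeps the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))


print(sequential_squares([1, 2, 3]))
print(explicit_squares([1, 2, 3]))
```

Both functions return the same result; only the place where the parallelism is expressed differs.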

[Figure: Programmer -> Source code written in concurrent dialects of C, Fortran, Lisp or Pascal -> Concurrency preserving compiler -> Concurrent object code -> Execution by runtime system]
Fig 5 (b): Explicit parallelism
PART-2
3) Explain the UMA and NUMA models of shared-memory multiprocessors with a neat
diagram.

(10 Marks)
Sol:
The multiprocessor parallel models are
i) Uniform memory access model (UMA).
ii) Non-uniform memory access model (NUMA).
i) Uniform memory access model (UMA):
In this model the physical memory is uniformly shared by all processors.
All processors have equal access time to all memory words.
Each processor uses a private cache.
Multiprocessors are tightly coupled systems due to the high degree of resource sharing.
The system interconnect takes the form of a common bus, a crossbar switch or a
multistage network.
The UMA model is suitable for general-purpose, time-sharing applications by multiple
users.
Coordination of parallel events, synchronization and communication among
processors are done through shared variables.
In this type of architecture, when all the processors have equal access time to all the
peripherals, the system is said to be a symmetric multiprocessor.
In this case all the processors are equally capable of running the executive programs.
In an asymmetric multiprocessor, only one or a subset of processors are executive
capable.
The remaining processors have no I/O capability and thus are called attached
processors.
An executive or a master processor can execute the OS and handle I/O.
Attached processors execute user codes under the supervision of the master processor.
[Figure: processors P1, P2, ..., Pn connected through the system interconnect (bus, crossbar, multistage network) to I/O and to the shared memory modules SM1, ..., SMn]
Fig 2: The UMA multiprocessor model.
ii) Non-uniform memory access model (NUMA):
A NUMA multiprocessor is a shared memory system in which the access time varies
with the location of the memory word.
Two NUMA machine models are as shown in Fig 3 (a) & (b).
The shared memory is physically distributed to all processors, called local memories.
The collection of all local memories forms a global address space accessible by all
processors.
It is faster to access a local memory with a local processor. The access of remote
memory attached to other processors takes longer due to the added delay through the
interconnection network.
Besides distributed memories, globally shared memory can be added to a
multiprocessor system.
In this case there are three memory access patterns. They are
a. Local memory access (fastest).
b. Global memory access.
c. Remote memory access (slowest).
In this model processors are divided into several clusters.
Each cluster is itself a UMA or a NUMA multiprocessor.
The clusters are connected to global shared memory modules. The entire system is
considered a NUMA multiprocessor.
All processors belonging to the same cluster are allowed to uniformly access the
cluster shared memory modules. All clusters have equal access to the global memory.
The access time to the cluster memory is shorter than that to the global memory.
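[Figure (a), shared local memories: local memories LM1, LM2, ..., LMn attached to processors P1, P2, ..., Pn and joined by an interconnection network]
[Figure (b), a hierarchical cluster model: Cluster 1 to Cluster N, built from processors (P), cluster shared memories (CSM) and cluster interconnection networks (CIN), joined to global shared memories (GSM) by a global interconnect network]
Fig 3: Two NUMA models for multiprocessor systems.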

Answer any two full questions.
4) Explain the architecture of a vector supercomputer with a neat diagram.
Sol:
The architecture of a vector supercomputer is as shown in Fig 4.
The vector processor is built on top of the scalar processor.
The vector processor is attached to the scalar processor as an optional feature.
Program and data are first loaded into the main memory through a host computer.
All instructions are first decoded by the scalar control unit.
If the decoded instruction is a scalar operation or a program control operation, it
will be directly executed by the scalar processor using the scalar functional
pipelines.
If the instruction is decoded as a vector operation, it will be sent to the vector
control unit. This control unit will supervise the flow of vector data between the
main memory and the vector functional pipelines. The vector data flow is
coordinated by the control unit. A number of vector functional pipelines may be
built into a vector processor.
In a vector supercomputer, there will be a vector processor and it can be built on two
architectures, namely
1. Register-to-register architecture
2. Memory-to-memory architecture
Register-to-register architecture:
Here vector registers are used to hold the vector operands, intermediate and
final vector results.
The vector functional pipelines retrieve operands from and put results into the
vector registers.
All vector registers are programmable in user instructions.
Each vector register is equipped with a component counter which keeps track
of the component registers used in successive pipeline cycles.
In general, there is a fixed number of vector registers and functional pipelines
in a vector processor.
Memory-to-memory architecture:
In this architecture, the vector operands and intermediate results are directly copied
into the memory and they are retrieved from the memory as and when required.


[Figure labels: scalar processor; scalar instructions; vector processor; main memory (program and data); host computer; mass storage; I/O (user); scalar data; vector data]
Fig 4: The architecture of vector supercomputer.
5) a) Explain different types of data dependency with an example.
b) Draw the data dependency graph for the following.
S1: Load R1, (100)
S2: Move R1, R1
S3: Inc R1
S4: Add R1, R1
S5: Store (100), R1 (5+5 Marks)
Sol:
a) There are 5 types of data dependencies. They are as follows:
(1) Flow dependence:
A statement S2 is flow-dependent on the statement S1 if an execution path exists
from S1 to S2 and if at least one output of S1 feeds in as input to S2.
Ex: S1: Load R1, A
S2: Add R2, R1
(2) Anti dependence:
Statement S2 is antidependent on statement S1 if S2 follows S1 in program order
and if the output of S2 overlaps the input to S1.
Ex: S1: Add R2, R1
S2: Move R1, R3
(3) Output dependence:
Two statements are output dependent if they produce the same output variable.
Ex: S1: Load R1, A
S2: Move R1, R3
(4) I/O dependence:
Read and write are I/O statements. I/O dependence occurs not because the
same variable is involved but because the same file is referenced by both I/O
statements.
(5) Unknown dependence:
The dependence relation between two statements cannot be determined in the
following situations.
The subscript of a variable is itself subscripted.
The subscript does not contain the loop index variable.
A variable appears more than once with subscripts having different coefficients of
the loop variable.
The subscript is nonlinear in the loop index variable.
When one or more of these conditions exist, a conservative assumption is to
claim unknown dependence among the statements involved.
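The first three dependence types can be checked mechanically from the read and write sets of two statements. The following Python sketch is a simplified illustration, not a full dependence analyzer: it compares only two statements at a time, and it assumes that `Add R2, R1` reads both registers and writes `R2`. The `Stmt` type and `dependences` helper are hypothetical names.

```python
from typing import NamedTuple


class Stmt(NamedTuple):
    text: str
    reads: frozenset   # locations the statement uses as input
    writes: frozenset  # locations the statement produces as output


def dependences(first, second):
    """Dependences of `second` on `first`, where `first` comes earlier."""
    found = set()
    if first.writes & second.reads:
        found.add("flow")    # output of first feeds input of second
    if first.reads & second.writes:
        found.add("anti")    # output of second overlaps input of first
    if first.writes & second.writes:
        found.add("output")  # both produce the same output variable
    return found


load = Stmt("Load R1, A", frozenset({"A"}), frozenset({"R1"}))
# Assumption: Add R2, R1 computes R2 <- R2 + R1.
add = Stmt("Add R2, R1", frozenset({"R1", "R2"}), frozenset({"R2"}))
move = Stmt("Move R1, R3", frozenset({"R3"}), frozenset({"R1"}))

print(dependences(load, add))   # flow dependence through R1
print(dependences(add, move))   # anti dependence through R1
print(dependences(load, move))  # output dependence through R1
```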
b) Draw the data dependency graph for the following.
S1: Load R1, (100)
S2: Move R1, R1
S3: Inc R1
S4: Add R1, R1
S5: Store (100), R1
Sol: The data dependence graph is as shown below.
[Figure: data dependence graph with nodes S1 to S5]
PART-3
Answer any two full questions.
6) Trace out the following program to detect the parallelism using Bernstein's conditions.
P1: C = D x E
P2: M = G + C
P3: A = B + C
P4: C = L + M
P5: F = G / E
Assume that each step requires one cycle to execute and two adders are available.
Compare serial and parallel execution of the above program. (10 Marks)
Sol: Bernstein revealed a set of conditions based on which two processes can execute in
parallel.
P1, P2 - processes
I1, I2 - inputs
O1, O2 - outputs
$P_1 \parallel P_2$ if and only if
$I_1 \cap O_2 = \emptyset$
$I_2 \cap O_1 = \emptyset$
$O_1 \cap O_2 = \emptyset$
$P_1 \parallel P_2 \parallel \dots \parallel P_k$ if and only if
$P_i \parallel P_j$ for all $i \neq j$
Fig (a): Sequential execution in 5 steps
Fig (b): Parallel execution in 3 steps
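$P_1 \parallel P_5$, $P_2 \parallel P_3$, $P_2 \parallel P_5$, $P_4 \parallel P_5$, $P_5 \parallel P_3$
Collectively $P_2 \parallel P_3 \parallel P_5$ because $P_2 \parallel P_3$, $P_2 \parallel P_5$, $P_3 \parallel P_5$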
7) Explain hardware and software parallelism with an example. (10 Marks)
Sol:
Hardware parallelism:
This refers to parallelism defined by machine architecture and hardware multiplicity.
One way to characterize the parallelism is by the number of instruction issues per
machine cycle.
If a processor issues k instructions per machine cycle, then it is called a k-issue processor.
A conventional processor takes one or more machine cycles to issue a single instruction.
These are called one-issue machines, with a single instruction pipeline in the processor.
A multiprocessor system built with n k-issue processors should be able to handle a
maximum of nk threads of instructions simultaneously.
Software parallelism:
It is defined by the control and data dependences of programs.
The degree of parallelism is revealed in the program profile or in the program flow graph.
Software parallelism can be achieved by algorithms, programming style and compiler
optimization.
Parallelism in a program varies during the execution period.
Control parallelism:
It is a kind of software parallelism. It appears in the form of pipelining or multiple
functional units. Both pipelining and functional parallelism are handled by the hardware,
so the programmer need not take any special action to invoke them.
Data parallelism:
It offers the highest potential for concurrency.
It is practiced in both SIMD and MIMD modes on MPP systems.
Data parallel code is easier to write and to debug than control parallel code.
Synchronization in SIMD data parallelism is handled by the hardware.
Data parallelism exploits parallelism in proportion to the quantity of data involved.
Assuming two multiplier units and two add/subtract units, calculate the average software
parallelism.
Assuming two multiplier units, two add/subtract units and a 2-issue processor in which
one memory access (load/store) and one arithmetic operation can execute simultaneously,
calculate the average hardware parallelism.
[Figure: 1 cycle, 4 operations; 1 cycle, 2 operations; 1 cycle, 2 operations]
Fig (a): Software parallelism
S/w parallelism = 8/3 = 2.67 instructions per cycle.
Fig (b): Hardware parallelism
7 cycles and 8 operations
H/w parallelism = 8/7 = 1.14 instructions per cycle.
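Both averages are simply the number of operations divided by the number of cycles. A minimal Python sketch of that ratio, using the operation and cycle counts from the two figures (the function name is illustrative):

```python
def average_parallelism(operations, cycles):
    """Average number of operations completed per machine cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return operations / cycles


software = average_parallelism(8, 3)  # 8 operations in 3 cycles
hardware = average_parallelism(8, 7)  # the same 8 operations in 7 cycles
print(f"software parallelism = {software:.2f} instructions per cycle")
print(f"hardware parallelism = {hardware:.2f} instructions per cycle")
```

The gap between the two values shows the mismatch between the parallelism available in the program and what the 2-issue hardware can actually exploit.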
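8) Explain how grain packing can be done to compute the sum of the 4 elements in the
resulting product matrix C = A x B, where A and B are 2x2 matrices. Assume the grain size
for multiplication is 101 and the grain size for addition is 8.
(10 Marks)
Sol:
[Figure: multiplication node with inputs A and B, grain size = 101; addition node with inputs A and B, grain size = 8]

$C = A \times B$
$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \quad B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}, \quad C = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}$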
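$C_{11} = A_{11} \times B_{11} + A_{12} \times B_{21}$
$C_{12} = A_{11} \times B_{12} + A_{12} \times B_{22}$
$C_{21} = A_{21} \times B_{11} + A_{22} \times B_{21}$
$C_{22} = A_{21} \times B_{12} + A_{22} \times B_{22}$
$\text{SUM} = C_{11} + C_{12} + C_{21} + C_{22}$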
Fine grain graph:
[Figure: eight multiplication nodes A to H in the top row, followed by rows of four, two and one addition nodes, ending in SUM]
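Coarse grain graph:
[Figure: nodes U, V, W and X feed node Y, which produces SUM]
Each of U, V, W and X packs two multiplications and one addition; Y packs the three
remaining additions.
Grain size of U = 101 + 101 + 8 = 210
Grain size of V = 210
Grain size of W = 210
Grain size of X = 210
Grain size of Y = 8 + 8 + 8 = 24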
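The packed grain sizes follow directly from the two basic grain sizes given in the question. The Python sketch below recomputes them and also evaluates SUM for a sample pair of 2x2 matrices. The node names U, V, W, X and Y come from the coarse grain graph; the function names and the sample matrices are illustrative assumptions.

```python
MULTIPLY_GRAIN = 101  # grain size of one multiplication (from the question)
ADD_GRAIN = 8         # grain size of one addition (from the question)


def coarse_grain_sizes():
    """Grain sizes after packing, keyed by coarse grain node name."""
    # Each of U, V, W, X computes one element of C: 2 multiplies + 1 add.
    element = 2 * MULTIPLY_GRAIN + ADD_GRAIN
    sizes = {name: element for name in ("U", "V", "W", "X")}
    # Y adds the four elements together: 3 additions.
    sizes["Y"] = 3 * ADD_GRAIN
    return sizes


def sum_of_product(a, b):
    """SUM = C11 + C12 + C21 + C22 for C = A x B."""
    n = len(a)
    c = [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]
    return sum(sum(row) for row in c)


print(coarse_grain_sizes())
# Sample matrices chosen only for illustration.
print(sum_of_product([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Packing reduces fifteen fine grain nodes (eight multiplications and seven additions) to five coarse grain nodes, which cuts the communication between nodes at the cost of larger grains.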

****