
5 Marks 

Q. Describe Array Processor Architecture. 


● An array processor is a machine based on the synchronous parallel computing model: it consists of several Processing Elements (PEs), all of which execute the same instruction on different data.
● The synchronous parallel computer consists of an array processor with multiple ALUs, called Processing Elements (PEs), that operate in parallel.
● Each Processing Element is equipped with registers and a local memory.
● All PEs are synchronized and connected by an interconnection network, so they perform the same function at the same time.
● Array processors increase the overall instruction processing speed.
● Most array processors operate asynchronously from the host CPU, which improves the overall capacity of the system.
● Array processors have their own local memory, providing extra memory for systems with low memory.
● The three major components of an array structure are the array units, the memory they access, and the connections between the two.
● The main memory is used for storing the program, and the control unit is responsible for fetching the instructions. Vector instructions are sent to all PEs simultaneously, and the results are returned to memory.

 
● The instructions are fetched and broadcast to all PEs by a common control unit.
● These machines work on different data elements residing in their local memories.

 
● The best-known SIMD array processor is the ILLIAC IV computer, developed by the Burroughs Corporation. SIMD processors are highly specialized computers: they are suitable only for numerical problems that can be expressed in vector or matrix form, not for other types of computation. (A toy illustration of the SIMD model follows below.)
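As a rough illustration of the SIMD execution model described above (purely illustrative, not any real machine's interface), the following Python sketch "broadcasts" one instruction to a set of PEs, each of which applies it to its own local data element:

```python
# Illustrative sketch of SIMD lockstep execution: one control unit
# broadcasts a single instruction; every PE applies it to local data.

def broadcast(instruction, local_memories):
    """Apply the same instruction to each PE's local data, in lockstep."""
    return [instruction(data) for data in local_memories]

# Four PEs, each holding one element of a vector in its local memory.
pes = [1.0, 2.0, 3.0, 4.0]
result = broadcast(lambda x: x * 2.0 + 1.0, pes)  # same instruction, different data
print(result)  # [3.0, 5.0, 7.0, 9.0]
```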
 
Q. Explain Instruction and Arithmetic Pipeline. 
Pipelining is a technique in which multiple instructions are overlapped during execution. Instruction processing is divided into stages, and these stages are connected to one another to form a pipe-like structure: instructions enter at one end, progress through the stages, and exit at the other end. It is also known as pipeline processing, and it allows instructions to be stored and executed in an orderly fashion. Pipelining increases the overall instruction throughput.

Types of Pipeline: 
Arithmetic Pipeline: Arithmetic pipelines are found in most computers. They are used for floating-point operations, multiplication of fixed-point numbers, etc. For example, the inputs to a floating-point adder pipeline are:
X = A*2^a
Y = B*2^b
Here A and B are mantissas (the significant digits of the floating-point numbers), while a and b are exponents.
The floating point addition and subtraction is done in 4 parts: 
1. Compare the exponents. 
2. Align the mantissas. 
3. Add or subtract mantissas 
4. Produce the result. 
Registers are used for storing the intermediate results between the above operations. 
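The four steps above can be sketched in code. The following Python function is a simplified illustration (it ignores rounding and only normalizes downward, unlike real FP hardware); in an actual pipeline, each step would be a separate segment with latch registers between the segments:

```python
# Sketch of the four floating-point adder stages for X = A*2^a, Y = B*2^b.

def fp_add(A, a, B, b):
    # Stage 1: compare the exponents (ensure a >= b by swapping).
    if a < b:
        A, a, B, b = B, b, A, a
    # Stage 2: align the mantissas (shift the smaller operand right).
    B = B / (2 ** (a - b))
    # Stage 3: add the mantissas.
    M = A + B
    # Stage 4: normalize to produce the result.
    e = a
    while abs(M) >= 1.0:
        M /= 2.0
        e += 1
    return M, e

M, e = fp_add(0.9504, 3, 0.8200, 2)  # 0.9504*2^3 + 0.8200*2^2
print(M, e)                          # 0.6802, 4  (i.e., 0.6802*2^4 = 10.8832)
```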
 
Instruction Pipeline: In an instruction pipeline, a stream of instructions is executed by overlapping the fetch, decode, and execute phases of the instruction cycle. This technique is used to increase the throughput of the computer system. An instruction pipeline reads instructions from memory while previous instructions are being executed in other segments of the pipeline; thus multiple instructions can be executed simultaneously. The pipeline is most efficient when the instruction cycle is divided into segments of equal duration. The process of executing an instruction involves the following four steps:
1. Instruction fetch: Fetches instruction from Main memory 
2. Instruction Decoding: Determines opcode and operand 
3. Operand Fetch: Fetches operand from Main Memory 
4. Execution: Perform indicated operation and store result 
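To see how overlapping these four phases raises throughput, here is a small Python sketch that prints the space-time diagram of the 4-stage pipeline (instruction i simply reaches stage s in clock cycle i + s):

```python
# Space-time diagram of a 4-stage instruction pipeline:
# FI = instruction fetch, DI = decode, FO = operand fetch, EX = execute.
stages = ["FI", "DI", "FO", "EX"]
n_instr, n_stages = 5, len(stages)

for i in range(n_instr):
    row = ["  "] * (n_instr + n_stages - 1)
    for s in range(n_stages):
        row[i + s] = stages[s]          # instruction i is in stage s at cycle i+s
    print(f"I{i+1}: " + " ".join(row))
# 5 instructions finish in 5 + 4 - 1 = 8 cycles instead of 5*4 = 20.
```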

 
 
Q. Explain Flynn's Classification for parallel architecture. 
Flynn's taxonomy distinguishes multi-processor computer architectures according to how they can be classified along the 
two independent dimensions of Instruction Stream and Data Stream. Each of these dimensions can have only one of two 
possible states: Single or Multiple. 
SISD (Single Instruction, Single Data):
● A serial (non-parallel) computer 
● Single Instruction: Only one instruction stream is being acted on by the CPU during any one clock cycle 
● Single Data: Only one data stream is being used as input during any one clock cycle 
● Deterministic execution 
● This is the oldest type of computer 
● Examples: older generation mainframes, minicomputers, workstations and single processor/core PCs. 

   
Single Instruction, Multiple Data (SIMD): 
● Single Instruction: All processing units execute the same instruction at any given clock cycle 
● Multiple Data: Each processing unit can operate on a different data element 
● Best suited for specialized problems characterized by a high degree of regularity, such as graphics/image processing. 
● Synchronous (lockstep) and deterministic execution 
● Two varieties: Processor Arrays and Vector Pipelines 
Examples: 
Processor Arrays: Thinking Machines CM-2 
Vector Pipelines: IBM 9000 
Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD instructions and execution units.

   
Multiple Instruction, Multiple Data (MIMD): 
● Multiple Instruction: Every processor may be executing a different instruction stream 
● Multiple Data: Every processor may be working with a different data stream 
● Execution can be synchronous or asynchronous, deterministic or non-deterministic 
● Currently, the most common type of parallel computer - most modern supercomputers fall into this category. 
● Examples: most current supercomputers, networked parallel computer clusters and "grids", multiprocessor SMP 
computers, multi-core PCs. 
● Note: many MIMD architectures also include SIMD execution sub-components 

 
Multiple Instruction, Single Data (MISD): 
● Multiple Instruction: Each processing unit operates on the data independently via separate instruction streams. 
● Single Data: A single data stream is fed into multiple processing units. 
● Few (if any) actual examples of this class of parallel computer have ever existed. 
Some conceivable uses might be: 
● multiple frequency filters operating on a single signal stream 
● multiple cryptography algorithms attempting to crack a single coded message. 

  
 
 
Q. Discuss the impact of pipeline hazards on the performance of pipeline processor. 
1. Pipeline hazards are situations that prevent the next instruction in the instruction stream from executing during its designated clock cycle.
2. Any condition that causes a stall in the pipeline operations can be called a hazard. 
3. There are primarily three types of hazards: 
i. Data Hazards: A data hazard is any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline; as a result, some operation has to be delayed and the pipeline stalls. This occurs whenever there are two instructions, one of which depends on data obtained from the other:
A = 3 + A
B = A * 4
In the above sequence, the second instruction needs the value of A computed by the first instruction, so the second instruction is said to depend on the first. If execution is done in a pipelined processor, interleaving these two instructions is likely to lead to incorrect results because of the data dependency between them. The pipeline therefore needs to be stalled as and when necessary to avoid errors (a sketch of the stall logic follows below).
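Below is a minimal Python sketch of that stall logic; the three-cycle result latency is an assumption made up for the example, not a property of any particular processor:

```python
# Issue instructions in order, stalling while a source register is in flight.
instrs = [
    ("A", {"A"}),   # A = 3 + A   (writes A, reads A)
    ("B", {"A"}),   # B = A * 4   (writes B, reads A -> RAW dependency)
]
WRITE_LATENCY = 3   # assumed cycles until a computed value can be read

ready_at = {}       # register -> cycle at which its value becomes available
cycle = 0
for dest, sources in instrs:
    # Stall until every source operand has been written back.
    start = max([cycle] + [ready_at.get(r, 0) for r in sources])
    ready_at[dest] = start + WRITE_LATENCY
    print(f"{dest} issues at cycle {start}")   # B is held back until cycle 3
    cycle = start + 1
```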
ii. Structural Hazards: This situation arises mainly when two instructions require a given hardware resource at the same time, so for one of the instructions the pipeline needs to be stalled. The most common case is when memory is accessed at the same time by two instructions: one instruction may need to access the memory as part of its Execute or Write-back phase while another instruction is being fetched. If both instructions and data reside in the same memory, the two instructions cannot proceed together, and one of them must be stalled until the other is done with its memory access. In general, sufficient hardware resources are needed to avoid structural hazards.
iii. Control Hazards: The instruction fetch unit of the CPU is responsible for providing a stream of instructions to the execution unit. The instructions fetched by the fetch unit are normally in consecutive memory locations. The problem arises when one of the instructions is a branch to some other memory location: all the instructions fetched into the pipeline from the consecutive memory locations become invalid and need to be removed (also called flushing the pipeline). This induces a stall until new instructions are fetched from the memory address specified in the branch instruction; the time lost as a result is called the branch penalty. Often, dedicated hardware is incorporated in the fetch unit to identify branch instructions and compute branch addresses as soon as possible, reducing the resulting delay.
 
Q. Explain desirable features of global scheduling algorithm.  
1. No a priori knowledge about processes: Scheduling algorithms that operate on information about the characteristics and resource requirements of processes pose an extra burden on users, who must provide this information when submitting their processes for execution; a good algorithm should therefore require no such prior knowledge.
2. Dynamic in nature: Process assignment decisions should be dynamic, i.e., based on the current load of the system and not on some static policy. It is recommended that the scheduling algorithm possess the flexibility to migrate a process more than once, because the initial decision of placing a process on a particular node may have to be changed after some time to adapt to a new system load.
3. Quick decision-making capability: Heuristic methods requiring less computational effort (and hence less time) while providing near-optimal results are preferable to exhaustive (optimal) solution methods.
4. Balanced system performance and scheduling overhead: A greater amount of state information allows more intelligent decisions but increases the overhead. The aim is to improve system performance while minimizing the scheduling overhead.
5. Stability: An algorithm is unstable when processes keep migrating without accomplishing any useful work; this can occur when nodes turn from lightly loaded to heavily loaded states or vice versa.
6. Scalability: A scheduling algorithm should be capable of handling small as well as large networks. As the number of nodes increases, the traffic for exchanging state information consumes more network bandwidth, so the algorithm must scale gracefully.
7. Fault tolerance: The algorithm should be capable of working after the crash of one or more nodes of the system; it should decentralize its decision-making and consider only the available nodes. Also, if the nodes are partitioned into two or more groups due to link failures, the algorithm should be capable of functioning properly for the nodes within a group.
8. Fairness of Service: ​Global scheduling policies that blindly attempt to balance the load on all the nodes of the 
system are not good from the point of view of fairness of service. This is because in any load-balancing scheme, 
heavily loaded nodes will obtain all the benefits while lightly loaded nodes will suffer poorer response time than in a 
stand-alone configuration. A fair strategy that improves response time of the former without unduly affecting the 
latter is desirable. 
9. Load-sharing approach: Users initiating equivalent processes expect to receive the same quality of service. Load balancing has to be replaced by the concept of load sharing; that is, a node shares some of its resources as long as its own users are not significantly affected.
 
Q. Discuss process scheduling in detail.  
In computing, a process is an instance of a computer program that is being executed; it contains the program code and its current activity. The act of determining which process in the ready state should be moved to the running state is known as process scheduling. Distributed systems contain a set of resources interconnected by a network; resources can be logical (a shared file) or physical (a CPU). The job of scheduling in a DS is to assign processes to nodes (that is, to the resources of the DS) such that resource usage, response time, scheduling overhead, and network congestion are optimized.
Types of process scheduling techniques:
1) Task assignment approach
Main assumptions:
● Processes have been split into tasks
● The computation requirements of tasks and the speeds of the processors are known
● The cost of processing each task on each node is known
● The communication cost between every pair of tasks is known
● The resource requirements of tasks and the resources available on each node are known
● Reassignment of tasks is not possible
Basic idea: Find an optimal assignment that achieves goals such as the following: minimization of IPC costs, quick turnaround time of the process, a high degree of parallelism, and efficient utilization of resources.
In the case of m tasks and q nodes, there are q^m possible assignments of tasks to nodes (each of the m tasks can be placed on any of the q nodes). In practice, however, the actual number of possible assignments may be smaller than this, due to the restriction that certain tasks cannot be assigned to certain nodes because of their specific requirements (e.g., they need a certain amount of memory or a certain data file).
Example: two nodes {n1,n2}, six tasks {t1, t2, t3, t4, t5, t6} 
Task execution cost: x_ab is the cost of executing task a on node b.
Inter-task communication cost: c_ij is the communication cost between tasks i and j.

Inter-task communication cost (c_ij):
        t1   t2   t3   t4   t5   t6
t1       0    6    4    0    0   12
t2       6    0    8   12    3    0
t3       4    8    0    0   11    0
t4       0   12    0    0    5    0
t5       0    3   11    5    0    0
t6      12    0    0    0    0    0

Execution costs (x_ab):
        n1   n2
t1       5   10
t2       2    ∞
t3       4    4
t4       6    3
t5       5    2
t6       ∞    4
Task t6 cannot be executed on node n1 and task t2 cannot be executed on node n2 since the resources they need are not 
available on these nodes. 
Serial Assignment: Tasks t1, t2, t3 are assigned to node n1; tasks t4, t5, t6 are assigned to node n2.
Execution cost: x = x11 + x21 + x31 + x42 + x52 + x62 = 5 + 2 + 4 + 3 + 2 + 4 = 20
Communication cost: c = c14 + c15 + c16 + c24 + c25 + c26 + c34 + c35 + c36 = 0 + 0 + 12 + 12 + 3 + 0 + 0 + 11 + 0 = 38
Hence total cost = 58.
Optimal Assignment: Tasks t1, t2, t3, t4, t5 are assigned to node n1; task t6 is assigned to node n2.
Execution cost: x = x11 + x21 + x31 + x41 + x51 + x62 = 5 + 2 + 4 + 6 + 5 + 4 = 26
Communication cost: c = c16 + c26 + c36 + c46 + c56 = 12 + 0 + 0 + 0 + 0 = 12
Total cost = 26 + 12 = 38, which is lower than the serial assignment's 58 (a sketch that recomputes both follows below).
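The following Python sketch recomputes both totals from the tables above; INF encodes the two forbidden placements (t2 on n2 and t6 on n1):

```python
INF = float("inf")
exec_cost = {                       # x[task][node], from the execution-cost table
    "t1": {"n1": 5, "n2": 10}, "t2": {"n1": 2, "n2": INF},
    "t3": {"n1": 4, "n2": 4},  "t4": {"n1": 6, "n2": 3},
    "t5": {"n1": 5, "n2": 2},  "t6": {"n1": INF, "n2": 4},
}
comm = {("t1","t2"): 6, ("t1","t3"): 4, ("t1","t6"): 12, ("t2","t3"): 8,
        ("t2","t4"): 12, ("t2","t5"): 3, ("t3","t5"): 11, ("t4","t5"): 5}

def total_cost(assign):
    x = sum(exec_cost[t][n] for t, n in assign.items())
    # Communication cost is incurred only by pairs placed on different nodes.
    c = sum(w for (a, b), w in comm.items() if assign[a] != assign[b])
    return x + c

serial  = {"t1":"n1", "t2":"n1", "t3":"n1", "t4":"n2", "t5":"n2", "t6":"n2"}
optimal = {"t1":"n1", "t2":"n1", "t3":"n1", "t4":"n1", "t5":"n1", "t6":"n2"}
print(total_cost(serial), total_cost(optimal))   # 58 38
```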
 
2) Load-balancing approach
Tasks are distributed among the nodes so as to equalize the workload of the nodes of the system.

 
3) Load-sharing approach
Scheduling of tasks in a load-sharing distributed system involves deciding not only when to execute a process, but also where to execute it. There are two components:
1) The allocator decides where a job will execute.
2) The scheduler decides when a job gets its share of the CPU.
It is necessary and sufficient to prevent nodes from being idle while some other node has more than two processes. Load sharing is much simpler than load balancing, since it only attempts to ensure that no node is idle while a heavily loaded node exists. The priority assignment policy and migration-limiting policy are the same as those for the load-balancing algorithms.

Q. Compare and contrast static and dynamic load balancing. 


Load balancing distributes tasks among the nodes so as to equalize the workload of the nodes of the system.

 
Static versus Dynamic: 
● Static algorithms use only information about the average behavior of the system
● Static algorithms ignore the current state or load of the nodes in the system
● Static algorithms are much simpler
● The goal of static scheduling methods is to minimize the overall execution time of a concurrent program while minimizing communication delays
● Dynamic algorithms collect state information and react to changes in the system state
● Dynamic algorithms are able to give significantly better performance
 
Types of static load-balancing algorithms: Deterministic versus Probabilistic
● Deterministic algorithms use information about the properties of the nodes and the characteristics of the processes to be scheduled
● The deterministic approach is difficult to optimize
● Probabilistic algorithms use information about static attributes of the system (e.g., number of nodes, processing capability, topology) to formulate simple process placement rules
● The probabilistic approach has poorer performance
 
Types of dynamic load-balancing algorithms: Centralized versus Distributed
● The centralized approach collects state information at a server node, which makes the assignment decisions
● Centralized algorithms can make efficient decisions but have lower fault tolerance
● The distributed approach spreads the decision-making entities over a predefined set of nodes
● Distributed algorithms avoid the bottleneck of collecting state information at one node and can react faster
 
Types of distributed load-balancing algorithms: Cooperative versus Noncooperative
● In noncooperative algorithms, entities act autonomously and make scheduling decisions independently of the other entities
● In cooperative algorithms, the distributed entities cooperate with each other
● Cooperative algorithms are more complex and involve larger overhead
● The stability of cooperative algorithms is better
 
Q. What do you mean by threads? How are they implemented in distributed systems?
● A thread is a lightweight process.
● The analogy: thread is to process as process is to machine. 
● Each thread runs strictly sequentially and has its own program counter and stack to keep track of where it is. 
● Threads share the CPU just as processes do: first one thread runs, then another does. 
● Threads can create child threads and can block waiting for system calls to complete. 
● All threads have exactly the same address space. 
● They share code section, data section, and OS resources (open files & signals). 
● They share the same global variables. One thread can read, write, or even completely wipe out another thread’s stack. 
● Threads can be in any one of several states: running, blocked, ready, or terminated. 
 
Advantages of using threads 
1. Useful for clients: if a client wants a file to be replicated on multiple servers, it can have one thread talk to each 
server. 
2. Producer-consumer problems are easier to implement using threads because threads can share a common buffer. 
3. It is possible for threads in a single address space to run in parallel, on different CPUs. 
 
Threads Implementation in DS: 
● Organization of a file server:
● The file server waits for incoming requests
● It processes each request by fetching the required data
● It sends the result back
● Requests are sent by clients to a well-known endpoint for this server.

 
● A dispatcher thread reads incoming requests for file operations.
● After examining a request, the server chooses an idle (i.e., blocked) worker thread and hands it the request.
● The worker proceeds by performing a blocking read on the local file system, which may cause the thread to be suspended until the data are fetched from disk.
● If the thread is suspended, another thread is selected for execution.
● For example, the dispatcher may be selected, to acquire more work.
● Alternatively, another worker thread that is now ready to run can be selected. (A minimal sketch of this pattern follows below.)
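Here is a minimal Python sketch of the dispatcher/worker pattern just described, using a thread-safe queue in place of the network endpoint (the file names and the simulated "file read" are placeholders):

```python
import queue
import threading

requests = queue.Queue()

def worker():
    while True:
        req = requests.get()           # stay idle (blocked) until handed a request
        data = f"contents of {req}"    # stands in for a blocking local file read
        print(f"served {req}: {data}")
        requests.task_done()

for _ in range(2):                     # a small pool of worker threads
    threading.Thread(target=worker, daemon=True).start()

# The dispatcher would read incoming requests and enqueue them;
# here we enqueue two requests directly.
requests.put("file_a.txt")
requests.put("file_b.txt")
requests.join()                        # wait until all requests are served
```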
 
Q. Explain code migration techniques with an example. 
Code migration in distributed systems originally took place in the form of process migration, in which an entire process was moved from one machine to another. Moving a running process to a different machine is a costly and intricate task, so it should be done only when the gain in performance outweighs that cost. The basic idea is that overall system performance can be improved if processes are moved from heavily loaded to lightly loaded machines. A process consists of three segments.
Code segment: Contains the set of instructions that make up the program being executed.
Resource segment: Contains references to external resources needed by the process, such as files, printers, devices, other processes, and so on.
Execution segment: Used to store the current execution state of the process, consisting of private data, the stack, and, of course, the program counter.
 
 
Approaches to Code Migration 
1. Migrating parts of the client to the server: Consider, as an example, a client-server system in which the server manages a huge database. A client application needs to perform many database operations involving large quantities of data, so the network may be swamped with the transfer of data from the server to the client. It is better to ship part of the client application to the server and send only the results across the network. Code migration is based on the assumption that it generally makes sense to process data close to where those data reside.
2. Migrating parts of the server to the client: For example, in many interactive database applications, clients need to fill in forms that are subsequently translated into a series of database operations. Processing the form at the client side and sending only the completed form to the server avoids a relatively large number of small messages crossing the network. The result is that the client perceives better performance, while at the same time the server spends less time on form processing and communication.
3. Code migration can also help improve performance by exploiting parallelism: An example is searching for information on the Web. Implement a search query in the form of a small mobile program, called a mobile agent (MA), which moves from site to site. By making several copies of such a program and sending each off to a different site, we can achieve a roughly linear speedup compared to using just a single program instance.
4. Another reason is flexibility: The traditional approach to building distributed applications is to partition the application into different parts and decide in advance where each part should be executed. If code can move between different machines, it becomes possible to configure distributed systems dynamically; clients need not have all the software preinstalled to talk to the server.
 
Models of code migration: 
Weak mobility model: Only the code segment can be transferred, along with perhaps some initialization data. Weak mobility requires only that the target machine can execute the code, which makes the code portable. An important design issue is whether the migrated code is executed by the target process or whether a separate process is started for it. The benefit of this approach is its simplicity. For example, Java applets are simply downloaded by a Web browser and executed in the browser's address space.
Strong mobility model: Both the code segment and the execution segment can be transferred. The characteristic feature of strong mobility is that a running process can be stopped, moved to another machine, and then resumed exactly where it left off. Strong mobility can also be supported by remote cloning: cloning yields an exact copy of the original process, now running on a different machine, and the cloned process executes in parallel with the original. Strong mobility is harder to implement.
Sender-initiated migration: Migration is initiated by the machine where the code currently resides or is being executed.
Receiver-initiated migration: The initiative for code migration is taken by the target machine. This is simpler than sender-initiated migration. (A toy weak-mobility sketch follows below.)
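As a toy illustration of weak mobility, the sketch below "migrates" only a code segment (a source string) into a fresh namespace and executes it there; a real system would transfer the code over a network and sandbox it:

```python
# The migrated code segment: no execution state travels with it.
code_segment = """
def handler(x):
    return x * x
"""

namespace = {}
exec(code_segment, namespace)   # the target process executes the received code
print(namespace["handler"](6))  # 36
```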
 
Q. Design a pipelined processor architecture for multiplication of two 6-digit fixed-point numbers. 
An arithmetic pipeline divides an arithmetic operation into sub-operations that execute in the pipeline segments; it is used to implement floating-point operations, multiplication of fixed-point numbers, and similar computations encountered in scientific problems. For implementing arithmetic pipelines we generally use the following two types of adder:
i) Carry propagation adder (CPA): It adds two numbers such that the carries generated in successive digits are propagated.
ii) Carry save adder (CSA): It adds three numbers such that the carries generated are not propagated but are instead saved in a carry vector.
Two fixed-point numbers are multiplied by the ALU using repeated add and shift operations. This sequential execution makes multiplication a slow process. If we look at the multiplication process carefully, we observe that it is the process of adding multiple copies of shifted multiplicands, as shown below:

 
Now, we can identify the following stages for the pipeline:
• The first stage generates the partial products of the numbers, which form the six rows of shifted multiplicands.
• In the second stage, the six numbers are fed to two CSAs, merging them into four numbers.
• In the third stage, a single CSA merges the four numbers into three numbers.
• In the fourth stage, a single CSA merges the three numbers into two numbers.
• In the fifth stage, the last two numbers are added through a CPA to get the final product.
These stages can be implemented as a CSA tree, as sketched below:
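The CSA tree can be mimicked in Python on plain integers: csa() below compresses three numbers into a sum word and a carry word exactly as a hardware CSA does, and the five stages mirror the list above (the two 6-bit operands are arbitrary):

```python
def csa(x, y, z):
    """Carry-save add: compress three numbers into sum and carry words."""
    s = x ^ y ^ z                            # per-bit sum, no carry propagation
    c = ((x & y) | (y & z) | (x & z)) << 1   # saved carries, shifted left
    return s, c

a, b = 0b110101, 0b101011                    # two 6-bit operands
# Stage 1: the six rows of shifted multiplicands (partial products).
rows = [(a if (b >> i) & 1 else 0) << i for i in range(6)]

s1, c1 = csa(rows[0], rows[1], rows[2])      # Stage 2: two CSAs, 6 -> 4 numbers
s2, c2 = csa(rows[3], rows[4], rows[5])
s3, c3 = csa(s1, c1, s2)                     # Stage 3: one CSA, 4 -> 3 numbers
s4, c4 = csa(s3, c3, c2)                     # Stage 4: one CSA, 3 -> 2 numbers
print(s4 + c4 == a * b)                      # Stage 5: the final CPA -> True
```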
 
 
2 Marks 
Q. Difference between cluster computing and grid computing. 
Cluster Computing | Grid Computing
Homogeneous | Heterogeneous
Tightly coupled systems | Loosely coupled systems
Single system image | Diversity and dynamism
Centralized job management & scheduling | Distributed job management & scheduling
The cluster computers all have the same hardware and OS | The computers in a grid can run different operating systems and have different hardware
The machines in a cluster are dedicated to work as a single unit and nothing else | A grid makes use of spare computing power on the machines
The whole system (all nodes) behaves like a single system, and resources are managed by a centralized resource manager | Every node is autonomous, i.e., it has its own resource manager and behaves like an independent entity
The computers in a cluster are normally contained in a single location or complex | A grid is inherently distributed by nature over a LAN, metropolitan network, or WAN
 
Q. Design Issues of Parallel Computing. 
● Requires complex hardware
● More expensive than serial computing
● No task is perfectly parallelizable, so shared resources have to be used serially
● Task interdependencies must be considered before design
● Communication overhead exists, as multiple processors are involved in the computation
 
Q. List and define various performance metrics for parallel Computers. 
There are a number of metrics; the best known are:
Speedup: Speedup is a measure of performance. It measures the ratio between the sequential execution time and the parallel execution time.
Efficiency: Efficiency is a measure of the usage of the computational resources. It measures the ratio between performance and the resources used to achieve that performance.
Redundancy: Redundancy measures the increase in the required computation when using more processors. It measures the ratio between the number of operations performed by the parallel execution and by the sequential execution.
Utilization: Utilization is a measure of the good use of the computational capacity. It measures the ratio between the computational capacity utilized during execution and the capacity that was available.
Quality: Quality is a measure of the relevancy of using parallel computing; it combines speedup, efficiency, and redundancy (a worked sketch follows below).
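For concreteness, here is one common formulation of these metrics (following Hwang's definitions; the timing and operation-count numbers are invented purely for illustration):

```python
T1, Tn = 100.0, 20.0   # assumed sequential and parallel execution times
O1, On = 1000, 1400    # assumed operation counts (parallel does extra work)
n = 8                  # number of processors

S = T1 / Tn            # speedup
E = S / n              # efficiency
R = On / O1            # redundancy
U = R * E              # utilization
Q = S * E / R          # quality of parallelism
print(S, E, R, U, Q)   # 5.0 0.625 1.4 0.875 ~2.23
```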
 
Q. Define:(a) Task Parallelism (b) Data Parallelism (c) Hybrid Parallelism. 
Data parallelism: Perform the same operation on different items of data at the same time; the parallelism grows with the size of the data. Example: convert all characters in an array to upper case. Data parallelism facilitates very high speedups and scaling to supercomputers.
Task parallelism: Perform distinct computations or tasks at the same time, possibly on the same data. With the number of tasks fixed, the parallelism does not grow with the data size and is therefore not scalable. Example: several functions computed on the same data, such as average, minimum, binary OR, and geometric mean. There are no dependencies between the tasks, so all can run in parallel.
Hybrid data/task parallelism: ​A parallel pipeline of tasks, each of which might be data parallel  
 
Q. Define various pipelining performance measures. 
Speedup: Speedup is a measure of performance. It measures the ratio between the non-pipelined (sequential) execution time and the pipelined execution time:
speedup = (time taken by non-pipelined implementation) / (time taken by pipelined implementation)
For a pipeline with n stages, m tasks, and clock period t, this gives s(n) = (m*n*t)/((n+m-1)*t) = (m*n)/(n+m-1)

Throughput: Throughput is the number of results produced per clock cycle; in the ideal case it equals 1, i.e., the pipeline produces one output per clock cycle. With clock frequency f = 1/t, the formula is U(n) = m*f/(n+m-1)

Efficiency: Efficiency is a measure of the usage of the computational resources: the ratio of the actual speedup to the maximum possible speedup (the number of stages). The formula is E(n) = m/(n+m-1)
A worked example follows below.
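Plugging sample numbers into the three formulas (the 4-stage, 100-task, 1 ns values are arbitrary):

```python
n, m, t = 4, 100, 1e-9   # assumed: n = 4 stages, m = 100 tasks, t = 1 ns clock
f = 1 / t                # clock frequency

speedup    = (m * n) / (n + m - 1)   # -> 400/103 ~ 3.88 (approaches n for large m)
throughput = m * f / (n + m - 1)     # results per second
efficiency = m / (n + m - 1)         # -> 100/103 ~ 0.97
print(speedup, throughput, efficiency)
```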
 
Q. Discuss importance of load balancing. 
● Load balancing is the process of distributing or redistributing load among processors, thereby improving the performance of the system.
● Load balancing aims to optimize resource use, maximize throughput, minimize response time, and avoid overloading any single resource.
● A distributed system contains a number of processors working independently of one another and linked by communication channels (some may not be linked by any direct channel).
● Each processor has an initial load, i.e., an amount of work to be performed, and each may have a different processing capacity.
● The workload has to be evenly distributed among all processors, based on their processing speeds, so that the time to execute all tasks is minimized and the idle time of each processor is reduced. This is why we need load balancing.
● Load imbalance is also a main problem in data-parallel applications, where it occurs mainly due to the uneven distribution of data among the various processors in the system.
 
Q. Systolic Architecture 
In a systolic architecture, the data to be processed flows through various operation stages and is finally put into memory. Data to be processed is taken from memory and enters the processing element performing operation f1; the data processed by f1 is passed to f2, and so on. In the end, the data processed by fn is stored back into memory.
 
Advantages of Systolic Architecture 
● Regularity and modular design: the array consists of modular processing units interconnected with homogeneity
● High degree of pipelining
● Highly synchronized multiprocessing
● High speed and low cost
● Elimination of global broadcasting
 
Disadvantages of Systolic Architecture 
● High bandwidth requirement, both at the periphery (RAM) and between PEs
● Poor run-time fault tolerance due to the lack of an interconnection protocol
 
Q. Dataflow Architecture 
The dataflow architecture uses a data-driven model in which a program is represented as a directed acyclic graph. The graph contains nodes and edges: a node represents an instruction, and an edge represents a data-dependency relationship between the connected nodes. For example:
A := 5*C + D

 
 
Execution of an instruction is determined solely by the availability of its input arguments: an instruction is scheduled for execution if and only if its input data are valid (see the sketch after the lists below). Popular dataflow languages are SISAL, Silage, LISP, etc.
Advantages 
● High potential for parallelism 
● High throughput for complex computation 
Disadvantages 
● Time lost waiting for unneeded arguments 
● High Control Overhead 
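Here is a tiny Python sketch of data-driven firing for the example A := 5*C + D (the node names are invented; a real dataflow machine does this in hardware with tokens):

```python
# Each node fires as soon as all of its input edges carry values.
nodes = {
    "mul": {"op": lambda x, y: x * y, "inputs": ["five", "C"]},
    "add": {"op": lambda x, y: x + y, "inputs": ["mul", "D"]},
}
values = {"five": 5, "C": 3, "D": 4}   # initial tokens on the input edges

fired = True
while fired:                            # keep sweeping until no node can fire
    fired = False
    for name, node in nodes.items():
        if name not in values and all(i in values for i in node["inputs"]):
            values[name] = node["op"](*(values[i] for i in node["inputs"]))
            fired = True
print(values["add"])                    # A = 5*3 + 4 = 19
```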
 
Q. Give the example of code migration to improve performance by exploiting parallelism. 
● An example is searching for information on the Web.
● Implement a search query in the form of a small mobile program, called a mobile agent (MA).
● The MA moves from site to site.
● The system makes several copies of such a program and sends each off to a different site.
● We achieve a roughly linear speedup compared to using just a single program instance, as sketched below.
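A minimal Python sketch of the idea, fanning identical queries out to several sites concurrently (the site names and search function are hypothetical placeholders):

```python
from concurrent.futures import ThreadPoolExecutor

sites = ["site-a", "site-b", "site-c"]

def search(site, query):
    # Stands in for a mobile agent searching a remote site's local data.
    return f"{site}: results for '{query}'"

# Copies of the same query run at all sites at once, giving (ideally)
# a linear speedup over visiting the sites one by one.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda s: search(s, "parallel"), sites))
print(results)
```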
 
