
THE BCS PROFESSIONAL EXAMINATIONS
BCS Level 6 Professional Graduate Diploma in IT
April 2009
EXAMINERS' REPORT
Distributed & Parallel

General Comments
Distributed and parallel computing continues to be an important computing discipline, with applications in science, education and commerce. Regrettably, the number of students taking this paper continues to be low. As in earlier years, several students appeared to be rather poorly prepared for the examination, even though, this year, many of the questions set tested knowledge explicitly listed in the syllabus. However, a number of very good candidates also attempted the exam, and they tackled many of the questions competently and confidently.

Section A
Question A1
a) Outline the differences between multi-processing and multi-computing. (5 marks)
b) What is meant by a MIMD architecture? (5 marks)
c) What is meant by a superscalar processor architecture? (5 marks)
d) Describe the purpose and function of processor pipelining, illustrating your answer with a suitable example. (10 marks)

Answer Pointers
a. The term multi-processing is typically used to describe a system comprising more than one processor. Variously, this may be implemented by having multiple CPUs, multiple cores on a single die, or multiple dies on a single chip package. The term is also occasionally used to describe multiple processes executing concurrently on a single CPU (i.e., it can be used synonymously with multitasking). The term multi-computer is closely related, but is more often used to describe a computer that is itself made up of several computers, such that each has its own independent CPU and memory.
b. MIMD is one entry in Flynn's Taxonomy, standing for Multiple Instruction stream, Multiple Data stream. A MIMD system comprises a number of independent processors, operating asynchronously. Typically, both the instruction being executed and the data being processed differ from processor to processor. The two main forms of MIMD architecture are shared memory and distributed memory. Examples include clusters of workstations and transputer array processors.
c. A superscalar architecture permits instruction-level parallelism within a single processor, increasing data throughput (relative to a non-superscalar architecture) at the same clock rate. This is accomplished by dispatching several instructions to multiple functional units (e.g., ALU, multiplier, etc.), enabling more than one instruction to be dealt with in the same cycle. Superscalar execution is a close relative of pipelining, but a superscalar CPU does not necessarily also use pipelining.

d. Pipelining is a system whereby a processor comprises a number of processing stages that operate in parallel; it is an implementation technique, not an architectural construct. Stages accept input and produce output that is passed to the next stage. Throughput is increased by feeding the pipeline with new instructions before existing instructions are fully completed; special measures are taken when an instruction causes a branch (e.g., dumping the remaining partly executed instructions). Pipelines may be synchronous (controlled by a clock) or asynchronous (passing tasks from one unit to the next when this becomes possible via a system of requests and acknowledgements).

An example 3-stage pipeline executing 3 instructions to completion is shown below:

         stage1   stage2   stage3
time1:   instr1
time2:   instr2   instr1
time3:   instr3   instr2   instr1
time4:            instr3   instr2
time5:                     instr3

Examiner's Guidance Notes
This question tested candidates' basic knowledge of common terms of parallel and distributed computing. It was attempted by 63% of candidates, of which only 42% achieved a pass mark (scoring 40% or above). Despite the fact that the terms featured in this question appear in the syllabus, the success rate was relatively low.

Question A2
a) Distinguish between threads and processes. (5 marks)
b) Distinguish between blocking and non-blocking inter-process/inter-thread communication mechanisms. (5 marks)
c) Highlight the differences between local and distributed objects. (5 marks)
d) Outline how semaphores may be used for the protection of critical resources. (10 marks)

Answer Pointers
a. Threads and processes are both mechanisms to achieve parallelism. Processes, which are essentially programs in a state of execution, possess their own address space, contain state information, and interact via O/S-mediated inter-process communication mechanisms (IPCs). Multiple processes may be generated within a single application (e.g., in Unix, via the fork system call) to create an application architecture; applications may be formed from multiple processes performing different tasks that share data via IPCs. A thread is a lighter-weight construct: each process may contain multiple threads and, relative to processes, threads share identical state and memory space, and may be configured to communicate directly with one another. Processes consume more resources than threads, and switching between threads is faster than switching between processes.
b. Blocking inter-process communication routines do not return until the communication attempted has been successfully completed (e.g., for a producer, the message transmitted; for a consumer, the message received). They enforce synchronisation, avoiding deadlocks caused by lost messages, since the receiving application will not continue without the expected data. Conversely, non-blocking routines do not wait for data transfer to complete, returning regardless. This enables an application to continue if an expected message is not forthcoming, but potentially requires extra code to re-request missing messages or otherwise deal with their absence.
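The blocking/non-blocking distinction in answer pointer (b) can be sketched in a few lines of Python. This is a minimal illustration only, not part of the original report: a `queue.Queue` stands in for a real IPC channel.

```python
# Illustrative sketch (not from the report): blocking vs non-blocking
# receives, with queue.Queue standing in for an IPC channel.
import queue

channel = queue.Queue()
channel.put("hello")

# Blocking receive: does not return until a message is available.
msg = channel.get()            # a message is waiting, so this returns at once

# Non-blocking receive: returns regardless; the caller must handle absence.
try:
    channel.get_nowait()       # the channel is now empty
except queue.Empty:
    msg = None                 # extra code to deal with the missing message
print(msg)                     # None
```

A real producer/consumer pair would use the same two calls across process or thread boundaries; the extra `except` branch is exactly the "extra code" the answer pointer refers to.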

c. Local and distributed objects differ in many respects, including the following. Unlike local objects, distributed objects either exist on multiple computers (e.g., in a network), or within multiple processes on a single computer. They send messages to one another, which may involve transmission across a network medium, requiring an underlying communication framework. Special measures are required to create, communicate with and destroy distributed objects, but these should be transparent to the programmer. Referencing distributed objects is more complex, and distributed objects are more susceptible to security issues. Given that objects may reside on different computer systems, a distributed garbage collector is often used to ensure orphaned objects are destroyed to avoid memory leakage.
d. Semaphores are specialised, protected variables (or data structures) that regulate access to a critical section of code or other computing resource (such as shared memory in parallel computer systems). The semaphore controlling a particular resource is shared among all processes requiring access to that resource. Before using the resource, the value of the semaphore is tested. If the variable indicates that the resource is available, it is modified for the duration that the resource is in use to prevent concurrent access, and reset when the resource is relinquished. In this way, semaphores prevent race conditions from occurring. Common types are counting and blocking semaphores.

Examiner's Guidance Notes
This question tested candidates' knowledge of processes/threads in terms of their essential differences, control/synchronisation and communication, along with the deployment of objects in a parallel/distributed system. It was attempted by 68% of candidates, and was poorly answered, with only 38% scoring a pass mark (40% or above).

Question A3
a) Distinguish between datagram and stream sockets. (5 marks)
b) Outline the principal difference between peer-to-peer and client-server software architectures. (5 marks)
c) How is speedup for a parallel application calculated? (5 marks)
d) Describe how quality of service (QoS) parameters will be configured differently for VoIP and data transfer applications, highlighting the impact that each has on performance. (10 marks)

Answer Pointers
a. Datagram sockets do not require that a connection is established prior to transmission (i.e., they are connectionless); instead, each message is stamped with the destination address. Two-way communication is supported, but data transfer is unreliable and messages may arrive in a different order from the original transmission order. Conversely, stream sockets are connection-oriented. They are also two-way but, unlike datagram sockets, are both sequenced and reliable. Examples include http, smtp and telnet services.
b. In a peer-to-peer architecture, each participating computer is equally privileged. Each participant provides a resource, which may include storage capacity, files for download or processor time; resources are offered directly to peers without the need for an intermediate server. Peer-to-peer networks are typically ad-hoc, with participants joining and leaving as required, and are commonly used in file sharing services. In client-server architectures, one or more of the machines in the network assumes server responsibilities, offering services to clients that may make requests; clients do not share resources. Though servers may be dedicated machines, this term may also be used to describe the services running (as background processes) on any standard machine. Relative to peer-to-peer networks, the client-server architecture tends to be less dynamic.
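The two-way, sequenced, reliable behaviour of stream sockets can be shown with a short sketch. This is an illustration only, not from the report: Python's `socket.socketpair()` returns two already-connected stream sockets, which stand in here for the client and server end points (that framing is an assumption of the sketch).

```python
# Illustrative sketch (not from the report): two-way transfer over a
# connected stream-socket pair, standing in for a client and a server.
import socket

server_end, client_end = socket.socketpair(type=socket.SOCK_STREAM)

client_end.sendall(b"request")          # client -> server
received = server_end.recv(1024)        # reliable, in-order delivery

server_end.sendall(b"response")         # server -> client: two-way channel
reply = client_end.recv(1024)

server_end.close()
client_end.close()
print(received, reply)                  # b'request' b'response'
```

A datagram (UDP) version would instead create two unconnected `SOCK_DGRAM` sockets and address each message explicitly, with no delivery or ordering guarantee.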

c. The speedup of a parallel application may be calculated simply by dividing the original (serial) execution time of the application (tS) by the execution time of the parallel version (tP), thus:

S = tS / tP

For example, if the serial version of an algorithm executes in 5s and the parallel version executes in 2s, speedup is calculated by 5/2, giving a speedup of 2.5 times.
d. The QoS parameters bandwidth, latency and loss rate will be configured for a data transfer application as follows. Bandwidth and latency do not need to have specific minimums (though, clearly, higher bandwidth will decrease the time taken to transfer the data). However, loss rate must be 0, since it is unspecified if loss to the particular data transferred will render it useless. For a VoIP application, in contrast to data transfer, bandwidth will need to have a minimum tolerable level in order that the voice stream is serviceable, and latency should be minimised to avoid delay/jitter. Conversely, in VoIP, minimal data loss may be permitted, especially given that retransmitted packets may not be useful, since their window of usefulness will have elapsed.

Examiner's Guidance Notes
This question was attempted by 68% of candidates and, like question 2, only 38% of those successfully scored a pass mark (of 40% or above).

Section B

Question B4
a) Outline the benefits and disadvantages of cluster computing relative to conventional high-performance supercomputing. (5 marks)
b) Briefly describe what is meant by the scalability of a parallel algorithm. (5 marks)
c) Outline the strengths and limitations of the message passing approach to parallel computing. (5 marks)
d) Distinguish between pre-emptive and non-pre-emptive scheduling algorithms, providing an example of each. (10 marks)

Answer Pointers
a. Cluster computing involves a loosely coupled collection of computers. The computers are typically networked, usually appearing to the programmer as a single computing resource (virtual machine). Clusters are cheap, using COTS hardware and (potentially) free software: any standard network of computers may perform cluster computing duties (e.g., overnight, or when idle), or a stack of dedicated networked PCs may be used. They are resilient to failure: a fault on a single processing element would not normally affect the ability of the virtual machine to operate successfully. Conventional supercomputing, by contrast, typically employs custom-built CPUs that operate faster than conventional CPUs, since they employ cutting-edge designs that permit parallel execution of instructions (among other things). Further, memory hierarchies are designed to ensure the processor is continually supplied with data and instructions (i.e., CPU idle time is minimised), and I/O systems provide high bandwidth to maximise the speed that data can be moved around the system. Supercomputer design attempts to eliminate the serial portion of programs as far as possible to maximise speedup. However, supercomputers are often designed for specific computational tasks (such as weather prediction, or numerical calculations), and may perform considerably less well at more generalised tasks.
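The speedup measure S = tS / tP lends itself to a one-line function. This trivial sketch is added for illustration and assumes nothing beyond the formula itself:

```python
# Illustrative sketch (not from the report): speedup S = tS / tP.

def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup of a parallel application over its serial version."""
    return t_serial / t_parallel

print(speedup(5.0, 2.0))  # 2.5, as in the worked 5 s vs 2 s example
```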

b. Scalability refers to the ability of a parallel algorithm either to exploit additional processing resources made available to it (for example, an increase in the number of processing elements in a computer cluster), or to continue to work in a predictable fashion as the volume of data to be processed grows. In the first interpretation, for a fixed volume of data to be processed, it relates to whether adding additional processors produces a consistent degree of speedup. In reality, at some point the cost of dividing, distributing, collecting and recombining the data exceeds the benefit garnered from the additional computing power, and adding additional processing elements will cease to yield commensurate increases in performance.
c. Message passing systems tend to be MIMD multi-computers, connected by a network, such that each processing element (computer) has its own local CPU and memory. Messages are passed between processing elements via the network (or other communications medium). Benefits are that simultaneous access is feasible, with each local host executing processes concurrently (e.g., working on a subset of the data to be processed); that different architectures may communicate; and that computing power can be easily upgraded or scaled by adding more processing elements without modifying underlying code. The limitations are that, relative to tightly coupled architectures that do not require message passing (e.g., shared memory systems), data rate is low, and that message passing burdens the programmer with making explicit calls to organise, package, send and receive data: the responsibility for this to be done lies with the programmer. Typically, data is duplicated, increasing the consumption of resources, and duplicated data may create issues with integrity. Messages might also be lost or intercepted due to inherent network limitations.
d. In pre-emptive scheduling, processes can be forcibly removed from the CPU when the scheduler decides that a different process should be provided with CPU time. This allows processes that are technically runnable to be suspended. Conversely, in non-pre-emptive scheduling, processes must yield the CPU voluntarily, and cannot be forcibly removed. Where a process does not yield the CPU, and is not suspended for other reasons (e.g., waiting for an I/O resource), it may run to completion; in effect, shorter jobs have a lower priority than longer jobs, though various strategies are deployed to promote overall fairness (e.g., ensuring that no job is made to wait indefinitely).

Examiner's Guidance Notes
This question was attempted by 63% of candidates, and was the least successfully answered question, with only 25% achieving a pass mark (40% or above).

Question B5
You have been asked to make a 30 minute presentation on the following topic:

Grid and Cluster Computing: How are they Alike?

Sketch out approximately 8 content-rich presentation slides, with associated notes, that you would use for your talk. Please note: your answer will be assessed for its quality of approach, clarity of expression, accuracy of content, range of discussion, and depth of argument. (25 marks)

Answer Pointers
This question format is used regularly in the distributed and parallel systems examination. This year, candidates were required to compare grid and cluster computing. Candidates are expected to spend several minutes constructing each slide, ensuring that each contains a factual, succinct and informative treatment of the topic. Credit is awarded for identifying relevant issues, thoughtful presentation (that may incorporate both text and diagrams) with clearly identifiable introduction and conclusion/summary slides, and a logical argument that flows sensibly from beginning to end.

Examiner's Guidance Notes
This question was attempted by only 26% of candidates, of which 40% achieved a pass mark (of 40% or above). Very few candidates were able to adequately describe the main features of grid computing, and relatively few provided a sufficiently detailed description of cluster computing. Consequently, few students were able to devise a presentation that convincingly compared these two paradigms.