
BCS HIGHER EDUCATION QUALIFICATIONS
BCS Level 6 Professional Graduate Diploma in IT
April 2011
EXAMINERS' REPORT
Distributed & Parallel Systems

Overall Examiner's Comments
Overall, the cohort of candidates sitting the April 2011 paper performed very well, having achieved the highest pass rate in several years.

Section A
A1.
a) Outline a transaction scenario that may lead to deadlock in a parallel/distributed system; propose a scheme whereby deadlocks may be avoided. (10 marks)
b) Briefly describe the differences between processes and threads, stating where each is an appropriate choice in the development of parallel/distributed applications. (5 marks)
c) Distinguish between synchronous and asynchronous inter-process communication, providing one example of each. (5 marks)
d) Distinguish between datagram and stream sockets, suggesting where each would be an appropriate choice for a parallel/distributed application. (5 marks)

Answer Pointers

A1. a) Deadlocks occur when a process waits indefinitely for a resource that is not forthcoming because, for one reason or another, it has not been relinquished by another process. For example, two competing processes P1 and P2 both require resources A and B to complete. P1 has acquired resource A, and P2 has acquired resource B, and both wait indefinitely for the other resource to become available without relinquishing the lock on the resource they currently hold. Deadlocks can also occur when a process is in a blocked state, awaiting a message from a cooperating process that does not arrive. In either case, this may occur for a variety of reasons, such as race conditions, communication link failure, host failure, or messages arriving out of order or corrupted.

Deadlocks may be avoided in a number of ways. Examples include incorporating a time limit when blocking/waiting for messages, after which contingency code is executed to work around missing messages (such as a re-request), and employing schemes such as semaphores to ensure that critical sections of code and resource acquisitions are coordinated in a way that reduces the risk of deadlocks arising from the scenarios outlined above. In a similar fashion, locked resources could be relinquished after a period of time without successfully securing the other resources needed to continue executing. Alternatively, we might require that a process acquires all necessary resources simultaneously before locking those resources, or implement a system that supports the pre-emptive acquisition of resources from competing processes. These are difficult to accomplish in distributed systems since there is no global view of the system; for this reason, it is common to employ deadlock detection systems with the capability to roll back processes to an epoch before the deadlock occurred, in an attempt to avoid the original issue.

A1. b) Threads and processes are both mechanisms to achieve concurrent execution. Processes, which are essentially programs (or largely separate program components) in a state of execution, possess their own address space, contain state information, and interact via O/S mediated inter-process communication mechanisms (IPCs). Multiple processes may be generated within a single application (e.g., in Unix, via the fork system call) to create an application architecture; for example, larger applications may be formed from multiple processes performing different tasks that share data via IPCs. A process may contain multiple threads. A thread is not an architectural construct; relative to a process, threads share identical state and memory space, and may be configured to communicate directly with one another. Processes consume more resources than threads. Switching between threads is faster than switching between processes.

A1. c) Blocking inter-process communication routines do not return until the communication attempted has been successfully completed (e.g., for a producer, the message transmitted, and for a consumer, the message received). They enforce synchronisation, which may present a deadlock risk if not properly controlled, since the receiving application will not continue without the expected data. Conversely, non-blocking routines do not wait for data transfer to complete, returning regardless. This enables an application to continue if an expected message is not forthcoming, avoiding deadlocks caused by lost messages, but potentially requiring extra code to re-request missing messages or otherwise deal with their absence. Blocking communication is used in a scenario in which parallel tasks must be completed in a specific order. Non-blocking communication may be used where tasks may continue with other work whilst waiting for a result, perhaps periodically checking whether an awaited result has arrived, reducing time expended in an inactive state.

A1. d) Datagram sockets do not require that a connection is established prior to transmission (i.e., are connectionless), with each message being stamped with the destination address. Two-way communication is supported, but data transfer is unreliable and messages may arrive in a different order than the original transmission order. Conversely, stream sockets are connection-oriented. Stream sockets are also two-way, and are both sequenced and reliable, unlike datagram socket connections.

Examiners' Comments
This question was jointly the second most popular in this year's examination paper, although it accrued the second lowest average marks.
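The P1/P2 scenario in answer A1 a) and two of the avoidance schemes it mentions can be sketched in a few lines of Python. This is a minimal illustration, not part of the marking scheme: it uses threads and `threading.Lock` objects to stand in for processes and resources, and the helper names (`acquire_ordered`, `try_acquire_both`, `run_demo`) are hypothetical. The first helper avoids circular waiting by always locking in one fixed order; the second relinquishes a held resource if the other cannot be secured within a time limit.

```python
import threading

def acquire_ordered(first, second):
    """Acquire two locks in one fixed global order, whatever order the caller names them in."""
    ordered = sorted((first, second), key=id)  # id() is just one convenient total order
    for lock in ordered:
        lock.acquire()
    return ordered

def release_all(locks):
    """Release locks in the reverse of the order in which they were acquired."""
    for lock in reversed(locks):
        lock.release()

def try_acquire_both(first, second, timeout):
    """Hold `first`, wait at most `timeout` seconds for `second`, and back off on failure.

    Returns True with both locks held, or False with neither held.
    """
    first.acquire()
    if second.acquire(timeout=timeout):
        return True
    first.release()  # relinquish the held resource rather than wait forever
    return False

def run_demo():
    resource_a, resource_b = threading.Lock(), threading.Lock()
    finished = []

    def worker(name, first, second):
        held = acquire_ordered(first, second)
        try:
            finished.append(name)  # stands in for the work needing both resources
        finally:
            release_all(held)

    # P1 asks for A then B, P2 asks for B then A: the classic deadlock pattern.
    p1 = threading.Thread(target=worker, args=("P1", resource_a, resource_b))
    p2 = threading.Thread(target=worker, args=("P2", resource_b, resource_a))
    p1.start()
    p2.start()
    p1.join(timeout=5)
    p2.join(timeout=5)
    return sorted(finished)
```

Because both workers end up locking in the same order, neither can hold one resource while waiting on the other. In a genuinely distributed setting there is no shared `id()` to sort on, so an agreed global ordering of resource names would have to be assumed instead.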

A2.
a) Contrast distributed shared memory (DSM) and message passing approaches in parallel/distributed computing. (10 marks)
b) A sequential program has three principal sections. The input section takes 25% of the total time. The processing section takes 50% of the total time. The output section takes the remaining 25%. What is the maximum attainable speedup if only the processing part can be parallelized? (5 marks)
c) How does the concept of efficiency differ from speedup in a parallel/distributed application? How is efficiency calculated? (5 marks)
d) Propose a scheme for load balancing on a heterogeneous high-performance compute cluster with a head node generating a continuous stream of variable-sized tasks. (5 marks)

Answer Pointers

A2. a) DSM and message passing may be distinguished as follows. In a conventional message passing system, explicit calls to transmit/receive messages are required between communicating parties, such that the programmer is explicitly aware that different data will be stored on different processes that may be physically spread among the hosts of the distributed system (although it is not necessarily the case that the programmer will know on which host the required data resides, he/she typically is required to know with which process/task communication is required). In DSM, physically distributed memories appear to the programmer as a single resource (i.e., as a single logical address space), relieving the onus of inter-host or inter-task communication from the programmer, with good potential to reduce the size and complexity of program code. Internally (and transparently), DSM systems use message passing techniques to maintain this illusion.

A2. b) If only the processing section of the program described can be parallelised, the maximum attainable speedup is 2 times, since even if the time taken to execute the parallel section were reduced to 0, one half of the program (the input and output sections, collectively accounting for 50% of the original program execution time) cannot be improved upon. An upper bound on speedup is in effect, as dictated by Amdahl's Law, beyond which no further improvement can occur.

A2. c) Unlike speedup, which simply measures how much more quickly a parallel application executes in relation to a serial implementation (for a given number of processors), efficiency takes the specific number of processors used into consideration, and is thus indicative of how well these additional resources are being utilised. The formula for speedup is: S = TS/TP, where TS is the serial execution time and TP is the parallel execution time. The formula for efficiency is: E = S/N, where N is the number of processors; E will vary between 0 and 1.

A2. d) A sensible scheme for load balancing on a HPC with a head node generating a continuous stream of variable-sized tasks would be to allocate tasks dynamically, depending upon how quickly existing tasks return (as opposed to, for example, allocating the same number of tasks to each host). This is because the question specifically states that the hosts are heterogeneous, and therefore may have differing processing capabilities that, if a static scheduling approach were used, would need to be explicitly measured beforehand and taken into consideration by the task allocation algorithm in the head node. Furthermore, it is also difficult to know how quickly a particular task will execute, since this may be highly dependent upon the specific data to be processed (which may be independent of the volume of data to be processed).
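The arithmetic behind answers A2 b) and c) can be checked with a short Python sketch of Amdahl's Law. The function names are illustrative assumptions, not taken from the exam paper. With a parallelisable fraction of 0.5, four processors give a speedup of 1.6 and an efficiency of 0.4, and the speedup approaches but never reaches 2 as processors are added.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup predicted by Amdahl's Law for a given parallelisable fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

def speedup_limit(parallel_fraction):
    """Upper bound on speedup as the number of processors grows without limit."""
    return 1.0 / (1.0 - parallel_fraction)

def efficiency(speedup, processors):
    """Efficiency E = S / N: how well the N processors are being utilised."""
    return speedup / processors

if __name__ == "__main__":
    p = 0.5  # only the processing section (50% of run time) is parallelised
    for n in (2, 4, 16, 1024):
        s = amdahl_speedup(p, n)
        print(f"N={n:5d}  speedup={s:.3f}  efficiency={efficiency(s, n):.3f}")
    print("upper bound on speedup:", speedup_limit(p))
```

The falling efficiency figures show why speedup alone is a poor guide: each additional processor contributes less once the serial input and output sections dominate the run time.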

Examiners' Comments
This was the second least popular question, and produced the lowest average marks. This is despite the fact that several similar questions have appeared in earlier exams.

A3.
a) Identify the basic quality of service parameters, and state how they would be configured differently for FTP and VOIP applications. (5 marks)
b) Outline the main features of a grid computing system. (5 marks)
c) Distinguish between peer-to-peer and client-server distributed system architectures. (10 marks)
d) Why is a distributed system usually more reliable than a non-distributed system? (5 marks)

Answer Pointers

A3. a) The three applicable QoS parameters are bandwidth, latency and loss rate. For an FTP application, bandwidth and latency do not need to have specific minimums (though, clearly, higher bandwidth will decrease the amount of time taken to transmit the data file), but loss rate needs to be 0, since data loss is intolerable in the transmission of a data file (i.e., corruption of the file is likely to lead to it not functioning correctly once received). For a VOIP application, bandwidth will need to have a minimum tolerable level in order that the voice audio stream is serviceable. Also, latency should be minimised, to avoid delay/jitter. In VOIP, a non-zero loss rate may be tolerated, since data corresponding to a particular audio time-frame is only useful if that frame is still current to the client. Where data that has not already been transmitted has been lost or is unlikely to be displayed at the client's machine because of its age, it can be safely abandoned.

A3. b) A grid computer system, relative to a conventional cluster, utilises processing units (which are typically entire PCs from different administrative domains) that are heterogeneous and widely geographically dispersed. It is common to use middleware such as BOINC to oversee the allocation of work and the collection of results; example applications include SETI@Home and Folding@Home, which are "volunteer computing" projects in the sense that home users can elect to join BOINC to donate their spare CPU cycles (such as during the times when a screensaver would normally be activated). The workload allocated to participating hosts in a grid does not require communication with other hosts, which distinguishes Grid computing from conventional high-performance cluster computing, wherein the entire cluster is under central management and hosts may interact if required.

A3. c) In a peer to peer architecture, each participating computer is equally privileged. Each participant provides a resource, which may include storage capacity, files for download or processor time. Resources are offered directly to peers without the need for an intermediate server. Peer to peer networks are typically ad-hoc, with participants joining and leaving as required. Peer to peer networks are commonly used in file sharing services. In client-server architectures, it may be that one or more of the machines in the network will assume server responsibilities, offering services to clients that may make requests. Clients do not share resources. Though servers may be dedicated machines, this term may also be used to describe the services running (as background processes) on any standard machine. Examples include HTTP, SMTP and telnet services. Relative to peer-to-peer networks, the client-server architecture tends to be less dynamic.

A3. d) The potential for increased reliability in a distributed system stems from the distribution of workload and processing power across multiple hosts. With defensive programming, it is possible that the failure of one or more hosts will not jeopardise the global execution of the application, since the work allocated to these hosts may be redistributed to hosts with continued availability; i.e., there is no single point of failure. We can think of this as fault tolerance, but it can also lead to high availability.

Examiners' Comments
This was both the most popular question, and the question for which students accrued the highest marks, on average. Some candidates were not able to clearly distinguish between general cluster computing and grid computing.

Section B

B4.
a) Describe a scheme by which email communication can be protected from masquerading, tampering, replay and denial of service attacks. (10 marks)
b) Distinguish between symmetric and asymmetric security algorithms. (5 marks)
c) Distinguish between steganography and cryptography. (5 marks)
d) Compare public and private key encryption methods. (5 marks)

Answer Pointers

B4. a) For a typical email system (for example, based upon SMTP using POP/IMAP) retrieving mail from a local host:

Weakness | Type of Attack | Remedy
Sender is not authenticated | Masquerading, Denial of Service | Implement end-to-end authentication with digital signatures (for example, using PGP)
Message contents are not authenticated | Tampering, Masquerading | Implement end-to-end authentication with digital signatures (for example, using PGP)
Message contents in plain text | Eavesdropping | Implement end-to-end encryption (for example, using PGP)
Delivery and deletion from the POP or IMAP server is authenticated by a login with password only | Masquerading | Utilise Kerberos or SSL for the authentication of clients

B4. b) In a symmetric security algorithm, both encryption and decryption require the same key (or a key that must be transformed between two states, one for each task). The key (and any required transform) is held secretly by both communicating parties, and must be exchanged between them somehow. On the other hand, asymmetric security algorithms entail different keys for encryption and decryption, such that only the recipient keeps the decryption key. It is often the case that neither may be inferred from the other.

B4. c) The steganography approach to secure communication involves embedding a sensitive message in a "carrier", which is typically a larger segment of data with an apparently different use, such that only the recipient user has the correct information to extract the original message. For example, embedding an email message at intervals within a sound file. If the carrier is large and is used merely as a capsule in which to embed the sensitive message, the volume of data to be transmitted is large. In contrast, cryptography requires that the original message is modified in some way so that it is no longer directly readable. This may not increase the volume of data to be processed, but requires that the recipient has the correct algorithm and key to decrypt the message. These techniques may also be used in unison, encrypting a sensitive message and then embedding it in a carrier using steganography.

B4. d) This question is related to the one posed above (b), which focussed upon symmetric and asymmetric algorithms, which are typically implemented by means of private and public key encryption (i.e., these terms are more specific). Using private key encryption, both communicating parties share a secret key before the exchange takes place to enable encryption/decryption to take place (it is common for encryption/decryption keys to be identical, but this is not always so; they might also be related via a trivial transform). Conversely, in the public key approach, two different keys exist, one public and one private. The public key, which is freely available, enables messages to be encrypted, but messages cannot be decrypted without the private key. Interestingly, not even the sender possesses the private decryption key, so not even they can decrypt the message that they have prepared for transmission. Both keys are mathematically related, but retrieving the private key from the public key is a prohibitively expensive operation to be attempted by hackers.

Examiners' Comments
This was the least popular question on this year's paper, and accrued the second lowest mean score. Question a) was adapted from an exercise in the recommended textbook, but despite this it was generally not well answered. Few students were able to correctly distinguish steganography and cryptography approaches to security.
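The contrast drawn in B4 b) and d) between one shared key and a public/private key pair can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: the XOR cipher stands in for a symmetric algorithm, the RSA parameters (p = 61, q = 53, e = 17) are a small textbook example, and the function names are hypothetical. Neither scheme as written is remotely secure.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: the same shared key both encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def make_toy_rsa_keys(p=61, q=53, e=17):
    """Build a (public, private) key pair from two tiny primes (illustration only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e; needs Python 3.8 or later
    return (e, n), (d, n)

def rsa_apply(value: int, key) -> int:
    """Encrypt with the public key or decrypt with the private key."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

if __name__ == "__main__":
    # Symmetric: both parties must already hold the same secret key.
    shared_key = b"secret"
    hidden = xor_cipher(b"meet at noon", shared_key)
    print(xor_cipher(hidden, shared_key))

    # Asymmetric: anyone may encrypt with the public key,
    # but only the holder of the private key can decrypt.
    public_key, private_key = make_toy_rsa_keys()
    ciphertext = rsa_apply(65, public_key)
    print(ciphertext, rsa_apply(ciphertext, private_key))
```

With primes this small the private exponent can be recovered by factoring n almost instantly; the "prohibitively expensive" property described in the answer only holds for very large, properly chosen keys.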

B5. Your employer has asked you to prepare an oral presentation comparing the relative merits of high-availability and high-performance cluster computing. Provide 8-10 content-rich slides for your presentation, with explanatory notes where necessary. Note: your answer will be assessed for its quality of approach, accuracy of content, clarity of expression, range of discussion, and depth of argument. (25 marks)

Answer Pointers
This question format features regularly in the distributed & parallel systems examination. In this instance, candidates were required to discuss HA and HP clustering. Candidates would be expected to spend approx. 5 minutes on each slide, focusing on quality rather than quantity. Credit is given for identifying relevant issues, illustrating these carefully (with both words and diagrams), and having a logical structure to the presentation that leads the reader through the topics identified in a thoughtful manner. Furthermore, candidates should endeavour to ensure that all slides presented are relevant to the topic to be discussed, and to ensure that the slides presented are succinct, factual and informative.

Examiners' Comments
This question remained popular, being jointly the second most commonly attempted question. It was the question with the second highest mean mark. Candidates who carefully selected an appropriate number of relevant topics, and presented information clearly and succinctly, scored the highest marks. Candidates who used a scattershot approach, presenting a large number of slides containing barely relevant material, scored considerably less well.