Distributed Computing

Introduction
Say you've got a big computation task to perform. Perhaps you have found a way to cure cancer, or you want to look for aliens. All you need is a few supercomputers to work out some calculations, but you've only got the one PC on your desk. What to do? A popular solution is the "distributed computing" model, where the task is split up into smaller chunks and performed by the many computers owned by the general public. This guide shows you how.

Computers spend a lot of their time doing nothing. If you are reading this on your computer, that expensive CPU is most probably just sitting around waiting for you to press a key. What a waste! With the rise of the Internet, distributed computing has become a popular way to put those idle cycles to work.

The technology behind distributed computing is old, where it is usually known as parallel computing. When people speak of parallel computing, it is usually in the context of a local group of computers, owned by the same person or organization, with good links between nodes. The key difference here is that you are using computing power that you don't own. These computers are owned and controlled by other people, whom you would not necessarily trust. Both angels and demons populate the world, and unless you know them personally, you can't tell them apart.


General Definition:
Distributed computing is any computing that involves multiple computers, remote from each other, each of which has a role in a computation problem or in information processing.

Definition
"Distributed computing" is a vague term incorporating the overlapping fields of:
1. Client/server computing
2. Internet computing
3. Geographical distribution of computing over a wide area
4. Network peer-to-peer computing
5. Co-operative computing between workstations on a local area network


Distributed Computing Environment Architecture
The Distributed Computing Environment (DCE) is an integrated distributed environment which incorporates technology from industry. The DCE is a set of integrated system services that provide an interoperable and flexible distributed environment, with the primary goal of solving interoperability problems in heterogeneous, networked environments. The DCE is intended to form a comprehensive software platform on which distributed applications can be built, executed, and maintained. The DCE infrastructure supports the construction and integration of client/server applications while attempting to hide the inherent complexity of the distributed processing from the user. The DCE architecture is shown in the figure "Distributed Computing Environment Architecture".

DCE services are organized into two categories:

1. Fundamental distributed services provide tools for software developers to create the end-user services needed for distributed computing. They include:
o Remote Procedure Call, which provides portability, network independence, and secure distributed applications.
o Directory services, which provide full X.500 support and a single naming model to allow programmers and maintainers to identify and access distributed resources more easily.
o Time service, which provides a mechanism to monitor and track clocks in a distributed environment, and accurate time stamps to reduce the load on system administrators.
o Thread service, which provides a simple, portable programming model for building concurrent applications.
o Security service, which provides the network with authentication, authorization, and user account management services to maintain the integrity, privacy, and authenticity of the distributed system.

2. Data-sharing services provide end users with capabilities built upon the fundamental distributed services. These services require no programming on the part of the end user and facilitate better use of information. They include:
o Distributed file system, which interoperates with the network file system to provide a high-performance, scalable, and secure file access system.
o Diskless support, which allows low-cost workstations to use disks on servers, possibly reducing the need/cost for local disks, and provides performance enhancements to reduce network overhead.

The DCE supports International Open Systems Interconnect (OSI) standards, which are critical to global interconnectivity. It also implements ISO standards such as Remote Operations Service Element (ROSE), Association Control Service Element (ACSE), and the ISO session and presentation services.
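The Remote Procedure Call service above is the piece developers touch most directly: a call that looks local but executes on another machine. DCE RPC itself is a C-based interface; as a language-neutral sketch of the same idea, here is a minimal remote call using Python's standard xmlrpc module (the host, port, and add function are illustrative assumptions, not part of DCE):

    # Server side: expose an ordinary function to remote callers.
    from xmlrpc.server import SimpleXMLRPCServer

    def add(x, y):
        # Runs on the server; clients invoke it as if it were local.
        return x + y

    server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
    server.register_function(add, "add")
    server.serve_forever()

    # Client side (run in a separate process); the framework hides
    # all of the networking behind an ordinary-looking function call:
    # from xmlrpc.client import ServerProxy
    # proxy = ServerProxy("http://localhost:8000")
    # print(proxy.add(2, 3))  # prints 5, computed remotely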

How It Works
In most cases today, a distributed computing architecture consists of very lightweight software agents installed on a number of client systems, and one or more dedicated distributed computing management servers. There may also be requesting clients with software that allows them to submit jobs along with lists of their required resources.

An agent running on a processing client detects when the system is idle, notifies the management server that the system is available for processing, and usually requests an application package. The client then receives an application package from the server, runs the software when it has spare CPU cycles, and sends the results back to the server. The application may run as a screen saver, or simply in the background, without impacting normal use of the computer. If the user of the client system needs to run his own applications at any time, control is immediately returned, and processing of the distributed application package ends. This must be essentially instantaneous, as any delay in returning control will probably be unacceptable to the user.

The following steps, labeled in the components diagram, show the interaction between the different components:
1. When the Client is idle, it sends a request for an Application package to the Network Manager.
2. The Network Manager sends an application package to the client machine. The application package consists of an Application Manager process and the application for which sub-jobs are to be run.
3. The Client s/w runs the Application Manager, and the Application Manager registers itself with the Job Manager, telling it that it is available to run sub-jobs for the corresponding application.

Interaction between components
4. The Job Manager schedules a sub-job to be run on the client, and sends the Client the sub-job parameters and input files.
5. The Application Manager on the Client runs the sub-jobs with the corresponding inputs. The Job Manager can continue to schedule sub-jobs on the Client for that application until it no longer needs the client and releases it back to the Network Manager, or until the Network Manager reclaims the client for use elsewhere.
6. When the application has finished, the results from the sub-jobs are delivered to the Job Manager.
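A minimal sketch of the processing-client side of this exchange, in Python. Everything here is an illustrative assumption: a real agent would query the operating system for true idle time, speak the platform's actual wire protocol to the Network Manager, and run sub-jobs at low priority.

    import time

    IDLE_THRESHOLD = 300  # seconds without user input before volunteering

    def system_idle_seconds():
        # Hypothetical stub: a real agent asks the OS for input idle time.
        return 600

    def fetch_application_package():
        # Steps 1-2: request an application package from the Network Manager.
        return {"app": "demo", "subjob": 42, "inputs": [1, 2, 3]}

    def run_subjob(package):
        # Steps 4-5: run the sub-job with the supplied inputs.
        return sum(package["inputs"])  # stand-in for the real computation

    while True:
        if system_idle_seconds() >= IDLE_THRESHOLD:
            package = fetch_application_package()
            result = run_subjob(package)
            print("step 6: returning", result, "to the Job Manager")
        time.sleep(60)  # check again in a minute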

Distributed Computing Management Server
The servers have several roles. They take distributed computing requests and divide their large processing tasks into smaller tasks that can run on individual desktop systems (though sometimes this is done by a requesting system). They send application packages and some client management software to the idle client machines that request them. They monitor the status of the jobs being run by the clients. After the client machines run those packages, they assemble the results sent back by the clients and structure them for presentation, usually with the help of a database.

If the server doesn't hear from a processing client for a certain period of time, possibly because the user has disconnected his system and gone on a business trip, or simply because he's using his system heavily for long periods, it may send the same application package to another idle system. Alternatively, it may have already sent out the package to several systems at once, assuming that one or more sets of results will be returned quickly.

The server is also likely to manage any security, policy, or other management functions as necessary, including handling dialup users whose connections and IP addresses are inconsistent.

Obviously, the complexity of a distributed computing architecture increases with the size and type of environment. A larger environment that includes multiple departments, partners, or participants across the Web requires complex resource identification, policy management, authentication, encryption, and secure sandboxing functionality. Resource identification is necessary to define the level of processing power, memory, and storage each system can contribute. Policy management is used to varying degrees in different types of distributed computing environments. Administrators or others with rights can define which jobs and users get access to which systems, and who gets priority in various situations based on rank, deadlines, and the perceived importance of each project. Obviously, robust authentication, encryption, and sandboxing are necessary to prevent unauthorized access to systems and data within distributed systems that are meant to be inaccessible.
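A sketch of the reassignment bookkeeping described above: jobs that have been out too long are handed to another idle client. The data structures, names, and six-hour timeout are illustrative assumptions, not any vendor's actual design.

    import time

    PENDING_TIMEOUT = 6 * 3600  # reassign after six hours of silence

    jobs = {"job-1": {"sent_at": None, "done": False}}
    idle_clients = ["client-a", "client-b"]

    def dispatch(job_id):
        client = idle_clients.pop(0)
        jobs[job_id]["sent_at"] = time.time()
        print("sending", job_id, "to", client)

    def reassign_stale_jobs():
        now = time.time()
        for job_id, job in jobs.items():
            stale = job["sent_at"] and now - job["sent_at"] > PENDING_TIMEOUT
            if not job["done"] and stale and idle_clients:
                # The same package simply goes out to another idle system;
                # whichever copy comes back first wins.
                dispatch(job_id)

    dispatch("job-1")      # first assignment
    reassign_stale_jobs()  # a periodic sweep re-sends stale jobs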

Distributed System Topologies
Component-based network applications map naturally to business processes that involve an exchange of information among applications running across computer networks. Selection of an appropriate system topology is fundamental to the software infrastructure platform enabling such distributed applications. While a distributed application only involves the flow of data, a distributed infrastructure platform needs to support both control and data flow. Control flow can be looked upon as a special flow of packets that enable, regulate, and monitor data flow. In the following sections, we compare the organization of various distributed software system topologies with respect to flow of control and data.

Centralized Systems
Centralized systems form the most popular system topology, typically seen as the client/server pattern. All function and information is centralized on a single server (sometimes referred to as the "hub"), with many clients (the "spokes") connecting directly to the server to send and receive information. Both control flow and data flow take place through the central server. The primary advantage of centralized systems is their simplicity. Because all data is concentrated in one place, centralized systems are easily managed and have no questions of data consistency or coherence. Centralized systems are also relatively easy to secure, since there is only one host to be protected.
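A toy hub-and-spoke sketch makes the pattern concrete before we turn to its drawbacks: every message between spokes transits the one hub, which is exactly what makes the system easy to manage and secure. The class and method names are illustrative only.

    class Hub:
        """Central server: all state and all message routing live here."""

        def __init__(self):
            self.mailboxes = {}

        def register(self, client_id):
            self.mailboxes[client_id] = []

        def send(self, sender, recipient, message):
            # Control and data both pass through the hub, so it can log,
            # order, and police everything in one place.
            self.mailboxes[recipient].append((sender, message))

        def receive(self, client_id):
            messages, self.mailboxes[client_id] = self.mailboxes[client_id], []
            return messages

    hub = Hub()
    hub.register("a")
    hub.register("b")
    hub.send("a", "b", "hello")
    print(hub.receive("b"))  # [('a', 'hello')]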

The drawback of centralization is that all information resides at the hub. The hub is thus a single point of failure: if the hub dies, then all client applications connected to the hub also die. The hub is also a bottleneck to scalability and performance. While one can introduce redundant hardware and employ better or faster hardware at the hub, this only alleviates the problem and does not solve it completely. Examples of systems conforming to this centralized topology include J2EE servers and most commercially available web servers and transaction processing monitors, including Microsoft's MTS. Even though the hub-and-spoke architecture has found widespread acceptance in database servers and web servers, the drawbacks in scalability and fault-tolerance make it unsuitable for general-purpose distributed application deployment.

Pure Peer-to-Peer Systems
Figure: P2P systems

A primary virtue of pure P2P systems is their scalability: any node can join the network and start exchanging data with any other node. Decentralized systems also tend to be fault tolerant, as the failure or shutdown of any particular node does not impact the rest of the system. However, with no central point of control, the system loses the ability to effect changes in the data flow.

Hybrid Peer-to-Peer Systems
In a hybrid peer-to-peer system, the control information is exchanged through a central server, while data flow takes place in a pure peer-to-peer manner as above. The control server acts as a monitoring agent for all the other peers and ensures information coherence. This architecture alleviates the manageability problems of pure P2P systems. Peer-to-peer data routing also allows the hybrid system to offer better scalability than a centralized system. If the central server goes down, existing applications are not affected, as the data flow between nodes continues regardless of whether the central server is functional or not.

Figure: Hybrid P2P systems

The drawbacks associated with control being centrally managed still remain: hybrid systems still suffer from scalability problems for the control information that flows through a single node. While hybrid systems are being effectively used for mission-critical applications, the solutions are limited to solving relatively small-scale problems only. An example of a commercial hybrid P2P system is Groove. Groove implements collaborative project management software in which a central synchronizing server controls all information being exchanged between peers.
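A minimal sketch of the hybrid pattern: peers find each other through a central directory (control flow), then exchange data directly (data flow), so established links survive a directory outage. The Directory and Peer classes are illustrative assumptions, not a real protocol.

    class Directory:
        """Central control server: tracks which peers are online."""

        def __init__(self):
            self.peers = {}

        def register(self, name, peer):
            self.peers[name] = peer

        def lookup(self, name):
            return self.peers[name]

    class Peer:
        def __init__(self, name, directory):
            self.name = name
            self.links = {}
            directory.register(name, self)

        def connect(self, directory, other):
            # Control flow: one lookup through the central server.
            self.links[other] = directory.lookup(other)

        def send(self, other, data):
            # Data flow: direct peer-to-peer; this still works even if
            # the directory has since gone down.
            self.links[other].on_data(self.name, data)

        def on_data(self, sender, data):
            print(self.name, "got", repr(data), "from", sender)

    d = Directory()
    a, b = Peer("a", d), Peer("b", d)
    a.connect(d, "b")
    a.send("b", "blocks 17-32")  # prints: b got 'blocks 17-32' from a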

Super-Peer Architecture
A new wave of peer-to-peer systems is advancing an architecture of centralized topology embedded in decentralized systems; such a topology forms a super-peer network.

Figure: Super-peer systems

Next Generation Distributed Computing Architecture
Combining the Super-Peer topology with the Coarse-grained Component model [1] enables a distributed computing platform for a whole new generation of distributed applications which are more flexible, scalable, and reliable than traditional applications.

Figure: The super-peer architecture

The super-peer architecture closely maps to real-world business processes. Each cluster maps to a business division. Super peers can have well-defined protocols for cross-cluster communication (acting as a firewall for this virtual internet). The adjoining figure illustrates a 2-redundant super-peer architecture that alleviates the bottlenecks associated with a super peer being a single point of failure for its clients.
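A small sketch of super-peer routing: ordinary peers talk only to their own cluster's super peer, and super peers forward traffic across clusters, which is where the cross-cluster "firewall" policy mentioned above can be enforced. All names are illustrative.

    class SuperPeer:
        def __init__(self, cluster):
            self.cluster = cluster
            self.members = {}    # ordinary peers in this cluster
            self.neighbors = {}  # super peers of other clusters

        def deliver(self, cluster, name, message):
            if cluster == self.cluster:
                self.members[name].append(message)  # local delivery
            else:
                # Cross-cluster traffic goes super peer to super peer;
                # policy checks on foreign traffic would happen here.
                self.neighbors[cluster].deliver(cluster, name, message)

    sales, eng = SuperPeer("sales"), SuperPeer("eng")
    sales.neighbors["eng"] = eng
    eng.members["bob"] = []
    sales.deliver("eng", "bob", "Q3 forecast")
    print(eng.members["bob"])  # ['Q3 forecast']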

In the following sections, we examine a real-world problem that represents a typical business process and discuss the implementation of this process over multiple software infrastructure system topologies.

Distributed Computing Application Characteristics
Obviously not all applications are suitable for distributed computing. Generally the most appropriate applications consist of "loosely coupled, non-sequential tasks in batch processes with a high compute-to-data ratio." The high compute-to-data ratio goes hand-in-hand with a high compute-to-communications ratio. For most solutions there should not be any need for communication between the tasks except at task boundaries, though some platforms do allow limited interprocess communication. United Devices recommends that the application should have the capability to fully exploit "coarse-grained parallelism," meaning it should be possible to partition the application into independent tasks or processes that can be computed concurrently (a minimal sketch appears at the end of this section).

Programs with large databases that can be easily parsed for distribution are very appropriate. Clearly, any application with individual tasks that need access to huge data sets will be more appropriate for larger systems than individual PCs. If terabytes of data are involved, a supercomputer makes sense, as communications can take place across the system's very high-speed backplane without bogging down the network. Server and other dedicated system clusters will be more appropriate for other, slightly less data-intensive applications. For a distributed application using numerous PCs, the required data should fit very comfortably in the PC's memory, with lots of room to spare, as you don't want to bog down the network by sending large amounts of data to each client, though in some cases you can do so during off hours.

The closer an application gets to running in real time, the less appropriate it is. Even processing tasks that normally take an hour or two may not derive much benefit if the communications among distributed systems and the constantly changing availability of processing clients become a bottleneck. Instead you should think in terms of tasks that take hours, days, weeks, and months. The tasks and small blocks of data should be such that they can be processed effectively on a modern PC and report results that, when combined with other PCs' results, produce coherent output. And the individual tasks should be small enough to produce a result on these systems within a few hours to a few days.
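A minimal sketch of coarse-grained partitioning, using Python's multiprocessing as a stand-in for a farm of volunteer PCs. Each chunk is self-contained (high compute-to-communications ratio) and results are only combined at task boundaries; the function and chunk size are illustrative assumptions.

    from multiprocessing import Pool

    def process_chunk(chunk):
        # A self-contained task: all the data it needs arrives with it,
        # and it reports back only when it is finished.
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        size = 100_000  # big chunks keep the compute-to-data ratio high
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        with Pool() as pool:
            partials = pool.map(process_chunk, chunks)
        print(sum(partials))  # combine only at the end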

Types of Distributed Computing Applications
The following scenarios are examples of types of application tasks that can be set up to take advantage of distributed computing:

• A query search against a huge database that can be split across lots of desktops, with the submitted query running concurrently against each fragment on each desktop.
• Complex modeling and simulation techniques that increase the accuracy of results by increasing the number of random trials are also appropriate, as trials can be run concurrently on many desktops and combined to achieve greater statistical significance (a common method used in various types of financial risk analysis; a minimal sketch follows this list).
• Exhaustive search techniques that require searching through a huge number of results to find solutions to a problem also make sense.
• Many of today's vendors, particularly Entropia and United Devices, are aiming squarely at the life sciences market, which has a sudden need for massive computing power. As a result of sequencing the human genome, the number of identifiable biological targets for today's drugs is expected to increase from about 500 to about 10,000. Drug screening is a prime example: pharmaceutical firms have repositories of millions of different molecules and compounds, some of which may have characteristics that make them appropriate for inhibiting newly found proteins. The process of matching all these "ligands" to their appropriate targets is an ideal task for distributed computing, and the quicker it's done, the quicker and greater the benefits will be. Another related application is the recent trend of designing new types of drugs solely on computers.
• Complex financial modeling, weather forecasting, and geophysical exploration are on the radar screens of these vendors, as well as car crash and other complex simulations.
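A minimal sketch of the random-trials case: each client runs independent trials from its own seed, and the server combines only the counts, so accuracy grows with the number of volunteers. Estimating pi stands in for a real risk model; all names and numbers are illustrative.

    import random

    def run_trials(n, seed):
        # One client's work package: n independent random trials.
        rng = random.Random(seed)
        return sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
                   for _ in range(n))

    # Server side: results combine by simple addition, so every extra
    # client tightens the statistical significance.
    total_hits, total_trials = 0, 0
    for client_seed in range(10):  # ten hypothetical clients
        total_hits += run_trials(100_000, seed=client_seed)
        total_trials += 100_000
    print("pi is roughly", 4 * total_hits / total_trials)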

Attacks against distributed computing systems
There are three types of attack to consider: parasites, spoilers, and faulty clients.

Parasites
People running distributed computing systems will often have some sort of reward for the people who help out. Most offer a chart of the top helpers as motivation. The rewards are often simple pleasures such as karma and fame. At present, people tend to help distributed projects because they want to help out; in future, we may start seeing cash payments to helpers, persuading the public to give their spare time to this project rather than that one. However small, there is often a reward for doing the work, and with it comes the motivation for parasites: people who want the reward without having to do the work. To stop a parasite, you need to make sure that when you receive a completed work package, it represents actual work done, and that the work was done correctly.

Spoiler attacks
As well as parasites, there are those people who want to stop your project from succeeding. When designing your security checks, consider that there may be someone whose only motivation is bragging rights for having broken your project. ("I'm a great haxor, worship me!") These are commonly known as spoiler attacks. In the case of "search" projects, so long as you have adequate parasite protection, a defence against spoilers is that it's statistically unlikely that the attacker will be lucky enough to be given the opportunity to hide a successful search.

Faulty clients
This isn't really an attack: these are unintentional faults. Even if your client software is perfect, it has to run on hardware owned by the general public, and faulty hardware isn't unheard of. The floating-point co-processor is a popular place to find hardware glitches. If your project depends on accurate calculation, you should build in checks to confirm that the client is operating correctly.

How to stop parasites, spoilers and faulty clients?
When a client helps out with a distributed computing project, there is usually no contractual arrangement, and clients typically select themselves without their trustworthiness or reliability being checked. The following measures help.

Randomly repeat a job. Routinely send the same job to a different client and check that they come back with the same answer. This is the simplest way, is easily implemented, and can even be used in conjunction with other schemes. The problem is that where you have a genuine and accurate client, it wastes time. There is the additional problem that you have no way of knowing whether the client you give a checking job to isn't also a parasite; if a copy of a parasite client becomes widespread, the risk of this happening increases. Reduce the number of retries, and you risk a parasite getting through unchecked; increase the checks, and you waste time. To help this process along, have the client include (say) a totaled-up calculation instead of a simple "yes/no" response. This would show that the client most probably did the work, even when the overall result was a "no".
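A sketch of the spot-check in code. The dispatch mechanics and the one-in-ten rate are illustrative assumptions; a real server would ship the job over the network and compare whatever "checking total" the clients return.

    import random

    DOUBLE_CHECK_RATE = 0.1  # quietly re-run one job in ten

    def run_job_with_spot_check(job, clients):
        first, second = random.sample(clients, k=2)
        answer = first(job)  # each "client" stands in for a remote PC
        if random.random() < DOUBLE_CHECK_RATE:
            # The retry costs time on honest clients -- the tradeoff
            # discussed above -- but catches disagreeing answers.
            if second(job) != answer:
                raise ValueError("mismatch: one of these clients is bad")
        return answer

    def honest(job):
        return sum(job)

    print(run_job_with_spot_check([1, 2, 3], [honest, honest]))  # 6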

With a checking total in place, for parasites to avoid detection, they would have to collaborate. A collaboration protocol amongst parasites might work like this: "Did any other parasite do job #xyz?" "Yes, I did, and my randomly chosen checking total was 58." To better detect collaboration, always vary the pairings so that the same two people do not always check each other. Repeating jobs also helps in detecting a faulty client, once you have established which of the disagreeing clients is the faulty one.

Don't distribute the source code. This makes the parasites' job harder (but not impossible). This is a controversial topic. So controversial, it's in its own section.

Include quickly checkable proof of work done. If you could repeat every single completed work package yourself, you wouldn't need to enlist help. But what if you can quickly check that the client has completed the work package? This method usually works where the project is a "search". Consider a search for large prime numbers. If a client tells the server that a given number turned out not to be prime at all, a simple check would be for the client to report a factor. The organizers can very quickly check this answer by performing a simple division: if that value is indeed a factor, it's proof that the larger number is indeed not prime, and you can strike it off the list. You can also have the client respond with proof of where it came close. Say that you know that in any work package, a client would probably come close (say) eight times; a completed work package that shows ten occurrences where it did indeed come close would show that the client most probably did the work.
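A sketch of the prime-search check: one modulo operation on the server verifies hours of client-side work. The function name is illustrative.

    def verify_composite_claim(n, claimed_factor):
        # The client did the long search; the server does one division.
        if 1 < claimed_factor < n and n % claimed_factor == 0:
            return True   # proof holds: n is definitely not prime
        return False      # no proof: treat the work package as unverified

    print(verify_composite_claim(91, 7))  # True: 91 = 7 * 13
    print(verify_composite_claim(97, 5))  # False: 5 does not divide 97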

Only reward success. If the project is of a type where work packages fall into two groups, unsuccessful ones and successful ones, and the only way to find out which group a work package falls into is to actually do the work, how about only recognizing those who you can be sure did the work? Instead of recognizing unsuccessful effort, only reward successful effort; ignore anyone who is unlucky enough to be given a work package which doesn't contribute to the effort. Take the reward away, and the motivation for parasitical behavior goes away. This strategy alone still leaves you open to spoiling attacks and faulty clients, and unless the actual rewards will be frequent, you may find this strategy limits the number of helpers.

Reject "impossibly fast" responses. Even modern computers have their limits. A parasite will want to churn through as many work packages as possible: request enough work for 100 computers to do in a day, wait a day, then respond with forged completion responses as if they were from 100 different computers. This is much more difficult to detect, especially with firewalls being commonplace (so many computers appear behind one IP address), but perhaps mark "unusually busy" groups with suspicion and have a higher proportion of their work packages double-checked. Have a secret threshold, where any responses before that threshold will be thrown out, or at least double-checked, and move the threshold around to keep attackers guessing. For good measure, tell the client (if such a message is normal) that the work package was accepted, but do nothing with it except perhaps marking the work package as "done". This would be filed under social engineering rather than technical; the effectiveness of social engineering is debatable.
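A sketch of the secret threshold. The one-hour floor and the random jitter are illustrative assumptions; a real project would derive the floor from the known cost of a work package.

    import random
    import time

    issued_at = {}  # job id -> time the work package went out
    results = {}    # verified results we actually keep

    def secret_threshold():
        # Move the floor around a little to keep attackers guessing.
        return 3600 * random.uniform(0.8, 1.2)  # roughly an hour of work

    def issue(job_id):
        issued_at[job_id] = time.time()

    def accept(job_id, result):
        elapsed = time.time() - issued_at[job_id]
        if elapsed < secret_threshold():
            # Reply "accepted" as usual, but quietly discard the result
            # (or queue the same job for an independent double-check).
            return "accepted"
        results[job_id] = result
        return "accepted"

    issue("job-1")
    print(accept("job-1", 42))  # too fast: accepted but discarded
    print("job-1" in results)   # False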

Benefits
1. Performance
• Parallel computing can be seen as a subset of distributed computing
2. Scalability
3. Resource sharing

Challenges
1. Heterogeneity
2. Latency
• Interactions between distributed processes have a higher latency
3. Memory Access
• Remote memory access is not the same as local memory access
4. Synchronization
• Concurrent interactions are the norm
5. Partial failure
• Applications need to adapt gracefully in the face of partial failure
• Lamport once defined a distributed system as "one on which I cannot get any work done because some machine I have never heard of has crashed"

Conclusion
1. The coarse-grained component model leads to an extremely reliable, high-performance, and scalable platform for distributed computing.
2. These architectures are ideal for distributed business process composition (BPM) and generic distributed computing applications such as compute-intensive scientific problems.

