You are on page 1of 54

DISTRIBUTED SYSTEM ARCHITECTURE

If we look back at the history of computers then we realize that the growth has been at a

tremendous pace. Early computers were so large that big rooms were needed to store

them. They were very expensive also and it was not possible for an ordinary person to

use them. That was the era of serial processing systems. Right from that stage now we

have reached to a stage where we can keep the computers in our pockets and even a

layman can use it.

2.1: MILESTONES IN DISTRIBUTED COMPUTING SYSTEMS

The evolution of computer systems can be divided into various phases or generations

[15]. During the early phase of development the changes used to take place after a long

duration of time. New technologies took over a decade to evolve and to be accepted. But

now the changes are very very fast and their acceptance rate is also very high. Everyone

wants to switch to a new technology as soon as it is in the market. Let us have a look at

how we have moved from the era of vacuum tubes to the present day technology.

First Generations (Vacuum Tubes and Plug Boards) - The 1940's

There was no operating system in the earliest electronic digital computers. In that era the

programs were entered in the computer system one bit at a time on mechanical switches

or plug boards. Programming languages and operating systems were unheard of. It was

not possible for a common man to use computer systems. There were specially trained

group of people who used to design and build these systems. The programming,

operation and maintenance of these systems also required special training. For using
these systems the programmers had to reserve them by signing up beforehand. Then at

the designated time they used to come down to the computer room, insert their own

plugboard and execute their program. During this era the systems were very huge which

used to occupy big rooms. Plug boards and mechanical switches were used in building

those systems. One such system is shown in figure 2.1 below.

Figure2.1 (a): Large computers occupying big rooms

Figure2.1 (b): Plug Board with mechanical switches


Second Generation (Transistors and Batch Systems) - The 1950's

During the start of 1950s one job was executed in a system at a time. These were single

user systems as only one person used to execute his/her job at a time. All the resources

were there for that single user only. The computer system of that time did not had hard

disks as we have today. In these systems the tapes and disks had to be loaded before the

execution of the program. This took considerable amount of time which was known as

the job setup time. Hence the computer system was dedicated to one job for more than

the job’s execution time. Similarly after the execution of considerable omelette

considerable “teardown” time was needed which was the time required for removing the

tapes and disk packs. The computer system sat idle during job setup and job teardown.

These were known as single user serial processing systems.

Figure 2.2: Batch Processing System

In a serial processing system the procedure of job setup and teardown was repeated for

each job submitted. The new idea that came up in this era was to group together jobs

which required the same type of execution environment. By doing this and running all
these jobs one after the other the job setup and teardown had to be done only once for the

complete group. This is known as batch processing and it saved pretty good amount of

time.

The programming language used for the control cards was called job control language

(JCL). These single stream batch processing systems became very popular in the early

1960s. General Motors Operating System, Input Output System, FORTRAN Monitor

System and SAGE (Semi-Automatic Ground Environment) are some of the operating

systems which came into existence in 1950s. SAGE was a real time control system which

was designed to monitor weapons systems.

Third Generation (Timesharing and Multiprogramming) - The 1960's

In 1960s the concept of multiprogramming came into existence. In these systems also the

jobs were sent in batches. The advantage here was that multiple programs were loaded

into the memory at the same time. The program during it's course of execution goes

through phases of computation and performing some input output operation.

Figure 2.3: Several jobs in memory in a multiprogramming system


These multiprogramming systems used this to their advantage. While one of the

programs was busy doing some I/O the CPU was given to some other program in the

memory. Hence the CPU was never idle in these systems.

UNIX, VMS, and Windows NT are some of the most popular multiprogramming

operating systems. Another technique called spooling (simultaneous peripheral

operations on line) was again an additional feature in the operating systems of third

generation. There is big difference between the speed of the CPU and the speed of the

peripheral device like a printer. Now if the system has to directly write on the printer then

the speed of this write operation will be as fast as the speed of the printer, which infact is

very slow. Spooling removed this drawback by putting a high speed device like a disk in

between the running program and the low speed peripheral device. So now instead of

writing on the peripheral device the system will write on this high speed disk and will go

back to do some other job. Hence the time of the system is saved. Time sharing systems

were one of the major developments in this area. These systems allowed multiple users to

share computer resources simultaneously. These systems distribute the time among

multiple users where each user gets a very tiny slice of time. The users here are unaware

about this sharing of system resources. During it's allocated time the user utilises all the

resources of the system and then after it's time slice expires the other users get those

resources. The difference between time sharing systems and multiprogramming is very

subtle. In multiprogramming, the computer executes one program until it reaches a

logical stopping point, such as an input/output event, whereas in timesharing systems

every job is allocated a specific small time period only. The first timesharing system was

developed in November 1961 and was called CTSS – Compatible Time-Sharing System.
Fourth Generation (Microprocessor and personal Computer- The 1970s

The 1970s saw several important developments in the field of operating systems. Along

with this there were many technological advancements in the field of data communication

also. Military and university computing started making heavy use of TCP/IP

(Transmission Control Protocol/Internet Protocol) and hence it became widely popular.

Personal computers were also developed in this era only. Microprocessor or the

microchip was the main advancement which led to the development of these personal

computers.. The first IBM PC, IBM 5150 is shown in figure 2.4.

Figure 2.4: First IBM PC, IBM 5150

These microprocessors had thousands of transistors integrated onto small silicon chips

and hence were also known as integrated circuits or ICs. Intel’s 4004 was the first

microprocessor developed in 1971 and it was a 4-bit microprocessor. Other

microprocessors were Intel’s 8008, Motorola 68000, Zilog Z80 and the Mostek 6502. The
first operating system for Intel 8080 was written in 1976 and was known as CP/M

(Control Program for Micros).

Distributed processing and client-server processing - The 1980’s

When more than one computer or processor runs an application in parallel it is known as

distributed processing. There are numerous ways in which this distribution can take

place. Parallel processing is one such example in which there are multiple CPUs in a

single computer which execute programs. In distributed processing execution of a

program occurs on more than one processor in order for it to be completed in a more

efficient manner. Another example of distributed processing is local-area network

(LANs) where single program runs simultaneously at various sites.

Clusters – The 1990’s

When a group of identical or similar computers are connected in the same geographic

location using high speed connections they form a cluster. In cluster computing these

systems operate as a single computer and these computers that form the cluster cannot

work as independent separate systems. A cluster works as one big computer in which a

centralized resource manager manages various resources.

Grid Computing - Late 1990’s

In a Grid also a number of computers are connected using a communication network and

these systems work together on a large problem. The basic difference between a grid and

a cluster is the type of systems that are connected. In a cluster we have similar systems
but in a grid we have got hetrogenous environment, i.e. the systems connected in a grid

are of different types. This hetrogienity is both in terms of hardware as well as software.

This means that the computers that form a grid can run different operating systems and

they can have different hardware also. A Grid can be formed over a Local Area Network,

Metropolitan Area Network or a Wide Area Network which means that they are

inherently distributed in nature Another difference between a cluster and a Grid is that the

nodes in a Grid are autonomous i.e. they have their own resource manager and each node

behaves like an independent entity. As far as the physical location is concerned these

nodes can be geographically far apart and they can be operated independently. Each

computer in a Grid acts as an individual entity with it’s standalone identity.

Figure 2.5: Grid Architecture

Unused resources such as processing time and memory on one computer can be used by a

process running on another computer. This can be achieved with the help of a program
which runs on each node of the distributed environment. We know that the processing

speed of a computer is much larger than the speed of the communication network

between them. The task on a system is broken down into smaller independent parts and

these parts are migrated on various nodes. These nodes process their portions

independently and send back the results to the server.

So in a Grid structure, normally, a server logs onto a bunch of computers (the grid) and

sends them data and a program to run. The program is run on those computers, and when

the results are ready they are send back to the server.

Cloud Computing - Late 1990’s

In cloud computing the resources of computing are provided as a service instead of a

product. This means that the shared resources such as software and information are

provided to computers and other devices as a utility. We can compare this to an

electricity grid where the electric power runs over the wires, which is accessible to

everyone who has the connection and each user pays depending upon their utilization.

Analogous to this in cloud computing the resources can be accessed over the network

typically the Internet [16].

The term cloud computing was first used in this context in 1997 by Ramnath Chellappa

where he defined it as a new computing paradigm where the boundaries of computing

will be determined by economic rationale rather than technical limits alone.


Figure 2.6: Cloud Computing

Salesforce.com introduced the concept of delivering enterprise applications via a simple

website in 1999. This was time when the commercial application of this technology

started to come into the market. After this many more players started to emerge in the

market. Amazon launched its Amazon Web Service in 2002; Google Docs which came in

2006 brought cloud computing to the forefront of public consciousness. Amazon also

introduced its Amazon’s Elastic Compute cloud (EC2) as a commercial web service in

2006. We can also compare this service to a rental agency which provides computing

resources to small companies and individuals which cannot afford to have a full fledged

infrastructure of their own. These companies pay to the service provider in accordance to

the usage.

In the year 2007 corporate giants Google, IBM and a number of universities across the

United States came together in an industry-wide collaboration for this technology. The

concept of private cloud came with Eucalyptus in 2008 which was the first open source
AWS (Amazon Web Services) API compatible platform. This was followed by

OpenNebula which was the first open source software for deploying private and hybrid

clouds.

Microsoft also entered into cloud computing with Windows Azure in November 2009.

By this time most of the big companies were there in cloud computing. The latest entrants

in this technology are Dell, Oracle, Fujitsu, HP, Teradata, and a number of other

household names. Fundamentally the concepts of Grid computing and Cloud are different

but still we can have a cloud cluster within a computational grid and vice-versa.

Cloud Computing versus Grid Computing

The difference between Cloud Computing and Grid Computing lies in the method they

use for determining the tasks within their environments. A single task is broken down

into smaller subtasks in a grid environment and each of these subtasks is distributed

among different computing machines. Once these smaller tasks are completed they are

sent back to the primary machine. The primary machine combines the results obtained

from various nodes and gives one single result.

On the other hand the main focus of a cloud computing architecture is to enable users to

use difference services without investing in the underlying architecture. Although, in a

grid also similar facility for computing power is offered, but cloud computing goes

beyond that. With a cloud various services such as web hosting etc. are also provided to

the users.
The main feature of cloud is that it offers infrastructure as a service (IaaS), software as a

service (SaaS) and platform as a service (PaaS) as well as Web 2.0. The cost and

complexity of buying, configuring, and managing the hardware as well as software

needed to build and deploy applications is eliminated here. Instead these applications are

delivered as a service over the Internet (the cloud).

2.2: MODELS OF COMPUTATION

As we have seen in the previous section as technology developed we moved from

centralised computing model to distributed computing model. Let us discuss these two

models in detail.

2.2.1: Centralized System model

All computing is controlled through a central terminal server(s), which centrally provides

the processing, programs and storage. The workstations (ThinClients, PCs, appliances)

are just used for input and display purposes. They connect to the server(s) where all tasks

are performed. All server resources are purchased once and shared by all users. Security

issues are far easier to coordinate and centrally nail down. Thus Centralized Computing

takes some of the control and all of the parts easily susceptible to failure away from the

desktop appliance. All computing power, processing, program installations, back-ups and

file structures are done on the Terminal or Application Server.

CC Advantages

• Centralized Computing and file storage.

• Redundant technologies incorporated to ensure reduced downtime.


• Computer stations replaced with ThinClient appliances with no moving parts,

improving meantime before failure.

• Centralized management of all users, processes, applications, back-ups and

securities.

• Usually has lower cost of ownership, when measured over 3 + years.

CC Disadvantages

• User access to soft media drives are removed.

• In the rare event of a network failure, the ThinClient Terminal may lose access

to the terminal server. If this happens, there are still means to use some resources

from the local client

Traditionally, this type of computing was only found in Enterprise Level Businesses. In

more recent time, reduced server and network costs have seen this type of computing

deployed in many smaller and medium sized businesses.

2.2.2: Distributed System Model

Every user has their own PC (desktop or laptop) for processing, programs and storage.

Storage is often mixed over the network between the local PC, shared PCs, or a dedicated

file server. Each PC requires the purchase of its own resources (operating system,

programs, etc.). This is also known as Peer-to-Peer (P2P) model. This environment is an

ad-hoc network that is generally grown from a small group of independent computers that

need to share files, resources such as printers and network/internet connections. These

have allowed small business to improve some forms of productivity. If all is to run
smoothly, this model usually needs internal technical skills, or access to outsourced

technical support.

DC Advantages

• Each user has control of their own equipment, to a reasonable degree.

• Each user can add their own programs at their own leisure.

• Sometimes cheaper up front capital cost.

DC Disadvantages

• Typical lifespan of 3 years (maybe stretch to 5 with questionable results).

• Many moving parts (fans, hard drives) which are susceptible to failure.

• Larger vulnerability to security threats (both internal & external).

• Usually has higher cost of ownership, when measured over 3 + years.

This is the more widely used computing configuration, because it has grown out of what

most users and many IT people were used to, within the comfort zone of their home PCs.

As a result, there has been extensive development of many business practices, systems

and security products to help the distributed system fully function in a business

environment.

2.3: CLASSIFICATION BASED ON DESIGN

The processes running on the CPU’s of the different nodes are interconnected with some

sort of communication system. Various models are used for building distributed

computing systems. These are broadly classified as:


2.3.1: Minicomputer Model

The minicomputer model is a simple extension of the centralized time-sharing system. As

shown in Figure 2.7, a distributed computing system based on this model consists of a

few minicomputers (they may be large supercomputers as well) interconnected by a

communication network. Each minicomputer usually has multiple users simultaneously

logged on to it. For this, several interactive terminals are connected to each

minicomputer.

Figure2.7 A distributed computing system based on the minicomputer model.

Each user is logged on to one specific minicomputer, with remote access to other

minicomputers. The network allows a user to access remote resources that are available

on some machine other than the one on to which the user is currently logged. The

minicomputer model may be used when resource sharing (such as sharing of information

databases of different types, with each type of database located on a different machine)
with remote users is desired. The early ARPAnet is an example of a distributed

computing system based on the minicomputer model.

2.3.2: Workstation Model

The workstation model is straightforward: the system consists of workstations (high-end

personal computers) scattered throughout a building or campus and connected by a high-

speed LAN, as shown in figure 2.8. Some of the workstations may be in offices, and thus

implicitly dedicated to a single user, whereas others may be in public areas and have

several different users during the course of a day. In both cases, at any instant of time, a

workstation either has a single user logged into it, and thus has an "owner" (however

temporary), or it is idle.

Figure 2.8. A network of personal workstations, each with a local file system.

In some systems the workstations have local disks and in others they do not. The latter

are universally called diskless workstations, but the former are variously known as

diskful workstations, or disky workstations, or even stranger names. If the workstations

are diskless, the file system must be implemented by one or more remote file servers.

Requests to read and write files are sent to a file server, which performs the work and

sends back the replies.


Diskless workstations are popular at universities and companies for several reasons, not

the least of which is price. Having a large number of workstations equipped with small,

slow disks is typically much more expensive than having one or two file servers equipped

with huge, fast disks and accessed over the LAN.

A second reason that diskless workstations are popular is their ease of maintenance.

When a new release of some program, say a compiler, comes out, the system

administrators can easily install it on a small number of file servers in the machine room.

Installing it on dozens or hundreds of machines all over a building or campus is another

matter entirely. Backup and hardware maintenance is also simpler with one centrally

located 5-gigabyte disk than with fifty 100-megabyte disks scattered over the building.

Another point against disks is that they have fans and make noise. Many people find this

noise objectionable and do not want it in their office. Finally, diskless workstations

provide symmetry and flexibility. A user can walk up to any workstation in the system

and log in. Since all his files are on the file server, one diskless workstation is as good as

another. In contrast, when all the files are stored on local disks, using someone else's

workstation means that you have easy access to his files, but getting to your own requires

extra effort, and is certainly different from using your own workstation.

When the workstations have private disks, these disks can be used in one of at least four

ways:

1. Paging and temporary files.

2. Paging, temporary files, and system binaries.

3. Paging, temporary files, system binaries, and file caching.


4. Complete local file system.

The first design is based on the observation that while it may be convenient to keep all

the user files on the central file servers (to simplify backup and maintenance, etc.) disks

are also needed for paging (or swapping) and for temporary files. In this model, the local

disks are used only for paging and files that are temporary, unshared, and can be

discarded at the end of the login session. For example, most compilers consist of multiple

passes, each of which creates a temporary file read by the next pass. When the file has

been read once, it is discarded. Local disks are ideal for storing such files.

The second model is a variant of the first one in which the local disks also hold the binary

(executable) programs, such as the compilers, text editors, and electronic mail handlers.

When one of these programs is invoked, it is fetched from the local disk instead of from a

file server, further reducing the network load. Since these programs rarely change, they

can be installed on all the local disks and kept there for long periods of time. When a new

release of some system program is available, it is essentially broadcast to all machines.

However, if hat machine happens to be down when the program is sent, it will miss the

program and continue to run the old version. Thus some administration is needed to keep

track of who has which version of which program.

A third approach to using local disks is to use them as explicit caches (in addition to

using them for paging, temporaries, and binaries). In this mode of operation, users can

download files from the file servers to their own disks, read and write them locally, and

then upload the modified ones at the end of the login session. The goal of this

architecture is to keep long-term storage centralized, but reduce network load by keeping
files local while they are being used. A disadvantage is keeping the caches consistent.

What happens if two users download the same file and then each modifies it in different

ways? This problem is not easy to solve, and we will discuss it in detail later in the book.

Fourth, each machine can have its own self-contained file system, with the possibility of

mounting or otherwise accessing other machines' file systems. The idea here is that each

machine is basically self-contained and that contact with the outside world is limited.

This organization provides a uniform and guaranteed response time for the user and puts

little load on the network. The disadvantage is that sharing is more difficult, and the

resulting system is much closer to a network operating system than to a true transparent

distributed operating system.

The advantages of the workstation model are manifold and clear. The model is certainly

easy to understand. Users have a fixed amount of dedicated computing power, and thus

guaranteed response time. Sophisticated graphics programs can be very fast, since they

can have direct access to the screen. Each user has a large degree of autonomy and can

allocate his workstation's resources as he sees fit. Local disks add to this independence,

and make it possible to continue working to a lesser or greater degree even in the face of

file server crashes.

However, the model also has two problems. First, as processor chips continue to get

cheaper, it will soon become economically feasible to give each user first 10 and later

100 CPUs. Having 100 workstations in your office makes it hard to see out the window.

Second, much of the time users are not using their workstations, which are idle, while

other users may need extra computing capacity and cannot get it. From a system-wide
perspective, allocating resources in such a way that some users have resources they do

not need while other users need these resources badly is inefficient.

Using Idle Workstations

The second problem, idle workstations, has been the subject of considerable research,

primarily because many universities have a substantial number of personal workstations,

some of which are idle (an idle workstation is the devil's playground?). Measurements

show that even at peak periods in the middle of the day, often as many as 30 percent of

the workstations are idle at any given moment. In the evening, even more are idle. A

variety of schemes have been proposed for using idle or otherwise underutilized

workstations [17, 18]. The earliest attempt to allow idle workstations to be utilized was

the

rsh program

that comes with Berkeley UNIX. This program is called by

rsh machine command

in which the first argument names a machine and the second names a command to run on

it. What rsh does is run the specified command on the specified machine. Although

widely used, this program has several serious flaws. First, the user must tell which

machine to use, putting the full burden of keeping track of idle machines on the user.

Second, the program executes in the environment of the remote machine, which is

usually different from the local environment. Finally, if someone should log into an idle

machine on which a remote process is running, the process continues to run and the

newly logged-in user either has to accept the lower performance or find another machine.
The research on idle workstations has centered on solving these problems. The key issues

are:

• How is an idle workstation found?

• How can a remote process be run transparently?

• What happens if the machine's owner comes back?

Let us consider these three issues, one at a time.

How is an idle workstation found?

To start with, what is an idle workstation? At first glance, it might appear that a

workstation with no one logged in at the console is an idle workstation, but with modern

computer systems things are not always that simple. In many systems, even with no one

logged in there may be dozens of processes running, such as clock daemons, mail

daemons, news daemons, and all manner of other daemons. On the other hand, a user

who logs in when arriving at his desk in the morning, but otherwise does not touch the

computer for hours, hardly puts any additional load on it. Different systems make

different decisions as to what "idle" means, but typically, if no one has touched the

keyboard or mouse for several minutes and no user-initiated processes are running, the

workstation can be said to be idle. Consequently, there may be substantial differences in

load between one idle workstation and another, due, for example, to the volume of mail

coming into the first one but not the second.

The algorithms used to locate idle workstations can be divided into two categories: server

driven and client driven. In the former, when a workstation goes idle, and thus becomes a

potential compute server, it announces its availability. It can do this by entering its name,
network address, and properties in a registry file (or data base), for example. Later, when

a user wants to execute a command on an idle workstation, he types something like

remote command

and the remote program looks in the registry to find a suitable idle workstation. For

reliability reasons, it is also possible to have multiple copies of the registry.

An alternative way for the newly idle workstation to announce the fact that it has become

unemployed is to put a broadcast message onto the network. All other workstations then

record this fact. In effect, each machine maintains its own private copy of the registry.

The advantage of doing it this way is less overhead in finding an idle workstation and

greater redundancy. The disadvantage is requiring all machines to do the work of

maintaining the registry.

Figure 2.9: A registry-based algorithm for finding and using idle workstations.

Whether there is one registry or many, there is a potential danger of occurring of race

conditions. If two users invoke the remote command simultaneously, and both of them
discover that the same machine is idle, they may both try to start up processes there at the

same time. To detect and avoid this situation, the remote program can check with the idle

workstation, which, if still free, removes itself from the registry and gives the go-ahead

sign. At this point, the caller can send over its environment and start the remote process,

as shown in figure 2.9 above.

The other way to locate idle workstations is to use a client-driven approach. When remote

is invoked, it broadcasts a request saying what program it wants to run, how much

memory it needs, whether or not floating point is needed, and so on. These details are not

needed if all the workstations are identical, but if the system is heterogeneous and not

every program can run on every workstation, they are essential. When the replies come

back, remote picks one and sets it up. One nice twist is to have "idle" workstations delay

their responses slightly, with the delay being proportional to the current load. In this way,

the reply from the least heavily loaded machine will come back first and be selected.

How can a remote process be run transparently?

Finding a workstation is only the first step. Now the process has to be run there. Moving

the code is easy. The trick is to set up the remote process so that it sees the same

environment it would have locally, on the home workstation, and thus carries out the

same computation it would have locally.

To start with, it needs the same view of the file system, the same working directory, and

the same environment variables (shell variables), if any. After these have been set up, the

program can begin running. The trouble starts when the first system call, say a READ, is

executed. What should the kernel do? The answer depends very much on the system
architecture. If the system is diskless, with all the files located on file servers, the kernel

can just send the request to the appropriate file server, the same way the home machine

would have done had the process been running there. On the other hand, if the system has

local disks, each with a complete file system, the request has to be forwarded back to the

home machine for execution.

Some system calls must be forwarded back to the home machine no matter what, even if

all the machines are diskless. For example, reads from the keyboard and writes to the

screen can never be carried out on the remote machine. However, other system calls must

be done remotely under all conditions. For example, the UNIX system calls SBRK

(adjust the size of the data segment), NICE (set CPU scheduling priority), and PROFIL

(enable profiling of the program counter) cannot be executed on the home machine. In

addition, all system calls that query the state of the machine have to be done on the

machine on which the process is actually running. These include asking for the machine's

name and network address, asking how much free memory it has, and so on.

System calls involving time are a problem because the clocks on different machines may

not be synchronized. Using the time on the remote machine may cause programs that

depend on time, like make, to give incorrect results. Forwarding all time-related calls

back to the home machine, however, introduces delay, which also causes problems with

time.

To complicate matters further, certain special cases of calls which normally might have to

be forwarded back, such as creating and writing to a temporary file, can be done much

more efficiently on the remote machine. In addition, mouse tracking and signal
propagation have to be thought out carefully as well. Programs that write directly to

hardware devices, such as the screen's frame buffer, diskette, or magnetic tape, cannot be

run remotely at all. All in all, making programs run on remote machines as though they

were running on their home machines is possible, but it is a complex and tricky business.

What happens if the machine's owner comes back?

The final question on our original list is what to do if the machine's owner comes back

(i.e., somebody logs in or a previously inactive user touches the keyboard or mouse). The

easiest thing is to do nothing, but this tends to defeat the idea of "personal" workstations.

If other people can run programs on your workstation at the same time that you are trying

to use it, there goes your guaranteed response.

Another possibility is to kill off the intruding process. The simplest way is to do this

abruptly and without warning. The disadvantage of this strategy is that all work will be

lost and the file system may be left in a chaotic state. A better way is to give the process

fair warning, by sending it a signal to allow it to detect impending doom, and shut down

gracefully (write edit buffers to the disk, close files, and so on). If it has not exited within

a few seconds, it is then terminated. Of course, the program must be written to expect and

handle this signal, something most existing programs definitely are not.

A completely different approach is to migrate the process to another machine, either back

to the home machine or to yet another idle workstation. Migration is rarely done in

practice because the actual mechanism is complicated. The hard part is not moving the

user code and data, but finding and gathering up all the kernel data structures relating to

the process that is leaving. For example, it may have open files, running timers, queued
incoming messages, and other bits and pieces of information scattered around the kernel.

These must all be carefully removed from the source machine and successfully reinstalled

on the destination machine. There are no theoretical problems here, but the practical

engineering difficulties are substantial. Further literature about this can be found in [19,

20].

In both cases, when the process is gone, it should leave the machine in the same state in

which it found it, to avoid disturbing the owner. Among other items, this requirement

means that not only must the process go, but also all its children and their children. In

addition, mailboxes, network connections, and other system-wide data structures must be

deleted, and some provision must be made to ignore RPC replies and other messages that

arrive for the process after it is gone. If there is a local disk, temporary files must be

deleted, and if possible, any files that had to be removed from its cache restored.

The Sprite system [21] and an experimental system developed at Xerox PARC [22] are

two examples of distributed computing systems based on the workstation model.

2.2.3: Workstation Server Model

The workstation model is a network of personal workstations, each with its own disk and

a local file system. A workstation with its own local disk is usually called a diskful

workstation and a workstation without a local disk is called a diskless workstation. With

the proliferation of high-speed networks, diskless workstations have become more

popular in network environments than diskful workstations, making the workstation-

server model more popular than the workstation model for building distributed

computing systems.
Figure 2.10: A distributed computing system based on the workstation-server

model.

As shown in figure 2.10, a distributed computing system based on the workstation server

model consists of a few minicomputers and several workstations (most of which are

diskless, but a few of which may be diskful) interconnected by a communication

network.

Note that when diskless workstations are used on a network, the file system to be used by

these workstations must be implemented either by a diskful workstation or by a

minicomputer equipped with a disk for file storage. The minicomputers are used for this

purpose. One or more of the minicomputers are used for implementing the file system.

Other minicomputers may be used for providing other types of services, such as database

service and print service. Therefore, each minicomputer is used as a server machine to

provide one or more types of services. Hence in the workstation-server model, in addition

to the workstations, there are specialized machines (may be specialized workstations) for

running server processes (called servers) for managing and providing access to shared

resources.
For a number of reasons, such as higher reliability and better scalability, multiple servers

are often used for managing the resources of a particular type in a distributed computing

system. For example, there may be multiple file servers, each running on a separate

minicomputer and cooperating via the network, for managing the files of all the users in

the system. Due to this reason, a distinction is often made between the services that are

provided to clients and the servers that provide them. That is, a service is an abstract

entity that is provided by one or more servers. For example, one or more file servers may

be used in a distributed computing system to provide file service to the users. In this

model, a user logs onto a workstation called his or her home workstation. Normal

computation activities required by the user's processes are performed at the user's home

workstation, but requests for services provided by special servers (such as a file server or

a database server) are sent to a server providing that type of service that performs the

user's requested activity and returns the result of request processing to the user's

workstation. Therefore, in this model, the user's processes need not be migrated to the

server machines for getting the work done by those machines. For better overall system

performance, the local disk of a diskful workstation is normally used for such purposes as

storage of temporary files, storage of unshared files, storage of shared files that are rarely

changed, paging activity in virtual-memory management, and caching of remotely

accessed data.

As compared to the workstation model, the workstation-server model has several

advantages:
1. In general, it is much cheaper to use a few minicomputers equipped with large, fast

disks that are accessed over the network than a large number of diskful workstations,

with each workstation having a small, slow disk.

2. Diskless workstations are also preferred to diskful workstations from a system

maintenance point of view. Backup and hardware maintenance are easier to perform with

a few large disks than with many small disks scattered all over a building or campus.

Furthermore, installing new releases of software (such as a file server with new

functionalities) is easier when the software is to be installed on a few file server machines

than on every workstation.

3. In the workstation-server model, since all files are managed by the file servers, users

have the flexibility to use any workstation and access the files in the same manner

irrespective of which workstation the user is currently logged on. Note that this is not true

with the workstation model, in which each workstation has its local file system, because

different mechanisms are needed to access local and remote files.

4. In the workstation-server model, the request-response protocol described above is

mainly used to access the services of the server machines. Therefore, unlike the

workstation model, this model does not need a process migration facility, which is

difficult to implement.

The request-response protocol is known as the client-server model of communication. In

this model, a client process (which in this case resides on a workstation) sends a request

to a server process (which in this case resides on a minicomputer) for getting some

service such as reading a block of a file. The server executes the request and sends back a
reply to the client that contains the result of request processing. The client-server model

provides an effective general-purpose approach to the sharing of information and

resources in distributed computing systems. It is not only meant for use with the

workstation-server model but also can be implemented in a variety of hardware and

software environments. The computers used to run the client and server processes need

not necessarily be workstations and minicomputers. They can be of many types and there

is no need to distinguish between them. It is even possible for both the client and server

processes to be run on the same computer. Moreover, some processes are both client and

server processes. That is, a server process may use the services of another server,

appearing as a client to the latter.

5. A user has guaranteed response time because workstations are not used for executing

remote processes. However, the model does not utilize the processing capability of idle

workstations.

The V-System [23] is an example of a distributed computing system that is based on the

workstation-server model.

2.3.4: The Processor Pool Model

Although using idle workstations adds a little computing power to the system, it does not

address a more fundamental issue: What happens when it is feasible to provide 10 or 100

times as many CPUs as there are active users? One solution, as we saw, is to give

everyone a personal multiprocessor. However this is a somewhat inefficient design.


An alternative approach is to construct a processor pool, a rack full of CPUs in the

machine room, which can be dynamically allocated to users on demand. The processor

pool approach is illustrated in figure 2.11. Instead of giving users personal workstations,

in this model they are given high-performance graphics terminals, such as X terminals

(although small workstations can also be used as terminals). This idea is based on the

observation that what many users really want is a high-quality graphical interface and

good performance. Conceptually, it is much closer to traditional timesharing than to the

personal computer model, although it is built with modern technology (low-cost

microprocessors).

Figure 2.11: A system based on the processor pool model.

The motivation for the processor pool idea comes from taking the diskless workstation

idea a step further. If the file system can be centralized in a small number of file servers

to gain economies of scale, it should be possible to do the same thing for compute

servers. By putting all the CPUs in a big rack in the machine room, power supply and

other packaging costs can be reduced, giving more computing power for a given amount

of money. Furthermore, it permits the use of cheaper X terminals (or even ordinary

ASCII terminals), and decouples the number of users from the number of workstations.
The model also allows for easy incremental growth. If the computing load increases by

10 percent, you can just buy 10 percent more processors and put them in the pool.

In effect, we are converting all the computing power into "idle workstations" that can be

accessed dynamically. Users can be assigned as many CPUs as they need for short

periods, after which they are returned to the pool so that other users can have them. There

is no concept of ownership here: all the processors belong equally to everyone.

So far we have tacitly assumed that a pool of n processors is effectively the same thing as

a single processor that is n times as fast as a single processor. In reality, this assumption

is justified only if all requests can be split up in such a way as to allow them to run on all

the processors in parallel. If a job can be split into, say, only 5 parts, then the processor

pool model has an effective service time only 5 times better than that of a single

processor, not n times better.

Still, the processor pool model is a much cleaner way of getting extra computing power

than looking around for idle workstations and sneaking over there while nobody is

looking. By starting out with the assumption that no processor belongs to anyone, we get

a design based on the concept of requesting machines from the pool, using them, and

putting them back when done. There is also no need to forward anything back to a

"home" machine because there are none.

There is also no danger of the owner coming back, because there are no owners. In the

end, it all comes down to the nature of the workload. If all people are doing is simple

editing and occasionally sending an electronic mail message or two, having a personal

workstation is probably enough. If, on the other hand, the users are engaged in a large
software development project, frequently running make on large directories, or are trying

to invert massive sparse matrices, or do major simulations or run big artificial intelligence

or VLSI routing programs, constantly hunting for substantial numbers of idle

workstations will be no fun at all. In all these situations, the processor pool idea is

fundamentally much simpler and more attractive.

2.3.5: A Hybrid Model

A possible compromise is to provide each user with a personal workstation and to have a

processor pool in addition. Although this solution is more expensive than either a pure

workstation model or a pure processor pool model, it combines the advantages of both.

Interactive work can be done on workstations, giving guaranteed response. Idle

workstations, however, are not utilized, making for a simpler system design. They are

just left unused. Instead, all non interactive processes run on the processor pool, as does

all heavy computing in general. This model provides fast interactive response, an

efficient use of resources, and a simple design.

2.4: CLASSIFICATION BASED ON ARCHITECTURE

A network-centric system is an interconnection of hardware, software, and humans that

operate together over a network (e.g., Internet, virtual private network, local area

network, intranet) to accomplish a set of goals. The model simplifies and abstracts the

functions of the individual components of a distributed system and then it considers the

placement of the components across a network of computers. It also gives the

interrelationships between the various components. Based on how the responsibilities are
distributed between system components and how these components are placed we can

place the distributed systems into the following four categories:

a) Client-server Architecture

b) Distributed Object Architecture

c) Service Oriented Architecture

d) Peer to Peer Architecture

2.4.1 Client-server Architecture

The client/server model is a computing model that acts as distributed application which

partitions tasks or workloads between the providers of a resource or service, called

servers, and service requesters, called clients.

Figure 2.12: A Typical Client Server Environment

Often clients and servers communicate over a computer network on separate hardware,

but both client and server may reside in the same system. A server machine is a host that

is running one or more server programs which share their resources with clients. A client
does not share any of its resources, but requests a server's content or service function.

Clients therefore initiate communication sessions with servers which await incoming

requests.

The client/server characteristic describes the relationship of cooperating programs in an

application. The server component provides a function or service to one or many clients,

which initiate requests for such services. With the development of large scale information

systems which are very complex in nature, the client server model has become very

popular. These systems have evolved at a tremendous rate over the past two decades for

application design and deployment.

The C/S computing architecture is the backbone of technologies like groupware and

workflow systems. The Client Server technology is having huge impact on the

development work in the field of computer systems and information technology.

Traditionally we associate the term Client/Server computing to a connection of a desktop

personal computer to an SQL database server over a communication network. This is

perhaps a logical model where the nodes are divided into clients and servers.

One-Tier ~ Monolithic (C/S) Architectures

This architecture has been in existence since the very beginning of the Information

Technology (IT) industry. Simple form of Client/Server computing exists in the

mainframes also where the mainframe acts as a server and an unintelligent terminal acts

as a client. This is an example of a one-tier C/S system.


Two-Tier Client/Server Architectures

When the client communicates directly with a database server, it is known as two tier

architecture. The business or the application logic can reside either on the client or on the

database server in the form of stored procedures.

This model started to emerge in the late eighties and early nineties in the applications

which were developed for Local Area Network. These applications were based on simple

file sharing techniques which were implemented by X-base style products such as

Paradox, dBase, Clipper, FoxPro, etc.

Fat Clients

In the beginning the Client Server systems were such that most of the processing

occurred on the client node itself. In this case the host was a non-mainframe system as a

network file server. Since most of the processing took place on the client node so it was

known as a “fat client”. This configuration is shown in figure 2.13 but the disadvantage of

this model was that it was not able to handle large or even mid-size information systems

(greater than 50 or so connected clients).

For desktop computing the Graphical User Interface (GUI) became the most popular

environment. New horizons started to appear for the two- tier architecture. Specialized

database servers started to replace the general purpose LAN file server. New

development tools such as PowerBuilder, Visual Basic, Delphi etc. started to emerge.
Figure 2.13: Fat Client Model

In this new scheme datasets of information used to be send to the client using Structured

Query Language (SQL) techniques. Most of the processing was still carried out on the

"fat" clients.

If we want to carry out most of the processing on the client the hardware of the client

must be very powerful to support it. That is we need a fatter client. If the client

technology is not that advanced then this is not feasible and hence the application cannot

be afforded. For fat clients the amount of bandwidth required is also quite large and

hence the number of users using the network is reduced.

Thin Client

As opposed to fat client where most of the processing was carried out on the client we

have thin client model which is shown in figure 2.14. In this model the procedures stored

at the database server are invoked by the user as and when required.

The performance is increased in case of the thin client model because the network

bandwidth required here is lesser than the fat client model.


Figure 2.14: Thin Client Model

However the drawback of this model is that it relies heavily on stored procedures which

are much customized and vendor specific. These stored procedures are very closely

linked to the database and hence they have to be changed whenever the business logic

changes. If the database is very large it becomes very difficult to make these changes and

maintain different versions of it.

Remote database transport protocol has to be used for such cases. One such example is

using SQL-Net to carry out the transactions. The Client/Server interaction has to be

mediated through 'heavy' network process in such situations. Due to this the network

transaction size is reduced and query transaction speed is slowed

Either we used thin client or fat client model the two-tier (C/S) systems were not able to

handle distributed systems larger than those comprising of 100 users. They were also not

suitable for mission critical applications.

Three-Tier Client/Server Architectures

In order to cope up with the limitations of the two tier architecture a new model was

developed in which a middle tier was added to achieve '3-tier' architecture.


In this environment the presentation logic is implemented at the client side which is

actually a thin client. The application server(s) holds the business logic and the database

server(s) holds the data.

Figure 2.15: A 3- Tier Client Server Architecture

The three component layers of a multi-tier architecture are:

Front-end component: This component provides a presentation logic

which is portable in nature.

Back-end component: This component is responsible for providing access

to dedicated services, such as a database server.

Middle-tier component: This component is responsible for sharing and

controlling business logic by isolating it from the actual application.

Multi-Tier Client/Server architectures have got many more advantages which include:
The applications can be modified easily to adapt to the changing user

requirement. This is possible because the user interface is independent of the

application logic.

In this architecture only the data which is required to handle a task is

transferred to the client by the application layer. Hence network bottlenecks are

minimized.

Because the server holds the business logic so whenever there is any change

in the business logic we just have to update the server. No changes are needed at

the client as in the case of two-tier architecture.

The client has no information about the database and network operations. It

can access data easily and quickly without having any knowledge about the

location of the data and the number of servers in the system.

Several users can share the database connections by pooling. The cost of

per user licensing can be reduced by using this approach.

Standard SQL is used for writing the data layer. Because standard SQL is

platform independent so the organization has database independence and it is not

limited because of vendor-specific stored procedures.

Standard third or fourth generation languages, such as Java, C or COBOL

can be used for writing the application layer. So the programming can be easily

handled by the organization's programmers who are well versed in these

languages.
Fat Middle

In a multi-tier architecture one or more middle tier components are added as compared to

the traditional client/server architecture. Standard protocols such as RPC or HTTP are

used for interaction between the client system and the middle-tier and standard protocols

for database connectivity such as SQL, ODBC and JDBC are used for interaction

between the middle-tier and the backend server

Most of the application logic is present at the middle-tier. The client calls here are

translated into database queries and other actions and the data from the database is

translated into client data in return. Scalability is easier to achieve here because the

business logic is present on the application server. This also provides for easier handling

of rapidly changing business needs. Apart from these advantages, it also allows a more open choice of database vendors.
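As a hedged illustration of this translation role, the sketch below shows a middle-tier method that turns a simple client call into a SQL query and returns only the data the client needs. The customers table, its column names and the JDBC URL are assumptions made for the example; a real application server would add pooling, transactions and security around such code.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Middle-tier service: the client never sees SQL, JDBC or the database location.
    public class CustomerService {
        private final String jdbcUrl; // e.g. "jdbc:postgresql://dbhost/sales" (hypothetical)

        public CustomerService(String jdbcUrl) {
            this.jdbcUrl = jdbcUrl;
        }

        // A client call such as getCustomerName(42) is translated into a SQL query here.
        public String getCustomerName(int customerId) throws SQLException {
            String sql = "SELECT name FROM customers WHERE id = ?";
            try (Connection con = DriverManager.getConnection(jdbcUrl);
                 PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setInt(1, customerId);
                try (ResultSet rs = ps.executeQuery()) {
                    // Only the data needed by the client is sent back, not the whole result set.
                    return rs.next() ? rs.getString("name") : null;
                }
            }
        }
    }
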

When the middle tier is capable of providing connections to different types of services,

and can integrate and couple them to the client and to each other, we get an N-tier architecture.

N-Tier Architectures (thin all over)

With developments in the field of distributed architectures we got models in which, in a multi-tier environment, a client-side computer works as a client as well as a server. In these client-server systems the processes at the client end are smaller but more specialized, so they can be developed faster and maintained more easily. Similarly, small specialized processes were developed on the server side.

The industry nowadays is embracing the N-tier architecture at a very rapid pace, and most new applications are written using this technology. This does not mean, however, that the two-tier and three-tier models are completely out of use. Depending on the type of application, the size of the distributed environment and the type of data access, the two- or three-tier models are still used, for example in departmental applications.

Benefits of N-Tier Architecture

Some of the advantages of using N tier architecture are discussed below:

Suppose a startup company begins by running all tiers on a single machine.

When the traffic increases due to growing business needs each tier can be expanded

and moved to its own machine and then clustered. This is one example of how N-Tier

Architecture improves scalability and supports cost-efficient application building.

Applications can be made more readable and reusable by using the N-tier model. It is easier to port EJBs and custom tag libraries into readable applications built on well-maintained templates. Developer productivity increases and application maintainability improves due to reusability, which is an important feature in web applications.

As there is no single point of failure, applications developed using N-tier architectures are more robust. The various tiers are independent of each other. For example, if a business changes database vendors, it just has to replace the data tier and adjust the integration tier to any changes that affect it; the business-logic tier and the presentation tier remain unchanged (a sketch of this idea is given after this list). Likewise, if the presentation layer changes, this will not affect the integration or data layers. In a single-tier (monolithic) application, by contrast, all the layers exist in one unit and affect each other, and a developer would have to pick through the entire application code to implement any change. Again, well-designed modules allow applications, or pieces of applications, to be customized and reused across modules or even projects. Reusability is particularly important in web applications.

Finally, N-Tier Architecture helps developers build web applications because it

allows developers to apply their specific skill to that part of the program that best

suits their skill set. Graphic artists can focus on the presentation tier, while

administrators can focus on the database tier.
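The database-vendor change mentioned in the list above can be sketched with a data-access interface: the tiers above the data tier depend only on the contract, so swapping vendors means supplying a new implementation. The interface, class and table names below are hypothetical.

    // The business tier programs against this contract only.
    public interface OrderRepository {
        double totalFor(int orderId);
    }

    // Data tier, vendor A (hypothetical Oracle-backed implementation).
    class OracleOrderRepository implements OrderRepository {
        public double totalFor(int orderId) {
            // ... JDBC code using Oracle-specific SQL would go here ...
            return 0.0;
        }
    }

    // Data tier, vendor B; swapping this in requires no change above the data tier.
    class PostgresOrderRepository implements OrderRepository {
        public double totalFor(int orderId) {
            // ... JDBC code using PostgreSQL would go here ...
            return 0.0;
        }
    }

    // Business tier: unchanged no matter which repository is plugged in.
    class BillingService {
        private final OrderRepository orders;
        BillingService(OrderRepository orders) { this.orders = orders; }
        double invoiceTotal(int orderId) { return orders.totalFor(orderId); }
    }

The same isolation argument applies between the presentation, business and integration tiers.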

It is expected that the applications using N tier model will grow almost four-fold over the

next two years.

What kind of systems can benefit?

An N-tier architecture can be used to implement any Client/Server system where the application logic is broken down among various servers. This partitioning of the application creates an integrated information infrastructure which allows secure, consistent and global access to critical data. The N-tier architecture also reduces network traffic, which in turn results in greater reliability, faster network communication and better overall performance.


This model provides for centralized common services in a distributed environment. Here

we have: a backend host such as a mainframe, a UNIX device or database/LAN server;

an intelligent client, and one or more intelligent agents in the middle which control

activities such as On-Line Transaction Processing (OLTP), transaction monitoring, message handling, security and object store control. Object-oriented methodologies are also used very heavily in N-tier architectures.

TP monitors

Transactional middleware, such as TP monitors and application servers, does a commendable job of coordinating information movement and method sharing between many different resources. However, although the transactional paradigm this middleware employs provides an excellent mechanism for method sharing, it is not as effective at simple information sharing, which is the primary goal of B2B application integration. For example,

transactional middleware tends to create a tightly coupled B2B application integration

solution, while messaging solutions tend to be more cohesive. In addition, in order to take

advantage of transactional middleware, the source code in target applications has to be

changed.

In truth, TP monitors are first-generation application servers as well as a transactional

middleware product. They provide a mechanism to facilitate the communications

between two or more applications and a location for application logic. Examples of TP

monitors include Tuxedo from BEA Systems, MTS from Microsoft, and CICS from

IBM.

The TP monitor performs two major services. On one side, a TP monitor provides

services that guarantee the integrity of transactions (a transaction service). On the other
side, a TP monitor provides resource management and runtime management services (an

application server). The two services are orthogonal.

TP monitors provide connectors to resources such as databases, other applications, and

queues. These connectors are typically low-level connectors that require some

sophisticated application development in order to communicate with these various

resources. Once connected, these resources are integrated into the transaction and

leveraged as part of the transaction. As a result, they are also able to recover if a failure

occurs.

TP monitors are unequaled when it comes to supporting a high transaction-processing

load and many clients. They take advantage of queued input buffers to protect against

peaks in the workload. If the load increases, the engine is able to press on without a loss

in response time. TP monitors can also use priority scheduling to prioritize messages and

support server threads, thus saving on the overhead of heavyweight processes. Finally,

the load-balancing mechanisms of TP monitors guarantee that no single process takes on

an excessive load.

The fastest-growing segment of the middleware marketplace is defined

by the many new products touting themselves as application servers. What's interesting

about this is that application servers are nothing new (and TP monitors should be

considered application servers because of their many common features). Most application

servers are employed as Web-enabled middleware, processing transactions from Web-

enabled applications. What's more, they employ modern languages such as Java instead

of traditional procedural languages such as C and COBOL (common with TP monitors).

To put it simply, application servers provide not only for the sharing and processing of

application logic, but also for connecting to back-end resources. These resources include
databases, ERP applications, and even traditional mainframe applications. Application

servers also provide user interface development mechanisms. Additionally, they usually

provide mechanisms to deploy the application to the platform of the Web.

Application server vendors are repositioning their products as a technology that solves

B2B application integration problems (some without the benefit of a technology that

works!). Since this is the case, application servers and TP monitors are sure to play a

major role in the B2B application integration domain. Many of these vendors are going

so far as to incorporate features such as messaging, transformation, and intelligent

routing, services that are currently native to message brokers. This area of middleware is

in the throes of a genuine revolution.
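The queued input buffers, priority scheduling and server threads attributed to TP monitors above can be pictured with the toy sketch below. It is an analogy only, written in Java (using a record type for brevity), and does not claim to show how Tuxedo, MTS or CICS are actually built.

    import java.util.concurrent.PriorityBlockingQueue;

    // Toy model of a TP monitor front end: requests queue up and are served
    // by a fixed pool of worker threads in priority order, smoothing load peaks.
    public class ToyTpMonitor {
        record Request(int priority, String payload) {}

        private final PriorityBlockingQueue<Request> inputQueue =
                new PriorityBlockingQueue<>(1024,
                        (a, b) -> Integer.compare(b.priority(), a.priority()));

        public void submit(Request r) {       // called by many clients
            inputQueue.put(r);                // the queued input buffer absorbs bursts
        }

        public void start(int workers) {      // small, fixed set of server threads
            for (int i = 0; i < workers; i++) {
                Thread worker = new Thread(() -> {
                    while (true) {
                        try {
                            Request r = inputQueue.take();   // highest priority first
                            process(r);
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return;
                        }
                    }
                });
                worker.start();
            }
        }

        private void process(Request r) {
            // A real monitor would run the transaction against its managed resources here.
            System.out.println("processed: " + r.payload());
        }
    }
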

2.4.2: Distributed Object Architecture

In Client-Server programming, nothing prevents us from using Structured Modular

programming or shell scripts to implement both client and server application logic. In

DOA, the application logic is organized as objects and distributed over multiple

networked hosts. These objects collaborate over the network to provide the overall

functionality using method invocation as a communication primitive [24]. The invoking object is called the “client object” and the remote object on a different host whose method is being invoked is called the “server object”. Since this invocation happens

over a network, a reference to the remote object has to be obtained by the client object.

Infrastructure software (often referred to as “middleware”) that provides a level of


abstraction over network protocols such as TCP/IP is used to achieve this remote

invocation of a method.

Figure 2.16: Distributed Object Architecture

One thing to be noted is that the distribution of the logic is transparent. The client object

thinks it is calling a local object. The task of actually making the call over the network is

taken over by the infrastructure software. The three most famous frameworks in this

paradigm are DCM (Distributed Component Model), CORBA and DCOM.
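One concrete way to see the client object / server object roles described above is Java RMI, where the middleware generates the stub that performs the network call. The sketch below is a minimal, illustrative example; the Greeter interface, the registry port and the bound name are all invented for the purpose.

    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.rmi.registry.LocateRegistry;
    import java.rmi.registry.Registry;
    import java.rmi.server.UnicastRemoteObject;

    // The remote interface: the only thing the client object needs to know.
    interface Greeter extends Remote {
        String greet(String name) throws RemoteException;
    }

    // Server object: implements the interface and is exported by the middleware.
    class GreeterServer implements Greeter {
        public String greet(String name) { return "Hello, " + name; }

        public static void main(String[] args) throws Exception {
            Greeter stub = (Greeter) UnicastRemoteObject.exportObject(new GreeterServer(), 0);
            Registry registry = LocateRegistry.createRegistry(1099);
            registry.rebind("Greeter", stub);   // publish a reference to the remote object
        }
    }

    // Client object: obtains a reference and invokes the method as if it were local.
    class GreeterClient {
        public static void main(String[] args) throws Exception {
            Registry registry = LocateRegistry.getRegistry("localhost", 1099);
            Greeter greeter = (Greeter) registry.lookup("Greeter");
            System.out.println(greeter.greet("distributed world"));
        }
    }

The client code reads as if the object were local; the registry lookup and the generated stub hide the network, which is exactly the transparency described above.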

Distributed Component Model

Components are the units of processing in DCM [25]. We define a software component

as “a unit of composition with contractually specified interfaces and explicit context

dependencies only. A software component can be deployed independently and is subject


to composition by third parties.” The core principles of Component-Oriented

programming are:

Separation of interface from implementation

In component-based programming, the basic unit in an application is a binary-compatible

interface. An interface defines a set of properties, methods, and events through which

external entities can connect to, and communicate with, the component. According to

Lowy [26], this principle contrasts with the object-oriented view of the world that places

the object rather than its interface at the center. Lowy [26] further says that in

component-based programming, the server is developed independently of the client.
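A minimal sketch of this principle, with invented names: the client is compiled only against the interface, including the event hook through which it connects, while the implementing class behind it can be replaced without the client noticing.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.DoubleConsumer;

    // The published contract: methods plus an event hook through which callers connect.
    interface Thermometer {
        double currentTemperature();
        void onChange(DoubleConsumer listener);
    }

    // Implementation detail, hidden behind the interface; it can be replaced freely.
    class FakeThermometer implements Thermometer {
        private final List<DoubleConsumer> listeners = new ArrayList<>();

        public double currentTemperature() { return 21.5; }

        public void onChange(DoubleConsumer listener) { listeners.add(listener); }
    }

    // The client is compiled against Thermometer only, never against FakeThermometer.
    class Display {
        static void show(Thermometer t) {
            t.onChange(v -> System.out.println("temperature changed: " + v));
            System.out.println("now: " + t.currentTemperature());
        }
    }
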

Location transparency

Location transparency allows components to be distributed onto different machines

without hard coding their location into the client code. This allows the location of the

components to be changed without requiring changes to the client code and

recompilation. Components are usually at a higher level of abstraction than objects and

are explicitly geared towards reuse. Components differ from other types of reusable

software modules in that they can be modified at design time as binary executables. In

contrast, libraries, subroutines, and so on must be modified as source code [27].
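Location transparency can be sketched very simply by resolving a component's location from configuration at runtime rather than hard-coding it, so moving the component to another machine means editing a file rather than changing and recompiling client code. The components.properties file and the property names below are hypothetical.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Properties;

    // Resolves where a named component currently lives; the client never hard-codes it.
    public class ComponentLocator {
        private final Properties locations = new Properties();

        public ComponentLocator(String configFile) throws IOException {
            try (FileInputStream in = new FileInputStream(configFile)) {
                locations.load(in);   // e.g. inventory.host=10.0.0.7 and inventory.port=9001
            }
        }

        public String hostOf(String component) {
            return locations.getProperty(component + ".host");
        }

        public int portOf(String component) {
            return Integer.parseInt(locations.getProperty(component + ".port"));
        }
    }
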

Component standards specify how to build and interconnect software components. They

show how a component must present itself to the outside world, independent of its

internal implementation. Current popular component standards include .NET, Java EE

and CORBA, which provide support for the distributed component model through
Enterprise Services, Enterprise Java Beans (EJB) and CORBA Component Model (CCM)

respectively. Components often exist and operate within containers, which provide a

shared context for interaction with other components. Containers also offer common

access to system-level services for a component’s embedded components (such as process

threads and memory resources). Containers themselves are typically implemented as

components, which can be nested in other containers. An example is embedding widget

field arrays into panels within GUI windows.

Event-based protocols are commonly used to establish the relationship between a

component and its container. Compliant containers all support the same set of interfaces

which means that components can freely migrate between different containers at runtime

without the need for reconfiguration or recompilation. Containers themselves run on application servers, which expose services offered by the underlying middleware systems

such as transactions, security, persistence and notification. Also, server components are

often multithreaded, replicated, and pooled, to achieve scalability and reliability.

Consequently server components cannot readily be organized into static containment

hierarchies.

Common Object Request Broker Architecture

CORBA, an acronym for Common Object Request Broker Architecture, is a suite of specifications being standardized by the

Object Management Group for a distributed object architecture and infrastructure. The

CORBA technology can be used for building applications as a collection of distributed

object components that collaborate over a network. It provides the mechanism for

exposing an object's methods to remote callers (to act as a server) and for discovering
such an exposed server object within the CORBA infrastructure (to invoke it as a client).

CORBA objects can act as servers and clients simultaneously. CORBA uses a platform-

independent interface definition language (IDL) as a common denominator. It is used for

the definition of the calling interfaces and their signatures. An IDL compiler is a tool that

a platform vendor must provide. It compiles the IDL file into platform-specific stub code

and maps the parameter types to platform-specific types. An IDL compiler can generate

both the client stubs and the server skeleton code. The IDL interface definition is

independent of programming language, but maps to all of the popular programming

languages via OMG standards: OMG has standardized mappings from IDL to C, C++,

Java, COBOL, Smalltalk, Ada, Lisp, Python, and IDLscript. Thus, CORBA is language

independent, provided that there is a mapping from the language constructs to the IDL. In

CORBA lingo, an implementation programming language entity that defines the

operations that support a CORBA IDL interface is called a “Servant”. The heart of the

CORBA specification is the Object Request Broker (ORB), a common communication

software bus for objects. An ORB makes it possible for CORBA objects to communicate

with each other by connecting objects making requests (clients) with objects servicing

requests (servers). Interoperability is implemented by ORB to ORB communication. A

CORBA ORB transparently handles object location, object activation, parameter

marshalling, fault recovery, and security. Figure 6 shows the structure of an ORB in

terms of the various interfaces supported by it.
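A hedged sketch of these ideas follows. The IDL fragment defines a hypothetical Greeter interface; the Java client then resolves and invokes it through the ORB and the Naming Service. The Greeter and GreeterHelper classes are not written by hand but would be produced by an IDL compiler such as idlj, and the org.omg classes assume an ORB implementation on the classpath (they shipped with older JDKs and are available separately for newer ones), so this is an outline rather than a self-contained program.

    // IDL (hypothetical), compiled by the vendor's IDL compiler into stubs and skeletons:
    //   interface Greeter {
    //       string say_hello(in string name);
    //   };

    import org.omg.CORBA.ORB;
    import org.omg.CosNaming.NamingContextExt;
    import org.omg.CosNaming.NamingContextExtHelper;

    public class GreeterCorbaClient {
        public static void main(String[] args) throws Exception {
            ORB orb = ORB.init(args, null);   // join the ORB "software bus"

            // Locate the Naming Service and resolve the published server object.
            org.omg.CORBA.Object nameServiceRef = orb.resolve_initial_references("NameService");
            NamingContextExt naming = NamingContextExtHelper.narrow(nameServiceRef);

            // Greeter and GreeterHelper are IDL-compiler-generated; the names are illustrative.
            Greeter greeter = GreeterHelper.narrow(naming.resolve_str("Greeter"));

            // The remote invocation: parameter marshalling is handled by the ORB and the stub.
            System.out.println(greeter.say_hello("CORBA"));
        }
    }
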

Distributed Component Object Model

Distributed Component Object Model (DCOM) is an extension of COM developed by Microsoft in 1996. It allows two objects,

one acting as a client and the other acting as the server object, to communicate regardless
of whether the two objects are on the same or on different machines. This communication

structure is achieved using a proxy object in the client and a stub in the server. When

client and component reside on different machines, DCOM simply replaces the local

inter-process communication with a network protocol. The COM run-time provides

object-oriented services to clients and components and uses DCE-RPC and the security

provider to generate standard network packets that conform to the DCOM wire-protocol

standard. A DCOM object has one or more interfaces that a client accesses via interface

pointers.

It is not possible to directly access an object itself; it is accessible only through its

interfaces. Thus, a DCOM object is completely defined by the interfaces that comprise it.

Each DCOM interface is unique in the system. A Globally Unique Identifier (GUID – a

128 bit integer that guarantees uniqueness in space and time for an interface, an object or

a class) allows them to be uniquely named. A DCOM interface is not modifiable; if a new

function is added or if the semantics of an existing function changes, a new interface is

added and a new GUID is assigned to it.
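By analogy only (Java has no DCOM runtime), the fragment below conveys the spirit of GUID-named, immutable interfaces: new behaviour is published as a new interface with a new identifier rather than by editing the old one. The interface names and UUID values are invented for illustration.

    import java.util.UUID;

    // Published once and never changed afterwards: clients compiled against it keep working.
    interface IAccount {
        UUID IID = UUID.fromString("6f1a2c3d-0000-0000-0000-000000000001"); // illustrative
        double balance();
    }

    // New capability means a new interface with its own identifier; the old one is untouched.
    interface IAccount2 extends IAccount {
        UUID IID = UUID.fromString("6f1a2c3d-0000-0000-0000-000000000002"); // illustrative
        double interestRate();
    }
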

2.4.3: Service-Oriented Architecture

Several definitions exist for what constitutes an SOA. Some take a technical perspective, others a business perspective, and a few define

SOA from an architectural perspective. For example, the W3C (World Wide Web

Consortium) takes a technical perspective and defines SOA as “A set of components

which can be invoked, and whose interface descriptions can be published and
discovered”. This is not very clear as it describes architecture as a technical

implementation and not in the sense the term “architecture” is generally used – to describe

a style or set of practices. A more helpful definition of SOA from an architectural

perspective is provided in MSDN magazine where SOA is defined as “an architecture for

a system or application that is built using a set of services”. A SOA defines application

functionality as a set of shared, reusable services. However, it is not just a system that is

built as a set of services. An application or a system built using SOA could still contain

code that implements functionality specific to that application. On the other hand, all of

the application’s functionality could be made up of services. Some of the other definitions

of SOA include:

“Service-Oriented Architecture is an approach to organizing information

technology in which data, logic, and infrastructure resources are accessed by

routing messages between network interfaces.” [Microsoft 2006b]

“A service-oriented architecture (SOA) is an application framework that

takes everyday business applications and breaks them down into individual

business functions and processes, called services. An SOA lets you build, deploy

and integrate these services independent of applications and the computing

platforms on which they run.” [IBM 2006]


The four tenets of SOA define desirable characteristics of a service

[Microsoft 2004a]:

Service boundaries are explicit.

Services are autonomous.

Services share schema and contract, not types.
Service compatibility is based on policy.

The most fundamental form of SOA consists of three components – a Service Consumer, a Service and a Service directory. These three components interact with each other to achieve automation.
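The interaction of these three components can be sketched with a toy, in-process model in Java; it is not a web-services stack, and the service name and contract below are invented.

    import java.util.HashMap;
    import java.util.Map;

    // A service exposes a contract; the consumer knows only this contract.
    interface QuoteService {
        double quoteFor(String symbol);
    }

    // Service directory: services are published here and discovered by name.
    class ServiceDirectory {
        private final Map<String, Object> registry = new HashMap<>();

        void publish(String name, Object service) { registry.put(name, service); }

        <T> T discover(String name, Class<T> contract) { return contract.cast(registry.get(name)); }
    }

    // Service consumer: discovers the service at runtime and invokes it.
    public class SoaSketch {
        public static void main(String[] args) {
            ServiceDirectory directory = new ServiceDirectory();
            directory.publish("quotes", (QuoteService) symbol -> 42.0);  // a trivial provider

            QuoteService quotes = directory.discover("quotes", QuoteService.class);
            System.out.println("quote: " + quotes.quoteFor("ACME"));
        }
    }

In a real SOA the directory would be an external registry and the contract would be described by schema rather than shared Java types, in line with the tenets above.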

2.4.4: Peer to Peer Architecture

Peer-to-peer (P2P) computing or networking is a distributed application architecture that

partitions tasks or workloads among peers. Peers are equally privileged, equipotent

participants in the application. They are said to form a peer-to-peer network of nodes.

Peers make a portion of their resources, such as processing power, disk storage or

network bandwidth, directly available to other network participants, without the need for

central coordination by servers or stable hosts. Peers are both suppliers and consumers of

resources, in contrast to the traditional client–server model where only servers supply

(send), and clients consume (receive).
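A minimal sketch of the "every peer is both supplier and consumer" idea follows: each node runs a small server thread and can also act as a client towards other peers. The port numbers and the one-line request/response exchange are invented for illustration; real peer-to-peer systems add discovery, routing and many other concerns.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.ServerSocket;
    import java.net.Socket;

    // A peer both serves requests (server role) and sends requests (client role).
    public class Peer {
        private final int port;

        public Peer(int port) { this.port = port; }

        // Server role: share this peer's resources with other participants.
        public void serve() {
            new Thread(() -> {
                try (ServerSocket server = new ServerSocket(port)) {
                    while (true) {
                        try (Socket s = server.accept();
                             BufferedReader in = new BufferedReader(
                                     new InputStreamReader(s.getInputStream()));
                             PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                            out.println("peer@" + port + " answers: " + in.readLine());
                        }
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }).start();
        }

        // Client role: consume a resource offered by another peer.
        public String ask(String host, int peerPort, String request) throws Exception {
            try (Socket s = new Socket(host, peerPort);
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream()))) {
                out.println(request);
                return in.readLine();
            }
        }
    }

Two such peers started on different ports can send requests to each other in either direction, with no central server coordinating them.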


The peer-to-peer application structure was popularized by file sharing systems like

Napster. The concept has inspired new structures and philosophies in many areas of

human interaction. Peer-to-peer networking is not restricted to technology, but covers

also social processes with a peer-to-peer dynamic. In such context, social peer-to-peer

processes are currently emerging throughout society.
