You are on page 1of 54

DISTRIBUTED SYSTEM ARCHITECTURE

If we look back at the history of computers then we realize that the growth has been at a

tremendous pace. Early computers were so large that big rooms were needed to store

them. They were very expensive also and it was not possible for an ordinary person to

use them. That was the era of serial processing systems. Right from that stage now we

have reached to a stage where we can keep the computers in our pockets and even a

layman can use it.

2.1: MILESTONES IN DISTRIBUTED COMPUTING SYSTEMS

The evolution of computer systems can be divided into various phases or generations

[15]. During the early phase of development the changes used to take place after a long

duration of time. New technologies took over a decade to evolve and to be accepted. But

now the changes are very very fast and their acceptance rate is also very high. Everyone

wants to switch to a new technology as soon as it is in the market. Let us have a look at

how we have moved from the era of vacuum tubes to the present day technology.

First Generations (Vacuum Tubes and Plug Boards) - The 1940's

There was no operating system in the earliest electronic digital computers. In that era the

programs were entered in the computer system one bit at a time on mechanical switches

or plug boards. Programming languages and operating systems were unheard of. It was

not possible for a common man to use computer systems. There were specially trained

group of people who used to design and build these systems. The programming,

operation and maintenance of these systems also required special training. For using
these systems the programmers had to reserve them by signing up beforehand. Then at

the designated time they used to come down to the computer room, insert their own

plugboard and execute their program. During this era the systems were very huge which

used to occupy big rooms. Plug boards and mechanical switches were used in building

those systems. One such system is shown in figure 2.1 below.

Figure2.1 (a): Large computers occupying big rooms

Figure2.1 (b): Plug Board with mechanical switches


Second Generation (Transistors and Batch Systems) - The 1950's

During the start of 1950s one job was executed in a system at a time. These were single

user systems as only one person used to execute his/her job at a time. All the resources

were there for that single user only. The computer system of that time did not had hard

disks as we have today. In these systems the tapes and disks had to be loaded before the

execution of the program. This took considerable amount of time which was known as

the job setup time. Hence the computer system was dedicated to one job for more than

the job’s execution time. Similarly after the execution of considerable omelette

considerable “teardown” time was needed which was the time required for removing the

tapes and disk packs. The computer system sat idle during job setup and job teardown.

These were known as single user serial processing systems.

Figure 2.2: Batch Processing System

In a serial processing system the procedure of job setup and teardown was repeated for

each job submitted. The new idea that came up in this era was to group together jobs

which required the same type of execution environment. By doing this and running all
these jobs one after the other the job setup and teardown had to be done only once for the

complete group. This is known as batch processing and it saved pretty good amount of

time.

The programming language used for the control cards was called job control language

(JCL). These single stream batch processing systems became very popular in the early

1960s. General Motors Operating System, Input Output System, FORTRAN Monitor

System and SAGE (Semi-Automatic Ground Environment) are some of the operating

systems which came into existence in 1950s. SAGE was a real time control system which

was designed to monitor weapons systems.

Third Generation (Timesharing and Multiprogramming) - The 1960's

In 1960s the concept of multiprogramming came into existence. In these systems also the

jobs were sent in batches. The advantage here was that multiple programs were loaded

into the memory at the same time. The program during it's course of execution goes

through phases of computation and performing some input output operation.

Figure 2.3: Several jobs in memory in a multiprogramming system


These multiprogramming systems used this to their advantage. While one of the

programs was busy doing some I/O the CPU was given to some other program in the

memory. Hence the CPU was never idle in these systems.

UNIX, VMS, and Windows NT are some of the most popular multiprogramming

operating systems. Another technique called spooling (simultaneous peripheral

operations on line) was again an additional feature in the operating systems of third

generation. There is big difference between the speed of the CPU and the speed of the

peripheral device like a printer. Now if the system has to directly write on the printer then

the speed of this write operation will be as fast as the speed of the printer, which infact is

very slow. Spooling removed this drawback by putting a high speed device like a disk in

between the running program and the low speed peripheral device. So now instead of

writing on the peripheral device the system will write on this high speed disk and will go

back to do some other job. Hence the time of the system is saved. Time sharing systems

were one of the major developments in this area. These systems allowed multiple users to

share computer resources simultaneously. These systems distribute the time among

multiple users where each user gets a very tiny slice of time. The users here are unaware

about this sharing of system resources. During it's allocated time the user utilises all the

resources of the system and then after it's time slice expires the other users get those

resources. The difference between time sharing systems and multiprogramming is very

subtle. In multiprogramming, the computer executes one program until it reaches a

logical stopping point, such as an input/output event, whereas in timesharing systems

every job is allocated a specific small time period only. The first timesharing system was

developed in November 1961 and was called CTSS – Compatible Time-Sharing System.
Fourth Generation (Microprocessor and personal Computer- The 1970s

The 1970s saw several important developments in the field of operating systems. Along

with this there were many technological advancements in the field of data communication

also. Military and university computing started making heavy use of TCP/IP

(Transmission Control Protocol/Internet Protocol) and hence it became widely popular.

Personal computers were also developed in this era only. Microprocessor or the

microchip was the main advancement which led to the development of these personal

computers.. The first IBM PC, IBM 5150 is shown in figure 2.4.

Figure 2.4: First IBM PC, IBM 5150

These microprocessors had thousands of transistors integrated onto small silicon chips

and hence were also known as integrated circuits or ICs. Intel’s 4004 was the first

microprocessor developed in 1971 and it was a 4-bit microprocessor. Other

microprocessors were Intel’s 8008, Motorola 68000, Zilog Z80 and the Mostek 6502. The
first operating system for Intel 8080 was written in 1976 and was known as CP/M

(Control Program for Micros).

Distributed processing and client-server processing - The 1980’s

When more than one computer or processor runs an application in parallel it is known as

distributed processing. There are numerous ways in which this distribution can take

place. Parallel processing is one such example in which there are multiple CPUs in a

single computer which execute programs. In distributed processing execution of a

program occurs on more than one processor in order for it to be completed in a more

efficient manner. Another example of distributed processing is local-area network

(LANs) where single program runs simultaneously at various sites.

Clusters – The 1990’s

When a group of identical or similar computers are connected in the same geographic

location using high speed connections they form a cluster. In cluster computing these

systems operate as a single computer and these computers that form the cluster cannot

work as independent separate systems. A cluster works as one big computer in which a

centralized resource manager manages various resources.

Grid Computing - Late 1990’s

In a Grid also a number of computers are connected using a communication network and

these systems work together on a large problem. The basic difference between a grid and

a cluster is the type of systems that are connected. In a cluster we have similar systems
but in a grid we have got hetrogenous environment, i.e. the systems connected in a grid

are of different types. This hetrogienity is both in terms of hardware as well as software.

This means that the computers that form a grid can run different operating systems and

they can have different hardware also. A Grid can be formed over a Local Area Network,

Metropolitan Area Network or a Wide Area Network which means that they are

inherently distributed in nature Another difference between a cluster and a Grid is that the

nodes in a Grid are autonomous i.e. they have their own resource manager and each node

behaves like an independent entity. As far as the physical location is concerned these

nodes can be geographically far apart and they can be operated independently. Each

computer in a Grid acts as an individual entity with it’s standalone identity.

Figure 2.5: Grid Architecture

Unused resources such as processing time and memory on one computer can be used by a

process running on another computer. This can be achieved with the help of a program
which runs on each node of the distributed environment. We know that the processing

speed of a computer is much larger than the speed of the communication network

between them. The task on a system is broken down into smaller independent parts and

these parts are migrated on various nodes. These nodes process their portions

independently and send back the results to the server.

So in a Grid structure, normally, a server logs onto a bunch of computers (the grid) and

sends them data and a program to run. The program is run on those computers, and when

the results are ready they are send back to the server.

Cloud Computing - Late 1990’s

In cloud computing the resources of computing are provided as a service instead of a

product. This means that the shared resources such as software and information are

provided to computers and other devices as a utility. We can compare this to an

electricity grid where the electric power runs over the wires, which is accessible to

everyone who has the connection and each user pays depending upon their utilization.

Analogous to this in cloud computing the resources can be accessed over the network

typically the Internet [16].

The term cloud computing was first used in this context in 1997 by Ramnath Chellappa

where he defined it as a new computing paradigm where the boundaries of computing

will be determined by economic rationale rather than technical limits alone.


Figure 2.6: Cloud Computing

Salesforce.com introduced the concept of delivering enterprise applications via a simple

website in 1999. This was time when the commercial application of this technology

started to come into the market. After this many more players started to emerge in the

market. Amazon launched its Amazon Web Service in 2002; Google Docs which came in

2006 brought cloud computing to the forefront of public consciousness. Amazon also

introduced its Amazon’s Elastic Compute cloud (EC2) as a commercial web service in

2006. We can also compare this service to a rental agency which provides computing

resources to small companies and individuals which cannot afford to have a full fledged

infrastructure of their own. These companies pay to the service provider in accordance to

the usage.

In the year 2007 corporate giants Google, IBM and a number of universities across the

United States came together in an industry-wide collaboration for this technology. The

concept of private cloud came with Eucalyptus in 2008 which was the first open source
AWS (Amazon Web Services) API compatible platform. This was followed by

OpenNebula which was the first open source software for deploying private and hybrid

clouds.

Microsoft also entered into cloud computing with Windows Azure in November 2009.

By this time most of the big companies were there in cloud computing. The latest entrants

in this technology are Dell, Oracle, Fujitsu, HP, Teradata, and a number of other

household names. Fundamentally the concepts of Grid computing and Cloud are different

but still we can have a cloud cluster within a computational grid and vice-versa.

Cloud Computing versus Grid Computing

The difference between Cloud Computing and Grid Computing lies in the method they

use for determining the tasks within their environments. A single task is broken down

into smaller subtasks in a grid environment and each of these subtasks is distributed

among different computing machines. Once these smaller tasks are completed they are

sent back to the primary machine. The primary machine combines the results obtained

from various nodes and gives one single result.

On the other hand the main focus of a cloud computing architecture is to enable users to

use difference services without investing in the underlying architecture. Although, in a

grid also similar facility for computing power is offered, but cloud computing goes

beyond that. With a cloud various services such as web hosting etc. are also provided to

the users.
The main feature of cloud is that it offers infrastructure as a service (IaaS), software as a

service (SaaS) and platform as a service (PaaS) as well as Web 2.0. The cost and

complexity of buying, configuring, and managing the hardware as well as software

needed to build and deploy applications is eliminated here. Instead these applications are

delivered as a service over the Internet (the cloud).

2.2: MODELS OF COMPUTATION

As we have seen in the previous section as technology developed we moved from

centralised computing model to distributed computing model. Let us discuss these two

models in detail.

2.2.1: Centralized System model

All computing is controlled through a central terminal server(s), which centrally provides

the processing, programs and storage. The workstations (ThinClients, PCs, appliances)

are just used for input and display purposes. They connect to the server(s) where all tasks

are performed. All server resources are purchased once and shared by all users. Security

issues are far easier to coordinate and centrally nail down. Thus Centralized Computing

takes some of the control and all of the parts easily susceptible to failure away from the

desktop appliance. All computing power, processing, program installations, back-ups and

file structures are done on the Terminal or Application Server.

CC Advantages

• Centralized Computing and file storage.

• Redundant technologies incorporated to ensure reduced downtime.


• Computer stations replaced with ThinClient appliances with no moving parts,

improving meantime before failure.

• Centralized management of all users, processes, applications, back-ups and

securities.

• Usually has lower cost of ownership, when measured over 3 + years.

CC Disadvantages

• User access to soft media drives are removed.

• In the rare event of a network failure, the ThinClient Terminal may lose access

to the terminal server. If this happens, there are still means to use some resources

from the local client

Traditionally, this type of computing was only found in Enterprise Level Businesses. In

more recent time, reduced server and network costs have seen this type of computing

deployed in many smaller and medium sized businesses.

2.2.2: Distributed System Model

Every user has their own PC (desktop or laptop) for processing, programs and storage.

Storage is often mixed over the network between the local PC, shared PCs, or a dedicated

file server. Each PC requires the purchase of its own resources (operating system,

programs, etc.). This is also known as Peer-to-Peer (P2P) model. This environment is an

ad-hoc network that is generally grown from a small group of independent computers that

need to share files, resources such as printers and network/internet connections. These

have allowed small business to improve some forms of productivity. If all is to run
smoothly, this model usually needs internal technical skills, or access to outsourced

technical support.

DC Advantages

• Each user has control of their own equipment, to a reasonable degree.

• Each user can add their own programs at their own leisure.

• Sometimes cheaper up front capital cost.

DC Disadvantages

• Typical lifespan of 3 years (maybe stretch to 5 with questionable results).

• Many moving parts (fans, hard drives) which are susceptible to failure.

• Larger vulnerability to security threats (both internal & external).

• Usually has higher cost of ownership, when measured over 3 + years.

This is the more widely used computing configuration, because it has grown out of what

most users and many IT people were used to, within the comfort zone of their home PCs.

As a result, there has been extensive development of many business practices, systems

and security products to help the distributed system fully function in a business

environment.

2.3: CLASSIFICATION BASED ON DESIGN

The processes running on the CPU’s of the different nodes are interconnected with some

sort of communication system. Various models are used for building distributed

computing systems. These are broadly classified as:


2.3.1: Minicomputer Model

The minicomputer model is a simple extension of the centralized time-sharing system. As

shown in Figure 2.7, a distributed computing system based on this model consists of a

few minicomputers (they may be large supercomputers as well) interconnected by a

communication network. Each minicomputer usually has multiple users simultaneously

logged on to it. For this, several interactive terminals are connected to each

minicomputer.

Figure2.7 A distributed computing system based on the minicomputer model.

Each user is logged on to one specific minicomputer, with remote access to other

minicomputers. The network allows a user to access remote resources that are available

on some machine other than the one on to which the user is currently logged. The

minicomputer model may be used when resource sharing (such as sharing of information

databases of different types, with each type of database located on a different machine)
with remote users is desired. The early ARPAnet is an example of a distributed

computing system based on the minicomputer model.

2.3.2: Workstation Model

The workstation model is straightforward: the system consists of workstations (high-end

personal computers) scattered throughout a building or campus and connected by a high-

speed LAN, as shown in figure 2.8. Some of the workstations may be in offices, and thus

implicitly dedicated to a single user, whereas others may be in public areas and have

several different users during the course of a day. In both cases, at any instant of time, a

workstation either has a single user logged into it, and thus has an "owner" (however

temporary), or it is idle.

Figure 2.8. A network of personal workstations, each with a local file system.

In some systems the workstations have local disks and in others they do not. The latter

are universally called diskless workstations, but the former are variously known as

diskful workstations, or disky workstations, or even stranger names. If the workstations

are diskless, the file system must be implemented by one or more remote file servers.

Requests to read and write files are sent to a file server, which performs the work and

sends back the replies.


Diskless workstations are popular at universities and companies for several reasons, not

the least of which is price. Having a large number of workstations equipped with small,

slow disks is typically much more expensive than having one or two file servers equipped

with huge, fast disks and accessed over the LAN.

A second reason that diskless workstations are popular is their ease of maintenance.

When a new release of some program, say a compiler, comes out, the system

administrators can easily install it on a small number of file servers in the machine room.

Installing it on dozens or hundreds of machines all over a building or campus is another

matter entirely. Backup and hardware maintenance is also simpler with one centrally

located 5-gigabyte disk than with fifty 100-megabyte disks scattered over the building.

Another point against disks is that they have fans and make noise. Many people find this

noise objectionable and do not want it in their office. Finally, diskless workstations

provide symmetry and flexibility. A user can walk up to any workstation in the system

and log in. Since all his files are on the file server, one diskless workstation is as good as

another. In contrast, when all the files are stored on local disks, using someone else's

workstation means that you have easy access to his files, but getting to your own requires

extra effort, and is certainly different from using your own workstation.

When the workstations have private disks, these disks can be used in one of at least four

ways:

1. Paging and temporary files.

2. Paging, temporary files, and system binaries.

3. Paging, temporary files, system binaries, and file caching.


4. Complete local file system.

The first design is based on the observation that while it may be convenient to keep all

the user files on the central file servers (to simplify backup and maintenance, etc.) disks

are also needed for paging (or swapping) and for temporary files. In this model, the local

disks are used only for paging and files that are temporary, unshared, and can be

discarded at the end of the login session. For example, most compilers consist of multiple

passes, each of which creates a temporary file read by the next pass. When the file has

been read once, it is discarded. Local disks are ideal for storing such files.

The second model is a variant of the first one in which the local disks also hold the binary

(executable) programs, such as the compilers, text editors, and electronic mail handlers.

When one of these programs is invoked, it is fetched from the local disk instead of from a

file server, further reducing the network load. Since these programs rarely change, they

can be installed on all the local disks and kept there for long periods of time. When a new

release of some system program is available, it is essentially broadcast to all machines.

However, if hat machine happens to be down when the program is sent, it will miss the

program and continue to run the old version. Thus some administration is needed to keep

track of who has which version of which program.

A third approach to using local disks is to use them as explicit caches (in addition to

using them for paging, temporaries, and binaries). In this mode of operation, users can

download files from the file servers to their own disks, read and write them locally, and

then upload the modified ones at the end of the login session. The goal of this

architecture is to keep long-term storage centralized, but reduce network load by keeping
files local while they are being used. A disadvantage is keeping the caches consistent.

What happens if two users download the same file and then each modifies it in different

ways? This problem is not easy to solve, and we will discuss it in detail later in the book.

Fourth, each machine can have its own self-contained file system, with the possibility of

mounting or otherwise accessing other machines' file systems. The idea here is that each

machine is basically self-contained and that contact with the outside world is limited.

This organization provides a uniform and guaranteed response time for the user and puts

little load on the network. The disadvantage is that sharing is more difficult, and the

resulting system is much closer to a network operating system than to a true transparent

distributed operating system.

The advantages of the workstation model are manifold and clear. The model is certainly

easy to understand. Users have a fixed amount of dedicated computing power, and thus

guaranteed response time. Sophisticated graphics programs can be very fast, since they

can have direct access to the screen. Each user has a large degree of autonomy and can

allocate his workstation's resources as he sees fit. Local disks add to this independence,

and make it possible to continue working to a lesser or greater degree even in the face of

file server crashes.

However, the model also has two problems. First, as processor chips continue to get

cheaper, it will soon become economically feasible to give each user first 10 and later

100 CPUs. Having 100 workstations in your office makes it hard to see out the window.

Second, much of the time users are not using their workstations, which are idle, while

other users may need extra computing capacity and cannot get it. From a system-wide
perspective, allocating resources in such a way that some users have resources they do

not need while other users need these resources badly is inefficient.

Using Idle Workstations

The second problem, idle workstations, has been the subject of considerable research,

primarily because many universities have a substantial number of personal workstations,

some of which are idle (an idle workstation is the devil's playground?). Measurements

show that even at peak periods in the middle of the day, often as many as 30 percent of

the workstations are idle at any given moment. In the evening, even more are idle. A

variety of schemes have been proposed for using idle or otherwise underutilized

workstations [17, 18]. The earliest attempt to allow idle workstations to be utilized was

the

rsh program

that comes with Berkeley UNIX. This program is called by

rsh machine command

in which the first argument names a machine and the second names a command to run on

it. What rsh does is run the specified command on the specified machine. Although

widely used, this program has several serious flaws. First, the user must tell which

machine to use, putting the full burden of keeping track of idle machines on the user.

Second, the program executes in the environment of the remote machine, which is

usually different from the local environment. Finally, if someone should log into an idle

machine on which a remote process is running, the process continues to run and the

newly logged-in user either has to accept the lower performance or find another machine.
The research on idle workstations has centered on solving these problems. The key issues

are:

• How is an idle workstation found?

• How can a remote process be run transparently?

• What happens if the machine's owner comes back?

Let us consider these three issues, one at a time.

How is an idle workstation found?

To start with, what is an idle workstation? At first glance, it might appear that a

workstation with no one logged in at the console is an idle workstation, but with modern

computer systems things are not always that simple. In many systems, even with no one

logged in there may be dozens of processes running, such as clock daemons, mail

daemons, news daemons, and all manner of other daemons. On the other hand, a user

who logs in when arriving at his desk in the morning, but otherwise does not touch the

computer for hours, hardly puts any additional load on it. Different systems make

different decisions as to what "idle" means, but typically, if no one has touched the

keyboard or mouse for several minutes and no user-initiated processes are running, the

workstation can be said to be idle. Consequently, there may be substantial differences in

load between one idle workstation and another, due, for example, to the volume of mail

coming into the first one but not the second.

The algorithms used to locate idle workstations can be divided into two categories: server

driven and client driven. In the former, when a workstation goes idle, and thus becomes a

potential compute server, it announces its availability. It can do this by entering its name,
network address, and properties in a registry file (or data base), for example. Later, when

a user wants to execute a command on an idle workstation, he types something like

remote command

and the remote program looks in the registry to find a suitable idle workstation. For

reliability reasons, it is also possible to have multiple copies of the registry.

An alternative way for the newly idle workstation to announce the fact that it has become

unemployed is to put a broadcast message onto the network. All other workstations then

record this fact. In effect, each machine maintains its own private copy of the registry.

The advantage of doing it this way is less overhead in finding an idle workstation and

greater redundancy. The disadvantage is requiring all machines to do the work of

maintaining the registry.

Figure 2.9: A registry-based algorithm for finding and using idle workstations.

Whether there is one registry or many, there is a potential danger of occurring of race

conditions. If two users invoke the remote command simultaneously, and both of them
discover that the same machine is idle, they may both try to start up processes there at the

same time. To detect and avoid this situation, the remote program can check with the idle

workstation, which, if still free, removes itself from the registry and gives the go-ahead

sign. At this point, the caller can send over its environment and start the remote process,

as shown in figure 2.9 above.

The other way to locate idle workstations is to use a client-driven approach. When remote

is invoked, it broadcasts a request saying what program it wants to run, how much

memory it needs, whether or not floating point is needed, and so on. These details are not

needed if all the workstations are identical, but if the system is heterogeneous and not

every program can run on every workstation, they are essential. When the replies come

back, remote picks one and sets it up. One nice twist is to have "idle" workstations delay

their responses slightly, with the delay being proportional to the current load. In this way,

the reply from the least heavily loaded machine will come back first and be selected.

How can a remote process be run transparently?

Finding a workstation is only the first step. Now the process has to be run there. Moving

the code is easy. The trick is to set up the remote process so that it sees the same

environment it would have locally, on the home workstation, and thus carries out the

same computation it would have locally.

To start with, it needs the same view of the file system, the same working directory, and

the same environment variables (shell variables), if any. After these have been set up, the

program can begin running. The trouble starts when the first system call, say a READ, is

executed. What should the kernel do? The answer depends very much on the system
architecture. If the system is diskless, with all the files located on file servers, the kernel

can just send the request to the appropriate file server, the same way the home machine

would have done had the process been running there. On the other hand, if the system has

local disks, each with a complete file system, the request has to be forwarded back to the

home machine for execution.

Some system calls must be forwarded back to the home machine no matter what, even if

all the machines are diskless. For example, reads from the keyboard and writes to the

screen can never be carried out on the remote machine. However, other system calls must

be done remotely under all conditions. For example, the UNIX system calls SBRK

(adjust the size of the data segment), NICE (set CPU scheduling priority), and PROFIL

(enable profiling of the program counter) cannot be executed on the home machine. In

addition, all system calls that query the state of the machine have to be done on the

machine on which the process is actually running. These include asking for the machine's

name and network address, asking how much free memory it has, and so on.

System calls involving time are a problem because the clocks on different machines may

not be synchronized. Using the time on the remote machine may cause programs that

depend on time, like make, to give incorrect results. Forwarding all time-related calls

back to the home machine, however, introduces delay, which also causes problems with

time.

To complicate matters further, certain special cases of calls which normally might have to

be forwarded back, such as creating and writing to a temporary file, can be done much

more efficiently on the remote machine. In addition, mouse tracking and signal
propagation have to be thought out carefully as well. Programs that write directly to

hardware devices, such as the screen's frame buffer, diskette, or magnetic tape, cannot be

run remotely at all. All in all, making programs run on remote machines as though they

were running on their home machines is possible, but it is a complex and tricky business.

What happens if the machine's owner comes back?

The final question on our original list is what to do if the machine's owner comes back

(i.e., somebody logs in or a previously inactive user touches the keyboard or mouse). The

easiest thing is to do nothing, but this tends to defeat the idea of "personal" workstations.

If other people can run programs on your workstation at the same time that you are trying

to use it, there goes your guaranteed response.

Another possibility is to kill off the intruding process. The simplest way is to do this

abruptly and without warning. The disadvantage of this strategy is that all work will be

lost and the file system may be left in a chaotic state. A better way is to give the process

fair warning, by sending it a signal to allow it to detect impending doom, and shut down

gracefully (write edit buffers to the disk, close files, and so on). If it has not exited within

a few seconds, it is then terminated. Of course, the program must be written to expect and

handle this signal, something most existing programs definitely are not.

A completely different approach is to migrate the process to another machine, either back

to the home machine or to yet another idle workstation. Migration is rarely done in

practice because the actual mechanism is complicated. The hard part is not moving the

user code and data, but finding and gathering up all the kernel data structures relating to

the process that is leaving. For example, it may have open files, running timers, queued
incoming messages, and other bits and pieces of information scattered around the kernel.

These must all be carefully removed from the source machine and successfully reinstalled

on the destination machine. There are no theoretical problems here, but the practical

engineering difficulties are substantial. Further literature about this can be found in [19,

20].

In both cases, when the process is gone, it should leave the machine in the same state in

which it found it, to avoid disturbing the owner. Among other items, this requirement

means that not only must the process go, but also all its children and their children. In

addition, mailboxes, network connections, and other system-wide data structures must be

deleted, and some provision must be made to ignore RPC replies and other messages that

arrive for the process after it is gone. If there is a local disk, temporary files must be

deleted, and if possible, any files that had to be removed from its cache restored.

The Sprite system [21] and an experimental system developed at Xerox PARC [22] are

two examples of distributed computing systems based on the workstation model.

2.2.3: Workstation Server Model

The workstation model is a network of personal workstations, each with its own disk and

a local file system. A workstation with its own local disk is usually called a diskful

workstation and a workstation without a local disk is called a diskless workstation. With

the proliferation of high-speed networks, diskless workstations have become more

popular in network environments than diskful workstations, making the workstation-

server model more popular than the workstation model for building distributed

computing systems.
Figure 2.10: A distributed computing system based on the workstation-server

model.

As shown in figure 2.10, a distributed computing system based on the workstation server

model consists of a few minicomputers and several workstations (most of which are

diskless, but a few of which may be diskful) interconnected by a communication

network.

Note that when diskless workstations are used on a network, the file system to be used by

these workstations must be implemented either by a diskful workstation or by a

minicomputer equipped with a disk for file storage. The minicomputers are used for this

purpose. One or more of the minicomputers are used for implementing the file system.

Other minicomputers may be used for providing other types of services, such as database

service and print service. Therefore, each minicomputer is used as a server machine to

provide one or more types of services. Hence in the workstation-server model, in addition

to the workstations, there are specialized machines (may be specialized workstations) for

running server processes (called servers) for managing and providing access to shared

resources.
For a number of reasons, such as higher reliability and better scalability, multiple servers

are often used for managing the resources of a particular type in a distributed computing

system. For example, there may be multiple file servers, each running on a separate

minicomputer and cooperating via the network, for managing the files of all the users in

the system. Due to this reason, a distinction is often made between the services that are

provided to clients and the servers that provide them. That is, a service is an abstract

entity that is provided by one or more servers. For example, one or more file servers may

be used in a distributed computing system to provide file service to the users. In this

model, a user logs onto a workstation called his or her home workstation. Normal

computation activities required by the user's processes are performed at the user's home

workstation, but requests for services provided by special servers (such as a file server or

a database server) are sent to a server providing that type of service that performs the

user's requested activity and returns the result of request processing to the user's

workstation. Therefore, in this model, the user's processes need not be migrated to the

server machines for getting the work done by those machines. For better overall system

performance, the local disk of a diskful workstation is normally used for such purposes as

storage of temporary files, storage of unshared files, storage of shared files that are rarely

changed, paging activity in virtual-memory management, and caching of remotely

accessed data.

As compared to the workstation model, the workstation-server model has several

advantages:
1. In general, it is much cheaper to use a few minicomputers equipped with large, fast

disks that are accessed over the network than a large number of diskful workstations,

with each workstation having a small, slow disk.

2. Diskless workstations are also preferred to diskful workstations from a system

maintenance point of view. Backup and hardware maintenance are easier to perform with

a few large disks than with many small disks scattered all over a building or campus.

Furthermore, installing new releases of software (such as a file server with new

functionalities) is easier when the software is to be installed on a few file server machines

than on every workstation.

3. In the workstation-server model, since all files are managed by the file servers, users

have the flexibility to use any workstation and access the files in the same manner

irrespective of which workstation the user is currently logged on. Note that this is not true

with the workstation model, in which each workstation has its local file system, because

different mechanisms are needed to access local and remote files.

4. In the workstation-server model, the request-response protocol described above is

mainly used to access the services of the server machines. Therefore, unlike the

workstation model, this model does not need a process migration facility, which is

difficult to implement.

The request-response protocol is known as the client-server model of communication. In

this model, a client process (which in this case resides on a workstation) sends a request

to a server process (which in this case resides on a minicomputer) for getting some

service such as reading a block of a file. The server executes the request and sends back a
reply to the client that contains the result of request processing. The client-server model

provides an effective general-purpose approach to the sharing of information and

resources in distributed computing systems. It is not only meant for use with the

workstation-server model but also can be implemented in a variety of hardware and

software environments. The computers used to run the client and server processes need

not necessarily be workstations and minicomputers. They can be of many types and there

is no need to distinguish between them. It is even possible for both the client and server

processes to be run on the same computer. Moreover, some processes are both client and

server processes. That is, a server process may use the services of another server,

appearing as a client to the latter.

5. A user has guaranteed response time because workstations are not used for executing

remote processes. However, the model does not utilize the processing capability of idle

workstations.

The V-System [23] is an example of a distributed computing system that is based on the

workstation-server model.

2.3.4: The Processor Pool Model

Although using idle workstations adds a little computing power to the system, it does not

address a more fundamental issue: What happens when it is feasible to provide 10 or 100

times as many CPUs as there are active users? One solution, as we saw, is to give

everyone a personal multiprocessor. However this is a somewhat inefficient design.


An alternative approach is to construct a processor pool, a rack full of CPUs in the

machine room, which can be dynamically allocated to users on demand. The processor

pool approach is illustrated in figure 2.11. Instead of giving users personal workstations,

in this model they are given high-performance graphics terminals, such as X terminals

(although small workstations can also be used as terminals). This idea is based on the

observation that what many users really want is a high-quality graphical interface and

good performance. Conceptually, it is much closer to traditional timesharing than to the

personal computer model, although it is built with modern technology (low-cost

microprocessors).

Figure 2.11: A system based on the processor pool model.

The motivation for the processor pool idea comes from taking the diskless workstation

idea a step further. If the file system can be centralized in a small number of file servers

to gain economies of scale, it should be possible to do the same thing for compute

servers. By putting all the CPUs in a big rack in the machine room, power supply and

other packaging costs can be reduced, giving more computing power for a given amount

of money. Furthermore, it permits the use of cheaper X terminals (or even ordinary

ASCII terminals), and decouples the number of users from the number of workstations.
The model also allows for easy incremental growth. If the computing load increases by

10 percent, you can just buy 10 percent more processors and put them in the pool.

In effect, we are converting all the computing power into "idle workstations" that can be

accessed dynamically. Users can be assigned as many CPUs as they need for short

periods, after which they are returned to the pool so that other users can have them. There

is no concept of ownership here: all the processors belong equally to everyone.

So far we have tacitly assumed that a pool of n processors is effectively the same thing as

a single processor that is n times as fast as a single processor. In reality, this assumption

is justified only if all requests can be split up in such a way as to allow them to run on all

the processors in parallel. If a job can be split into, say, only 5 parts, then the processor

pool model has an effective service time only 5 times better than that of a single

processor, not n times better.

Still, the processor pool model is a much cleaner way of getting extra computing power

than looking around for idle workstations and sneaking over there while nobody is

looking. By starting out with the assumption that no processor belongs to anyone, we get

a design based on the concept of requesting machines from the pool, using them, and

putting them back when done. There is also no need to forward anything back to a

"home" machine because there are none.

There is also no danger of the owner coming back, because there are no owners. In the

end, it all comes down to the nature of the workload. If all people are doing is simple

editing and occasionally sending an electronic mail message or two, having a personal

workstation is probably enough. If, on the other hand, the users are engaged in a large
software development project, frequently running make on large directories, or are trying

to invert massive sparse matrices, or do major simulations or run big artificial intelligence

or VLSI routing programs, constantly hunting for substantial numbers of idle

workstations will be no fun at all. In all these situations, the processor pool idea is

fundamentally much simpler and more attractive.

2.3.5: A Hybrid Model

A possible compromise is to provide each user with a personal workstation and to have a

processor pool in addition. Although this solution is more expensive than either a pure

workstation model or a pure processor pool model, it combines the advantages of both.

Interactive work can be done on workstations, giving guaranteed response. Idle

workstations, however, are not utilized, making for a simpler system design. They are

just left unused. Instead, all non interactive processes run on the processor pool, as does

all heavy computing in general. This model provides fast interactive response, an

efficient use of resources, and a simple design.

2.4: CLASSIFICATION BASED ON ARCHITECTURE

A network-centric system is an interconnection of hardware, software, and humans that

operate together over a network (e.g., Internet, virtual private network, local area

network, intranet) to accomplish a set of goals. The model simplifies and abstracts the

functions of the individual components of a distributed system and then it considers the

placement of the components across a network of computers. It also gives the

interrelationships between the various components. Based on how the responsibilities are
distributed between system components and how these components are placed we can

place the distributed systems into the following four categories:

a) Client-server Architecture

b) Distributed Object Architecture

c) Service Oriented Architecture

d) Peer to Peer Architecture

2.4.1 Client-server Architecture

The client/server model is a computing model that acts as distributed application which

partitions tasks or workloads between the providers of a resource or service, called

servers, and service requesters, called clients.

Figure 2.12: A Typical Client Server Environment

Often clients and servers communicate over a computer network on separate hardware,

but both client and server may reside in the same system. A server machine is a host that

is running one or more server programs which share their resources with clients. A client
does not share any of its resources, but requests a server's content or service function.

Clients therefore initiate communication sessions with servers which await incoming

requests.

The client/server characteristic describes the relationship of cooperating programs in an

application. The server component provides a function or service to one or many clients,

which initiate requests for such services. With the development of large scale information

systems which are very complex in nature, the client server model has become very

popular. These systems have evolved at a tremendous rate over the past two decades for

application design and deployment.

The C/S computing architecture is the backbone of technologies like groupware and

workflow systems. The Client Server technology is having huge impact on the

development work in the field of computer systems and information technology.

Traditionally we associate the term Client/Server computing to a connection of a desktop

personal computer to an SQL database server over a communication network. This is

perhaps a logical model where the nodes are divided into clients and servers.

One-Tier ~ Monolithic (C/S) Architectures

This architecture has been in existence since the very beginning of the Information

Technology (IT) industry. Simple form of Client/Server computing exists in the

mainframes also where the mainframe acts as a server and an unintelligent terminal acts

as a client. This is an example of a one-tier C/S system.


Two-Tier Client/Server Architectures

When the client communicates directly with a database server, it is known as two tier

architecture. The business or the application logic can reside either on the client or on the

database server in the form of stored procedures.

This model started to emerge in the late eighties and early nineties in the applications

which were developed for Local Area Network. These applications were based on simple

file sharing techniques which were implemented by X-base style products such as

Paradox, dBase, Clipper, FoxPro, etc.

Fat Clients

In the beginning the Client Server systems were such that most of the processing

occurred on the client node itself. In this case the host was a non-mainframe system as a

network file server. Since most of the processing took place on the client node so it was

known as a “fat client”. This configuration is shown in figure 2.13 but the disadvantage of

this model was that it was not able to handle large or even mid-size information systems

(greater than 50 or so connected clients).

For desktop computing the Graphical User Interface (GUI) became the most popular

environment. New horizons started to appear for the two- tier architecture. Specialized

database servers started to replace the general purpose LAN file server. New

development tools such as PowerBuilder, Visual Basic, Delphi etc. started to emerge.
Figure 2.13: Fat Client Model

In this new scheme datasets of information used to be send to the client using Structured

Query Language (SQL) techniques. Most of the processing was still carried out on the

"fat" clients.

If we want to carry out most of the processing on the client the hardware of the client

must be very powerful to support it. That is we need a fatter client. If the client

technology is not that advanced then this is not feasible and hence the application cannot

be afforded. For fat clients the amount of bandwidth required is also quite large and

hence the number of users using the network is reduced.

Thin Client

As opposed to fat client where most of the processing was carried out on the client we

have thin client model which is shown in figure 2.14. In this model the procedures stored

at the database server are invoked by the user as and when required.

The performance is increased in case of the thin client model because the network

bandwidth required here is lesser than the fat client model.


Figure 2.14: Thin Client Model

However the drawback of this model is that it relies heavily on stored procedures which

are much customized and vendor specific. These stored procedures are very closely

linked to the database and hence they have to be changed whenever the business logic

changes. If the database is very large it becomes very difficult to make these changes and

maintain different versions of it.

Remote database transport protocol has to be used for such cases. One such example is

using SQL-Net to carry out the transactions. The Client/Server interaction has to be

mediated through 'heavy' network process in such situations. Due to this the network

transaction size is reduced and query transaction speed is slowed

Either we used thin client or fat client model the two-tier (C/S) systems were not able to

handle distributed systems larger than those comprising of 100 users. They were also not

suitable for mission critical applications.

Three-Tier Client/Server Architectures

In order to cope up with the limitations of the two tier architecture a new model was

developed in which a middle tier was added to achieve '3-tier' architecture.


In this environment the presentation logic is implemented at the client side which is

actually a thin client. The application server(s) holds the business logic and the database

server(s) holds the data.

Figure 2.15: A 3- Tier Client Server Architecture

The three component layers of a multi-tier architecture are:

Front-end component: This component provides a presentation logic

which is portable in nature.

Back-end component: This component is responsible for providing access

to dedicated services, such as a database server.

Middle-tier component: This component is responsible for sharing and

controlling business logic by isolating it from the actual application.

Multi-Tier Client/Server architectures have got many more advantages which include:
The applications can be modified easily to adapt to the changing user

requirement. This is possible because the user interface is independent of the

application logic.

In this architecture only the data which is required to handle a task is

transferred to the client by the application layer. Hence network bottlenecks are

minimized.

Because the server holds the business logic so whenever there is any change

in the business logic we just have to update the server. No changes are needed at

the client as in the case of two-tier architecture.

The client has no information about the database and network operations. It

can access data easily and quickly without having any knowledge about the

location of the data and the number of servers in the system.

Several users can share the database connections by pooling. The cost of

per user licensing can be reduced by using this approach.

Standard SQL is used for writing the data layer. Because standard SQL is

platform independent so the organization has database independence and it is not

limited because of vendor-specific stored procedures.

Standard third or fourth generation languages, such as Java, C or COBOL

can be used for writing the application layer. So the programming can be easily

handled by the organization's programmers who are well versed in these

languages.
Fat Middle

In a multi-tier architecture one or more middle tier components are added as compared to

the traditional client/server architecture. Standard protocols such as RPC or HTTP are

used for interaction between the client system and the middle-tier and standard protocols

for database connectivity such as SQL, ODBC and JDBC are used for interaction

between the middle-tier and the backend server

Most of the application logic is present at the middle-tier. The client calls here are

translated into database queries and other actions and the data from the database is

translated into client data in return. Scalability is easier to achieve here because the

business logic is present on the application server. This also provides for easier handling

of rapidly changing business needs. Apart from these advantages, it also allows a more open choice of database vendors.
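As a hedged illustration of this translation role, the sketch below shows a middle-tier method that turns a simple client call into a SQL query and returns only the data the client needs. The customers table, its column names and the JDBC URL are assumptions made for the example; a real application server would add pooling, transactions and security around such code.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Middle-tier service: the client never sees SQL, JDBC or the database location.
    public class CustomerService {
        private final String jdbcUrl; // e.g. "jdbc:postgresql://dbhost/sales" (hypothetical)

        public CustomerService(String jdbcUrl) {
            this.jdbcUrl = jdbcUrl;
        }

        // A client call such as getCustomerName(42) is translated into a SQL query here.
        public String getCustomerName(int customerId) throws SQLException {
            String sql = "SELECT name FROM customers WHERE id = ?";
            try (Connection con = DriverManager.getConnection(jdbcUrl);
                 PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setInt(1, customerId);
                try (ResultSet rs = ps.executeQuery()) {
                    // Only the data needed by the client is sent back, not the whole result set.
                    return rs.next() ? rs.getString("name") : null;
                }
            }
        }
    }
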

When the middle tier is capable of providing connections to different types of services,

and can integrate and couple them to the client and to each other, we get an N-tier architecture.

N-Tier Architectures (thin all over)

With developments in the field of distributed architectures we got models in which, in a multi-tier environment, a client-side computer works as a client as well as a server. In these client-server systems the processes at the client end are smaller but more specialized, so they can be developed faster and maintained more easily. Similarly, small specialized processes were developed on the server side.

The industry nowadays is embracing the N-tier architecture at a very rapid pace, and most new applications are written using this technology. This does not mean, however, that the two-tier and three-tier models are completely out of use. Depending on the type of application, the size of the distributed environment and the type of data access, the two- or three-tier models are still used, for example in departmental applications.

Benefits of N-Tier Architecture

Some of the advantages of using N tier architecture are discussed below:

Suppose a startup company begins by running all tiers on a single machine.

When the traffic increases due to growing business needs each tier can be expanded

and moved to its own machine and then clustered. This is one example of how N-Tier

Architecture improves scalability and supports cost-efficient application building.

Applications can be made more readable and reusable by using the N-tier model. It is easier to port EJBs and custom tag libraries into readable applications built on well-maintained templates. Developer productivity increases and application maintainability improves due to reusability, which is an important feature in web applications.

As there is no single point of failure, applications developed using N-tier architectures are more robust. The various tiers are independent of each other. For example, if a business changes database vendors, it just has to replace the data tier and adjust the integration tier to any changes that affect it; the business-logic tier and the presentation tier remain unchanged (a sketch of this idea is given after this list). Likewise, if the presentation layer changes, this will not affect the integration or data layers. In a single-tier (monolithic) application, by contrast, all the layers exist in one unit and affect each other, and a developer would have to pick through the entire application code to implement any change. Again, well-designed modules allow applications, or pieces of applications, to be customized and reused across modules or even projects. Reusability is particularly important in web applications.

Finally, N-Tier Architecture helps developers build web applications because it

allows developers to apply their specific skill to that part of the program that best

suits their skill set. Graphic artists can focus on the presentation tier, while

administrators can focus on the database tier.
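The database-vendor change mentioned in the list above can be sketched with a data-access interface: the tiers above the data tier depend only on the contract, so swapping vendors means supplying a new implementation. The interface, class and table names below are hypothetical.

    // The business tier programs against this contract only.
    public interface OrderRepository {
        double totalFor(int orderId);
    }

    // Data tier, vendor A (hypothetical Oracle-backed implementation).
    class OracleOrderRepository implements OrderRepository {
        public double totalFor(int orderId) {
            // ... JDBC code using Oracle-specific SQL would go here ...
            return 0.0;
        }
    }

    // Data tier, vendor B; swapping this in requires no change above the data tier.
    class PostgresOrderRepository implements OrderRepository {
        public double totalFor(int orderId) {
            // ... JDBC code using PostgreSQL would go here ...
            return 0.0;
        }
    }

    // Business tier: unchanged no matter which repository is plugged in.
    class BillingService {
        private final OrderRepository orders;
        BillingService(OrderRepository orders) { this.orders = orders; }
        double invoiceTotal(int orderId) { return orders.totalFor(orderId); }
    }

The same isolation argument applies between the presentation, business and integration tiers.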

It is expected that the applications using N tier model will grow almost four-fold over the

next two years.

What kind of systems can benefit?

An N-tier architecture can be used to implement any Client/Server system where the application logic is broken down among various servers. This partitioning of the application creates an integrated information infrastructure which allows secure, consistent and global access to critical data. The N-tier architecture also reduces network traffic, which in turn results in greater reliability, faster network communication and better overall performance.


This model provides for centralized common services in a distributed environment. Here

we have: a backend host such as a mainframe, a UNIX device or database/LAN server;

an intelligent client, and one or more intelligent agents in the middle which control

activities such as On-Line Transaction Processing (OLTP), transaction monitoring, message handling, security and object store control. Object-oriented methodologies are also used very heavily in N-tier architectures.

TP monitors

Transactional middleware, such as TP monitors and application servers, does a commendable job of coordinating information movement and method sharing between many different resources. However, although the transactional paradigm this middleware employs provides an excellent mechanism for method sharing, it is not as effective at simple information sharing, which is the primary goal of B2B application integration. For example,

transactional middleware tends to create a tightly coupled B2B application integration

solution, while messaging solutions tend to be more cohesive. In addition, in order to take

advantage of transactional middleware, the source code in target applications has to be

changed.

In truth, TP monitors are first-generation application servers as well as a transactional

middleware product. They provide a mechanism to facilitate the communications

between two or more applications and a location for application logic. Examples of TP

monitors include Tuxedo from BEA Systems, MTS from Microsoft, and CICS from

IBM.

The TP monitor performs two major services. On one side, a TP monitor provides

services that guarantee the integrity of transactions (a transaction service). On the other
side, a TP monitor provides resource management and runtime management services (an

application server). The two services are orthogonal.

TP monitors provide connectors to resources such as databases, other applications, and

queues. These connectors are typically low-level connectors that require some

sophisticated application development in order to communicate with these various

resources. Once connected, these resources are integrated into the transaction and

leveraged as part of the transaction. As a result, they are also able to recover if a failure

occurs.

TP monitors are unequaled when it comes to supporting a high transaction-processing

load and many clients. They take advantage of queued input buffers to protect against

peaks in the workload. If the load increases, the engine is able to press on without a loss

in response time. TP monitors can also use priority scheduling to prioritize messages and

support server threads, thus saving on the overhead of heavyweight processes. Finally,

the load-balancing mechanisms of TP monitors guarantee that no single process takes on

an excessive load.

The fastest-growing segment of the middleware marketplace is defined

by the many new products touting themselves as application servers. What's interesting

about this is that application servers are nothing new (and TP monitors should be

considered application servers because of their many common features). Most application

servers are employed as Web-enabled middleware, processing transactions from Web-

enabled applications. What's more, they employ modern languages such as Java instead

of traditional procedural languages such as C and COBOL (common with TP monitors).

To put it simply, application servers provide not only for the sharing and processing of

application logic, but also for connecting to back-end resources. These resources include
databases, ERP applications, and even traditional mainframe applications. Application

servers also provide user interface development mechanisms. Additionally, they usually

provide mechanisms to deploy the application to the platform of the Web.

Application server vendors are repositioning their products as a technology that solves

B2B application integration problems (some without the benefit of a technology that

works!). Since this is the case, application servers and TP monitors are sure to play a

major role in the B2B application integration domain. Many of these vendors are going

so far as to incorporate features such as messaging, transformation, and intelligent

routing, services that are currently native to message brokers. This area of middleware is

in the throes of a genuine revolution.
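The queued input buffers, priority scheduling and server threads attributed to TP monitors above can be pictured with the toy sketch below. It is an analogy only, written in Java (using a record type for brevity), and does not claim to show how Tuxedo, MTS or CICS are actually built.

    import java.util.concurrent.PriorityBlockingQueue;

    // Toy model of a TP monitor front end: requests queue up and are served
    // by a fixed pool of worker threads in priority order, smoothing load peaks.
    public class ToyTpMonitor {
        record Request(int priority, String payload) {}

        private final PriorityBlockingQueue<Request> inputQueue =
                new PriorityBlockingQueue<>(1024,
                        (a, b) -> Integer.compare(b.priority(), a.priority()));

        public void submit(Request r) {       // called by many clients
            inputQueue.put(r);                // the queued input buffer absorbs bursts
        }

        public void start(int workers) {      // small, fixed set of server threads
            for (int i = 0; i < workers; i++) {
                Thread worker = new Thread(() -> {
                    while (true) {
                        try {
                            Request r = inputQueue.take();   // highest priority first
                            process(r);
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return;
                        }
                    }
                });
                worker.start();
            }
        }

        private void process(Request r) {
            // A real monitor would run the transaction against its managed resources here.
            System.out.println("processed: " + r.payload());
        }
    }
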

2.4.2: Distributed Object Architecture

In Client-Server programming, nothing prevents us from using Structured Modular

programming or shell scripts to implement both client and server application logic. In

DOA, the application logic is organized as objects and distributed over multiple

networked hosts. These objects collaborate over the network to provide the overall

functionality using method invocation as a communication primitive [24]. The invoking object is called the “client object” and the remote object on a different host whose method is being invoked is called the “server object”. Since this invocation happens

over a network, a reference to the remote object has to be obtained by the client object.

Infrastructure software (often referred to as “middleware”) that provides a level of


abstraction over network protocols such as TCP/IP is used to achieve this remote

invocation of a method.

Figure 2.16: Distributed Object Architecture

One thing to be noted is that the distribution of the logic is transparent. The client object

thinks it is calling a local object. The task of actually making the call over the network is

taken over by the infrastructure software. The three most famous frameworks in this

paradigm are DCM (Distributed Component Model), CORBA and DCOM.
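One concrete way to see the client object / server object roles described above is Java RMI, where the middleware generates the stub that performs the network call. The sketch below is a minimal, illustrative example; the Greeter interface, the registry port and the bound name are all invented for the purpose.

    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.rmi.registry.LocateRegistry;
    import java.rmi.registry.Registry;
    import java.rmi.server.UnicastRemoteObject;

    // The remote interface: the only thing the client object needs to know.
    interface Greeter extends Remote {
        String greet(String name) throws RemoteException;
    }

    // Server object: implements the interface and is exported by the middleware.
    class GreeterServer implements Greeter {
        public String greet(String name) { return "Hello, " + name; }

        public static void main(String[] args) throws Exception {
            Greeter stub = (Greeter) UnicastRemoteObject.exportObject(new GreeterServer(), 0);
            Registry registry = LocateRegistry.createRegistry(1099);
            registry.rebind("Greeter", stub);   // publish a reference to the remote object
        }
    }

    // Client object: obtains a reference and invokes the method as if it were local.
    class GreeterClient {
        public static void main(String[] args) throws Exception {
            Registry registry = LocateRegistry.getRegistry("localhost", 1099);
            Greeter greeter = (Greeter) registry.lookup("Greeter");
            System.out.println(greeter.greet("distributed world"));
        }
    }

The client code reads as if the object were local; the registry lookup and the generated stub hide the network, which is exactly the transparency described above.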

Distributed Component Model

Components are the units of processing in DCM [25]. We define a software component

as “a unit of composition with contractually specified interfaces and explicit context

dependencies only. A software component can be deployed independently and is subject


to composition by third parties.” The core principles of Component-Oriented

programming are:

Separation of interface from implementation

In component-based programming, the basic unit in an application is a binary-compatible

interface. An interface defines a set of properties, methods, and events through which

external entities can connect to, and communicate with, the component. According to

Lowy [26], this principle contrasts with the object-oriented view of the world that places

the object rather than its interface at the center. Lowy [26] further says that in

component-based programming, the server is developed independently of the client.
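A minimal sketch of this principle, with invented names: the client is compiled only against the interface, including the event hook through which it connects, while the implementing class behind it can be replaced without the client noticing.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.DoubleConsumer;

    // The published contract: methods plus an event hook through which callers connect.
    interface Thermometer {
        double currentTemperature();
        void onChange(DoubleConsumer listener);
    }

    // Implementation detail, hidden behind the interface; it can be replaced freely.
    class FakeThermometer implements Thermometer {
        private final List<DoubleConsumer> listeners = new ArrayList<>();

        public double currentTemperature() { return 21.5; }

        public void onChange(DoubleConsumer listener) { listeners.add(listener); }
    }

    // The client is compiled against Thermometer only, never against FakeThermometer.
    class Display {
        static void show(Thermometer t) {
            t.onChange(v -> System.out.println("temperature changed: " + v));
            System.out.println("now: " + t.currentTemperature());
        }
    }
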

Location transparency

Location transparency allows components to be distributed onto different machines

without hard coding their location into the client code. This allows the location of the

components to be changed without requiring changes to the client code and

recompilation. Components are usually at a higher level of abstraction than objects and

are explicitly geared towards reuse. Components differ from other types of reusable

software modules in that they can be modified at design time as binary executables. In

contrast, libraries, subroutines, and so on must be modified as source code [27].
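Location transparency can be sketched very simply by resolving a component's location from configuration at runtime rather than hard-coding it, so moving the component to another machine means editing a file rather than changing and recompiling client code. The components.properties file and the property names below are hypothetical.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Properties;

    // Resolves where a named component currently lives; the client never hard-codes it.
    public class ComponentLocator {
        private final Properties locations = new Properties();

        public ComponentLocator(String configFile) throws IOException {
            try (FileInputStream in = new FileInputStream(configFile)) {
                locations.load(in);   // e.g. inventory.host=10.0.0.7 and inventory.port=9001
            }
        }

        public String hostOf(String component) {
            return locations.getProperty(component + ".host");
        }

        public int portOf(String component) {
            return Integer.parseInt(locations.getProperty(component + ".port"));
        }
    }
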

Component standards specify how to build and interconnect software components. They

show how a component must present itself to the outside world, independent of its

internal implementation. Current popular component standards include .NET, Java EE

and CORBA, which provide support for the distributed component model through
Enterprise Services, Enterprise Java Beans (EJB) and CORBA Component Model (CCM)

respectively. Components often exist and operate within containers, which provide a

shared context for interaction with other components. Containers also offer common

access to system-level services for a component’s embedded components (such as process

threads and memory resources). Containers themselves are typically implemented as

components, which can be nested in other containers. An example is embedding widget

field arrays into panels within GUI windows.

Event-based protocols are commonly used to establish the relationship between a

component and its container. Compliant containers all support the same set of interfaces

which means that components can freely migrate between different containers at runtime

without the need for reconfiguration or recompilation. Containers themselves run on application servers, which expose services offered by the underlying middleware systems

such as transactions, security, persistence and notification. Also, server components are

often multithreaded, replicated, and pooled, to achieve scalability and reliability.

Consequently server components cannot readily be organized into static containment

hierarchies.

Common Object Request Broker Architecture

CORBA, an acronym for Common Object Request Broker Architecture, is a suite of specifications being standardized by the

Object Management Group for a distributed object architecture and infrastructure. The

CORBA technology can be used for building applications as a collection of distributed

object components that collaborate over a network. It provides the mechanism for

exposing an object's methods to remote callers (to act as a server) and for discovering
such an exposed server object within the CORBA infrastructure (to invoke it as a client).

CORBA objects can act as servers and clients simultaneously. CORBA uses a platform-

independent interface definition language (IDL) as a common denominator. It is used for

the definition of the calling interfaces and their signatures. An IDL compiler is a tool that

a platform vendor must provide. It compiles the IDL file into platform-specific stub code

and maps the parameter types to platform-specific types. An IDL compiler can generate

both the client stubs and the server skeleton code. The IDL interface definition is

independent of programming language, but maps to all of the popular programming

languages via OMG standards: OMG has standardized mappings from IDL to C, C++,

Java, COBOL, Smalltalk, Ada, Lisp, Python, and IDLscript. Thus, CORBA is language

independent, provided that there is a mapping from the language constructs to the IDL. In

CORBA lingo, an implementation programming language entity that defines the

operations that support a CORBA IDL interface is called a “Servant”. The heart of the

CORBA specification is the Object Request Broker (ORB), a common communication

software bus for objects. An ORB makes it possible for CORBA objects to communicate

with each other by connecting objects making requests (clients) with objects servicing

requests (servers). Interoperability is implemented by ORB to ORB communication. A

CORBA ORB transparently handles object location, object activation, parameter

marshalling, fault recovery, and security. Figure 6 shows the structure of an ORB in

terms of the various interfaces supported by it.
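A hedged sketch of these ideas follows. The IDL fragment defines a hypothetical Greeter interface; the Java client then resolves and invokes it through the ORB and the Naming Service. The Greeter and GreeterHelper classes are not written by hand but would be produced by an IDL compiler such as idlj, and the org.omg classes assume an ORB implementation on the classpath (they shipped with older JDKs and are available separately for newer ones), so this is an outline rather than a self-contained program.

    // IDL (hypothetical), compiled by the vendor's IDL compiler into stubs and skeletons:
    //   interface Greeter {
    //       string say_hello(in string name);
    //   };

    import org.omg.CORBA.ORB;
    import org.omg.CosNaming.NamingContextExt;
    import org.omg.CosNaming.NamingContextExtHelper;

    public class GreeterCorbaClient {
        public static void main(String[] args) throws Exception {
            ORB orb = ORB.init(args, null);   // join the ORB "software bus"

            // Locate the Naming Service and resolve the published server object.
            org.omg.CORBA.Object nameServiceRef = orb.resolve_initial_references("NameService");
            NamingContextExt naming = NamingContextExtHelper.narrow(nameServiceRef);

            // Greeter and GreeterHelper are IDL-compiler-generated; the names are illustrative.
            Greeter greeter = GreeterHelper.narrow(naming.resolve_str("Greeter"));

            // The remote invocation: parameter marshalling is handled by the ORB and the stub.
            System.out.println(greeter.say_hello("CORBA"));
        }
    }
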

Distributed Component Object Model

Distributed Component Object Model (DCOM) is an extension of COM developed by Microsoft in 1996. It allows two objects,

one acting as a client and the other acting as the server object, to communicate regardless
of whether the two objects are on the same or on different machines. This communication

structure is achieved using a proxy object in the client and a stub in the server. When

client and component reside on different machines, DCOM simply replaces the local

inter-process communication with a network protocol. The COM run-time provides

object-oriented services to clients and components and uses DCE-RPC and the security

provider to generate standard network packets that conform to the DCOM wire-protocol

standard. A DCOM object has one or more interfaces that a client accesses via interface

pointers.

It is not possible to directly access an object itself; it is accessible only through its

interfaces. Thus, a DCOM object is completely defined by the interfaces that comprise it.

Each DCOM interface is unique in the system. A Globally Unique Identifier (GUID – a

128 bit integer that guarantees uniqueness in space and time for an interface, an object or

a class) allows them to be uniquely named. A DCOM interface is not modifiable; if a new

function is added or if the semantics of an existing function changes, a new interface is

added and a new GUID is assigned to it.
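By analogy only (Java has no DCOM runtime), the fragment below conveys the spirit of GUID-named, immutable interfaces: new behaviour is published as a new interface with a new identifier rather than by editing the old one. The interface names and UUID values are invented for illustration.

    import java.util.UUID;

    // Published once and never changed afterwards: clients compiled against it keep working.
    interface IAccount {
        UUID IID = UUID.fromString("6f1a2c3d-0000-0000-0000-000000000001"); // illustrative
        double balance();
    }

    // New capability means a new interface with its own identifier; the old one is untouched.
    interface IAccount2 extends IAccount {
        UUID IID = UUID.fromString("6f1a2c3d-0000-0000-0000-000000000002"); // illustrative
        double interestRate();
    }
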

2.4.3: Service-Oriented Architecture

Several definitions exist for what constitutes an SOA. Some take a technical perspective, others a business perspective, and a few define

SOA from an architectural perspective. For example, the W3C (World Wide Web

Consortium) takes a technical perspective and defines SOA as “A set of components

which can be invoked, and whose interface descriptions can be published and
discovered”. This is not very clear as it describes architecture as a technical

implementation and not in the sense the term “architecture” is generally used – to describe

a style or set of practices. A more helpful definition of SOA from an architectural

perspective is provided in MSDN magazine where SOA is defined as “an architecture for

a system or application that is built using a set of services”. A SOA defines application

functionality as a set of shared, reusable services. However, it is not just a system that is

built as a set of services. An application or a system built using SOA could still contain

code that implements functionality specific to that application. On the other hand, all of

the application’s functionality could be made up of services. Some of the other definitions

of SOA include:

“Service-Oriented Architecture is an approach to organizing information

technology in which data, logic, and infrastructure resources are accessed by

routing messages between network interfaces.” [Microsoft 2006b]

“A service-oriented architecture (SOA) is an application framework that

takes everyday business applications and breaks them down into individual

business functions and processes, called services. An SOA lets you build, deploy

and integrate these services independent of applications and the computing

platforms on which they run.” [IBM 2006]


The four tenets of SOA define desirable characteristics of a service

[Microsoft 2004a]:

Service boundaries are explicit.

Services are autonomous.

Services share schema and contract, not types.
Service compatibility is based on policy.

The most fundamental form of SOA consists of three components – a Service Consumer, a Service and a Service directory. These three components interact with each other to achieve automation.
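The interaction of these three components can be sketched with a toy, in-process model in Java; it is not a web-services stack, and the service name and contract below are invented.

    import java.util.HashMap;
    import java.util.Map;

    // A service exposes a contract; the consumer knows only this contract.
    interface QuoteService {
        double quoteFor(String symbol);
    }

    // Service directory: services are published here and discovered by name.
    class ServiceDirectory {
        private final Map<String, Object> registry = new HashMap<>();

        void publish(String name, Object service) { registry.put(name, service); }

        <T> T discover(String name, Class<T> contract) { return contract.cast(registry.get(name)); }
    }

    // Service consumer: discovers the service at runtime and invokes it.
    public class SoaSketch {
        public static void main(String[] args) {
            ServiceDirectory directory = new ServiceDirectory();
            directory.publish("quotes", (QuoteService) symbol -> 42.0);  // a trivial provider

            QuoteService quotes = directory.discover("quotes", QuoteService.class);
            System.out.println("quote: " + quotes.quoteFor("ACME"));
        }
    }

In a real SOA the directory would be an external registry and the contract would be described by schema rather than shared Java types, in line with the tenets above.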

2.4.4: Peer to Peer Architecture

Peer-to-peer (P2P) computing or networking is a distributed application architecture that

partitions tasks or workloads among peers. Peers are equally privileged, equipotent

participants in the application. They are said to form a peer-to-peer network of nodes.

Peers make a portion of their resources, such as processing power, disk storage or

network bandwidth, directly available to other network participants, without the need for

central coordination by servers or stable hosts. Peers are both suppliers and consumers of

resources, in contrast to the traditional client–server model where only servers supply

(send), and clients consume (receive).
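A minimal sketch of the "every peer is both supplier and consumer" idea follows: each node runs a small server thread and can also act as a client towards other peers. The port numbers and the one-line request/response exchange are invented for illustration; real peer-to-peer systems add discovery, routing and many other concerns.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.ServerSocket;
    import java.net.Socket;

    // A peer both serves requests (server role) and sends requests (client role).
    public class Peer {
        private final int port;

        public Peer(int port) { this.port = port; }

        // Server role: share this peer's resources with other participants.
        public void serve() {
            new Thread(() -> {
                try (ServerSocket server = new ServerSocket(port)) {
                    while (true) {
                        try (Socket s = server.accept();
                             BufferedReader in = new BufferedReader(
                                     new InputStreamReader(s.getInputStream()));
                             PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                            out.println("peer@" + port + " answers: " + in.readLine());
                        }
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }).start();
        }

        // Client role: consume a resource offered by another peer.
        public String ask(String host, int peerPort, String request) throws Exception {
            try (Socket s = new Socket(host, peerPort);
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream()))) {
                out.println(request);
                return in.readLine();
            }
        }
    }

Two such peers started on different ports can send requests to each other in either direction, with no central server coordinating them.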


The peer-to-peer application structure was popularized by file sharing systems like

Napster. The concept has inspired new structures and philosophies in many areas of

human interaction. Peer-to-peer networking is not restricted to technology, but covers

also social processes with a peer-to-peer dynamic. In such context, social peer-to-peer

processes are currently emerging throughout society.
