
SYLLABUS

BCSE1-660 DISTRIBUTED COMPUTING

UNIT-I
INTRODUCTION: Introduction, Examples of Distributed Systems, Trends in Distributed
Systems, Focus on resource sharing, Challenges. Case study: World Wide Web.
COMMUNICATION IN DISTRIBUTED SYSTEM: System Model, Inter process
Communication, the API for internet protocols, External data representation and Multicast
communication.
Network Virtualization: Overlay networks. Case study: MPI.
UNIT-II
REMOTE METHOD INVOCATION AND OBJECTS- Remote Invocation, Introduction,
Request reply protocols, Remote procedure call, Remote method invocation.
Case study: Java RMI - Group communication, Publish-subscribe systems, Message queues,
shared memory approaches, Distributed objects, CORBA- from objects to components

UNIT-III
PEER TO PEER SERVICES AND FILE SYSTEM- Peer-to-peer Systems, Introduction,
Napster and its legacy, Peer-to-peer, Middleware, Routing overlays. Overlay case studies: Pastry,
Tapestry, Distributed File Systems, Introduction - File service architecture, Andrew File system.

UNIT-IV
SYNCHRONIZATION AND REPLICATION- Introduction, Clocks, events and process
states, synchronizing physical clocks, Logical time and logical clocks, Global states,
Coordination and Agreement, Introduction, distributed mutual exclusion, Elections, Transactions
and Concurrency Control, Transactions -Nested transactions, Locks, Optimistic concurrency
control, Timestamp ordering -Distributed deadlocks, Replication, Case study - Coda
Devraj Institute of management and technology, Ferozepur

Subject: Distributed Computing Branch: B tech CSE 6th sem

Assignment No.1
Q.1: Explain distributed systems in detail.
Ans. A distributed system contains multiple nodes that are physically separate but linked
together using the network. All the nodes in this system communicate with each other and handle
processes in tandem. Each of these nodes contains a small part of the distributed operating
system software.

Types of Distributed Systems

The nodes in the distributed systems can be arranged in the form of client/server systems or peer
to peer systems. Details about these are as follows:

Client/Server Systems

In client server systems, the client requests a resource and the server provides that resource. A
server may serve multiple clients at the same time while a client is in contact with only one
server. Both the client and server usually communicate via a computer network and so they are a
part of distributed systems.

Peer to Peer Systems

Peer to peer systems contain nodes that are equal participants in data sharing. All the tasks
are equally divided between all the nodes. The nodes interact with each other as required and
share resources. This is done with the help of a network.

Advantages of Distributed Systems

Some advantages of Distributed Systems are as follows:

 All the nodes in the distributed system are connected to each other. So nodes can easily
share data with other nodes.
 More nodes can easily be added to the distributed system i.e. it can be scaled as required.
 Failure of one node does not lead to the failure of the entire distributed system. Other
nodes can still communicate with each other.
 Resources like printers can be shared with multiple nodes rather than being restricted to
just one.
Disadvantages of Distributed Systems

Some disadvantages of Distributed Systems are as follows:

 It is difficult to provide adequate security in distributed systems because the nodes as
well as the connections need to be secured.
 Some messages and data can be lost in the network while moving from one node to
another.
 The database connected to the distributed systems is quite complicated and difficult to
handle as compared to a single user system.
 Overloading may occur in the network if all the nodes of the distributed system try to
send data at once.
Q2. Define architecture models in distributed systems.

Ans. Software architecture concerns the high-level structure of a software system, obtained by
decomposition and composition, together with its architectural style and quality attributes. A
software architecture design must conform to the major functionality and performance
requirements of the system, as well as satisfy the non-functional requirements such as
reliability, scalability, portability, and availability.

A software architecture must describe its group of components, their connections, interactions
among them and deployment configuration of all components.

A software architecture can be defined in many ways −

 UML (Unified Modeling Language) − UML is one of the object-oriented solutions used in
software modeling and design.

 Architecture View Model (4+1 view model) − Architecture view model represents the
functional and non-functional requirements of software application.

 ADL (Architecture Description Language) − ADL defines the software architecture
formally and semantically.

UML
UML stands for Unified Modeling Language. It is a pictorial language used to make software
blueprints. UML is standardized by the Object Management Group (OMG); the UML 1.0
specification draft was proposed to the OMG in January 1997. It serves as a standard for
software requirement analysis and design documents, which are the basis for developing
software.

UML can be described as a general-purpose visual modeling language to visualize, specify,
construct, and document a software system. Although UML is generally used to model software
systems, it is not limited to this boundary. It is also used to model non-software systems, such
as process flows in a manufacturing unit.

The elements are like components that can be associated in different ways to make a
complete UML picture, which is known as a diagram. It is therefore important to understand
the different diagrams in order to apply this knowledge to real-life systems. There are two
broad categories of diagrams, which are further divided into sub-categories: Structural
Diagrams and Behavioral Diagrams.
Structural Diagrams
Structural diagrams represent the static aspects of a system. These static aspects represent those
parts of a diagram which form the main structure and are therefore stable.

These static parts are represented by classes, interfaces, objects, components and nodes.
Structural diagrams can be sub-divided as follows −

 Class diagram

 Object diagram

 Component diagram

 Deployment diagram

 Package diagram

 Composite structure

The following list briefly describes these diagrams −

1. Class − Represents the object orientation of a system. Shows how classes are statically related.

2. Object − Represents a set of objects and their relationships at runtime, and also represents the static view of the system.

3. Component − Describes all the components, their interrelationships, interactions, and the interfaces of the system.

4. Composite structure − Describes the inner structure of a component, including all the classes, interfaces of the component, etc.

5. Package − Describes the package structure and organization. Covers classes in the package and packages within another package.

6. Deployment − Deployment diagrams are a set of nodes and their relationships. These nodes are physical entities where the components are deployed.

Behavioral Diagrams
Behavioral diagrams basically capture the dynamic aspect of a system. Dynamic aspects are
basically the changing/moving parts of a system. UML has the following types of behavioral
diagrams −

 Use case diagram

 Sequence diagram

 Communication diagram

 State chart diagram

 Activity diagram

 Interaction overview

 Time sequence diagram

The following list briefly describes these diagrams −

1. Use case − Describes the relationships among the functionalities and their internal/external controllers. These controllers are known as actors.

2. Activity − Describes the flow of control in a system. It consists of activities and links. The flow can be sequential, concurrent, or branched.

3. State machine/state chart − Represents the event-driven state change of a system. It basically describes the state change of a class, interface, etc. Used to visualize the reaction of a system to internal/external factors.

4. Sequence − Visualizes the sequence of calls in a system to perform a specific functionality.

5. Interaction overview − Combines activity and sequence diagrams to provide a control-flow overview of the system and business process.

6. Communication − Same as a sequence diagram, except that it focuses on the objects' roles. Each message is associated with a sequence number.

7. Time sequence − Describes the changes by messages in state, condition, and events.

Architecture View Model


A model is a complete, basic, and simplified description of software architecture which is
composed of multiple views from a particular perspective or viewpoint.

A view is a representation of an entire system from the perspective of a related set of concerns.
It is used to describe the system from the viewpoint of different stakeholders such as end-users,
developers, project managers, and testers.

4+1 View Model


The 4+1 View Model was designed by Philippe Kruchten to describe the architecture of a
software–intensive system based on the use of multiple and concurrent views. It is a multiple
view model that addresses different features and concerns of the system. It standardizes the
software design documents and makes the design easy to understand by all stakeholders.

It is an architecture verification method for studying and documenting software architecture
design and covers all the aspects of software architecture for all stakeholders. It provides four
essential views −

 The logical view or conceptual view − It describes the object model of the design.

 The process view − It describes the activities of the system, captures the concurrency
and synchronization aspects of the design.

 The physical view − It describes the mapping of software onto hardware and reflects its
distributed aspect.

 The development view − It describes the static organization or structure of the software
in its development environment.

This view model can be extended by adding one more view, called the scenario view or use case
view, for end users or customers of software systems. It is coherent with the other four views and
is utilized to illustrate the architecture, serving as the "plus one" view of the (4+1) view model,
which thus comprises five concurrent views.
Q3. Define Architecture Description Languages.
Ans. An ADL is a language that provides syntax and semantics for defining a software
architecture. It is a notation specification which provides features for modeling a software
system’s conceptual architecture, distinguished from the system’s implementation.

ADLs must support the architecture components, their connections, interfaces, and
configurations which are the building block of architecture description. It is a form of
expression for use in architecture descriptions and provides the ability to decompose
components, combine the components, and define the interfaces of components.

An architecture description language is a formal specification language, which describes the
software features such as processes, threads, data, and sub-programs as well as hardware
components such as processors, devices, buses, and memory.

It is hard to classify or differentiate between an ADL and a programming language or a modeling
language. However, there are the following requirements for a language to be classified as an ADL −
 It should be appropriate for communicating the architecture to all concerned parties.

 It should be suitable for tasks of architecture creation, refinement, and validation.

 It should provide a basis for further implementation, so it must be able to add
information to the ADL specification to enable the final system specification to be
derived from the ADL.

 It should have the ability to represent most of the common architectural styles.

 It should support analytical capabilities or provide for quickly generating prototype
implementations.

The object-oriented (OO) paradigm took its shape from the initial concept of a new
programming approach, while the interest in design and analysis methods came much later. OO
analysis and design paradigm is the logical result of the wide adoption of OO programming
languages.

 The first object-oriented language was Simula (Simulation of real systems), developed
in the 1960s by researchers at the Norwegian Computing Center.
 In the 1970s, Alan Kay and his research group at Xerox PARC conceived a personal
computer called the Dynabook and created the first pure object-oriented programming
language (OOPL), Smalltalk, for programming the Dynabook.

 In the 1980s, Grady Booch published a paper titled Object Oriented Design that mainly
presented a design for the programming language, Ada. In the ensuing editions, he
extended his ideas to a complete object–oriented design method.

 In the 1990s, Coad incorporated behavioral ideas to object-oriented methods.

The other significant innovations were Object Modeling Technique (OMT) by James
Rumbaugh and Object-Oriented Software Engineering (OOSE) by Ivar Jacobson.

Introduction to OO Paradigm
OO paradigm is a significant methodology for the development of any software. Most of the
architecture styles or patterns such as pipe and filter, data repository, and component-based can
be implemented by using this paradigm.

Basic concepts and terminologies of object–oriented systems −

Object
An object is a real-world element in an object–oriented environment that may have a physical or
a conceptual existence. Each object has −

 Identity that distinguishes it from other objects in the system.

 State that determines the characteristic properties of an object as well as the values of
the properties that the object holds.

 Behavior that represents externally visible activities performed by an object in terms of
changes in its state.

Objects can be modeled according to the needs of the application. An object may have a
physical existence, like a customer, a car, etc.; or an intangible conceptual existence, like a
project, a process, etc.

Class
A class represents a collection of objects having same characteristic properties that exhibit
common behavior. It gives the blueprint or the description of the objects that can be created
from it. Creation of an object as a member of a class is called instantiation. Thus, an object is
an instance of a class.

The constituents of a class are −

 A set of attributes for the objects that are to be instantiated from the class. Generally,
different objects of a class have some difference in the values of the attributes.
Attributes are often referred to as class data.

 A set of operations that portray the behavior of the objects of the class. Operations are
also referred to as functions or methods.

Example

Let us consider a simple class, Circle, that represents the geometrical figure circle in a two–
dimensional space. The attributes of this class can be identified as follows −

 x–coord, to denote x–coordinate of the center

 y–coord, to denote y–coordinate of the center

 a, to denote the radius of the circle

Some of its operations can be defined as follows −

 findArea(), a method to calculate area

 findCircumference(), a method to calculate circumference

 scale(), a method to increase or decrease the radius
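As an illustrative sketch (not part of the original notes), the Circle class with these attributes and operations can be written in Python; the snake_case method names are adapted to Python convention:

```python
import math

class Circle:
    """The geometrical figure circle in a two-dimensional space."""

    def __init__(self, x_coord, y_coord, a):
        self.x_coord = x_coord   # x-coordinate of the center
        self.y_coord = y_coord   # y-coordinate of the center
        self.a = a               # radius of the circle

    def find_area(self):
        return math.pi * self.a ** 2

    def find_circumference(self):
        return 2 * math.pi * self.a

    def scale(self, factor):
        """Increase or decrease the radius by a multiplicative factor."""
        self.a *= factor

c = Circle(0, 0, 2)              # instantiation: c is an instance of Circle
print(round(c.find_area(), 2))   # 12.57
c.scale(2)                       # radius becomes 4
print(round(c.find_circumference(), 2))
```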


Encapsulation
Encapsulation is the process of binding both attributes and methods together within a class.
Through encapsulation, the internal details of a class can be hidden from outside. It permits the
elements of the class to be accessed from outside only through the interface provided by the
class.
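A minimal Python sketch of encapsulation; the Account class and its methods are illustrative examples, not from the text:

```python
class Account:
    """Attributes and methods are bound together; the balance is an
    internal detail reachable only through the class interface."""

    def __init__(self, balance):
        self.__balance = balance          # hidden by name mangling

    def deposit(self, amount):            # part of the public interface
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.__balance += amount

    def get_balance(self):
        return self.__balance

acct = Account(100)
acct.deposit(50)
print(acct.get_balance())   # 150
# Accessing acct.__balance directly raises AttributeError:
# the internal detail is hidden from outside the class.
```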

Polymorphism
Polymorphism is originally a Greek word that means the ability to take multiple forms. In
object-oriented paradigm, polymorphism implies using operations in different ways, depending
upon the instances they are operating upon. Polymorphism allows objects with different internal
structures to have a common external interface. Polymorphism is particularly effective while
implementing inheritance.

Example

Let us consider two classes, Circle and Square, each with a method findArea(). Though the
name and purpose of the methods in the classes are same, the internal implementation, i.e., the
procedure of calculating an area is different for each class. When an object of class Circle
invokes its findArea() method, the operation finds the area of the circle without any conflict
with the findArea() method of the Square class.
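The Circle/Square example can be sketched in Python as follows; each class supplies its own findArea() implementation behind the common interface:

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius

    def find_area(self):
        return math.pi * self.radius ** 2

class Square:
    def __init__(self, side):
        self.side = side

    def find_area(self):
        return self.side ** 2

# The same call resolves to a different implementation per object,
# without any conflict between the two find_area() methods.
for shape in (Circle(1), Square(2)):
    print(shape.find_area())
```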

Relationships

In order to describe a system, both dynamic (behavioral) and static (logical) specifications of
the system must be provided. The dynamic specification describes the relationships among
objects, e.g., message passing. The static specification describes the relationships among
classes, e.g., aggregation, association, and inheritance.

Message Passing
Any application requires a number of objects interacting in a harmonious manner. Objects in a
system may communicate with each other by using message passing. Suppose a system has two
objects − obj1 and obj2. The object obj1 sends a message to object obj2, if obj1 wants obj2 to
execute one of its methods.
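The obj1/obj2 interaction can be sketched in Python with two hypothetical classes, Workstation (obj1) and Printer (obj2):

```python
class Printer:
    """obj2: the receiver of the message."""
    def print_document(self, text):
        return f"printed: {text}"

class Workstation:
    """obj1: holds a reference to obj2 and sends it messages."""
    def __init__(self, printer):
        self.printer = printer

    def submit(self, text):
        # "Sending a message" means invoking one of the receiver's methods.
        return self.printer.print_document(text)

ws = Workstation(Printer())
print(ws.submit("report.pdf"))  # printed: report.pdf
```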

Composition or Aggregation
Aggregation or composition is a relationship among classes by which a class can be made up of
any combination of objects of other classes. It allows objects to be placed directly within the
body of other classes. Aggregation is referred to as a "part-of" or "has-a" relationship, with the
ability to navigate from the whole to its parts. An aggregate object is an object that is composed
of one or more other objects.
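A short sketch of aggregation in Python, using hypothetical Car, Engine, and Wheel classes:

```python
class Engine:
    def __init__(self, power):
        self.power = power

class Wheel:
    pass

class Car:
    """A Car "has-a" Engine and four Wheels: objects of other
    classes are placed directly within its body."""
    def __init__(self):
        self.engine = Engine(power=90)
        self.wheels = [Wheel() for _ in range(4)]

car = Car()
print(car.engine.power)  # navigate from the whole to a part: 90
print(len(car.wheels))   # 4
```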

Association
Association is a group of links having a common structure and common behavior. Association
depicts the relationship between objects of one or more classes. A link can be defined as an
instance of an association. The degree of an association denotes the number of classes involved
in a connection; the degree may be unary, binary, or ternary.

 A unary relationship connects objects of the same class.

 A binary relationship connects objects of two classes.

 A ternary relationship connects objects of three classes.


Inheritance
It is a mechanism that permits new classes to be created out of existing classes by extending and
refining their capabilities. The existing classes are called the base classes/parent classes/
superclasses, and the new classes are called the derived classes/child classes/subclasses.

The subclass can inherit or derive the attributes and methods of the superclass(es), provided
that the superclass allows it. Besides, the subclass may add its own attributes and methods and
may modify any of the superclass methods. Inheritance defines an "is-a" relationship.

Example

From a class Mammal, a number of classes can be derived, such as Human, Cat, Dog, Cow, etc.
Humans, cats, dogs, and cows all have the distinct characteristics of mammals. In addition, each
has its own particular characteristics. It can be said that a cow "is-a" mammal.
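The Mammal example can be sketched in Python; Cow inherits Mammal's behavior and adds its own:

```python
class Mammal:
    def breathe(self):
        return "breathing"

class Cow(Mammal):               # a cow "is-a" mammal
    def moo(self):               # its own particular characteristic
        return "moo"

c = Cow()
print(c.breathe())               # inherited from Mammal: breathing
print(c.moo())                   # moo
print(isinstance(c, Mammal))     # True
```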

Q.4 Explain application processes in detail.

Ans. Two remote application processes can communicate mainly in two different fashions:

 Peer-to-peer: Both remote processes are executing at the same level and they exchange data
using some shared resource.

 Client-Server: One remote process acts as a Client and requests some resource from another
application process acting as Server.

In the client-server model, any process can act as a server or a client. It is not the type of
machine, the size of the machine, or its computing power that makes it a server; it is the ability
to serve requests that makes a machine a server.

A system can act as server and client simultaneously: one process acts as a server while
another acts as a client. It may also happen that both the client and server processes reside on
the same machine.

Communication
Two processes in client-server model can interact in various ways:
 Sockets

 Remote Procedure Calls (RPC)


Sockets
In this paradigm, the process acting as the server opens a socket using a well-known (or known
by the client) port and waits until some client request comes. The process acting as the client
also opens a socket, but instead of waiting for an incoming request, the client sends its request
first.
When the request reaches the server, it is served. It can be either information sharing or a
resource request.
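The socket paradigm can be sketched in Python; here both processes run in one program (the server in a thread), and an ephemeral local port stands in for a well-known one:

```python
import socket
import threading

# Server side: open a socket on a local port and wait for a request.
# Port 0 asks the OS for any free port; a real server would use a
# well-known port instead.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

def serve_one():
    conn, _ = srv.accept()               # block until a client request comes
    with conn:
        data = conn.recv(1024)           # the request reaches the server...
        conn.sendall(b"echo: " + data)   # ...and is served

t = threading.Thread(target=serve_one)
t.start()

# Client side: open a socket and send the request first.
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", port))
cli.sendall(b"hello")
reply = cli.recv(1024)
cli.close()
t.join()
srv.close()
print(reply.decode())  # echo: hello
```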

Remote Procedure Call


This is a mechanism where one process interacts with another by means of procedure calls. One
process (the client) calls a procedure lying on a remote host; the process on the remote host is
said to be the server. Both processes are allocated stubs. The communication happens in the
following way:

 The client process calls the client stub. It passes all the parameters pertaining to program
local to it.

 All parameters are then packed (marshalled) and a system call is made to send them to other
side of the network.

 Kernel sends the data over the network and the other end receives it.

 The remote host passes data to the server stub where it is unmarshalled.

 The parameters are passed to the procedure and the procedure is then executed.

 The result is sent back to the client in the same manner.
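The steps above can be sketched with Python's standard xmlrpc modules, which play the roles of the stubs: the ServerProxy marshals the parameters, sends them across, and unmarshals the result. The add procedure and the local loopback setup are illustrative assumptions:

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Server side: register a procedure that remote clients may call.
# Port 0 lets the OS pick a free port; a real service would use a fixed one.
def add(a, b):
    return a + b

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add, "add")
port = server.server_address[1]
t = threading.Thread(target=server.handle_request)  # serve one call
t.start()

# Client side: the ServerProxy acts as the client stub.  Calling
# proxy.add(2, 3) marshals the parameters, sends them over the
# network, and unmarshals the returned result.
proxy = ServerProxy(f"http://127.0.0.1:{port}")
result = proxy.add(2, 3)
t.join()
print(result)  # 5
```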

There are several protocols which serve users in the application layer. Application layer
protocols can be broadly divided into two categories:
 Protocols which are used directly by users, for example email (SMTP).

 Protocols which help and support the protocols used by users, for example DNS.

A few application layer protocols are described below:

Domain Name System


The Domain Name System (DNS) works on the client-server model. It uses the UDP protocol for
transport-layer communication. DNS uses a hierarchical, domain-based naming scheme. A DNS
server is configured with fully qualified domain names (FQDNs) and email addresses mapped
to their respective Internet Protocol addresses.

A DNS server is queried with an FQDN and it responds with the IP address mapped to it.
DNS uses UDP port 53.
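Name resolution as a browser performs it can be sketched with Python's socket module. Here "localhost" is used so the sketch works offline (it resolves locally); an FQDN such as "www.example.com" would instead trigger a real DNS query over UDP port 53:

```python
import socket

# gethostbyname performs name resolution, as a browser does before
# opening a connection to a web server.
ip = socket.gethostbyname("localhost")
print(ip)  # typically 127.0.0.1
```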

Simple Mail Transfer Protocol


The Simple Mail Transfer Protocol (SMTP) is used to transfer electronic mail from one user to
another. This task is done by means of the email client software (user agent) the user is using.
User agents help the user type and format the email and store it until the Internet is available.
When an email is submitted for sending, the sending process is handled by the Message
Transfer Agent, which normally comes built into the email client software.

The Message Transfer Agent uses SMTP to forward the email to another Message Transfer
Agent (on the server side). While end users use SMTP only to send emails, servers normally
use SMTP to send as well as receive emails. SMTP uses TCP ports 25 and 587.

Client software uses Internet Message Access Protocol (IMAP) or POP protocols to receive
emails.

File Transfer Protocol


The File Transfer Protocol (FTP) is the most widely used protocol for file transfer over the
network. FTP uses TCP/IP for communication, and its control connection works on TCP port 21.
FTP works on the client/server model, where a client requests a file from the server and the
server sends the requested resource back to the client.

FTP uses out-of-band control, i.e., control information is exchanged over the control connection
on TCP port 21, while the actual data is sent over a separate data connection (TCP port 20 in
active mode).
The client requests a file from the server. When the server receives the request, it opens a TCP
data connection to the client and transfers the file. After the transfer is complete, the server
closes the connection. For a second file, the client requests again and the server opens a new
TCP connection.

Post Office Protocol (POP)


The Post Office Protocol version 3 (POP3) is a simple mail retrieval protocol used by user
agents (client email software) to retrieve mails from the mail server.

When a client needs to retrieve mails from the server, it opens a connection with the server on
TCP port 110. The user can then access his mails and download them to the local computer.
POP3 works in two modes. The most common mode, the delete mode, deletes the emails from
the remote server after they are downloaded to the local machine. The second mode, the keep
mode, does not delete the email from the mail server and gives the user the option to access the
mails later on the mail server.

Hyper Text Transfer Protocol (HTTP)


The Hypertext Transfer Protocol (HTTP) is the foundation of the World Wide Web. Hypertext
is a well-organized documentation system that uses hyperlinks to link the pages in text
documents. HTTP works on the client-server model. When a user wants to access an HTTP
page on the Internet, the client machine at the user's end initiates a TCP connection to the
server on port 80. When the server accepts the client's request, the client is authorized to
access web pages.

To access web pages, a client normally uses a web browser, which is responsible for initiating,
maintaining, and closing TCP connections. HTTP is a stateless protocol, which means the
server maintains no information about earlier requests by clients.
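An HTTP exchange can be sketched with Python's standard library. A tiny local web server stands in for a real site so the sketch runs without Internet access (real browsers connect to port 80); the Page handler and its response body are illustrative:

```python
import http.client
import http.server
import threading

class Page(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        content = b"<html>hello</html>"
        self.send_response(200)
        self.send_header("Content-Length", str(len(content)))
        self.end_headers()
        self.wfile.write(content)

    def log_message(self, *args):    # keep the demo quiet
        pass

httpd = http.server.HTTPServer(("127.0.0.1", 0), Page)
port = httpd.server_address[1]
threading.Thread(target=httpd.handle_request).start()

# The client opens a TCP connection and sends a GET request.  HTTP is
# stateless: each request carries everything the server needs.
conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request("GET", "/")
resp = conn.getresponse()
status, body = resp.status, resp.read()
conn.close()
print(status, body.decode())  # 200 <html>hello</html>
```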

Q.5 Define the Challenges in distributed system.

Ans. Designing a distributed system is not easy or straightforward. A number of challenges
need to be overcome in order to get the ideal system. The major challenges in distributed
systems are listed below:

1. Openness

The openness of a computer system is the characteristic that determines whether the system can
be extended and re-implemented in various ways. The openness of distributed systems is
determined primarily by the degree to which new resource-sharing services can be added and be
made available for use by a variety of client programs.
2. Security

Many of the information resources that are made available and maintained in distributed systems
have a high intrinsic value to their users. Their security is therefore of considerable importance.
Security for information resources has three components: confidentiality, integrity, and
availability.

3. Scalability

Distributed systems operate effectively and efficiently at many different scales, ranging from a
small intranet to the Internet. A system is described as scalable if it will remain effective when
there is a significant increase in the number of resources and the number of users.

4. Failure handling

Computer systems sometimes fail. When faults occur in hardware or software, programs may
produce incorrect results or may stop before they have completed the intended computation.
Failures in a distributed system are partial – that is, some components fail while others continue
to function. Therefore the handling of failures is particularly difficult.

5. Concurrency

Both services and applications provide resources that can be shared by clients in a distributed
system. There is therefore a possibility that several clients will attempt to access a shared
resource at the same time. Any object that represents a shared resource in a distributed system
must be responsible for ensuring that it operates correctly in a concurrent environment. This
applies not only to servers but also to objects in applications. Therefore, any programmer who
takes an implementation of an object that was not intended for use in a distributed system must
do whatever is necessary to make it safe in a concurrent environment.

6. Transparency
Transparency can be achieved at two different levels. The easiest is to hide the distribution
from the users. The concept of transparency can be applied to several aspects of a distributed
system.

a) Location transparency: The users cannot tell where resources are located

b) Migration transparency: Resources can move at will without changing their names

c) Replication transparency: The users cannot tell how many copies exist.

d) Concurrency transparency: Multiple users can share resources automatically.

e) Parallelism transparency: Activities can happen in parallel without users knowing.

7. Quality of service

Once users are provided with the functionality that they require of a service, such as the file
service in a distributed system, we can go on to ask about the quality of the service provided. The
main nonfunctional properties of systems that affect the quality of the service experienced by
clients and users are reliability, security and performance. Adaptability to meet changing system
configurations and resource availability has been recognized as a further important aspect of
service quality.

8. Reliability

One of the original goals of building distributed systems was to make them more reliable than
single-processor systems. The idea is that if a machine goes down, some other machine takes
over the job. A highly reliable system must be highly available, but that is not enough. Data
entrusted to the system must not be lost or garbled in any way, and if files are stored redundantly
on multiple servers, all the copies must be kept consistent. In general, the more copies that are
kept, the better the availability, but the greater the chance that they will be inconsistent,
especially if updates are frequent.

9. Performance

Always lurking in the background is the issue of performance. Building a transparent, flexible,
reliable distributed system counts for little if its performance is poor. In particular, when
running a particular application on a distributed system, it should not be appreciably worse than
running the same application on a single processor. Unfortunately, achieving this is easier said
than done.

Assignment No.2
Q.1 Case study: World Wide Web.
Ans. WWW stands for World Wide Web. A technical definition of the World Wide Web is:
all the resources and users on the Internet that are using the Hypertext Transfer Protocol
(HTTP).
A broader definition comes from the organization that Web inventor Tim Berners-Lee helped
found, the World Wide Web Consortium (W3C).
The World Wide Web is the universe of network-accessible information, an embodiment of
human knowledge.

In simple terms, the World Wide Web is a way of exchanging information between computers
on the Internet, tying them together into a vast collection of interactive multimedia resources.
The Internet and the Web are not the same thing: the Web uses the Internet to carry its
information.

Evolution
The World Wide Web was created by Tim Berners-Lee in 1989 at CERN in Geneva. It came
into existence as a proposal by him to allow researchers at CERN to work together effectively
and efficiently. Eventually it became the World Wide Web.

WWW Architecture
WWW architecture is divided into several layers:
Identifiers and Character Set
Uniform Resource Identifier (URI) is used to uniquely identify resources on the web,
and UNICODE makes it possible to build web pages that can be read and written in human
languages.

Syntax
XML (Extensible Markup Language) helps to define common syntax in semantic web.

Data Interchange
The Resource Description Framework (RDF) helps in defining a core representation of data for
the web. RDF represents data about a resource in graph form.

Taxonomies
RDF Schema (RDFS) allows a more standardized description of taxonomies and
other ontological constructs.

Ontologies
Web Ontology Language (OWL) offers more constructs than RDFS. It comes in the following
three versions:

 OWL Lite for taxonomies and simple constraints.

 OWL DL for full description logic support.

 OWL Full for more syntactic freedom over RDF.

Rules
RIF and SWRL offer rules beyond the constructs available
in RDFS and OWL. SPARQL (SPARQL Protocol and RDF Query Language) is a SQL-like
language used for querying RDF data and OWL ontologies.

Proof
All semantics and rules executed at the layers below Proof, together with their results, are used
to prove deductions.

Cryptography
Cryptographic means, such as digital signatures, are used to verify the origin of sources.

User Interface and Applications

On top of all these layers, the User Interface and Applications layer is built for user interaction.
WWW Operation
The WWW works on the client-server approach. The following steps explain how the web works:

1. The user enters the URL (say, http://www.tutorialspoint.com) of the web page in the
address bar of the web browser.

2. The browser then asks the Domain Name Server for the IP address corresponding to
www.tutorialspoint.com.

3. After receiving the IP address, the browser sends a request for the web page to the web
server using the HTTP protocol, which specifies the way the browser and web server
communicate.

4. The web server receives the request via the HTTP protocol and looks for the
requested web page. If found, it returns the page to the web browser and closes the HTTP
connection.

5. The web browser receives the web page, interprets it and displays the contents of the
web page in the browser's window.
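Steps 1 and 3 above can be sketched in Python: parsing the URL the user typed and building the raw HTTP request the browser would send. This is a minimal illustration, not a full browser; DNS resolution (step 2) would use `socket.gethostbyname` and is omitted so the example runs offline.

```python
from urllib.parse import urlparse

def build_get_request(url: str) -> str:
    """Build the raw HTTP/1.1 GET request a browser would send (step 3)."""
    parts = urlparse(url)          # step 1: split the URL the user entered
    path = parts.path or "/"
    # The Host header tells the server which site is being requested.
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {parts.hostname}\r\n"
            f"Connection: close\r\n\r\n")

request = build_get_request("http://www.tutorialspoint.com/index.htm")
print(request)
```

In a real browser this request string would be written to a TCP connection opened to the IP address obtained in step 2.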

Future
There has been rapid development in the field of the web. It has had an impact on almost every
area, such as education, research, technology, commerce and marketing. So the future of the web
is almost unpredictable.

Apart from the huge development in the field of the WWW, there are also some technical issues
that the W3 Consortium has to cope with.

User Interface
Work on higher quality presentation of 3-D information is under development. The W3
Consortium also aims to enhance the web to fulfil the requirements of global communities,
which would include all regional languages and writing systems.

Technology
Work on privacy and security is under way. This would include hiding information, accounting,
access control, integrity and risk management.

Architecture
There has been huge growth in the field of the web, which may overload the internet and
degrade its performance. Hence, better protocols need to be developed.

Q.2 Explain the client-server model in detail.


Ans. In client-server computing, the client requests a resource and the server provides that
resource. A server may serve multiple clients at the same time, while a client is in contact with
only one server. The client and server usually communicate via a computer network, but
sometimes they may reside in the same system.

An illustration of the client server system is given as follows:


Characteristics of Client Server Computing

The salient points for client server computing are as follows:

 The client server computing works with a system of request and response. The client
sends a request to the server and the server responds with the desired information.
 The client and server should follow a common communication protocol so they can
easily interact with each other. All the communication protocols are available at the
application layer.
 A server can only accommodate a limited number of client requests at a time, so it uses a
priority-based system to respond to requests.
 Denial of Service attacks hinder a server's ability to respond to authentic client requests by
inundating it with false requests.
 An example of a client server computing system is a web server. It returns the web pages
to the clients that requested them.
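The request-and-response pattern described above can be sketched with a minimal TCP echo service, assuming a localhost-only setup: the server listens, the client sends a request, and the server replies. The host, port choice, and echo behaviour are illustrative, not any particular production server.

```python
import socket
import threading

def echo_server(sock: socket.socket) -> None:
    """Serve one client: read its request and respond with the same bytes."""
    conn, _ = sock.accept()
    with conn:
        data = conn.recv(1024)   # the client's request
        conn.sendall(data)       # the server's response

# Bind to an ephemeral localhost port so the example is self-contained.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=echo_server, args=(server,), daemon=True).start()

# The client sends a request and blocks until the server responds.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello server")
    reply = client.recv(1024)
print(reply)
```

Both sides follow the common protocol (TCP here) noted in the characteristics above; a real server would loop over `accept` to serve many clients.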
Difference between Client Server Computing and Peer to Peer Computing

The major differences between client server computing and peer to peer computing are as
follows:

 In client server computing, a server is a central node that services many client nodes. On
the other hand, in a peer to peer system, the nodes collectively use their resources and
communicate with each other.
 In client-server computing, the server is the one that communicates with the other nodes.
In peer-to-peer computing, all the nodes are equal and share data with each other
directly.
 Client-server computing is sometimes considered a special case of peer-to-peer computing.

Advantages of Client Server Computing

The different advantages of client server computing are:

 All the required data is concentrated in a single place i.e. the server. So it is easy to
protect the data and provide authorisation and authentication.
 The server need not be located physically close to the clients. Yet the data can be
accessed efficiently.
 It is easy to replace, upgrade or relocate the nodes in the client server model because all
the nodes are independent and request data only from the server.
 The nodes, i.e. the clients and the server, need not be built on similar platforms, yet they
can easily exchange data.

Disadvantages of Client Server Computing

The different disadvantages of client server computing are:

 If all the clients simultaneously request data from the server, it may get overloaded. This
may lead to congestion in the network.
 If the server fails for any reason, then none of the clients' requests can be fulfilled.
This leads to failure of the client-server network.
 The costs of setting up and maintaining a client-server model are quite high.

Q3. Explain the Peer to Peer system.


Ans. The peer-to-peer computing architecture contains nodes that are equal participants in data
sharing. All tasks are divided equally between the nodes, and the nodes interact with each
other as required and share resources.

A diagram to better understand peer to peer computing is as follows:

Characteristics of Peer to Peer Computing

The different characteristics of peer to peer networks are as follows:


 Peer to peer networks are usually formed by groups of a dozen or fewer computers. These
computers all store their data using individual security but also share data with all the
other nodes.
 The nodes in peer to peer networks both use resources and provide resources. So, if the
nodes increase, then the resource sharing capacity of the peer to peer network increases.
This is different than client server networks where the server gets overwhelmed if the
nodes increase.
 Since nodes in peer to peer networks act as both clients and servers, it is difficult to
provide adequate security for the nodes. This can lead to denial of service attacks.
 Most modern operating systems such as Windows and Mac OS contain software to
implement peer to peer networks.

Advantages of Peer to Peer Computing

Some advantages of peer to peer computing are as follows:

 Each computer in the peer to peer network manages itself. So, the network is quite easy
to set up and maintain.
 In the client server network, the server handles all the requests of the clients. This
provision is not required in peer to peer computing and the cost of the server is saved.
 It is easy to scale the peer to peer network and add more nodes. This only increases the
data sharing capacity of the system.
 None of the nodes in the peer to peer network are dependent on the others for their
functioning.

Disadvantages of Peer to Peer Computing

Some disadvantages of peer to peer computing are as follows:

 It is difficult to back up the data, as it is stored in different computer systems and there is
no central server.
 It is difficult to provide overall security in the peer to peer network as each system is
independent and contains its own data.
Q4. Define the API for internet protocols.

Ans. Transmission Control Protocol (TCP)


TCP is a connection-oriented protocol and offers end-to-end packet delivery. It acts as the
backbone for connections and exhibits the following key features:

 Transmission Control Protocol (TCP) corresponds to the Transport Layer of OSI Model.

 TCP is a reliable and connection oriented protocol.

 TCP offers:

o Stream Data Transfer.

o Reliability.

o Efficient Flow Control

o Full-duplex operation.

o Multiplexing.

 TCP offers connection oriented end-to-end packet delivery.

 TCP ensures reliability by sequencing bytes with a forwarding acknowledgement
number that indicates to the destination the next byte the source expects to receive.

 It retransmits the bytes not acknowledged within a specified time period.

TCP Services
TCP offers the following services to processes at the application layer:

 Stream Delivery Service

 Sending and Receiving Buffers

 Bytes and Segments

 Full Duplex Service

 Connection Oriented Service

 Reliable Service
STREAM DELIVERY SERVICE
The TCP protocol is stream oriented because it allows the sending process to send data as a
stream of bytes and the receiving process to obtain data as a stream of bytes.

SENDING AND RECEIVING BUFFERS


It may not be possible for the sending and receiving processes to produce and consume data at
the same speed; therefore, TCP needs buffers for storage at the sending and receiving ends.

BYTES AND SEGMENTS


The Transmission Control Protocol (TCP), at the transport layer, groups bytes into a packet
called a segment. Before transmission, these segments are encapsulated into an IP datagram.

FULL DUPLEX SERVICE


Transmitting data in duplex mode means data flows in both directions at the same
time.

CONNECTION ORIENTED SERVICE


TCP offers connection oriented service in the following manner:

1. The TCP of process 1 informs the TCP of process 2 and gets its approval.

2. The TCPs of process 1 and process 2 exchange data in both directions.

3. After completing the data exchange, when the buffers on both sides are empty, the two
TCPs destroy their buffers.

RELIABLE SERVICE
For the sake of reliability, TCP uses an acknowledgement mechanism.

Internet Protocol (IP)


The Internet Protocol is a connectionless and unreliable protocol. It offers no guarantee of
successful transmission of data.

In order to make it reliable, it must be paired with reliable protocol such as TCP at the transport
layer.

The Internet Protocol transmits data in the form of a datagram, as shown in the following diagram:
Points to remember:

 The length of the datagram is variable.

 The datagram is divided into two parts: header and data.

 The length of the header is 20 to 60 bytes.

 The header contains information for routing and delivery of the packet.

User Datagram Protocol (UDP)


Like IP, UDP is a connectionless and unreliable protocol. It doesn't require making a connection
with the host to exchange data. Since UDP is an unreliable protocol, there is no mechanism for
ensuring that the data sent is received.

UDP transmits data in the form of a datagram. The UDP datagram consists of five parts, as
shown in the following diagram:

Points to remember:

 UDP is used by applications that typically transmit small amounts of data at one time.

 UDP provides protocol ports, i.e. a UDP message contains both source and destination
port numbers, which makes it possible for the UDP software at the destination to deliver the
message to the correct application program.
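UDP's connectionless, datagram-oriented exchange can be sketched with the standard socket API. This is a localhost-only illustration, with the ports chosen by the operating system rather than any well-known service; note that no connection is established before sending.

```python
import socket

# Two UDP endpoints on localhost; no connection is set up before sending.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))          # the OS picks a free port
addr = receiver.getsockname()            # (host, port) the sender targets

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"small message", addr)    # one self-contained datagram

# recvfrom returns the data plus the source address (host, port),
# which is how the destination knows where the datagram came from.
data, source = receiver.recvfrom(1024)
receiver.close()
sender.close()
print(data)
```

Because UDP gives no delivery guarantee, a real application would add its own acknowledgement or retry logic if it needed reliability.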
File Transfer Protocol (FTP)
FTP is used to copy files from one host to another. FTP offers a mechanism for this in the
following manner:

 FTP creates two processes, a Control Process and a Data Transfer Process, at both
ends, i.e. at the client as well as at the server.

 FTP establishes two different connections: one for data transfer and the other for control
information.

 The control connection is made between the control processes, while the data connection is
made between the data transfer processes.

 FTP uses port 21 for the control connection and Port 20 for the data connection.

Trivial File Transfer Protocol (TFTP)


The Trivial File Transfer Protocol is also used to transfer files, but it transfers them without
authentication. Unlike FTP, TFTP does not separate control and data information. Since no
authentication exists, TFTP lacks security features and is therefore not recommended for use.

Key points

 TFTP makes use of UDP for data transport. Each TFTP message is carried in a separate
UDP datagram.

 The first two bytes of a TFTP message specify the type of message.

 The TFTP session is initiated when a TFTP client sends a request to upload or download
a file.

 The request is sent from an ephemeral UDP port to UDP port 69 of a TFTP server.
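The rule above that "the first two bytes of a TFTP message specify the type" can be made concrete by building a read-request (RRQ) packet, following the RFC 1350 layout. The filename `notes.txt` is purely illustrative; a real client would then send these bytes in a UDP datagram to port 69.

```python
import struct

# RFC 1350 opcodes: 1=RRQ (read request), 2=WRQ, 3=DATA, 4=ACK, 5=ERROR.
OP_RRQ = 1

def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: a 2-byte network-order opcode followed by
    a NUL-terminated filename and a NUL-terminated transfer mode."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode() + b"\0"
            + mode.encode() + b"\0")

packet = build_rrq("notes.txt")
opcode = struct.unpack("!H", packet[:2])[0]  # first two bytes give the type
print(opcode, packet)
```

The receiver inspects only those first two bytes to decide how to parse the rest of the message.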
Telnet
Telnet is a protocol used to log in to a remote computer on the internet. There are a number of
Telnet clients with user-friendly interfaces. The following diagram shows a person logged in to
computer A, and from there, remotely logged in to computer B.

Hyper Text Transfer Protocol (HTTP)


HTTP is a communication protocol. It defines the mechanism for communication between the
browser and the web server. It is also called a request-response protocol because the
communication between browser and server takes place in request and response pairs.

HTTP Request
An HTTP request comprises the following parts:
 Request line

 Header Fields

 Message body

Key Points

 The first line, i.e. the Request line, specifies the request method, e.g. GET or POST.

 The second line specifies the header, which indicates the domain name of the server from
which the page (e.g. index.htm) is retrieved.

HTTP Response
Like an HTTP request, an HTTP response also has a certain structure. An HTTP response contains:

 Status line

 Headers

 Message body
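The three-part response structure above (status line, headers, body) can be seen by pulling apart a raw response by hand. The response text here is a made-up minimal example, not the output of any real server.

```python
raw_response = (
    "HTTP/1.1 200 OK\r\n"        # status line: version, code, reason
    "Content-Type: text/html\r\n"
    "Content-Length: 13\r\n"
    "\r\n"                       # blank line separates headers from body
    "<html></html>"
)

# Split headers from body at the blank line, then break the head apart.
head, body = raw_response.split("\r\n\r\n", 1)
status_line, *header_lines = head.split("\r\n")
version, code, reason = status_line.split(" ", 2)
headers = dict(line.split(": ", 1) for line in header_lines)
print(code, reason, headers["Content-Type"], body)
```

An HTTP request has the same shape, except its first line is a request line (method, path, version) instead of a status line.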

Q5. Explain the Inter process Communication.

Ans. Inter Process Communication (IPC) is a mechanism by which one process communicates
with another process. This usually occurs within a single system.

Communication can be of two types −

 Between related processes initiating from only one process, such as parent and child
processes.

 Between unrelated processes, or two or more different processes.

Following are some important terms that we need to know before proceeding further on this
topic.

Pipes − Communication between two related processes. The mechanism is half duplex, meaning
the first process communicates with the second process. To achieve full duplex, i.e. for the
second process to communicate with the first, another pipe is required.
FIFO − Communication between two unrelated processes. A FIFO is full duplex, meaning the
first process can communicate with the second process and vice versa at the same time.

Message Queues − Communication between two or more processes with full duplex capacity.
The processes will communicate with each other by posting a message and retrieving it out of
the queue. Once retrieved, the message is no longer available in the queue.

Shared Memory − Communication between two or more processes is achieved through a


shared piece of memory among all processes. The shared memory needs to be protected from
each other by synchronizing access to all the processes.

Semaphores − Semaphores are meant for synchronizing access among multiple processes. When
one process wants to access the memory (for reading or writing), the memory needs to be locked
(or protected) and released when the access is finished. This needs to be repeated by all the
processes to keep the data safe.

Signals − A signal is a mechanism for communication between multiple processes by way of
signaling. A source process sends a signal (identified by a number) and the destination process
handles it accordingly.

A program is a file containing the information of a process and how to build it during run time.
When you start execution of the program, it is loaded into RAM and starts executing.

Each process is identified with a unique positive integer called as process ID or simply PID
(Process Identification number). The kernel usually limits the process ID to 32767, which is
configurable. When the process ID reaches this limit, the counter is reset to start again beyond
the range reserved for system processes. Unused process IDs from that counter are then assigned
to newly created processes.

The system call getpid() returns the process ID of the calling process.

#include <sys/types.h>
#include <unistd.h>

pid_t getpid(void);  /* returns the process ID of the calling process */

This call returns the process ID of the calling process, which is guaranteed to be unique. The
call always succeeds, so there is no return value to indicate an error.

Each process has its unique process ID, but who created it? How do we get information about
its creator? The creator process is called the parent process, and the parent ID, or PPID, can be
obtained through the getppid() call.

Assignment No.3
Q.1 Explain the Network Virtualization.

Ans. Virtualization is a technology that helps us install different operating systems on the same
hardware, completely separated and independent from each other. In Wikipedia, you can find
the definition as – “In computing, virtualization is a broad term that refers to the
abstraction of computer resources.

Virtualization hides the physical characteristics of computing resources from their users, their
applications or end users. This includes making a single physical resource (such as a server, an
operating system, an application or a storage device) appear to function as multiple virtual
resources. It can also include making multiple physical resources (such as storage devices or
servers) appear as a single virtual resource...”

Virtualization is often −

 The creation of many virtual resources from one physical resource.

 The creation of one virtual resource from one or more physical resources.
Types of Virtualization

Today the term virtualization is widely applied to a number of concepts, some of which are
described below −

 Server Virtualization

 Client & Desktop Virtualization

 Services and Applications Virtualization

 Network Virtualization

 Storage Virtualization
Let us now discuss each of these in detail.

Server Virtualization

This is virtualizing your server infrastructure, so that you no longer have to use separate
physical servers for different purposes.

Client & Desktop Virtualization

This is similar to server virtualization, but this time it is on the user's side, where you virtualize
their desktops. We replace their desktops with thin clients that utilize datacenter
resources.
Services and Applications Virtualization

The virtualization technology isolates applications from the underlying operating system and
from other applications, in order to increase compatibility and manageability. For example –
Docker can be used for that purpose.

Network Virtualization

It is a part of the virtualization infrastructure, used especially if you are going to virtualize
your servers. It helps you create multiple switches, VLANs, NAT, etc.

The following illustration shows the VMware schema −


Storage Virtualization

This is widely used in datacenters where you have large storage, and it helps you create,
delete and allocate storage to different hardware. This allocation is done over a network
connection. The leading storage technology here is the SAN. A schematic illustration is given below −

Q2. Case study: MPI.

Ans. Parallel computing is now as much a part of everyone’s life as personal computers, smart
phones, and other technologies are. You obviously understand this, because you have embarked
upon the MPI Tutorial website. Whether you are taking a class about parallel programming,
learning for work, or simply learning it because it’s fun, you have chosen to learn a skill that will
remain incredibly valuable for years to come. In my opinion, you have also taken the right path
to expanding your knowledge about parallel programming - by learning the Message Passing
Interface (MPI). Although MPI is lower level than most parallel programming libraries (for
example, Hadoop), it is a great foundation on which to build your knowledge of parallel
programming.

MPI’s design for the message passing model


Before starting the tutorial, I will cover a couple of the classic concepts behind MPI’s design of
the message passing model of parallel programming. The first concept is the notion of
a communicator. A communicator defines a group of processes that have the ability to
communicate with one another. In this group of processes, each is assigned a unique rank, and
they explicitly communicate with one another by their ranks.
The foundation of communication is built upon send and receive operations among processes. A
process may send a message to another process by providing the rank of the process and a
unique tag to identify the message. The receiver can then post a receive for a message with a
given tag (or it may not even care about the tag), and then handle the data accordingly.
Communications such as this, which involve one sender and one receiver, are known as point-
to-point communications.

There are many cases where processes may need to communicate with everyone else. For
example, when a master process needs to broadcast information to all of its worker processes. In
this case, it would be cumbersome to write code that does all of the sends and receives. In fact, it
would often not use the network in an optimal manner. MPI can handle a wide variety of these
types of collective communications that involve all processes.

Mixtures of point-to-point and collective communications can be used to create highly complex
parallel programs. In fact, this functionality is so powerful that it is not even necessary to start
describing the advanced mechanisms of MPI. We will save that until a later lesson. For now, you
should work on installing MPI on a single machine or launching an Amazon EC2 MPI cluster. If
you already have MPI installed, great! You can head over to the MPI Hello World lesson.

In the early days of parallel computing, most parallel applications were in the science and
research domains. The model most commonly adopted by the libraries was the message
passing model. What is the message
passing model? All it means is that an application passes messages among processes in order to
perform a task. This model works out quite well in practice for parallel applications. For
example, a master process might assign work to slave processes by passing them a message that
describes the work. Another example is a parallel merge sorting application that sorts data
locally on processes and passes results to neighboring processes to merge sorted lists. Almost
any parallel application can be expressed with the message passing model.

Since most libraries at this time used the same message passing model with only minor feature
differences among them, the authors of the libraries and others came together at the
Supercomputing 1992 conference to define a standard interface for performing message passing -
the Message Passing Interface. This standard interface would allow programmers to write
parallel applications that were portable to all major parallel architectures. It would also allow
them to use the features and models they were already used to using in the current popular
libraries.

Q3. Explain Remote Invocation in detail.

Ans. A remote procedure call (RPC) is an interprocess communication technique used for
client-server based applications. It is also known as a subroutine call or a function call.

A client has a request message that the RPC translates and sends to the server. This request may
be a procedure or a function call to a remote server. When the server receives the request, it
sends the required response back to the client. The client is blocked while the server is
processing the call and only resumes execution after the server has finished.

The sequence of events in a remote procedure call are given as follows:


 The client stub is called by the client.
 The client stub makes a system call to send the message to the server and puts the
parameters in the message.
 The message is sent from the client to the server by the client’s operating system.
 The message is passed to the server stub by the server operating system.
 The parameters are removed from the message by the server stub.
 Then, the server procedure is called by the server stub.

A diagram that demonstrates this is as follows:
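The stub-based sequence above can be sketched with Python's standard-library XML-RPC modules, where `ServerProxy` plays the client stub role: the call blocks until the server's reply arrives. The procedure name `add`, the localhost address, and the use of XML-RPC (rather than any particular RPC system) are assumptions for the sketch.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Server side: register a procedure that remote clients may call.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
port = server.socket.getsockname()[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the proxy acts as the client stub. It marshals the
# parameters, sends the message, and blocks until the reply arrives.
client = ServerProxy(f"http://127.0.0.1:{port}/")
result = client.add(2, 3)
server.shutdown()
print(result)
```

The marshalling, transport, and unmarshalling steps listed above all happen inside the proxy and server classes, which is exactly the "hidden message passing" advantage noted below.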

Advantages of Remote Procedure Call

Some of the advantages of RPC are as follows:


 Remote procedure calls support process oriented and thread oriented models.
 The internal message passing mechanism of RPC is hidden from the user.
 The effort to re-write and re-develop the code is minimum in remote procedure calls.
 Remote procedure calls can be used in distributed environment as well as the local
environment.
 Many of the protocol layers are omitted by RPC to improve performance.

Disadvantages of Remote Procedure Call

Some of the disadvantages of RPC are as follows:


 The remote procedure call is a concept that can be implemented in different ways; it is
not a standard.
 There is no flexibility in RPC for hardware architecture, as it is only interaction based.
 Remote procedure calls increase costs.
RMI (Remote Method Invocation)

The RMI (Remote Method Invocation) is an API that provides a mechanism to create
distributed applications in Java. RMI allows an object to invoke methods on an object running
in another JVM.

The RMI provides remote communication between the applications using two
objects stub and skeleton.

Understanding stub and skeleton

RMI uses stub and skeleton object for communication with the remote object.

A remote object is an object whose method can be invoked from another JVM. Let's understand
the stub and skeleton objects:

stub

The stub is an object that acts as a gateway on the client side. All outgoing requests are routed
through it. It resides on the client side and represents the remote object. When the caller invokes
a method on the stub object, the stub does the following tasks:
1. It initiates a connection with remote Virtual Machine (JVM),
2. It writes and transmits (marshals) the parameters to the remote Virtual Machine (JVM),
3. It waits for the result
4. It reads (unmarshals) the return value or exception, and
5. It finally, returns the value to the caller.
skeleton

The skeleton is an object that acts as a gateway for the server-side object. All incoming requests
are routed through it. When the skeleton receives an incoming request, it does the following
tasks:
1. It reads the parameter for the remote method
2. It invokes the method on the actual remote object, and
3. It writes and transmits (marshals) the result to the caller.
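The stub/skeleton division of labor can be sketched outside Java, using Python's `pickle` for marshalling and a socket between the two sides. The `Calculator` class, the `square` method, and the single-request server loop are all invented for this illustration; Java RMI generates the equivalent machinery automatically.

```python
import pickle
import socket
import threading

class Calculator:
    """The actual remote object living on the server side."""
    def square(self, x):
        return x * x

def skeleton(sock: socket.socket, target) -> None:
    """Server gateway: unmarshal the request, invoke the real object,
    then marshal and transmit the result back to the caller."""
    conn, _ = sock.accept()
    with conn:
        method, args = pickle.loads(conn.recv(4096))
        result = getattr(target, method)(*args)
        conn.sendall(pickle.dumps(result))

class Stub:
    """Client gateway: marshal the call, send it, await the result."""
    def __init__(self, address):
        self.address = address
    def square(self, x):
        with socket.create_connection(self.address) as conn:
            conn.sendall(pickle.dumps(("square", (x,))))
            return pickle.loads(conn.recv(4096))

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=skeleton, args=(listener, Calculator()),
                 daemon=True).start()

value = Stub(listener.getsockname()).square(7)
print(value)
```

The caller only ever sees the stub's ordinary method call; all five stub tasks and three skeleton tasks listed above happen behind that call.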

Q4. What are request-reply protocols?

Ans. Request/Reply Communication: To implement request/reply communication, the BEA


Tuxedo system uses IPC message queues. Queues are the key to connectionless communication.
Each server is assigned an Inter-Process Communication (IPC) message queue called a request
queue and each client is assigned a reply queue. Therefore, rather than establishing and
maintaining a connection with a server, a client application can send requests to the server by
putting those requests on the server's queue, and then check and retrieve messages from the
server by pulling messages from its own reply queue.

The request/reply model is used for both synchronous and asynchronous service requests as
described in the following topics.

Synchronous Messaging

In a synchronous call, a client sends a request to a server, which performs the requested action
while the client waits. The server then sends the reply to the client, which receives the reply.
Synchronous Request/Reply Communication

Asynchronous Messaging
In an asynchronous call, the BEA Tuxedo client does not wait for a service request it has
submitted to finish before undertaking other tasks. Instead, after issuing a request, the client
performs additional tasks (which may include issuing more requests). When a reply to the first
request is available, the client retrieves it.
Asynchronous Request/Reply Communication
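The queue-based request/reply pattern described above can be sketched with Python's in-process `queue` module standing in for Tuxedo's IPC message queues: the server owns a request queue, each client owns a reply queue, and no connection is maintained between them. The request text and upper-casing "service" are invented for the sketch; Tuxedo's actual API calls are not modelled.

```python
import queue
import threading

# The server's request queue; every client posts its requests here.
request_queue: queue.Queue = queue.Queue()

def server() -> None:
    """Pull one request off the request queue and post the reply on the
    requesting client's own reply queue."""
    reply_queue, payload = request_queue.get()
    reply_queue.put(payload.upper())

threading.Thread(target=server, daemon=True).start()

my_replies: queue.Queue = queue.Queue()       # this client's reply queue
request_queue.put((my_replies, "balance?"))   # enqueue the request
answer = my_replies.get()                     # later, pull the reply
print(answer)
```

Because the client hands over the request and separately pulls the reply, the same machinery supports both the synchronous case (pull immediately) and the asynchronous case (do other work first).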

Unsolicited Communication

The BEA Tuxedo system offers a powerful communication paradigm called unsolicited
notification. When unsolicited notification occurs, a BEA Tuxedo client receives a message that
it has never requested. This capability makes it possible for application clients to receive
notification of application-specific events as they occur, without having to request notification
explicitly in real time.

Unsolicited messages can be sent to client processes by name (tpbroadcast) or by an identifier


received with a previously processed message (tpnotify). Messages sent via tpbroadcast can
originate either in a service or in another client. You can target a narrow or wide audience. You
can send a message with or without guaranteed delivery to an individual client through point-to-
point notification (tpnotify), or you can send information to a group of clients (tpbroadcast). For
example, a server may alert a single client that the account about which the client is inquiring has
been closed. Or, a server may send a message to all the clients on a machine to remind the users
that the machine will be shut down for maintenance at a specific time.

Any process that wants to be notified about a particular event (such as a machine being shut
down for maintenance) can register a request, with the system, to be notified automatically.
Once registered, a client or server is informed whenever the specified event occurs. This type of
automatic communication about an event is called unsolicited notification.

Because there is no limit to the number of clients and servers that may generate events and
receive unsolicited notification about such events, the task of managing this category of
communication can become complex. The BEA Tuxedo system offers a tool for managing
unsolicited notification called the EventBroker.
Unsolicited Notification Messaging
Nested and Forwarded Service Requests

Nested Requests
A powerful feature of the BEA Tuxedo system is that it allows services to act as clients and call
other services. Nesting is limited to two levels, which works particularly well in a 3-tier
client/server architecture, that is, a system that comprises a presentation logic layer, a business
logic layer, and a database layer. In such a system, the presentation layer is used to formulate a
request for a particular business function that involves one or more queries to a database.
Because nesting is limited to two levels, it does not degrade performance.
Nested Service Requests

Forwarded Requests
One alternative to nesting service requests is called request forwarding. Instead of processing a
client's request, a service can pass the request to another service. The second service, in turn,
can either process the request or pass it to another service.
Forwarded Service Requests

There is no limit to the number of times a request can be forwarded. Because a service that
forwards a request does not need to wait for a reply from the service receiving the request,
forwarding, unlike nesting requests, does not block servers. Forwarding, however, is not
supported by the X/OPEN protocol X/ATMI, which may be a problem in some applications.

Q5. Explain the Remote method invocation.

Ans. A remote procedure call (RPC) is an interprocess communication technique used for
client-server based applications. It is also known as a subroutine call or a function call.

A client has a request message that the RPC translates and sends to the server. This request may
be a procedure or a function call to a remote server. When the server receives the request, it
sends the required response back to the client. The client is blocked while the server is
processing the call and only resumes execution after the server has finished.

The sequence of events in a remote procedure call are given as follows:


 The client stub is called by the client.
 The client stub makes a system call to send the message to the server and puts the
parameters in the message.
 The message is sent from the client to the server by the client’s operating system.
 The message is passed to the server stub by the server operating system.
 The parameters are removed from the message by the server stub.
 Then, the server procedure is called by the server stub.
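The whole round trip can be demonstrated with Python's standard-library xmlrpc module, where the ServerProxy object plays the role of the client stub (the add procedure and port choice are our own illustration, not part of any particular RPC system):

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Server side: register a procedure that remote clients may call.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the proxy is the client stub. Calling proxy.add() marshals
# the parameters into a message, sends it, and blocks until the reply arrives.
proxy = ServerProxy(f"http://127.0.0.1:{port}")
result = proxy.add(2, 3)
server.shutdown()
print(result)  # 5
```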

Advantages of Remote Procedure Call

Some of the advantages of RPC are as follows:


 Remote procedure calls support process-oriented and thread-oriented models.
 The internal message-passing mechanism of RPC is hidden from the user.
 The effort to re-write and re-develop code is minimal in remote procedure calls.
 Remote procedure calls can be used in a distributed environment as well as in a local
environment.
 Many of the protocol layers are omitted by RPC to improve performance.

Disadvantages of Remote Procedure Call

Some of the disadvantages of RPC are as follows:


 The remote procedure call is a concept that can be implemented in different ways; it is
not a standard.
 RPC offers no flexibility for hardware architecture, as it is purely interaction-based.
 Remote procedure calls increase costs.

Assignment No.4
Q1. Define Java RMI.

Ans. RMI stands for Remote Method Invocation. It is a mechanism that allows an object
residing in one system (JVM) to access/invoke an object running on another JVM.

RMI is used to build distributed applications; it provides remote communication between Java
programs. It is provided in the package java.rmi.

Architecture of an RMI Application

In an RMI application, we write two programs, a server program (resides on the server) and
a client program (resides on the client).

 Inside the server program, a remote object is created and a reference to that object is made
available to the client (using the registry).

 The client program requests the remote objects on the server and tries to invoke its
methods.



Let us now discuss the components of this architecture.

 Transport Layer − This layer connects the client and the server. It manages the existing
connection and also sets up new connections.

 Stub − A stub is a representation (proxy) of the remote object at the client. It resides in the
client system; it acts as a gateway for the client program.

 Skeleton − This is the object that resides on the server side. The stub communicates with
this skeleton to pass requests to the remote object.

 RRL(Remote Reference Layer) − It is the layer which manages the references made by
the client to the remote object.

Working of an RMI Application

The following points summarize how an RMI application works −

 When the client makes a call to the remote object, it is received by the stub which
eventually passes this request to the RRL.

 When the client-side RRL receives the request, it invokes a method called invoke() of
the object remoteRef. It passes the request to the RRL on the server side.

 The RRL on the server side passes the request to the Skeleton (proxy on the server)
which finally invokes the required object on the server.

 The result is passed all the way back to the client.

Marshalling and Unmarshalling

Whenever a client invokes a method that accepts parameters on a remote object, the parameters
are bundled into a message before being sent over the network. These parameters may be of
primitive type or objects. In case of primitive type, the parameters are put together and a header
is attached to it. In case the parameters are objects, then they are serialized. This process is
known as marshalling.

At the server side, the packed parameters are unbundled and then the required method is
invoked. This process is known as unmarshalling.
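Python's pickle module plays the same serialization role for objects, giving a simplified analogy to RMI's use of Java object serialization (the parameter names here are invented for illustration):

```python
import pickle

# Marshalling: bundle the call parameters into a byte stream for the network.
params = {"account": "A-42", "amount": 250.0}
wire_bytes = pickle.dumps(params)

# Unmarshalling: the receiving side rebuilds the original objects
# from the byte stream before invoking the required method.
unpacked = pickle.loads(wire_bytes)
print(unpacked == params)  # True
```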
RMI Registry

The RMI registry is a namespace in which all server objects are placed. Each time the server creates
an object, it registers the object with the RMI registry (using the bind() or rebind() methods).
These objects are registered under a unique name known as the bind name.

To invoke a remote object, the client needs a reference of that object. At that time, the client
fetches the object from the registry using its bind name (using lookup() method).
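The bind/lookup interaction can be mimicked with a toy in-process registry (a sketch of the idea only; the real java.rmi.registry.Registry works across JVMs and hands out remote references, not local objects):

```python
class ToyRegistry:
    """In-process stand-in for an RMI registry: bind names -> object references."""
    def __init__(self):
        self._bindings = {}

    def bind(self, name, obj):
        self._bindings[name] = obj      # server registers under a bind name

    def lookup(self, name):
        return self._bindings[name]     # client fetches a reference by bind name

registry = ToyRegistry()
service = object()                      # stand-in for a remote server object
registry.bind("BankService", service)
reference = registry.lookup("BankService")
print(reference is service)  # True
```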


Goals of RMI

Following are the goals of RMI −

 To minimize the complexity of the application.

 To preserve type safety.

 To support distributed garbage collection.

 To minimize the difference between working with local and remote objects.
Q2. Explain in details Message queues.

Ans. Distributed messaging is based on the concept of reliable message queuing. Messages are
queued asynchronously between client applications and messaging systems. A distributed
messaging system provides the benefits of reliability, scalability, and persistence.
Most of the messaging patterns follow the publish-subscribe model (simply Pub-Sub) where
the senders of the messages are called publishers and those who want to receive the messages
are called subscribers.
Once a message has been published by the sender, the subscribers can receive the selected
messages with the help of a filtering option. Usually there are two types of filtering: topic-based
filtering and content-based filtering.
Note that in the pub-sub model, participants can communicate only via messages. It is a very loosely coupled
architecture; the senders do not even know who their subscribers are. Many messaging
patterns rely on a message broker to exchange published messages for timely access by many
subscribers. A real-life example is Dish TV, which publishes different channels such as sports,
movies, and music; anyone can subscribe to their own set of channels and receive them
whenever their subscribed channels are available.
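A minimal topic-based pub-sub broker can be sketched as follows (the class and method names are illustrative, not any particular product's API):

```python
from collections import defaultdict

class Broker:
    """Toy topic-based publish-subscribe broker."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Topic-based filtering: only subscribers of this topic are notified.
        for deliver in self._subscribers[topic]:
            deliver(message)

broker = Broker()
received = []
broker.subscribe("sports", received.append)
broker.publish("sports", "match starts at 7")
broker.publish("movies", "new release")      # no sports subscriber sees this
print(received)  # ['match starts at 7']
```

Note that publishers never name their subscribers: they only name a topic, which is exactly the loose coupling described above.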
The following table describes some of the popular high-throughput messaging systems −

Apache Kafka − Kafka was developed at LinkedIn and later became a sub-project of Apache.
Apache Kafka is based on a broker-enabled, persistent, distributed publish-subscribe model.
Kafka is fast, scalable, and highly efficient.

RabbitMQ − RabbitMQ is an open-source, robust distributed messaging application. It is easy
to use and runs on all platforms.

JMS (Java Message Service) − JMS is a Java API that supports creating, reading, and sending
messages from one application to another. It provides guaranteed message delivery and follows
the publish-subscribe model.

ActiveMQ − ActiveMQ is an open-source messaging system that implements the JMS API.

ZeroMQ − ZeroMQ provides broker-less peer-to-peer message processing. It provides push-pull
and router-dealer message patterns.

Kestrel − Kestrel is a fast, reliable, and simple distributed message queue.
Thrift Protocol
Thrift was built at Facebook for cross-language services development and remote procedure call
(RPC); later, it became an open-source Apache project. Apache Thrift is an Interface
Definition Language that allows you to define new data types, and to implement services on top
of those data types, in an easy manner.
Apache Thrift is also a communication framework that supports embedded systems, mobile
applications, web applications, and many other programming languages. Key features of
Apache Thrift are its modularity, flexibility, and high performance. In addition, it can perform
streaming, messaging, and RPC in distributed applications.
Apache Storm, for example, uses the Thrift protocol extensively for its internal communication
and data definitions: a Storm topology is simply a set of Thrift structs, and Storm's Nimbus,
which runs the topology, is a Thrift service.

The concept of message queues isn’t new in the computing world. In fact, I heard this name for
the first time when studying Operating Systems at the university. Quoting the book “Operating
System Concepts”, we can notice how message queues (a.k.a. message passing) was born for
distributed computing.

Message passing provides a mechanism to allow processes to communicate and to synchronize
their actions without sharing the same address space. It is particularly useful in a distributed
environment, where the communicating processes may reside on different computers connected
by a network. For example, an Internet chat program could be designed so that chat participants
communicate with one another by exchanging messages.

Let’s make things more concrete by giving an example. When I need to communicate with a
friend, I just get my phone and send her a WhatsApp message or an e-mail so I don’t have to
disturb the person at that exact moment. I can send as many messages as needed as they will be
buffered in her phone until she is able to get it and read the messages, taking any action required.
This is a message queue between two people communicating with each other, but the same
pattern applies to distributed systems.

First, I have a process called a producer that generates messages to be processed. Then, we have
the message queue that stores messages in a queue - acting as a buffer - and routes them to
a consumer, some process that will process the messages. Since a single queue is shared between
virtually any number of producers and consumers, we can design systems that scale up easily by
simply spawning more processes on more machines, as long as they communicate through the
same queue.

Queues provide asynchronous communication: each process controls its own flow and gets
messages when it is ready. As a consequence, message queues inherently enhance workload
distribution, given that computers will keep retrieving one more message and processing it until
all incoming messages have been processed. Imagine the other scenario, where we pre-assign
messages to consumers beforehand. It might happen that one process has finished processing all
of its data whereas another one is struggling with its own workload. Since messages were
assigned exclusively to the slow process, the quick one cannot jump in and help it finish its
job. We have idle resources even though there is work to be done. Point for message queues.
Q3. Explain the Distributed objects.

Ans. The distributed object paradigm


 provides abstractions beyond those of the message-passing model.
 In object-oriented programming, objects are used to represent an entity significant to an
application.
 Each object encapsulates:
 The state or data of the entity: in Java, such data is contained in the instance variables of
each object;
 The operations of the entity, through which the state of the entity can be accessed or
updated.

Local Objects vs. Distributed Objects

 Local objects are those whose methods can only be invoked by a local process, a
process that runs on the same computer on which the object exists.
 A distributed object is one whose methods can be invoked by a remote process, a
process running on a computer connected via a network to the computer on which the
object exists.

The Distributed Object Paradigm

 In a distributed object paradigm, network resources are represented by distributed
objects.
 To request service from a network resource, a process invokes one of its operations or
methods, passing data as parameters to the method.

Distributed Object System - 1


 A distributed object is provided, or exported, by a process, here called the object
server.
 A facility, here called an object registry, must be present in the system architecture for
the distributed object to be registered.
 To access a distributed object, a process –an object client – looks up the object registry
for a reference to the object.
 This reference is used by the object client to make calls to the methods.

Distributed Object System – 2

 Logically, the object client makes a call directly to a remote method.
 In reality, the call is handled by a software component, called a client proxy, which
interacts with the software on the client host that provides the runtime support for the
distributed object system.
 The runtime support is responsible for the interprocess communication needed to transmit
the call to the remote host, including the marshalling of the argument data that needs to
be transmitted to the remote object.
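The client-proxy idea can be sketched in Python by intercepting attribute access (the transport function here is a stand-in for real marshalling and interprocess communication):

```python
class ClientProxy:
    """Forwards attribute access as 'remote' method calls via a transport."""
    def __init__(self, transport):
        self._transport = transport

    def __getattr__(self, method_name):
        def remote_call(*args):
            # A real runtime would marshal args and send them over the network;
            # here the transport just receives the method name and arguments.
            return self._transport(method_name, args)
        return remote_call

# A fake transport that records what would be sent and returns a canned reply.
sent = []
proxy = ClientProxy(lambda method, args: sent.append((method, args)) or "ok")
reply = proxy.get_balance("A-42")   # looks like a direct call on a remote object
print(sent)   # [('get_balance', ('A-42',))]
print(reply)  # ok
```

The caller writes an ordinary method call; the proxy turns it into a message, which is exactly the "logically direct, really mediated" relationship described above.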

Q4. Explain the CORBA.

Ans. The Common Object Request Broker Architecture (CORBA) is a standard developed by
the Object Management Group (OMG) to provide interoperability among distributed objects.
CORBA is the world's leading middleware solution enabling the exchange of information,
independent of hardware platforms, programming languages, and operating systems. CORBA is
essentially a design specification for an Object Request Broker (ORB), where an ORB provides
the mechanism required for distributed objects to communicate with one another, whether locally
or on remote devices, written in different languages, or at different locations on a network.
The CORBA Interface Definition Language, or IDL, allows the development of language and
location-independent interfaces to distributed objects. Using CORBA, application components
can communicate with one another no matter where they are located, or who has designed them.
CORBA provides the location transparency to be able to execute these applications.
CORBA is often described as a "software bus" because it is a software-based communications
interface through which objects are located and accessed.

Data communication from client to server is accomplished through a well-defined object-oriented
interface. The Object Request Broker (ORB) determines the location of the target
object, sends a request to that object, and returns any response back to the caller. Through this
object-oriented technology, developers can take advantage of features such as inheritance,
encapsulation, polymorphism, and runtime dynamic binding. These features allow applications
to be changed, modified, and re-used with minimal changes to the parent interface.

Interface Definition Language
A cornerstone of the CORBA standards is the Interface Definition Language. IDL is the OMG
standard for defining language-neutral APIs and provides the platform-independent delineation
of the interfaces of distributed objects. The ability of the CORBA environments to provide
consistency between clients and servers in heterogeneous environments begins with a
standardized definition of the data and operations constituting the client/server interface. This
standardization mechanism is the IDL, and is used by CORBA to describe the interfaces of
objects.
IDL defines the modules, interfaces and operations for the applications and is not considered a
programming language. The various programming languages, such as Ada, C++, or Java, supply
the implementation of the interface via standardized IDL mappings.
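For illustration, a minimal IDL definition might look like the following (a hypothetical Bank module invented for this sketch, not taken from any specific product):

```
module Bank {
  interface Account {
    float balance();
    void deposit(in float amount);
  };
};
```

An IDL compiler translates this language-neutral definition into stubs and skeletons in C++, Java, Ada, or another language via the standardized IDL mappings.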

Application development using ORBexpress
The basic steps for CORBA development are as follows: the IDL is translated
to the corresponding language (in this example, C++), mapped to the source code, compiled, and
then linked with the ORB library, resulting in the client and server implementation.
Interoperability
The first version of CORBA provided the IDL and standard mappings to just a few languages,
and as the CORBA standard has matured, CORBA 2.0 added more language bindings
(particularly C++ and Java) as well as General Inter-ORB Protocol (GIOP). When a client calls a
CORBA operation, the client ORB sends a GIOP message to the server. The server ORB
converts this request into a call on the server object and then returns the results in a GIOP reply.
This standard transfer syntax, specified by the Object Management Group, allows the
interoperability of ORB-to-ORB interaction and is designed to work over any transport protocol
meeting a minimal set of assumptions.
When GIOP is sent over TCP/IP, it is called the Internet Inter-ORB Protocol (IIOP). IIOP is
designed to allow different ORB vendors to interoperate with one another. An example of this
interoperability occurs when there is communication between an enterprise-designed ORB and a
smaller real-time application utilizing a real-time ORB.

The OMG is a non-profit consortium created in 1989 to promote the theory and practice of object
technology for the development of distributed computing systems. The goal is to provide a
common architectural framework for object-oriented applications based on widely available
interface specifications. With over 800 members, representing large and small
companies within the computer industry, OMG leads the specification development efforts for the
CORBA, OMG IDL, IIOP, OMA, UML, MOF, and CWM specifications.
The OMG does not produce software or implementation guidelines, only specifications, to
which OMG members respond through Requests for Information (RFIs) and Requests for
Proposals (RFPs). By managing these specifications, the OMG supports the adoption process for the
member companies interested in advancing the uses and applications of distributed object-
oriented computing.
Q5. Explain JINI concept.
Ans. JINI is a distributed object technology developed by Sun, partly to make better distributed
programming tools available to Java programmers, and partly to overcome some of the inherent
problems of distributed programming. RPC systems handle issues such as data transport, data
formatting, finding the correct port number, and, to some degree, finding the machine a server is
running on.
However, JINI handles additional problems:
• Finding a service when you don't know its name. Suppose you want to find a laser printer
(but you don't care which one), so you don't have a specific name to look for. None of the
RPC systems we looked at handle this issue. JINI does, by allowing the client to search for a
service based on attributes.
• Finding a replacement service if the service you’ve been using becomes unavailable, either
because of network failure or server failure. (JINI again)
• Automatic discovery — client and server discover each other automatically, and discover what
they need to know about each other.
• Coordination. Allows processes to coordinate their activities.
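Attribute-based lookup, the first feature above, can be sketched like this (a toy illustration; real JINI matches service templates and entry objects, not keyword dictionaries):

```python
class LookupService:
    """Toy attribute-based lookup in the spirit of JINI's Lookup Service."""
    def __init__(self):
        self._entries = []

    def register(self, service, **attributes):
        self._entries.append((service, attributes))

    def find(self, **required):
        # Return every service whose attributes satisfy all requirements.
        return [svc for svc, attrs in self._entries
                if all(attrs.get(k) == v for k, v in required.items())]

lookup = LookupService()
lookup.register("printer-3rd-floor", kind="printer", tech="laser")
lookup.register("printer-lobby", kind="printer", tech="inkjet")
print(lookup.find(kind="printer", tech="laser"))  # ['printer-3rd-floor']
```

The client never names a specific printer; it describes the attributes it needs and lets the lookup service find a match.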
JINI Scenarios
Motivating examples of how JINI's automatic discovery features might be used:
• telephone automatically finds answering machine
• refrigerator finds handheld PC to add milk to the shopping list (or sends message to home
delivery service so milk is on front door step)
• digital camera finds printer to print on
• PC automatically finds a printer on a network
Basic JINI Concepts
Required Servers
The following servers must be run to use JINI
• Web server. This must run on any machine which will host services, because JINI uses HTTP
to transport code.
• RMI activation daemon (rmid) must also run on any machine that will host services
• JINI Lookup Service (reggie) must run on at least one machine.
JINI services are organized into communities.
All the machines in a community will have access to the same set of services (shared resources),
and a community must have one or more Lookup Services running on it. If there’s more than one
Lookup Service in a community, then they make the same set of services available, and the
multiple Lookup services are for redundancy in case of a failure or for improved performance.
By default, a community is all the machines on your local network. If the administrator chooses,
the community may be set up to be a smaller group (for example, at WPI the communities could
be organized on department level, or in a company at the level of a workgroup.) Also, distinct
communities can be federated, so that (some or all of) the services in one community are made
available to clients in another community.
In this way, the idea of a JINI community is scalable, and the presence of a central (per-community)
Lookup Service is not a barrier to scalability.
Key concepts of JINI:
1. Discovery
2. Join
3. Lookup
4. Leasing
5. Remote Events
6. Transactions
7. Coordination

Assignment No.5
Q1. Describe term Middleware.
Ans. In distributed architecture, components are presented on different platforms and several
components can cooperate with one another over a communication network in order to achieve
a specific objective or goal.

 In this architecture, information processing is not confined to a single machine; rather, it is
distributed over several independent computers.

 A distributed system can be demonstrated by the client-server architecture, which forms
the base for multi-tier architectures; alternatives are broker architectures such as
CORBA, and the Service-Oriented Architecture (SOA).

 There are several technology frameworks to support distributed architectures, including
.NET, J2EE, CORBA, .NET Web services, AXIS Java Web services, and Globus Grid
services.

 Middleware is an infrastructure that appropriately supports the development and
execution of distributed applications. It provides a buffer between the applications and
the network.

 It sits in the middle of the system and manages or supports the different components of a
distributed system. Examples are transaction processing monitors, data converters, and
communication controllers.

Middleware as an infrastructure for distributed system


The basis of a distributed architecture is its transparency, reliability, and availability.

The following table lists the different forms of transparency in a distributed system −

Sr.No. Transparency & Description

1 Access
Hides the way in which resources are accessed and the differences in data
platforms.

2 Location
Hides where resources are located.

3 Technology
Hides different technologies, such as programming language and OS, from the user.

4 Migration / Relocation
Hides resources that may be moved to another location while in use.

5 Replication
Hides resources that may be copied at several locations.

6 Concurrency
Hides resources that may be shared with other users.

7 Failure
Hides failure and recovery of resources from the user.

8 Persistence
Hides whether a resource (software) is in memory or on disk.

Advantages
 Resource sharing − Sharing of hardware and software resources.

 Openness − Flexibility of using hardware and software of different vendors.

 Concurrency − Concurrent processing to enhance performance.

 Scalability − Increased throughput by adding new resources.

 Fault tolerance − The ability to continue in operation after a fault has occurred.

Disadvantages
 Complexity − They are more complex than centralized systems.

 Security − More susceptible to external attack.

 Manageability − More effort required for system management.

 Unpredictability − Unpredictable responses depending on the system organization and
network load.
Q2. Explain the Distributed File Systems.
Ans. A file system is responsible for the organization, storage, retrieval, naming, sharing, and
protection of files. File systems provide directory services, which convert a file name (possibly a
hierarchical one) into an internal identifier (e.g. inode, FAT index). They contain a
representation of the file data itself and methods for accessing it (read/write). The file system is
responsible for controlling access to the data and for performing low-level operations such as
buffering frequently-used data and issuing disk I/O requests.

Our goals in designing a distributed file system are to present certain degrees of transparency to
the user and the system.

Access transparency
Clients are unaware that files are distributed and can access them in the same way as
local files are accessed.
Location transparency
A consistent name space exists encompassing local as well as remote files. The name of a
file does not give it location.

Concurrency transparency
All clients have the same view of the state of the file system. This means that if one
process is modifying a file, any other processes on the same system or remote systems
that are accessing the files will see the modifications in a coherent manner.

Failure transparency
The client and client programs should operate correctly after a server failure.

Heterogeneity
File service should be provided across different hardware and operating system
platforms.

Scalability
The file system should work well in small environments (1 machine, a dozen machines)
and also scale gracefully to huge ones (hundreds through tens of thousands of systems).

Replication transparency
To support scalability, we may wish to replicate files across multiple servers. Clients
should be unaware of this.

Migration transparency
Files should be able to move around without the client’s knowledge.

Support fine-grained distribution of data
To optimize performance, we may wish to locate individual objects near the processes
that use them.

Tolerance for network partitioning
The entire network or certain segments of it may be unavailable to a client during certain
periods (e.g. disconnected operation of a laptop). The file system should be tolerant of
this.

File service types

To provide a remote system with file service, we will have to select one of two models of
operation. One of these is the upload/download model. In this model, there are two fundamental
operations: read file transfers an entire file from the server to the requesting client, and write
file copies the file back to the server. It is a simple model, and efficient in that it provides local
access to the file while it is being used. Three problems are evident. It can be wasteful if the
client needs access to only a small amount of the file data. It can be problematic if the client
doesn't have enough space to cache the entire file. Finally, what happens if others need to
modify the same file? The second model is the remote access model. The file service provides
remote operations such as open, close, read bytes, write bytes, get attributes, etc. The file system
itself runs on the servers. The drawback of this approach is that the servers are accessed for the
duration of file access rather than once to download the file and again to upload it.
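The two models can be contrasted with a toy server (the method names and the in-memory "disk" are our own invention for the sketch):

```python
class FileServer:
    """Toy file server holding files in memory."""
    def __init__(self, files):
        self.files = files

    # Upload/download model: whole-file operations only.
    def download(self, name):
        return self.files[name]            # entire file crosses the network

    def upload(self, name, data):
        self.files[name] = data

    # Remote access model: each operation is a separate server request.
    def read_bytes(self, name, offset, length):
        return self.files[name][offset:offset + length]  # only needed bytes

server = FileServer({"notes.txt": b"hello distributed world"})
whole = server.download("notes.txt")           # wasteful for a small read
chunk = server.read_bytes("notes.txt", 6, 11)  # precise, but hits the server
print(chunk)  # b'distributed'
```

The trade-off described above is visible here: download moves the whole file once, while read_bytes contacts the server for every operation.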

Another important distinction in providing file service is understanding the difference
between a directory service and a file service. A directory service, in the context of file systems,
maps human-friendly textual names for files to their internal locations, which can be used by the
file service. The file service itself provides the file interface (as mentioned above). Another
component of distributed file systems is the client module. This is the client-side interface for
file and directory service. It provides a local file system interface to client software (for example,
the VFS layer of a UNIX/Linux kernel).

Naming issues
In designing a distributed file service, we should consider whether all machines (and processes)
should have the exact same view of the directory hierarchy. We might also wish to consider
whether the name space on all machines should have a global root directory (a.k.a. “super root”)
so that files can be accessed as, for example, //server/path. This is a model that was adopted by
the Apollo Domain System, an early distributed file system, and more recently by the web
community in the construction of a uniform resource locator (URL).

In considering our goals in name resolution, we must distinguish between location transparency
and location independence. By location transparency we mean that the path name of a file gives
no hint as to where the file is located. For instance, we may refer to a file as //server1/dir/file. The
server (server1) can move anywhere without the client caring, so we have location transparency.
However, if the file moves to server2, things will not work. If we have location independence, the
files can be moved without their names changing. Hence, if machine or server names are
embedded into path names, we do not achieve location independence.

It is desirable to have access transparency, so that applications and users can access remote files
just as they access local files. To facilitate this, the remote file system name space should be
syntactically consistent with the local name space. One way of accomplishing this is by
redefining the way files are named and require an explicit syntax for identifying remote files.
This can cause legacy applications to fail and user discontent (users will have to learn a new way
of naming their files). An alternate solution is to use a file system mounting mechanism to
overlay portions of another file system over a node in a local directory structure. Mounting is
used in the local environment to construct a uniform name space from separate file systems
(which reside on different disks or partitions) as well as incorporating special-purpose file
systems into the name space (e.g. /proc on many UNIX systems allows file system access to
processes). A remote file system can be mounted at a particular point in the local directory tree.
Attempts to access files and directories under that node will be directed to the driver for that file
system.

To summarize, our naming options are:

 machine and path naming (machine:path, /machine/path).
 mount remote file systems onto the local directory hierarchy (merging the local and remote
name spaces).
 provide a single name space which looks the same on all machines.

Q3. Define the terms synchronization and replication.

Ans. Data replication and synchronization have been topics of research for quite some time in
the area of databases and distributed databases. Through the advent of mobile computing the
results of this research have to be applied to a new area of application. Before going into details
about synchronization and replication, two terms that are strongly connected to each other, a
basic introduction into mobile computing has to be given.
Imagine a sales person, who is travelling from customer to customer and is collecting orders
from them. The sales person is collecting all data on a laptop where he is also able to access data
that indicates how long an order will take to be delivered and provides the possibility to calculate
customer and order specific conditions. When the sales person is at a customer site, most of the
time communication between the laptop and the sales person's central corporate database, where
all required data should be entered or accessed, is not possible. This is where data replication
comes in. As already stated, the needed data has to be copied onto the sales person’s device in
order to provide the required functionality. On a daily basis the sales person needs to send the
new orders to the central system by any means of communication link or medium. When
reconnecting to the corporate database or application the two data sets need to be synchronized.
That way, mobile devices become mobile databases as [8] understands them. Mobile databases
appear and disappear over time in a network of connected and stationary databases as well as
other mobile databases.

Synchronization Techniques

Aspects
Synchronization can take place between two or more systems, namely in a 1:1 or 1:n
relation. That means that there is always one active system that starts and controls the
synchronization process. The active system can be any of the participating systems.
Synchronization can be triggered manually or by certain events, which requires an event
notification mechanism. A system can explicitly inform another system that it wants to
synchronize with it, which is called push synchronization; in pull synchronization, a system is
asked to engage in a synchronization process.
Layers of synchronization and solutions As already described data items can be managed in
different storage systems. The file system is one possible storage solution. Synchronization in
this layer can be achieved in many ways. Tools and protocols that exist can be grouped into
categories. Two categories are: - file system tools like the UNIX tool rsync, that is especially
designed for an efficient synchronization of files, or the coda file system, that is a distributed file
system - version control systems like CVS or the WEBDav protocol, that enables file transfer via
http and incorporates locking and versioning Synchronization mechanisms within databases are
better known as replication mechanisms and will be described in the replication techniques
section. Special constraints are imposed on mobile clients, as described in the introduction.
These have to be considered in the case of synchronization. A couple of solutions exist, that are
mostly concerned with synchronization in the application layer. Microsoft is offering a product
called Active Sync, that can be used to synchronize between PC applications and Windows CE
based mobile clients.
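The file-level synchronization described above can be sketched with a deliberately simplified example. This is a hypothetical illustration, not how rsync actually works (rsync compares rolling checksums of file blocks); the sketch only compares modification timestamps and applies a last-writer-wins rule, and the helper name `sync_last_writer_wins` is invented:

```python
def sync_last_writer_wins(local, remote):
    """Merge two file snapshots of the form {path: (mtime, content)}.

    For each path, keep the version with the newer modification time;
    files present on only one side are copied over unchanged.
    """
    merged = {}
    for path in set(local) | set(remote):
        a, b = local.get(path), remote.get(path)
        if a is None:
            merged[path] = b
        elif b is None:
            merged[path] = a
        else:
            # Ties go to the local copy; a real sync tool must also
            # detect genuine conflicts instead of silently overwriting.
            merged[path] = a if a[0] >= b[0] else b
    return merged

# The sales person's laptop reconnects to the corporate server
laptop = {"orders.csv": (200, "order v2")}
server = {"orders.csv": (100, "order v1"), "prices.csv": (50, "prices")}
synced = sync_last_writer_wins(laptop, server)
assert synced == {"orders.csv": (200, "order v2"), "prices.csv": (50, "prices")}
```

Note that last-writer-wins silently discards concurrent changes; this is exactly the kind of conflict that the replication and CRDT techniques discussed later are designed to handle more carefully.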

Replication Techniques
Replication is used to achieve better availability and performance by using multiple copies of a
server system, with availability as the main goal. If one of the copies, or replicas, is not running,
the service provided by the set of replicas can still work. Replication is also of great use when
the communication link between two systems is only intermittently available, which is very
often the case in mobile applications. In the distributed systems community, software-based
replication, as described here, is seen as a cost-effective way to increase availability; in the
database community, replication is used for both performance and fault-tolerance purposes.
Replication is thus used in both the database community and the distributed systems
community. An abstract functional model can be used to describe existing solutions in both
fields and to compare replication protocols; the classification it introduces is semantically
equivalent to the one used in the following review of replication techniques.

Replication in Databases
Server systems usually depend on a resource, most often a database system. Replicating the
server without the database improves availability but not performance, and leaves the database
as a single point of failure. Performance is not improved because all access, query and update
operations are still performed on one database, resulting in query-update contention. Therefore,
in addition to replicating the server system, the resource needs to be replicated in order to gain
substantial advantages over systems that rely on a single resource. Databases are collections of
data items controlled by a database management system; replicated databases are therefore a
collection or set of databases that store copies of identical data items. A data item can be
referred to as the logical data item or by its physical location.

Q4. Define Conflict-Free Replicated Data Types (CRDT).


Ans. In short, CRDTs are objects that can be updated without expensive
synchronization/consensus, and they are guaranteed to converge eventually if all concurrent
updates are commutative (see below) and if every update is eventually executed by each replica.
To give these guarantees, the objects have to satisfy certain conditions, which I will briefly
describe below. For more details and proofs, take a look at Marc Shapiro's papers given in the
references section below.

In their papers, Shapiro et al. consider two models of replication in an eventually consistent
distributed system, the state-based and the operation-based approach, and based on the
replication model they define two types of CRDTs: CvRDT (convergent replicated data type)
and CmRDT (commutative replicated data type). Interestingly, they show that these two
replication models, and the two types of CRDTs, are equivalent. First let's take a look at the two
replication approaches, and then at a simple CRDT example to make the concept concrete.

State-based replication: When a replica receives an update from a client it first updates its local
state, and then some time later it sends its full state to another replica. So occasionally every
replica is sending its full state to some other replica in the system. And a replica that receives the
state of another replica applies a merge function to merge its local state with the state it just
received. Similarly this replica also occasionally sends its state to another replica, so every update
eventually reaches all replicas in the system. In their paper, Shapiro et al. show that if the set of
values the state can take forms a semi-lattice (a partially ordered set with a join/least-upper-bound
operation), if updates are increasing (e.g., the state is an integer and each update is an
increment), and if the merge function computes the least upper bound, then replicas are
guaranteed to converge to the same value (the least upper bound of the most recent updates).
For the set of all possible system states to form a semi-lattice, the merge operation has to be
idempotent, associative, and commutative. A replicated object satisfying this property (called
the monotonic semi-lattice property in the paper) is one type of CRDT, namely a CvRDT, a
convergent replicated data type.

[Figure: State-based approach. "s" denotes the source replica where the initial update is applied.
From [2].]

Operation-based replication: In this approach a replica doesn't send its full state (which can be
huge) to another replica. Instead, it broadcasts the update operation to all the other replicas in
the system and expects them to replay that update (similar to state machine replication). Since
this is a broadcast, if two updates u_1 and u_2 are applied at some replica i, and i sends these
updates to two other replicas r_1 and r_2, the updates may arrive at those replicas in different
orders: r_1 can receive u_1 followed by u_2, while r_2 receives u_2 followed by u_1. How do
these replicas converge, then? They converge if the updates are commutative, i.e. no matter in
which order the updates are applied at a replica, the resulting state is the same. In this model,
where updates are broadcast to all replicas, an object for which all concurrent updates are
commutative is called a CmRDT (commutative replicated data type).

[Figure: Operation-based approach. "s" denotes source replicas and "d" denotes the downstream
replicas. From [2].]
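To see concretely why commutativity makes operation-based replication converge, here is a minimal sketch. The `apply_ops` helper and the two replicas are invented for illustration; only additive updates are used, because additions commute with each other:

```python
def apply_ops(state, ops):
    """Replay a sequence of broadcast update operations on a replica's state."""
    for op in ops:
        state = op(state)
    return state

inc = lambda s: s + 1    # increment by one
add5 = lambda s: s + 5   # add five; additions commute with each other

# Replicas r1 and r2 receive the same two broadcast updates in different orders
r1 = apply_ops(0, [inc, add5])
r2 = apply_ops(0, [add5, inc])
assert r1 == r2 == 6  # commutative updates converge regardless of delivery order
```

Had one of the updates been non-commutative (say, "set state to 10"), the two replicas would end up in different states, which is exactly why CmRDTs restrict concurrent updates to commutative ones.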

The simplest CRDT example is the following integer vector. Assume that we are using the state-
based replication model. To have an integer vector CRDT, we need to show that the set of integer
vectors is a semi-lattice (has a partial order among its elements and a join/least upper bound
operation). In fact it is, because we can (partially) order two vectors v and v’ by defining a binary
relation v <= v’ as ∀i v[i] <= v’[i], that is, a vector v is less than or equal to a vector v’ if each
integer in v is less than or equal to the integer in v’ at the same index (e.g., [3,6] <= [4,7]). And we also need to
define a join/least upper bound operation for the merge operation, which we define as the per-
index maximum operation. For example, assume that a replica has state [3,5] and it sends its state
to another replica that has state [4,2] then the result of the merge operation at this replica will be
[4,5]. The final condition is that the state should be monotonically increasing as a result of
updates, and this holds if we define the update operation to be the increment operation for
index i. If you think about it, since the state (in this case the integers in a vector) is just
monotonically increasing, and since each replica does state merges by taking the per-index-
maximum, then eventually the final value at every index will be the maximum value it has ever
been updated to, so the state will eventually converge on all the replicas. This is just a simple
example; perhaps surprisingly, the principles mentioned here make it possible to define complex
CRDTs such as sets, maps, and graphs. Please see the papers below for more complex
examples.
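The integer-vector example above can be written out as a small state-based CRDT. This is an illustrative sketch (the class name `GCounterVector` is invented): per-index increment is the monotonic update, and per-index maximum is the merge, exactly as described in the text.

```python
class GCounterVector:
    """CvRDT sketch: a grow-only integer vector.

    Updates monotonically increase one index; merge takes the per-index
    maximum, which is the least upper bound in the vector semi-lattice.
    """
    def __init__(self, size):
        self.state = [0] * size

    def increment(self, i):
        self.state[i] += 1

    def merge(self, other):
        self.state = [max(a, b) for a, b in zip(self.state, other.state)]

# Two replicas diverge, then exchange and merge their full states
a, b = GCounterVector(2), GCounterVector(2)
a.increment(0); a.increment(0); a.increment(1)   # a.state == [2, 1]
b.increment(1); b.increment(1)                   # b.state == [0, 2]
a.merge(b); b.merge(a)
assert a.state == b.state == [2, 2]  # per-index max, as in merge([3,5],[4,2]) = [4,5]
```

Because max is idempotent, associative, and commutative, replicas can exchange states in any order and any number of times and still converge.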

CRDTs are addressing an interesting and a fundamental problem in distributed systems, but they
have some important limitations which Shapiro et al. acknowledge in [2]: “Since, by design, a
CRDT does not use consensus, the approach has strong limitations; nonetheless, some interesting
and non-trivial CRDTs are known to exist.". The limitation is that CRDTs address only part of
the problem space: not all update operations are commutative, so not all problems can be cast as
CRDTs. On the other hand, for some types of applications CRDTs can definitely be useful, as
they provide a nice abstraction for implementing replicated distributed systems while giving
theoretical consistency guarantees.
Q5. Explain term Distributed deadlocks.
Ans. Deadlock is a state of a database system in which two or more transactions are each
waiting for a data item that is locked by another transaction. A deadlock can be indicated by a
cycle in the wait-for-graph: a directed graph in which the vertices denote transactions and the
edges denote waits for data items.

For example, consider a wait-for-graph in which transaction T1 is waiting for data item X,
which is locked by T3; T3 is waiting for Y, which is locked by T2; and T2 is waiting for Z,
which is locked by T1. A waiting cycle is formed, and none of the transactions can proceed with
execution.
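The cycle test on a wait-for-graph can be sketched with a depth-first search. This is an illustrative sketch, not a production detector; `has_cycle` is an invented name, and the graph is represented as a dict mapping each transaction to the set of transactions it waits for:

```python
def has_cycle(wait_for):
    """Detect a cycle in a wait-for graph {txn: set of txns it waits for}."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on stack / done
    color = {t: WHITE for t in wait_for}

    def visit(t):
        color[t] = GRAY
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:
                return True               # back edge: a waiting cycle exists
            if color.get(u, WHITE) == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in wait_for)

# T1 -> T3 -> T2 -> T1 is the cycle from the example above
g = {"T1": {"T3"}, "T3": {"T2"}, "T2": {"T1"}}
assert has_cycle(g)
assert not has_cycle({"T1": {"T2"}, "T2": set()})  # a chain is not a deadlock
```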

Deadlock Handling in Centralized Systems


There are three classical approaches for deadlock handling, namely −

 Deadlock prevention.
 Deadlock avoidance.
 Deadlock detection and removal.
All of the three approaches can be incorporated in both a centralized and a distributed database
system.

Deadlock Prevention
The deadlock prevention approach does not allow any transaction to acquire locks that would
lead to a deadlock. The convention is that when more than one transaction requests a lock on the
same data item, only one of them is granted the lock.

One of the most popular deadlock prevention methods is pre-acquisition of all the locks. In this
method, a transaction acquires all the locks before starting to execute and retains the locks for
the entire duration of transaction. If another transaction needs any of the already acquired locks,
it has to wait until all the locks it needs are available. Using this approach, the system is
prevented from being deadlocked since none of the waiting transactions are holding any lock.
Deadlock Avoidance
The deadlock avoidance approach handles deadlocks before they occur. It analyzes the
transactions and the locks to determine whether or not waiting leads to a deadlock.

The method can be briefly stated as follows. Transactions start executing and request data items
that they need to lock. The lock manager checks whether the lock is available. If it is available,
the lock manager allocates the data item and the transaction acquires the lock. However, if the
item is locked by some other transaction in incompatible mode, the lock manager runs an
algorithm to test whether keeping the transaction in waiting state will cause a deadlock or not.
Accordingly, the algorithm decides whether the transaction can wait or one of the transactions
should be aborted.

There are two algorithms for this purpose, namely wait-die and wound-wait. Let us assume
that there are two transactions, T1 and T2, where T1 tries to lock a data item which is already
locked by T2. The algorithms are as follows −

 Wait-Die − If T1 is older than T2, T1 is allowed to wait. Otherwise, if T1 is younger
than T2, T1 is aborted and later restarted.

 Wound-Wait − If T1 is older than T2, T2 is aborted and later restarted. Otherwise, if T1
is younger than T2, T1 is allowed to wait.
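The two rules can be sketched as pure decision functions, assuming each transaction carries a timestamp and a smaller timestamp means an older transaction. The function names and return strings are invented for illustration:

```python
def wait_die(requester_ts, holder_ts):
    """Wait-Die: an older requester waits; a younger requester dies.

    Smaller timestamp == older transaction.
    """
    return "wait" if requester_ts < holder_ts else "abort_requester"

def wound_wait(requester_ts, holder_ts):
    """Wound-Wait: an older requester wounds (aborts) the lock holder;
    a younger requester waits.
    """
    return "abort_holder" if requester_ts < holder_ts else "wait"

# T1 (ts=1) is older than T2 (ts=2); T1 requests a lock held by T2
assert wait_die(1, 2) == "wait"              # older T1 waits for T2
assert wait_die(2, 1) == "abort_requester"   # younger requester dies
assert wound_wait(1, 2) == "abort_holder"    # older T1 wounds T2
assert wound_wait(2, 1) == "wait"            # younger requester waits
```

Both rules prevent deadlock because waiting is only ever allowed in one direction of transaction age, so no waiting cycle can form.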

Deadlock Detection and Removal


The deadlock detection and removal approach runs a deadlock detection algorithm periodically
and removes deadlock in case there is one. It does not check for deadlock when a transaction
places a request for a lock. When a transaction requests a lock, the lock manager checks
whether it is available. If it is available, the transaction is allowed to lock the data item;
otherwise the transaction is allowed to wait.

Since no precautions are taken while granting lock requests, some of the transactions may
become deadlocked. To detect deadlocks, the lock manager periodically checks whether the
wait-for-graph has cycles. If the system is deadlocked, the lock manager chooses a victim
transaction from each cycle. The victim is aborted and rolled back, and then restarted later.
Some of the methods used for victim selection are −

 Choose the youngest transaction.


 Choose the transaction with fewest data items.
 Choose the transaction that has performed least number of updates.
 Choose the transaction having least restart overhead.
 Choose the transaction which is common to two or more cycles.
This approach is primarily suited for systems where the number of transactions is low and a fast
response to lock requests is needed.

Deadlock Handling in Distributed Systems


Transaction processing in a distributed database system is also distributed, i.e. the same
transaction may be processing at more than one site. The two main deadlock handling concerns
in a distributed database system that are not present in a centralized system are transaction
location and transaction control. Once these concerns are addressed, deadlocks are handled
through any of deadlock prevention, deadlock avoidance or deadlock detection and removal.

Transaction Location
Transactions in a distributed database system are processed in multiple sites and use data items
in multiple sites. The amount of data processing is not uniformly distributed among these sites.
The time period of processing also varies. Thus the same transaction may be active at some sites
and inactive at others. When two conflicting transactions are located in a site, it may happen
that one of them is in inactive state. This condition does not arise in a centralized system. This
concern is called transaction location issue.

This concern may be addressed by Daisy Chain model. In this model, a transaction carries
certain details when it moves from one site to another. Some of the details are the list of tables
required, the list of sites required, the list of visited tables and sites, the list of tables and sites
that are yet to be visited and the list of acquired locks with types. After a transaction terminates
by either commit or abort, the information should be sent to all the concerned sites.

Transaction Control
Transaction control is concerned with designating and controlling the sites required for
processing a transaction in a distributed database system. There are many options regarding the
choice of where to process the transaction and how to designate the center of control, like −

 One server may be selected as the center of control.


 The center of control may travel from one server to another.
 The responsibility of controlling may be shared by a number of servers.
Distributed Deadlock Prevention
Just like in centralized deadlock prevention, in distributed deadlock prevention approach, a
transaction should acquire all the locks before starting to execute. This prevents deadlocks.

The site where the transaction enters is designated as the controlling site. The controlling site
sends messages to the sites where the data items are located to lock the items. Then it waits for
confirmation. When all the sites have confirmed that they have locked the data items,
transaction starts. If any site or communication link fails, the transaction has to wait until they
have been repaired.

Though the implementation is simple, this approach has some drawbacks −

 Pre-acquisition of locks takes a long time because of communication delays. This
increases the time required for a transaction.

 In case of site or link failure, a transaction has to wait for a long time so that the sites
recover. Meanwhile, in the running sites, the items are locked. This may prevent other
transactions from executing.

 If the controlling site fails, it cannot communicate with the other sites. These sites
continue to keep the locked data items in their locked state, thus resulting in blocking.

Distributed Deadlock Avoidance


As in centralized system, distributed deadlock avoidance handles deadlock prior to occurrence.
Additionally, in distributed systems, transaction location and transaction control issues needs to
be addressed. Due to the distributed nature of the transaction, the following conflicts may occur

 Conflict between two transactions in the same site.


 Conflict between two transactions in different sites.
In case of conflict, one of the transactions may be aborted or allowed to wait as per distributed
wait-die or distributed wound-wait algorithms.

Let us assume that there are two transactions, T1 and T2. T1 arrives at Site P and tries to lock a
data item which is already locked by T2 at that site. Hence, there is a conflict at Site P. The
algorithms are as follows −

 Distributed Wait-Die

o If T1 is older than T2, T1 is allowed to wait. T1 can resume execution after Site P
receives a message that T2 has either committed or aborted successfully at all
sites.

o If T1 is younger than T2, T1 is aborted. The concurrency control at Site P sends a
message to all sites that T1 has visited to abort T1. The controlling site notifies
the user when T1 has been successfully aborted at all the sites.

 Distributed Wound-Wait
o If T1 is older than T2, T2 needs to be aborted. If T2 is active at Site P, Site P
aborts and rolls back T2 and then broadcasts this message to other relevant sites.
If T2 has left Site P but is active at Site Q, Site P broadcasts that T2 has been
aborted; Site Q then aborts and rolls back T2 and sends this message to all sites.

o If T1 is younger than T2, T1 is allowed to wait. T1 can resume execution after
Site P receives a message that T2 has completed processing.

Distributed Deadlock Detection


Just like centralized deadlock detection approach, deadlocks are allowed to occur and are
removed if detected. The system does not perform any checks when a transaction places a lock
request. For implementation, global wait-for-graphs are created. Existence of a cycle in the
global wait-for-graph indicates deadlocks. However, it is difficult to spot deadlocks since
transaction waits for resources across the network.

Alternatively, deadlock detection algorithms can use timers. Each transaction is associated with
a timer which is set to a time period in which a transaction is expected to finish. If a transaction
does not finish within this time period, the timer goes off, indicating a possible deadlock.

Another tool used for deadlock handling is a deadlock detector. In a centralized system there is
one deadlock detector; in a distributed system there can be more than one. A deadlock detector
can find deadlocks for the sites under its control. There are three alternatives for deadlock
detection in a distributed system, namely −

 Centralized Deadlock Detector − One site is designated as the central deadlock


detector.

 Hierarchical Deadlock Detector − A number of deadlock detectors are arranged in


hierarchy.

 Distributed Deadlock Detector − All the sites participate in detecting deadlocks and
removing them.
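Under the hierarchical and distributed alternatives, each site contributes its local wait-for-graph, and a detector unions them into a global graph before running a cycle check. A minimal sketch of the union step follows (the helper name `merge_wait_for_graphs` is invented for illustration):

```python
def merge_wait_for_graphs(site_graphs):
    """Union per-site wait-for graphs into one global wait-for graph.

    Each graph maps a transaction to the set of transactions it waits
    for; a cycle detector is then run on the merged result.
    """
    global_graph = {}
    for g in site_graphs:
        for txn, waits in g.items():
            global_graph.setdefault(txn, set()).update(waits)
    return global_graph

# Site 1 sees only T1 -> T3; site 2 sees T3 -> T2 and T2 -> T1.
# Neither local graph contains a cycle, but the global graph does.
site1 = {"T1": {"T3"}}
site2 = {"T3": {"T2"}, "T2": {"T1"}}
assert merge_wait_for_graphs([site1, site2]) == {
    "T1": {"T3"}, "T3": {"T2"}, "T2": {"T1"}
}
```

The example illustrates why local detection alone is insufficient: the deadlock only becomes visible once the per-site graphs are combined.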
Devraj Institute of Management and Technology, Firozpur
First Mid Sem Test (Jan-May,2019 Session)

Subject- Distributed Computing Branch/Sem: B.tech (CSE 6th sem)


Maximum Marks: 24 Time allowed: 1.30hours
Instructions
 Section A is compulsory (Each carry 2 marks)
 Do any 2 questions from section B (Each carry 4 marks)
 Do any 1 question from section C (Each carry 8 marks)

SECTION-A

Q1. What is a Distributed System?


Q2. What is a Network?
Q3. What are Distributed Objects?
Q4. What is a System Model?
SECTION-B

Q1. Case study: World Wide Web.


Q2. Explain in brief External data representation.
Q3. Define CORBA.
SECTION –C

Q1. What are Remote Method Invocation and remote objects?


Q2 Case study: Java RMI - Group communication.
Devraj Institute of Management and Technology, Firozpur
Second Mid Sem Test (Jan-May,2019 Session)

Subject- Distributed Computing Branch/Sem: B.tech (CSE 6th sem)


Maximum Marks: 24 Time allowed: 1.30hours

Instructions
 Section A is compulsory (Each carry 2 marks)
 Do any 2 questions from section B (Each carry 4 marks)
 Do any 1 question from section C (Each carry 8 marks)

SECTION-A

Q1. What is a Distributed File System?


Q2. What is the file service architecture?
Q3. What are Distributed Objects?
Q4. What is a System Model?
SECTION-B

Q1. Case study: Pastry, Tapestry


Q2. Explain in brief Global states.
Q3. Case study Coda.
SECTION –C

Q1.Explain in brief Peer-to-peer Systems.


Q2 Explain in brief Transactions and Concurrency Control
