
CSE2052 - DISTRIBUTED SYSTEMS

MODULE 1

Introduction to Distributed Systems


INTRODUCTION TO DISTRIBUTED SYSTEMS

• Distributed System
• Trends in Distributed Systems
• Focus on resource sharing
• Distributed System model
• Challenges
• Examples of Distributed Systems
• Case study
A distributed system, also known as distributed computing, is a system
with multiple components located on different machines that
communicate and coordinate actions in order to appear as a single
coherent system to the end-user.
• A distributed system is one in which the components located at networked
computers communicate and coordinate their actions only by passing
messages.
• A collection of independent computers that appears to its users as a single
coherent system.
• A distributed system is often organized as middleware.
• The middleware layer extends over multiple machines.
• Distributed system = independent processors + networking infrastructure.
• A network of workstations allocated to users
• A pool of processors in the machine room allocated dynamically
• A single file system (all users access files with the same path name)
• User commands executed in the best place (the user's workstation, a workstation
belonging to someone else, or an unassigned processor in the machine
room)
Goals of Distributed System

• Resource Sharing - Easy for users to access remote resources.
• Heterogeneity - Networks, Hardware, OS, Implementations.
• Transparency - To hide the fact that processors and resources are
physically distributed across multiple computers.
• Openness - To offer services according to standard rules.
• Scalability - Easy to expand and manage
Characteristics

• Concurrency - Parallel programs execute on different computer systems, run by
different people.
• Everyone can use their own system, but they can share their resources (for
example, files as web pages) with others.
• Components carry out tasks independently.
• Tasks coordinate their actions by exchanging messages.
• No global clock
• Independent failure of components - Faults or failures that occur in the system
can be identified and rectified easily.
Examples

Web search
• Over 10 billion web searches are made per calendar month.
• Web search engines index the entire contents of the WWW (web pages, multimedia
sources, books).
• The Web consists of over 63 billion pages and 1 trillion unique web addresses.
• Challenge: Analysing the entire web content and carrying out sophisticated
processing on the entire database.
Financial trading
• The finance industry also needs distributed systems for real-time access and
automation of processes.
• The amount of data being generated has caused a big data revolution.
• Companies are finding new ways to gather and analyze massive amounts of
data.
• Trillions of dollars are traded in the market each day.
• Analyzing this activity and trying to profit from it is a major concern.
• Systems that notify interested parties of events, such as a drop in a share price
or the release of the latest trends, are called distributed event-based systems.
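
The event notification described above can be sketched as a small publish-subscribe bus. This is only an in-process illustration, not a real trading system: the class name `EventBus`, the topic name `price-drop` and the event fields are assumptions made up for the example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class EventBus:
    """Toy, in-process stand-in for a distributed event-based system (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        """Register interest in a kind of event."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver the event to every subscriber of the topic; return how many were notified."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = EventBus()
alerts = []


def on_price_drop(event: dict) -> None:
    # A subscriber reacts to the notification; here it just records a message.
    alerts.append(f"{event['symbol']} fell to {event['price']}")


bus.subscribe("price-drop", on_price_drop)
notified = bus.publish("price-drop", {"symbol": "ACME", "price": 41.5})
```

In a real deployment the publisher and the subscribers would run on different machines and the bus would be a network service; the sketch only shows the decoupling between the two sides.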
Massively multiplayer online games (MMOGs)
• MMOGs offer an immersive experience where large numbers of users interact
through the Internet with a persistent virtual world.
Example: Sony’s EverQuest II, and EVE Online from the Icelandic company CCP Games.

• The need for fast response times to preserve the user experience of the
game.
• The real-time propagation of events to the many players and maintaining a
consistent view of the shared world.
Applications

• Finance and commerce
• The information society
• Creative industries and entertainment
• Healthcare
• Education
• Transport and logistics
• Science
• Environmental management
Advantages

• Economical - Microprocessors offer a better price/performance ratio than
mainframes
• Speed - A distributed system may have more total computing power than a
mainframe
• Inherent distribution - Some applications involve spatially separated
machines
• Reliability - If one machine crashes, the system as a whole can still survive
• Incremental growth - Computing power can be added in small increments
Disadvantages
• Security: Data can be accessed by unauthorized users through network
interfaces.
• Privacy: Data can be accessed securely but without the knowledge of the
owner.
• Data integration and consistency: Being able to synchronize the order of
changes to data and states of the application in a distributed system is
challenging, especially when nodes are starting, stopping or failing.
• Network and communication failure: Messages delivered to incorrect nodes or
in an incorrect order may lead to a breakdown in communication and functionality.
• Management overhead: More intelligence, monitoring, logging and load-balancing
functions need to be added for visibility into the operation and
failures of the distributed system.
Trends in Distributed System

• Emergence of pervasive networking technology.
• Emergence of ubiquitous computing.
• Increasing demand for multimedia services.
• Distributed systems as a utility.
Pervasive Networking and the modern Internet

• The Internet is a vast interconnected collection of computer networks of many
different types.
• Networking has become a pervasive resource, and devices can be connected at
any time and in any place.
• It enables users to use services such as the WWW, email and file transfer.
• An intranet is a network in which multiple PCs are connected to each other.
• It is a subnetwork operated by a company or other organization, and it is
protected by firewalls.
• Firewall - Protects an intranet by preventing unauthorized messages from
leaving or entering.
• ISPs are companies that provide broadband links and other services.
• Intranets are linked together by backbones.
• A backbone is a network link with high transmission capacity, using satellite
connections, fibre optic cables and other high-bandwidth circuits.

Challenge: Firewalls can be problematic in distributed systems by
impeding legitimate access to services when resources are shared.
Mobile and Ubiquitous computing
• New inventions in technology appear every day.
• These advances have led to the integration of small and portable computing
devices into distributed systems, such as:
• Laptop computers, handheld devices, mobile phones, GPS-enabled devices, PDAs,
pagers, smart watches and digital cameras.
• Devices embedded in appliances such as washing machines, Wi-Fi systems, cars and
refrigerators.
• Mobile computing users can access the Internet; they can access the resources in
their home intranet; and there is increasing provision for users to utilize
resources such as printers.
• Ubiquitous computing - harnessing the small, cheap computational devices that are
present in users' physical environments, including the home, office, etc.
Challenge:

To make interoperation fast and convenient (spontaneous) even though the
user is in a new environment.
Distributed Multimedia Systems

A DMS should be able to store and locate audio and video files and to transmit
them across the network (possibly in real time).

Benefits of DMS
• Access to live or pre-recorded television broadcasts.
• Access to film libraries offering video on demand services
• Access to music libraries.
• Skype, a peer-to-peer alternative to IP telephony.
Web casting (DMS)

• Applications of distributed multimedia systems include web casting.
• Web casting is the ability to broadcast audio or video streams over the Internet.

Web casting demands on DMS:
• Provide support for encoding and encryption formats such as MPEG.
• Provide mechanisms to ensure that the desired quality of service can be met.
• Provide resource management strategies and scheduling policies.
• Provide adaptation strategies to deal with the inevitable situations in which
quality of service cannot be met.
Distributed Computing as a utility

• Resources are provided by appropriate service suppliers and effectively rented
rather than owned by the end user.
• This model applies to both physical resources and more logical services.
• Physical resources such as storage and processing can be made available to
networked computers.
• Cloud computing is defined as a set of Internet-based application, storage and
computing services sufficient to support most users' needs.
• The overall goal is to provide a range of cloud services, including high-performance
computing capabilities, mass storage and richer application services such as
web search.
Focus on Resource Sharing

• Share hardware resources such as printers, data resources such as files, and
resources with more specific functionality such as search engines.
• Patterns of resource sharing vary widely in their scope and in how closely users
work together.
• The pattern of sharing and the geographic distribution of users determine the
mechanisms needed to coordinate users' actions.
• A service manages a collection of related resources and presents their functionality
to users and applications.
• Access to a service is via a set of operations, e.g. a file service.
• Client-server computing
• Many distributed systems can be constructed entirely in the form of
interacting clients and servers.

Hardware resources that can be shared: CPU, memory, disk, screen, printer.
Software resources that can be shared: web pages, files, objects, databases, video/audio.
Client Server computing

• A server is a running program on a networked computer that accepts
requests from programs on other computers (clients) to perform a service.
• The client sends a request to the server process.
• The server executes the request.
• The server transmits a reply and data, e.g. web servers, file servers.
• Message-passing operations
• (send, receive)
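
The request-reply steps above can be sketched with a pair of connected sockets. This is a minimal illustration under stated assumptions: both ends run in one process (the server in a thread), and the "service" simply upper-cases the request text. The function names `serve_one_request` and `invoke` are made up for the example.

```python
import socket
import threading


def serve_one_request(conn: socket.socket) -> None:
    """Server side: receive one request, execute it, and send back a reply."""
    with conn:
        request = conn.recv(1024).decode("utf-8")   # receive
        reply = request.upper()                     # "execute" the request
        conn.sendall(reply.encode("utf-8"))         # send


def invoke(conn: socket.socket, message: str) -> str:
    """Client side: send a request and block until the reply arrives."""
    conn.sendall(message.encode("utf-8"))           # send
    return conn.recv(1024).decode("utf-8")          # receive


# socketpair() returns two already-connected endpoints, standing in for a
# client and a server that would normally be on different machines.
client_end, server_end = socket.socketpair()
server = threading.Thread(target=serve_one_request, args=(server_end,))
server.start()
reply = invoke(client_end, "hello")
server.join()
client_end.close()
```

A real server would listen on a network address and loop over many clients; the sketch keeps only the send and receive pairing that defines the interaction.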
Remote Invocation
• The process of sending a request to perform an action is referred to as the client
invoking an operation upon the server.
• The server responds to the client's request, and the entire interaction between
client and server is called a remote invocation.

REMOTE PROCEDURE CALL(RPC)

Hides communication behind a procedure call abstraction.
Eg: read(fp, buffer)
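
How a procedure call can hide communication may be easier to see in a toy stub. The sketch below is an assumption-laden illustration, not a real RPC library: messages are marshalled as JSON, the "network" is a direct function call, and the names `ClientStub`, `server_dispatch` and `add` are invented for the example.

```python
import json


# Server side: the procedures that remote callers are allowed to invoke.
def add(a, b):
    return a + b


PROCEDURES = {"add": add}


def server_dispatch(request_bytes: bytes) -> bytes:
    """Unmarshal a request, run the named procedure, and marshal the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["method"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


class ClientStub:
    """Makes a remote procedure look like an ordinary local method call."""

    def __init__(self, transport):
        # transport: callable taking request bytes and returning reply bytes.
        # Here it is a direct call; in practice it would send over a network.
        self._transport = transport

    def __getattr__(self, name):
        def remote_call(*args):
            request = json.dumps({"method": name, "args": list(args)}).encode("utf-8")
            reply = self._transport(request)
            return json.loads(reply.decode("utf-8"))["result"]
        return remote_call


stub = ClientStub(transport=server_dispatch)
total = stub.add(2, 3)   # reads like a local call; marshalling happens inside the stub
```

The caller never sees the request or reply messages, which is the point of the abstraction.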
Distributed System Models

• Minicomputer model
• Workstation model
• Workstation-Server model
• Processor-Pool model
• Hybrid model
Minicomputer model

• A simple extension of the centralized time-sharing model.
• A few minicomputers are interconnected by a communication network; each
minicomputer has multiple users simultaneously logged on to it.
• Each user has remote access to the other minicomputers.
• The model can be used when resource sharing with remote users is desired.
Example: the early ARPANET.
Workstation model

• Several workstations are interconnected by a communication network.
• Each workstation has its own disk and serves as a single-user computer.
• The idea is to interconnect all these workstations by a high-speed LAN; the
idle workstations may be used to process jobs of users who are logged onto
other workstations and do not have sufficient processing power to get their
jobs processed efficiently.
Example: Sprite system & Xerox PARC.
Workstation-server model

• Workstations without a local disk have become more popular than
workstations with a local disk.
• The model consists of a few minicomputers and several workstations interconnected
by a communication network.
• In this model, normal computation activities required by the user can be
performed at the workstation, but requests for services provided by special
servers are sent to the server.
Example: V-System.
Processor-Pool model

• The user does not need large amount of computing power all the time.
• In this model, processors are pooled together to be shared by the users as
needed.
• The pool of processors consists of a large number of microcomputers and
minicomputers attached to the network.
• Each processor has its own memory to load and run a system program or an
application program of the DCS.
• The model has better utilization of processing power and greater flexibility.
Example: Amoeba & Cambridge Distributed Computing Systems.
Fig: Processor-Pool model
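
The dynamic allocation at the heart of the processor-pool model can be sketched as simple bookkeeping. This is only an illustration: the class `ProcessorPool`, the user name and the pool size are assumptions, and no real jobs are scheduled.

```python
class ProcessorPool:
    """Hypothetical bookkeeping for a pool of processors shared by users on demand."""

    def __init__(self, size: int) -> None:
        self._free = list(range(size))   # identifiers of idle processors
        self._owner = {}                 # processor id -> user currently holding it

    def allocate(self, user: str, count: int) -> list:
        """Grant `count` processors to `user`, or fail if the pool is too busy."""
        if count > len(self._free):
            raise RuntimeError("not enough free processors")
        granted = [self._free.pop() for _ in range(count)]
        for cpu in granted:
            self._owner[cpu] = user
        return granted

    def release(self, user: str) -> int:
        """Return all of the user's processors to the pool; report how many."""
        returned = [cpu for cpu, owner in self._owner.items() if owner == user]
        for cpu in returned:
            del self._owner[cpu]
            self._free.append(cpu)
        return len(returned)

    @property
    def free_count(self) -> int:
        return len(self._free)


pool = ProcessorPool(size=4)
granted = pool.allocate("alice", 3)     # a large job borrows three processors
free_during_job = pool.free_count
released = pool.release("alice")        # processors go back when the job ends
```

Because processors are held only while a job runs, the same hardware can serve many users, which is where the better utilization comes from.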
Hybrid model

• The workstation-server model suits a large number of computer users who only
perform simple interactive tasks and execute small programs.
• The processor-pool model is more attractive and suitable for a working
environment in which groups of users often perform jobs needing
massive computation.
• Combining the advantages of these two models, a hybrid model can be used to
build a distributed system.
• The processors in the pool can be allocated dynamically for computations
that are too large or that require several computers for execution.
• The hybrid model gives fast response to interactive jobs by allowing them to be
processed on the users' local workstations.
Distributed System Architecture
Software Architecture
• Layered Architecture
• Data centered Architecture
• Object Based Architecture
• Event based Architecture
System Architecture
• Client server Architecture
• Peer to peer Architecture
CHALLENGES

• Heterogeneity
• Openness
• Security
• Scalability
• Failure handling
• Concurrency
• Transparency
• Quality of service
Heterogeneity

• The Internet enables users to access services and run applications over a
heterogeneous collection of computers and networks. Heterogeneity (that is,
variety and difference) applies to all of the following:
• Hardware devices: computers, tablets, mobile phones, embedded devices,
etc.
• Operating System: MS Windows, Linux, Mac, Unix, etc.
• Network: Local network, the Internet, wireless network, satellite links, etc.
• Programming languages: Java, C/C++, Python, PHP, etc.
• Different roles of software developers, designers, system managers.
• Middleware: A software layer that provides a programming abstraction as well
as masking the heterogeneity.

• Heterogeneity and Mobile code: A program code that can be transferred from
one computer to another and run at the destination.
Openness

• The openness of a distributed system is determined primarily by the degree to
which new resource-sharing services can be added and made available for use
by a variety of client programs.
• When the well-defined interfaces of a system are published, it is easier for
developers to add new features or replace subsystems in the future.
• Open distributed systems can be constructed from heterogeneous hardware and
software, possibly from different vendors.
• Example: Twitter and Facebook have APIs that allow developers to build their
own software that interacts with these services.
Security
• Security for information resources has three components:
• confidentiality (protection against disclosure to unauthorized individuals)
• integrity (protection against alteration or corruption),
• availability for the authorized (protection against interference with the means to
access the resources).
These challenges can be met by use of encryption techniques developed for this
purpose.

The following security challenges have not yet been fully met:
• Denial of service attack
• Security of mobile code
Scalability
A system is said to be scalable if it can handle the addition of users and resources
without suffering a noticeable loss of performance or increase in administrative
complexity.

Scalability has three dimensions:
• Size
• Number of users and resources to be processed. The associated problem is
overloading.
• Geography
• Distance between users and resources. The associated problem is
communication reliability.
• Administration
• As the size of a distributed system increases, many parts of the system need
to be controlled.
Designing scalable distributed systems presents the following challenges:

• Controlling the cost of physical resources (e.g. adding server computers),
• Controlling the performance loss,
• Preventing software resources from running out (e.g. IP addresses), and
• Avoiding performance bottlenecks.
Failure Handling

• Computer systems sometimes fail.
• When faults occur in hardware or software, programs may produce incorrect
results or may stop before they have completed the intended computation.
• The handling of failures in distributed systems is particularly difficult.

Failure: an offered service no longer complies with its specification.
Fault: the cause of a failure.
Fault tolerance: no failure despite faults.
Detecting Failures:
• Some failures can be detected. For example, checksums can be used to detect
corrupted data in a message or a file.
• The challenge is to manage in the presence of failures that cannot be detected but
may be suspected.
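
The checksum idea mentioned above can be sketched with CRC-32 from Python's standard `zlib` module. The message layout (payload followed by a 4-byte checksum) and the helper names are assumptions chosen for the example, not a standard wire format.

```python
import zlib


def attach_checksum(payload: bytes) -> bytes:
    """Sender: append a 4-byte CRC-32 of the payload to the message."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify(message: bytes) -> bool:
    """Receiver: recompute the CRC-32 and compare it with the one received."""
    payload, received = message[:-4], message[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received


message = attach_checksum(b"transfer 100 to account 42")

# Simulate corruption in transit: one digit of the amount is changed.
corrupted = b"transfer 900" + message[12:]

intact_ok = verify(message)
corrupted_ok = verify(corrupted)
```

A checksum only detects corruption; deciding what to do next (for example, asking for the message again) is a separate step.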

Masking Failure:
• Some failures that have been detected can be hidden or masked.
Two examples of hiding failures:
• Messages can be retransmitted when they fail to arrive.
• File data can be written to a pair of disks so that if one is corrupted the other may
still be correct.
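
Masking a lost message by retransmission can be sketched as a bounded retry loop. The channel here is simulated: `LossyChannel` is a made-up class that drops a fixed number of messages, and a return value of `False` stands in for an acknowledgement timeout.

```python
class LossyChannel:
    """Simulated channel that drops the first `drops` messages sent over it."""

    def __init__(self, drops: int) -> None:
        self._remaining_drops = drops
        self.delivered = []

    def send(self, message: str) -> bool:
        """Return True if an acknowledgement came back, False on a simulated timeout."""
        if self._remaining_drops > 0:
            self._remaining_drops -= 1
            return False
        self.delivered.append(message)
        return True


def send_with_retransmission(channel: LossyChannel, message: str, max_attempts: int = 5) -> int:
    """Resend until acknowledged; return the number of attempts that were needed."""
    for attempt in range(1, max_attempts + 1):
        if channel.send(message):
            return attempt
    # The failure can no longer be masked, so it is reported to the caller.
    raise TimeoutError(f"no acknowledgement after {max_attempts} attempts")


channel = LossyChannel(drops=2)
attempts = send_with_retransmission(channel, "m1")
```

The limit on attempts matters: retrying forever would hide the failure from the caller but leave it waiting indefinitely.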
Tolerating Failures:
• It is not always possible to detect and hide all of the failures in a large network.
• Instead, clients and users can be designed to tolerate failures.
• Eg: Web browser - When a browser cannot contact a web server, it does not
make the user wait forever while it keeps on trying; instead, it informs the user
about the problem (try again later).

Recovery from failures:
• If a server crashes, permanent data can be recovered or rolled back by
software designed for this purpose.

Redundancy:
• Services can be made to tolerate failures by the use of redundant components.
Concurrency

• Several clients will attempt to access a shared resource at the same time.
• For an object to be safe in a concurrent environment, its operations must be
synchronized in such a way that its data remains consistent.
• This can be achieved by standard techniques such as semaphores, which are
used in most operating systems.
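
A minimal sketch of the semaphore technique, using Python's `threading.Semaphore` as a binary semaphore. The `Account` class, the number of client threads and the deposit amounts are assumptions made for the illustration.

```python
import threading


class Account:
    """Shared object whose update operation is synchronized with a semaphore."""

    def __init__(self, balance: int = 0) -> None:
        self.balance = balance
        self._sem = threading.Semaphore(1)   # binary semaphore: one holder at a time

    def deposit(self, amount: int) -> None:
        with self._sem:                      # acquire on entry, release on exit
            current = self.balance           # read
            self.balance = current + amount  # write, safe from interleaving


account = Account()


def client() -> None:
    for _ in range(1000):
        account.deposit(1)


threads = [threading.Thread(target=client) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the semaphore, two clients could read the same balance and one of the deposits would be lost; with it, the read and the write form a single indivisible step.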
Transparency
• Transparency is defined as the concealment from the user and the application
programmer of the separation of components in a distributed system, so that
the system is perceived as a whole rather than as a collection of independent
components.
• Distributed systems designers must hide the complexity of the systems as
much as they can. Some terms of transparency in distributed systems are:

• Access Transparency
• Location Transparency
• Concurrency Transparency
• Replication Transparency
• Failure Transparency
• Mobility Transparency
• Performance Transparency
• Scaling Transparency
Quality of Service(QoS)
The criteria considered for quality of service are
• Reliability
• Security
• Performance
• Adaptability
QoS is the ability of a network to achieve maximum bandwidth and to deal with other
network performance elements such as latency, error rate and time.
QoS involves controlling and managing network resources by setting priorities for
specific types of data (video, audio, files) on the network.
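
Setting priorities for types of data can be sketched with a priority queue. The priority values below (audio first, then video, then files) are an assumption for the illustration, not a rule stated in these notes, and `PriorityScheduler` is a made-up name.

```python
import heapq
import itertools

# Assumed priorities: a lower number is served first.
PRIORITY = {"audio": 0, "video": 1, "file": 2}


class PriorityScheduler:
    """Hypothetical packet scheduler that serves higher-priority traffic first."""

    def __init__(self) -> None:
        self._heap = []
        self._arrival = itertools.count()   # tie-breaker keeps arrival order within a class

    def enqueue(self, kind: str, packet: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._arrival), kind, packet))

    def dequeue(self):
        _, _, kind, packet = heapq.heappop(self._heap)
        return kind, packet

    def __len__(self) -> int:
        return len(self._heap)


scheduler = PriorityScheduler()
scheduler.enqueue("file", "f1")
scheduler.enqueue("video", "v1")
scheduler.enqueue("audio", "a1")
scheduler.enqueue("video", "v2")

served = [scheduler.dequeue() for _ in range(len(scheduler))]
```

Strict priority like this can starve low-priority traffic under load, so practical schedulers usually combine it with some form of fairness.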
Three fundamental components for basic QoS implementation:
• Identification and marking techniques for coordinating QoS from end to end
between network elements.
• QoS within a single network.
• QoS policy, management and accounting functions to control and to monitor
end to end traffic across network.
Examples
Google Search Engine

Architecture of the original Google search engine


• In the Google search engine, web crawling is done by several
distributed crawlers.
• A URL server sends lists of URLs to be fetched to the crawlers.
• The web pages that are fetched are then sent to the storeserver, which
compresses and stores them in a repository.
• The indexing function is performed by the indexer and the sorter. The indexer
reads the repository, uncompresses the documents, and parses them.
• Each document is converted into a set of word occurrences called hits. The
hits record the word, position in the document, an approximation of font
size, and capitalization.
• The indexer distributes these hits into a set of barrels, creating a
partially sorted forward index.
• The indexer parses out all the links in every web page and stores
important information about them in an anchors file.
• This file contains enough information to determine where each link
points from and to, and the text of the link.
• The URLresolver reads the anchors file and converts relative URLs into
absolute URLs and in turn into docIDs. It puts the anchor text into the
forward index, associated with the docID that the anchor points to.
• It also generates a database of links, which are pairs of docIDs. The
links database is used to compute PageRank for all the documents.
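
The indexing steps above can be sketched in a few lines. This is a simplified illustration and not Google's implementation: hits here record only the word, its position and capitalization (font size is omitted), barrels are left out, and the final inversion by word is what the sorter is generally described as producing, which these notes do not spell out. All function names are assumptions.

```python
from collections import defaultdict


def to_hits(text: str) -> list:
    """Convert a document into hits: (word, position, capitalized)."""
    hits = []
    for position, token in enumerate(text.split()):
        word = token.strip(".,;:!?").lower()
        if word:
            hits.append((word, position, token[0].isupper()))
    return hits


def build_forward_index(documents: dict) -> dict:
    """Forward index: docID -> list of hits found in that document."""
    return {doc_id: to_hits(text) for doc_id, text in documents.items()}


def invert(forward_index: dict) -> dict:
    """Inverted index: word -> list of (docID, position) occurrences."""
    inverted = defaultdict(list)
    for doc_id, hits in forward_index.items():
        for word, position, _capitalized in hits:
            inverted[word].append((doc_id, position))
    return dict(inverted)


documents = {1: "Distributed systems scale", 2: "Systems fail"}
forward = build_forward_index(documents)
inverted = invert(forward)
```

A query for a word then becomes a lookup in the inverted index, which is why the forward index alone is not enough for search.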
CASE STUDY: WWW

• The World Wide Web [www.w3.org I, Berners-Lee 1991] is an evolving system for
publishing and accessing resources and services across the Internet.
• Through commonly available web browsers, users retrieve and view documents
of many types, listen to audio streams and view video streams, and interact with
an unlimited set of services.
• The Web consists of documents connected by links known as hyperlinks; clicking
a link moves the user to the linked page.

Eg: Java web server


Fig: Web servers and browsers making requests to them. It is an important
feature that users may locate and manage their own web servers anywhere on the Internet.
WEB: (open source)

• The Web contains documents with links (hyperlinks).
• A browser is used to retrieve information/data from the Web.
• The Web is based on three main standard technological components:

• HTML
• URL
• HTTP
Features of WWW
• HyperText Information System.
• Cross-Platform.
• Distributed.
• Open Standards and Open Source.
• Uses Web Browsers to provide a single interface for many services.
• Dynamic, Interactive and Evolving.
• “Web 2.0”
WWW Architecture
• Identifiers and Character Set
• Uniform Resource Identifier (URI) is used to uniquely identify resources on
the web, and Unicode makes it possible to build web pages that can be read
and written in human languages.
• Syntax
• XML (Extensible Markup Language) helps to define common syntax in
semantic web.
• Data Interchange
• Resource Description Framework (RDF) helps in defining the core
representation of data for the web. RDF represents data about a resource in graph
form.
• Taxonomies
• RDF Schema (RDFS) allows more standardized description of taxonomies and
other ontological constructs.
• Ontologies
• Web Ontology Language (OWL) offers more constructs than RDFS. It comes in the following three versions:
• OWL Lite for taxonomies and simple constraints.
• OWL DL for full description logic support.
• OWL Full for more syntactic freedom of RDF.
• Rules
• RIF - Rule Interchange Format
• SWRL - Semantic Web Rule Language
• RIF and SWRL offer rules beyond the constructs that are available from RDFS and OWL. SPARQL Protocol
and RDF Query Language (SPARQL) is an SQL-like language used for querying RDF data and OWL ontologies.
• Proof
• All semantics and rules executed at the layers below Proof, and
their results, are used to prove deductions.
• Cryptography
• Cryptographic means, such as digital signatures, are used to verify the
origin of sources.
• User Interface and Applications
• The User Interface and Applications layer is built on top of the other layers for
user interaction.
Parts of URL
• Protocol: HTTP, HTTPS
• Host name/domain name: example.com
• Port number: 80 for HTTP, 443 for HTTPS
• Path: /cloud (as in www.google.com/cloud)
• Query: introduced by ?
• Parameters: q=computing
• Fragment: #history
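
The parts listed above can be pulled out of a URL with Python's standard `urllib.parse` module. The URL below is assembled from the example values in the list purely for illustration; it is not claimed to be a real page.

```python
from urllib.parse import parse_qs, urlsplit

# Illustrative URL built from the example parts above (an assumption).
url = "https://www.google.com:443/cloud?q=computing#history"

parts = urlsplit(url)
params = parse_qs(parts.query)   # decode the query string into parameters

summary = {
    "protocol": parts.scheme,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}
```

Note that the `?` and `#` characters are separators: they introduce the query and the fragment but are not part of the values returned.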

HTML:

• The HyperText Markup Language (HTML), a language for specifying the contents
and layout of pages as they are displayed by web browsers.
• The HyperText Markup Language [www.w3.org II] is used to specify the text and
images that make up the contents of a web page, and to specify how they are laid
out and formatted for presentation to the user.
• A web page contains such structured items as headings, paragraphs, tables and
images. HTML is also used to specify links and which resources are associated
with them.
Sample HTML code:

<!DOCTYPE html>
<html> <!-- describes the HTML document -->
<head> <!-- information about the document -->
<title>Page Title</title>
</head>
<body> <!-- visible page content -->
<p>content</p> <!-- a paragraph of content -->
</body>
</html>
URL:

• Viewing a web page begins either by typing the URL of the page into a web
browser, or by following a hypertext link to that page or resource.
• A URL is a specialization of a URI that defines the network location of a specific
resource.
• Uniform Resource Locators (URLs), a kind of Uniform Resource Identifier
(URI), identify documents and other resources stored as part of the Web.

Eg: http://servername[:port][/pathName][?query][#fragment]


HTTP:

• HTTP stands for HyperText Transfer Protocol.
• It is a communication protocol used to transfer hypertext documents on the web.
• HTTP is a request-reply protocol. It is an application service for retrieving a web
document.
• The Web uses a client-server system architecture, with standard rules for
interaction (the HyperText Transfer Protocol, HTTP) by which browsers and other
clients fetch documents and other resources from web servers.
• HTTP is called a stateless protocol because each request is executed
independently, without any knowledge of the requests that were executed before
it.
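
The request-reply exchange can be made concrete by building a request message and parsing a reply by hand. This is a teaching sketch that handles only the simplest HTTP/1.1 messages; the helper names and the canned reply are assumptions, and real programs would normally use an HTTP library instead.

```python
def build_request(method: str, host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 request: request line, headers, blank line."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a reply into status code, headers (lower-cased names) and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, status, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status), headers, body


request = build_request("GET", "example.com", "/index.html")

# A canned reply standing in for what a web server might send back.
canned_reply = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>hi</p>"
status, headers, body = parse_response(canned_reply)
```

Each request carries everything the server needs (method, path, host), which is what allows the server to handle it without remembering earlier requests.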
Request methods in HTTP protocol

• GET method: form input is appended to the URL of the action being requested.
A GET request should not be used to send large amounts of information.
• HEAD method: is used to ask only for information about a document, not the
entire document.
• POST method: transmits all form input information immediately after the
request.
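
The difference between the three methods can be seen by constructing requests with Python's standard `urllib.request.Request` class, without sending anything over the network. The address `http://example.com/search` and the form fields are placeholders chosen for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request

form = {"q": "distributed systems", "page": "2"}
encoded = urlencode(form)   # form input encoded as name=value pairs

# GET: the form input is appended to the URL after "?".
get_request = Request("http://example.com/search?" + encoded)

# POST: the form input travels in the request body, after the headers.
post_request = Request("http://example.com/search", data=encoded.encode("ascii"))

# HEAD: asks only for information about the document, not the document itself.
head_request = Request("http://example.com/search", method="HEAD")

methods = [r.get_method() for r in (get_request, post_request, head_request)]
```

Because GET places the data in the URL, it is visible in logs and limited in practical length, which is why larger form submissions use POST.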
Dynamic Pages:
Common Gateway Interface (CGI):
• A standard environment for web servers to interface with executable programs
installed on a server that generate web pages dynamically.

APPLET:
A small program that can be placed on a web page and is executed by the
web browser, which gives the web page dynamic content.
Asynchronous JavaScript and XML (AJAX)

It is a method of building interactive applications for the web that process user
requests immediately.
Ajax combines several programming tools, including JavaScript, XML and CSS.
Google Maps is a well-known application that uses AJAX.
There are 4 main benefits of using Ajax in web applications:
• Callbacks
• Making asynchronous calls
• User friendly.
• Increased speed.
Web Services:
• Web services are web applications based on open standards (XML, SOAP, HTTP, etc.)
that interact with other web applications for the purpose of exchanging data.
Features:
• Uses a standardized XML messaging system.
• Is available over the Internet or private networks.
Components of Web services
• SOAP (Simple Object Access Protocol)
• UDDI (Universal Description, Discovery and Integration)
• WSDL (Web Services Description Language)
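
The standardized XML messaging used by web services can be sketched by building and reading a SOAP envelope with Python's standard `xml.etree.ElementTree`. The envelope namespace is the SOAP 1.1 one; the service namespace `http://example.com/stock`, the operation `GetPrice` and its parameter are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.com/stock"                  # hypothetical service namespace


def build_envelope(operation: str, params: dict) -> str:
    """Wrap an operation call and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_operation(xml_text: str):
    """Receiver side: recover the operation name and parameters from the envelope."""
    root = ET.fromstring(xml_text)
    call = root.find(f"{{{SOAP_NS}}}Body")[0]
    operation = call.tag.split("}", 1)[1]               # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


message = build_envelope("GetPrice", {"symbol": "ACME"})
operation, params = read_operation(message)
```

In practice the message would be sent over HTTP, and the operations a service accepts would be described in its WSDL document rather than agreed informally as here.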
