
Chapter 1 - Introduction

1.1 Introduction and Definition


ƒ before the mid-80s, computers were
ƒ very expensive (hundreds of thousands or even millions
of dollars)
ƒ very slow (a few thousand instructions per second)
ƒ not connected among themselves
ƒ after the mid-80s: two major developments
ƒ cheap and powerful microprocessor-based computers
appeared
ƒ computer networks
ƒ LANs at speeds ranging from 10 to 1000 Mbps (now
even 10, 40, and 100 Gbps)
ƒ WANs at speeds ranging from 64 Kbps to gigabits/sec
ƒ consequence
ƒ feasibility of using a large network of computers to
work for the same application; this is in contrast to the
old centralized systems where there was a single
computer with its peripherals
ƒ Definition of a Distributed System
ƒ a distributed system is:
a collection of independent computers that appears to its
users as a single coherent system (Tanenbaum & Van Steen)

ƒ this definition has two aspects:


1. hardware: autonomous machines
2. software: a single system view for the users

ƒ Other Definitions
A distributed system is a system designed to support the
development of applications and services which can exploit a
physical architecture consisting of multiple, autonomous
processing elements that do not share primary memory but
cooperate by sending asynchronous messages over a
communication network (Blair & Stefani)

A distributed system is one that stops you getting any work
done when a machine you have never even heard of crashes
(Leslie Lamport)

ƒ Why Distributed?
ƒ Resource and Data Sharing
ƒ printers, databases, multimedia servers, ...
ƒ Availability, Reliability
ƒ the loss of some instances can be hidden
ƒ Scalability, Extensibility
ƒ the system grows with demand (e.g., extra servers)
ƒ Performance
ƒ huge power (CPU, memory, ...) available
ƒ Inherent distribution, communication
ƒ organizational distribution, e-mail, video

ƒ Problems of Distribution
ƒ Concurrency, Security
ƒ clients must not disturb each other
ƒ Privacy
ƒ e.g., when building a preference profile such as using
cookies
ƒ unwanted communication such as spam
ƒ Partial failure
ƒ we often do not know where the error is (e.g., RPC)
ƒ Location, Migration, Relocation, Replication
ƒ clients must be able to find their servers
ƒ Heterogeneity
ƒ hardware, platforms, languages, management

ƒ Characteristics of Distributed Systems
ƒ differences between the computers and the ways they
communicate are hidden from users
ƒ users and applications can interact with a distributed
system in a consistent and uniform way regardless of
location
ƒ distributed systems should be easy to expand and scale
ƒ a distributed system is normally continuously available,
even if there may be partial failures

1.2 Goals of a Distributed System
ƒ to support heterogeneous computers and networks and
to provide a single-system view, a distributed system is
often organized by means of a layer of software called
middleware that extends over multiple machines

a distributed system organized as middleware; note that the middleware
layer extends over multiple machines, and offers each application the
same interface
Ack: most diagrams in all slides are taken from the textbook
ƒ a distributed system should
ƒ easily connect users with resources (printers, computers,
storage facilities, data, files, Web pages, ...)
ƒ Some of the reasons
ƒ economics: sharing resources such as printers and
high-speed computers
ƒ to collaborate and exchange information
ƒ groupware: software for collaborative editing,
teleconferencing, etc.
ƒ e-commerce: buying and selling goods
ƒ be transparent: hide the fact that the resources and
processes are distributed across multiple computers
ƒ be open
ƒ be scalable
ƒ Transparency in a Distributed System
ƒ a distributed system that is able to present itself to users
and applications as if it were only a single computer
system is said to be transparent
ƒ different forms of transparency in a distributed system
Transparency   Description
Access         Hide differences in data representation (endianness, file
               naming, ...) and in how a resource is accessed
Location       Hide where a resource is physically located; e.g., where
               is http://www.prenhall.com/index.html? (naming)
Migration      Hide that a resource may move to another location
Relocation     Hide that a resource may be moved to another location
               while in use; e.g., mobile users using their wireless
               laptops and moving from place to place
Replication    Hide that a resource is replicated (for availability and
               performance); all replicas have the same name
Concurrency    Hide that a resource may be shared by several competing
               users; the resource must be left in a consistent state,
               e.g., through locking
Failure        Hide the failure and recovery of a resource
ƒ But trying to achieve full distribution transparency may be
impossible or may not be a good idea
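As a small illustration of access transparency, the sketch below shows how a marshalling layer could hide endianness by always sending integers in one agreed byte order. The names `marshal_int` and `unmarshal_int` are hypothetical; only Python's standard `struct` module is assumed.

```python
import struct

def marshal_int(value: int) -> bytes:
    """Encode a 32-bit signed integer in network (big-endian) byte order."""
    return struct.pack("!i", value)

def unmarshal_int(data: bytes) -> int:
    """Decode a 32-bit signed integer that was sent in network byte order."""
    (value,) = struct.unpack("!i", data)
    return value

# The same value has different in-memory layouts on different machines;
# these are the differences the marshalling layer hides.
little_endian = struct.pack("<i", 1)
big_endian = struct.pack(">i", 1)
```

Sender and receiver only ever call the two functions, so neither needs to know the native byte order of the other machine.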
ƒ Openness in a Distributed System
ƒ a distributed system should be open
ƒ we need well-defined interfaces
ƒ interoperability
ƒ components of different origin can communicate
ƒ portability
ƒ components work on different platforms
ƒ another goal of an open distributed system is that it
should be flexible and extensible; easy to configure the
system out of different components; easy to add new
components, replace existing ones; easier said than done
ƒ an Open Distributed System is a system that offers
services according to standard rules that describe the
syntax and semantics of those services; e.g., protocols in
networks
ƒ standards - a necessity
ƒ in distributed systems, such services are generally specified
through interfaces, often described using an Interface
Definition Language (IDL)
ƒ an IDL specifies only syntax: the names of the functions,
types of parameters, return values, possible exceptions, ...
ƒ semantics are given informally by means of natural
language
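Python has no IDL of its own, but an abstract base class can play a similar role in a sketch: it fixes names, parameter types, return types, and exceptions, while the meaning stays in the docstrings. The `NameService` interface and its in-memory implementation below are hypothetical examples, not part of any real middleware.

```python
from abc import ABC, abstractmethod

class NameService(ABC):
    """Interface only: syntax is enforced, semantics are described in prose."""

    @abstractmethod
    def bind(self, name: str, address: str) -> None:
        """Associate `name` with `address`."""

    @abstractmethod
    def lookup(self, name: str) -> str:
        """Return the address bound to `name`; raise KeyError if unbound."""

class InMemoryNameService(NameService):
    """One possible implementation; clients depend only on NameService."""

    def __init__(self) -> None:
        self._table = {}

    def bind(self, name: str, address: str) -> None:
        self._table[name] = address

    def lookup(self, name: str) -> str:
        return self._table[name]
```

Any other implementation that honours the same interface could be substituted without changing client code, which is the point of interoperability and portability.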

ƒ Scalability in Distributed Systems


ƒ a distributed system should be scalable; there are three
dimensions
ƒ size: adding more users and resources to the system
ƒ geographically: users and resources may be far apart
ƒ administratively: should be easy to manage even if it
spans many administrative organizations
ƒ but a scalable system may exhibit performance problems

ƒ scalability problems leading to low performance

Concept                  Example
Centralized services     Single server for all users, mostly for
                         security reasons
Centralized data         A single on-line telephone book
Centralized algorithms   Doing routing based on complete information

examples of scalability limitations

ƒ Scaling Techniques: how to solve scaling problems


ƒ the problem is mainly performance, and arises as a result
of limitations in the capacity of servers and networks (for
geographical scalability with high latency and mostly
unreliable links)
ƒ three possible solutions: hiding communication latencies,
distribution, and replication
a. Hide Communication Latencies
ƒ try to avoid waiting for responses to remote service
requests
ƒ let the requester do other useful work
ƒ i.e., construct requesting applications that use only
asynchronous communication instead of synchronous
communication; when a reply arrives the application is
interrupted
ƒ good for batch processing and parallel applications
since independent tasks can be scheduled while
another task is waiting for communication to complete
or use multithreading for non-parallel programs
ƒ hiding communication latencies is not in general
applicable for interactive applications
ƒ for interactive applications, try to reduce
communication; move part of the job to the client to
reduce communication; e.g., filling a form to access a
database and checking the entries
(a) a server checking the correctness of field entries
(b) a client doing the job
ƒ e.g., checking the completeness of mandatory fields
ƒ shipping code is now supported in Web applications using
Java Applets and ActiveX controls (with some security
issues)
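A minimal sketch of hiding communication latency with asynchronous requests, using Python's `asyncio`. The function `remote_request` is a hypothetical stand-in for a remote service call, and `asyncio.sleep` simulates the network delay; nothing here talks to a real server.

```python
import asyncio
import time

async def remote_request(name: str, delay: float) -> str:
    """Pretend to call a remote service; the sleep stands in for latency."""
    await asyncio.sleep(delay)
    return f"reply to {name}"

async def issue_all() -> list:
    # All three requests are in flight at once, so the requester waits
    # roughly for the slowest one instead of the sum of all delays.
    return await asyncio.gather(
        remote_request("a", 0.2),
        remote_request("b", 0.2),
        remote_request("c", 0.2),
    )

def run_demo():
    """Return the replies and the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    replies = asyncio.run(issue_all())
    return replies, time.perf_counter() - start
```

Issued synchronously, the three requests would take about 0.6 s in total; issued asynchronously they overlap and the total is close to a single delay.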
b. Distribution
ƒ means splitting a component into smaller parts and
spreading those parts across the system
ƒ e.g., DNS - Domain Name System (abebe@aau.edu.et)
ƒ divide the name space into nonoverlapping zones
ƒ for details, see later in Chapter 5 - Naming

an example of dividing the DNS name space into zones
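The idea of nonoverlapping zones can be sketched as follows: each zone holds only its own records plus a list of child zones it delegates to, and a lookup walks from the root downwards. The zone layout, the `ZONES` table, and the address `192.0.2.10` (a documentation-only address) are assumptions made for illustration; real DNS resolution is considerably more involved.

```python
# Each zone knows its own records and the suffixes it delegates.
ZONES = {
    ".": {"records": {}, "delegations": ["et"]},
    "et": {"records": {}, "delegations": ["edu.et"]},
    "edu.et": {"records": {"aau.edu.et": "192.0.2.10"}, "delegations": []},
}

def resolve(name: str):
    """Return (address, zones visited), starting at the root zone."""
    zone = "."
    visited = []
    while True:
        visited.append(zone)
        data = ZONES[zone]
        if name in data["records"]:
            return data["records"][name], visited
        # Follow the delegation whose suffix matches the name.
        for suffix in data["delegations"]:
            if name == suffix or name.endswith("." + suffix):
                zone = suffix
                break
        else:
            raise KeyError(name)  # no zone is responsible for this name
```

No single server needs the complete name space: each zone answers only for its own part, which is what lets the service scale.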


c. Replication
ƒ replicate components across a distributed system to
increase availability and for load balancing, leading to
better performance
ƒ replication is decided by the owner of a resource
ƒ caching (a special form of replication) also reduces
communication latency; decided by the user
ƒ but, caching and replication may lead to consistency
problems (see Chapter 7 - Consistency and Replication)
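The consistency problem with caching can be seen in a few lines. In this sketch the classes `Origin` and `CachingClient` are hypothetical: the client keeps a local copy to avoid remote reads, and that copy becomes stale as soon as the origin is updated.

```python
class Origin:
    """The authoritative copy of the data."""

    def __init__(self):
        self.data = {"x": 1}

    def read(self, key):
        return self.data[key]

    def write(self, key, value):
        self.data[key] = value

class CachingClient:
    """Client-side cache: fewer remote reads, but possibly stale values."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}
        self.remote_reads = 0

    def read(self, key):
        if key not in self.cache:
            self.remote_reads += 1          # only a miss goes to the origin
            self.cache[key] = self.origin.read(key)
        return self.cache[key]

    def invalidate(self, key):
        self.cache.pop(key, None)           # force the next read to refetch
```

After `origin.write("x", 2)` the client still returns 1 until `invalidate("x")` is called; deciding when and how to invalidate is the consistency question.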

Pitfalls when Developing Distributed Systems
ƒ pitfalls arise from false assumptions made by first-time
developers of distributed systems; the assumptions concern
properties of distributed systems and do not arise in
nondistributed applications
ƒ The network is reliable (in reality it is not, which makes
failure transparency difficult to achieve)
ƒ The network is secure
ƒ The network is homogeneous
ƒ The topology does not change
ƒ Latency is zero
ƒ Bandwidth is infinite
ƒ Transport cost is zero
ƒ There is one administrator
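To make the first assumption concrete, the sketch below treats every remote call as something that may fail and retries it a bounded number of times. `FlakyLink` is a hypothetical simulated link that drops its first few requests, and `call_with_retries` is an illustrative helper; blind retrying is only safe when the operation is idempotent.

```python
class FlakyLink:
    """Simulated link that loses the first `failures` requests."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    def request(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("message lost")
        return "ok"

def call_with_retries(operation, attempts: int = 3):
    """Call `operation`, retrying on ConnectionError up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except ConnectionError as err:
            last_error = err      # remember the failure and try again
    raise last_error              # give up: the failure becomes visible
```

When all attempts fail, the caller still sees the error, which shows why full failure transparency is hard to achieve.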

1.3 Types of Distributed Systems
ƒ Three types: distributed computing systems, distributed
information systems, and distributed pervasive/embedded
systems
1. Distributed Computing Systems
ƒ Used for high-performance computing tasks
ƒ two types: cluster computing and grid computing
ƒ Cluster Computing
ƒ a collection of similar workstations or PCs
(homogeneous), closely connected by means of a
high-speed LAN
ƒ each node runs the same operating system
ƒ used for parallel programming in which a single
compute intensive program is run in parallel on
multiple machines

an example of a cluster computing system
ƒ a master node runs the middleware (containing libraries for
parallel programs) and controls the other compute nodes; it
ƒ allocates tasks
ƒ provides an interface to users
ƒ etc.
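The master's role of allocating tasks can be sketched on a single machine. Here a thread pool stands in for the compute nodes, which is an assumption made only to keep the example self-contained; the helper names `split` and `master_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, parts):
    """Cut `data` into at most `parts` contiguous chunks."""
    size = max(1, (len(data) + parts - 1) // parts)
    return [data[i:i + size] for i in range(0, len(data), size)]

def master_sum(data, workers=4):
    """Master: split the job, hand one chunk to each worker, combine results."""
    chunks = split(list(data), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))   # each worker sums a chunk
    return sum(partial_sums)
```

On a real cluster the chunks would be sent to separate machines by a parallel programming library, but the split, compute, combine pattern is the same.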
ƒ Grid Computing
ƒ “Resource sharing and coordinated problem solving in
dynamic, multi-institutional virtual organizations” (Ian
Foster)
ƒ high degree of heterogeneity: no assumptions are made
concerning hardware, operating systems, networks,
administrative domains, security policies, etc.
ƒ Globus is a software system for Grid Computing; read
about the Globus Alliance at http://www.globus.org/
2. Distributed Information Systems
ƒ many networked applications
ƒ Problem: interoperability
ƒ at the lowest level: wrap a number of requests into a
single larger request and have it executed as a
distributed transaction; all or none of the requests would
be executed
ƒ how to let applications communicate directly with each
other, i.e., Enterprise Application Integration (EAI)
a. Transaction Processing Systems
ƒ consider database applications
ƒ special primitives are required to program transactions,
supplied either by the underlying distributed system or by
the language runtime system
ƒ exact list of primitives depends on the type of application;
procedure calls, ordinary statements, etc. can also be
included
Primitive            Description
BEGIN_TRANSACTION    Mark the start of a transaction
END_TRANSACTION      Terminate the transaction and try to commit
ABORT_TRANSACTION    Kill the transaction and restore the old values
READ                 Read data from a file, a table, or otherwise
WRITE                Write data to a file, a table, or otherwise
ƒ The Transaction Model
ƒ the model for transactions comes from the world of
business
ƒ a supplier and a retailer negotiate on
ƒ price
ƒ delivery date
ƒ quality
ƒ etc.
ƒ until the deal is concluded they can continue negotiating
or one of them can terminate
ƒ but once they have reached an agreement they are bound
by law to carry out their part of the deal
ƒ transactions between processes are similar to this
scenario

ƒ e.g., assume the following banking operation
ƒ withdraw an amount x from account 1
ƒ deposit the amount x to account 2
ƒ what happens if there is a problem after the first activity is
carried out?
ƒ group the two operations into one transaction; either both
are carried out or neither
ƒ we need a way to roll back when a transaction is not
completed
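The banking example can be tried with Python's built-in `sqlite3` module, where a connection used as a context manager commits on success and rolls back if an exception is raised. The table name `account`, the starting balances, and the helpers `make_bank`, `transfer`, and `balances` are assumptions made for this sketch.

```python
import sqlite3

def make_bank():
    """Create an in-memory bank with two accounts (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))

def transfer(conn, src, dst, amount):
    """Withdraw from `src` and deposit to `dst` as one transaction."""
    try:
        with conn:  # commit if the block succeeds, roll back otherwise
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            cur = conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            if cur.rowcount != 1:
                raise LookupError("destination account not found")
        return True
    except (sqlite3.IntegrityError, LookupError):
        return False   # the withdrawal, if any, has been rolled back
```

If the deposit fails after the withdrawal has been made, the rollback restores the old balance of account 1, so either both operations happen or neither does.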

ƒ e.g., reserving seats from Manchester to Lalibella via
Heathrow and Addis Ababa (AA) Bole airports

(a)
BEGIN_TRANSACTION
    reserve Man → Heathrow;
    reserve Heathrow → Bole;
    reserve Bole → Lalibella;
END_TRANSACTION

(b)
BEGIN_TRANSACTION
    reserve Man → Heathrow;
    reserve Heathrow → Bole;
    reserve Bole → Lalibella full ⇒
ABORT_TRANSACTION

(a) transaction to reserve three flights commits
(b) transaction aborts when third flight is unavailable
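The two outcomes can be mimicked with an in-memory seat table. This is only a sketch: the seat counts are invented, `reserve_trip` is a hypothetical helper, and "restoring the old values" is done by keeping a snapshot, which is one simple way to implement an abort.

```python
class FlightFull(Exception):
    """Raised when a leg has no free seats."""

def reserve_trip(seats, legs):
    """Reserve one seat on every leg, or change nothing at all."""
    snapshot = dict(seats)               # BEGIN_TRANSACTION: remember old values
    try:
        for leg in legs:
            if seats[leg] == 0:
                raise FlightFull(leg)
            seats[leg] -= 1              # tentative reservation
    except FlightFull:
        seats.clear()                    # ABORT_TRANSACTION: restore old values
        seats.update(snapshot)
        return False
    return True                          # END_TRANSACTION: reservations stand

LEGS = ["Man-Heathrow", "Heathrow-Bole", "Bole-Lalibella"]
```

With the last leg full the first two reservations are undone, matching case (b); with a free seat on every leg all three are kept, matching case (a).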

ƒ properties of transactions, often referred to as ACID
1. Atomic: to the outside world, the transaction happens
indivisibly; a transaction either happens completely or
not at all; intermediate states are not seen by other
processes
2. Consistent: the transaction does not violate system
invariants; e.g., in an internal transfer in a bank, the
amount of money in the bank must be the same as it
was before the transfer (the law of conservation of
money); this may be violated for a brief period of time,
but the violation is not visible to other processes
3. Isolated or Serializable: concurrent transactions do not
interfere with each other; if two or more transactions
are running at the same time, the final result must look
as though all transactions run sequentially in some
order
4. Durable: once a transaction commits, the changes are
permanent; see later in Chapter 8 - Fault Tolerance
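Isolation and consistency can be illustrated with concurrent transfers between two accounts. In this sketch a single lock forces the transfers into some serial order, which is the crudest way to obtain isolation and is an assumption of the example, not a description of how database systems do it. The class `Bank` and the function `run_concurrent_transfers` are hypothetical.

```python
import threading

class Bank:
    def __init__(self, balances):
        self.balances = dict(balances)
        self._lock = threading.Lock()

    def transfer(self, src, dst, amount):
        with self._lock:                  # one transfer at a time: serial order
            if self.balances[src] < amount:
                return False
            self.balances[src] -= amount  # invariant briefly violated here,
            self.balances[dst] += amount  # but no other thread can observe it
            return True

    def total(self):
        with self._lock:
            return sum(self.balances.values())

def run_concurrent_transfers(bank, rounds):
    """Start `rounds` transfers in each direction and wait for all of them."""
    threads = []
    for _ in range(rounds):
        threads.append(threading.Thread(target=bank.transfer, args=("a", "b", 1)))
        threads.append(threading.Thread(target=bank.transfer, args=("b", "a", 1)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Whatever order the threads run in, the total amount of money is unchanged and the final balances equal those of some sequential execution.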
ƒ Classification of Transactions
ƒ a transaction could be flat, nested or distributed
ƒ Flat Transaction
ƒ consists of a series of operations that satisfy the ACID
properties
ƒ simple and widely used but with some limitations
ƒ do not allow partial results to be committed or aborted
ƒ i.e., atomicity is also partly a weakness
ƒ in our airline reservation example, we may want to
accept the first two reservations and find an
alternative one for the last
ƒ some transactions may take too much time

ƒ Nested Transaction
ƒ constructed from a number of subtransactions; it is
logically decomposed into a hierarchy of
subtransactions; the flight reservation can be split into
three transactions, each accessing a different database
ƒ the top-level transaction forks off children that run in
parallel, on different machines; to gain performance or
for programming simplicity
ƒ each may also execute one or more subtransactions
ƒ permanence (durability) applies only to the top-level
transaction; if the parent aborts, the results of children
that have already committed must be undone
ƒ Distributed Transaction
ƒ a flat transaction that operates on data that are
distributed across multiple machines
ƒ problem: separate algorithms are needed to handle the
locking of data and committing the entire transaction;
see later in Chapter 8 for distributed commit
(a) a nested transaction
(b) a distributed transaction
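The rule that only the top-level commit is permanent can be sketched with private write sets. The `Transaction` class below is hypothetical: a child's commit merely hands its writes to the parent, and nothing reaches the shared store until the top-level transaction commits.

```python
class Transaction:
    def __init__(self, store, parent=None):
        self.store = store        # shared, durable data
        self.parent = parent
        self.writes = {}          # private, tentative updates

    def begin_child(self):
        return Transaction(self.store, parent=self)

    def read(self, key):
        tx = self
        while tx is not None:     # see own writes, then those of ancestors
            if key in tx.writes:
                return tx.writes[key]
            tx = tx.parent
        return self.store[key]

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        if self.parent is not None:
            self.parent.writes.update(self.writes)   # visible to parent only
        else:
            self.store.update(self.writes)           # top level: permanent

    def abort(self):
        self.writes.clear()       # discards committed children's results too
```

Aborting the top-level transaction throws away everything its children committed, so the store never sees a partial result.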

b. Enterprise Application Integration
ƒ how to integrate applications independently of their
databases
ƒ transaction systems rely on request/reply
ƒ how can applications communicate with each other? by
means of middleware

middleware as a communication facilitator in enterprise application
integration
ƒ there are different communication models
ƒ RPC (Remote Procedure Call)
ƒ RMI (Remote Method Invocation)
ƒ MOM (Message-Oriented Middleware)
ƒ Stream-Oriented Communication
ƒ Multicast Communication
ƒ see later in Chapter 4 - Communication
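As a taste of one of these models, the sketch below imitates message-oriented middleware inside a single process: the sender puts messages into a queue and carries on, and a separate receiver processes them when it gets to them. The in-process `queue.Queue` stands in for a real message broker, and `run_mom_demo` is a hypothetical helper; `None` is used here as an end-of-stream marker.

```python
import queue
import threading

def run_mom_demo(orders):
    """Send `orders` through a queue to a consumer thread; return its results."""
    broker = queue.Queue()        # stands in for the messaging middleware
    results = []

    def consumer():
        while True:
            message = broker.get()
            if message is None:   # end-of-stream marker
                break
            results.append(f"processed {message}")

    worker = threading.Thread(target=consumer)
    worker.start()
    for order in orders:
        broker.put(order)         # sender does not wait for a reply
    broker.put(None)
    worker.join()
    return results
```

Unlike RPC, the sender is not blocked until the receiver has handled the request; the two sides are coupled only through the queue.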
3. Distributed Pervasive Systems
ƒ the distributed systems discussed so far are
characterized by their stability: fixed nodes with a
high-quality connection to a network
ƒ there are also mobile and embedded computing devices
which are small, battery-powered, mobile, and with a
wireless connection

ƒ three requirements for pervasive applications
ƒ embrace contextual changes: a device is aware that
its environment (location, identities of nearby people
and objects, time of the day, season, temperature,
etc.) may change all the time, e.g., by changing its
network access point; hence its operations and
services must be adapted to the current context
ƒ encourage ad hoc composition: devices are used in
different ways by different users
ƒ recognize sharing as the default: devices join a
system to access or provide information
ƒ examples of pervasive systems
ƒ Home Systems that integrate consumer electronics
ƒ Electronic Health Care Systems to monitor the well-
being of individuals
ƒ Sensor Networks
ƒ read pages 26 - 30
[ Diversion
ƒ Different approaches to distribution - Lost in the forest of
distribution
ƒ Distributed system
ƒ N autonomous computers (sites): n administrators, n
data/control flows
ƒ an interconnection network
ƒ user view: one single (virtual) system
ƒ (traditional) programmer view: client-server
ƒ Parallel system
ƒ 1 computer, n nodes: one administrator, one
scheduler, one power source
ƒ memory: it depends
ƒ programmer view: one single machine executing
parallel codes; various programming models
(message passing, distributed shared memory, …)
ƒ Cluster computing
ƒ use of PCs interconnected by a (high performance)
network as a parallel (cheap) machine
ƒ Network computing
ƒ from LAN (cluster) computing to WAN computing
ƒ set of machines distributed over a MAN/WAN that are
used to execute parallel loosely coupled codes
ƒ depending on the infrastructure, network computing
comes in many flavours: grid computing, P2P, Internet
computing, etc.
a. Grid computing
ƒ “Resource sharing and coordinated problem solving
in dynamic, multi-institutional virtual organizations”
(Ian Foster)
b. Peer-to-peer computing
ƒ a site is both client and server
ƒ application: mostly file sharing, but also others like
Internet Telephony (Skype)
ƒ 2 approaches:
ƒ centralized management: Napster
ƒ distributed management: Gnutella, Kazaa
c. Internet Computing
ƒ use of (idle) computers interconnected by the Internet for
processing high-throughput applications
ƒ programmer view: a single master, n servants
ƒ Cloud Computing
ƒ a general term for anything that involves delivering hosted
services over the Internet
ƒ a model for enabling convenient, on-demand network
access to a shared pool of configurable computing
resources (e.g., networks, servers, storage, applications,
and services) that can be rapidly provisioned and released
with minimal management effort or service provider
interaction
ƒ Service models: Software as a Service (SaaS); Platform as
a Service (PaaS); Infrastructure as a Service (IaaS)
]
