o Shareability
o Expandability
o Local autonomy
Disadvantages:
o Network reliance
o Complexities
o Security
o Multiple points of failure
It is generally impossible to detect whether a server is actually down or is
simply slow in responding. Consequently, a system may have to report
that a service is not available when, in fact, the server is just slow.
Q9: What is the difference between a multiprocessor and a
multicomputer?
In a multiprocessor, the CPUs have access to a shared main memory.
There is no shared memory in multicomputer systems. In a
multicomputer system, the CPUs can communicate only through
message passing.
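The contrast can be sketched in Python. This is only a model: both functions run as threads in one process, so the "multicomputer" version simulates message passing by having workers share nothing except a `queue.Queue` used as a mailbox. In a real multicomputer the workers would run on separate machines. The function names are hypothetical.

```python
import queue
import threading

def shared_memory_sum(values, n_workers=4):
    """Multiprocessor model: all workers update one shared variable, guarded by a lock."""
    total = [0]                      # the shared main memory
    lock = threading.Lock()

    def worker(chunk):
        for v in chunk:
            with lock:               # coordinate access to the shared location
                total[0] += v

    chunks = [values[i::n_workers] for i in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]

def message_passing_sum(values, n_workers=4):
    """Multicomputer model: workers share no state and report results only as messages."""
    inbox = queue.Queue()            # stands in for the network

    def worker(chunk):
        inbox.put(sum(chunk))        # the only way to communicate is to send a message

    chunks = [values[i::n_workers] for i in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(inbox.get() for _ in range(n_workers))
```

Both produce the same answer; the difference lies in how the workers interact, not in what they compute.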
Q17: Give five types of hardware resource and five types of
data or software resource that can usefully be shared. Give
examples of their sharing as it occurs in practice in
distributed systems.
o Hardware:
- CPU: compute server (executes processor-intensive applications
for clients), remote object server (executes methods on behalf of
clients), worm program (shares CPU capacity of desktop machine
with the local user). Most other servers, such as file servers, do
some computation for their clients, hence their CPU is a shared
resource.
- memory: cache server (holds recently accessed web pages in its
RAM, for faster access by other local computers).
- disk: file server, virtual disk server (see Chapter 8), video on demand server.
- screen: network window systems, such as X11, allow processes
in remote computers to update the content of windows.
- printer: networked printers accept print jobs from many computers,
managing them with a queuing system.
- network capacity: packet transmission enables many simultaneous
communication channels (streams of data) to be transmitted on the
same circuits.
o Data/software:
- web page: web servers enable multiple clients to share read-only
page content (usually stored in a file, but sometimes generated on-
the-fly).
- file: file servers enable multiple clients to share read-write files.
Conflicting updates may result in inconsistent results. Most useful
for files that change infrequently, such as software binaries.
- database: databases are intended to record the definitive state of
some related sets of data. They have been shared ever since
multi-user computers appeared. They include techniques to
manage concurrent updates.
- newsgroup content: The Netnews system makes read-only copies
of the recently posted news items available to clients throughout
the Internet. A copy of newsgroup content is maintained at each
Netnews server that is an approximate replica of those at other
servers. Each server makes its data available to multiple clients.
- video/audio stream: Servers can store entire videos on disk and
deliver them at playback speed to multiple clients simultaneously.
- exclusive lock: a system-level object provided by a lock server,
enabling several clients to coordinate their use of a resource (such
as a printer that does not include a queuing scheme).
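The exclusive-lock resource above can be sketched as a small in-process object. This is a minimal sketch under stated assumptions: `LockServer`, `acquire` and `release` are hypothetical names, there is no networking, and waiting clients are granted the lock in FIFO order.

```python
from collections import deque

class LockServer:
    """Hypothetical lock server: at most one client holds the lock on each named resource."""

    def __init__(self):
        self.holders = {}   # resource name -> client currently holding the lock
        self.waiting = {}   # resource name -> FIFO queue of clients waiting for it

    def acquire(self, resource, client):
        """Grant the lock if it is free; otherwise queue the client and report failure."""
        if resource not in self.holders:
            self.holders[resource] = client
            return True
        self.waiting.setdefault(resource, deque()).append(client)
        return False

    def release(self, resource, client):
        """Release the lock and hand it to the next waiting client, if any."""
        if self.holders.get(resource) != client:
            raise ValueError(f"{client} does not hold the lock on {resource}")
        waiters = self.waiting.get(resource)
        if waiters:
            self.holders[resource] = waiters.popleft()
        else:
            del self.holders[resource]
        return self.holders.get(resource)   # new holder, or None if the lock is now free
```

Two clients sharing a printer without its own queue would each call `acquire("printer", ...)` and print only while they hold the lock.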
Assignment 2
Q8: We gave two examples of using interceptors in
adaptive middleware. What other examples come to mind?
There are several. For example, we could use an interceptor to support
mobility. In that case, a request-level interceptor would first look up the
current location of a referenced object before the call is forwarded.
Likewise, an interceptor can be used to transparently encrypt messages
when security is at stake. Another example is when logging is needed.
Instead of letting this be handled by the application, we could simply
insert a method-specific interceptor that would record specific events
before passing a call to the referenced object. More such examples will
easily come to mind.
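The logging example can be sketched in Python as a proxy object that sits between the caller and the referenced object. This is an illustration of the idea only, not a middleware API: `LoggingInterceptor` and `Account` are hypothetical, and Python's `__getattr__` hook stands in for the middleware's request-level interception point.

```python
class LoggingInterceptor:
    """Hypothetical interceptor: records each method call, then forwards it to the target."""

    def __init__(self, target, log):
        self._target = target
        self._log = log

    def __getattr__(self, name):
        # Only invoked for names not found on the interceptor itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr                       # plain attributes pass straight through
        def intercepted(*args, **kwargs):
            self._log.append((name, args))    # record the event before forwarding
            return attr(*args, **kwargs)
        return intercepted

class Account:
    """Hypothetical referenced object."""

    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        return self.balance
```

The application keeps calling `deposit` as before; the logging is added transparently by wrapping the object, which is the point of using an interceptor.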
Q11: Give an example of a self-managing system in which
the analysis component is completely distributed or even
hidden.
We already came across this type of system: in unstructured peer-to-
peer systems where nodes exchange membership information, we saw
how a topology could be generated. The analysis component consists of
dropping certain links that will not help converge to the intended
topology. Similar examples can be found in the other such systems we
referred to.
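The following sketch shows how such an analysis component can be spread over all nodes, in the spirit of gossip-based topology construction. The assumptions are not from the text: node ids 0..N-1 sit on a ring, the intended topology is that ring, and each node keeps K links. In each round a node merges its neighbours' membership lists and drops every link except the K closest ones. No node sees the global topology; the "analysis" is only each node's local decision about which links to drop. `ring_distance` and `gossip_round` are hypothetical names.

```python
import random

def ring_distance(a, b, n):
    """Distance between node ids a and b on a ring of n positions."""
    d = abs(a - b) % n
    return min(d, n - d)

def gossip_round(views, n, k):
    """Every node merges its neighbours' views into its own and keeps only the k closest."""
    new_views = {}
    for node, view in views.items():
        candidates = set(view)
        for peer in view:
            candidates |= views[peer]          # exchange membership information
        candidates.discard(node)
        # Local analysis: drop links that do not help converge to the ring.
        ranked = sorted(candidates, key=lambda p: (ring_distance(node, p, n), p))
        new_views[node] = set(ranked[:k])
    return new_views

def total_distance(views, n):
    return sum(ring_distance(v, p, n) for v, view in views.items() for p in view)

random.seed(1)
N, K = 32, 2
views = {v: set(random.sample([p for p in range(N) if p != v], K)) for v in range(N)}
before = total_distance(views, N)
for _ in range(10):
    views = gossip_round(views, N, K)
after = total_distance(views, N)
```

Because each node keeps the K closest links out of a superset of its current ones, the total link length can never grow from one round to the next.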
A torrent file contains the information that is needed to download a
specific file.
It refers to what is known as a tracker, which is a server that keeps
an accurate account of active nodes that have (chunks of) the requested
file.
An active node is one that is currently downloading another file.
Obviously, there will be many different trackers, although there will
generally be only a single tracker per file (or collection of files).
Once the nodes have been identified from where chunks can be
downloaded, the downloading node effectively becomes active. At that
point, it will be forced to help others.
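The tracker's bookkeeping can be modelled in a few lines. This in-memory class is a hypothetical illustration: real trackers are contacted over the network, and the names `Tracker`, `announce` and `sources` are assumptions, not part of any BitTorrent library.

```python
class Tracker:
    """Hypothetical tracker: records which active nodes hold which chunks of each file."""

    def __init__(self):
        self.peers = {}   # file id -> {node id: set of chunk indices held}

    def announce(self, file_id, node, chunks):
        """A node reports (or updates) the chunks it holds, which makes it an active node."""
        self.peers.setdefault(file_id, {})[node] = set(chunks)

    def sources(self, file_id, chunk):
        """Active nodes from which the given chunk can be downloaded."""
        return sorted(n for n, held in self.peers.get(file_id, {}).items() if chunk in held)
```

A newcomer first asks `sources` where to fetch chunks. Once it announces the chunks it has obtained, it appears in other nodes' answers and is thereby made to help others.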
Browsers are clients of Domain Name Servers (DNS) and web servers
(HTTP). Some intranets are configured to interpose a Proxy server.
Proxy servers fulfil several purposes – when they are located at the
same site as the client, they reduce network delays and network traffic.
When they are at the same site as the server, they form a security
checkpoint, and they can reduce load on the server. N.B. DNS servers
are also involved in all the application architectures described below, but
they are omitted from the discussion for clarity.
Email:
tables to determine a route for each message and then forwards the
message to the next SMTP server on the chosen route. Each SMTP
server similarly processes and forwards each incoming message unless
the domain name in the message address matches the local domain. In
the latter case, it attempts to deliver the message to the local recipient by
storing it in a mailbox file on a local disk or file server. Reading
messages: the User Agent (the user’s mail reading program) is either a
client of the local file server or a client of a mail delivery server such as a
POP or IMAP server. In the former case, the User Agent reads
messages directly from the mailbox file in which they were placed during
the message delivery. (Examples of such user agents are the UNIX mail
and pine commands.) In the latter case, the User Agent requests
information about the contents of the user’s mailbox file from a POP or
IMAP server and receives messages from those servers for presentation to the user.
POP and IMAP are protocols specifically designed to support mail
access over wide areas and slow network connections, so a user can
continue to access her home mailbox while travelling.
Distributed Systems, Edition 5: Chapter 2 Solutions
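The forwarding rule described above can be sketched in Python: deliver locally when the domain matches, otherwise consult the routing tables. This is a minimal sketch, not SMTP itself. The message format, `handle_incoming`, the table contents and the `default` relay entry are all hypothetical. Real mail servers normally find the next hop through DNS MX records rather than a hand-written table.

```python
def handle_incoming(message, local_domain, routing_table, mailboxes):
    """Hypothetical SMTP-style step: store the message locally or return the next server."""
    recipient = message["to"]
    domain = recipient.rsplit("@", 1)[1].lower()
    if domain == local_domain:
        # Domain matches the local domain: append to the recipient's mailbox file.
        mailboxes.setdefault(recipient, []).append(message["body"])
        return None
    # Otherwise choose the next server on the route, falling back to a default relay.
    return routing_table.get(domain, routing_table["default"])
```

Each server that receives a forwarded message runs the same step, so the message hops along the route until it reaches the server for the recipient's domain.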
Netnews:
Posting news articles: User Agent (the user’s news composing program)
is a client of a local NNTP server and passes each outgoing article to the
NNTP server for delivery. Each article is assigned a unique identifier.
Each NNTP server holds a list of other NNTP servers for which it is a
newsfeed – they are registered to receive articles from it. It periodically
contacts each of the registered servers, delivers any new articles to
them and requests any that they have which it has not (using the
articles’ unique ids to determine which they are). To ensure delivery of
every article to every Netnews destination, there must be a path of
newsfeed connections from the originating server that reaches every NNTP server.
Browsing/reading articles: User Agent (the user’s news reading
program) is a client of a local NNTP server. The User Agent requests
updates for all the newsgroups to which the user subscribes and
presents them to the user.
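The newsfeed exchange can be modelled with in-memory objects. This is a sketch under assumptions: `NntpServer`, `register_feed`, `exchange` and `run_feeds` are hypothetical names, and the exchange is a direct method call rather than NNTP commands over the network. It does show the key idea from the text: unique article ids let two servers work out which articles each one lacks.

```python
class NntpServer:
    """Hypothetical news server holding articles keyed by their unique identifiers."""

    def __init__(self, name):
        self.name = name
        self.articles = {}   # unique article id -> article text
        self.feeds = []      # servers registered to receive articles from this one

    def post(self, article_id, text):
        self.articles[article_id] = text

    def register_feed(self, peer):
        self.feeds.append(peer)

    def exchange(self, peer):
        """Deliver articles the peer lacks and fetch those this server lacks."""
        for aid in self.articles.keys() - peer.articles.keys():
            peer.articles[aid] = self.articles[aid]
        for aid in peer.articles.keys() - self.articles.keys():
            self.articles[aid] = peer.articles[aid]

def run_feeds(servers):
    """One periodic round: every server contacts each of its registered servers."""
    for server in servers:
        for peer in server.feeds:
            server.exchange(peer)
```

With a chain of feeds a -> b -> c, an article posted at either end reaches every server after a couple of rounds, because a path of newsfeed connections links them all.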
- People often turn their desktop computers off when not using them.
Even if they are on most of the time, they will be off when the user is away
for an extended time, or the computer is being moved.
- The owners of participating computers are unlikely to be known to
other participants, so their trustworthiness is unknown. With current
hardware and operating systems, the owner of a computer has total
control over the data on it and may change it or delete it at will.
- Network connections to the peer computers are exposed to attack
(including denial of service).
The importance of these problems depends on the application. For
music downloading, the original driving force for peer-to-peer systems,
they are not very important. Users can wait until the relevant host is
running to access a particular piece of music, and there is little motivation
for users to tamper with the music. But for more conventional
applications such as file storage, availability and integrity are all-important.
Solutions:
Replication:
- The object’s identifier is derived from its hash code. The identifier is
used to address the object. When the object is received by a client,
the hash code can be checked for correspondence with the identifier.
The hash algorithms used must obey the properties required of a
secure hash algorithm.
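The hash-derived identifier scheme can be sketched with Python's standard `hashlib`. SHA-256 is chosen here as one example of a secure hash; the text does not name a specific algorithm. `object_id` and `verify` are hypothetical helper names.

```python
import hashlib

def object_id(data: bytes) -> str:
    """Derive an object's identifier from a secure hash of its content."""
    return hashlib.sha256(data).hexdigest()

def verify(identifier: str, data: bytes) -> bool:
    """A client re-hashes the bytes it received and checks them against the identifier."""
    return object_id(data) == identifier
```

Because the identifier is used to address the object, a host that alters the stored bytes cannot also make them match the identifier the client asked for.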
Assignment 3
server call: 5 + 2000/40000 + 5 + 50000/40000 = 11.3
milliseconds.
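The arithmetic above can be checked with a short calculation. As the formula implies, this assumes the message sizes are given in bits (2000 for the request, 50000 for the reply), a per-message latency of 5 ms, and an in-memory transfer rate of 40 Mbit/s, i.e. 40,000 bits per millisecond. The constant and function names are hypothetical.

```python
LATENCY_MS = 5              # per-message latency, paid once for the request and once for the reply
RATE_BITS_PER_MS = 40_000   # assumed in-memory transfer rate: 40 Mbit/s

def server_call_ms(request_bits, reply_bits):
    """Latency plus transfer time for the request, then the same for the reply."""
    request = LATENCY_MS + request_bits / RATE_BITS_PER_MS
    reply = LATENCY_MS + reply_bits / RATE_BITS_PER_MS
    return request + reply
```

`server_call_ms(2000, 50000)` evaluates to 5 + 0.05 + 5 + 1.25 = 11.3 ms, which is dominated by the two fixed latencies rather than by the data transfer.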
Q2: The Internet is far too large for any router to hold
routing information for all destinations. How does the
Internet routing scheme deal with this issue?
If a router does not find the network id portion of a destination address in
its routing table, it dispatches the packet to a default address: an
adjacent gateway or router that is designated as responsible for routing
packets for which there is no routing information available. Each router’s
default address carries such packets towards a router that has more
complete routing information, until one is encountered that has a specific
entry for the relevant network id.
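A routing table with a default entry can be sketched with Python's standard `ipaddress` module. The table contents and next-hop names are hypothetical. The sketch also assumes CIDR-style longest-prefix matching, which the text does not discuss. The entry `0.0.0.0/0` matches every address and plays the role of the default address.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop).
ROUTES = [
    (ipaddress.ip_network("10.1.2.0/24"), "local"),
    (ipaddress.ip_network("10.1.0.0/16"), "router-b"),
    (ipaddress.ip_network("0.0.0.0/0"), "default-gateway"),
]

def next_hop(destination, routes=ROUTES):
    """Pick the most specific matching entry; fall back to the default route."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if addr in net]
    # The default route always matches but has prefix length 0, so it loses to any specific entry.
    net, hop = max(matches, key=lambda entry: entry[0].prefixlen)
    return hop
```

A router therefore needs specific entries only for nearby networks; everything else is handed to the default gateway, which is assumed to know more.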
Q4: Make a table like Figure 3.5 describing the work done
by the software in each protocol layer when Internet
applications and the TCP/IP suite are implemented over an
Ethernet.
Q5: Can we be sure that no two computers on the Internet
have the same IP address?
No two devices on the same LAN can have the same IP address, but a device
on somebody else’s LAN can have the same (private) address as yours.
When traffic from your LAN passes out through a gateway, the private address
is translated (network address translation, NAT) to a public-facing one.
Incoming traffic is directed to an address on the LAN; outgoing traffic is
sent to another public-facing IP address, whose gateway in turn translates
it for its own LAN or whatever lies behind it.
There are three important things in an IPv4 configuration:
Address (device LAN address)
Subnet mask
Default gateway.
The first is the device’s identity on the LAN, the second tells it which
portions of the address identify the LAN and which the individual device,
and the third is the router through which traffic to and from other
networks passes. So, any two LANs can be exact copies of each other, but
their addresses must be translated at the public gateways into something unique.
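The three configuration items can be inspected with Python's standard `ipaddress` module. The address `192.168.1.20/24` and gateway `192.168.1.1` are hypothetical example values; `describe` is a hypothetical helper. The `is_private` check shows why two LANs may reuse the same addresses: private ranges are never routed on the public Internet.

```python
import ipaddress

def describe(address_with_mask: str, gateway: str) -> dict:
    """Split an IPv4 configuration into the LAN part, the mask and the gateway check."""
    iface = ipaddress.ip_interface(address_with_mask)   # address plus subnet mask, e.g. "/24"
    return {
        "network": str(iface.network),                   # the portion that identifies the LAN
        "netmask": str(iface.netmask),
        "private": iface.ip.is_private,                  # reusable behind NAT on any LAN
        "gateway_on_lan": ipaddress.ip_address(gateway) in iface.network,
    }
```

The default gateway must lie inside the LAN's own network, otherwise the device could not reach it directly.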
each of the following application-level or presentation-level
protocols:
➢ i) virtual terminal access (for example, Telnet).
➢ ii) file transfer (for example, FTP).
➢ iii) user location (for example, rwho, finger).
➢ iv) information browsing (for example, HTTP).
➢ v) remote procedure call.
i) The long duration of sessions, the need for reliability and the
unstructured sequences of characters transmitted make connection-
oriented communication most suitable for this application. Performance
is not critical in this application, so the overheads are of little
consequence.
ii) File transfer involves the transmission of large volumes of data.
Connectionless communication would be acceptable if error rates were low and
messages could be large, but on the Internet these requirements aren’t met,
so TCP is used.
iii) Connectionless is preferable, since messages are short, and a single
message is sufficient for each transaction.
iv) Either mode could be used. The volume of data transferred on each
transaction can be quite large, so TCP is used in practice.
v) RPC achieves reliability by means of timeouts and retries, so
connectionless (UDP) communication is often preferred.
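The timeout-and-retry scheme in answer v) can be simulated without real sockets. `LossyChannel` stands in for an unreliable datagram service that silently drops the first few requests, and `rpc_call` retries until a reply arrives; both names are hypothetical. A retried request may be executed more than once at the server unless the server filters duplicates, so this gives at-least-once behaviour.

```python
class LossyChannel:
    """Simulated connectionless channel that drops the first `drops` requests."""

    def __init__(self, drops, handler):
        self.drops = drops
        self.handler = handler   # the server procedure invoked when a request gets through
        self.attempts = 0

    def request(self, message):
        self.attempts += 1
        if self.attempts <= self.drops:
            return None          # datagram lost: no reply arrives before the timeout
        return self.handler(message)

def rpc_call(channel, message, max_retries=3):
    """Request-reply over an unreliable channel, retrying after each timeout."""
    for _ in range(max_retries + 1):
        reply = channel.request(message)
        if reply is not None:
            return reply
    raise TimeoutError("no reply after retries")
```

The RPC layer supplies the reliability itself, so it gains little from TCP's connection setup and stream semantics for short request-reply exchanges.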
hosts on the network. Packets are therefore transmitted and received in
strict sequence. It can’t happen in ATM networks because they are
connection-oriented. Transmission is always through virtual channels,
and VCs guarantee to deliver data in the order in which it is transmitted.
message, parsing it and deciding whether it need be acted upon are
incurred by every host on the network, whereas only a small number are
likely locations for a given resource. Despite this, note that the Internet
ARP does rely on Ethernet broadcasting. The trick is that it doesn’t do it
very often - just once for each host to locate other hosts on the local net
that it needs to communicate with. ii. Broadcasting is hardly feasible in a
large-scale network such as the Internet. It might just be possible in an
intranet, but ought to be avoided for the reasons given above. Ethernet
multicast addresses are matched in the Ethernet controller. Multicast
messages are passed up to the OS only for addresses that match
multicast groups the local host is subscribing to. If there are several
such, the address can be used to discriminate between several daemon
processes to choose one to handle each message.
Q12: Use the diagram in Figure 3.13 as a basis for an
illustration showing the segmentation and encapsulation
of an HTTP request to a server and the resulting reply.
Assume that the request is a short HTTP message, but the
reply includes at least 2000 bytes of HTML.
would wait 0.5 seconds. While Nagle’s algorithm is waiting for an
acknowledgement, the server process can write additional data (e.g.,
image files) into the buffer. They will be sent as soon as the
acknowledgement is received.
b) For a remote shell (Telnet) application: the application will write
individual keystrokes into the buffer (and in the normal case of full-duplex
terminal interaction they are echoed by the remote host to the
Telnet client for display). With the basic algorithm, full-duplex operation
would result in a delay of 0.5 seconds before any of the characters typed
are displayed on the screen. With Nagle’s algorithm, the first character
typed is sent immediately and the remote host echoes it with an
acknowledgement piggybacked in the same packet. The
acknowledgement triggers the sending of any further characters that
have been typed in the intervening period. So, if the remote host
responds sufficiently rapidly, the display of typed characters appears to
be instantaneous. But note that a badly written remote application that
reads data from the TCP buffer one character at a time can still cause
problems - each read will result in an acknowledgement indicating that
one further character should be sent - resulting in the transmission of an
entire IP frame for each character. Clark [1982] called this the silly
window syndrome. His solution is to defer the sending of
acknowledgements until there is a substantial amount of free space
available.
c) For a continuous mouse input (e.g., sending mouse positions to an X-
Windows application running on a compute server): this is a difficult form
of input to handle remotely. The problem is that the user should see
smooth feedback of the path traced by the mouse, with minimal lag.
Neither the basic TCP algorithm nor Nagle’s nor Clark’s algorithm
achieves this very well. A version of the basic algorithm with a short
timeout (0.1 seconds) is the best that can be done, and this is effective
when the network is lightly loaded and has low end-to-end latency -
conditions that can be guaranteed only on local networks with controlled
loads. See Tanenbaum [1996] pp. 534-5 for further discussion of this.
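The Telnet behaviour described in b) can be reproduced with a simplified model of Nagle’s rule: a full-sized segment may always be sent, but a small segment is sent only when no earlier data is still unacknowledged. `NagleSender` is a hypothetical class. It ignores the receive window, timers and everything else in real TCP, and the default segment size of 1460 bytes is an assumption.

```python
class NagleSender:
    """Simplified Nagle's rule: at most one small unacknowledged segment in flight."""

    def __init__(self, mss=1460):
        self.mss = mss          # assumed maximum segment size
        self.buffer = b""
        self.unacked = False
        self.sent = []          # segments handed to the network, in order

    def write(self, data):
        self.buffer += data
        self._try_send()

    def ack(self):
        self.unacked = False
        self._try_send()        # an acknowledgement releases whatever has accumulated

    def _try_send(self):
        while len(self.buffer) >= self.mss:      # full segments are never held back
            self.sent.append(self.buffer[:self.mss])
            self.buffer = self.buffer[self.mss:]
            self.unacked = True
        if self.buffer and not self.unacked:     # a small segment only if nothing is outstanding
            self.sent.append(self.buffer)
            self.buffer = b""
            self.unacked = True
```

Typing "a", "b", "c" sends "a" at once and holds "bc" until the echo carrying the acknowledgement arrives, which then triggers a single segment containing both characters.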
Q14: Construct a network diagram like Figure 3.10 for the
local network at your institution or company.
The first part of the question is a little misleading. Neither Ethernet nor
the Internet supports ‘discovery’ services as such. A newly installed
computer must be configured with the domain names of any servers that
it needs to access. The only exception is the DNS. Services such as
BootP and DHCP enable a newly connected host to acquire its own IP
address and to obtain the IP addresses of one or more local DNS
servers. To obtain the IP addresses of other servers (e.g., SMTP, NFS,
etc.) it must use their domain names. In Unix, the nslookup command
can be used to examine the database of domain names in the local DNS
servers and a user can select appropriate ones for use as servers. The
domain names are translated to IP addresses by a simple DNS request.
The Address Resolution Protocol (ARP) provides the answer to the
second part of the question. This is described on pages 95-6. Each
network type must implement ARP in its own way. The Ethernet and
related networks use the combination of broadcasting and caching of the
results of previous queries described on page 96.
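The combination of broadcasting and caching can be sketched as follows. `ArpResolver` is a hypothetical class, and the `broadcast` function passed to it stands in for the actual Ethernet broadcast query. Cache entries never expire in this sketch, unlike a real ARP cache.

```python
class ArpResolver:
    """ARP-style resolution: broadcast only on a cache miss, then reuse the answer."""

    def __init__(self, broadcast):
        self.broadcast = broadcast   # function: IP address -> Ethernet address (stands in for the LAN)
        self.cache = {}
        self.broadcasts = 0

    def resolve(self, ip):
        if ip not in self.cache:
            self.broadcasts += 1              # every host on the LAN must examine this query
            self.cache[ip] = self.broadcast(ip)
        return self.cache[ip]                 # later lookups cost no network traffic
```

Each host therefore broadcasts roughly once per peer it talks to, which is why ARP’s reliance on broadcasting remains acceptable.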
Q17: Can firewalls prevent denial of service attacks such
as the one described on page 112? What other methods
are available to deal with such attacks?
Final Revision Sheet
Administrative scalability: the system can still be managed easily as the number of administrative domains it spans grows.
Challenges of designing scalable distributed systems:
▪ Cost of physical resources
o cost should increase at most linearly with system size
▪ Performance Loss
o For example, with hierarchically structured data, the loss of
search performance as the data grows should be no worse than
O(log n).
▪ Preventing software resources running out:
o Numbers used to represent Internet addresses (32-bit IPv4
addresses replaced by 128-bit IPv6 addresses)
▪ Avoiding performance bottlenecks:
o Use decentralized algorithms (e.g., moving from a centralized
name table to the decentralized DNS).
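The O(log n) point can be made concrete by counting probes. Here binary search over a sorted list stands in for a balanced hierarchic structure; that substitution and the function names are assumptions for illustration.

```python
def linear_probes(items, target):
    """Probes needed by a linear scan (linear structure): grows as O(n)."""
    for count, item in enumerate(items, start=1):
        if item == target:
            return count
    return len(items)

def hierarchical_probes(items, target):
    """Probes needed by binary search: each probe halves the range, like descending a tree."""
    lo, hi, probes = 0, len(items) - 1, 0
    while lo <= hi:
        probes += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return probes
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes
```

For 1,024 items the worst-case scan needs 1,024 probes while the halving search needs at most 11. Multiplying the data by 64 adds only about six more probes.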
o Tasks can be completed despite failures
o E.g., message retransmission, failure of a Web server node
should not bring down the website.
▪ Replication transparency
o Access to replicated resources as if there were just one,
providing enhanced reliability and performance without
knowledge of the replicas by users or application
programmers.
▪ Migration (mobility / relocation) transparency
o Allow the movement of resources and clients within a system
without affecting the operation of users or applications.
o E.g., switching from one name server to another at runtime;
migration of an agent / process from one node to another.
▪ Concurrency transparency
o Several processes can use shared resources concurrently without interference, and users are unaware that others are sharing them.
▪ Performance transparency:
o Allows the system to be reconfigured to improve
performance as loads vary.
o E.g., dynamic addition/deletion of components; switching
from linear structures to hierarchical structures when the
number of users increases.
▪ Scaling transparency:
o Allows the system and applications to expand in scale
without change to the system structure or the application
algorithms.
▪ Application-level transparencies:
o Persistence transparency
▪ Masks the deactivation and reactivation of an object
o Transaction transparency
▪ Hides the coordination required to satisfy the
transactional properties of operations.
o E.g., Network File System
▪ Location transparency
o Access without knowledge of location
o E.g., separation of domain name from machine address.
Q9: What is the difference between a distributed operating
system and a network operating system?
A distributed operating system manages multiprocessors and
homogeneous multicomputers.
A network operating system connects different independent computers
that each have their own operating system so that users can easily use
the services available on each computer.
Encryption is a fundamental technique used to implement confidentiality
and integrity: it transforms data into a form that is unreadable without the key.
▪ Different information is created and maintained by
different persons (e.g., Web pages)
o People
▪ Computer supported collaborative work (virtual teams,
engineering, virtual surgery)
o Retail store and inventory systems for a supermarket chain
(e.g., Coles, Safeway)
▪ Power imbalance and load variation
o Distribute computational load among different computers.
▪ Reliability
o Long term preservation and data backup (replication) at
different location.
▪ Economies
o Sharing a printer among many users to reduce the cost of
ownership.
o Building a supercomputer out of a network of computers.
[Table: numbers of computers (smallest, large, largest)]
A system that consists of a collection of two or more independent
computers that coordinate their processing through synchronous or
asynchronous message passing.
▪ Advantages
o Shareability
o Expandability
o Local autonomy
o Improved performance
o Improved reliability and availability
o Potential cost reductions
▪ Disadvantages
o Network reliance
o Complexities
o Security
o Multiple points of failure
A distributed system organized as middleware. The middleware layer
extends over multiple machines and offers each application the same
interface.
In a multiprocessor, the CPUs have access to a shared main memory.
There is no shared memory in multicomputer systems. In a
multicomputer system, the CPUs can communicate only through
message passing.
o printer: networked printers accept print jobs from many
computers, managing them with a queuing system.
o network capacity: packet transmission enables many
simultaneous communication channels (streams of data) to
be transmitted on the same circuits.
▪ Data/software:
o web page: web servers enable multiple clients to share read-
only page content (usually stored in a file, but sometimes
generated on-the-fly).
o file: file servers enable multiple clients to share read-write
files. Conflicting updates may result in inconsistent results.
Most useful for files that change infrequently, such as
software binaries.
o object: possibilities for software objects are limitless. E.g.,
shared whiteboard, shared diary, room booking system, etc.
o database: databases are intended to record the definitive
state of some related sets of data. They have been shared
ever since multi-user computers appeared. They include
techniques to manage concurrent updates.
o newsgroup content: The Netnews system makes read-only
copies of the recently posted news items available to clients
throughout the Internet. A copy of newsgroup content is
maintained at each Netnews server that is an approximate
replica of those at other servers. Each server makes its data
available to multiple clients.
o video/audio stream: Servers can store entire videos on disk
and deliver them at playback speed to multiple clients
simultaneously.
o exclusive lock: a system-level object provided by a lock
server, enabling several clients to coordinate their use of a
resource (such as a printer that does not include a queuing
scheme).
Cloud computing is defined in terms of supporting Internet-based
services (whether application, storage, or other computing-based
services), where everything is a service, and dispensing with local data
storage or application software. The first part of this definition
(supporting Internet-based services) is completely consistent with
client-server computing, and indeed client-server concepts support the
implementation of cloud computing. The second part (dispensing with
local data storage or application software) highlights one of the key
elements of cloud computing: moving to a world where you can dispense
with local services. This level of ambition may or may not be present in
client-server computing. As a final comment, cloud computing promotes a
view of computing as a utility, and this is linked to often novel business
models whereby services can be rented rather than owned, leading to a
more flexible and elastic approach to service provision and acquisition.
This is a key distinction and represents the main novelty of cloud computing.
To summarize, cloud computing is partially a technical innovation in
terms of the level of ambition, but largely a business innovation in terms
of viewing computing services as a utility.
▪ Eavesdropping
o obtaining private or secret information.
▪ Masquerading
o assuming the identity of another user/principal.
▪ Message tampering
o altering the content of messages in transit
▪ man-in-the-middle attack (tampers with the secure
channel mechanism).
▪ Replaying
o storing secure messages and sending them later.
▪ Denial of service
o flooding a channel or other resource, denying access to
others.
Reliability is a measure of how long a machine performs its intended
function without failing, whereas availability is a measure of the
percentage of time a machine is operable. For example, a machine may be
available 90% of the time, but reliable only 75% of the time from a
performance standpoint.
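One standard way to put a number on availability, which the notes do not state and is added here as an assumption, is the steady-state ratio MTTF / (MTTF + MTTR). Here MTTF is the mean time to failure, which reflects reliability, and MTTR is the mean time to repair. The function name and the figures in the checks are hypothetical.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a machine is operable, given mean time to failure and mean time to repair."""
    return mttf_hours / (mttf_hours + mttr_hours)
```

The formula shows how the two measures differ. A machine that fails often but is repaired almost instantly has poor reliability yet high availability.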
Q8: Concurrency transparency enables several processes to operate
concurrently using shared resources without interference between them.
( True )
Q9: The failure model attempts to give a precise specification of the
faults that can be exhibited by processes and communication channels.
( True )
Q10: Designing a system through layers and services is the best way to
break up the complexity of systems. ( True )
Q11: Protocols are agreement between two communicating parties on
how the communication is to proceed. ( True )