
Assignment 1

Dr. Eng. Samy Elmokadem


Q1: Define distributed systems. What are the
advantages and disadvantages of distributed systems?
A distributed system is a collection of independent computers that
appears to its users as a single coherent system.
Advantages:
o Improved performance

o Shareability
o Expandability
o Local autonomy
Disadvantages:
o Network reliance
o Complexities
o Security
o Multiple points of failure

Q2: What is the role of middleware in a distributed system?


To enhance the distribution transparency that is missing in network
operating systems. In other words, middleware aims at improving the
single-system view that a distributed system should have.

Q3: Explain what is meant by (distribution) transparency
and give examples of different types of transparency.
Distribution transparency is the phenomenon by which distribution
aspects in a system are hidden from users and applications. Examples
include access transparency, location transparency, migration
transparency, relocation transparency, replication transparency,
concurrency transparency, failure transparency, and persistence
transparency.

Q4: Why is it sometimes so hard to hide the occurrence
and recovery from failures in a distributed system?

It is generally impossible to detect whether a server is down, or that it is
simply slow in responding. Consequently, a system may have to report
that a service is not available, although, in fact, the server is just slow.

Q5: Why is it not always a good idea to aim at
implementing the highest degree of transparency possible?
Aiming at the highest degree of transparency may lead to a
considerable loss of performance that users are not willing to
accept.

Q6: What is an open distributed system and what benefits
does openness provide?
An open distributed system offers services according to clearly defined
rules. An open system is capable of easily interoperating with other open
systems but also allows applications to be easily ported between
different implementations of the same system.

Q7: Describe precisely what is meant by a scalable system.


A system is scalable with respect to either its number of components,
geographical size, or number and size of administrative domains, if it
can grow in one or more of these dimensions without an unacceptable
loss of performance.

Q8: Scalability can be achieved by applying different
techniques. What are these techniques?
Scaling can be achieved through distribution, replication, and caching.

Q9: What is the difference between a multiprocessor and a
multicomputer?
In a multiprocessor, the CPUs have access to a shared main memory.
There is no shared memory in multicomputer systems. In a
multicomputer system, the CPUs can communicate only through
message passing.

Q10: What is the difference between a distributed
operating system and a network operating system?
A distributed operating system manages multiprocessors and
homogeneous multicomputers. A network operating system connects
different independent computers that each have their own operating
system so that users can easily use the services available on each
computer.

Q11: Explain the principal operation of a page-based
distributed shared memory system.
Page-based DSM makes use of the virtual memory capabilities of an
operating system. Whenever an application addresses a memory
location that is currently not mapped into physical memory, a page fault
occurs, giving the operating system control. The operating system can
then locate the referenced page, transfer its content over the network,
and map it into physical memory. At that point, the application can
continue.
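The fetch-on-fault behavior can be modeled in a few lines. This is only an illustrative sketch (the DsmNode class and its read method are invented names, not a real DSM API): a miss triggers a simulated page fault, the page is fetched from a peer, and subsequent accesses hit locally.

```python
class DsmNode:
    """Toy model of one node in a page-based DSM (illustrative API)."""

    def __init__(self, name, pages):
        self.name = name
        self.memory = dict(pages)   # pages mapped into local "physical memory"
        self.faults = 0             # count of simulated page faults

    def read(self, page_id, peers):
        if page_id in self.memory:          # page hit: mapped locally
            return self.memory[page_id]
        self.faults += 1                    # page fault: "OS" takes control
        for peer in peers:                  # locate the referenced page
            if page_id in peer.memory:
                # transfer over the "network" and map it into local memory
                self.memory[page_id] = peer.memory[page_id]
                return self.memory[page_id]
        raise KeyError(f"page {page_id} not found on any node")

node_a = DsmNode("A", {0: b"alpha"})
node_b = DsmNode("B", {1: b"beta"})
print(node_a.read(1, [node_b]))   # faults once, fetches page 1 from B
print(node_a.read(1, [node_b]))   # local hit this time
print(node_a.faults)              # 1
```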

Q12: What is the reason for developing distributed shared
memory systems?
Q13: What do you see as the main problem hindering
efficient implementations?
The main reason is that writing parallel and distributed programs based
on message-passing primitives is much harder than being able to use
shared memory for communication. Efficiency of DSM systems is
hindered by the fact that, no matter what you do, page transfers across
the network need to take place. If pages are shared by different
processors, it is quite easy to get into a state similar to thrashing in
virtual memory systems. In the end, DSM systems can never be faster
than message-passing solutions and will generally be slower due to the
overhead incurred by keeping track of where pages are.

Q14: Explain what false sharing is in distributed shared
memory systems. What possible solutions do you see?
False sharing happens when data belonging to two different and
independent processes (possibly on different machines) are mapped
onto the same logical page. The effect is that the page is swapped
between the two processes, leading to an implicit and unnecessary
dependency. Solutions include making pages smaller or preventing
independent processes from sharing a page.

Q15: What is a three-tiered client-server architecture?


A three-tiered client-server architecture consists of three logical layers,
where each layer is, in principle, implemented at a separate machine.
The highest layer consists of a client user interface, the middle layer
contains the actual application, and the lowest layer implements the data
that are being used.

Q16: What is the difference between a vertical distribution
and a horizontal distribution?
Vertical distribution refers to the distribution of the different layers in a
multitiered architecture across multiple machines. In principle, each
layer is implemented on a different machine. Horizontal distribution deals
with the distribution of a single layer across multiple machines, such as
distributing a single database.

Q17: Give five types of hardware resource and five types of
data or software resource that can usefully be shared. Give
examples of their sharing as it occurs in practice in
distributed systems.
o Hardware:
- CPU: compute server (executes processor-intensive applications
for clients), remote object server (executes methods on behalf of
clients), worm program (shares CPU capacity of desktop machine
with the local user). Most other servers, such as file servers, do
some computation for their clients, hence their CPU is a shared
resource.
- memory: cache server (holds recently accessed web pages in its
RAM, for faster access by other local computers).
- disk: file server, virtual disk server (see Chapter 8), video-on-demand
server.
- screen: Network window systems, such as X-11, allow processes
in remote computers to update the content of windows.
- printer: networked printers accept print jobs from many computers,
managing them with a queuing system.
- network capacity: packet transmission enables many simultaneous
communication channels (streams of data) to be transmitted on the
same circuits.
o Data/software:
- web page: web servers enable multiple clients to share read-only
page content (usually stored in a file, but sometimes generated on-
the-fly).
- file: file servers enable multiple clients to share read-write files.
Conflicting updates may result in inconsistent results. Most useful
for files that change infrequently, such as software binaries.
- database: databases are intended to record the definitive state of
some related sets of data. They have been shared ever since
multi-user computers appeared. They include techniques to
manage concurrent updates.
- newsgroup content: The Netnews system makes read-only copies
of the recently posted news items available to clients throughout

the Internet. A copy of newsgroup content is maintained at each
Netnews server that is an approximate replica of those at other
servers. Each server makes its data available to multiple clients.
- video/audio stream: Servers can store entire videos on disk and
deliver them at playback speed to multiple clients simultaneously.
- exclusive lock: a system-level object provided by a lock server,
enabling several clients to coordinate their use of a resource (such
as a printer that does not include a queuing scheme).

Q18: Compare and contrast cloud computing with more
traditional client-server computing.
Q19: What is novel about cloud computing as a concept?
Cloud computing is defined in terms of supporting Internet-based
services (whether application, storage, or other computing-based
services), where everything is a service and local data storage or
application software can be dispensed with. This is completely
consistent with client-server computing, and indeed client-server
concepts support the implementation of cloud computing. It highlights
one of the key elements of cloud computing: moving to a world where
you can dispense with local services. This level of ambition may or may
not be present in client-server computing. As a final comment, cloud
computing promotes a view of computing as a utility, and this is linked
to often novel business models whereby services can be rented rather
than owned, leading to a more flexible and elastic approach to service
provision and acquisition. This is a key distinction and represents the
key novelty in cloud computing. To summarize, cloud computing is
partially a technical innovation in terms of the level of ambition, but
largely a business innovation in terms of viewing computing services as
a utility.

Assignment 2

Dr. Eng. Samy Elmokadem


Q1: If a client and a server are placed far apart, we may see
network latency dominating overall performance. How can
we tackle this problem?
It really depends on how the client is organized. It may be possible to
divide the client-side code into smaller parts that can run separately. In
that case, when one part is waiting for the server to respond, we can
schedule another part. Alternatively, we may be able to rearrange the
client so that it can do other work after having sent a request to the
server. This last solution effectively replaces the synchronous client-
server communication with asynchronous one-way communication.

Q2: What is a three-tiered client-server architecture?
A three-tiered client-server architecture consists of three logical layers,
where each layer is, in principle, implemented at a separate machine.
The highest layer consists of a client user interface, the middle layer
contains the actual application, and the lowest layer implements the data
that are being used.

Q3: What is the difference between a vertical distribution
and a horizontal distribution?
Vertical distribution refers to the distribution of the different layers in a
multitiered architecture across multiple machines. In principle, each
layer is implemented on a different machine. Horizontal distribution deals
with the distribution of a single layer across multiple machines, such as
distributing a single database.

Q4: Consider a chain of processes P1, P2, …, Pn
implementing a multi-tiered client-server architecture.
Process Pi is a client of process Pi+1, and Pi will return a
reply to Pi-1 only after receiving a reply from Pi+1. What
are the main problems with this organization when looking
at the request-reply performance at process P1?
Performance can be expected to be bad for large n. The problem is that
each communication between two successive layers is, in principle,
between two different machines. Consequently, the performance
between P1 and P2 may also be determined by the n − 2 request-reply
interactions between the other layers. Another problem is that if one
machine in the chain performs badly or is even temporarily unreachable,
then this will immediately degrade the performance at the highest level.

Q5: In a structured overlay network, messages are routed
according to the topology of the overlay. What is an
important disadvantage of this approach?
Logical distance can be long (if two ends of the tree communicate often,
they will always experience a high latency, since they always must
traverse the entire tree to communicate).

Q6: Consider an unstructured overlay network in which
each node randomly chooses c neighbors. If P and Q are
both neighbors of R, what is the probability that they are
also neighbors of each other?
Consider a network of N nodes. If each node chooses c neighbors at
random, then the probability that P will choose Q, or Q chooses P, is
roughly 2c/(N − 1).
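A quick Monte Carlo check of this estimate (the parameters N, c, and the trial count are arbitrary simulation choices): P and Q end up neighbors if P picked Q or Q picked P, independently of whether both happen to be neighbors of R.

```python
import random

N, c, trials = 200, 5, 20_000    # arbitrary simulation parameters
rng = random.Random(42)

# P is node 0, Q is node 1; each picks c neighbors among the other N-1.
candidates_p = [n for n in range(N) if n != 0]
candidates_q = [n for n in range(N) if n != 1]

hits = 0
for _ in range(trials):
    p_picks = rng.sample(candidates_p, c)
    q_picks = rng.sample(candidates_q, c)
    if 1 in p_picks or 0 in q_picks:   # P chose Q, or Q chose P
        hits += 1

estimate = hits / trials
print(estimate, 2 * c / (N - 1))   # both close to 0.05
```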

Q7: Not every node in a peer-to-peer network should
become a super-peer. What are reasonable requirements
that a super-peer should meet?
In the first place, the node should be highly available, as many other
nodes rely on it. Also, it should have enough capacity to process
requests. Most important perhaps is the fact that it can be trusted to do
its job well.

Q8: We gave two examples of using interceptors in
adaptive middleware. What other examples come to mind?
There are several. For example, we could use an interceptor to support
mobility. In that case, a request-level interceptor would first look up the
current location of a referenced object before the call is forwarded.
Likewise, an interceptor can be used to transparently encrypt messages
when security is at stake. Another example is when logging is needed.
Instead of letting this be handled by the application, we could simply
insert a method-specific interceptor that would record specific events
before passing a call to the referenced object. More such examples will
easily come to mind.
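The logging example can be sketched as a simple wrapper object (all names here are illustrative; real middleware would hook the interceptor into generated stubs instead of a Python proxy):

```python
class LoggingInterceptor:
    """Sits between caller and target, records each call, then forwards it."""

    def __init__(self, target):
        self._target = target
        self.log = []   # recorded invocation events

    def __getattr__(self, name):
        method = getattr(self._target, name)
        def wrapper(*args, **kwargs):
            self.log.append((name, args))   # record before forwarding
            return method(*args, **kwargs)  # pass the call through unchanged
        return wrapper

class Account:
    def __init__(self):
        self.balance = 0
    def deposit(self, amount):
        self.balance += amount
        return self.balance

# The application uses the intercepted object exactly like the real one.
acct = LoggingInterceptor(Account())
acct.deposit(10)
acct.deposit(5)
print(acct.log)   # [('deposit', (10,)), ('deposit', (5,))]
```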

Q9: To what extent are interceptors dependent on the
middleware where they are deployed?
In general, interceptors will be highly middleware dependent. It is easy to
see why: the client stub will most likely be tightly bound to the lower-level
interfaces offered by the middleware, just as message level interceptors
will be highly dependent on the interaction between middleware and the
local operating system. Nevertheless, it is possible to standardize these
interfaces, opening the road to developing portable interceptors, albeit
often for a specific class of middleware. This last approach has been
followed for CORBA.

Q10: Modern cars are stuffed with electronic devices. Give
some examples of feedback control systems in cars.
One obvious one is cruise control. On the one hand this subsystem
measures current speed, and when it changes from the required setting,
the car is slowed down or sped up. The anti-lock braking system
(ABS) is another example. By pulsating the brakes of a car, while at the
same time regulating the pressure that each wheel is exerting, it is
possible to continue steering without losing control because the wheels
would otherwise lock. A last example is formed by the closed circuit of sensors
that monitor engine condition. As soon as a dangerous state is reached,
a car may come to an automatic halt to prevent the worst.

Q11: Give an example of a self-managing system in which
the analysis component is completely distributed or even
hidden.
We already came across this type of system: in unstructured peer-to-
peer systems where nodes exchange membership information, we saw
how a topology could be generated. The analysis component consists of
dropping certain links that will not help converge to the intended
topology. Similar examples can be found in other such systems as we
referred to as well.

Q12: (Lab assignment) Using existing software, design and
implement a BitTorrent-based system for distributing files
to many clients from a single, powerful server. Matters are
simplified by using a standard Web server that can operate
as tracker.

Once a node has identified where to download a file from, it joins a
swarm of downloaders who in parallel get file chunks from the source,
but also distribute these chunks amongst each other.
To download a file, a user needs to access a global directory, which is
just one of a few well-known Web sites. Such a directory contains
references to what are called .torrent files.

A torrent file contains the information that is needed to download a
specific file.
It refers to what is known as a tracker, which is a server that is keeping
an accurate account of active nodes that have (chunks of) the requested
file.
An active node is one that is currently downloading the file.
Obviously, there will be many different trackers, although there will
generally be only a single tracker per file (or collection of files).
Once the nodes from which chunks can be downloaded have been
identified, the downloading node effectively becomes active. At that
point, it will be forced to help others.
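A minimal sketch of the tracker side of the lab assignment (the Tracker class and its methods are hypothetical names; a real system would expose announce over HTTP on the standard web server):

```python
class Tracker:
    """Keeps an account of the active nodes per torrent (illustrative API)."""

    def __init__(self):
        self.swarms = {}   # torrent_id -> set of active peer addresses

    def announce(self, torrent_id, peer):
        """Register a peer as active and return the other swarm members."""
        swarm = self.swarms.setdefault(torrent_id, set())
        others = sorted(swarm - {peer})   # peers the newcomer can contact
        swarm.add(peer)                   # the newcomer is now active too
        return others

    def leave(self, torrent_id, peer):
        """Remove a peer that has finished or gone offline."""
        self.swarms.get(torrent_id, set()).discard(peer)

tracker = Tracker()
print(tracker.announce("file.torrent", "10.0.0.1:6881"))  # [] — first peer
print(tracker.announce("file.torrent", "10.0.0.2:6881"))  # ['10.0.0.1:6881']
```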

Q13: Provide three specific and contrasting examples of
the increasing levels of heterogeneity experienced in
contemporary distributed systems as defined in Section
2.2.
Heterogeneity exists in many areas of a contemporary distributed
system including in the areas of hardware, operating systems, networks,
and programming languages. We look at the first three as examples:

- In terms of hardware, distributed systems are increasingly
heterogeneous, featuring (typically Intel-based) PCs, smart phones,
resource-limited sensor nodes, and resource-rich cluster computers
or multicore processors.
- In terms of operating systems, a distributed system may include
computers running Windows, MAC OS, various flavors of Unix, and
more specialist operating systems for smart phones or sensor nodes.
- In terms of networks, the Internet is also increasingly heterogeneous
embracing wireless technologies and ad hoc styles of networking.

Q14: Describe and illustrate the client-server architecture
of one or more major Internet applications (for example,
the Web, email, or Netnews).
Web:

Browsers are clients of Domain Name Servers (DNS) and web servers
(HTTP). Some intranets are configured to interpose a Proxy server.
Proxy servers fulfil several purposes – when they are located at the
same site as the client, they reduce network delays and network traffic.
When they are at the same site as the server, they form a security
checkpoint, and they can reduce load on the server. N.B. DNS servers
are also involved in all the application architectures described below, but
they are omitted from the discussion for clarity.
Email:

Sending messages: User Agent (the user’s mail composing program) is
a client of a local SMTP server and passes each outgoing message to
the SMTP server for delivery. The local SMTP server uses mail routing

tables to determine a route for each message and then forwards the
message to the next SMTP server on the chosen route. Each SMTP
server similarly processes and forwards each incoming message unless
the domain name in the message address matches the local domain. In
the latter case, it attempts to deliver the message to local recipient by
storing it in a mailbox file on a local disk or file server. Reading
messages: User Agent (the user’s mail reading program) is either a
client of the local file server or a client of a mail delivery server such as a
POP or IMAP server. In the former case, the User Agent reads
messages directly from the mailbox file in which they were placed during
the message delivery. (Examples of such user agents are the UNIX mail
and pine commands.) In the latter case, the User Agent requests
information about the contents of the user’s mailbox file from a POP or
IMAP server and receives messages from those servers for presentation
to the user.
POP and IMAP are protocols specifically designed to support mail
access over wide areas and slow network connections, so a user can
continue to access her home mailbox while travelling.
Netnews:

Posting news articles: User Agent (the user’s news composing program)
is a client of a local NNTP server and passes each outgoing article to the

NNTP server for delivery. Each article is assigned a unique identifier.
Each NNTP server holds a list of other NNTP servers for which it is a
newsfeed – they are registered to receive articles from it. It periodically
contacts each of the registered servers, delivers any new articles to
them and requests any that they have which it has not (using the
articles’ unique ids to determine which they are). To ensure delivery of
every article to every Netnews destination, there must be a path of
newsfeed connections that reaches every NNTP server.
Browsing/reading articles: User Agent (the user’s news reading
program) is a client of a local NNTP server. The User Agent requests
updates for all the newsgroups to which the user subscribes and
presents them to the user.

Q15: A search engine is a web server that responds to
client requests to search in its stored indexes and
(concurrently) runs several web crawler tasks to build
and update the indexes. What are the requirements for
synchronization between these concurrent activities?
The crawler tasks could build partial indexes to new pages
incrementally, then merge them with the active index (including deleting
invalid references). This merging operation could be done on an off-line
copy. Finally, the environment for processing client requests is changed
to access the new index. The latter might need some concurrency
control, but in principle it is just a change to one reference to the index
which should be atomic.
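The atomic switch-over can be sketched as a single reference assignment (names here are illustrative; in CPython, rebinding an attribute is atomic with respect to concurrent readers, so the lock only serializes publishers):

```python
import threading

class SearchEngine:
    def __init__(self, index):
        self._index = index             # the currently active index
        self._lock = threading.Lock()   # serializes concurrent publishers

    def search(self, term):
        index = self._index             # snapshot reference: readers see
        return index.get(term, [])      # either the old or the new index

    def publish(self, new_entries):
        with self._lock:
            merged = dict(self._index)  # off-line copy of the old index
            merged.update(new_entries)  # merge the crawlers' partial index
            self._index = merged        # the atomic switch-over

engine = SearchEngine({"cloud": ["page1"]})
engine.publish({"grid": ["page2"]})
print(engine.search("cloud"), engine.search("grid"))
```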

Q16: The host computers used in peer-to-peer systems are
often simply desktop computers in users’ offices or homes.
What are the implications of this for the availability and
security of any shared data objects that they hold and to
what extent can any weaknesses be overcome using
replication?
Problems:

- People often turn their desktop computers off when not using them.
Even if on most of the time, they will be off when the user is away for
an extended time, or the computer is being moved.
- The owners of participating computers are unlikely to be known to
other participants, so their trustworthiness is unknown. With current
hardware and operating systems, the owner of a computer has total
control over the data on it and may change it or delete it at will.
- Network connections to the peer computers are exposed to attack
(including denial of service).
The importance of these problems depends on the application. For
the music downloading that was the original driving force for peer-to-
peer, it isn’t very important. Users can wait until the relevant host is
running to access a particular piece of music. There is little motivation
for users to tamper with the music. But for more conventional
applications such as file storage, availability and integrity are all-
important.

Solutions:
Replication:

- If data replicas are sufficiently widespread and numerous, the
probability that all are unavailable simultaneously can be reduced to a
negligible level.
- One method for ensuring the integrity of data objects stored at
multiple hosts (against tampering or accidental error) is to perform an
algorithm to establish a consensus about the value of the data (e.g.,
by exchanging hashes of the object’s value and comparing them).
This is discussed in Chapter 15. But there is a simpler solution for
objects whose value doesn’t change (e.g., media files such as music,
photographs, radio broadcasts or films).
Secure hash identifiers:

- The object’s identifier is derived from its hash code. The identifier is
used to address the object. When the object is received by a client,
the hash code can be checked for correspondence with the identifier.

The hash algorithms used must obey the properties required of a
secure hash algorithm.
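A minimal sketch of such secure hash identifiers, using SHA-256 (the function names are illustrative): the identifier is derived from the content, so the client can re-hash whatever it receives and compare.

```python
import hashlib

def make_identifier(content):
    """Derive the object's identifier from a secure hash of its value."""
    return hashlib.sha256(content).hexdigest()

def verify(content, identifier):
    """Re-hash received content and check correspondence with the id."""
    return hashlib.sha256(content).hexdigest() == identifier

original = b"some immutable media file"
object_id = make_identifier(original)

print(verify(original, object_id))             # True: content matches id
print(verify(b"tampered content", object_id))  # False: tampering detected
```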

Q17: List the types of local resource that are vulnerable to
an attack by an untrusted program that is downloaded from
a remote site and run in a local computer.
Objects in the file system, e.g., files and directories, can be
read/written/created/deleted using the rights of the local user who runs
the program.
Network communication: the program might attempt to create sockets,
connect to them, send messages, etc.
Access to printers.
It may also impersonate the user in various ways, for example, by
sending/receiving email.

Q18: Give examples of applications where the use of
mobile code is beneficial.
- Doing computation close to the user, as in the applet example.
- Enhancing the browser, e.g., to allow server-initiated communication.
- Cases where objects are sent to a process and the code is required to
make them usable (e.g., as in RMI).

Q19: Define the integrity property of reliable
communication and list all the possible threats to integrity
from users and from system components. What measures
can be taken to ensure the integrity property in the face of
each of these sources of threats?
Integrity: the message received is identical to the one sent, and no
messages are delivered twice.

Threats from users:

- Injecting spurious messages, replaying old messages, altering
messages during transmission.

Threats from system components:


- Messages may get corrupted en route.
- messages may be duplicated by communication protocols that
retransmit messages.
For threats from users: use secure channels, with authentication
techniques and nonces. For threats from system components:
checksums to detect corrupted messages (but then we get a validity
problem, since the corrupted message is dropped). Duplicated
messages can be detected if sequence numbers are attached to
messages.
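Both countermeasures against system-component threats can be sketched together (illustrative names; real protocols such as TCP combine a checksum with sequence numbers in a similar way):

```python
import zlib

def make_frame(seq, payload):
    # Attach a sequence number and a CRC-32 checksum to each message.
    return (seq, payload, zlib.crc32(payload))

class Receiver:
    def __init__(self):
        self.delivered = []
        self.seen = set()     # sequence numbers already accepted

    def receive(self, frame):
        seq, payload, checksum = frame
        if zlib.crc32(payload) != checksum:
            return "corrupted"     # corruption en route detected
        if seq in self.seen:
            return "duplicate"     # retransmitted copy suppressed
        self.seen.add(seq)
        self.delivered.append(payload)
        return "delivered"

rx = Receiver()
frame = make_frame(1, b"hello")
print(rx.receive(frame))                 # delivered
print(rx.receive(frame))                 # duplicate
good = make_frame(2, b"data")
tampered = (good[0], b"dat?", good[2])   # payload altered in transit
print(rx.receive(tampered))              # corrupted
```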

Q20: Describe possible occurrences of each of the main
types of security threat (threats to processes, threats to
communication channels, denial of service) that might
occur on the Internet.
Threats to processes: without authentication of principals and servers,
many threats exist. An enemy could access other users’ files or
mailboxes or set up ‘spoof’ servers. E.g., a server could be set up to
‘spoof’ a bank’s service and receive details of users’ financial
transactions.
Threats to communication channels: IP spoofing - sending requests to
servers with a false source address, man-in-the-middle attacks.
Denial of service: flooding a publicly available service with irrelevant
messages.

Assignment 3

Dr. Eng. Samy Elmokadem


Q1: A client sends a 200 byte request message to a
service, which produces a response containing 5000 bytes.
Estimate the total time to complete the request in each of
the following cases, with the performance assumptions
listed below:
➢ Using connectionless (datagram) communication (for
example, UDP);
➢ Using connection-oriented communication (for example,
TCP);
➢ The server process is in the same machine as the client.

Latency per packet (local or remote, incurred on both send and
receive): 5 milliseconds
Connection setup time (TCP only): 5 milliseconds
Data transfer rate: 10 megabits per second
MTU: 1000 bytes
Server request processing time: 2 milliseconds
Assume that the network is lightly loaded.
The send and receive latencies include (operating system) software
overheads as well as network delays.
If the former dominates, then the estimates are as below. If network
overheads dominate, then the times may be reduced because the
multiple response packets can be transmitted and received right after
each other.
➢ UDP: 5 + 2000/10000 + 2 + 5(5 + 10000/10000) = 37.2
milliseconds
➢ TCP: 5 + 5 + 2000/10000 + 2 + 5(5 + 10000/10000) = 42.2
milliseconds
➢ same machine: the messages can be sent by a single in-memory
copy; estimate the interprocess data transfer rate at 40
megabits/second and the latency per message at ~5 milliseconds.
Time for the server call: 5 + 2000/40000 + 5 + 50000/40000 = 11.3
milliseconds.
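The three estimates can be reproduced with a short calculation, using the same approximations as the figures above (1 byte taken as 10 bits, rates expressed in bits per millisecond):

```python
LATENCY = 5          # ms per packet, incurred on both send and receive
SETUP = 5            # ms, TCP connection setup only
RATE = 10_000        # bits per ms (10 megabits per second)
MTU_BITS = 10_000    # 1000-byte MTU, at ~10 bits per byte
PROCESSING = 2       # ms of server request processing

request_bits = 2_000         # 200-byte request
response_packets = 5         # 5000-byte response over 1000-byte MTUs

udp = (LATENCY + request_bits / RATE + PROCESSING
       + response_packets * (LATENCY + MTU_BITS / RATE))
tcp = SETUP + udp            # TCP adds only the connection setup

IPC_RATE = 40_000            # assumed in-memory transfer rate: 40 Mbit/s
local = LATENCY + request_bits / IPC_RATE + LATENCY + 50_000 / IPC_RATE

print(udp, tcp, local)       # ~37.2, ~42.2, ~11.3 milliseconds
```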

Q2: The Internet is far too large for any router to hold
routing information for all destinations. How does the
Internet routing scheme deal with this issue?
If a router does not find the network id portion of a destination address in
its routing table, it dispatches the packet to a default address: an
adjacent gateway or router that is designated as responsible for routing
packets for which there is no routing information available. Each router’s
default address carries such packets towards a router that has more
complete routing information, until one is encountered that has a specific
entry for the relevant network id.
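The default-route behavior amounts to a table lookup with a fallback. This is a deliberately stripped-down sketch with made-up entries; real routers use longest-prefix matching over binary address prefixes.

```python
# Hypothetical table mapping known network ids to outgoing links.
routing_table = {
    "138.37": "eth0",
    "192.168": "eth1",
}
DEFAULT_ROUTE = "gw0"   # adjacent gateway for all unknown destinations

def next_hop(network_id):
    # A specific entry wins; otherwise the default route carries the
    # packet toward a router with more complete routing information.
    return routing_table.get(network_id, DEFAULT_ROUTE)

print(next_hop("138.37"))   # eth0 — specific entry found
print(next_hop("201.44"))   # gw0  — no entry, use the default route
```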

Q3: What is the task of an Ethernet switch? What tables
does it maintain?
An Ethernet switch must maintain routing tables giving the Ethernet
addresses and network id for all hosts on the local network (connected
set of Ethernets accessible from the switch). It does this by ‘learning’ the
host addresses from the source address fields on each network. The
switch receives all the packets transmitted on the Ethernets to which it is
connected. It looks up the destination of each packet in its routing tables.
If the destination is not found, the destination host must be one about
which the switch has not yet learned, and the packet must be forwarded
to all the connected networks to ensure delivery. If the destination
address is on the same Ethernet as the source, the packet is ignored
since it will be delivered directly. In all other cases, the switch transmits
the packet on the destination host’s network, determined from the
routing information.
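The learning behavior described above can be sketched as follows (the LearningSwitch class is an illustrative model; real switches also age out stale table entries):

```python
class LearningSwitch:
    """Learns which port each source address lives on, forwards or floods."""

    def __init__(self, num_ports):
        self.table = {}                  # MAC address -> port learned from traffic
        self.ports = list(range(num_ports))

    def handle(self, src, dst, in_port):
        self.table[src] = in_port        # learn the sender's location
        out = self.table.get(dst)
        if out is None:
            # Unknown destination: flood on every other port to ensure delivery.
            return [p for p in self.ports if p != in_port]
        if out == in_port:
            return []                    # same segment: already delivered directly
        return [out]                     # known destination: forward to one port

sw = LearningSwitch(4)
print(sw.handle("aa", "bb", 0))   # bb unknown -> flood to ports 1, 2, 3
print(sw.handle("bb", "aa", 2))   # aa learned on port 0 -> [0]
print(sw.handle("aa", "bb", 0))   # bb now known on port 2 -> [2]
```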

Q4: Make a table like Figure 3.5 describing the work done
by the software in each protocol layer when Internet
applications and the TCP/IP suite are implemented over an
Ethernet.

Q5: Can we be sure that no two computers on the Internet
have the same IP address?
No two computers on the same LAN can have the same IP address, but
a computer on somebody else’s LAN can have the same address as
yours. As your LAN breaks out through a gateway, the address is
translated to a public-facing one. Incoming traffic is directed to an
address on the LAN; outgoing traffic goes to another public-facing IP
address, which in turn translates it for its own LAN behind it.
There are three important things in an IPv4 config:
Address (device LAN address)
Subnet mask
Default gateway.
The first is the device’s identity on the LAN, the second tells it which
portions of the address define the LAN and which the individual device,
and the third where all outside traffic goes to and comes from. So, any
two LANs can be an exact copy of each other, but the addresses must
be translated at the public gateways into something unique.
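The role of the subnet mask can be illustrated with Python's standard ipaddress module (the addresses are arbitrary examples): the mask splits the address into a LAN portion and a device portion, and the same private address is legal on two different LANs because only the translated public addresses must be unique.

```python
import ipaddress

# The mask 255.255.255.0 says the first 24 bits identify the LAN.
iface = ipaddress.ip_interface("192.168.1.10/255.255.255.0")
print(iface.network)              # 192.168.1.0/24 — the LAN portion
print(iface.ip in iface.network)  # True — the device is on its own LAN

# The same private address on somebody else's LAN is perfectly legal.
other = ipaddress.ip_interface("192.168.1.10/24")
print(iface.ip == other.ip)       # True — duplicates only matter publicly
```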

Q6: Compare connectionless (UDP) and connection-
oriented (TCP) communication for the implementation of
each of the following application-level or presentation-level
protocols:
➢ virtual terminal access (for example, Telnet).
➢ file transfer (for example, FTP).
➢ user location (for example, rwho, finger).
➢ information browsing (for example, HTTP).
➢ remote procedure call
i) The long duration of sessions, the need for reliability and the
unstructured sequences of characters transmitted make connection-
oriented communication most suitable for this application. Performance
is not critical in this application, so the overheads are of little
consequence.
ii) File transfer calls for the transmission of large volumes of data.
Connectionless would be ok if error rates are low and the messages can
be large, but on the Internet, these requirements aren’t met, so TCP is
used.
iii) Connectionless is preferable, since messages are short, and a single
message is sufficient for each transaction.
iv) Either mode could be used. The volume of data transferred on each
transaction can be quite large, so TCP is used in practice.
v) RPC achieves reliability by means of timeouts and retries, so
connectionless (UDP) communication is often preferred.

Q7: Explain how it is possible for a sequence of packets
transmitted through a wide area network to arrive at their
destination in an order that differs from that in which they
were sent. Why can’t this happen in a local network?
Packets transmitted through a store-and-forward network travel by
routes that are determined dynamically for each packet. Some routes
will have more hops or slower switches than others. Thus, packets may
overtake each other. Connection-oriented protocols such as TCP
overcome this by adding sequence numbers to the packets and re-
ordering them at the receiving host. It can’t happen in local networks
because the medium provides only a single channel connecting all the

hosts on the network. Packets are therefore transmitted and received in
strict sequence. It can’t happen in ATM networks because they are
connection oriented. Transmission is always through virtual channels,
and VCs guarantee to deliver data in the order in which it is transmitted.
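The receiving host's re-ordering step can be illustrated in a few lines. This is a sketch; `reassemble` is a hypothetical helper, not TCP's actual implementation.

```python
def reassemble(packets):
    """Rebuild the byte stream from (sequence number, data) packets
    that may have arrived out of order after taking different routes
    through a store-and-forward network, as TCP does at the receiver."""
    return b"".join(data for _, data in sorted(packets))

# Packets 1-3 arrive in the order 2, 1, 3 after overtaking each other:
arrived = [(2, b"or"), (1, b"in "), (3, b"der")]
```

`reassemble(arrived)` restores the original stream `b"in order"` regardless of arrival order.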

Q8: A specific problem that must be solved in remote terminal access protocols such as Telnet is the need to transmit exceptional events such as ‘kill signals’ from the ‘terminal’ to the host in advance of previously transmitted data. Kill signals should reach their destination ahead of any other ongoing transmissions. Discuss the solution of this problem with connection-oriented and connectionless protocols.
The problem is that a kill signal should reach the receiving process
quickly even when there is buffer overflow (e.g., caused by an infinite
loop in the sender) or other exceptional conditions at the receiving host.
With a connection-oriented, reliable protocol such as TCP, all packets
must be received and acknowledged in the order in which they are
transmitted. Thus, a kill signal cannot overtake other data
already in the stream. To overcome this, an out-of-band signaling
mechanism must be provided. In TCP this is called the URGENT
mechanism. Packets containing data that is flagged as URGENT bypass
the flow control mechanisms at the receiver and are read immediately.
With connectionless protocols, the process at the sender simply
recognizes the event and sends a message containing a kill signal in the
next outgoing packet. The message must be resent until the receiving
process acknowledges it.
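TCP's URGENT mechanism is exposed in the sockets API as out-of-band data. The sketch below (loopback only, assuming a platform that supports `MSG_OOB`, such as Linux) sends one urgent byte past data already queued in the stream:

```python
import select
import socket

# A loopback TCP connection standing in for the Telnet client and host.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv.getsockname())
conn, _ = srv.accept()

cli.sendall(b"queued output")        # ordinary in-band data
cli.send(b"\x03", socket.MSG_OOB)    # urgent byte, like a kill signal

# select() reports pending out-of-band data as an exceptional condition.
select.select([], [], [conn], 5)
urgent = conn.recv(1, socket.MSG_OOB)   # the urgent byte is read first
inband = conn.recv(64)                  # then the ordinary stream

for s in (cli, conn, srv):
    s.close()
```

The urgent byte bypasses the queued in-band data, which is exactly the behaviour a Telnet kill signal needs.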

Q9: What are the disadvantages of using network-level broadcasting to locate resources:
➢ i) in a single Ethernet?
➢ ii) in an intranet?
➢ iii) To what extent is Ethernet multicast an improvement on broadcasting?
i) All broadcast messages in the Ethernet must be handled by the OS, or by a standard daemon process. The overheads of examining the message, parsing it and deciding whether it need be acted upon are incurred by every host on the network, whereas only a small number are likely locations for a given resource. Despite this, note that the Internet ARP does rely on Ethernet broadcasting. The trick is that it doesn’t do it very often - just once for each host to locate other hosts on the local net that it needs to communicate with.
ii) Broadcasting is hardly feasible in a large-scale network such as the Internet. It might just be possible in an intranet, but ought to be avoided for the reasons given above.
iii) Ethernet multicast addresses are matched in the Ethernet controller. Multicast messages are passed up to the OS only for addresses that match multicast groups the local host is subscribing to. If there are several such, the address can be used to discriminate between several daemon processes to choose one to handle each message.
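The hardware matching works because each IPv4 multicast group maps to a well-known Ethernet multicast MAC address: the fixed prefix 01:00:5e plus the low-order 23 bits of the group address. A small sketch of that mapping (the helper name is ours):

```python
import ipaddress

def multicast_mac(group):
    """Map an IPv4 multicast group to the Ethernet multicast MAC
    address that the controller matches in hardware: the fixed prefix
    01:00:5e followed by the low-order 23 bits of the IP address."""
    addr = ipaddress.IPv4Address(group)
    assert addr.is_multicast, "expects a 224.0.0.0/4 address"
    low23 = int(addr) & 0x7FFFFF
    return "01:00:5e:%02x:%02x:%02x" % (
        low23 >> 16, (low23 >> 8) & 0xFF, low23 & 0xFF)
```

For example, the all-hosts group 224.0.0.1 maps to 01:00:5e:00:00:01, which is what the controller filters on without involving the OS.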

Q10: Suggest a scheme that improves on Mobile IP for providing access to a web server on a mobile device that is sometimes connected to the Internet by the mobile phone network and at other times has a wired connection to the Internet at one of several locations.
The idea is to exploit the cellular phone system to locate the mobile
device and to give the IP address of its current location to the client.

Q11: Show the sequence of changes to the routing tables in Figure 3.8 that will occur (according to the RIP algorithm given in Figure 3.9) after the link labelled 3 in Figure 3.7 is broken.
Q12: Use the diagram in Figure 3.13 as a basis for an
illustration showing the segmentation and encapsulation
of an HTTP request to a server and the resulting reply.
Assume that the request is a short HTTP message, but the
reply includes at least 2000 bytes of HTML.

Q13: Consider the use of TCP in a Telnet remote terminal client. How should the keyboard input be buffered at the client? Investigate Nagle’s and Clark’s algorithms [Nagle 1984, Clark 1982] for flow control and compare them with the simple algorithm described on page 103 when TCP is used by:
a) a web server
b) a Telnet application
c) a remote graphical application with continuous mouse input.
The basic TCP buffering algorithm described on p. 105 is not very
efficient for interactive input. Nagle’s algorithm is designed to address
this. It requires the sending machine to send any bytes found in the
output buffer, then wait for an acknowledgement. Whenever an
acknowledgement is received, any additional characters in the buffer are
sent. The effects of this are:
a) For a web server: the server will normally write a whole page of HTML
into the buffer in a single write. When the write is completed, Nagle’s
algorithm will send the data immediately, whereas the basic algorithm
would wait 0.5 seconds. While Nagle’s algorithm is waiting for an
acknowledgement, the server process can write additional data (e.g.,
image files) into the buffer. They will be sent as soon as the
acknowledgement is received.
b) For a remote shell (Telnet) application: the application will write
individual keystrokes into the buffer (and in the normal case of full
duplex terminal interaction they are echoed by the remote host to the
Telnet client for display). With the basic algorithm, full duplex operation
would result in a delay of 0.5 seconds before any of the characters typed
are displayed on the screen. With Nagle’s algorithm, the first character
typed is sent immediately and the remote host echoes it with an
acknowledgement piggybacked in the same packet. The
acknowledgement triggers the sending of any further characters that
have been typed in the intervening period. So, if the remote host
responds sufficiently rapidly, the display of typed characters appears to
be instantaneous. But note that a badly written remote application that
reads data from the TCP buffer one character at a time can still cause
problems - each read will result in an acknowledgement indicating that
one further character should be sent - resulting in the transmission of an
entire IP frame for each character. Clark [1982] called this the silly
window syndrome. His solution is to defer the sending of
acknowledgements until there is a substantial amount of free space
available.
c) For a continuous mouse input (e.g., sending mouse positions to an X-
Windows application running on a compute server): this is a difficult form
of input to handle remotely. The problem is that the user should see
smooth feedback of the path traced by the mouse, with minimal lag.
Neither the basic TCP algorithm nor Nagle’s nor Clark’s algorithm
achieves this very well. A version of the basic algorithm with a short
timeout (0.1 seconds) is the best that can be done, and this is effective
when the network is lightly loaded and has low end-to-end latency -
conditions that can be guaranteed only on local networks with controlled
loads. See Tanenbaum [1996] pp. 534-5 for further discussion of this.
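The buffering behaviour in case (b) can be modelled in a few lines. This is a toy model of Nagle's algorithm, not TCP source code; `NagleSender` is a name of our choosing. (Interactive applications that cannot tolerate the coalescing, as in case (c), disable it with the real `TCP_NODELAY` socket option.)

```python
class NagleSender:
    """Toy model of Nagle's algorithm: transmit at once if nothing is
    unacknowledged, otherwise coalesce bytes until the ACK arrives."""

    def __init__(self):
        self.buffer = b""
        self.unacked = False
        self.segments = []          # what actually went on the wire

    def write(self, data):
        self.buffer += data
        if not self.unacked:        # nothing outstanding: send now
            self._transmit()

    def ack(self):
        """ACK for the outstanding segment releases coalesced bytes."""
        self.unacked = False
        if self.buffer:
            self._transmit()

    def _transmit(self):
        self.segments.append(self.buffer)
        self.buffer = b""
        self.unacked = True


s = NagleSender()
s.write(b"h")        # first keystroke: sent immediately
s.write(b"e")        # buffered while awaiting the ACK
s.write(b"l")
s.ack()              # ACK releases "el" as one coalesced segment
```

After this sequence, `s.segments` is `[b"h", b"el"]`: the first keystroke went out alone and the next two shared a single segment, which is the behaviour described for the Telnet case.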

Q14: Construct a network diagram like Figure 3.10 for the
local network at your institution or company.

Q15: Describe how you would configure a firewall to protect the local network at your institution or company. What incoming and outgoing requests should it intercept?
Filtering
The primary purpose of a firewall is packet filtering. When a computer
sends a request across the Internet, it takes the form of small packets of
data, which travel through the network to their destination. The target
server responds with its own packets of data, which return along the
same route. A firewall monitors every packet that passes through it,
considering its source, destination, and what type of data it contains, and
it compares that information to its internal rule set. If the firewall detects
that the packet is unauthorized, it discards the data. Typically, firewalls
allow traffic from common programs such as email or Web browsers, while
discarding most incoming requests. You can also configure a firewall to
disallow access to certain websites or services to prevent employees
from accessing non-work resources while on the clock.
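A first-match rule table of the kind described can be sketched as follows. The rule format and the `filter_packet` helper are illustrative, not any particular firewall product's syntax.

```python
def filter_packet(pkt, rules):
    """Return the action ('accept' or 'drop') of the first rule whose
    match fields all equal the packet's fields; packets matching no
    rule are dropped (default deny)."""
    for rule in rules:
        if all(pkt.get(k) == v for k, v in rule["match"].items()):
            return rule["action"]
    return "drop"


# Illustrative policy: admit mail and web requests, allow all outgoing
# traffic, and drop everything else that arrives from outside.
rules = [
    {"match": {"direction": "in", "dst_port": 25}, "action": "accept"},  # SMTP
    {"match": {"direction": "in", "dst_port": 80}, "action": "accept"},  # HTTP
    {"match": {"direction": "out"}, "action": "accept"},                 # outgoing
]
```

With this table an incoming Telnet request (port 23) falls through every rule and is dropped, while an incoming web request is accepted.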
Logging
Another important aspect of a firewall is its ability to log any traffic that
passes through it. By recording the information from packets that pass
through or that it discards, it can provide you with a clear picture of the
kind of traffic your system experiences. This can be valuable in
identifying the source of an external attack, but you can also use it to
monitor your employees’ activities online to prevent lost productivity.
Internal Threats
While the primary goal of a firewall is to keep attackers out, it also
serves a valuable purpose by monitoring outgoing connections. Many
types of malware will send out a signal once they take over a system,
allowing the author to trigger specific actions or even control the
computer remotely. A firewall can alert you when an unknown program
attempts to "phone home," alerting you to a possible malware infection
and allowing you to shut it down before it causes major damage to your
network. Heading off a malware attack before it activates will keep your
employees productive, protect vital company data and save you the cost
of cleaning up the problem with other security software.

Q16: How does a newly installed personal computer connected to an Ethernet discover the IP addresses of local servers? How does it translate them to Ethernet addresses?

The first part of the question is a little misleading. Neither Ethernet nor
the Internet support ‘discovery’ services as such. A newly installed
computer must be configured with the domain names of any servers that
it needs to access. The only exception is the DNS. Services such as
BootP and DHCP enable a newly connected host to acquire its own IP
address and to obtain the IP addresses of one or more local DNS
servers. To obtain the IP addresses of other servers (e.g., SMTP, NFS,
etc.) it must use their domain names. In Unix, the nslookup command
can be used to examine the database of domain names in the local DNS
servers and a user can select appropriate ones for use as servers. The
domain names are translated to IP addresses by a simple DNS request.
The Address Resolution Protocol (ARP) provides the answer to the
second part of the question. This is described on pages 95-6. Each
network type must implement ARP in its own way. The Ethernet and
related networks use the combination of broadcasting and caching of the
results of previous queries described on page 96.
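The broadcast-plus-caching behaviour of ARP can be sketched like this (the names are ours, and real ARP additionally times cache entries out):

```python
def resolve(ip, cache, broadcast_query):
    """ARP-style address resolution: answer from the local cache when
    possible; otherwise broadcast a query on the Ethernet, and cache
    the reply so later lookups avoid another broadcast."""
    if ip not in cache:
        cache[ip] = broadcast_query(ip)   # one broadcast per unknown host
    return cache[ip]


# Stand-in for the broadcast that records how often it is used.
queries = []
def fake_broadcast(ip):
    queries.append(ip)
    return "aa:bb:cc:00:00:%02d" % len(queries)
```

Resolving the same IP address twice triggers only one broadcast; the second lookup is served from the cache, which is why ARP's reliance on broadcasting is cheap in practice.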

Q17: Can firewalls prevent denial of service attacks such
as the one described on page 112? What other methods
are available to deal with such attacks?

Since a firewall is simply another computer system placed in front of
some intranet services that require protection, it is unlikely to be able to
prevent denial of service (DoS) attacks for two reasons:
➢ The attacking traffic is likely to closely resemble real service
requests or responses.
➢ Even if they can be recognized as malicious (and they could be in
the case described on p. 96), a successful attack is likely to
produce malicious messages in such large quantities that the
firewall itself is likely to be overwhelmed and become a bottleneck,
preventing communication with the services that it protects.
Other methods to deal with DoS attacks: no comprehensive defense
has yet been developed. Attacks of the type described on p. 96,
which are dependent on IP spoofing (giving a false ‘sender’s address’)
can be prevented at their source by checking the sender’s address on
all outgoing IP packets. This assumes that all Internet sites are
managed in such a manner as to ensure that this check is made - an
unlikely circumstance. It is difficult to see how the targets of such
attacks (which are usually heavily used public services) can defend
themselves with current network protocols and their security
mechanisms. With the advent of quality-of-service mechanisms in
IPv6, the situation should improve. It should be possible for a service
to allocate only a limited amount of its total bandwidth to each range
of IP addresses, and routers throughout the Internet could be set up to
enforce these resource allocations. However, this approach has not
yet been fully worked out.

Final Revision Sheet

Dr. Eng. Samy Elmokadem


Answer the following questions:

Q1: Distributed systems can be organized in many different ways. Distinguish between software architecture and system architecture.
Software architecture: is more concerned about the logical organization of
the software: how do components interact, in what ways can they be
structured, how can they be made independent, and so on.
System architecture: considers where the components that constitute a
distributed system are placed across the various machines. (Physical
Machines)

Q2: Distributed systems are based on familiar and widely used computer networks. Give six examples of distributed systems.
▪ Internet
▪ Intranets
▪ wireless networks
▪ Mobile and ubiquitous computing
▪ Cloud computing
▪ Teleconferencing

Q3: What does ‘distributed computing system’ mean?


“Distributed system is one on which I cannot get any work done because
some machine I have never heard of has crashed.”
“A system that consists of a collection of two or more independent
computers which coordinate their processing through the exchange of
synchronous or asynchronous message passing.”

Q4: Explain what is meant by administrative scalability and why it is a difficult problem to solve.

Administrative scalability refers to growth in the number of administrative domains a system spans. It is difficult because the domains may have conflicting policies on resource usage, management, and security, and components in one domain cannot simply trust those in another.
Challenges of designing scalable distributed systems:
▪ Cost of physical resources
o cost should linearly increase with system size
▪ Performance Loss
o For example, in hierarchically structured data, search
performance loss due to data growth should not exceed a
certain bound.
▪ Preventing software resources from running out:
o Numbers used to represent Internet addresses (e.g., 32-bit IPv4 → 128-bit IPv6)
▪ Avoiding performance bottlenecks:
o Use decentralized algorithms (centralized DNS to
decentralized).

Q5: Define distribution transparency and list the eight dimensions of transparency in distributed systems.
An important goal of a distributed system is to hide the fact that its
processes and resources are physically distributed across multiple
computers.
A distributed system that can present itself to users and applications as
if it were only a single computer system is said to be transparent.
To hide from the user and the application programmer the separation / distribution of components, so that the system is perceived as a whole rather than as a collection of independent components.
ISO Reference Model for Open Distributed Processing (ODP) identifies
the following forms of transparencies:
▪ Access transparency
o Access to local or remote resources is identical
o E.g., Network File System
▪ Location transparency
o Access without knowledge of location
o E.g., separation of domain name from machine address.
▪ Failure transparency

o Tasks can be completed despite failures
o E.g., message retransmission, failure of a Web server node
should not bring down the website.
▪ Replication transparency
o Access to replicated resources as if there was just one. And
provide enhanced reliability and performance without
knowledge of the replicas by users or application
programmers.
▪ Migration (mobility / relocation) transparency
o Allow the movement of resources and clients within a system
without affecting the operation of users or applications.
o E.g., switching from one name server to another at runtime;
migration of an agent / process from one node to another.
▪ Concurrency transparency
o Hides that others are sharing the same resources.
▪ Performance transparency:
o Allows the system to be reconfigured to improve
performance as loads vary.
o E.g., dynamic addition/deletion of components; switching
from linear structures to hierarchical structures when the
number of users increases.
▪ Scaling transparency:
o Allows the system and applications to expand in scale
without change to the system structure or the application
algorithms.
▪ Application-level transparencies:
o Persistence transparency
▪ Masks the deactivation and reactivation of an object
o Transaction transparency
▪ Hides the coordination required to satisfy the
transactional properties of operations.

Q6: Differentiate between location and access transparency.
▪ Access transparency
o Access to local or remote resources is identical

o E.g., Network File System
▪ Location transparency
o Access without knowledge of location
o E.g., separation of domain name from machine address.

Q7: Explain three-tiered client-server architecture (with drawing).
A three-tiered client-server architecture consists of three logical layers,
where each layer is, in principle, implemented at a separate machine.
▪ The highest layer consists of a client user interface.
▪ The middle layer contains the actual application.
▪ The lowest layer implements the data that are being used.

Q8: What is meant by resource sharing?
Resource sharing is the main motivating factor for constructing distributed systems:
▪ One resource shared by multiple users reduces cost, e.g., a file or print server.
▪ Sharing of data between users, e.g., project files.
▪ Resources are accessed through a service, which manages a collection of resources and exports an operation interface to users, e.g., a file service or print service.

Q9: What is the difference between a distributed operating
system and a network operating system?
A distributed operating system manages multi-processors and
homogeneous multi-computers.
A network operating system connects different independent computers
that each have their own operating system so that users can easily use
the services available on each computer.

Q10: What is the difference between a vertical distribution and a horizontal distribution?
Vertical distribution: refers to the distribution of the different layers of a multitiered architecture across multiple machines. In principle, each layer is implemented on a different machine.
Horizontal distribution: deals with the distribution of a single layer across multiple machines, such as distributing a single database.

Q11: Define: fault detection, fault masking, fault toleration, fault recovery. By what means can “fault tolerance” be achieved in a distributed system?
Fault tolerance can be achieved by the following mechanisms:
▪ Fault detection
o Checksums, heartbeat.
▪ Fault masking
o Retransmission of corrupt messages, redundancy.
▪ Fault toleration
o Exception handling, timeouts.
▪ Fault recovery
o Rollback mechanisms.
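The detection mechanisms listed above can be illustrated briefly. This is a sketch with names of our choosing: a CRC-32 checksum detects corruption in transit, and a heartbeat timeout turns silence into a failure suspicion.

```python
import zlib

def frame(payload):
    """Fault detection by checksum: append a CRC-32 so the receiver
    can detect corruption in transit."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def unframe(framed):
    """Return the payload if the checksum matches, else None."""
    payload, crc = framed[:-4], framed[-4:]
    return payload if zlib.crc32(payload).to_bytes(4, "big") == crc else None

def suspect_failed(last_heartbeat, now, timeout):
    """Heartbeat-based detection: a process silent for longer than
    `timeout` is *suspected* to have failed. (A slow process is
    indistinguishable from a crashed one, so this is only a
    suspicion, not certainty.)"""
    return now - last_heartbeat > timeout
```

A corrupted frame fails the checksum and would trigger fault masking by retransmission; a missed heartbeat triggers recovery actions such as rollback.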

Q12: What is meant by Encryption?

Encryption is a fundamental technique used to implement confidentiality and integrity: it transforms data into an unreadable form.
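As a toy illustration of that transformation (for illustration only; real systems use vetted algorithms such as AES, never a plain XOR):

```python
from itertools import cycle

def xor_transform(data, key):
    """Toy symmetric cipher: XOR each byte with a repeating key. The
    output is unreadable without the key, and applying the same key
    again restores the original, illustrating encryption/decryption
    with a shared secret."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))
```

Applying the function twice with the same key is the identity, which is the defining property of a symmetric transformation.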

Q13: Explain three-tiered client-server architecture (with drawing).
Answered in Question 7

Q14: Describe the difference between broadcast, multicast and unicast.
▪ Broadcast
o message sent without a specific destination address, reaching all reachable neighbors.
▪ Multicast
o message sent to a given set of destinations.
o Example: IP multicast.
▪ Unicast:
o One-to-one Destination – unique receiver host address
▪ Anycast:
o message delivered to one destination out of a set of possible
ones. Example: DNS requests to root servers
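The standard library's `ipaddress` module can classify IPv4 addresses along these lines. This is a sketch: `delivery_mode` is our helper, and it treats only the limited broadcast address 255.255.255.255 as broadcast, ignoring subnet-directed broadcasts.

```python
import ipaddress

def delivery_mode(addr):
    """Classify an IPv4 address by delivery mode using the standard
    reserved ranges: multicast is 224.0.0.0/4, 255.255.255.255 is the
    limited broadcast address, and everything else here counts as
    unicast."""
    ip = ipaddress.IPv4Address(addr)
    if ip == ipaddress.IPv4Address("255.255.255.255"):
        return "broadcast"
    if ip.is_multicast:
        return "multicast"
    return "unicast"
```

For example, the all-hosts group 224.0.0.1 is classified as multicast, while an ordinary host address such as 192.168.1.5 is unicast.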

Q15: What are the reasons for the use of distributed systems?
▪ Functional Separation
o Existence of computers with different capability and purpose:
▪ Clients and Servers
▪ Terminal / host
▪ Data collection and data processing
▪ Physical separation
o systems rely on the fact that computers are physically
separated (e.g. to satisfy reliability requirements, disasters)
▪ Inherent distribution
o Information

▪ Different information is created and maintained by
different persons (e.g., Web pages)
o People
▪ Computer supported collaborative work (virtual teams,
engineering, virtual surgery)
o Retail store and inventory systems for supermarket chain
(e.g., Coles, Safeway)
▪ Power imbalance and load variation
o Distribute computational load among different computers.
▪ Reliability
o Long term preservation and data backup (replication) at
different location.
▪ Economies
o Sharing a printer by many users and reduce the cost of
ownership.
o Building a supercomputer out of a network of computers.

Q16: Compare computer networks (LAN, MAN and WAN) according to scale, speed, cost, delay time and the switching method.
Criteria                 | LAN                                | MAN          | WAN
Cost                     | Low                                | High         | Higher
Network size             | Small                              | Large        | Largest
Speed                    | Fastest                            | Slow         | Slowest
Transmission media type  | Twisted pair and fiber-optic cables | Twisted pair | Fiber-optic, radio and satellite
Number of computers      | Smallest                           | Large        | Largest

Q17: Define distributed systems. What are the advantages and disadvantages of distributed systems?

A system that consists of a collection of two or more independent
computers which coordinate their processing through the exchange of
synchronous or asynchronous message passing.
▪ Advantages
o Shareability
o Expandability
o Local autonomy
o Improved performance
o Improved reliability and availability
o Potential cost reductions
▪ Disadvantages
o Network reliance
o Complexities
o Security
o Multiple point of failure

Q18: What is the role of middleware in a distributed system?
Distributed systems are often organized by means of a layer of software that is logically placed between a higher-level layer consisting of users and applications, and a layer underneath consisting of operating systems and basic communication facilities, as shown in the figure below. Accordingly, such a distributed system is sometimes called middleware.

[Figure: A distributed system organized as middleware. The middleware layer extends over multiple machines and offers each application the same interface.]

Q19: Why is it sometimes so hard to hide the occurrence and recovery from failures in a distributed system?
It is generally impossible to detect whether a server is down, or that it is
simply slow in responding. Consequently, a system may have to report
that a service is not available, although, in fact, the server is just slow.

Q20: Why is it not always a good idea to aim at implementing the highest degree of transparency possible?
Aiming at the highest degree of transparency may lead to a considerable
loss of performance that users are not willing to accept.

Q21: What is an open distributed system and what benefits does openness provide?
An open distributed system offers services according to clearly defined
rules. An open system is capable of easily interoperating with other open
systems but also allows applications to be easily ported between
different implementations of the same system.

Q22: Scalability can be achieved by applying different techniques. What are these techniques?
Scaling can be achieved through distribution, replication, and caching.

Q23: What is the difference between a multiprocessor and a multicomputer?

In a multiprocessor, the CPUs have access to a shared main memory.
There is no shared memory in multicomputer systems. In a
multicomputer system, the CPUs can communicate only through
message passing.

Q24: In many layered protocols, each layer has its own header. Surely it would be more efficient to have a single header at the front of each message with all the control information in it than all these separate headers. Why is this not done?
Each layer must be independent of the other ones. The data passed
from layer k + 1 down to layer k contains both header and data, but layer
k cannot tell which is which. Having a single big header that all the
layers could read and write would destroy this transparency and make
changes in the protocol of one layer visible to other layers. This is
undesirable.
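This layering can be sketched as each layer prepending its own header on the way down and peeling off only its own on receipt; lower layers treat everything handed down as opaque data. The helper names are illustrative.

```python
def encapsulate(payload, headers):
    """Pass a message down the stack: each layer prepends its own
    header and treats what it received from above as opaque data."""
    for header in headers:            # e.g. transport, network, link
        payload = header + payload
    return payload

def strip_header(frame, header):
    """A receiving layer removes only its own header and hands the
    rest up unchanged, never inspecting the layers above."""
    assert frame.startswith(header)
    return frame[len(header):]
```

Because each layer touches only its own header, one layer's protocol can change without the others noticing, which is exactly the transparency the answer describes.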

Q25: Give five types of hardware resource and five types of data or software resource that can usefully be shared. Give examples of their sharing as it occurs in practice in distributed systems.
▪ Hardware:
o CPU: compute server (executes processor-intensive
applications for clients), remote object server (executes
methods on behalf of clients), worm program (shares CPU
capacity of desktop machine with the local user). Most other
servers, such as file servers, do some computation for their
clients, hence their CPU is a shared resource.
o memory: cache server (holds recently accessed web pages
in its RAM, for faster access by other local computers).
o disk: file server, virtual disk server (see Chapter 8), video on
demand server (see Chapter 15).
o screen: Network window systems, such as X-11, allow
processes in remote computers to update the content of
windows.

o printer: networked printers accept print jobs from many
computers, managing them with a queuing system.
o network capacity: packet transmission enables many
simultaneous communication channels (streams of data) to
be transmitted on the same circuits.
▪ Data/software:
o web page: web servers enable multiple clients to share read-
only page content (usually stored in a file, but sometimes
generated on-the-fly).
o file: file servers enable multiple clients to share read-write
files. Conflicting updates may result in inconsistent results.
Most useful for files that change infrequently, such as
software binaries.
o object: possibilities for software objects are limitless. E.g.,
shared whiteboard, shared diary, room booking system, etc.
o database: databases are intended to record the definitive
state of some related sets of data. They have been shared
ever since multi-user computers appeared. They include
techniques to manage concurrent updates.
o newsgroup content: The Netnews system makes read-only
copies of the recently posted news items available to clients
throughout the Internet. A copy of newsgroup content is
maintained at each Netnews server that is an approximate
replica of those at other servers. Each server makes its data
available to multiple clients.
o video/audio stream: Servers can store entire videos on disk
and deliver them at playback speed to multiple clients
simultaneously.
o exclusive lock: a system-level object provided by a lock
server, enabling several clients to coordinate their use of a
resource (such as printer that does not include a queuing
scheme).

Q26: Compare and contrast cloud computing with more traditional client-server computing.

Cloud computing is defined in terms of supporting Internet-based services (whether application, storage, or other computing-based services), where everything is a service, dispensing with local data storage or application software. This is completely consistent with client-server computing, and indeed client-server concepts support the implementation of cloud computing. Cloud computing, however, raises the level of ambition by moving to a world where you can dispense with local services altogether; this ambition may or may not be present in client-server computing. As a final comment, cloud computing promotes a view of computing as a utility, linked to often novel business models whereby services are rented rather than owned, leading to a more flexible and elastic approach to service provision and acquisition. This represents the key novelty in cloud computing.
To summarize, cloud computing is partially a technical innovation in
terms of the level of ambition, but largely a business innovation in terms
of viewing computing services as a utility.

Q27: Define: Protocols, Routing and Congestion control.


Protocol: a
well-known set of rules and formats to be used for
communication between processes to perform a given task.
Routing:
It is a function that is required in all networks except those LANs,
such as the Ethernet, that provide direct connections between all pairs of
attached hosts.
Congestion control:
It is the technique that modulates traffic entry into a
telecommunications network to avoid congestive collapse resulting from
oversubscription. This is typically accomplished by reducing the rate of
packets. Whereas congestion control prevents senders from
overwhelming the network, flow control prevents the sender from
overwhelming the receiver.
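A common congestion-control policy, TCP-style additive-increase/multiplicative-decrease (AIMD), can be sketched in one function. The constants are illustrative; real TCP congestion control is considerably more elaborate.

```python
def aimd(cwnd, loss_detected, increase=1.0, decrease=0.5):
    """Additive-increase / multiplicative-decrease: grow the
    congestion window by a constant each round trip; on detecting
    loss, halve it, sharply reducing the rate at which the sender
    injects packets into the network."""
    return cwnd * decrease if loss_detected else cwnd + increase
```

Repeatedly probing upward and backing off on loss is what lets many senders share a link without driving it into congestive collapse.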

Q28: List the general security requirements and state the different methods of attack in DS.
General Security Requirements:

▪ Confidentiality (Privacy, Secrecy)


o Protection from disclosure to unauthorized parties
o E.g., overhear talk, illegal data copy (Interception)
▪ Integrity
o Protection from unauthorized change of data / tampering of
services
o Violations be detectable and recoverable.
o E.g., message replay (Fabrication, Modification)
▪ Availability
o Legitimate users have access anytime
o E.g., Denial of Service Attack (Interrupt)
o One facet of dependable systems, as well.
Methods of Attack:

▪ Eavesdropping
o obtaining private or secret information.
▪ Masquerading
o assuming the identity of another user/principal.
▪ Message tampering
o altering the content of messages in transit
▪ man in the middle attack (tampers with the secure
channel mechanism).
▪ Replaying
o storing secure messages and sending them later.
▪ Denial of service
o flooding a channel or other resource, denying access to
others.

Q29: You are designing a client-server system and must decide between a thin and a fat client. What criteria do you use to make your decision?
Thin client: software that is primarily designed to communicate with a server; its features are produced by servers such as a cloud platform.
Fat client: software that implements its own features; it may connect to servers, but it remains mostly functional when disconnected.
Relevant criteria include the processing and storage capacity of the client machines, the need to keep working while disconnected, the cost of managing and upgrading client software, and the available network bandwidth and latency.

Q30: What is the difference between Availability and Reliability?

Reliability is the measure of how long a machine performs its intended
function, whereas availability is the measure of the percentage of time a
machine is operable. For example, a machine may be available 90% of
the time, but reliable only 75% of the time from a performance
standpoint.
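Availability as a percentage of operable time is commonly computed from mean time between failures (MTBF) and mean time to repair (MTTR); a one-line sketch:

```python
def availability(mtbf, mttr):
    """Fraction of time a machine is operable:
    MTBF / (MTBF + MTTR), with both values in the same time units."""
    return mtbf / (mtbf + mttr)
```

For instance, a machine that runs 90 hours between failures and takes 10 hours to repair is available 90% of the time, matching the example above.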

Q31: Which is more fault tolerant, UDP or TCP? Why?


TCP is better in terms of fault tolerance because TCP is a connection-oriented protocol, whereas UDP is a connectionless protocol. A key difference between TCP and UDP is speed, as TCP is comparatively slower than UDP. Overall, UDP is a faster, simpler, and more efficient protocol; however, retransmission of lost data packets is only possible with TCP.

State for each of the following statements whether it is (True) or (False).
Q1: Resource sharing is not the main motivating factor for constructing
distributed systems. ( False )
Q2: An Architectural model of a distributed system is concerned with the
placement of its parts and relationship between them. ( True )

Q3: Distributed systems are suitable for building global e-business. ( True )
Q4: The construction of distributed systems provides no challenges. ( False )
Q5: The interaction model deals with performance and with the difficulty
of setting time limits in a distributed system, for example for message
delivery. ( True )
Q6: In large networks adaptive routing is used: the best route for communication between two points in the network is re-evaluated periodically. ( True )
Q7: An intranet provides local users only local services but no internet services. ( False )

Q8: Concurrency transparency enables several processes to operate
concurrently using shared resources without interference between them.
( True )
Q9: The failure model attempts to give a precise specification of the
faults that can be exhibited by processes and communication channels.
( True )
Q10: Designing system through layers and services is the best way to
break up the complexity of systems. ( True )
Q11: Protocols are agreement between two communicating parties on
how the communication is to proceed. ( True )

Q12: The peer-to-peer model is not related to distributed systems. ( False )
Q13: Local area networks are based on packet broadcasting on a
shared medium, mostly Ethernet. ( True )
Q14: Middleware – a layer of software whose purpose is to mask
heterogeneity and to provide a convenient programming model to
application programmers. ( True )
Q15: Peer-to-peer architecture removes the difference between client
and server. ( True )
Q16: Firewalls serve to protect an internal network from external attacks. ( True )
Q17: Replication transparency enables multiple instances of resources
to be used to increase reliability and performance without knowledge of
the replicas by users or application programmers. ( True )
Q18: The security model discusses the possible faults to processes and
communication channels. ( False )
Q19: Encryption is the process of encoding a message in such a way as
to hide its contents. ( True )

Q20: Data transfer rate is measured in packets per second. ( False )
