
UNIT-V-APPLICATION LAYER

1.Explain the various components of an email system and the protocols used.
E-mail is one of the most popular Internet services, and it is far more widely used today than was originally envisaged.
Common Architecture
The following shows components of e-mail system involved in Alice sending a message to Bob

Components
1. User Agent
2. Message
3. Message Transfer Agent
4. Message Access Agent

User Agent
• A user agent (UA) is software that is either command-driven (e.g., pine, elm) or GUI-based (e.g.,
Microsoft Outlook, Netscape). It facilitates:
o Composing messages: the UA helps to compose messages by providing a template that comes
with a built-in editor.
o Reading messages: the UA checks mail in the incoming box and, apart from the message, provides
information such as sender, size, subject and flag (read, new).
o Replying to messages: the UA allows the user to reply (send a message) back to the sender.
o Forwarding messages: the UA facilitates forwarding a message to a third party.
o Handling mailboxes: the UA creates two mailboxes for each user, namely inbox (to store
received e-mails) and outbox (to keep all sent e-mails).
Message Format
• RFC 822 defines a message as having two parts, namely a header and a body.
• The message header is a series of <CRLF>-terminated lines. Each header line contains a
type and a value separated by a colon (:). It is filled in by the user or the system. Some of them are:
o From: the user who sent the message
o To: identifies the message recipient(s)
o Subject: says something about the purpose of the message
o Date: when the message was transmitted
o An e-mail address has the form user_name@domain_name, where domain_name is the
hostname of the mail server.
• The body of the message contains the actual information.
o The header is separated from the message body by a blank line.
• Initially, the e-mail system was designed to send messages only in NVT 7-bit ASCII format.
o Languages such as French, German, Chinese and Japanese were not supported.
o Image, audio and video files could not be sent.

Multipurpose Internet Mail Extensions (MIME)


• MIME is a supplementary protocol that allows non-ASCII data to be sent through e-mail.
• MIME transforms non-ASCII data to NVT ASCII and delivers it to the client MTA. The NVT
ASCII data is converted back to non-ASCII form at the recipient mail server.
• MIME defines five headers. They are:
o MIME-Version: specifies the MIME version in use, currently 1.0
o Content-Type: specifies the message type such as text (plain, html), image (jpeg, gif),
audio, video and application (postscript, msword). If more than one type exists, then it is
termed multipart (mixed).
o Content-Transfer-Encoding: defines how data in the message body is encoded, such as
binary, base64, 7bit, etc.
o Content-Id: uniquely identifies the whole message in a multiple-message environment.
o Content-Description: describes the type of the message body.
For example, a message containing plain text and an image file looks like:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="-------417CA6E2DE4ABCAFBC5"
From: Alice Smith <Alice@cisco.com>
To: Bob@cs.Princeton.edu
Subject: promised material
Date: Mon, 07 Sep 1998 19:45:19 -0400
---------417CA6E2DE4ABCAFBC5
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

---------417CA6E2DE4ABCAFBC5
Content-Type: image/jpeg
Content-Transfer-Encoding: base64
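A message with this structure can be produced with Python's standard email package. The following is only a minimal sketch: the body text, the file name photo.jpg and the placeholder image bytes are assumptions made for illustration, not part of the original example.

```python
from email.message import EmailMessage

# Build the header fields used in the example above.
msg = EmailMessage()
msg["From"] = "Alice Smith <Alice@cisco.com>"
msg["To"] = "Bob@cs.Princeton.edu"
msg["Subject"] = "promised material"

# First body part: plain 7-bit ASCII text (hypothetical wording).
msg.set_content("Bob, here is the image I promised.")

# Second body part: binary data, which cannot travel as 7-bit ASCII.
# These bytes are a placeholder, not a real JPEG file.
fake_jpeg = bytes([0xFF, 0xD8, 0xFF, 0xE0]) + b"placeholder"
msg.add_attachment(fake_jpeg, maintype="image", subtype="jpeg",
                   filename="photo.jpg")

# Adding the attachment turns the message into multipart/mixed.
parts = list(msg.iter_parts())
part_types = [part.get_content_type() for part in parts]
image_encoding = parts[1]["Content-Transfer-Encoding"]
```

Calling msg.as_string() on the result shows the boundary lines and the base64-encoded image part, in the same layout as the example above.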
Message Transfer Agent (MTA): SMTP
• Message Transfer Agent (MTA) is a mail daemon (for example, a version of the sendmail program)
that helps to transmit/receive messages over the network.
• To send mail, a system must have a client MTA, and to receive mail, a system must have a
server MTA.
• Simple Mail Transfer Protocol (SMTP) defines communication between client/server MTA.
• SMTP defines how commands and responses must be sent back and forth.
• Some commands sent from the client MTA are HELO, MAIL FROM, RCPT TO, DATA and QUIT.

• SMTP uses a TCP connection on port 25 to forward the entire message, which is stored at
intermediate mail servers/mail gateways until it reaches the recipient mail server.
The following example shows commands and responses using the SMTP protocol:
HELO cs.princeton.edu
250 Hello daemon@mail.cs.princeton.edu [128.12.169.24]
MAIL FROM:<Bob@cs.princeton.edu>
250 OK
RCPT TO:<Alice@cisco.com>
250 OK
RCPT TO:<Tom@cisco.com>
550 No such user here
DATA
354 Start mail input; end with <CRLF>.<CRLF>
Blah blah blah...
...etc. etc. etc.
<CRLF>.<CRLF>
250 OK
QUIT
221 Closing connection

In each exchange, the client posts a command and the server responds with a code. The server
also returns a human-readable explanation for the code. After the DATA command, the client sends
the message, which is ended by a line containing only a period (.), and then terminates the
connection with QUIT.
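The client side of such a dialogue can be sketched in a few lines of Python. This is an illustrative sketch only: the function names smtp_commands and reply_ok are hypothetical, no network connection is made, and the handling of body lines that begin with a period (an extra period is prepended so the line is not mistaken for the end of the message) is an addition not discussed above.

```python
def smtp_commands(helo_host, sender, recipients, body_lines):
    """Return the lines a client MTA would send for one message."""
    cmds = [f"HELO {helo_host}", f"MAIL FROM:<{sender}>"]
    cmds += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    cmds.append("DATA")
    for line in body_lines:
        # A body line starting with "." gets one more "." in front of it.
        cmds.append("." + line if line.startswith(".") else line)
    cmds.append(".")      # a lone period ends the message
    cmds.append("QUIT")
    return cmds


def reply_ok(reply):
    """Treat 2xx and 3xx reply codes as success, 4xx and 5xx as failure."""
    code = int(reply[:3])
    return 200 <= code < 400


session = smtp_commands(
    "cs.princeton.edu",
    "Bob@cs.princeton.edu",
    ["Alice@cisco.com", "Tom@cisco.com"],
    ["Blah blah blah...", "...etc. etc. etc."],
)
```

Here reply_ok("250 OK") is true, while reply_ok("550 No such user here") is false, so a real client would know that the second recipient was rejected.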

Message Access Agent (MAA)/Mail Reader: POP and IMAP


• MAA or mail reader allows the user to retrieve messages from the mailbox, so that the user can
perform actions such as replying, forwarding, etc.
• The two message access protocols are:
o Post Office Protocol, version 3 (POP3)
o Internet Message Access Protocol, version 4 (IMAP4)
• SMTP is a push-type protocol, whereas POP3 and IMAP4 are pull-type protocols.

POP3
• POP3 is simple and limited in functionality.
• The POP3 client is installed on the recipient computer and the POP3 server on the mail server.
• The client opens a connection to the server on TCP port 110.
• The client sends a username and password to access the mailbox and retrieve the messages.
• POP3 works in two modes, namely delete mode and keep mode.
o In delete mode, mail is deleted from the mailbox after retrieval.
o In keep mode, mail after reading is kept in the mailbox for later retrieval.
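The difference between the two modes can be illustrated with a toy in-memory mailbox. This is only a sketch of the idea: the class ToyMailbox and its method names are hypothetical and it does not speak the real POP3 protocol.

```python
class ToyMailbox:
    """In-memory stand-in for a mailbox held on the mail server."""

    def __init__(self, messages):
        self.messages = list(messages)

    def retrieve(self, mode="keep"):
        """Return all messages; in delete mode, also empty the mailbox."""
        if mode not in ("keep", "delete"):
            raise ValueError("mode must be 'keep' or 'delete'")
        downloaded = list(self.messages)
        if mode == "delete":
            self.messages.clear()
        return downloaded


box = ToyMailbox(["message 1", "message 2"])
first = box.retrieve("keep")        # mail stays on the server
left_after_keep = len(box.messages)
second = box.retrieve("delete")     # mail is removed after download
left_after_delete = len(box.messages)
```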
Downloading message using POP3 is shown below:
IMAP4
• IMAP is a client/server protocol running over TCP. The client issues commands and the
mail server responds.
o The client can issue commands such as LOGIN, AUTHENTICATE, SELECT, EXAMINE, CLOSE,
LOGOUT, FETCH, STORE, DELETE, EXPUNGE, etc.
o Server responses include OK, NO, BAD, etc.
• The exchange begins with the client authenticating itself to access the mailbox. This is
represented as a state transition diagram as shown below.
• When the user asks to FETCH a message, the server returns it in MIME format and the mail
reader decodes it.
• IMAP also defines message attributes such as size and flags such as Seen, Answered,
Deleted and Recent.

2.Why is POP not preferred?


• It does not allow the user to organize their mail on the server.
• The user cannot have different folders on the server.
• It does not allow the user to partially check the contents of the mail before downloading.

3.List the advantages of IMAP over POP


IMAP4 is more powerful and more complex than POP3. The additional features provided are:
• A user can check the e-mail header prior to downloading.
• A user can search the contents of the e-mail for a specific string of characters prior to
downloading.
• A user can partially download e-mail. This is especially useful if bandwidth is limited and
the e-mail contains multimedia with high bandwidth requirements.
• A user can create, delete, or rename mailboxes on the mail server.
• A user can create a hierarchy of mailboxes in a folder for e-mail storage.

4.What is Web-based mail?


• E-mail is such a common application that some websites, such as Hotmail and Yahoo, today
provide this service to anyone who accesses the site.
• Mail transfer from Alice's browser to her mail server is done through HTTP.
• The message transfer from the sending mail server to the receiving mail server is through SMTP.
• Finally, the message from the receiving Web server to Bob's browser is transferred using HTTP.
• The website sends a form to be filled in by Bob, which includes log-in id and password.
• If the credentials match, the e-mail is transferred from the Web server to Bob's browser in
HTML format.

5.Explain HTTP protocol in detail.


• WWW is a distributed client/server service, in which a client (web browser) can access a
service through a server, where the service is distributed over many locations called sites.
• Both the client and server use Hypertext Transfer Protocol (HTTP).
• Web browsers allow users to access files (repository of information) through a uniform
resource locator (URL).
• When the user enters a URL in the web browser, the browser forms a request message and sends
it to the server.
• The server retrieves the requested document and sends it in a response message.
• The browser displays the response as HTML or in another appropriate format.
• HTTP uses one TCP connection on well-known port 80 to transfer data between the client and
the server.
• HTTP is a stateless request/response protocol as shown.
• The general form of a message is shown below:
START_LINE <CRLF>
MESSAGE_HEADER <CRLF>
<CRLF>
MESSAGE_BODY <CRLF>

Request Message

Request line
The request line specifies three elements:
• The HTTP version specifies the current version of the protocol, i.e., 1.1
• The URL specifies the path (absolute/relative) along with the document name.
• The request type specifies the method that operates on the URL (for example, GET).

For example, the request line to retrieve file index.html on host cs.princeton.edu is
GET http://www.cs.princeton.edu/index.html HTTP/1.1
Request Header
Request Header specifies the client's configuration and preferred document format.

The above example, using a request header, is specified as

GET /index.html HTTP/1.1
Host: www.cs.princeton.edu

Response Messages

Status line
• The status code field consists of three digits (1xx–Informational, 2xx–Success,
3xx–Redirection, 4xx–Client Error, 5xx–Server Error).
• The status phrase explains the status code in text form. Some of them are:

For example, the server reports as follows, if the requested file is not found
HTTP/1.1 404 Not Found
Response Header

The response for a moved page is given below.


HTTP/1.1 301 Moved Permanently
Location: http://www.princeton.edu/cs/index.html
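The message layout described above (start line, headers, blank line, body) can be sketched with plain string handling. This is a minimal illustration, not a complete HTTP implementation; the helper names build_request and parse_response are hypothetical.

```python
CRLF = "\r\n"


def build_request(method, path, host):
    """Request line, one Host header, then the blank line that ends the headers."""
    return f"{method} {path} HTTP/1.1{CRLF}Host: {host}{CRLF}{CRLF}"


def parse_response(raw):
    """Split a response into version, status code, phrase, headers and body."""
    head, _, body = raw.partition(CRLF + CRLF)
    lines = head.split(CRLF)
    version, code, phrase = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), phrase, headers, body


request = build_request("GET", "/index.html", "www.cs.princeton.edu")
moved = ("HTTP/1.1 301 Moved Permanently" + CRLF +
         "Location: http://www.princeton.edu/cs/index.html" + CRLF + CRLF)
version, code, phrase, headers, body = parse_response(moved)
```

The first digit of the parsed code gives the class of the response, so a client can treat every 3xx code as a redirection and look for the Location header.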

6.What is WWW? Explain the architecture of WWW. Also explain the terms Browser,
Server, URL, Cookies with clear figures.
The World Wide Web (WWW) is a repository of all resources and users on the Internet that are
using the Hypertext Transfer Protocol (HTTP).
Architecture: The WWW today is a distributed client-server service, in which a client using a
browser can access a service using a server. However, the service provided is distributed over
many locations called sites.
Each site holds one or more documents called web pages. Each web page, however, can contain
some links to other web pages in the same or other sites. In other words, a web page can be simple
or composite. A simple web page has no links to other web pages; a composite web page has one
or more links to other web pages. Each web page is a file with a name and address.
Assume we need to retrieve a scientific document that contains one reference to another text file
and one reference to a large image. The main document and the image are stored in two separate
files (file A and file B) in the same site; the referenced text file (file C) is stored in another site.
Since we are dealing with three different files, we need three transactions if we want to see the
whole document. The first transaction (request/response) retrieves a copy of the main document
(file A), which has references (pointers) to the second and third files. When a copy of the main
document is retrieved and browsed, the user can click on the reference to the image to invoke the
second transaction and retrieve a copy of the image (file B). If the user needs to see the contents of
the referenced text file, she can click on its reference (pointer) invoking the third transaction and
retrieving a copy of file C. Note that although files A and B both are stored in site I, they are
independent files with different names and addresses. Two transactions are needed to retrieve
them. A very important point we need to remember is that file A, file B, and file C are independent
web pages, each with independent names and addresses. Although references to file B or C are
included in file A, it does not mean that each of these files cannot be retrieved independently. A
second user can retrieve file B with one transaction. A third user can retrieve file C with one
transaction.
A variety of vendors offer commercial browsers that interpret and display a web page, and all of
them use nearly the same architecture. Each browser usually consists of three parts: a controller,
client protocols, and interpreters.

Web Server: The web page is stored at the server. Each time a request arrives, the corresponding
document is sent to the client. To improve efficiency, servers normally store requested files in a
cache in memory; memory is faster to access than a disk. A server can also become more efficient
through multithreading or multiprocessing. In this case, a server can answer more than one request
at a time. Some popular web servers include Apache and Microsoft Internet Information Server.
Uniform Resource Locator (URL): A web page, as a file, needs to have a unique identifier to
distinguish it from other web pages. To define a web page, we need three identifiers: host, port,
and path. However, before defining the web page, we need to tell the browser what client-server
application we want to use, which is called the protocol. This means we need four identifiers to
define the web page. The first is the type of vehicle to be used to fetch the web page; the last three
make up the combination that defines the destination object (web page). To combine these four
pieces together, the uniform resource locator (URL) has been designed; it uses three different
separators between the four pieces as shown below:
protocol://host:port/path
Example:
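The four identifiers can be recovered from a URL with Python's standard urllib.parse module. The URL used here is an assumed example built from the host name that appears elsewhere in these notes, with the port written explicitly.

```python
from urllib.parse import urlsplit

# Hypothetical URL: protocol, host, port and path are all present.
url = "http://www.cs.princeton.edu:80/index.html"
parts = urlsplit(url)

protocol = parts.scheme    # the client-server application to use
host = parts.hostname      # the server that holds the page
port = parts.port          # the port number of the server process
path = parts.path          # where the file is located on the server
```

When the port is omitted from a URL, parts.port is None and the browser falls back to the well-known port of the protocol.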
Cookies: The original purpose of the Web, retrieving publicly available documents, exactly fits
the stateless design of HTTP. Today the Web has other functions that need to remember some information about the
clients. For these purposes, the cookie mechanism was devised. When a client sends a request to a
server, the browser looks in the cookie directory to see if it can find a cookie sent by that server. If
found, the cookie is included in the request. When the server receives the request, it knows that
this is an old client, not a new one. Note that the contents of the cookie are never read by the
browser or disclosed to the user. It is a cookie made by the server and eaten by the server.

7.Categorize the documents used in WWW. Explain static, dynamic and active documents with
details and clear figures.

Three basic types of web documents are:


Static.
A static web document resides in a file that is associated with a web server. The author of a static
document determines the contents at the time the document is written. Because the contents do not
change, each request for a static document results in exactly the same response.
Dynamic.
A dynamic web document does not exist in a predefined form. When a request arrives, the web server
runs an application program that creates the document. The server returns the output of the program
as a response to the browser that requested the document. Because a fresh document is created for
each request, the contents of a dynamic document can vary from one request to another.
Active.
An active web document consists of a computer program that the server sends to the browser and that
the browser must run locally. When it runs, the active document program can interact with the user
and change the display continuously.

Advantages and disadvantages of each document type


Static
Advantages: simplicity, reliability and performance. The browser can place a copy in a cache on a
local disk.
Disadvantages: inflexibility; changes are time consuming because they require a human to edit the
file.

Dynamic
Advantages: ability to report current information (current stock prices, current weather conditions,
current availability of tickets for a concert). Because both static and dynamic documents use
HTML, a browser does not know whether the server extracted the page from a disk file or obtained
the page dynamically from a computer program.
Disadvantages: increased cost and, like a static document, a dynamic document does not change
after a browser retrieves a copy.
Thus, information in a dynamic document (e.g., stock prices) begins to age as soon as it has been
sent to the browser.
Server push: the server runs the program periodically and sends the new document to the browser.

Active
Advantages: ability to update information continuously. For example, only an active document can
change the display quickly enough to show an animated image. More important, an active document
can access sources of information directly and update the display continuously. For example, an
active document that displays stock prices can continue to retrieve stock information and change the
display without requiring any action from the user.
8.Distinguish between persistent and non-persistent connection.
Non-persistent connection
• A separate TCP connection is required for each request/response.
• Imposes high overhead on the server because the server needs N buffers for N URL pointers
and TCP overhead for each connection.
Persistent connection
• Client and server can exchange multiple request/response messages over the same TCP
connection.
• Eliminates the connection setup overhead and load on the server.
• TCP's congestion window mechanism is able to operate more efficiently.
• The problem is deciding how long the connection should be kept open.
o The server times out if there is no request from the client for a specified period.

9.Write short note on caching.


• Caching enables the client to retrieve documents faster and reduces load on the server.
• Caching can be implemented at different places.
o For example, the ISP router can cache pages. Further requests for such pages coming from
its clients are answered by the ISP.
o A proxy server is a host that keeps copies of responses to recent requests. The client sends
its request to the proxy server. The proxy server either responds to the client or forwards the
request to the server.
o The browser can also cache pages.
• The server assigns an expiration date (using the Expires header field) to each page, beyond
which the page should not be cached.
• Therefore, prior to caching a page, its expiration date is checked. If a cached page reaches its
expiration, then the page is deleted.
• The proxy node can also verify whether it has the latest document by using the
If-Modified-Since header.
• A page can have cache directives that must be adhered to by all caching nodes (for example, a
no-cache page).
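The expiration check can be sketched with a small in-memory cache keyed by URL. This is an illustrative sketch under simple assumptions: the class ToyCache, the stored page and the dates are hypothetical, and only the Expires header is considered (no If-Modified-Since or cache directives).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


class ToyCache:
    """Keeps one copy per URL and drops it once its Expires date has passed."""

    def __init__(self):
        self.entries = {}   # url -> (body, expiry as an aware datetime)

    def store(self, url, body, expires_header):
        # HTTP dates use the same textual format as e-mail dates.
        self.entries[url] = (body, parsedate_to_datetime(expires_header))

    def lookup(self, url, now):
        entry = self.entries.get(url)
        if entry is None:
            return None                 # never cached
        body, expires = entry
        if now >= expires:
            del self.entries[url]       # expired: delete the cached page
            return None
        return body


cache = ToyCache()
cache.store("http://www.cs.princeton.edu/index.html", "<html>...</html>",
            "Mon, 07 Sep 1998 19:45:19 GMT")
before = cache.lookup("http://www.cs.princeton.edu/index.html",
                      datetime(1998, 9, 7, 19, 0, tzinfo=timezone.utc))
after = cache.lookup("http://www.cs.princeton.edu/index.html",
                     datetime(1998, 9, 7, 20, 0, tzinfo=timezone.utc))
```

A lookup that returns None means the request has to be forwarded to the server, as a proxy would do on a cache miss.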

10.Explain the role of DNS on a computer network.


• Users prefer to remember domain names rather than the IP addresses of hosts, since names are
more user-friendly.
• Thus, there is a need for a system that maps domain names to IP addresses. It includes:
o A namespace to define domain names without conflict.
o Binding of domain names to IP addresses.
o A name server that returns the IP address for a given name.
• During the early days of the Internet, there were only a few hundred hosts.
o A central authority called the Network Information Center (NIC) maintained name-to-
address bindings in a flat file called hosts.txt.
o A new host that joined the Internet would mail its name and IP address to NIC.
o NIC updated hosts.txt and mailed it to all hosts.
o Names were resolved using a simple lookup in hosts.txt.
• As hosts grew to thousands and millions, the flat-file approach failed, leading to the evolution
of DNS in the mid 1980s.
Name Hierarchy
• DNS was originally funded by ARPA.
• DNS uses a hierarchical name space for domains in the Internet.
• Hierarchical naming permits use of the same sub-domain name in different domains.
• Domain names are case insensitive, and each label can be up to 63 characters long.
• DNS names are processed from right to left and use periods as the separator.
• DNS can be used to map names to values, not necessarily from domain names to IP addresses.
• The DNS hierarchy can be visualized as a tree, where each node in the tree corresponds to a
domain and the leaves relate to hosts.
• Six big domains are .edu (education), .com (commercial), .gov (US government), .mil (US
military), .org (non-profit organizations) and .net (network providers).
• There is also one top-level domain for each country: .uk (United Kingdom), .fr (France),
.in (India), etc.

Name Servers
• The domain hierarchy is partitioned into zones.
• Each zone acts as the central authority for that part of the sub-tree.
• The topmost domains are managed by NIC.
• In the .edu hierarchy, princeton is a zone.
• Each zone can be further sub-divided into zones that are managed using their own name servers,
such as the CS department under Princeton University. The hierarchy of name servers is shown below.
• Each zone's information is implemented on at least two name servers.
• Clients send queries to name servers, and name servers respond to them.
• The response contains either the host IP address or the address of another name server.
• Each name server contains a collection of resource records.
• A resource record is a name-to-value binding and is a 5-tuple with the following fields:
(Name, Value, Type, Class, TTL)
• Name tells the domain to which this record applies. It is the primary search key, used to
satisfy queries.
• The Type field tells what kind of record it is. Some commonly used types are:
o NS: the Value field contains a name server.
o CNAME: the Value field contains the canonical name for the host. Used to define aliases.
o MX: the Value field contains a mail server that accepts messages for the domain.
o A: the Value field contains an IP address.
• The Value field can be a number, a domain name, or an ASCII string. The semantics depend
on the record type.
• For Internet information, the Class field is always IN.
• The TTL field gives an indication of how long the resource record is valid.

Root name server


• The root name server contains an NS record for each second-level server.
• It also has an A record that translates this name into the corresponding IP address.
The following shows part of .edu root name server
(princeton.edu, cit.princeton.edu, NS, IN)
(cit.princeton.edu, 128.196.128.233, A, IN)

Zone name server
• The zone princeton.edu has a name server available on host cit.princeton.edu
that contains the following records.
• Some of its records are A records, whereas others point to next-level name servers.
(cs.princeton.edu, gnat.cs.princeton.edu, NS, IN)
(gnat.cs.princeton.edu, 192.12.69.5, A, IN)

Eventually, third-level name server, such as the domain cs.princeton.edu, contains A records for
all of its hosts.
(cs.princeton.edu, gnat.cs.princeton.edu, MX, IN)
(cicada.cs.princeton.edu, 192.12.69.60, A, IN)
(cic.cs.princeton.edu, cicada.cs.princeton.edu, CNAME, IN)
(gnat.cs.princeton.edu, 192.12.69.5, A, IN)

Name Resolution
For example, the step involved in the lookup for name cicada.cs.princeton.edu is as follows:
• The client first sends a query containing cicada.cs.princeton.edu to the root server.
• The root server does not find an exact match, but locates the NS record for princeton.edu.
• The root returns this NS record, together with the A record of the name server
cit.princeton.edu, back to the client.
• The client sends the same query to 128.196.128.233 and receives the NS record for
cs.princeton.edu together with the A record of its name server gnat.cs.princeton.edu.
• Finally, the client sends the query to 192.12.69.5 and gets the A record for
cicada.cs.princeton.edu.
The drawback with this lookup is:
• All hosts should know the root name server, which is not feasible.
• Instead, the client can send the query to the local name server that it knows.
• The local name server can query the root name server on behalf of the client.
• Once the local NS gets the required response, it caches the A record based on TTL and sends
the record to the client.
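The lookup steps can be simulated with the resource records listed above. This is a simplified sketch, not a real resolver: the records are held in a dictionary keyed by server address, the key "root" is a stand-in for the root server's address, the Class and TTL fields are omitted, and CNAME and MX records are ignored.

```python
ROOT = "root"   # placeholder for the address of the root name server

# server address -> list of (name, value, type) records, as given above
SERVERS = {
    ROOT: [
        ("princeton.edu", "cit.princeton.edu", "NS"),
        ("cit.princeton.edu", "128.196.128.233", "A"),
    ],
    "128.196.128.233": [
        ("cs.princeton.edu", "gnat.cs.princeton.edu", "NS"),
        ("gnat.cs.princeton.edu", "192.12.69.5", "A"),
    ],
    "192.12.69.5": [
        ("cicada.cs.princeton.edu", "192.12.69.60", "A"),
        ("gnat.cs.princeton.edu", "192.12.69.5", "A"),
    ],
}


def query(server, name):
    """Return ('answer', ip), ('referral', next_server_ip) or ('error', None)."""
    records = SERVERS[server]
    for rec_name, value, rec_type in records:
        if rec_type == "A" and rec_name == name:
            return "answer", value
    for rec_name, ns_host, rec_type in records:
        if rec_type == "NS" and (name == rec_name or name.endswith("." + rec_name)):
            # Look up the A record of the name server being referred to.
            for glue_name, glue_ip, glue_type in records:
                if glue_type == "A" and glue_name == ns_host:
                    return "referral", glue_ip
    return "error", None


def resolve(name):
    """Follow referrals from the root until an A record is found."""
    server, visited = ROOT, []
    while True:
        visited.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            return value, visited
        if kind == "error":
            raise LookupError(name)
        server = value


address, servers_asked = resolve("cicada.cs.princeton.edu")
```

A local name server would run the same loop on behalf of its clients and keep the final A record in its cache until the TTL expires.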

11.What is DNS Protocol? How does it work for Internet? Explain Generic, country and
inverse domains with examples.

The Domain Name System (DNS) is a hierarchical, decentralized naming system for computers and other
resources connected to the Internet or private networks. In use since 1985, it associates information
with domain names assigned to participating entities and translates domain names to their
numerical IP addresses. It is considered an essential component of the functionality of the Internet.
Working of DNS for the Internet
A frequently used analogy is that DNS functions as the phonebook for the Internet; it maps an
easy-to-remember domain name to the long numerical IP address behind it. The hierarchy of a domain
name is read from right to left; a domain name is divided into separate parts, or labels, separated
by dots (e.g., google.com), with the rightmost label denoting the top-level domain (in this case, .com).
When a user types a domain name into a browser as part of a URL, the computer uses a DNS server
to look up the domain name and obtain the correct IP address, to which the request is then sent.

Types of domains:

DNS is a protocol that can be used in different platforms. In the Internet, the domain name space
(tree) was originally divided into three different sections: generic domains, country domains, and
the inverse domains. However, due to the rapid growth of the Internet, it became extremely
difficult to keep track of the inverse domains, which could be used to find the name of a host when
given the IP address. The inverse domains are now deprecated (see RFC 3425).

Generic Domains

The generic domains define registered hosts according to their generic behaviour. Each node in the
tree defines a domain, which is an index to the domain name space database.

Looking at the tree, we see that the first level in the generic domains section allows 14 possible
labels. These labels describe the organization types as listed in Table.
Country Domains

The country domains section uses two-character country abbreviations (e.g., us for United States).
Second labels can be organizational, or they can be more specific national designations. The
United States, for example, uses state abbreviations as a sub-division of us (e.g., ca.us.). Figure
shows the country domains section. The address uci.ca.us. can be translated to University of
California, Irvine, in the state of California in the United States.

Inverse Domains

• The inverse domain is used to map an address to a name.
• For example, when a client sends a request to a server to perform a particular task, the server
consults a list of authorized clients. The list contains only the IP addresses of the clients.
• The server sends a query to the DNS server to map the address to a name, to determine if the
client is on the authorized list.
• This query is called an inverse query.
• This query is handled by the first-level node called arpa.

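For IPv4, the name looked up under arpa is built by reversing the bytes of the address and appending in-addr.arpa. Python's standard ipaddress module can produce this name; the helper reverse_name below is a hypothetical wrapper, and the address is one of the host addresses used earlier in these notes.

```python
import ipaddress


def reverse_name(ip):
    """Return the domain name that is queried to map an address to a name."""
    return ipaddress.ip_address(ip).reverse_pointer


name = reverse_name("192.12.69.60")
```

Here name is "60.69.12.192.in-addr.arpa": the bytes appear in reverse order because domain names are read from the most specific label on the left to the most general on the right.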
12.Explain the terms: name space, domain name space, domains and zones.

Name space

A Name space is a context within which the names of all objects must be unambiguously
resolvable. Name spaces can be flat or hierarchical.

1. Flat Name Spaces


Flat name spaces do not scale well because they can grow only so large before all available names
are used up. Once a name is used more than once in a name space, the name space violates the
unambiguously resolvable requirement.

2. Hierarchical Name Space


A hierarchical name space is divided into different areas, which can be thought of as sub name
spaces. Each area is its own sub name space within the overall name space. Therefore, each object
must have a unique name only within its sub name space in order to have an unambiguously
resolvable name within the name space hierarchy. Hierarchical name spaces, then, can scale to
extremely large networks — as you add more objects to the overall name space, you have to find
unique names for them within only the sub name space to which they belong.

Domain Name Space

To have a hierarchical name space, a domain name space was designed. In this design the names
are defined in an inverted-tree structure with the root at the top.

Domain

A domain is a subtree of the domain name space. The name of the domain is the name of the node
at the top of the subtree.
Zone

Since the complete domain name hierarchy cannot be stored on a single server, it is divided among
many servers. What a server is responsible for or has authority over is called a zone. We can define
a zone as a contiguous part of the entire tree.

13.Describe asymmetric key cryptography.

Asymmetric cryptography, also known as public key cryptography, uses public and private keys to
encrypt and decrypt data. The keys are simply large numbers that have been paired together but are
not identical (asymmetric). One key in the pair can be shared with everyone; it is called the public
key. The other key in the pair is kept secret; it is called the private key. Either of the keys can be used
to encrypt a message; the opposite key from the one used to encrypt the message is used for
decryption.
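The relationship between the two keys can be illustrated with a toy example based on RSA, which is used here only as an assumed example of an asymmetric scheme. The primes are far too small to be secure, and the snippet needs Python 3.8 or later for the modular inverse.

```python
# Toy key generation with tiny primes (illustration only, not secure).
p, q = 61, 53
n = p * q                    # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent: (e, n) is the public key
d = pow(e, -1, phi)          # private exponent: (d, n) is the private key


def apply_key(message, exponent):
    """Encrypt or decrypt an integer smaller than n with one key of the pair."""
    return pow(message, exponent, n)


message = 65
# Encrypt with the public key, decrypt with the private key (confidentiality).
ciphertext = apply_key(message, e)
recovered = apply_key(ciphertext, d)
# Transform with the private key, check with the public key (signature idea).
signature = apply_key(message, d)
verified = apply_key(signature, e)
```

Both directions return the original number, which matches the statement above that either key can be used to encrypt as long as the opposite key is used to decrypt.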

Many protocols such as SSH, OpenPGP, S/MIME, and SSL/TLS rely on asymmetric cryptography for
encryption and digital signature functions. It is also used in software programs, such as browsers,
which need to establish a secure connection over an insecure network like the internet or need to
validate a digital signature. Encryption strength is directly tied to key size and doubling key length
delivers an exponential increase in strength, although it does impair performance. As computing
power increases and more efficient factoring algorithms are discovered, the ability to factor larger
and larger numbers also increases.
For asymmetric encryption to deliver confidentiality, integrity, authenticity and non-repudiation,
users and systems need to be certain that a public key is authentic, that it belongs to the person or
entity claimed and that it has not been tampered with or replaced by a malicious third party. There is
no perfect solution to this public key authentication problem. A public key infrastructure (PKI),
where trusted certificate authorities certify ownership of key pairs and certificates, is the most
common approach, but encryption products based on the Pretty Good Privacy (PGP) model
(including OpenPGP) rely on a decentralized authentication model called a web of trust, which
relies on individual endorsements of the link between user and public key.

14.Classify modes of operation for block ciphers. Explain each with valid figures and
examples.
Encryption algorithms are divided into two categories based on input type: block ciphers and
stream ciphers. A block cipher is an encryption algorithm that takes a fixed-size input of, say, b bits
and produces a ciphertext of b bits. If the input is larger than b bits, it is divided into blocks of
b bits. For different applications and uses, there are several modes of operation for a block cipher.

Electronic Code Book (ECB) –


Electronic code book is the simplest block cipher mode of operation. It is simple because each
block of input plaintext is encrypted directly and the output is in the form of blocks of ciphertext.
Generally, if a message is larger than b bits in size, it is broken down into a number of blocks and
the procedure is repeated for each block. The procedure of ECB is illustrated below:

Advantages of using ECB –


• Parallel encryption of blocks of bits is possible, thus it is a faster way of encryption.
• Simple way of block cipher.
Disadvantages of using ECB –
• Prone to cryptanalysis since there is a direct relationship between plaintext and ciphertext.

Cipher Block Chaining –


Cipher block chaining or CBC is an improvement on ECB, since ECB compromises some
security requirements. In CBC, the previous cipher block is XORed with the current plaintext block
and the result is given as input to the encryption algorithm. In a nutshell, a cipher block is produced
by encrypting the XOR of the previous cipher block and the present plaintext block.
The process is illustrated here:

Advantages of CBC –
• CBC works well for input greater than b bits.
• CBC is a good authentication mechanism.
• Better resistive nature towards cryptanalysis than ECB.
Disadvantages of CBC –
• Parallel encryption is not possible since every encryption requires previous cipher.
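The difference between ECB and CBC can be seen with a toy block cipher. This is a sketch under stated assumptions: the 4-byte "cipher" below (XOR with the key followed by a byte rotation) is a hypothetical stand-in with no security value, the key and IV are made up, and the plaintext length is assumed to be a multiple of the block size.

```python
BLOCK = 4   # toy block size in bytes


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(block, key):
    """Stand-in for a real block cipher: XOR with key, rotate left one byte."""
    mixed = xor(block, key)
    return mixed[1:] + mixed[:1]


def toy_decrypt(block, key):
    unrotated = block[-1:] + block[:-1]
    return xor(unrotated, key)


def split(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(plaintext, key):
    # Every block is encrypted independently of the others.
    return b"".join(toy_encrypt(b, key) for b in split(plaintext))


def cbc_encrypt(plaintext, key, iv):
    prev, out = iv, []
    for block in split(plaintext):
        prev = toy_encrypt(xor(block, prev), key)   # chain with previous cipher block
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(ciphertext, key, iv):
    prev, out = iv, []
    for block in split(ciphertext):
        out.append(xor(toy_decrypt(block, key), prev))
        prev = block
    return b"".join(out)


key, iv = b"k3y!", bytes([1, 2, 3, 4])
plaintext = b"ABCDABCD"                 # two identical plaintext blocks
ecb = ecb_encrypt(plaintext, key)
cbc = cbc_encrypt(plaintext, key, iv)
```

With ECB the two identical plaintext blocks give two identical ciphertext blocks, which is the direct plaintext-ciphertext relationship noted above; with CBC the two ciphertext blocks differ because of the chaining.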

Cipher Feedback Mode (CFB) –


In this mode, the ciphertext is given as feedback to the next block of encryption, with some new
specifications. First, an initialization vector (IV) is loaded into a shift register and encrypted. The
output bits are divided into a set of s bits and b-s bits; the left-hand s bits are selected and XORed
with s plaintext bits. The result is given as input to the shift register and the process continues.
The encryption and decryption processes are shown below; both of them use the encryption algorithm.

Advantages of CFB –
• Since some bits are discarded at each step by the shift register, it is difficult to apply
cryptanalysis.

Output Feedback Mode –


The output feedback mode follows nearly the same process as the Cipher Feedback mode, except that
it sends the output of the encryption algorithm as feedback instead of the actual ciphertext (the
XOR output). In this mode, all bits of the block are sent instead of only the selected s bits. The
Output Feedback mode of a block cipher is highly resistant to bit transmission errors. It also
decreases the dependency of the ciphertext on the plaintext.
Counter Mode –

The Counter Mode or CTR is a simple counter-based block cipher mode. For every block, a counter
value is encrypted and the result is XORed with the plaintext block, which gives the ciphertext
block. The CTR mode does not use feedback and thus can be implemented in parallel. Its simple
implementation is shown below:
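A minimal sketch of counter mode follows. It assumes a keyed SHA-256 hash as a stand-in for "encrypt the counter value"; a real implementation would use a block cipher such as AES there. The names `keystream_block` and `ctr_crypt`, the nonce, and the 16-byte block size are illustrative assumptions.

```python
import hashlib

BLOCK = 16  # assumed block size in bytes

def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Stand-in for encrypting the counter value with a block cipher.
    data = key + nonce + counter.to_bytes(8, "big")
    return hashlib.sha256(data).digest()[:BLOCK]

def ctr_crypt(data: bytes, key: bytes, nonce: bytes) -> bytes:
    # The same function encrypts and decrypts, because XOR is its own inverse.
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        ks = keystream_block(key, nonce, i // BLOCK)  # depends only on the counter
        out.extend(x ^ y for x, y in zip(data[i:i + BLOCK], ks))
    return bytes(out)

key, nonce = b"shared-secret", b"nonce-01"
ciphertext = ctr_crypt(b"counter mode needs no feedback", key, nonce)
print(ctr_crypt(ciphertext, key, nonce))  # b'counter mode needs no feedback'
```

Because each keystream block depends only on the key, the nonce, and the counter, the loop iterations are independent of one another and could be run in parallel.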
15. Explain the internal and external network security issues you can visualize in a
network.
A few internal network security issues are:
a) Malicious cyber-attacks:
The most likely perpetrators of cyber-attacks are system administrators or the other IT staff with
privileged system access. Technically proficient employees can use their system access to open
back doors into computer systems, or leave programs on the network to steal information or wreak
havoc.
The best protection against this sort of attack is to monitor employees closely and be alert for
disgruntled employees who might abuse their positions. In addition, experts advise immediately
cancelling network access and passwords when employees leave the company, so that former
employees cannot use those passwords to access the network remotely in future.

b) Social engineering:
Perhaps one of the most common ways for attackers to gain access to a network is by exploiting
the trusting nature of your employees.

c) Downloading malicious internet content:


Some reports suggest the average employee in a small business spends up to an hour a day surfing
the web for personal use — perhaps looking at video or file-sharing websites, playing games or
using social media websites such as Facebook.
It's not just time that this activity could cost you. Analyst reports show that the number of malware
and virus threats is increasing by more than 50 percent each year, and many of these destructive
payloads can be inadvertently introduced to the network by employees.
The best advice is to constantly update and patch your IT systems to ensure you are protected.

d) Information leakage:
There are now a staggering number of ways that information can be taken from your computer
networks and released outside the organisation. Whether it's an MP3 player, a CD-ROM, a digital
camera or USB data stick, today's employees could easily take a significant chunk of your
customer database out of the door in their back pocket.

e) Illegal activities:
It's important to remember that, as an employer, you are responsible for pretty much anything your
employees do using your computer network — unless you can show you have taken reasonable
steps to prevent this.
To protect yourself, experts advise a two-pronged approach. First, use monitoring software to
check email and internet traffic for certain keywords or file types. You might also choose to block
certain websites and applications completely.
A few external network security issues are:
a) Economic threats:
The economy can be considered an external threat to businesses because, no matter how hard a
company works or how good its products are, economic conditions dictate a business's profit and
success. Economic downturns can decrease the demand for goods or services on the consumer
market. On the other hand, a robust economy will inspire more consumer spending and business
growth. According to the Economic Development Research and Training Centre, studying
economic trends, such as household spending or consumer demand reports, can help companies
track economic patterns in their external environments.

b) Competitors:
Competition is a significant external threat to businesses and is a product of the marketplace. A
competitive market requires knowing who your competitors are. Competition serves as an external
threat because businesses compete with other organizations for the same customers. In turn, this
challenge can cause one company to flourish and the other to flop.

c) Global Environment:
The global environment can be risky for companies that rely on horticulture, agriculture or other
types of natural resources. Weather patterns are examples of global environmental threats that can
impact a company’s resources, projects and profitability. Businesses track and trend weather
patterns and global changes to monitor what types of environmental risks are out there.

d) Political factors:
Political decisions or changes can threaten businesses. Foreign investments, for instance, can be
threatened by political decisions to go to war with other countries. Or government-funded agencies
can have their businesses impacted by budget cuts or budget deficits.

e) New technology:
The technological field, with all of its advancements, can serve as a potential external threat to
businesses. Technological changes can give companies a competitive advantage, leaving others
behind. For instance, travel agencies were exposed to a technological threat when the Internet gave
customers the ability to do their own research and make their own travel plans from their
computers, thereby eliminating the need for travel agencies. Technological changes should be
monitored to determine if there are any direct threats to a business.
16.Explain RSA Public key algorithm with suitable example.

There are several asymmetric-key cryptosystems. One of the common public-key algorithms is the
RSA cryptosystem, named for its inventors (Rivest, Shamir, and Adleman). RSA uses two
exponents, e and d, where e is public and d is private. Suppose P is the plaintext and C is the
ciphertext. Alice uses C = P^e mod n to create ciphertext C from plaintext P; Bob uses P = C^d mod
n to retrieve the plaintext sent by Alice. The modulus n, a very large number, is created during the
key generation process.

Procedure

Bob chooses two large prime numbers, p and q, and calculates n = p × q and φ = (p − 1) × (q − 1).
Bob then selects e and d such that (e × d) mod φ = 1. Bob advertises e and n to the community as
the public key; Bob keeps d as the private key. Anyone, including Alice, can encrypt a message and
send the ciphertext to Bob, using C = (P^e) mod n; only Bob can decrypt the message, using P =
(C^d) mod n. An intruder such as Eve cannot decrypt the message if p and q are very large numbers
(she does not know d).
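The procedure can be traced with deliberately tiny numbers. The values p = 7, q = 11, e = 13 and P = 5 below are assumptions chosen only to keep the arithmetic readable; real RSA keys use primes hundreds of digits long, and the three-argument `pow` with a negative exponent needs Python 3.8 or later.

```python
from math import gcd

# Key generation (Bob)
p, q = 7, 11                  # toy primes, far too small for real use
n = p * q                     # modulus, part of the public key
phi = (p - 1) * (q - 1)
e = 13                        # public exponent, must be coprime with phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)           # private exponent: (e * d) mod phi == 1

def rsa_encrypt(P: int, e: int, n: int) -> int:
    return pow(P, e, n)       # C = P^e mod n

def rsa_decrypt(C: int, d: int, n: int) -> int:
    return pow(C, d, n)       # P = C^d mod n

P = 5                         # plaintext, must be smaller than n
C = rsa_encrypt(P, e, n)
print(n, phi, d)              # 77 60 37
print(C)                      # 26
print(rsa_decrypt(C, d, n))   # 5
```

Alice needs only (e, n) = (13, 77) to compute C = 26; recovering P requires d = 37, which Eve could obtain here only because n = 77 is trivial to factor.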
Applications

Although RSA can be used to encrypt and decrypt actual messages, it is very slow if the message
is long. RSA, therefore, is useful for short messages. In particular, we will see that RSA is used in
digital signatures and other cryptosystems that often need to encrypt a small message without
having access to a symmetric key. RSA is also used for authentication.

17. Write a short note on DES with clear figures. Also list its limitations.

The Data Encryption Standard (DES) is a symmetric-key block cipher published by the National
Institute of Standards and Technology (NIST).
DES is an implementation of a Feistel Cipher. It uses a 16-round Feistel structure. The block size
is 64 bits. Although the key length is 64 bits, DES has an effective key length of 56 bits, since
8 of the 64 bits of the key are not used by the encryption algorithm (they function as check bits
only). The general structure of DES is depicted in the following illustration.
Since DES is based on the Feistel Cipher, all that is required to specify DES is: -

1. Round function
2. Key schedule
3. Any additional processing − Initial and final permutation
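The Feistel structure itself can be sketched independently of the DES details. In the sketch below, `round_function` and `make_round_keys` are hypothetical stand-ins built on SHA-256; they are not the DES function f or the DES key schedule, and the initial and final permutations are left out.

```python
import hashlib

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def round_function(right: bytes, round_key: bytes) -> bytes:
    # Hypothetical round function; DES uses expansion, S-boxes and a P-box here.
    return hashlib.sha256(right + round_key).digest()[:len(right)]

def make_round_keys(key: bytes, rounds: int = 16) -> list:
    # Hypothetical key schedule producing 48-bit (6-byte) round keys.
    return [hashlib.sha256(key + bytes([i])).digest()[:6] for i in range(rounds)]

def feistel(block: bytes, round_keys: list) -> bytes:
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for k in round_keys:
        # Each round: new left = old right, new right = old left XOR f(old right, k)
        left, right = right, xor(left, round_function(right, k))
    return right + left  # final swap of the two halves

keys = make_round_keys(b"8bytekey")
plain = b"DESblock"                      # one 64-bit block
cipher = feistel(plain, keys)
recovered = feistel(cipher, keys[::-1])  # same routine, round keys reversed
print(recovered == plain)                # True
```

Decryption reuses the encryption routine with the round keys in reverse order, so the round function never has to be invertible. That is why specifying the round function and the key schedule is enough to specify a Feistel cipher.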
Initial and Final Permutation
The initial and final permutations are straight permutation boxes (P-boxes) that are inverses of
each other. They have no cryptographic significance in DES. The initial and final permutations are
shown as follows: -

Round Function
The heart of this cipher is the DES function, f. The DES function applies a 48-bit key to the
rightmost 32 bits to produce a 32-bit output.

Expansion Permutation Box − Since the right input is 32 bits and the round key is 48 bits, we
first need to expand the right input to 48 bits. The permutation logic is graphically depicted in
the following illustration: -
The graphically depicted permutation logic is generally described as a table in the DES
specification, illustrated as shown: -
XOR (Whitener) − After the expansion permutation, DES performs an XOR operation on the expanded
right section and the round key. The round key is used only in this operation.
Substitution Boxes − The S-boxes carry out the real mixing (confusion). DES uses 8 S-boxes, each
with a 6-bit input and a 4-bit output. Refer to the following illustration: -

The S-box rule is illustrated below: -

There are a total of eight S-box tables. The outputs of all eight S-boxes are then combined into a
32-bit section.

Straight Permutation − The 32-bit output of the S-boxes is then subjected to the straight
permutation with the rule shown in the following illustration:
Key Generation
The round-key generator creates sixteen 48-bit keys out of a 56-bit cipher key.
The process of key generation is depicted in the following illustration: -

The logic for Parity drop, shifting, and Compression P-box is given in the DES description.
DES Analysis
DES satisfies both of the desired properties of a block cipher. These two properties make the
cipher very strong.
• Avalanche effect − A small change in the plaintext results in a very great change in the
ciphertext.
• Completeness − Each bit of ciphertext depends on many bits of plaintext.
During the last few years, cryptanalysts have found some weaknesses in DES when the keys selected
are weak keys. These keys should be avoided.
DES has proved to be a very well-designed block cipher. There have been no significant
cryptanalytic attacks on DES other than exhaustive key search.
Disadvantages:
1. Experts have found a weakness in the design of the cipher.
2. An S-box can create the same output for two chosen inputs.
3. The purpose of the initial and final permutations is not clear.

18. Classify traditional ciphers. Explain the Transposition Cipher with an example.

The two types of traditional symmetric ciphers are Substitution Cipher and Transposition
Cipher. The following flowchart categorizes the traditional ciphers:

1. Substitution Cipher:
Substitution ciphers are further divided into mono-alphabetic ciphers and poly-alphabetic
ciphers.
First, let’s study the mono-alphabetic cipher.
1. Mono-alphabetic Cipher–
In mono-alphabetic ciphers, each symbol in the plain-text (e.g., ‘o’ in ‘follow’) is mapped to one
cipher-text symbol. No matter how many times a symbol occurs in the plain-text, it will
correspond to the same cipher-text symbol. For example, if the plain-text is ‘follow’ and the
mapping is :
f -> g o -> p l -> m w -> x
The cipher-text is ‘gpmmpx’. Types of mono-alphabetic ciphers are:
(a) Additive Cipher (Shift Cipher / Caesar Cipher) –
The simplest mono-alphabetic cipher is the additive cipher. It is also referred to as the ‘Shift
Cipher’ or ‘Caesar Cipher’. As the name suggests, an ‘addition modulo n’ operation is
performed on the plain-text to obtain the cipher-text.
C = (M + k) mod n
M = (C - k) mod n
where,
C -> cipher-text
M -> message/plain-text
k -> key
n -> size of the alphabet (26 for English letters)
The key space is 26. Thus, it is not very secure. It can be broken by a brute-force attack.
(b) Multiplicative Cipher –
The multiplicative cipher is similar to the additive cipher, except that the key is
multiplied with the plain-text symbol during encryption. Likewise, the cipher-text is
multiplied by the multiplicative inverse of the key for decryption to get back the plain-text.
C = (M * k) mod n
M = (C * k^-1) mod n
where,
k^-1 -> multiplicative inverse of k (key) modulo n
The key space of the multiplicative cipher is 12. Thus, it is also not very secure.
(c) Affine Cipher –
The affine cipher is a combination of the additive cipher and the multiplicative cipher. The key
space is 26 * 12 (key space of additive * key space of multiplicative), i.e. 312. It is
relatively more secure than the above two as the key space is larger.
Here two keys k1 and k2 are used.
C = [(M * k1) + k2] mod n
M = [(C - k2) * k1^-1] mod n
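The three mono-alphabetic ciphers can be tried out with one small sketch, since the additive cipher is the affine cipher with k1 = 1 and the multiplicative cipher is the affine cipher with k2 = 0. The function names are illustrative, and the sketch assumes the text contains only lowercase letters a to z.

```python
from math import gcd

N = 26  # size of the alphabet

def affine_encrypt(text: str, k1: int, k2: int) -> str:
    if gcd(k1, N) != 1:
        raise ValueError("k1 needs a multiplicative inverse modulo 26")
    # C = [(M * k1) + k2] mod n, applied letter by letter
    return "".join(chr(((ord(ch) - 97) * k1 + k2) % N + 97) for ch in text)

def affine_decrypt(text: str, k1: int, k2: int) -> str:
    k1_inv = pow(k1, -1, N)  # multiplicative inverse of k1 (Python 3.8+)
    # M = [(C - k2) * k1^-1] mod n
    return "".join(chr(((ord(ch) - 97 - k2) * k1_inv) % N + 97) for ch in text)

print(affine_encrypt("follow", 1, 1))   # gpmmpx (additive cipher, k = 1)
print(affine_encrypt("hello", 7, 2))    # zebbw
print(affine_decrypt("zebbw", 7, 2))    # hello

# Keys usable for multiplication are those coprime with 26
print(sum(1 for k in range(N) if gcd(k, N) == 1))  # 12
```

The last line counts the usable multiplicative keys, which is where the key space of 12 comes from.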
Now, let’s study about poly-alphabetic cipher.
2. Poly-alphabetic Cipher –
In poly-alphabetic ciphers, every symbol in plain-text is mapped to a different cipher-text
symbol regardless of its occurrence. Every different occurrence of a symbol has different
mapping to a cipher-text. For example, in the plain-text ‘follow’, the mapping is :
f -> q o -> w l -> e l -> r o -> t w -> y
Thus, the cipher text is ‘qwerty’.
Types of poly-alphabetic ciphers are:

2. Transposition Cipher:
The transposition cipher does not deal with substitution of one symbol with another. It
focuses on changing the position of the symbol in the plain-text. A symbol in the first
position in plain-text may occur in fifth position in cipher-text.
Two of the transposition ciphers are:
Transposition Cipher with example:
• It is another type of cipher where the order of the letters in the plaintext is rearranged to
create the ciphertext. The actual plaintext letters are not replaced.
• An example is a ‘simple columnar transposition’ cipher where the plaintext is written
horizontally with a certain width in letters. Then the ciphertext is read vertically as shown.
• For example, the plaintext is “golden statue is in eleventh cave” and the secret key
chosen is “five”. We arrange this text horizontally in a table with the number of columns equal
to the key value (5). The resulting text is shown below.

• The ciphertext is obtained by reading each column vertically downward, from the first to the
last column. The ciphertext is ‘gnuneaoseenvltiltedasehetivc’.
• To decrypt, the receiver prepares a similar table. The number of columns is equal to the key
number. The number of rows is obtained by dividing the total number of ciphertext
letters by the key value and rounding the quotient up to the next integer value.
• The receiver then writes the received ciphertext vertically downward, column by column from
left to right. To obtain the text, he reads horizontally left to right, from the top row to the
bottom row.
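The steps above can be sketched as follows. The function names are illustrative, and the sketch assumes that spaces are dropped before encryption and that the last row is left incomplete rather than padded.

```python
import math

def columnar_encrypt(plaintext: str, columns: int) -> str:
    letters = [ch for ch in plaintext.lower() if ch.isalpha()]
    # Writing row by row and reading column c downward is the same as
    # taking every `columns`-th letter starting at position c.
    return "".join("".join(letters[c::columns]) for c in range(columns))

def columnar_decrypt(ciphertext: str, columns: int) -> str:
    rows = math.ceil(len(ciphertext) / columns)
    # Columns that reach the last row; the remaining ones are one letter shorter.
    full_cols = len(ciphertext) % columns or columns
    table, pos = [], 0
    for c in range(columns):
        height = rows if c < full_cols else rows - 1
        table.append(ciphertext[pos:pos + height])  # fill one column downward
        pos += height
    # Read the table horizontally, row by row.
    return "".join(table[c][r] for r in range(rows)
                   for c in range(columns) if r < len(table[c]))

cipher = columnar_encrypt("golden statue is in eleventh cave", 5)
print(cipher)                       # gnuneaoseenvltiltedasehetivc
print(columnar_decrypt(cipher, 5))  # goldenstatueisineleventhcave
```

With 28 letters and 5 columns the table has 6 rows, and only the first 3 columns reach the last row, which is why the receiver must work out the column heights before filling the table.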

19.Explain AES architecture with clear figures.

The most popular and widely adopted symmetric encryption algorithm likely to be encountered
nowadays is the Advanced Encryption Standard (AES). It is found to be at least six times faster
than Triple DES.
A replacement for DES was needed as its key size was too small. With increasing computing
power, it was considered vulnerable to an exhaustive key search attack. Triple DES was
designed to overcome this drawback, but it was found to be slow.
The features of AES are as follows −

• Symmetric key symmetric block cipher
• 128-bit data, 128/192/256-bit keys
• Stronger and faster than Triple-DES
• Provides full specification and design details
• Software implementable in C and Java

Operation of AES
AES is an iterative cipher rather than a Feistel cipher. It is based on a ‘substitution–permutation
network’. It comprises a series of linked operations, some of which involve replacing inputs by
specific outputs (substitutions) and others involve shuffling bits around (permutations).
Interestingly, AES performs all its computations on bytes rather than bits. Hence, AES treats the
128 bits of a plaintext block as 16 bytes. These 16 bytes are arranged in four columns and four
rows for processing as a matrix −
Unlike DES, the number of rounds in AES is variable and depends on the length of the key. AES
uses 10 rounds for 128-bit keys, 12 rounds for 192-bit keys and 14 rounds for 256-bit keys. Each
of these rounds uses a different 128-bit round key, which is calculated from the original AES key.
The schematic of AES structure is given in the following illustration −
Encryption Process
Here, we restrict ourselves to the description of a typical round of AES encryption. Each round
comprises four sub-processes. The first round process is depicted below −

Byte Substitution (SubBytes)


The 16 input bytes are substituted by looking up a fixed table (S-box) given in the design. The
result is a matrix of four rows and four columns.
Shift rows
Each of the four rows of the matrix is shifted to the left. Any entries that ‘fall off’ are
re-inserted on the right side of the row. The shift is carried out as follows −
• First row is not shifted.
• Second row is shifted one (byte) position to the left.
• Third row is shifted two positions to the left.
• Fourth row is shifted three positions to the left.
• The result is a new matrix consisting of the same 16 bytes but shifted with respect to each
other.
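The row shifts listed above amount to rotating row r to the left by r positions. A minimal sketch, assuming the state is already given as a list of four rows of four byte values (how the 16 input bytes are laid out into that matrix is not covered here):

```python
def shift_rows(state: list) -> list:
    # Row r is rotated left by r positions: entries that fall off on the
    # left are re-inserted on the right.
    return [row[r:] + row[:r] for r, row in enumerate(state)]

def inv_shift_rows(state: list) -> list:
    # Used during decryption: row r is rotated right by r positions.
    return [row[(4 - r) % 4:] + row[:(4 - r) % 4] for r, row in enumerate(state)]

state = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
for row in shift_rows(state):
    print(row)
# [0, 1, 2, 3]
# [5, 6, 7, 4]
# [10, 11, 8, 9]
# [15, 12, 13, 14]
```

The output holds the same 16 values as the input, only in different positions within each row.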
Mix Columns
Each column of four bytes is now transformed using a special mathematical function. This
function takes as input the four bytes of one column and outputs four completely new bytes,
which replace the original column. The result is another new matrix consisting of 16 new bytes. It
should be noted that this step is not performed in the last round.
Add round key
The 16 bytes of the matrix are now considered as 128 bits and are XORed to the 128 bits of the
round key. If this is the last round then the output is the ciphertext. Otherwise, the resulting 128
bits are interpreted as 16 bytes and we begin another similar round.
Decryption Process
The process of decryption of an AES ciphertext is similar to the encryption process in the reverse
order. Each round consists of the four processes conducted in the reverse order −

• Add round key


• Mix columns
• Shift rows
• Byte substitution
Since the sub-processes in each round are carried out in reverse, unlike for a Feistel Cipher, the
encryption and decryption algorithms need to be implemented separately, although they are very
closely related.
AES Analysis
In present-day cryptography, AES is widely adopted and supported in both hardware and software.
To date, no practical cryptanalytic attacks against AES have been discovered. Additionally, AES
has built-in flexibility of key length, which allows a degree of ‘future-proofing’ against progress
in the ability to perform exhaustive key searches.
However, just as for DES, the AES security is assured only if it is correctly implemented and
good key management is employed.

20.Illustrate about symmetric key cryptography.


Confidentiality can be achieved using ciphers. Ciphers can be divided into two broad categories:
symmetric key and asymmetric-key.

Symmetric-Key Ciphers

A symmetric-key cipher uses the same key for both encryption and decryption, and the key can be
used for bidirectional communication, which is why it is called symmetric.

The general idea behind a symmetric-key cipher is shown in the figure given below.
Symmetric-key encipherment uses a single key (the key itself may be a set of values) for both
encryption and decryption. In addition, the encryption and decryption algorithms are inverses of
each other. If P is the plaintext, C is the ciphertext, and K is the key, the encryption algorithm
Ek(x) creates the ciphertext from the plaintext; the decryption algorithm Dk(x) creates the
plaintext from the ciphertext. We assume that Ek(x) and Dk(x) are inverses of each other: they
cancel the effect of each other if they are applied one after the other on the same input. We have

C = Ek(P) and P = Dk(C), in which Dk(Ek(x)) = Ek(Dk(x)) = x.

We need to emphasize that it is better to make the encryption and decryption algorithms public
but keep the shared key secret.

This means that Alice and Bob need another channel, a secured one, to exchange the secret key.
Alice and Bob can meet once and exchange the key personally. The secured channel here is the
face-to-face exchange of the key. They can also trust a third party to give them the same key. They
can create a temporary secret key using another kind of cipher, asymmetric-key ciphers, which
will be described later. Encryption can be thought of as locking the message in a box; decryption
can be thought of as unlocking the box. In symmetric-key encipherment, the same key locks and
unlocks, as shown in Figure 31.3. Later sections show that the asymmetric-key encipherment
needs two keys, one for locking and one for unlocking.

The symmetric-key ciphers can be divided into traditional ciphers and modern ciphers. Traditional
ciphers are simple, character-oriented ciphers that are not secure based on today’s standard.
Modern ciphers, on the other hand, are complex, bit oriented ciphers that are more secure. We
briefly discuss the traditional ciphers to pave the way for discussing more complex modern
ciphers.
Traditional Symmetric-Key Ciphers
Traditional ciphers belong to the past. However, we briefly
discuss them here because they can be thought of as the components of the modern ciphers. To be
more exact, we can divide traditional ciphers into substitution ciphers and transposition ciphers.

21.Explain poly alphabetic substitution with a suitable example.

In a polyalphabetic cipher, each occurrence of a character may have a different substitute. The
relationship of a character in the plaintext to a character in the ciphertext is one-to-many. For
example, “a” could be enciphered as “D” at the beginning of the text, but as “N” in the middle.
Polyalphabetic ciphers have the advantage of hiding the letter frequency of the underlying
language. Even single-letter frequency statistics cannot be used to break the ciphertext. To create a
polyalphabetic cipher, we need to make each ciphertext character dependent on both the
corresponding plaintext character and the position of the plaintext character in the message. This
implies that our key should be a stream of subkeys, in which each subkey depends somehow on
the position of the plaintext character that uses that subkey for encipherment. In other words, we
need to have a key stream k = (k1, k2, k3, …) in which ki is used to encipher the ith character in
the plaintext to create the ith character in the ciphertext.
To see the position dependency of the key, let us discuss a simple polyalphabetic cipher called
the autokey cipher. In this cipher, the key is a stream of subkeys, in which each subkey is used
to encrypt the corresponding character in the plaintext. The first subkey is a predetermined value
secretly agreed upon by Alice and Bob. The second subkey is the value of the first plaintext
character (between 0 and 25). The third subkey is the value of the second plaintext character, and
so on. The name of the cipher, autokey, implies that the subkeys are automatically created from
the plaintext characters during the encryption process.

Example: Assume that Alice and Bob agreed to use an autokey cipher with initial key value k1 =
12. Now Alice wants to send Bob the message “Attack is today”. Enciphering is done character
by character. Each character in the plaintext is first replaced by its integer value. The first subkey
is added to create the first ciphertext character. The rest of the key is created as the plaintext
characters are read. Note that the cipher is polyalphabetic because the three occurrences of “a” in
the plaintext are encrypted differently. The three occurrences of “t” are also encrypted differently.
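The example can be worked through with a short sketch. The function names are illustrative; the sketch assumes that spaces are dropped, that plaintext letters take the values a = 0 to z = 25, and that the ciphertext is written in uppercase.

```python
def autokey_encrypt(plaintext: str, k1: int) -> str:
    values = [ord(ch) - ord("a") for ch in plaintext.lower() if ch.isalpha()]
    # Key stream: the agreed first subkey, then the plaintext values themselves.
    keys = [k1] + values[:-1]
    return "".join(chr((p + k) % 26 + ord("A")) for p, k in zip(values, keys))

def autokey_decrypt(ciphertext: str, k1: int) -> str:
    key, out = k1, []
    for ch in ciphertext:
        p = (ord(ch) - ord("A") - key) % 26
        out.append(chr(p + ord("a")))
        key = p  # the recovered plaintext value becomes the next subkey
    return "".join(out)

cipher = autokey_encrypt("Attack is today", 12)
print(cipher)                       # MTMTCMSALHRDY
print(autokey_decrypt(cipher, 12))  # attackistoday
```

In the output the three occurrences of “a” become M, T and D, and the three occurrences of “t” become T, M and L, which is the polyalphabetic behaviour described above.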
22. Explain monoalphabetic substitution with suitable examples

A substitution cipher replaces one symbol with another. If the symbols in the plaintext are
alphabetic characters, we replace one character with another. For example, we can replace letter A
with letter D and letter T with letter Z. If the symbols are digits (0 to 9), we can replace 3 with 7
and 2 with 6. Substitution ciphers can be categorized as either monoalphabetic ciphers or
polyalphabetic ciphers.

Monoalphabetic Ciphers
In a monoalphabetic cipher, a character (or a symbol) in the plaintext is
always changed to the same character (or symbol) in the ciphertext regardless of its position in the
text. For example, if the algorithm says that letter A in the plaintext is changed to letter D, every
letter A is changed to letter D. In other words, the relationship between letters in the plaintext and
the ciphertext is one-to-one. The simplest monoalphabetic cipher is the additive cipher (or shift
cipher). Assume that the plaintext consists of lowercase letters (a to z), and that the ciphertext
consists of uppercase letters (A to Z). To be able to apply mathematical operations on the plaintext
and ciphertext, we assign numerical values to each letter (lowercase or uppercase), as shown in
the figure below.

23. Distinguish between Centralised and Decentralised Peer-to-Peer Networks:


P2P Networks: Internet users that are ready to share their resources become peers and form a
network. When a peer in the network has a file (for example, an audio or video file) to share,
it makes it available to the rest of the peers. An interested peer can connect itself to the
computer where the file is stored and download it. After a peer downloads a file, it can make
it available for other peers to download. As more peers join and download that file, more
copies of the file become available to the group. Since lists of peers may grow and shrink, the
question is how the paradigm keeps track of loyal peers and the location of the files. To
answer this question, we first need to divide the P2P networks into two categories: centralized
and decentralized.
Centralized Networks: In a centralized P2P network, the directory system listing of the peers
and what they offer uses the client-server paradigm, but the storing and downloading of the
files are done using the peer-to-peer paradigm. For this reason, a centralized P2P network is
sometimes referred to as a hybrid P2P network. Napster was an example of a centralized P2P.
In this type of network, a peer first registers itself with a central server. The peer then
provides its IP address and a list of files it has to share. To avoid system collapse, several
servers are used.
A peer, looking for a particular file, sends a query to a central server. The server searches
its directory and responds with the IP addresses of nodes that have a copy of the file. The
peer contacts one of the nodes and downloads the file. The directory is constantly
updated as nodes join or leave the network. Centralized networks make the maintenance of
the directory simple but have several drawbacks. Accessing the directory can generate
huge traffic and slow down the system. The central servers are vulnerable to attack, and if
all of them fail, the whole system goes down.
Decentralized Network: A decentralized P2P network does not depend on a centralized
directory system. In this model, peers arrange themselves into an overlay network, which
is a logical network made on top of the physical network. Depending on how the nodes in
the overlay network are linked, a decentralized P2P network is classified as either
unstructured or structured.
Unstructured Networks: In an unstructured P2P network, the nodes are linked randomly.
A search in an unstructured P2P is not very efficient because a query to find a file must
be flooded through the network, which produces significant traffic and still the query may
not be resolved. Two examples of this type of network are Gnutella and Freenet.
Gnutella: The Gnutella network is an example of a peer-to-peer network that is decentralized but
unstructured. It is unstructured in the sense that the directory is randomly distributed
between nodes. When node A wants to access an object (such as a file), it contacts one of
its neighbors. A neighbor, in this case, is any node whose address is known to node A.
Node A sends a query message to the neighbor, node W. The query includes the identity
of the object (for example, file name). If node W knows the address of node X, which has
the object, it sends a response message that includes the address of node X. Node A now
can use the commands defined in a transfer protocol such as HTTP to get a copy of the
object from node X. If node W does not know the address of node X, it floods the request
from A to all its neighbors. Eventually one of the nodes in the network responds to the
query message, and node A can get access to node X, but node A needs to know the
address of at least one neighbor. This is done at the bootstrap time, when the node installs
the Gnutella software for the first time. The software includes a list of nodes (peers) that
node A can record as neighbors. Node A can later use the two messages, called ping and
pong, to investigate whether or not a neighbor is still alive. One of the problems with the
Gnutella network is the lack of scalability because of flooding. When the number of
nodes increases, flooding becomes problematic. To make the query more efficient, the
new version of Gnutella implemented a tiered system of ultra nodes and leaves. A node
entering into the network is a leaf, not responsible for routing; nodes which are capable of
routing are promoted to ultra nodes. This allows queries to propagate further and
improves efficiency and scalability. Gnutella adopted a number of other techniques such
as adding Query Routing Protocol (QRP) and Dynamic Querying (DQ) to reduce traffic
overhead and make searches more efficient.
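The flooding search can be sketched as a breadth-first walk over the overlay. This is a simplification, not the Gnutella protocol itself: the overlay, the node names, and the file name are hypothetical, each node is assumed to know only which objects it holds itself, and the hop limit `ttl` is an added parameter that bounds how far a query spreads.

```python
from collections import deque

def flood_query(neighbors: dict, holders: dict, start: str, obj: str, ttl: int = 4):
    """Return the first node reached that holds obj, or None if none is found."""
    visited = {start}
    queue = deque([(start, ttl)])
    while queue:
        node, hops_left = queue.popleft()
        if obj in holders.get(node, ()):
            return node                 # this node answers the query
        if hops_left == 0:
            continue                    # the query is not forwarded any further
        for nxt in neighbors.get(node, ()):
            if nxt not in visited:      # do not send the same query twice
                visited.add(nxt)
                queue.append((nxt, hops_left - 1))
    return None

# Hypothetical overlay: A knows only W; X holds the object.
neighbors = {"A": ["W"], "W": ["A", "Y", "Z"], "Y": ["W", "X"], "Z": ["W"], "X": ["Y"]}
holders = {"X": {"song.mp3"}}

print(flood_query(neighbors, holders, "A", "song.mp3"))         # X
print(flood_query(neighbors, holders, "A", "song.mp3", ttl=2))  # None
```

Every node within the hop limit receives the query, which is why traffic grows quickly with the number of nodes, and the second call shows that a query may still go unresolved.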
Structured Networks: A structured network uses a predefined set of rules to link nodes so
that a query can be effectively and efficiently resolved. The most common technique used
for this purpose is the Distributed Hash Table (DHT). DHT is used in many applications
including Distributed Data Structure (DDS), Content Distributed Systems (CDS),
Domain Name System (DNS), and P2P file sharing. One popular P2P file sharing
protocol that uses the DHT is BitTorrent.

24.Explain DHT in detail


Distributed Hash Table (DHT): A Distributed Hash Table (DHT) distributes data (or
references to data) among a set of nodes according to some predefined rules. Each peer in
a DHT-based network becomes responsible for a range of data items. To avoid the
flooding overhead that we discussed for unstructured P2P networks, DHT-based
networks allow each peer to have a partial knowledge about the whole network. This
knowledge can be used to route the queries about the data items to the responsible nodes
using effective and scalable procedures.
Address Space: In a DHT-based network, each data item and the responsible peer are
mapped to a point in a large address space of size 2^m. The address space is designed using
modular arithmetic, which means that we can think of points in the address space as
distributed evenly on a circle with 2^m points (0 to 2^m − 1) using clockwise direction as
shown in Figure 29.2.

Most of the DHT implementations use m = 160.

Hashing Peer Identifier: The first step in creating the DHT system is to place all peers on
the address space ring. This is normally done by using a hash function that hashes the
peer identifier, normally its IP address, to an m-bit integer, called a node ID. A hash
function is a mathematical function that creates an output from an input. However, DHT
uses some of the cryptographic hash functions such as Secure Hash Algorithm (SHA)
that are collision resistant, which means that the probability of two inputs being mapped
to the same output is very low.
Hashing Object Identifier: The name of the object (for example, a file) to be shared is
also hashed to an m-bit integer in the same address space. The result in DHT parlance is
called a key. In the DHT an object is normally related to the pair (key, value) in which
the key is the hash of the object name and the value is the object or a reference to the
object.
Storing the Object: There are two strategies for storing the object: the direct
method and the indirect method. In the direct method, the object is stored in the node
whose ID is somehow closest to the key in the ring. The term closest is defined
differently in each protocol. This involves the object’s most likely being transported from
the computer that originally owned it. However, most DHT systems use the indirect
method due to efficiency. The peer that owns the object keeps the object, but a reference
to the object is created and stored in the node whose ID is closest to the key point. In
other words, the physical object and the reference to the object are stored in two different
locations. In the direct strategy, we create a relationship between the node ID that stores
the object and the key of the object; in the indirect strategy, we create a relationship
between the reference (pointer) to the object and the node that stores that reference. In
either case, the relationship is needed to find the object if the name of the object is given.
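
The difference between the two strategies can be sketched as follows. This is only an illustration: the Node class, the function names, and the sample key, address and port are hypothetical, and the responsible node is assumed to have been found already.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    table: dict = field(default_factory=dict)   # key -> value held by this node


def store_direct(responsible: Node, key: int, obj: bytes) -> None:
    """Direct method: the object itself moves to the responsible node."""
    responsible.table[key] = obj


def store_indirect(responsible: Node, key: int,
                   owner_ip: str, owner_port: int) -> None:
    """Indirect method: only a reference (pointer) to the owner is stored."""
    responsible.table[key] = (owner_ip, owner_port)


# Hypothetical values, used only to show the shape of the stored data.
direct_node = Node(node_id=9)
store_direct(direct_node, key=7, obj=b"file contents")

indirect_node = Node(node_id=9)
store_indirect(indirect_node, key=7, owner_ip="192.0.2.7", owner_port=4000)

print(direct_node.table)     # the object lives on the responsible node
print(indirect_node.table)   # only the owner's address lives there
```

In both cases the key leads to the same responsible node; what differs is whether that node holds the object or only a pointer to the peer that still owns it.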

Example: Although the normal value of m is 160, for the purpose of demonstration, we
use m = 5 to make our examples tractable. In Figure 29.3, we assume that several peers
have already joined the group. The node N5 with IP address 110.34.56.20 has a file
named Liberty that it wants to share with its peers. The node makes a hash of the file
name, “Liberty,” to get the key = 14. Since the closest node to key 14 is node N17, N5
creates a reference to the file name (key), its IP address, and the port number (and
possibly some other information about the file) and sends this reference to be stored in
node N17. In other words, the file is stored in N5, the key of the file is k14 (a point in the
DHT ring), but the reference to the file is stored in node N17. We will see later how other
nodes can first find N17, extract the reference, and then use the reference to access the
file Liberty. Our example shows only one key on the ring, but in an actual situation there
are millions of keys and nodes in the ring.

[Figure 29.3: Example 29.1. ID space of size 2^5 (m = 5) with nodes N2, N5, N10, N17,
N20, N25 and N29; 5 = hash(110.34.56.20); 14 = hash(“Liberty”); the reference stored
for key k14 is (110.34.56.20, 5200).]

Routing: DHT’s main function is to route a query to the node responsible for storing the
reference to an object. Each DHT implementation uses a different strategy for routing,
but all follow the idea that each node needs to have a partial knowledge about the ring to
route a query to a node that is closest to the responsible node.

Arrival and Departure of Nodes: In a P2P network, each peer can be a desktop or a laptop
computer, which can be turned on or off. When a computer peer launches the DHT
software, it joins the network; when the computer is turned off or the peer closes the
software, it leaves the network. A DHT implementation needs to have a clear and efficient
strategy to handle arrival or departure of the nodes and the effect of this on the rest of
the peers. Most DHT implementations treat the failure of a node as a departure.
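
The Liberty example can be reproduced with a short sketch. It assumes that "closest" means the first node at or after the key in the clockwise direction, which is one common definition and is consistent with key 14 mapping to N17. It also looks up the responsible node from a complete node list, whereas a real DHT routes hop by hop with partial knowledge. The names responsible_node, publish and lookup are hypothetical.

```python
from bisect import bisect_left

M = 5
RING_SIZE = 2 ** M
NODE_IDS = sorted([2, 5, 10, 17, 20, 25, 29])   # nodes shown in Figure 29.3

# One reference table per node: node ID -> {key: (owner IP, owner port)}
references = {node: {} for node in NODE_IDS}


def responsible_node(key: int) -> int:
    """Assumed rule: first node ID >= key, wrapping around the ring."""
    index = bisect_left(NODE_IDS, key % RING_SIZE)
    return NODE_IDS[index % len(NODE_IDS)]


def publish(key: int, owner_ip: str, owner_port: int) -> int:
    """Indirect method: store a reference at the responsible node."""
    node = responsible_node(key)
    references[node][key] = (owner_ip, owner_port)
    return node


def lookup(key: int):
    """Return the stored reference for key, or None if nothing was published."""
    return references[responsible_node(key)].get(key)


# N5 (110.34.56.20, port 5200) shares "Liberty", whose key is 14 in the example.
print(publish(14, "110.34.56.20", 5200))   # 17: the reference goes to N17
print(lookup(14))                          # ('110.34.56.20', 5200)
print(responsible_node(30))                # 2: wraps around past N29
```

A peer that wants the file hashes the name to get key 14, reaches N17, reads the reference, and then contacts N5 directly to fetch the file.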
