
Hypertext Transfer Protocol (HTTP)

• The World Wide Web (WWW), also called the Web, is an information space where documents and other web resources are identified by Uniform Resource Locators (URLs), interlinked by hypertext links, and accessible via the Internet.
• The original goal of the Web was to organize and retrieve information, drawing on ideas about hypertext (interlinked documents).
• The Web can be viewed as a set of cooperating clients and servers, all of whom speak the same language: HTTP.
ARCHITECTURE
• The WWW today is a distributed client-server service, in which a client using a browser can access a service using a server.
• However, the service provided is distributed over many locations called sites.
• Each site holds one or more documents, referred to as Web pages.
• Each Web page can contain links to other Web pages in the same or other sites.
• A Web page can be simple or composite. A simple Web page has no links to other Web pages; a composite Web page has one or more links to other Web pages.
• Each Web page is a file with a name and address.
WEBSITE
• A website is a collection of related web pages, including multimedia content, typically identified with a common domain name and published on at least one web server.
• Examples are wikipedia.org, google.com, and amazon.com.


WEB PAGES
Hypertext
• The core idea of hypertext is that one document can link to another document; the protocol (HTTP) and document language (HTML) were designed to meet that goal.
• A system of interlinked documents is known as hypertext.
Web Client (Browser)

• Most people are exposed to the Web through a graphical client program or web browser like Safari, Chrome, Firefox, or Internet Explorer.
• A web browser has a function that allows the user to obtain an object by opening a URL.
Web Server

• The Web page is stored at the server. Each time a client request
arrives, the corresponding document is sent to the client.
• To improve efficiency, servers normally store requested files in a
cache in memory; memory is faster to access than disk.
• Some popular Web servers include Apache and Microsoft Internet
Information Server.
Uniform Resource Locators (URLs)

• Uniform Resource Locators (URLs) provide information that allows objects on the Web to be located.
• Eg: http://www.cs.princeton.edu/index.html
• Every Web page has a URL: an address that browsers, and you, can use to find it.
• Example URLs, split into their parts:
http :// airinfo.travel : 80 / index.phtml
https :// www.firstpost.com /india/anna-university-results-2018-april-may-re-evaluation-results-released-on-aucoe-annauniv-edu-5085411.html
• The first item in a URL, the letters that appear before the colon, is the scheme, which describes the way a browser can get to the resource.
• Following the colon are two slashes (always forward slashes, never backslashes) and the name of the host computer on which the resource lives; in this case, airline.travel.
• Then comes another slash and a path, which gives the name of the resource on that host; in this case, a file named index.phtml.
• Web URLs allow a few other optional parts. They can include a port number, which specifies which of several programs running on that host should handle the request. The port number goes after a colon after the host name, eg:
• http://airline.travel:80/index.phtml
• The standard http port number is 80.
• Finally, a Web URL can have a query part at the end, following a question mark, eg:
• http://airline.travel:80/index.phtml?chickens
• When a URL has a query part, it tells the host computer more specifically what you want the page to display.
• When you type a URL into your Web browser, you can leave out the http:// part because the browser adds it for you.
• Another useful URL scheme is mailto.
• A mailto URL looks like this:
• mailto:internet12@gurus.org
• That is, a mailto link is an e-mail address.
• Clicking a mailto URL runs your e-mail program and creates a new message addressed to the address in the link.
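The URL parts described above can be pulled apart with Python's standard urllib.parse module. This is only an illustrative sketch using the example URLs from the text; the variable names are chosen here for the example.

```python
from urllib.parse import urlsplit

# Example URL taken from the text above.
url = "http://airline.travel:80/index.phtml?chickens"
parts = urlsplit(url)

scheme = parts.scheme    # the letters before the colon
host = parts.hostname    # host computer on which the resource lives
port = parts.port        # optional port number after the host name
path = parts.path        # path naming the resource on that host
query = parts.query      # optional query part after the question mark

print(scheme, host, port, path, query)

# A mailto URL has a scheme but no host; the address is carried in the path.
mail = urlsplit("mailto:internet12@gurus.org")
print(mail.scheme, mail.path)
```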
Types of URL
• A URL specifies the location of a target (file, directory, HTML page, image, program, and so on) stored on a local or networked computer.
• An absolute URL contains all the information necessary to locate a resource.
• An absolute URL uses the following format: scheme://server/path/resource
• Eg: http://www.cs.princeton.edu/index.html
• A relative URL typically consists only of the path and, optionally, the resource, but no scheme or server, because it assumes the files are located in a folder or on a server that’s relative to the originating document.
• Eg: index.html
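A relative URL only makes sense together with the URL of the originating document. As a rough sketch, Python's urllib.parse.urljoin shows how the two combine; the base URL with the docs/guide.html path is a hypothetical example, not taken from the slides.

```python
from urllib.parse import urljoin

# Hypothetical originating document (assumed for illustration).
base = "http://www.cs.princeton.edu/docs/guide.html"

# A relative URL is resolved against the folder of the originating document.
same_folder = urljoin(base, "index.html")

# "../" steps up one folder before resolving.
parent_folder = urljoin(base, "../index.html")

# An absolute URL already contains everything, so the base is ignored.
absolute = urljoin(base, "http://airline.travel/index.phtml")

print(same_folder)
print(parent_folder)
print(absolute)
```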
• If you opened this URL, your web browser would open a TCP connection to the web server at a machine called www.cs.princeton.edu and immediately retrieve and display the file called index.html.
• Most files on the Web contain images and text, and many have
other objects such as audio and video clips, pieces of code, etc.
• They also frequently include URLs that point to other files that
may be located on other machines, which is the core of the
“hypertext” part of HTTP and HTML.
• A web browser has some way in which you can recognize URLs (often by highlighting or underlining some text). These embedded URLs are called hypertext links.
• When you ask your web browser to open one of these
embedded URLs (e.g., by pointing and clicking on it with a
mouse), it will open a new connection and retrieve and display
a new file. This is called following a link.
• It thus becomes very easy to hop from one machine to another
around the network, following links to all sorts of information.
• The ability to embed a link in a document and allow a user to follow that link to get another document is the basis of a hypertext system.
• When you ask your browser to view a page, your browser (the
client) fetches the page from the server using HTTP running
over TCP.
• Like SMTP, HTTP is a text-oriented protocol. HTTP is a request/response protocol, where every message has the general form

START_LINE <CRLF>
MESSAGE_HEADER <CRLF>
<CRLF>
MESSAGE_BODY <CRLF>

• <CRLF> stands for carriage-return+line-feed.
• The first line (START_LINE) indicates whether this is a request message or a response message.
• The next set of lines specifies a collection of options and parameters that qualify the request or response. There are zero or more of these MESSAGE_HEADER lines, and the set is terminated by a blank line.
• HTTP defines many possible header types, some of which pertain to request messages, some to response messages, and some to the data carried in the message body.
• Finally, after the blank line comes the contents of the requested message (MESSAGE_BODY); this part of the message is where a server would place the requested page when responding to a request, and it is typically empty for request messages.
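The general message form can be illustrated with a small parser. This is a minimal sketch, not a full HTTP implementation: parse_http_message is a hypothetical helper name, the sample request and its Host and Accept headers are assumed for illustration, and details such as repeated headers or body encodings are ignored.

```python
def parse_http_message(raw: str):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # The blank line (two CRLFs in a row) ends the header section.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Each MESSAGE_HEADER line has the form "Name: value".
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body

# Hypothetical request message with two header lines and an empty body.
raw = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.cs.princeton.edu\r\n"
    "Accept: text/html\r\n"
    "\r\n"
)
start_line, headers, body = parse_http_message(raw)
print(start_line)
print(headers)
print(repr(body))
```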
• Request Messages
• The first line of an HTTP request message specifies three things:
• the operation to be performed,
• the Web page the operation should be performed on, and
• the version of HTTP being used.
• Although HTTP defines a wide assortment of possible request operations, including write operations that allow a Web page to be posted on a server, the two most common operations are
• GET (fetch the specified Web page) and
• HEAD (fetch status information about the specified Web page).
• GET is used when your browser wants to retrieve and display a Web page.
• HEAD is used to test the validity of a hypertext link or to see if a particular page has been modified since the browser last fetched it.
• For example, the START LINE
• GET http://www.cs.princeton.edu/index.html HTTP/1.1
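As a small sketch of the three fields in a request START_LINE, the example line can be split on spaces and a matching HEAD request built from the same page. The helper names parse_request_line and build_request_line are hypothetical.

```python
def parse_request_line(start_line: str):
    """Split a request START_LINE into (operation, page, version)."""
    operation, page, version = start_line.split(" ")
    return operation, page, version

def build_request_line(operation: str, page: str, version: str = "HTTP/1.1") -> str:
    """Assemble a request START_LINE from its three fields."""
    return f"{operation} {page} {version}"

# The START_LINE from the text above.
get_line = "GET http://www.cs.princeton.edu/index.html HTTP/1.1"
operation, page, version = parse_request_line(get_line)

# The same page requested with HEAD (status information only).
head_line = build_request_line("HEAD", page)
print(operation, page, version)
print(head_line)
```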

Conditional Request
• A client can add a condition in its
request. In this case, the server will
send the requested Web page if the
condition is met or inform the client
otherwise.
• One of the most common conditions imposed by the client is the time and date the Web page was last modified.
• The If-Modified-Since header line gives the client a way to conditionally request a Web page: the server returns the page only if it has been modified since the time specified in that header line.
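The server-side decision for a conditional request can be sketched as a date comparison. This is an illustrative sketch only: should_send_page is a hypothetical helper, the dates are made up, and a real server would also handle malformed dates and other conditional headers.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def should_send_page(last_modified: datetime, if_modified_since: str) -> bool:
    """Return True if the page changed after the time the client supplied."""
    client_time = parsedate_to_datetime(if_modified_since)
    return last_modified > client_time

# Time at which the client obtained its copy (assumed for illustration).
copy_time = datetime(2024, 1, 1, tzinfo=timezone.utc)

# Value the client would place in the If-Modified-Since header line.
header_value = format_datetime(copy_time, usegmt=True)
print("If-Modified-Since:", header_value)

# Page modified one day later: the server returns the page.
changed = should_send_page(datetime(2024, 1, 2, tzinfo=timezone.utc), header_value)

# Page last modified before the client's copy: the server does not resend it.
unchanged = should_send_page(datetime(2023, 12, 31, tzinfo=timezone.utc), header_value)
print(changed, unchanged)
```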
• Response Messages
• Like request messages, response
messages begin with a single START
LINE.
• In this case, the line specifies the version of HTTP being used, a three-digit code indicating whether or not the request was successful, and a text string giving the reason for the response.
• HTTP/1.1 202 Accepted: the server accepted the request (a request that was fully satisfied is normally reported as HTTP/1.1 200 OK).
• HTTP/1.1 404 Not Found: the server was not able to satisfy the request because the page was not found.
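The three parts of a response START_LINE can be separated in the same way as a request line. In this sketch, parse_status_line is a hypothetical helper; treating codes in the 200 to 299 range as successful follows the standard grouping of HTTP status codes.

```python
def parse_status_line(start_line: str):
    """Split a response START_LINE into (version, code, reason)."""
    # Split only twice so a multi-word reason such as "Not Found" stays whole.
    version, code, reason = start_line.split(" ", 2)
    return version, int(code), reason

def is_success(code: int) -> bool:
    """Codes in the 2xx range report a successful request."""
    return 200 <= code < 300

not_found = parse_status_line("HTTP/1.1 404 Not Found")
accepted = parse_status_line("HTTP/1.1 202 Accepted")
print(not_found, is_success(not_found[1]))
print(accepted, is_success(accepted[1]))
```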
Uniform Resource Identifiers
• A URI is a character string that identifies a resource, where a resource can be anything that has identity, such as a document, an image, or a service.
• The format of URIs:
• The first part of a URI is a scheme that names a particular way of identifying a certain kind of resource, such as mailto for email addresses or file for file names.
• The second part of a URI, separated from the first part by a colon, is the scheme-specific part.
• It is a resource identifier consistent with the scheme in the first part, as in the URIs mailto:santa@northpole.org and file:///C:/foo.html
• A resource doesn’t have to be retrievable or accessible.
• For example, Extensible Markup Language (XML) namespaces are identified by URIs.
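The two-part URI format (scheme, colon, scheme-specific part) can be shown by splitting at the first colon. This is a minimal sketch; split_uri is a hypothetical helper and performs no validation of the scheme.

```python
def split_uri(uri: str):
    """Split a URI at the first colon into (scheme, scheme_specific_part)."""
    scheme, _, specific = uri.partition(":")
    return scheme, specific

# The two example URIs from the text above.
mail_uri = split_uri("mailto:santa@northpole.org")
file_uri = split_uri("file:///C:/foo.html")
print(mail_uri)
print(file_uri)
```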
Nonpersistent Connection vs. Persistent Connection
TCP Connections:
• The original version of HTTP (1.0) established a
separate TCP connection for each data item
retrieved from the server.
• This is a very inefficient mechanism: connection setup and teardown messages had to be exchanged between the client and server even if all the client wanted to do was verify that it had the most recent copy of a page. Thus, retrieving a page that included some text and a dozen icons or other small graphics would result in 13 separate TCP connections being established and closed.
• Consider the sequence of events for fetching a page that has just a single embedded object. In the figure for this sequence, colored lines indicate TCP messages, while black lines indicate the HTTP requests and responses.
• Disadvantage: two round-trip times are spent setting up TCP connections. In addition to the latency impact, there is also processing cost on the server to handle the extra TCP connection establishment and termination.
• To overcome this situation, HTTP version 1.1 introduced persistent connections: the client and server can exchange multiple request/response messages over the same TCP connection.
• Advantages:
• First, persistent connections eliminate the connection setup overhead, thereby reducing the load on the server, the load on the network caused by the additional TCP packets, and the delay perceived by the user.
• Second, because a client can send multiple request messages down a single TCP connection, TCP’s congestion window mechanism is able to operate more efficiently.
• With a persistent connection that is already open (presumably due to some prior access of the same server), no connection setup is needed before the request.
• Disadvantages:
• Neither the client nor server necessarily knows how long to keep a particular TCP connection open.
• This is especially critical on the server, which might be asked to keep connections open on behalf of thousands of clients.
• Solution: the server must time out and close a connection if it has received no requests on the connection for a period of time.
• Also, both the client and server must watch to see if the other side has elected to close the connection, and they must use that information as a signal that they should close their side of the connection as well. Both sides must close a TCP connection before it is fully terminated.
• This added complexity may be one reason why persistent connections were not used from the outset, but today it is widely accepted that the benefits of persistent connections more than offset the drawbacks.
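The latency difference between the two connection styles can be estimated with a rough round-trip count. This is a simplified model under stated assumptions: one round trip for TCP setup, one round trip per request/response, requests sent one after another, and teardown and transmission time ignored. The function names are hypothetical.

```python
def round_trips_nonpersistent(num_objects: int) -> int:
    """Each object pays one round trip for TCP setup and one for the request."""
    return 2 * num_objects

def round_trips_persistent(num_objects: int, already_open: bool = False) -> int:
    """One TCP setup at most, then one round trip per request."""
    setup = 0 if already_open else 1
    return setup + num_objects

# A page with some text and a dozen small graphics: 13 objects in total.
objects = 13
separate = round_trips_nonpersistent(objects)
shared = round_trips_persistent(objects)
reused = round_trips_persistent(objects, already_open=True)
print(separate, shared, reused)
```

Under this model the 13-object page costs 26 round trips with separate connections, 14 with one persistent connection, and 13 if the connection is already open.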
• Caching is the temporary storage of web documents such as HTML pages and images. A web browser stores copies of web pages visited recently to reduce its bandwidth usage, server load, and lag.
• A cache is a hardware or software component that stores data so that future requests for that data can be served faster.
• Advantages:
• From the client’s perspective, a page that can be retrieved from a nearby cache can be displayed much more quickly than if it has to be fetched from across the world.
• From the server’s perspective, having a cache intercept and satisfy a request reduces the load on the server.
• Caching can be implemented in many different places.
• A user’s browser can cache recently accessed pages and simply display the cached copy if the user visits the same page again.
• A site can support a single site-wide cache. This allows users to take advantage of pages previously downloaded by other users.
• Closer to the middle of the Internet, Internet Service Providers (ISPs) can cache pages.
• Note that, in the second case, the users within the site most likely know what machine is caching pages on behalf of the site, and they configure their browsers to connect directly to the caching host.
• This node is sometimes called a proxy.
• In contrast, the sites that connect to the ISP are probably
not aware that the ISP is caching pages.
• It simply happens to be the case that HTTP requests
coming out of the various sites pass through a common
ISP router. This router can peek inside the request
message and look at the URL for the requested page. If it
has the page in its cache, it returns it. If not, it forwards
the request to the server and watches for the response to
fly by in the other direction.
• When it does, the router saves a copy in the hope that it
can use it to satisfy a future request.
• HTTP supports proxy servers.
• A proxy server is a computer that keeps copies of responses to recent requests.
• The HTTP client sends a request to the proxy server. The proxy server checks its cache. If the response is not stored in the cache, the proxy server sends the request to the corresponding server.
• Incoming responses are sent to the proxy server and stored for future requests from other clients.
• The proxy server reduces the load on the original server, decreases traffic, and improves latency.
• However, to use the proxy server, the client must be configured to access the proxy instead of the target server.
[Figure: HTTP client, proxy server, and target server. The client sends its request to the proxy server, which checks its cache; if the response is not stored in the cache, the request goes on to the target server. Responses are stored in the cache.]
• Note that the proxy server acts both as a server and client.
• When it receives a request from a client for which it has a response, it acts as a server and sends the response to the client.
• When it receives a request from a client for which it does not have a response, it first acts as a client and sends a request to the target server. When the response has been received, it acts again as a server and sends the response to the client.
[Figure: two cases. Proxy server acts as a server: the client's request is answered directly by the proxy server. Proxy server has no response: it forwards the request to the target server and relays the response to the client.]
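The two roles of the proxy server can be sketched with a small class. This is a toy model, not a real proxy: CachingProxy, fetch_from_target, and the stand-in target server are hypothetical names, no networking is involved, and cached responses never expire.

```python
class CachingProxy:
    """Toy proxy: answers from its cache, else asks the target server."""

    def __init__(self, fetch_from_target):
        # fetch_from_target is any callable mapping a URL to a response.
        self.fetch_from_target = fetch_from_target
        self.cache = {}

    def handle_request(self, url: str) -> str:
        if url in self.cache:
            # Response is stored: the proxy acts as a server only.
            return self.cache[url]
        # No stored response: the proxy first acts as a client...
        response = self.fetch_from_target(url)
        # ...stores the response for future requests from other clients...
        self.cache[url] = response
        # ...and then acts as a server again.
        return response

# Stand-in for the target server that records every request it receives.
target_requests = []

def target_server(url: str) -> str:
    target_requests.append(url)
    return f"<html>page for {url}</html>"

proxy = CachingProxy(target_server)
first = proxy.handle_request("http://www.cs.princeton.edu/index.html")
second = proxy.handle_request("http://www.cs.princeton.edu/index.html")
print(len(target_requests))
```

In this sketch the second request is served from the cache, so the target server sees only one request.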
• No matter where pages are cached, the ability to cache Web pages is important enough that HTTP has been designed to make the job easier.
• The trick is that the cache needs to make sure it is not responding with an out-of-date version of the page.
• For example, the server assigns an expiration date (the Expires header field) to each page it sends back to the client (or to a cache between the server and client).
• The cache remembers this date and knows that it need not reverify the page each time it is requested until after that expiration date has passed.
• After that time the cache can use the HEAD or conditional GET operation (GET with If-Modified-Since header line) to verify that it has the most recent copy of the page.
• More generally, there is a set of cache directives that must be obeyed by all caching mechanisms along the request/response chain.
• These directives specify whether or not a document can be cached,
• how long it can be cached,
• how fresh a document must be, and so on.
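The Expires-based check described above can be sketched as a cache that records an expiration date per page. This is an assumed, simplified design: ExpiringCache and its method names are hypothetical, the dates are made up, and the revalidation step itself (HEAD or conditional GET) is left to the caller.

```python
from datetime import datetime, timedelta, timezone

class ExpiringCache:
    """Toy cache that remembers the expiration date of each page."""

    def __init__(self):
        self.entries = {}  # url -> (page, expires)

    def store(self, url: str, page: str, expires: datetime) -> None:
        self.entries[url] = (page, expires)

    def lookup(self, url: str, now: datetime):
        """Return (page, state); state is 'miss', 'fresh', or 'revalidate'."""
        entry = self.entries.get(url)
        if entry is None:
            return None, "miss"
        page, expires = entry
        if now <= expires:
            # Before the expiration date: no need to reverify the page.
            return page, "fresh"
        # After the expiration date: verify with HEAD or a conditional GET.
        return page, "revalidate"

cache = ExpiringCache()
fetched = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
cache.store("/index.html", "<html>news</html>", fetched + timedelta(hours=24))

early = cache.lookup("/index.html", fetched + timedelta(hours=1))
late = cache.lookup("/index.html", fetched + timedelta(hours=25))
absent = cache.lookup("/other.html", fetched)
print(early[1], late[1], absent[1])
```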
• Cache Update
• How long should a response remain in the proxy server before being deleted and replaced?
• Different strategies:
• One is to store a list of sites whose information remains the same for a while.
• For example, a news agency may change its news page every morning. This means that a proxy server can get the news early in the morning and keep it until the next day.
• Another is to add some headers to show the last modification time of the information. The proxy server can then use the information in this header to guess how long the information would be valid.
• HTTP Security
• HTTP per se does not provide security.
• HTTP can be run over the Secure Sockets Layer (SSL).
• In this case, HTTP is referred to as HTTPS.
• HTTPS provides confidentiality, client and server authentication, and data integrity.
