
Unit 4

WWW - Architectural Components

• Web pages
• Web browser and web server
• Data representation standard
Web pages
• Large set of documents that are accessible
to internet users
• Each web page is classified as a
hypermedia document
"hyper" means the document can contain
selectable links that refer to other
documents
"media" means the document can contain items
other than text (e.g., graphic images)
Web browser and Web server
• Web browser - An application program that
user invokes to access and display a web
page
Client that contacts a web server to obtain
a copy of a specified page
• Web server - A given web server can
manage more than one page
Data representation standard

• The representation depends on the contents of the web page
• GIF or JPEG for a page that contains a
single graphics image
• HTML for a page that contains a mixture
of text and other items
HTML document – a file that contains text
along with embedded commands, called
tags, that give guidelines for display
• A tag has the form <TAG>
• Tags often come in pairs, e.g.
<CENTER> and </CENTER>
Items between the two tags are
centered in the browser's window
Uniform Resource Locators
• URL - Unique name that is used to identify
a web page
• A URL begins with a specification of the
scheme (transfer protocol) used to access
the page
The format of the remainder of the URL depends
on the scheme
http scheme
• http://hostname[:port]/path[;parameters][?query]
hostname, port, path, parameters, and query denote items to be supplied
brackets denote an optional item
hostname string specifies the domain name or IPv4
address of the computer on which the server
for the page operates
:port is optional (the default for the http scheme is port 80)
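• path string identifies one particular
document on that server
• ;parameters - optional additional parameters
• ?query - optional string used when the browser sends a question (e.g., form input) to the server
• A typical URL that a user enters contains only a
hostname and path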
http://www.cs.purdue.edu/people/comer
[absolute form of URL]
• Relative URL
/people/comer
Seldom seen by a user
Meaningful after communication has been
established with a specific web server
[www.cs.purdue.edu in this case]
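The parts of an http URL, and the way a relative URL is combined with a known server, can be sketched with Python's standard `urllib.parse` module. The port, parameters, and query in the first URL are made-up values chosen only to exercise every optional part; they do not come from the slides.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical URL that uses every optional part of the http scheme.
url = "http://www.cs.purdue.edu:8080/people/comer;type=a?topic=web"

parts = urlparse(url)
print(parts.scheme)    # scheme (transfer protocol)
print(parts.hostname)  # hostname
print(parts.port)      # optional :port
print(parts.path)      # path
print(parts.params)    # optional ;parameters
print(parts.query)     # optional ?query

# A relative URL is meaningful only relative to a server already contacted.
base = "http://www.cs.purdue.edu/"
absolute = urljoin(base, "/people/comer")
print(absolute)
```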
Example HTML document
• How is a URL produced from a selectable link
in a document?
<HTML>
The author of this text is
<A HREF="http://www.cs.purdue.edu/people/comer">
Douglas Comer.</A>
</HTML>
• The pair <A> and </A> is known as an anchor
The URL is added to the first tag (as the HREF value)
The item to be displayed goes between the tags
• Opening the example document in a browser
displays
The author of this text is Douglas Comer.

[The text between the anchor tags, Douglas Comer., is underlined and appears as a selectable link]
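As a rough illustration of how a browser might find the URL behind a selectable link, the sketch below uses Python's standard `html.parser` to collect the HREF value of each anchor in the example document. The class name `LinkCollector` is a hypothetical helper, not part of any browser.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the HREF value of every anchor (<A>) tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

document = """<HTML>
The author of this text is
<A HREF="http://www.cs.purdue.edu/people/comer">
Douglas Comer.</A>
</HTML>"""

collector = LinkCollector()
collector.feed(document)
print(collector.links)
```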
Hypertext Transfer Protocol
• Used for communication between a browser
and a web server or between intermediate
machines and web servers
• Characteristics
  - Application layer
  - Request/Response
  - Stateless
  - Bi-directional transfer
  - Capability negotiation
  - Support for caching
  - Support for intermediaries
Application layer
• HTTP operates at the application layer
• It assumes a reliable, connection-oriented transport protocol such as TCP
• It does not provide reliability or
retransmission itself
Request/Response
• Once a transport session has been
established
  - One side (usually a browser) must send an HTTP request
  - The other side responds to the request
Stateless
• Each HTTP request is self-contained
• Server does not keep a history of previous
requests or previous sessions
Bi-directional transfer
• In most cases, a browser requests a web
page, and the server transfers a copy to
the browser
• HTTP also allows transfer from a browser
to a server (e.g., when a user supplies data)
Capability Negotiation
• HTTP allows browsers and servers to
negotiate details such as the character set
to be used during transfers
• A sender can specify the capabilities it
offers, and a receiver can specify the
capabilities it accepts
Support for caching
• To improve response time, a browser
caches a copy of each web page it
retrieves. If a user requests a page again,
the browser can interrogate the server to
determine whether the contents of the page
have changed since the copy was cached
Support for Intermediaries
• HTTP allows a machine along the path
between a browser and a server to act as
a proxy server that caches web pages and
answers a browser’s request from its
cache
HTTP GET Request
• Simplest case – a browser contacts a web
server directly
  - Browser extracts the host name section from the URL
  - Uses DNS (Domain Name System) to map the name into an equivalent IP address
  - Forms a TCP connection to the server
  - Browser and web server can then use HTTP to communicate
• Browser sends a request for a specific
page
Sends HTTP GET command
GET /people/comer/ HTTP/1.1

Note: the request contains the keyword GET, a relative URL, and the HTTP version number
• Server responds by sending a copy of the
page
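The steps above can be sketched in Python with the standard `socket` module. This is a minimal sketch rather than a complete client: the function names `build_get_request` and `fetch` are hypothetical, the `Host:` header is added because HTTP/1.1 requires it, and `Connection: close` is used so that the end of the reply is marked by the server closing the connection.

```python
import socket

def build_get_request(host: str, path: str) -> bytes:
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # keyword, relative URL, version number
        f"Host: {host}",          # HTTP/1.1 requires a Host header
        "Connection: close",      # ask the server to close after replying
        "",                       # blank line ends the header
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def fetch(host: str, path: str, port: int = 80) -> bytes:
    """Map the name to an address, connect over TCP, send GET, read the reply."""
    # create_connection performs the DNS lookup and the TCP connect.
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)

request = build_get_request("www.cs.purdue.edu", "/people/comer/")
print(request.decode("ascii"))
```

Calling `fetch("www.cs.purdue.edu", "/people/comer/")` would contact the server over the network; it is not invoked here.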
Error Messages
• If web server receives an illegal request
  - Server usually generates an error message in valid HTML
  - Browser will attempt to display whatever the server returns
<HTML>
<HEAD> <TITLE>400 Bad Request</TITLE>
</HEAD>
<BODY>
<H1>Error in Request</H1> Your ...
</BODY>
</HTML>
If we open the above in a browser, we see
Error in Request
Your ...
Note: the browser uses the head of the document internally and displays only
the body of the document
<H1> and </H1> mark a level-one heading, which browsers display in large, bold type
Persistent connections
• The first versions of HTTP used a paradigm of
one TCP connection per data transfer
  - Client opens a TCP connection and sends an HTTP GET request
  - The server sends a copy of the requested page and closes the connection
  - One request, one response per connection
[This is not a persistent connection]
• Version 1.1 of HTTP adopts a persistent
connection approach
  - Client opens a TCP connection
  - Client leaves the connection in place during multiple requests and responses
  - When either the client or the server is ready to close the connection, it informs the other side
• Advantage
Reduced overhead: fewer TCP connections mean
Lower response latency
Less overhead on underlying networks
Less memory used for buffers
Less use of CPU time
[A browser can further optimize by pipelining
requests: sending requests back-to-back without
waiting for a response]
• Disadvantage
Need to identify the beginning and end of
each item sent over the connection
Two possible techniques to handle this
  - Send a length followed by the item
  - Send a sentinel after the item to mark the end
(not used, since the item transmitted may
contain a sequence of octets identical
to the sentinel)
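The first technique, a length followed by the item, can be illustrated with a small Python sketch. The 4-byte binary length prefix is an assumption made for this illustration only; HTTP itself carries the length as text in a header. The helper names `frame` and `unframe` are hypothetical.

```python
import struct

def frame(item: bytes) -> bytes:
    """Prefix an item with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(item)) + item

def unframe(stream: bytes) -> list[bytes]:
    """Split a byte stream produced by frame() back into items."""
    items = []
    offset = 0
    while offset < len(stream):
        (length,) = struct.unpack_from("!I", stream, offset)
        offset += 4
        items.append(stream[offset:offset + length])
        offset += length
    return items

# The second item contains octets that a sentinel scheme might have
# chosen as its end marker; length framing is unaffected by them.
stream = frame(b"first page") + frame(b"data\r\n\r\nmore data")
print(unframe(stream))
```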
Data length and program output
• Sometimes web pages are generated
dynamically when a request arrives: the server
uses technology such as the Common
Gateway Interface (CGI) that allows a
computer program running on the server to
create a web page (in contrast to a page stored on disk)
• If a server does not know the length of a page,
the server can inform the browser that it will
close the connection after transmitting the
page
Length encoding and HTTP headers

• How does a server send length information?
  - Each HTTP transmission contains a header, a blank line, and the document (web page)
  - Each line in the header contains a keyword, a colon, and information

Header               Meaning
Content-Length:      Size of document in octets
Content-Type:        Type of document
Content-Encoding:    Encoding used for document
Content-Language:    Language(s) used in document


• Example of an HTTP transmission

Content-Length: 34
Content-Language: en
Content-Encoding: ascii

<HTML> A trivial example. </HTML>

Note: a blank line follows the three header lines

The Content-Length: header is required for a persistent connection
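To show how a receiver can use the header, the sketch below splits the example transmission at the blank line and then takes exactly Content-Length octets as the document. It assumes lines end with CR LF, that the 34th octet of the document is a trailing newline, and that no status line is present (as in the example); `parse_transmission` is a hypothetical helper name.

```python
def parse_transmission(raw: bytes) -> tuple[dict[str, str], bytes]:
    """Split a transmission into a header dictionary and the document."""
    head, _, rest = raw.partition(b"\r\n\r\n")   # blank line ends the header
    headers = {}
    for line in head.decode("ascii").split("\r\n"):
        keyword, _, info = line.partition(":")   # keyword, colon, information
        headers[keyword.strip().lower()] = info.strip()
    length = int(headers["content-length"])
    return headers, rest[:length]                # take exactly `length` octets

raw = (
    b"Content-Length: 34\r\n"
    b"Content-Language: en\r\n"
    b"Content-Encoding: ascii\r\n"
    b"\r\n"
    b"<HTML> A trivial example. </HTML>\n"
    b"octets that belong to the next item"
)
headers, document = parse_transmission(raw)
print(headers)
print(document)
```

Because the length is known, the octets of the next item on the same connection are left untouched.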
• Browser and server can exchange meta
information using headers, for example
Connection: close
This header is used in place of Content-Length: when the server does not know the length
of the page
The browser then knows that the document ends when the connection closes, and that
further requests should not be sent on that connection
Negotiation
• Headers can be used to permit a client and
server to negotiate capabilities
• Examples of capabilities
  - Characteristics of the connection: whether access is authenticated
  - Representation: which types of compression can be used
  - Content: whether text files must be in English
  - Control: length of time a page remains valid

• Types of negotiation
  - Server-driven
  - Agent-driven (browser-driven)
Server driven negotiation
• Request from browser specifies a list of
preferences along with URL of the desired
document
• Server selects, from among available
options, one that satisfies the browser’s
preferences
Agent driven negotiation
• Browser sends a request to the server to
ask what is available
• The server returns a list of possibilities
• The browser selects one of them and
sends a second request to obtain the
document
Note:
Disadvantage: two server interactions
Advantage: browser retains control over the choice
Negotiation
• A browser uses an HTTP Accept: header
Accept: text/html, text/plain; q=0.5, text/x-dvi; q=0.8

[q is the preference level, from 0 to 1; an item without q defaults to 1]
The browser is indicating its preferences, from most to least preferred
text/html
text/x-dvi
text/plain
• A variety of Accept headers exists
Accept-Encoding:
Accept-Charset:
Accept-Language:
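The ordering implied by the q values, and a server-driven choice based on it, can be sketched as follows. This is a simplified illustration: it ignores wildcards such as `*/*` and any parameter other than q, and the helper names `parse_accept` and `choose` are hypothetical.

```python
from typing import Optional

def parse_accept(header_value: str) -> list[tuple[str, float]]:
    """Return (media type, q) pairs, most preferred first."""
    preferences = []
    for entry in header_value.split(","):
        media_type, *params = [part.strip() for part in entry.split(";")]
        q = 1.0                                   # default when q is absent
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        preferences.append((media_type, q))
    # sorted() is stable, so equal q values keep their original order.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)

def choose(header_value: str, available: list[str]) -> Optional[str]:
    """Server-driven choice: the most preferred type the server can supply."""
    for media_type, q in parse_accept(header_value):
        if q > 0 and media_type in available:
            return media_type
    return None

accept = "text/html, text/plain; q=0.5, text/x-dvi; q=0.8"
print(parse_accept(accept))
print(choose(accept, ["text/plain", "text/x-dvi"]))
```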
Conditional Requests
• When a browser sends a request, it can include
a header that specifies conditions under
which the request should be honored
If the specified condition is not met, the
server does not return the requested
document
• Advantage - Unnecessary transfers avoided
• Example
If-Modified-Since: Mon, 01 Apr 2013 05:00:01 GMT

The above header can be included with a GET
request to avoid a transfer if the document
has not been modified since 05:00:01 GMT on
01 Apr 2013
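A server-side view of this check can be sketched in Python. The sketch assumes the server knows the document's last modification time; it answers with status 304 (Not Modified) when no transfer is needed and 200 otherwise. The function name `status_for` and the two sample modification times are made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def status_for(if_modified_since: str, last_modified: datetime) -> int:
    """Return 304 if the document is unchanged since the given date, else 200."""
    threshold = parsedate_to_datetime(if_modified_since)
    if last_modified <= threshold:
        return 304   # Not Modified: no document is transferred
    return 200       # OK: the document is sent

header = "Mon, 01 Apr 2013 05:00:01 GMT"
old_page = datetime(2013, 3, 15, 12, 0, 0, tzinfo=timezone.utc)
new_page = datetime(2013, 4, 2, 9, 30, 0, tzinfo=timezone.utc)
print(status_for(header, old_page))
print(status_for(header, new_page))
```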
Proxy Servers and Caching
• Proxy servers provide an optimization
  - Decrease latency
  - Reduce load on web servers

• Two forms of proxy servers exist
  - Nontransparent
  - Transparent
Nontransparent proxy server
• Visible to a user – the user configures a
browser to contact the proxy instead of
the original source (web server)
• Caches web pages and answers
subsequent requests for a page from the
cache
Transparent proxy
• Does not require any changes to a
browser’s configuration
• Examines all TCP connections that pass
through the proxy, and intercepts any
connection to port 80
• Caches web pages and answers
subsequent requests for a page from the
cache
HTTP support for proxy servers
• The protocol specifies exactly
How a proxy handles each request
How headers should be interpreted by proxies
How a browser negotiates with a proxy
How a proxy negotiates with a server

• Specific headers are intended for use by proxies
One allows a proxy to authenticate itself to a server
Another allows each proxy that handles a web page to
record its identity so the ultimate recipient
receives a list of all intermediate proxies
• HTTP allows a server to control how
proxies handle each web page
• The number of proxies along a path can also be limited
A request can include the header
Max-Forwards: N
so that it is forwarded by at most N proxies
between the browser and the web server
Caching
• Improves efficiency
  - Reduces both latency and network traffic by eliminating unnecessary transfers
• Most obvious aspect of caching: storage
When a web page is initially accessed, a copy
is stored on disk, either by the browser, a proxy, or
both
Subsequent requests for the same page are
answered from the cache (not from the web server)
• The central question is about timing: how
long should an item be kept in a cache?
  - Keeping a cached copy too long means the copy becomes stale (changes to the original are not reflected in the cached copy)
  - Not keeping a cached copy long enough is inefficient, because the next request must go to the web server
HTTP support for caching
• HTTP allows a web server to control
caching in two ways
1. When it answers a request for a page, a
server can specify caching details
  - Whether the page can be cached
  - Whether a proxy can cache the page
  - Community with which a cached copy can be shared
  - Time at which a cached copy must expire
  - Limits on transformations that can be applied to the copy
2. HTTP allows a browser to force
revalidation of a page
The browser sends a request for the page and uses a header to specify that the
maximum age (time since a copy of the page was stored) cannot be
greater than zero
No copy of the page in a cache can be used, since its age will be greater
than zero
So the original web server will answer the request
Proxies along the way and the browser will get a fresh copy from the server
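The interaction between cached copies and a maximum age can be illustrated with a toy cache. This is a sketch under stated assumptions, not how any particular browser or proxy is implemented: the class `PageCache`, the `origin` function that stands in for the web server, and the manually advanced clock are all hypothetical.

```python
import time

class PageCache:
    """Toy cache that stores each page with the time it was stored."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch      # function that contacts the origin server
        self._clock = clock
        self._store = {}         # url -> (time stored, page)

    def get(self, url, max_age):
        """Return the page, using a cached copy only if its age <= max_age."""
        entry = self._store.get(url)
        if entry is not None:
            stored_at, page = entry
            if self._clock() - stored_at <= max_age:
                return page                      # fresh enough: cache hit
        page = self._fetch(url)                  # stale or missing: ask server
        self._store[url] = (self._clock(), page)
        return page

calls = []

def origin(url):
    """Stand-in for the web server; records each request it answers."""
    calls.append(url)
    return f"<HTML> copy {len(calls)} of {url} </HTML>"

now = [0.0]                                        # fake clock, in seconds
cache = PageCache(origin, clock=lambda: now[0])
first = cache.get("/people/comer", max_age=60)     # fetched from the server
now[0] = 5.0
second = cache.get("/people/comer", max_age=60)    # answered from the cache
third = cache.get("/people/comer", max_age=0)      # age 5 > 0: fresh copy
print(len(calls))
```

With a maximum age of zero, the stored copy (already five seconds old) cannot be used, so the request goes to the origin and the cache is refreshed.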
