Basics
Collection of millions of files stored on thousands of servers all over the world. The components required to make a web are Structural Components
- Web Server - Web Client - Internet
Semantic Components
Well defined set of Languages Protocols Uniform Resource Locators (URLs)
Contd
Languages
HTML (Hyper Text Markup Language) XML (eXtensible Markup Language ) JAVA JAVA Script VRML (Virtual Reality Modeling Language)
Protocols
- Hyper Text Transfer Protocol (HTTP)
URLs
- Naming of Web Pages
WWW Background
1989-1990 - Tim Berners-Lee invented the World Wide Web
at CERN(European Center for Nuclear Research)
1993 Mark Andreessen invented MOSAIC at National
Center for Super Computing Applications (NCSA)
First graphical browser Internets first killer app Became Netscape Inc.
Contd
W3C
--
World Wide Web Consortium
1994
Recent Browsers are..
Internet Explorer
Mozilla Firefox Google Chrome
retain many characteristics of MOSAIC
Quick Aside Web server use
Source: Netcraft Server Survey, November 2013
Usage Share of Web Browsers
Web and HTTP
First some jargon Web page consists of objects Object can be HTML file, JPEG image, Java applet, audio file, Web page consists of base HTML-file which includes several referenced objects Each object is addressable by a URL Example URL: http://www.someschool.edu/someDept/pic.gif
protocol
host name
path name
WWW Structure
Clients use browser to send URLs via HTTP to servers requesting a Web page Web pages constructed using HTML (or other markup language) and consist of text, graphics, sounds etc. Servers respond to the clients with the requested Web page or with an error message
Clients browser renders Web page returned by server
The entire system runs over standard networking protocols (TCP/IP, DNS,)
user clicks link
browser sends request
GET http://www.sitams.org/index.html HTTP/1.1
server finds page
such syuhhow howget gtw his hsio if iart ertage ag ty ty gun ghntee ty we we ghty ghty syuh how gtw chid chdiawl qw oat oatyf hsio wet wetdeli dfla get ght a a i ert ag ty ghn ty we ghty chdi qw oatyf wet dfla ght a
communicate with HTTP
server sends page back
syuh how gtw hsio i ert ag ty ghn ty we ghty chdi qw oatyf wet dfla ght a
syuh how gtw hsio i ert ag ty ghn ty we ghty syuh how gtw gtw chdi syuh qw oatyf how hsioght ert ag ty ty wet dfla hsio ii ert a ag ghn ty ty we we ghty ghty ghn chdi qw oatyf chdi qw syuh oatyf how gtw wet dfla dfla ght a ag ty wet hsioght i ert a
browser displays it
ghn ty we ghty chdi qw oatyf wet dfla ght a
web client (browser)
web server (stores pages)
Contd
HTTP overview
HTTP: hypertext transfer protocol
Webs application layer
protocol client/server model client: browser that requests, receives, displays Web objects server: Web server sends objects in response to requests
PC running Explorer
Server running Apache Web server
Mac running Navigator
HTTP overview (continued)
Uses TCP:
client initiates TCP
HTTP is stateless
server maintains no
connection (creates socket) to server, port 80 server accepts TCP connection from client HTTP messages (applicationlayer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server) TCP connection closed
information about past client requests
Protocols that maintain state are complex! past history (state) must be maintained if server/client crashes, their views of state may be inconsistent, must be reconciled
aside
Client Side
Steps when link is selected Eg: http://www.sitams.org/btech/ece.html
- The browser determines the URL - The browser asks DNS for IP address of www.sitams.org - DNS replies with IP address - The browser makes a TCP connection on the IP address - It then sends over a request asking for the file /btech/ece.html - The www. sitams.org server sends the file /btech/ece.html - TCP connection is released - The browser fetches and displays all the text and images in the file
Server Side
Steps the Server performs when a link is selected
- Accept the TCP connection from the client - Get the name of the file requested - Get the file from the disk - Return the file to the client - Release the TCP connection
Protocol - HTTP
Protocol for client/server communication
Very simple request/response protocol Client sends request message, server replies with response message
Three versions has been used
0.9/1.0 RFC 1945 1.1 RFC 2068 1.0 dominates today but 1.1 is catching up
Contd
HTTP/0.9
- Transfer of Raw data across the Internet
HTTP/1.0
- It is a stop and wait protocol - Provides messages in Multipurpose Internet Mail Extension (MIME) types - Separate TCP connection for each file
HTTP/1.1 focused on performance enhancements
Persistent connections Pipelining Enhanced caching options
HTTP connections
Nonpersistent HTTP At most one object is sent over a TCP connection. Persistent HTTP Multiple objects can be sent over single TCP connection between client and server.
Nonpersistent HTTP
Suppose user enters URL www.sitams.org/ece/home.index
1a. HTTP client initiates TCP
(contains text, references to 10 jpeg images)
connection to HTTP server (process) at www.sitams.org on port 80
1b. HTTP server at host
2. HTTP client sends HTTP
www.sitams.org waiting for TCP connection at port 80. accepts connection, notifying client
request message (containing
URL) into TCP connection socket. Message indicates that client wants object /ece/home.index
3. HTTP server receives request
message, forms response message containing requested object, and sends message into its socket
time
Nonpersistent HTTP (cont.)
4. HTTP server closes TCP 5. HTTP client receives response
message containing html file, displays html. Parsing html file, finds 10 referenced jpeg objects
connection.
time 6. Steps 1-5 repeated for each
of 10 jpeg objects
Non-Persistent HTTP: Response time
Definition of RTT: time for a small packet to travel from client to server and back. Response time: one RTT to initiate TCP connection one RTT for HTTP request and first few bytes of HTTP response to return file transmission time total = 2RTT+transmit time
initiate TCP connection
RTT
request file RTT file received time
time to transmit file
time
Persistent HTTP
Nonpersistent HTTP issues: requires 2 RTTs per object OS overhead for each TCP connection browsers often open parallel TCP connections to fetch referenced objects Persistent HTTP server leaves connection open after sending response subsequent HTTP messages between same client/server sent over open connection client sends requests as soon as it encounters a referenced object as little as one RTT for all the referenced objects
HTTP Request Messages
GET retrieve document specified by URL PUT store specified document under given URL HEAD retrieve info. about document specified by URL OPTIONS retrieve information about available options POST give information to the server DELETE remove document specified by URL TRACE loopback request message CONNECT Reserved for future use
HTTP Request Format
request-line ( request message-URL-HTTP-version) headers (0 or more)
<blank line> body (only for POST request)
Clients browser constructs and sends message Typical HTTP request:
GET http://www.sitams.org/ece/faculty_details/index.html HTTP/1.0
HTTP Response Codes
1xx Informational request received, processing Ex: 100:server agrees to handle clients request. 2xx Success action received, understood, accepted Ex:200:request succeeded;204:no content present. 3xx Redirection further action necessary Ex:301:Page moved;304:cache page still valid. 4xx Client Error bad syntax or cannot be fulfilled Ex:403:forbidden page;404:page not found. 5xx Server Error server failed Ex:500:internal server error;503:try again later
HTTP Response Format
status-line (HTTP-version response-code response-phrase) headers (0 or more)
<blank line>
body
Web servers construct and send response messages Typical HTTP response:
- HTTP/1.0 301 Moved Permanently Location: http://www.sitams.org/ece/faculty/index.html
HTTP Headers
Both requests and responses contain a variable number of header fields
Consists of field name, colon, space, field value 17 possible header types divided into three categories
Request Response Body
Example: Date: Friday, 10-Dec-13 11:30:01 IST Example: Content-length: 3001
User-server state: cookies
Many major Web sites use cookies Four components: Example: Susan always access Internet always from PC visits specific e1) cookie header line of HTTP response message commerce site for first 2) cookie header line in time HTTP request message when initial HTTP 3) cookie file kept on users host, managed by requests arrives at site, users browser site creates: 4) back-end database at unique ID Web site entry in backend database for ID
Cookies: keeping state (cont.)
client
ebay 8734
server
usual http request msg
cookie file
ebay 8734 amazon 1678
Set-cookie: 1678
usual http request msg
usual http response
Amazon server creates ID 1678 for user create
entry
cookie: 1678
one week later:
ebay 8734 amazon 1678
usual http response msg usual http request msg
cookiespecific action
cookiespectific action
access
access
backend database
cookie: 1678
usual http response msg
Cookies (continued)
What cookies can bring: authorization shopping carts recommendations user session state (Web e-mail) Cookies and privacy: cookies permit sites to learn a lot about you you may supply name and e-mail to sites
aside
How to keep state: protocol endpoints: maintain state at sender/receiver over multiple transactions cookies: http messages carry state
Web caches (proxy server)
Goal: satisfy client request without involving origin server
user sets browser:
origin server
Web accesses via cache browser sends all HTTP requests to cache
client
Proxy server
object in cache: cache returns object else cache requests object from origin server, then returns object to client
client
origin server
More about Web caching
cache acts as both
client and server typically cache is installed by ISP (university, company, residential ISP)
Why Web caching? reduce response time for client request reduce traffic on an institutions access link. Internet dense with caches: enables poor content providers to effectively deliver content (but so does P2P file sharing)
Caching example
Assumptions
average object size = 100,000
origin servers
public Internet
bits avg. request rate from institutions browsers to origin servers = 15/sec delay from institutional router to any origin server and back to router = 2 sec
1.5 Mbps access link
institutional network 10 Mbps LAN
Consequences
utilization on LAN = 15%
utilization on access link = 100% total delay
= Internet delay + access delay + LAN delay = 2 sec + minutes + milliseconds
institutional cache
Caching example (cont)
possible solution
increase bandwidth of access
origin servers
public Internet
link to, say, 10 Mbps
consequence
utilization on LAN = 15% utilization on access link = 15% Total delay
= Internet delay + access delay + LAN delay = 2 sec + msecs + msecs often a costly upgrade
10 Mbps access link
institutional network 10 Mbps LAN
institutional cache
Caching example (cont)
possible solution: install cache
suppose hit rate is 0.4
origin servers
public Internet
consequence
40% requests will be
satisfied almost immediately 60% requests satisfied by origin server utilization of access link reduced to 60%, resulting in negligible delays (say 10 msec) total avg delay = Internet delay + access delay + LAN delay = .6*(2.01) secs + .4*milliseconds < 1.4 secs
1.5 Mbps access link
institutional network 10 Mbps LAN
institutional cache
Conditional GET
Goal: dont send object if
cache
HTTP request msg
If-modified-since: <date>
server
object not modified
cache has up-to-date cached version cache: specify date of cached copy in HTTP request
If-modified-since: <date>
server: response contains no
HTTP response
HTTP/1.0 304 Not Modified
object if cached copy is upto-date:
HTTP/1.0 304 Not Modified
HTTP request msg
If-modified-since: <date>
HTTP response
HTTP/1.0 200 OK
object modified
<data>
FTP: the file transfer protocol
FTP FTP user client interface local file system
file transfer
FTP server
remote file system
user at host
transfer file to/from remote host client/server model
remote) server: remote host ftp: RFC 959 ftp server: port 21
client: side that initiates transfer (either to/from
FTP: separate control, data connections
FTP client contacts FTP server
TCP control connection port 21
at port 21, TCP is transport protocol TCP data connection FTP FTP port 20 client authorized over control client server connection client browses remote server opens another TCP directory by sending commands data connection to transfer over control connection. another file. when server receives file control connection: out of transfer command, server band opens 2nd TCP connection (for FTP server maintains state: file) to client current directory, earlier after transferring one file, authentication server closes data connection.
FTP commands, responses
Sample commands:
sent as ASCII text over
Sample return codes
status code and phrase (as
control channel USER username PASS password
LIST return list of file in
current directory (gets) file
RETR filename retrieves
STOR filename stores
(puts) file onto remote host
in HTTP) 331 Username OK, password required 125 data connection already open; transfer starting 425 Cant open data connection 452 Error writing file