
1/31/2008

Prof. Reuven Aviv
Tel Hai Academic College
Department of Computer Science
Topics in Data Communication

World Wide Web


Acknowledgements for slides: A. Tanenbaum, Computer Networks

The World Wide Web
Architectural Overview
Static Web Documents
Dynamic Web Documents
HTTP: The HyperText Transfer Protocol
Content Delivery Networks


Architectural Overview

Client with a Web browser program
Server with a Web server and pages (HTML)
Other servers with Web servers and pages
Links between pages


Browser Operation when the User Clicks on a Link
The browser (B) picks the URL from the clicked link
B gets the IP address of the Web server from DNS
B opens a TCP connection to (IP, port 80)
B sends a request for the page (HTTP packet)
The Web server (W.S.) sends the linked page (HTTP packet)
  The page is in the HTML language
B closes the TCP connection
B interprets the HTML and displays the page to the user
B fetches and presents the images linked to the file
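The steps above can be sketched in Python. This is only an illustration under stated assumptions: the function names build_get_request and fetch are hypothetical, only plain HTTP is handled, and fetch needs real network access, so it is shown but not exercised here.

```python
import socket
from urllib.parse import urlsplit


def build_get_request(url):
    """Return (host, port, request bytes) for a plain-HTTP GET of `url`."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80                 # default HTTP port
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host_header = host if port == 80 else f"{host}:{port}"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"             # ask the server to close when done
        "\r\n"
    ).encode("ascii")
    return host, port, request


def fetch(url):
    """Run the browser steps over a real socket and return the raw reply."""
    host, port, request = build_get_request(url)
    ip = socket.gethostbyname(host)                         # DNS lookup
    with socket.create_connection((ip, port), timeout=10) as conn:  # TCP connection
        conn.sendall(request)                               # HTTP request
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:                                    # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)                                 # status line, headers, page
```

Interpreting the HTML and fetching linked images would follow as further requests of the same kind.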

The Client Side


Non-HTML content in a page: PDF, GIF, JPEG, MP3, MPEG, ...
Plug-ins: code installed as an extension to the browser
  The plug-in code uses browser functions and vice versa, e.g. the browser supplies the data to the plug-in
Helper applications: invoked by B as a separate process

Plug-in

Helper Application


Server Side
Accepts a TCP connection
Gets the name of the requested file (HTTP packet)
Gets the file (local disk)
Sends back the file (HTTP packets)
Releases the TCP connection
To improve performance:
  Maintain a cache of files
  Multithreading

Multi-threaded Web Server


The front-end thread accepts a request and builds a record
It passes the record to a Working Thread (WT)
All threads share memory, including the cache
If the page is not in the cache, the WT initiates a disk read
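A minimal sketch of this design, assuming a queue between the front end and the working threads. The names DISK, pending, working_thread and front_end are hypothetical, and the "disk" is a dictionary so that the example runs without sockets or files.

```python
import queue
import threading

# Hypothetical stand-in for the server's local disk.
DISK = {"/index.html": b"<html>hello</html>"}

cache = {}                      # page cache shared by all threads
cache_lock = threading.Lock()   # protects the shared cache
pending = queue.Queue()         # records passed from the front end to the workers


def working_thread(replies):
    """Take records off the queue, checking the cache before the disk."""
    while True:
        record = pending.get()
        if record is None:              # shutdown marker from the front end
            return
        path = record["path"]
        with cache_lock:
            page = cache.get(path)
        if page is None:                # cache miss: simulated disk read
            page = DISK.get(path)
            if page is not None:
                with cache_lock:
                    cache[path] = page
        replies.put((record["client"], page))


def front_end(incoming, n_workers=3):
    """Build one record per request (a list of (client, path) pairs)."""
    replies = queue.Queue()
    workers = [threading.Thread(target=working_thread, args=(replies,))
               for _ in range(n_workers)]
    for worker in workers:
        worker.start()
    for client, path in incoming:
        pending.put({"client": client, "path": path})
    for _ in workers:
        pending.put(None)               # one shutdown marker per worker
    for worker in workers:
        worker.join()
    return dict(replies.get() for _ in incoming)
```

Because the cache lives in shared memory, a page read from disk by one working thread is available to all the others.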


Tasks of a Working Thread
Resolve the name of the file
Authenticate the client (another lecture)
Perform access control on the client
Check the cache
Fetch the file from disk
Determine the MIME type of the file
  This will be sent to the client
Send the reply to the client
  Construct HTTP packet(s)
Write in the Web server log
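Two of these tasks, determining the MIME type and constructing the reply, might look as follows. This is a sketch using Python's standard mimetypes module; the function name build_reply and the fallback type for unknown files are assumptions, not something the slides prescribe.

```python
import mimetypes


def build_reply(filename, body):
    """Return an HTTP/1.0 reply whose Content-Type is guessed from the file name."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        mime_type = "application/octet-stream"   # assumed fallback for unknown types
    header = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {mime_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return header.encode("ascii") + body
```

The Content-Type header is what tells the browser whether to render the body itself or hand it to a plug-in or helper application.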

What if the CPU can't handle the load?

Server Farm on a LAN

Problems
Each Processing Node (P.N.) has its own cache
  P.N.s specialize in certain files
Both requests and replies pass via the Front-end

Solution?


TCP Handoff
The Front-end passes the TCP endpoint (IP, port) to the Processing Node
The Processing Node sends the page to the Client

Figure: normal request-reply sequence compared with TCP Handoff

URLs: Uniform Resource Locators


URL provides answers to what?
What is the name of the page?
What is the location of the page?
How to access the page (which protocol)?
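As a small illustration, Python's urllib.parse.urlsplit separates a URL into exactly these three answers. The function name describe_url and the example URL used in the usage note are illustrative assumptions.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into protocol, location (host) and page name (path)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,     # how to access the page
        "location": parts.hostname,   # where the page is
        "name": parts.path,           # what the page is called
    }
```

For example, describe_url("http://www.example.com/video/index.html") yields the protocol http, the location www.example.com and the name /video/index.html.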


Statelessness and Cookies


HTTP is request/reply; stateless
But the server needs:
  to recognize users (registered? adapt home page)
  to keep track of visited items (shopping cart)
Cookies (small text files) keep that info
  Stored at the Client: C:\Documents and Settings\aviv\Cookies
  Identified by the domain name of the sending server

Cookies: Structure
Domain: where the cookie came from
Path: root of the file tree related to the cookie
Content: variableName=value pairs. Anything
Expires: if set, the cookie is kept (persistent cookie)
Secure: if set, the cookie is sent only to a secure server
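The fields above map onto the attributes of a Set-Cookie header. The sketch below uses Python's standard http.cookies module; the helper names make_set_cookie and read_cookie, and the cookie values in the usage note, are hypothetical.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value, domain, path="/", expires=None, secure=False):
    """Build the value of a Set-Cookie header from the cookie fields."""
    cookie = SimpleCookie()
    cookie[name] = value                      # Content: name=value
    cookie[name]["domain"] = domain           # Domain
    cookie[name]["path"] = path               # Path
    if expires is not None:
        cookie[name]["expires"] = expires     # Expires: makes the cookie persistent
    if secure:
        cookie[name]["secure"] = True         # Secure: only sent to a secure server
    return cookie[name].OutputString()


def read_cookie(cookie_header, name):
    """Return the value of `name` from a Cookie request header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return cookie[name].value if name in cookie else None
```

A call such as make_set_cookie("CustomerID", "497793521", "toms-casino.com", secure=True) produces a header value containing the name=value pair together with the Domain, Path and Secure attributes.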

Usages?


Using cookies

Casino server chooses which gambling option it presents
Store server puts the items in the cart in the cookie
Web portal server presents stock prices and sports results
sneaky.com records visits of UserID to certain pages
  Pages include ads/banners/small pictures
  The user is not aware that the browser visited sneaky.com
  A user profile is built, maybe with name/password

HTML: Hypertext Markup Language


HTML HyperText Markup Language

Text with markup instructions (formatting, links, ...)
Instructions come as pairs of tags, e.g. <h2> </h2>

Figure (b): the formatted page, as presented by the browser


Some HTML Tags

HTML Table


HTML Input: Forms

Browser presents a web page with a form
User fills in the form
Browser stores the user inputs in variables
Browser sends the information via HTTP

HTML Input: Web page with a Form


Browser Response
A possible response from the browser to the server, with the information filled in by the user: a string of name=value pairs.

The server passes the string to a back-end script for processing (e.g. a Perl script)
The script writes to the DB and might create a new page
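A hedged sketch of both ends of this exchange, using Python's urllib.parse. The function names encode_form and decode_form and the field names in the usage note are hypothetical; the back-end script here is Python rather than Perl.

```python
from urllib.parse import parse_qs, urlencode


def encode_form(fields):
    """Browser side: turn the form variables into one name=value string."""
    return urlencode(fields)


def decode_form(query_string):
    """Script side: recover the variables (first value of each name)."""
    return {name: values[0] for name, values in parse_qs(query_string).items()}
```

For instance, encode_form({"customer": "John Doe", "age": "24"}) gives customer=John+Doe&age=24, with the space encoded as a plus sign, and decode_form reverses it.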

Automatic Processing of Web Pages


Need to process HTML web pages by programs
  E.g. find a book that was published after 2000
A program searches page(s) that have no structure
  Hard for the program to understand whether 2000 is a year or a price
Idea: build documents (pages) with a structure that will be useful for programs
Describe a document in the XML language, which defines named structures and sub-structures
XML: eXtensible Markup Language


A simple Web page in XML

Hierarchical structure
We define a structure, named book_list
book_list: a list of three structures named book
book: three fields, each with a name and a value

A simple Web page in XML


A program can search for book_list.book.date >= 2002
How will a browser present this page to a user?
Need a processor that creates, from the XML document, an HTML page with formatting tags
Instructions for the processor are in another file
  Written in the eXtensible Style Language (XSL)
  Referenced in the XML file (at the top)
Browsers include an XML/XSL processor and do this automatically on given XML/XSL files
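The kind of search described above becomes straightforward once the page has structure. The sketch below uses Python's xml.etree.ElementTree; the book entries are placeholder data, and the field names title, author and date are assumptions chosen to match the book_list.book.date query.

```python
import xml.etree.ElementTree as ET

# Placeholder document with the book_list / book structure (made-up entries).
BOOK_LIST_XML = """<book_list>
  <book><title>Title A</title><author>Author A</author><date>1999</date></book>
  <book><title>Title B</title><author>Author B</author><date>2002</date></book>
  <book><title>Title C</title><author>Author C</author><date>2005</date></book>
</book_list>"""


def titles_published_since(xml_text, year):
    """Return the titles of all books whose date field is >= year."""
    root = ET.fromstring(xml_text)            # the book_list element
    return [
        book.findtext("title")
        for book in root.findall("book")
        if int(book.findtext("date")) >= year
    ]
```

The program never has to guess whether a number is a year or a price, because the enclosing tag names the field.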


eXtensible Style Language

XSL

Pure html

XSL language program

Server Side Dynamic Pages: CGI Script


Dynamic Web Documents

Server Side Dynamic Pages: Embedded PHP
The Web server calls the PHP interpreter before test.php is downloaded
The Web server maintains info about the browser (OS type, ...) in the variable HTTP_USER_AGENT
PHP rewrites the page, inserting the value of HTTP_USER_AGENT
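The same idea can be imitated in Python, which stands in for PHP here purely for illustration. The template text and the render function are hypothetical; the point is only that the server fills in the browser information before the page leaves the server.

```python
import html
from string import Template

# Hypothetical page template; $HTTP_USER_AGENT marks the spot the server fills in.
PAGE = Template(
    "<html><body>\n"
    "<h2>This is what I know about you</h2>\n"
    "$HTTP_USER_AGENT\n"
    "</body></html>"
)


def render(request_headers):
    """Rewrite the page on the server, inserting the browser's User-Agent."""
    agent = request_headers.get("User-Agent", "unknown")
    # Escape the value so header content cannot inject markup into the page.
    return PAGE.substitute(HTTP_USER_AGENT=html.escape(agent))
```

The browser receives ordinary HTML and cannot tell that part of it was generated per request.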


Web page with a form
PHP script processing the form data
User input: Barbara, 24
Output from the PHP script: an HTML page

Client-Side Dynamic Pages: Embedded Javascript


Server Side & Client Side Dynamic Pages

Client Side is faster. Used for local interaction with User

JavaScript is a full-blown language


Various ways to create and Display Content

Embedded Java applets
Downloadable ActiveX controls

HTTP Protocol


HTTP Protocol (1)


Versions 1.0, 1.1 (RFC 2616)
Request - Response
Uses TCP (port 80 on the server side)
Persistent connection (HTTP 1.1)
Request: ASCII
Response: RFC 822, MIME-like
A general protocol for object-oriented applications
  Accessing functionality of remote objects
  Many, but not all, methods are Web specific
  E.g. GET Object (not necessarily a file)

HTTP Protocol (2)


Transaction-oriented client/server protocol
  Between Web browser (client) and Web server
Stateless
  Each transaction is treated independently
Flexible format handling
  The client may specify supported formats


Examples of HTTP Operation


Direct connection

Via Intermediary system(s)

Caching

Intermediary systems 1: Proxy process


Usage: clients within an organization must authenticate an external Web server
The proxy sits on the client side of the firewall (FW)
  a. The proxy authenticates the server (e.g. password, certificate)
  b. Replies carry authentication data, e.g. an SSL header (encrypted hash of the message)
The proxy sends requests to the server and replies to the clients
  Acts as a client when interacting with the server
  Acts as a server when interacting with clients


Types of Intermediate HTTP Systems

Intermediary systems 2: Gateway process


1: A server inside the organization must authenticate an external client
  The gateway (GW) sits on the server side of the firewall
  a. GW authenticates the client (e.g. password, certificate)
  b. Requests carry authentication data, e.g. an SSL header (encrypted hash of the message)
2: A client connects to a non-HTTP server (e.g. FTP)
  The client sends HTTP requests; GW translates


Intermediary systems 3: Tunnel


A tunnel performs no operation on HTTP messages
Used if an intermediary is required for the connection but understanding HTTP is not required
  E.g. initial authentication of client and/or server
  After that, messages are retransmitted unchanged

HTTP Operation - Caches


Caching can be done by a client, a server or an intermediary system
A cache stores previous requests/responses
It may return a stored response to subsequent requests
Not all requests can be cached
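A minimal sketch of such a cache, placed in front of some origin server. The class name ResponseCache and the rule "cache only successful GET responses" are simplifying assumptions; real HTTP caches also honour expiry and validation headers.

```python
class ResponseCache:
    """Store GET responses and replay them for later identical requests."""

    def __init__(self, origin):
        self.origin = origin      # callable (method, url) -> (status, body)
        self.store = {}           # url -> stored response
        self.hits = 0

    def request(self, method, url):
        if method == "GET" and url in self.store:
            self.hits += 1
            return self.store[url]            # answered from the cache
        response = self.origin(method, url)   # forwarded to the origin server
        if method == "GET" and response[0] == 200:
            self.store[url] = response        # only successful GETs are cached
        return response
```

Requests with other methods, such as POST, always reach the origin, which illustrates the point that not all requests can be cached.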


HTTP Messages

General Structure of HTTP message


Request Line: Method (e.g. GET), Resource (filename), HTTP Version
Response Status Line: HTTP Version; Status Code (e.g. 200); Reason (e.g. OK)
Headers
  General: Date, Upgrade (to a better version)
  Request: Host, Accept-Charset, ...
  Response: Server (software), Accept-Ranges (willing to take partial-page requests, with the range expressed in bytes)
  Entity: Content-Type, Last-Modified, ...
Entity Body: data (e.g. an HTML page)


Request and Reply


GET /rfc.html HTTP/1.1                          // Request Line
Host: www.ietf.org                              // Request Header

HTTP/1.1 200 OK                                 // Status Line
Date: Wed, 08 May 2002 22:54:22 GMT             // General Hdr
Server: Apache/1.3.20 (Unix) mod_ssl/2.8.4      // Response Hdr
Last-Modified: Mon, 11 Sep 2000 13:56:29 GMT    // Entity Headers
ETag: 2a79d-c8b-39bce48d                        // page id, used in caching
Accept-Ranges: bytes                            // express range in bytes
Content-length: 3211
Content-Type: text/html
X-pad: avoid browser bug                        // non standard field
<html> ..
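A reply like the one above can be taken apart with a few lines of Python. This is a sketch rather than a full parser: the function name parse_response is hypothetical, header names are lower-cased for lookup, and repeated or folded headers are not handled.

```python
def parse_response(raw):
    """Split a raw HTTP reply into status line fields, headers and entity body."""
    head, _, body = raw.partition(b"\r\n\r\n")      # blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = lines[0].split(" ", 2)  # status line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")        # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body
```

Splitting at the first colon matters because header values such as dates contain colons themselves.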

Conditional GET (1)
GET /fruit/kiwi.gif HTTP/1.0
User-agent: Mozilla/4.0

HTTP/1.0 200 OK
Date: Wed, 12 Aug 1998 15:39:29
Server: Apache/1.3.0 (Unix)
Last-Modified: Mon, 22 June 1998 09:23:24
Content-Type: image/gif
(data)


Conditional GET (2)


One week later
GET /fruit/kiwi.gif HTTP/1.0
User-agent: Mozilla/4.0
If-Modified-Since: Mon, 22 June 1998 09:23:24

HTTP/1.0 304 Not Modified
Date: Wed, 19 Aug 1998 15:39:29
Server: Apache/1.3.0 (Unix)
(empty entity body)
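The server-side decision behind this exchange can be sketched as follows. The function name conditional_get is hypothetical, and treating a date without a time zone as GMT is an assumption made so that dates written as in the example above still compare cleanly.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def conditional_get(request_headers, last_modified, body):
    """Return (status, entity body) for a GET that may carry If-Modified-Since.

    `last_modified` is a timezone-aware datetime for the stored file.
    """
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            since_dt = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            since_dt = None                       # unparsable date: ignore the header
        if since_dt is not None:
            if since_dt.tzinfo is None:           # assumption: missing zone means GMT
                since_dt = since_dt.replace(tzinfo=timezone.utc)
            if last_modified <= since_dt:
                return 304, b""                   # Not Modified, empty entity body
    return 200, body                              # send the full object
```

The client keeps using its cached copy on a 304, so only headers cross the network.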

HTTP 1.1 Methods


Response Status Codes

Request Headers
User-Agent: info about the browser (OS)
Accept: type of pages the client can handle
Host: the server DNS name
Authorization: client credentials (e.g. password)
Cookie #: cookie that was received before
Date
Upgrade: suggest a switch to another version

Accept-Charset; Accept-Encoding; Accept-Language


Response Headers
Server: info about the server
Content-Encoding; Content-Length; Content-Language; Content-Type (MIME type)
Last-Modified
Location: commanding the client to go elsewhere
Accept-Ranges: the server will accept byte-range requests
Set-Cookie #: please save the attached cookie with number #
Date
Upgrade

Entity Body
The entity body is an arbitrary sequence of octets
HTTP can transfer any type of data, including text, binary data, audio, images, video
The data is the content of the resource identified by the URL
Interpretation of the data is determined by header fields:
  Content-Type: defines the data interpretation
  Content-Encoding: applied to the data
  Transfer-Encoding: used to form the entity body


More Header Fields


Forwarded: gateways and proxies add this header with their URL
Connection: close, keep-alive, ... (special instructions)
Keep-Alive: if keep-alive was set in Connection, it indicates the maximum time the sender will keep the connection open waiting for the next request, or the maximum number of additional requests that will be allowed on the current persistent connection
Pragma: implementation-specific info relevant to any recipient along the way
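In practice the Keep-Alive header commonly carries these two limits as timeout and max parameters, for example Keep-Alive: timeout=5, max=100. The sketch below parses that form; the function name parse_keep_alive is hypothetical and parameters with non-numeric values are simply skipped.

```python
def parse_keep_alive(value):
    """Parse a Keep-Alive header value such as 'timeout=5, max=100'."""
    params = {}
    for item in value.split(","):
        name, sep, number = item.strip().partition("=")
        if sep and number.strip().isdigit():
            params[name.strip().lower()] = int(number)
    return params
```

Here timeout is the idle time in seconds the sender will wait for the next request, and max is the number of further requests allowed on the connection.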

HTTP Messages BNF Format


HTTP-Message = Simple-Request | Simple-Response | Full-Request | Full-Response
Full-Request = Request-Line *( General-Header | Request-Header | Entity-Header ) CRLF [ Entity-Body ]
Full-Response = Status-Line *( General-Header | Response-Header | Entity-Header ) CRLF [ Entity-Body ]
Simple-Request = "GET" SP Request-URL CRLF
Simple-Response = [ Entity-Body ]


Content Delivery Networks

Content Delivery Networks (1)


A Content Provider (CP) has a main page with links to many content items (pictures, music, video, newspapers)
A CDN company (e.g. Akamai) contracts with the Content Provider to deliver the content from its CDN content-servers
The CDN also contracts with many, O(10K), ISPs to put CDN content-servers with the content on the ISP networks
The CDN redirects the links in the main page of the CP to the CDN main server (changing the href)


Example: The Furry Video Content Provider


Original Web Page Of Content Provider

Web Page Of Content Provider With redirections

Example (Contd)
User types www.furryvideo.com and gets to the main page of the Content Provider FurryVideo
User clicks on a content item
Client sends a request to cdn-server.com
cdn-server identifies (from the file name) which object is required and, from the IP address of the user, which CDN content-server is closest to the Client
cdn-server sends a response to the Client with status code 301 and a Location header, giving the file's URL on a content server close to the Client
Client connects to the CDN content-server
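The redirect step might be sketched like this. Everything concrete here is an assumption for illustration: the table of address ranges, the content-server names, the file path in the usage note, and the idea that "closest" is decided by matching the client's IP address against a network prefix.

```python
import ipaddress

# Hypothetical map from client address ranges to nearby content servers.
CONTENT_SERVERS = [
    (ipaddress.ip_network("192.0.2.0/24"), "cdn-a.example"),
    (ipaddress.ip_network("198.51.100.0/24"), "cdn-b.example"),
]
DEFAULT_SERVER = "cdn-default.example"


def redirect(path, client_ip):
    """Build the 301 reply that sends the client to a nearby content server."""
    address = ipaddress.ip_address(client_ip)
    server = next(
        (name for network, name in CONTENT_SERVERS if address in network),
        DEFAULT_SERVER,                      # no matching range: fall back
    )
    return (
        "HTTP/1.1 301 Moved Permanently\r\n"
        f"Location: http://{server}{path}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )
```

A request for /furryvideo/bears.mpg from 192.0.2.7 would be answered with a Location header pointing at cdn-a.example, and the browser then repeats the GET there.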
