
WWW
- 1989-1990: Tim Berners-Lee invents the World Wide Web at CERN
- Designed for transferring text and graphics simultaneously
- Client/server data transfer protocol
  - Communication via an application-level protocol
  - System ran on top of standard networking infrastructure
- Simple and easy to use
- Requires a client application to render text/graphics
- Uses a text markup language

WWW Components
- Structural components
  - Clients/browsers: two dominant implementations
  - Servers: run on sophisticated hardware
  - Caches: many interesting implementations
  - Internet: the global infrastructure which facilitates data transfer
- Semantic components
  - Hyper Text Transfer Protocol (HTTP)
  - Hyper Text Markup Language (HTML)
  - eXtensible Markup Language (XML)
  - Uniform Resource Identifiers (URIs)

WWW Structure
- Clients use a browser application to send URIs via HTTP to servers, requesting a Web page
- Web pages are constructed using HTML (or another markup language) and consist of text, graphics, and sounds, plus embedded files
- Servers (or caches) respond with the requested Web page, or with an error message
- The client's browser renders the Web page returned by the server
  - The page is written using Hyper Text Markup Language (HTML)
  - Displays text, graphics, and sound in the browser
  - Can write data as well
- The entire system runs over standard networking protocols (TCP/IP, DNS, ...)

WWW Structure (cont..)

Architecture of WWW

Uniform Resource Identifiers
- A Uniform Resource Identifier (URI) is a string of characters used to identify a name or a resource on the Internet
- URIs come in two forms:
  1) URL: Uniform Resource Locator
  2) URN: Uniform Resource Name
- By analogy, a URL is like a street address, while a URN is like a person's name
- URL: tells us about the resource together with its location on the Web
- URN: identifies the resource without telling us anything about its location
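The split between scheme, location, and the rest of the name can be seen by pulling a URL apart; the sketch below uses Python's standard `urllib.parse` module on a made-up URL:

```python
from urllib.parse import urlsplit

# Split a URL (one kind of URI) into its standard components.
# The URL itself is hypothetical, chosen only to exercise every field.
parts = urlsplit("http://www.example.com:8080/wiki/URI?q=dns#intro")

print(parts.scheme)    # "http"
print(parts.netloc)    # "www.example.com:8080"
print(parts.path)      # "/wiki/URI"
print(parts.query)     # "q=dns"
print(parts.fragment)  # "intro"
```

Note that the scheme and network location pin down *where* the resource lives, which is exactly the "locator" part that a URN deliberately omits.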

HTTP Basics
- HTTP was invented by Tim Berners-Lee (Ted Nelson coined the term "hypertext")
- A protocol for distributed, collaborative systems
- Used in the client/server architecture
- The WWW is built on the foundation of HTTP
- Uses port number 80 for communication
- HTTP can work over both connection-oriented and connectionless protocols

HTTP Request Methods
- GET: retrieve the document specified by the URL
- PUT: store the specified document under the given URL
- HEAD: retrieve information about the document specified by the URL
- OPTIONS: retrieve information about available options
- POST: give information to the server
- DELETE: remove the document specified by the URL
- TRACE: loop back the request message
- CONNECT: for use by caches

HTTP Request Messages

HTTP Request Format
- request-line (request request-URI HTTP-version)
- headers (0 or more)
- <blank line>
- body (only for POST requests)

- First type of HTTP message: requests
- Client browsers construct and send request messages
- Typical HTTP request:
  GET http://www.google.com/index.html HTTP/1.0
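The request format above (request-line, headers, blank line, optional body) can be sketched as a small message builder; the function name and example values are illustrative, not any library's API:

```python
def build_request(method, uri, version="HTTP/1.0", headers=None, body=""):
    """Assemble an HTTP request: request-line, headers, blank line, body."""
    lines = [f"{method} {uri} {version}"]        # request-line
    for name, value in (headers or {}).items():  # 0 or more headers
        lines.append(f"{name}: {value}")
    lines.append("")                             # blank line ends the headers
    lines.append(body)                           # empty for bodiless methods
    return "\r\n".join(lines)

msg = build_request("GET", "http://www.google.com/index.html",
                    headers={"Host": "www.google.com"})
print(msg.splitlines()[0])  # GET http://www.google.com/index.html HTTP/1.0
```

Sending this string over a TCP connection to port 80 is all a minimal client has to do.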

HTTP Response Format
- status-line (HTTP-version response-code response-phrase)
- headers (0 or more)
- <blank line>
- body

- Second type of HTTP message: responses
- Web servers construct and send response messages
- Typical HTTP response:
  HTTP/1.0 301 Moved Permanently
  Location: http://www.google.co.in/cs/index.html
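A response in this format can be pulled apart mechanically; the parser below is a minimal sketch (no error handling), run against the example response above:

```python
def parse_response(raw):
    """Split a raw HTTP response into (version, code, phrase, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")     # blank line ends the headers
    lines = head.split("\r\n")
    version, code, phrase = lines[0].split(" ", 2)  # status-line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(": ")
        headers[name] = value
    return version, int(code), phrase, headers, body

resp = ("HTTP/1.0 301 Moved Permanently\r\n"
        "Location: http://www.google.co.in/cs/index.html\r\n"
        "\r\n")
version, code, phrase, headers, body = parse_response(resp)
print(code, phrase)         # 301 Moved Permanently
print(headers["Location"])  # http://www.google.co.in/cs/index.html
```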

HTTP Response Codes
- 1xx Informational: request received, processing continues
- 2xx Success: action received, understood, and accepted
- 3xx Redirection: further action necessary
- 4xx Client Error: bad syntax, or request cannot be fulfilled
- 5xx Server Error: server failed

URL Examples
- www.howstuffworks.com
- www.en.wikipedia.org/wiki/.edu
- www.india.org
- www.dtic.mil
- www.india.gov.in
- www.lpu.in

How DNS Works?

Important Concepts for DNS
- IP addressing: global addressing, hierarchical, CIDR
- IP forwarding tables: alternatives, lookup
- IP service: best effort, simplicity of routers
- IP packets: header fields, fragmentation, ICMP
- IP routers: architecture, common-case processing, complex/expensive lookup algorithms

Naming
- How do we efficiently locate resources?
- DNS: name → IP address
- Challenge: how do we scale this to the wide area?

Obvious Solutions
- Why not centralize DNS?
  - Single point of failure
  - Traffic volume
  - Distant centralized database
  - Single point of update
  - Doesn't scale!

Obvious Solutions (2)
- Why not use /etc/hosts?
  - The number of hosts keeps increasing: from one machine per domain to one machine per user
  - Many more downloads
  - Many more updates

Domain Name System Goals
- Basically a wide-area distributed database
- Scalability
- Decentralized maintenance
- Robustness
- Global scope: names mean the same thing everywhere
- Don't need: atomicity, strong consistency

DNS Design: Hierarchy Structure

DNS Design: Zone & Domain
- Zone = contiguous section of the name space
- Domain = subtree of the name space; responsibility is divided into sub-parts
[Figure: name hierarchy rooted at root, with top-level domains edu, com, org, net, uk, ca; second-level domains such as gwu, ucb, cmu, bu, mit; subdomains such as cs, cmcl, ece; a zone may cover a subtree, a single node, or the complete tree]

Servers/Resolvers
- Each host has a resolver
  - Typically a library that applications can link to
  - Local name servers are hand-configured (e.g. /etc/resolv.conf)
- Name servers
  - Responsible for some zone
- Local servers
  - Do lookups of distant host names for local hosts
  - Typically answer queries about the local zone

Typical Resolution
[Figure: to resolve www.en.wiki.edu, the client asks its local DNS server, which in turn queries the root & edu DNS server, then the ns1.wiki.edu DNS server, then the ns1.en.wiki.edu DNS server]

Lookup Methods
- Recursive query:
  - The server goes out and searches for more information on the client's behalf
  - It returns only the final answer, or "not found"
- Iterative query:
  - The server responds with as much as it knows: "I don't know this name, but ask this server"
- Workload impact on the choice?
  - The local server typically does recursive lookups
  - Root/distant servers do iterative lookups
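Iterative resolution can be sketched as a resolver following referrals through a toy set of name servers; all server names, delegations, and addresses below are made up:

```python
# Toy model of iterative resolution: each "server" either knows the final
# answer or refers the resolver to a more specific name server.
# Every name, delegation, and address here is hypothetical.
SERVERS = {
    "root":            {"refer": {"edu": "edu-server"}},
    "edu-server":      {"refer": {"wiki.edu": "ns1.wiki.edu"}},
    "ns1.wiki.edu":    {"refer": {"en.wiki.edu": "ns1.en.wiki.edu"}},
    "ns1.en.wiki.edu": {"answer": {"www.en.wiki.edu": "10.0.0.7"}},
}

def resolve_iteratively(name, server="root"):
    """Follow referrals, starting at the root, until a server answers."""
    while True:
        zone = SERVERS[server]
        if name in zone.get("answer", {}):
            return zone["answer"][name]          # final answer
        # Otherwise follow the referral whose suffix matches the name.
        for suffix, next_server in zone.get("refer", {}).items():
            if name.endswith(suffix):
                server = next_server
                break
        else:
            return None                          # no referral matches

print(resolve_iteratively("www.en.wiki.edu"))  # 10.0.0.7
```

In a recursive lookup the same chain of queries happens, but the local server performs it on the client's behalf and hands back only the final answer.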

Workload and Caching
- Are all servers/names likely to be equally popular?
  - Why might this be a problem? How can we solve it?
- DNS responses are cached
  - Quick response for repeated translations
  - Other queries may reuse some parts of a lookup (e.g. NS records for domains)
- DNS negative responses are cached
  - Don't have to repeat past mistakes (e.g. misspellings, search strings in resolv.conf)
- Cached data periodically times out
  - The lifetime (TTL) of the data is controlled by the owner of the data
  - The TTL is passed with every record
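The TTL behavior above can be sketched as a small cache whose entries expire after the lifetime supplied with each record; the class and the injected clock are illustrative, not a real resolver's implementation:

```python
import time

class TTLCache:
    """Minimal sketch of a DNS-style cache: each record expires after the
    TTL stored with it. The clock is injectable so tests are deterministic."""
    def __init__(self, clock=time.time):
        self._store = {}
        self._clock = clock

    def put(self, name, record, ttl):
        # The record's owner controls the TTL; we store the expiry time.
        self._store[name] = (record, self._clock() + ttl)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None
        record, expires = entry
        if self._clock() >= expires:
            del self._store[name]   # timed out: drop the stale record
            return None
        return record

# Fake clock so the example is deterministic.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("www.example.com", "10.0.0.7", ttl=300)
print(cache.get("www.example.com"))  # 10.0.0.7
now[0] = 301.0
print(cache.get("www.example.com"))  # None (TTL expired)
```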

Reliability
- DNS servers are replicated
  - Name service is available if at least one replica is up
  - Queries can be load-balanced between replicas
- UDP is used for queries
  - Reliability is needed, so it must be implemented on top of UDP!
  - Why not just use TCP?
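Because UDP gives no delivery guarantee, the resolver itself has to retry lost queries. A minimal sketch of that retry loop, using a stand-in (non-network) transport function so the example is self-contained:

```python
def query_with_retry(send_query, attempts=3):
    """UDP provides no delivery guarantee, so the resolver resends the
    query a few times. send_query is any callable that either returns an
    answer or raises TimeoutError (here a stand-in for a real UDP send)."""
    for _ in range(attempts):
        try:
            return send_query()
        except TimeoutError:
            continue                # lost or dropped: resend the query
    return None                     # give up after the last attempt

# Stub transport that "drops" the first two packets.
calls = {"n": 0}
def flaky_send():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError
    return "10.0.0.7"

print(query_with_retry(flaky_send))  # succeeds on the third attempt
```

A real resolver would also grow its timeout between attempts and fall back to another replica; TCP would give reliability for free but at the cost of connection setup for what is usually a single small query and response.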

Reverse DNS

Crawling
- Web crawling is the process of locating, fetching, and storing the pages on the Web
- The computer programs that perform this task are referred to as web crawlers or spiders
- A typical web crawler starts from a set of seed pages, locates new pages by parsing the downloaded seed pages, extracts the hyperlinks within, stores the extracted links in a fetch queue for retrieval, and continues downloading until the fetch queue is empty or a satisfactory number of pages has been downloaded
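The fetch-queue loop described above can be sketched against an in-memory stand-in for the Web; all page names and links below are made up:

```python
from collections import deque

# In-memory stand-in for the Web: page -> list of hyperlinks it contains.
WEB = {
    "seed.html": ["a.html", "b.html"],
    "a.html":    ["b.html", "c.html"],
    "b.html":    [],
    "c.html":    ["seed.html"],   # a link cycle the crawler must survive
}

def crawl(seeds, max_pages=10):
    """Fetch-queue crawl: download a page, extract its links, enqueue the
    new ones, and stop when the queue empties or max_pages is reached."""
    queue = deque(seeds)
    seen = set(seeds)             # avoid re-fetching (and link cycles)
    fetched = []
    while queue and len(fetched) < max_pages:
        page = queue.popleft()
        fetched.append(page)                # "download and store" the page
        for link in WEB.get(page, []):      # parse out the hyperlinks
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched

print(crawl(["seed.html"]))  # ['seed.html', 'a.html', 'b.html', 'c.html']
```

A real crawler replaces the dictionary lookup with an HTTP GET and HTML parsing, and adds politeness delays and per-host rate limits, but the queue discipline is the same.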

Crawling (Cont..)

Benefits of WWW
