
WWW
- 1989-1990: Tim Berners-Lee invents the World Wide Web at CERN
- Designed for transferring text and graphics simultaneously
- Client/server data transfer protocol
  - Communication via an application-level protocol
  - System ran on top of standard networking infrastructure
- Simple and easy to use
- Requires a client application to render text/graphics
- Uses a text markup language

WWW Components
- Structural components
  - Clients/browsers: two dominant implementations
  - Servers: run on sophisticated hardware
  - Caches: many interesting implementations
  - Internet: the global infrastructure which facilitates data transfer
- Semantic components
  - Hyper Text Transfer Protocol (HTTP)
  - Hyper Text Markup Language (HTML)
  - eXtensible Markup Language (XML)
  - Uniform Resource Identifiers (URIs)

WWW Structure
- Clients use a browser application to send URIs via HTTP to servers, requesting a Web page
- Web pages are constructed using HTML (or another markup language) and consist of text, graphics, and sounds, plus embedded files
- Servers (or caches) respond with the requested Web page, or with an error message
- The client's browser renders the Web page returned by the server
  - The page is written using Hyper Text Markup Language (HTML)
  - Displays text, graphics, and sound in the browser
  - Can write data as well
- The entire system runs over standard networking protocols (TCP/IP, DNS, ...)

WWW Structure (cont..)

Architecture of WWW

Uniform Resource Identifiers
- A Uniform Resource Identifier (URI) is a string of characters used to identify a name or a resource on the Internet
- URIs come in two forms:
  1) URL: Uniform Resource Locator
  2) URN: Uniform Resource Name
- By analogy, a URL is like a street address, while a URN is like a person's name
- URL: tells us about the resource together with its location on the Web
- URN: identifies the resource without telling us anything about its location
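The split between scheme, location, and the rest of the name can be seen by pulling a URL apart; the sketch below uses Python's standard `urllib.parse` module on a made-up URL:

```python
from urllib.parse import urlsplit

# Split a URL (one kind of URI) into its standard components.
# The URL itself is hypothetical, chosen only to exercise every field.
parts = urlsplit("http://www.example.com:8080/wiki/URI?q=dns#intro")

print(parts.scheme)    # "http"
print(parts.netloc)    # "www.example.com:8080"
print(parts.path)      # "/wiki/URI"
print(parts.query)     # "q=dns"
print(parts.fragment)  # "intro"
```

Note that the scheme and network location pin down *where* the resource lives, which is exactly the "locator" part that a URN deliberately omits.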

HTTP Basics
- HTTP was invented by Tim Berners-Lee (Ted Nelson coined the term "hypertext")
- A protocol for distributed, collaborative systems
- Used in the client/server architecture
- The WWW is built on the foundation of HTTP
- Uses port number 80 for communication
- HTTP can work over both connection-oriented and connectionless protocols

HTTP Request Methods
- GET: retrieve the document specified by the URL
- PUT: store the specified document under the given URL
- HEAD: retrieve information about the document specified by the URL
- OPTIONS: retrieve information about available options
- POST: give information to the server
- DELETE: remove the document specified by the URL
- TRACE: loop back the request message
- CONNECT: for use by caches

HTTP Request Messages

HTTP Request Format
- request-line (request request-URI HTTP-version)
- headers (0 or more)
- <blank line>
- body (only for POST requests)

- First type of HTTP message: requests
- Client browsers construct and send request messages
- Typical HTTP request:
  GET http://www.google.com/index.html HTTP/1.0
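The request format above (request-line, headers, blank line, optional body) can be sketched as a small message builder; the function name and example values are illustrative, not any library's API:

```python
def build_request(method, uri, version="HTTP/1.0", headers=None, body=""):
    """Assemble an HTTP request: request-line, headers, blank line, body."""
    lines = [f"{method} {uri} {version}"]        # request-line
    for name, value in (headers or {}).items():  # 0 or more headers
        lines.append(f"{name}: {value}")
    lines.append("")                             # blank line ends the headers
    lines.append(body)                           # empty for bodiless methods
    return "\r\n".join(lines)

msg = build_request("GET", "http://www.google.com/index.html",
                    headers={"Host": "www.google.com"})
print(msg.splitlines()[0])  # GET http://www.google.com/index.html HTTP/1.0
```

Sending this string over a TCP connection to port 80 is all a minimal client has to do.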

HTTP Response Format
- status-line (HTTP-version response-code response-phrase)
- headers (0 or more)
- <blank line>
- body

- Second type of HTTP message: responses
- Web servers construct and send response messages
- Typical HTTP response:
  HTTP/1.0 301 Moved Permanently
  Location: http://www.google.co.in/cs/index.html
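A response in this format can be pulled apart mechanically; the parser below is a minimal sketch (no error handling), run against the example response above:

```python
def parse_response(raw):
    """Split a raw HTTP response into (version, code, phrase, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")     # blank line ends the headers
    lines = head.split("\r\n")
    version, code, phrase = lines[0].split(" ", 2)  # status-line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(": ")
        headers[name] = value
    return version, int(code), phrase, headers, body

resp = ("HTTP/1.0 301 Moved Permanently\r\n"
        "Location: http://www.google.co.in/cs/index.html\r\n"
        "\r\n")
version, code, phrase, headers, body = parse_response(resp)
print(code, phrase)         # 301 Moved Permanently
print(headers["Location"])  # http://www.google.co.in/cs/index.html
```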

HTTP Response Codes
- 1xx Informational: request received, processing continues
- 2xx Success: action received, understood, and accepted
- 3xx Redirection: further action necessary
- 4xx Client Error: bad syntax, or request cannot be fulfilled
- 5xx Server Error: server failed

URL Examples
- www.howstuffworks.com
- www.en.wikipedia.org/wiki/.edu
- www.india.org
- www.dtic.mil
- www.india.gov.in
- www.lpu.in

How DNS Works?

Important Concepts for DNS
- IP addressing: global addressing, hierarchical, CIDR
- IP forwarding tables: alternatives, lookup
- IP service: best effort, simplicity of routers
- IP packets: header fields, fragmentation, ICMP
- IP routers: architecture, common-case processing, complex/expensive lookup algorithms

Naming
- How do we efficiently locate resources?
- DNS: name → IP address
- Challenge: how do we scale this to the wide area?

Obvious Solutions
- Why not centralize DNS?
  - Single point of failure
  - Traffic volume
  - Distant centralized database
  - Single point of update
  - Doesn't scale!

Obvious Solutions (2)
- Why not use /etc/hosts?
  - The number of hosts keeps increasing: from one machine per domain to one machine per user
  - Many more downloads
  - Many more updates

Domain Name System Goals
- Basically a wide-area distributed database
- Scalability
- Decentralized maintenance
- Robustness
- Global scope: names mean the same thing everywhere
- Don't need: atomicity, strong consistency

DNS Design: Hierarchy Structure

DNS Design: Zone & Domain
- Zone = contiguous section of the name space
- Domain = subtree of the name space; responsibility is divided into sub-parts
[Figure: name hierarchy rooted at root, with top-level domains edu, com, org, net, uk, ca; second-level domains such as gwu, ucb, cmu, bu, mit; subdomains such as cs, cmcl, ece; a zone may cover a subtree, a single node, or the complete tree]

Servers/Resolvers
- Each host has a resolver
  - Typically a library that applications can link to
  - Local name servers are hand-configured (e.g. /etc/resolv.conf)
- Name servers
  - Responsible for some zone
- Local servers
  - Do lookups of distant host names for local hosts
  - Typically answer queries about the local zone

Typical Resolution
[Figure: to resolve www.en.wiki.edu, the client asks its local DNS server, which in turn queries the root & edu DNS server, then the ns1.wiki.edu DNS server, then the ns1.en.wiki.edu DNS server]

Lookup Methods
- Recursive query:
  - The server goes out and searches for more information on the client's behalf
  - It returns only the final answer, or "not found"
- Iterative query:
  - The server responds with as much as it knows: "I don't know this name, but ask this server"
- Workload impact on the choice?
  - The local server typically does recursive lookups
  - Root/distant servers do iterative lookups
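Iterative resolution can be sketched as a resolver following referrals through a toy set of name servers; all server names, delegations, and addresses below are made up:

```python
# Toy model of iterative resolution: each "server" either knows the final
# answer or refers the resolver to a more specific name server.
# Every name, delegation, and address here is hypothetical.
SERVERS = {
    "root":            {"refer": {"edu": "edu-server"}},
    "edu-server":      {"refer": {"wiki.edu": "ns1.wiki.edu"}},
    "ns1.wiki.edu":    {"refer": {"en.wiki.edu": "ns1.en.wiki.edu"}},
    "ns1.en.wiki.edu": {"answer": {"www.en.wiki.edu": "10.0.0.7"}},
}

def resolve_iteratively(name, server="root"):
    """Follow referrals, starting at the root, until a server answers."""
    while True:
        zone = SERVERS[server]
        if name in zone.get("answer", {}):
            return zone["answer"][name]          # final answer
        # Otherwise follow the referral whose suffix matches the name.
        for suffix, next_server in zone.get("refer", {}).items():
            if name.endswith(suffix):
                server = next_server
                break
        else:
            return None                          # no referral matches

print(resolve_iteratively("www.en.wiki.edu"))  # 10.0.0.7
```

In a recursive lookup the same chain of queries happens, but the local server performs it on the client's behalf and hands back only the final answer.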

Workload and Caching
- Are all servers/names likely to be equally popular?
  - Why might this be a problem? How can we solve it?
- DNS responses are cached
  - Quick response for repeated translations
  - Other queries may reuse some parts of a lookup (e.g. NS records for domains)
- DNS negative responses are cached
  - Don't have to repeat past mistakes (e.g. misspellings, search strings in resolv.conf)
- Cached data periodically times out
  - The lifetime (TTL) of the data is controlled by the owner of the data
  - The TTL is passed with every record
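The TTL behavior above can be sketched as a small cache whose entries expire after the lifetime supplied with each record; the class and the injected clock are illustrative, not a real resolver's implementation:

```python
import time

class TTLCache:
    """Minimal sketch of a DNS-style cache: each record expires after the
    TTL stored with it. The clock is injectable so tests are deterministic."""
    def __init__(self, clock=time.time):
        self._store = {}
        self._clock = clock

    def put(self, name, record, ttl):
        # The record's owner controls the TTL; we store the expiry time.
        self._store[name] = (record, self._clock() + ttl)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None
        record, expires = entry
        if self._clock() >= expires:
            del self._store[name]   # timed out: drop the stale record
            return None
        return record

# Fake clock so the example is deterministic.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("www.example.com", "10.0.0.7", ttl=300)
print(cache.get("www.example.com"))  # 10.0.0.7
now[0] = 301.0
print(cache.get("www.example.com"))  # None (TTL expired)
```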

Reliability
- DNS servers are replicated
  - Name service is available if at least one replica is up
  - Queries can be load-balanced between replicas
- UDP is used for queries
  - Reliability is needed, so it must be implemented on top of UDP!
  - Why not just use TCP?
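Because UDP gives no delivery guarantee, the resolver itself has to retry lost queries. A minimal sketch of that retry loop, using a stand-in (non-network) transport function so the example is self-contained:

```python
def query_with_retry(send_query, attempts=3):
    """UDP provides no delivery guarantee, so the resolver resends the
    query a few times. send_query is any callable that either returns an
    answer or raises TimeoutError (here a stand-in for a real UDP send)."""
    for _ in range(attempts):
        try:
            return send_query()
        except TimeoutError:
            continue                # lost or dropped: resend the query
    return None                     # give up after the last attempt

# Stub transport that "drops" the first two packets.
calls = {"n": 0}
def flaky_send():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError
    return "10.0.0.7"

print(query_with_retry(flaky_send))  # succeeds on the third attempt
```

A real resolver would also grow its timeout between attempts and fall back to another replica; TCP would give reliability for free but at the cost of connection setup for what is usually a single small query and response.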

Reverse DNS

Crawling
- Web crawling is the process of locating, fetching, and storing the pages on the Web
- The computer programs that perform this task are referred to as web crawlers or spiders
- A typical web crawler starts from a set of seed pages, locates new pages by parsing the downloaded seed pages, extracts the hyperlinks within, stores the extracted links in a fetch queue for retrieval, and continues downloading until the fetch queue is empty or a satisfactory number of pages has been downloaded
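The fetch-queue loop described above can be sketched against an in-memory stand-in for the Web; all page names and links below are made up:

```python
from collections import deque

# In-memory stand-in for the Web: page -> list of hyperlinks it contains.
WEB = {
    "seed.html": ["a.html", "b.html"],
    "a.html":    ["b.html", "c.html"],
    "b.html":    [],
    "c.html":    ["seed.html"],   # a link cycle the crawler must survive
}

def crawl(seeds, max_pages=10):
    """Fetch-queue crawl: download a page, extract its links, enqueue the
    new ones, and stop when the queue empties or max_pages is reached."""
    queue = deque(seeds)
    seen = set(seeds)             # avoid re-fetching (and link cycles)
    fetched = []
    while queue and len(fetched) < max_pages:
        page = queue.popleft()
        fetched.append(page)                # "download and store" the page
        for link in WEB.get(page, []):      # parse out the hyperlinks
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched

print(crawl(["seed.html"]))  # ['seed.html', 'a.html', 'b.html', 'c.html']
```

A real crawler replaces the dictionary lookup with an HTTP GET and HTML parsing, and adds politeness delays and per-host rate limits, but the queue discipline is the same.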

Crawling (Cont..)

Benefits of WWW
