WWW
- Communication via an application-level protocol
- System runs on top of standard networking infrastructure
- Simple and easy to use
- Requires a client application to render text/graphics
WWW Components
- Structural Components
  - Clients/browsers: two dominant implementations
  - Servers: run on sophisticated hardware
  - Caches: many interesting implementations
  - Internet: the global infrastructure which facilitates data transfer
- Semantic Components
  - Hyper Text Transfer Protocol (HTTP)
  - Hyper Text Markup Language (HTML)
WWW Structure
- Clients use a browser application to send URIs via HTTP to servers, requesting a Web page
- Web pages are constructed using HTML (or another markup language) and consist of text, graphics, and sounds, plus embedded files
- Servers (or caches) respond with the requested Web page
  - Or with an error message
- The entire system runs over standard networking protocols (TCP/IP, DNS, …)
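The request a browser sends in this exchange is just structured text. A minimal sketch in Python of building such a message (real browsers add many more headers — User-Agent, Accept, cookies, and so on):

```python
from urllib.parse import urlparse

def build_get_request(url: str) -> bytes:
    """Build a minimal HTTP/1.0 GET request for the given URL."""
    parts = urlparse(url)
    path = parts.path or "/"
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {parts.netloc}",
        "",  # blank line terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

request = build_get_request("http://www.google.com/index.html")
```

Once built, the request bytes would be written to a TCP connection to the server on port 80, illustrating how the Web rides on top of TCP/IP.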
Architecture of WWW
A URL is like a street address, while a URN is like a person's name.
- URL: identifies a resource together with its location on the Web
- URN: names a resource without saying anything about its location
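The "location" a URL carries can be seen by splitting it into its parts with Python's standard library (the example URL is illustrative):

```python
from urllib.parse import urlparse

# The scheme says *how* to fetch the resource, the network location
# says *where* it lives, and the path names the resource on that host
# — unlike a URN, which names a resource without locating it.
parts = urlparse("http://www.example.com/wiki/index.html")
print(parts.scheme)   # scheme, e.g. http
print(parts.netloc)   # host part
print(parts.path)     # path on that host
```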
HTTP request methods:
- GET: retrieve document specified by URL
- PUT: store specified document under given URL
- HEAD: retrieve info about document specified by URL
- OPTIONS: retrieve information about available options
- POST: give information to the server
- DELETE: remove document specified by URL
- TRACE: loopback request message
- CONNECT: for use by caches
- First type of HTTP message: requests
  - Client browsers construct and send request messages
  - Typical HTTP request:
    GET http://www.google.com/index.html HTTP/1.0
- Second type of HTTP message: responses
  - Web servers construct and send response messages
  - Typical HTTP response:
    HTTP/1.0 301 Moved Permanently
    Location: http://www.google.co.in/cs/index.html
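The first line of a response like the one above is the status line, and clients pull it apart before anything else. A small sketch (real parsers also validate the version and handle malformed input):

```python
def parse_status_line(line: str):
    """Split an HTTP status line such as
    'HTTP/1.0 301 Moved Permanently' into (version, code, reason)."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason

version, code, reason = parse_status_line("HTTP/1.0 301 Moved Permanently")
```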
HTTP status code classes:
- 1xx Informational: request received, processing
- 2xx Success: action received, understood, accepted
- 3xx Redirection: further action necessary
- 4xx Client Error: bad syntax or cannot be fulfilled
- 5xx Server Error: server failed
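Because the class is simply the leading digit of the three-digit code, mapping a code to its class is one integer division — a minimal sketch:

```python
STATUS_CLASSES = {
    1: "Informational",  # request received, processing
    2: "Success",        # action received, understood, accepted
    3: "Redirection",    # further action necessary
    4: "Client Error",   # bad syntax or cannot be fulfilled
    5: "Server Error",   # server failed
}

def status_class(code: int) -> str:
    """Classify an HTTP status code by its leading digit."""
    return STATUS_CLASSES[code // 100]
```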
URL Examples
- www.howstuffworks.com
- www.en.wikipedia.org/wiki/.edu
- www.india.org
- www.dtic.mil
- www.india.gov.in
- www.lpu.in
Review of IP:
- IP addresses: hierarchical, CIDR
- IP service: best effort, simplicity of routers
- IP packets: header fields, fragmentation, ICMP
- IP routers:
  - Architecture
  - Common-case processing
  - Complex/expensive lookup algorithms
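The expensive lookup a router performs on every packet is longest-prefix match over its forwarding table. A toy sketch with a linear scan (the table and next-hop names are made up; real routers use specialised structures such as tries or TCAMs for the same lookup):

```python
import ipaddress

def longest_prefix_match(table, dest):
    """Return the next hop for the most specific prefix covering dest.

    `table` maps CIDR prefixes to next-hop names.
    """
    dest_addr = ipaddress.ip_address(dest)
    best_len, hop = None, None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if dest_addr in net and (best_len is None or net.prefixlen > best_len):
            best_len, hop = net.prefixlen, next_hop
    return hop

# Hypothetical forwarding table: the /16 is more specific than the /8,
# and 0.0.0.0/0 is the default route.
table = {
    "10.0.0.0/8": "A",
    "10.1.0.0/16": "B",
    "0.0.0.0/0": "default",
}
```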
Naming
- How do we efficiently locate resources?
  - DNS: name → IP address
- Challenge: how do we scale these to the wide area?
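Applications see this name-to-address mapping as a single library call into the host's resolver. A minimal sketch ("localhost" is used so the example works without network access):

```python
import socket

# DNS maps a human-readable name to an IP address; the resolver
# library on each host exposes this as one call.
ip = socket.gethostbyname("localhost")
print(ip)  # an IPv4 address in dotted-quad form
```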
Obvious Solutions
Why not centralize DNS?
- Single point of failure
- Traffic volume
- Distant centralized database
- Single point of update
- Doesn't scale!
[Figure: the DNS name hierarchy — a tree rooted at the root domain, with top-level domains (edu, com, org, net, uk, ca) and subtrees beneath them (e.g. gwu, ucb, cmu, bu, mit under edu; cs, cmcl, ece under cmu). A zone can be a subtree, a single node, or the complete tree.]
Servers/Resolvers
- Each host has a resolver
  - Typically a library that applications can link to
  - Local name servers hand-configured (e.g. /etc/resolv.conf)
- Name servers
  - Responsible for some zone
- Local servers
  - Do lookups of distant host names for local hosts
  - Typically answer queries about the local zone
Typical Resolution
[Figure: typical resolution — a client resolving www.en.wiki.edu via its local server and the root & edu DNS servers.]
Lookup Methods
Recursive query:
- Server goes out and searches for more info (recursive)
- Only returns the final answer or "not found"

Iterative query:
- Server responds with as much as it knows (iterative)
- "I don't know this name, but ask this server"

Workload impact on choice?
- Local server typically does recursive
- Root/distant server does iterative
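The iterative case — chase referrals until some server knows the answer — can be sketched over a toy namespace. All server names, the referral table, and the answer address (a documentation IP) are made up for illustration; a recursive server would run this same loop on the client's behalf:

```python
# Toy namespace: each server either knows the final answer for a name
# or a referral ("ask this server") for one of its suffixes.
SERVERS = {
    "root": {"edu": ("referral", "edu-server")},
    "edu-server": {"cmu.edu": ("referral", "cmu-server")},
    "cmu-server": {"www.cmu.edu": ("answer", "192.0.2.1")},
}

def iterative_lookup(name, server="root"):
    """Follow referrals ourselves until an answer (or nothing) is found."""
    while True:
        for suffix, (kind, value) in SERVERS[server].items():
            if name.endswith(suffix):
                if kind == "answer":
                    return value
                server = value  # chase the referral to the next server
                break
        else:
            return None  # no server component matched the name

address = iterative_lookup("www.cmu.edu")
```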
Reliability
- DNS servers are replicated
  - Name service available if at least one replica is up
- Queries are sent over UDP, so reliability must be implemented on top of UDP!
  - Why not just use TCP?
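Implementing reliability on top of UDP boils down to timing out and retransmitting the query. A sketch where `send_fn` stands in for one UDP request/response exchange (a real resolver would use `sendto()`/`recvfrom()` on a socket with `settimeout()`; the retry counts are arbitrary):

```python
import socket

def query_with_retry(send_fn, request, retries=3):
    """Retransmit a query until a reply arrives or retries run out.

    `send_fn(request)` is expected to raise socket.timeout when no
    reply arrives in time.
    """
    for _attempt in range(retries):
        try:
            return send_fn(request)
        except socket.timeout:
            continue  # datagram lost either way: just ask again
    raise socket.timeout("no response after %d attempts" % retries)

# Simulated flaky network: the first two datagrams are "lost".
calls = []
def flaky_send(request):
    calls.append(request)
    if len(calls) < 3:
        raise socket.timeout()
    return b"answer"

reply = query_with_retry(flaky_send, b"query")
```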
Reverse DNS
Crawling
- Web crawling is the process of locating, fetching, and storing the pages on the Web
- The computer programs that perform this task are referred to as web crawlers or spiders
- A typical Web crawler starts from a set of seed pages, locates new pages by parsing the downloaded seed pages and extracting the hyperlinks within, stores the extracted links in a fetch queue for retrieval, and continues downloading until the fetch queue is empty or a satisfactory number of pages have been downloaded
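The seed-parse-queue loop just described can be sketched as a breadth-first traversal. The toy "web" and its page names are made up; `fetch` would normally issue an HTTP GET rather than read a dict:

```python
from collections import deque
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(fetch, seeds, limit=100):
    """Breadth-first crawl: start from seed pages, parse each fetched
    page for hyperlinks, queue unseen ones, and stop when the queue is
    empty or `limit` pages have been fetched."""
    queue, seen, fetched = deque(seeds), set(seeds), {}
    while queue and len(fetched) < limit:
        url = queue.popleft()
        html = fetch(url)
        fetched[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched

# Hypothetical three-page web for illustration.
WEB = {
    "a.html": '<a href="b.html">B</a>',
    "b.html": '<a href="a.html">A</a><a href="c.html">C</a>',
    "c.html": "no links here",
}
pages = crawl(lambda url: WEB.get(url, ""), ["a.html"])
```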
Crawling (Cont.)
Benefits of WWW