WEB SEARCH IRS

Prof. Pravin V. Shinde
Web Crawler
Use of Web Crawler
Contd..
Crawling Policies
Selection Policy
Re-Visit Policy
Politeness Policy
Parallelization Policy
Components
Contd..
Contd..
Web Crawler Architecture
Crawling Infrastructure
Contd..
Graph Search Problem
Contd..
Frontier
History and Page Repository
Contd..
Fetching
Fetching Contd..
Parsing
URL Extraction and Canonicalization
Canonicalization Procedure
Stoplisting
Stemming
HTML Tag Tree
Example
Contd..
URL Normalization
Crawler Identification
Multithreaded Crawler
Contd..
Thank You
