
Focused Crawler

• Searches only web pages about a specific topic (e.g., cricket),
thus reducing the amount of network traffic and downloading.
• Objective: selectively seek out pages that are relevant to a
predefined set of topics and discard all unrelated pages.
• It assumes that some labeled examples of relevant and
non-relevant pages are available.
Algorithms of Focused Crawling
1. Fish Search
2. Shark Search
3. InfoSpiders
4. N-Best First
5. Intelligent Crawling
Fish Search Algorithm
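• Uses the metaphor of a school of fish: a crawling agent (a fish)
keeps following links while it finds relevant pages and dies out
when it does not.
• Fetches documents in an order driven by their relevance.
• If a document is relevant, then score = 1;
otherwise score = 0 (i.e., irrelevant).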
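The binary scoring above can be illustrated with a minimal sketch. It is a simplified reading of Fish search, not a reference implementation: the toy `PAGES` and `LINKS` data, the keyword-match test in `binary_score`, and the `depth` budget are all assumptions made for illustration.

```python
from collections import deque

# Toy in-memory "web": page text and outgoing links (hypothetical data).
PAGES = {
    "a": "cricket news",
    "b": "cricket scores",
    "c": "cooking recipes",
    "d": "more cooking",
    "e": "cricket far away",
}
LINKS = {"a": ["b", "c"], "c": ["d"], "d": ["e"]}


def binary_score(text, keywords):
    """Return 1 if any keyword occurs as a word in the text, else 0."""
    words = set(text.lower().split())
    return 1 if any(k.lower() in words for k in keywords) else 0


def fish_search(pages, links, seed, keywords, depth=2):
    """Return pages in visit order for a simplified Fish search."""
    queue = deque([(seed, depth)])
    seen = {seed}
    visited = []
    while queue:
        url, remaining = queue.popleft()
        visited.append(url)
        relevant = binary_score(pages[url], keywords) == 1
        # Relevant pages reset the depth budget; irrelevant ones use it up.
        child_depth = depth if relevant else remaining - 1
        if child_depth <= 0:
            continue  # this "fish" dies: its links are not followed
        entries = []
        for child in links.get(url, []):
            if child not in seen:
                seen.add(child)
                entries.append((child, child_depth))
        if relevant:
            queue.extendleft(reversed(entries))  # explore these first
        else:
            queue.extend(entries)  # explore these last
    return visited
```

With `depth=1` the crawl stops at the first irrelevant page on a path, while a larger budget lets it pass through irrelevant pages to reach relevant ones further away.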
Shark Search Algorithm
• Improved version of Fish search.
• Scores relevance between 0 and 1 using the vector space model.
• Child relevance depends on:
• Inherited score (passed down from the parent page)
• Metadata (e.g., the anchor text of the link)
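As a rough sketch of how a graded score might be computed, the snippet below uses cosine similarity over raw term counts as the vector space model and blends an inherited parent score with an anchor-text score. The function names and the `decay` and `weight` parameters are assumptions for illustration, not values taken from the slides.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts using raw term-frequency vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def child_score(query, parent_text, anchor_text, decay=0.5, weight=0.5):
    """Estimate a child's relevance before fetching it.

    inherited: the parent's similarity to the query, reduced by `decay`.
    anchor:    similarity of the link's anchor text to the query.
    `decay` and `weight` are hypothetical tuning parameters.
    """
    inherited = decay * cosine_similarity(query, parent_text)
    anchor = cosine_similarity(query, anchor_text)
    return weight * inherited + (1 - weight) * anchor
```

For the same parent page, a link whose anchor text matches the query gets a higher estimate than a link with unrelated anchor text, so it would be crawled earlier.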
InfoSpiders Algorithm
• Uses a neural network trained with backpropagation.
• It is a multi-agent system for information mining.
• Each agent crawls only its current surroundings.
• Does not provide stale information.
N-Best First
• Generalization of Best First.
• At each step, the N best documents are picked for crawling
instead of a single page.
• A scoring algorithm determines which documents are the best to crawl.
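The batch selection described above can be sketched with a priority queue. This is a minimal illustration on a toy link graph; the `estimates` dictionary (an estimated relevance for each URL, however it is obtained) and the `max_pages` limit are assumptions, not part of the original slides.

```python
import heapq

# Hypothetical link graph and estimated relevance of each URL.
LINKS = {"seed": ["a", "b", "c"], "a": ["d"], "c": ["e"]}
ESTIMATES = {"seed": 1.0, "a": 0.9, "b": 0.2, "c": 0.8, "d": 0.95, "e": 0.1}


def n_best_first(links, estimates, seed, n=2, max_pages=10):
    """Return the crawl order when the n best frontier URLs are taken per step."""
    # heapq is a min-heap, so scores are negated to pop the best first.
    frontier = [(-estimates[seed], seed)]
    seen = {seed}
    order = []
    while frontier and len(order) < max_pages:
        batch_size = min(n, len(frontier), max_pages - len(order))
        batch = [heapq.heappop(frontier) for _ in range(batch_size)]
        for _, url in batch:
            order.append(url)  # "crawl" the page
            for child in links.get(url, []):
                if child not in seen:
                    seen.add(child)
                    heapq.heappush(frontier, (-estimates[child], child))
    return order
```

With `n=1` this reduces to plain Best First. On the toy data the two differ: `n=1` crawls `d` right after `a`, whereas `n=2` has already committed to `a` and `c` before `d` enters the frontier.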
Intelligent Crawling
• Assigns priorities to documents on the basis of their characteristics.
• Characteristics can be page content, URL data, or sibling pages.
• It has the potential for self-learning.
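To make the self-learning idea concrete, here is a deliberately simple sketch that learns priorities from URL tokens only. It is not the published intelligent crawling method: the class `UrlTokenLearner`, its token statistics, and the neutral default of 0.5 are all hypothetical choices for illustration.

```python
import re
from collections import defaultdict


class UrlTokenLearner:
    """Learns which URL tokens tend to appear on relevant pages."""

    def __init__(self):
        self.seen = defaultdict(int)      # token -> pages crawled with it
        self.relevant = defaultdict(int)  # token -> relevant pages with it

    @staticmethod
    def tokens(url):
        """Split a URL into lowercase alphanumeric tokens."""
        return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]

    def update(self, url, was_relevant):
        """Record the outcome of crawling one page."""
        for token in set(self.tokens(url)):
            self.seen[token] += 1
            if was_relevant:
                self.relevant[token] += 1

    def priority(self, url):
        """Mean relevant fraction over known tokens; 0.5 if none are known."""
        known = [t for t in set(self.tokens(url)) if t in self.seen]
        if not known:
            return 0.5
        return sum(self.relevant[t] / self.seen[t] for t in known) / len(known)
```

Each crawled page updates the statistics, so the priorities assigned to uncrawled URLs change as the crawl proceeds, without any labeled training URLs supplied up front.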
Conclusion
• Many algorithms exist. Which one to use?
• Answer: it depends on the strengths and weaknesses of each algorithm.
• For example, the Fish search algorithm is slow and resource-consuming,
while the Shark search algorithm is more effective than Fish search.
• The InfoSpiders algorithm is more scalable.
• N-Best First performs better than InfoSpiders and Shark search.
• Intelligent crawling is a highly effective algorithm that learns to
crawl without user training.
THANK YOU
