
WEB CRAWLER
SUBMITTED BY :

PIYUSH KUMAR (1751118)


SHASHI BHUSHAN (1751120)
ASHISH KUMAR (1751130)
Contents
1 DEFINITION: WEB CRAWLER
2 USE CASES OF WEB CRAWLER
3 NEED OF THE WEB CRAWLER
4 WORKING OF WEB CRAWLER
5 HOW DO SEARCH ENGINES WORK
6 FEASIBILITY STUDY
7 OPERATING ENVIRONMENT
8 FUTURE SCOPE OF THE SYSTEM
9 REFERENCES
QUESTION ARISES

What is a Web Crawler?
Definition:
A web crawler (also known as a web
spider or web robot) is a program or
automated script which browses the
World Wide Web in a methodical,
automated manner. This process is
called Web crawling or spidering.
USE CASES OF WEB CRAWLER

1. SEARCH ENGINES
2. COPYRIGHT VIOLATION
3. KEYWORD-BASED FINDINGS
4. WEB ANALYTICS

& many more...


NEEDS OF WEB CRAWLER
• To maintain mirror sites for popular websites.

• To test web pages and links for valid syntax and structure.

• To monitor sites to see when their structure or content changes.

• To search for copyright infringements.
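The monitoring need listed above can be illustrated with a small sketch. It assumes a crawler hands over the text of each page it fetches; the class name `ChangeMonitor` and its methods are hypothetical and not part of the original project. A page is treated as changed when its SHA-256 fingerprint differs from the one recorded on the previous visit.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: remembers one fingerprint per URL and reports changes.
class ChangeMonitor {
    private final Map<String, String> lastSeen = new HashMap<>();

    // Hex-encoded SHA-256 digest of the page content.
    static String fingerprint(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Returns true only when the URL was seen before with different content.
    boolean hasChanged(String url, String content) {
        String current = fingerprint(content);
        String previous = lastSeen.put(url, current);
        return previous != null && !previous.equals(current);
    }

    public static void main(String[] args) {
        ChangeMonitor monitor = new ChangeMonitor();
        System.out.println(monitor.hasChanged("http://a.example/", "old text"));
        System.out.println(monitor.hasChanged("http://a.example/", "new text"));
    }
}
```

Storing only a fingerprint keeps the memory cost per page small, at the price of not knowing what changed, only that something did.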


WORKING OF CRAWLERS:

STEP 1. Discovering URLs.

STEP 2. Exploring lists of seeds.

STEP 3. Adding to the INDEX.

STEP 4. Updating the INDEX.

STEP 5. Crawling frequency.
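The steps above can be sketched as a breadth-first crawl loop. This is a minimal illustration, not the project's implementation: the class name `CrawlSketch`, the `fetchLinks` function, and the in-memory "web" map are all assumptions that stand in for real HTTP fetching, and the code assumes Java 9 or later (for `List.of` and `Map.of`).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class CrawlSketch {

    // Starts from the seed URLs, follows links breadth-first, and returns
    // the visited URLs in visit order, stopping after maxPages pages.
    static Set<String> crawl(List<String> seeds,
                             Function<String, List<String>> fetchLinks,
                             int maxPages) {
        Deque<String> frontier = new ArrayDeque<>(seeds); // URLs waiting to be visited
        Set<String> index = new LinkedHashSet<>();        // URLs already visited

        while (!frontier.isEmpty() && index.size() < maxPages) {
            String url = frontier.removeFirst();
            if (!index.add(url)) {
                continue; // already in the index, skip it
            }
            for (String link : fetchLinks.apply(url)) {
                if (!index.contains(link)) {
                    frontier.addLast(link); // newly discovered URL
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory "web": page URL -> links found on that page.
        Map<String, List<String>> web = Map.of(
            "http://a.example/", List.of("http://b.example/", "http://c.example/"),
            "http://b.example/", List.of("http://a.example/", "http://d.example/"),
            "http://c.example/", List.of(),
            "http://d.example/", List.of());

        Set<String> visited = crawl(
            List.of("http://a.example/"),
            url -> web.getOrDefault(url, List.of()),
            10);
        System.out.println(visited);
    }
}
```

The `maxPages` limit is one simple way to bound a crawl. A real crawler would also honour `robots.txt` and schedule revisits according to its crawling frequency, which this sketch leaves out.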


How do search engines work?

• The job of the crawlers is to discover new content. They do this by following links.

• Crawling is a massive process: search engines crawl billions of pages every day, finding new content and recrawling old content to check whether it has changed.

• Search engine crawlers aren't smart. They are simple pieces of software programmed to single-mindedly collect data and send it back to the search engine's data centres.
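Following links, as described above, first requires pulling the links out of a fetched page. The sketch below is a simplified illustration under stated assumptions: the class name `LinkExtractor` is hypothetical, and a regular expression stands in for a real HTML parser, so it will miss unquoted attributes and will also pick up `href` values on tags other than `<a>`. Relative links are resolved against the page URL with `java.net.URI.resolve`.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Simplified pattern for quoted href attributes: href="..." or href='...'.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns the absolute URLs of all quoted href values found in the HTML.
    static List<String> extractLinks(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            try {
                // Resolve relative links against the URL of the page itself.
                links.add(base.resolve(matcher.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                // Malformed link: skip it and keep going.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"page2.html\">Next</a> <a href='/about'>About</a>";
        System.out.println(extractLinks("http://example.com/docs/index.html", html));
    }
}
```

Skipping malformed links instead of failing fits the "simple, single-minded" behaviour described above: one bad link should not stop the crawl.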
FEASIBILITY STUDY
A feasibility study is an evaluation or analysis of the potential
impact of a proposed project or program. It has three aspects:

1. TECHNICAL FEASIBILITY:

2. FINANCIAL FEASIBILITY:

3. OPERATIONAL FEASIBILITY:
OPERATING ENVIRONMENT

Software requirements at the time of development:

❖ Front end - AWT, Swing
❖ Back end - Java
❖ Technology - JSP/Servlet (Java)
❖ Software - JDK (1.5 or above)

Hardware requirements:
❖ Hard disk - at least 20 GB HDD
❖ RAM - 1 GB

PLATFORM - JAVA
FUTURE SCOPE OF THE SYSTEM

This application can be easily implemented in various situations:

⮚ New features can be added when required. The application can be reused as and when required, and all of its modules are flexible.

⮚ With further modifications, it can become a more powerful search engine.
References

GOOGLE
Thanks for listening
