Website Fetcher

Description of the project: The Website Fetcher is a multithreaded Windows application that downloads and stores Web pages, identified by their Uniform Resource Identifiers (URIs), for a Web search engine. Roughly, a crawler starts off by placing an initial set of URLs, S0, in a queue, where all URLs to be retrieved are kept and prioritized. From this queue, the crawler gets a URL (in some order), downloads the page, extracts any URLs in the downloaded page, and puts the new URLs in the queue. This process is repeated until the crawler decides to stop. Collected pages are later used for other applications, such as a Web search engine or a Web cache.

As the size of the Web grows, it becomes more difficult to retrieve the whole or a significant portion of the Web using a single process. Therefore, many search engines run multiple processes in parallel to perform the above task, so that the download rate is maximized. We refer to this type of fetcher as a parallel crawler. This type of application is often used in search engines, where all the URLs matching a query need to be collected and indexed by priority.

This application is a .NET-based fetcher very similar to Googlebot, Google's crawler. It serves as a backend processing component for a search engine. The results (URI data) gathered by the Website Fetcher are passed to an indexer, which indexes the page data so that search queries return results faster.
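The crawl loop described above (seed the queue, take a URL, download the page, extract its URLs, enqueue the new ones, repeat until a stop condition) can be sketched as follows. This is only an illustrative sketch in Python, not the project's .NET code. The names `FAKE_WEB`, `fetch`, `extract_urls`, and `crawl` are hypothetical, the in-memory dictionary stands in for real HTTP downloads, and using link depth as the priority is an assumption, since the description only says URLs are taken "in some order".

```python
import heapq
import re
from urllib.parse import urljoin

# Hypothetical in-memory "web" standing in for real HTTP downloads.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.test/b": '<a href="/">home</a>',
    "http://example.test/c": "no links here",
}

HREF_RE = re.compile(r'href="([^"]+)"')


def fetch(url):
    """Return the page text for url, or None if it cannot be downloaded."""
    return FAKE_WEB.get(url)


def extract_urls(base_url, html):
    """Return absolute URLs for every href found in html."""
    return [urljoin(base_url, href) for href in HREF_RE.findall(html)]


def crawl(seed_urls, max_pages=100):
    """Crawl from seed_urls and return {url: html} in download order."""
    queue = []      # heap of (priority, insertion order, url)
    seen = set()    # URLs already enqueued, to avoid duplicates
    counter = 0     # tie-breaker that keeps insertion order stable

    for url in seed_urls:
        if url not in seen:
            seen.add(url)
            heapq.heappush(queue, (0, counter, url))
            counter += 1

    pages = {}
    # Stop condition (an assumption): queue empty or page budget reached.
    while queue and len(pages) < max_pages:
        depth, _, url = heapq.heappop(queue)
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be downloaded
        pages[url] = html
        for link in extract_urls(url, html):
            if link not in seen:
                seen.add(link)
                heapq.heappush(queue, (depth + 1, counter, link))
                counter += 1
    return pages
```

Calling `crawl(["http://example.test/"])` visits the four sample pages breadth-first. A different priority function (for example, one based on page importance) could replace the depth value without changing the loop.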

Modules: Crawler Views
• Threads view
• Requests view
• MIME types
• Output connections
• Advanced settings


Multithreaded Downloader
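
As a rough illustration of how a multithreaded downloader might spread downloads across worker threads, here is a minimal Python sketch. It is not the project's .NET implementation. The function name `download_all`, the `fetch` callback, and the `max_workers` parameter are all hypothetical, and treating a failed download as "skip this URL" is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, fetch, max_workers=4):
    """Download urls concurrently and return {url: content} for successes.

    fetch is any callable taking a URL and returning its content; it is
    supplied by the caller, so this sketch makes no network calls itself.
    """
    urls = list(urls)

    def safe_fetch(url):
        # A failed download must not stop the other workers.
        try:
            return fetch(url)
        except Exception:
            return None

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs safe_fetch on worker threads and yields results
        # in the same order as the input URLs.
        for url, content in zip(urls, pool.map(safe_fetch, urls)):
            if content is not None:
                results[url] = content
    return results
```

In a real fetcher, `fetch` could wrap an HTTP client call, and `max_workers` would correspond to the number of downloader threads.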

Software requirements:
o Microsoft .NET Framework
o Microsoft C#.NET language (C# 2.0)
o Microsoft ASP.NET
o AJAX Toolkit
o HTML, XML
o Microsoft Windows 2000 and above
o Microsoft SQL Server 2000 SP3 or higher
o Microsoft Visual Studio 2005
