Welcome to Scribd, the world's digital library. Read, publish, and share books and documents. See more
Download
Standard view
Full view
of .
Save to My Library
Look up keyword
Like this
17Activity
0 of .
Results for:
No results containing your search query
P. 1
Search Engine

Search Engine

Ratings: (0)|Views: 419 |Likes:
Published by id.arun5260
seminar report on search engine
seminar report on search engine

More info:

Published by: id.arun5260 on Feb 28, 2009
Copyright:Attribution Non-commercial

Availability:

Read on Scribd mobile: iPhone, iPad and Android.
download as DOC, PDF, TXT or read online from Scribd
See more
See less

02/07/2014

pdf

text

original

 
 
INTRODUCTION
Search engine is a software program that takes as input a search phrase, matchit with entries made earlier and returns a set of links pointing to their locations.Actually consumers would really prefer a finding engine, rather than a searchengine. Search engines match queries against an index that they create. Theindex consists of the words in each document, plus pointers to their locationswithin the documents. This is called an inverted file.A search engine or IR system comprises four essential modules:A document processor A query processor A search and matching functionA ranking capabilityWhile users focus on "search," the search and matching function is only one of the four modules. Each of these four modules may cause the expected or unexpected results that consumers get when they use a search engine.
 
CHAPTER 2SEARCH ENGINE MODULES
 
 
 SEARCH ENGINE MODULES
1.Document Processor 
The document processor prepares, processes, and inputs the documents, pages,or sites that users search against. The document processor performs some or allof the following steps:1.Process the input.2.Identifies potential indexable elements in documents.3.Deletes stop words.4.Stems terms.5.Extracts index entries.6.Computes weights.7.Creates and updates the main inverted file against which the search enginesearches in order to match queries to documents.Processing the inputIt simply standardizes the multiple formats encountered when derivingdocuments from various providers or handling various Web sites. This stepserves to merge all the data into a single, consistent data structure that all thedownstream processes can handle. The need for a well-formed, consistentformat is of relative importance in direct proportion to the sophistication of later steps of document processing.Identifying potential indexable elementsIt dramatically affects the nature and quality of the document representation thatthe engine will search against. In designing the system, we must define the word"term." Is it the alphanumeric characters between blank spaces or punctuation? If so, what about non-compositional phrases (phrases in which the separate wordsdo not convey the meaning of the phrase, like "skunk works" or "hot dog"), multi-word proper names, or inter-word symbols such as hyphens or apostrophes thatcan denote the difference between "small business men" versus small-businessmen.” Each search engine depends on a set of rules that its document processor must execute to determine what action is to be taken by the "tokenizer," i.e. thesoftware used to define a term suitable for indexing.

Activity (17)

You've already reviewed this. Edit your review.
MySearchResults liked this
1 thousand reads
1 hundred reads
Archit Dogra liked this
Mehedi Hasann liked this
Mehedi Hasann liked this
hannory liked this
Adesh BlasterBoy liked this
faridsk liked this
sabitharraj liked this

You're Reading a Free Preview

Download
/*********** DO NOT ALTER ANYTHING BELOW THIS LINE ! ************/ var s_code=s.t();if(s_code)document.write(s_code)//-->