Volume 2, Issue 9, September - 2015. ISSN 2348 4853, Impact Factor 1.317
INTRODUCTION
A web search engine is software designed to search for information on the World Wide Web
[1]. The search results are generally presented in a line of results, often referred to as search engine
results pages (SERPs). The information may be a mix of web pages, images, and other types of files.
A search engine operates in the following order [4]:
1. Web crawling.
2. Indexing.
3. Searching.
Web crawling can be considered as processing items in a queue: when the crawler visits a web page, it
mines links to other web pages, puts these URLs at the back end of the queue, and continues crawling
to a URL that it removes from the front end of the queue [5].
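The queue-based crawling described above can be sketched as follows; this is a minimal illustration, and `fetch_links` is a hypothetical stand-in for downloading a page and extracting its URLs:

```python
from collections import deque

def crawl(seed_urls, fetch_links, max_pages=100):
    """Breadth-first crawl: remove a URL from the front of the queue,
    mine its links, and put unseen ones at the back of the queue."""
    queue = deque(seed_urls)
    visited = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()          # remove from the front end
        if url in visited:
            continue
        visited.add(url)
        for link in fetch_links(url):  # mine links to other pages
            if link not in visited:
                queue.append(link)     # put at the back end
    return visited

# A tiny in-memory "web" standing in for real pages.
fake_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
pages = crawl(["a"], lambda url: fake_web.get(url, []))
```

Starting from page `a`, the crawler reaches every linked page exactly once, even though the link graph contains a cycle.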
Web search engines work by storing information about many web pages, which they retrieve from the
HTML markup of the pages. These pages are retrieved by a Web crawler (sometimes also known as a
spider), an automated program which follows every link on the site [6].
When a user enters a query into a search engine (typically by using keywords), the engine examines its
index and provides a listing of the best-matching web pages.
www.ijafrc.org
IV. ALGORITHM
1. Add sources - We connect our application server to heterogeneous database sources based on
their IPs. We maintain metadata containing details of these sources, such as authentication
information and the type of each data source, which helps in establishing reliable
connections with these database servers. These servers form our registered servers. This is our
arena from which we have to search for the required keyword.
Steps: Start
1-a) Connect the application server to the heterogeneous database sources using their IPs.
1-b) Maintain metadata for each source (authentication information, type of the data source, etc.).
1-c) Establish reliable connections with these servers; they form the registered servers.
End
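Step 1 can be sketched as a small metadata registry; the field names `auth` and `kind` and the example IPs are illustrative assumptions, not the paper's actual schema:

```python
# Registry of heterogeneous data sources, keyed by server IP.
registered_servers = {}

def add_source(ip, auth, kind):
    """Record the metadata needed to establish a reliable connection
    with a database server and register it as a search target."""
    registered_servers[ip] = {"auth": auth, "kind": kind}

add_source("10.0.0.5", ("admin", "secret"), "mysql")
add_source("10.0.0.7", ("root", "pw"), "mongodb")
```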
2. Add and identify a local repository which contains various directories and files that form a part
of the search domain of this application.
3. Generate Queries - As per the search keyword provided by the user, multiple heterogeneous
queries are generated for all the registered sources. These queries are generated locally on the
application server and mapped to all the connected servers (local or remote). Also, the files of the
repository (of step 2) are traversed in a breadth-first manner and each file is searched for the
keyword.
Steps: Start
3-a) Accept the search keyword from the user.
3-b) Generate a query for each registered source according to its type.
3-c) Map the generated queries to all the connected servers (local or remote).
3-d) Traverse the files of the local repository in a breadth-first manner.
3-e) Search each file for the keyword.
3-f) Collect the files in which the keyword is found.
End
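The two halves of step 3 can be sketched as below; the per-source query templates and the in-memory directory layout are illustrative assumptions standing in for real source dialects and a real file system:

```python
from collections import deque

# Hypothetical query templates, one per source type.
TEMPLATES = {
    "mysql":   "SELECT * FROM docs WHERE body LIKE '%{kw}%'",
    "mongodb": '{{"body": {{"$regex": "{kw}"}}}}',
}

def generate_queries(keyword, sources):
    """Generate one heterogeneous query per registered source."""
    return {ip: TEMPLATES[kind].format(kw=keyword)
            for ip, kind in sources.items()}

def bfs_search(tree, root, keyword):
    """Traverse a directory tree breadth-first and return the files
    containing the keyword. `tree` maps a directory to its children
    and a file to its contents."""
    hits, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        child = tree[node]
        if isinstance(child, str):      # a file: search its contents
            if keyword in child:
                hits.append(node)
        else:                           # a directory: enqueue children
            queue.extend(child)
    return hits

queries = generate_queries("wiggler", {"10.0.0.5": "mysql", "10.0.0.7": "mongodb"})
repo = {"/": ["/a.txt", "/sub"], "/a.txt": "the wiggler crawls",
        "/sub": ["/sub/b.txt"], "/sub/b.txt": "nothing here"}
found = bfs_search(repo, "/", "wiggler")
```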
4. Execute Queries - The queries that are generated in step 3 are executed by the local query
processors of all the connected data sources. The results from all servers are then sent back to the
application server.
Steps: Start
4-a) The corresponding generated queries are sent to the local query processors of all the registered servers.
4-b) Execute the queries (at the local query processors).
4-c) Send the results back to the application server.
End
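Step 4 can be sketched as a dispatch loop on the application server; the `processors` callables are stub assumptions emulating each server's local query processor:

```python
def execute_queries(queries, processors):
    """Send each generated query to its server's local query processor
    and collect the result sets back on the application server."""
    results = {}
    for ip, query in queries.items():
        results[ip] = processors[ip](query)  # executed remotely in reality
    return results

# Stub processors emulating two registered servers.
processors = {
    "10.0.0.5": lambda q: ["row1", "row2"],
    "10.0.0.7": lambda q: ["doc1"],
}
results = execute_queries({"10.0.0.5": "Q1", "10.0.0.7": "Q2"}, processors)
```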
5. Save result - The results from all the sources are maintained in a solution set called the knowledge
base. We maintain tables which contain records of all the successfully searched keywords,
including their locations. We arrange the results on the basis of their frequency of being searched
and viewed. In this process, queries are generated only once for each keyword, because the next
time it is searched, the knowledge base is used to provide the result.
Steps: Start
5-a) If the search from any source is successful, save all the result sets in the knowledge base.
5-b) Also, save the successfully searched keyword.
5-c) Present the results to the user.
End
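The knowledge base of step 5 can be sketched as a keyword-to-locations table with a view counter for frequency-based arrangement; this structure is an assumption, not the paper's actual tables:

```python
knowledge_base = {}   # keyword -> {"locations": [...], "hits": view count}

def save_result(keyword, result_sets):
    """Save the successful result sets so that later searches for the
    same keyword are served without regenerating queries."""
    locations = [loc for rs in result_sets for loc in rs]
    if locations:
        knowledge_base[keyword] = {"locations": locations, "hits": 0}

def lookup(keyword):
    """Serve a repeat search from the knowledge base, counting the view
    so results can be arranged by search/view frequency."""
    entry = knowledge_base.get(keyword)
    if entry:
        entry["hits"] += 1
        return entry["locations"]
    return None

save_result("wiggler", [["srv1:/a.txt"], ["srv2:db.docs"]])
first = lookup("wiggler")
```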
6. Update Knowledge Base - If a user is not satisfied with the result set from the knowledge base,
then the process of query generation is started again for that keyword and the knowledge base is
updated with the new result set.
Steps: Start
6-a) If the user is not satisfied with the result set from the knowledge base, restart query generation for that keyword.
6-b) Execute the regenerated queries on all the registered sources and the repository.
6-c) Update the knowledge base with the new result set.
End
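Step 6 can be sketched as a refresh path that overwrites the cached entry when the user rejects it; `regenerate` is a hypothetical stand-in for re-running steps 3-5 for the keyword:

```python
# A knowledge base holding a stale entry the user has rejected.
knowledge_base = {"wiggler": {"locations": ["stale:/old.txt"], "hits": 3}}

def update_knowledge_base(keyword, regenerate):
    """On user dissatisfaction, rerun query generation for the keyword
    and overwrite the stored result set with the fresh one."""
    fresh = regenerate(keyword)
    knowledge_base[keyword] = {"locations": fresh, "hits": 0}
    return fresh

fresh = update_knowledge_base("wiggler", lambda kw: ["srv1:/new.txt"])
```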
V. CONCLUSION
The wiggler crawls through the various servers and an identified repository and returns a set of results in
the knowledge base. Wiggler is an attempt to retrieve user desired information without physical
integration, algorithm has been proposed for searching requisite information and saving the same in
knowledge base. The solution has been tested in university, however researcher/engineers are
encouraged to test the solution on large data sets and accordingly algorithm(if required) can be enhance
for precision and performance. This proposition can be extended to crawl through web portals in
addition to the database servers and the repository.
VI. REFERENCES
[1]
https://en.wikipedia.org/wiki/Web_search_engine.
[2]
http://www.techclinch.com/search-engine/
[3]
www.ijafrc.org
[4] Sergey Brin and Lawrence Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine," Computer Science Department, Stanford University, Stanford, CA 94305.
[5] David Hawking, "Web Search Engines: Part 2," CSIRO ICT Centre, http://web.mst.edu/~ercal/253/Papers/WebSearchEngines-1.pdf
[6] the Web, August 2001, http://oak.cs.ucla.edu/