Published by: api-3747880 on Oct 15, 2008
Copyright: Attribution Non-commercial

Google’s Analogy
Altamash R. Jiwani, Government College of Engineering Amravati.
Abstract:
The amount of information on the Web is growing rapidly. This growth creates new challenges for information retrieval, and so does the growing number of users who are inexperienced in the art of Web research. A search engine is an information retrieval system designed to help find information stored on the World Wide Web. It lets one ask for information that meets specific criteria and retrieves a list of items that match those criteria, sorted by some measure of relevance.

In this paper, I present Google, a prototype of a large-scale search engine that is heavily used today and is likely to become far more than that in the near future, as can be estimated from the fact that Google has an indexed database of at least 24 million pages. The paper mainly covers how Google works: its hardware architecture and servers, what Google indexes, its features and limitations, its ranking principles and tips, the Googleplex, and more.

Google has changed the way we surf the Net, and everyone is chasing it: people are devising new technical ways to get a high Google PageRank. This paper provides a number of tricks and hacks for increasing your PageRank in the world's most popular search engine.
Why Is Google Considered in This Paper?
Google is considered here because it is the most popular large-scale search engine, and it addresses many of the problems of existing systems. It makes heavy use of the additional structure present in hypertext to provide much higher-quality search results. Its features include fast crawling technology to gather Web documents and keep them up to date, efficient use of storage space for its indices, and a query system with minimal response time. In short, it aims to be “the best navigation service”: instead of making things easier for the computer, it makes things easier for the user and makes the computer work harder.

As Google users, we are familiar with the speed and accuracy of a Google search. How exactly does Google manage to find the right results for every query as quickly as it does? This paper answers questions like this one.

There is something deeper to learn about Google, like a mystery waiting to be solved. For example, Google is a company that has built a single, very large custom computer comprising 100,000 servers, running its own cluster operating system. Google makes this big computer even bigger and faster each month while lowering the cost of CPU cycles, producing an efficient system with a unique combination of advanced hardware and software. Google has taken the last 10 years of systems software research out of university labs and built its own proprietary, production-quality system. What it will do next with the world's biggest computer and most advanced operating system still remains a mystery.
TYPES OF SEARCH ENGINES
There are basically three types of search engines:
1) Those powered by robots (called crawlers, ants, or spiders).
2) Those powered by human submissions.
3) Hybrids of the two.

Crawler-based search engines use automated software agents (crawlers) that visit a Web site, read the information on the site itself, read the site's meta tags, and follow the links the site connects to, indexing all of the linked Web sites as well. The crawler returns all of that information to a central repository, where the data is indexed. The crawler periodically returns to the sites to check for any information that has changed.

Human-powered search engines rely on humans to submit information, which is subsequently indexed and catalogued. Only information that is submitted is put into the index.

In both cases, when you query a search engine to locate information, you are actually searching through the index that the search engine has created; you are not searching the Web itself. These indices are giant databases of information that is collected, stored, and subsequently searched. This explains why a search on a commercial search engine, such as Yahoo! or Google, sometimes returns results that are, in fact, dead links.
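The crawl-then-index process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any production crawler works: the `WEB` dictionary is a hypothetical in-memory stand-in for real sites, and `crawl` and `search` are illustrative names. It also shows why dead links appear: the query is answered from the stored index, so a page removed after indexing is still returned until the next crawl.

```python
from collections import deque

# Hypothetical in-memory stand-in for the Web: URL -> (page text, outgoing links).
WEB = {
    "a.example": ("search engines crawl the web", ["b.example", "c.example"]),
    "b.example": ("crawlers follow links", ["c.example"]),
    "c.example": ("the index is a giant database", ["a.example"]),
}


def crawl(web, seed):
    """Breadth-first crawl starting at `seed`; returns an inverted index word -> set of URLs."""
    index = {}
    visited = set()
    frontier = deque([seed])
    while frontier:
        url = frontier.popleft()
        if url in visited or url not in web:
            continue  # already crawled, or the link points nowhere
        visited.add(url)
        text, links = web[url]  # read the information on the page
        for word in text.split():
            index.setdefault(word, set()).add(url)
        frontier.extend(links)  # follow the links the page connects to
    return index


def search(index, word):
    """Answer a query from the stored index only; the live Web is never consulted."""
    return sorted(index.get(word, set()))


index = crawl(WEB, "a.example")
del WEB["b.example"]              # the page disappears after it was indexed
print(search(index, "crawlers"))  # ['b.example']: a dead link until the next recrawl
```

A real crawler would also fetch pages over the network, respect robots rules, and schedule recrawls; the sketch keeps only the visit, read, follow-links, and index steps.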
Why will the same search on different search engines produce different results?
Part of the answer is that not all indices are exactly the same: what an index contains depends on what the spiders find or what humans submit. More important, not every search engine uses the same algorithm to search through its index.
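To illustrate the second point, the following Python sketch runs the same query over one shared toy index with two different ranking algorithms. Engine A ranks pages by how often the query word appears; engine B ranks them by a simplified textbook PageRank (damping factor 0.85, computed by power iteration). The pages, links, and both scoring rules are assumptions chosen for illustration, not any real engine's algorithm.

```python
# Hypothetical toy corpus shared by both engines: page -> (text, outgoing links).
# Every page has at least one outgoing link, so this simple PageRank needs no
# special handling for pages without links.
PAGES = {
    "p1": ("apple apple apple pie", ["p3"]),
    "p2": ("apple orchard", ["p3"]),
    "p3": ("apple", ["p2"]),
    "p4": ("fruit basket", ["p3"]),
}


def pagerank(pages, d=0.85, iterations=100):
    """Simplified PageRank by power iteration with a uniform random-jump term."""
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - d) / n for p in pages}
        for p, (_, links) in pages.items():
            for q in links:
                new[q] += d * rank[p] / len(links)  # p shares its rank among its links
        rank = new
    return rank


def matches(pages, word):
    """Pages whose text contains the query word (the shared index lookup)."""
    return [p for p, (text, _) in pages.items() if word in text.split()]


def engine_a(pages, word):
    """Engine A: rank by how many times the word occurs on the page."""
    return sorted(matches(pages, word), key=lambda p: (-pages[p][0].split().count(word), p))


def engine_b(pages, word):
    """Engine B: rank by link-based importance (PageRank)."""
    pr = pagerank(pages)
    return sorted(matches(pages, word), key=lambda p: (-pr[p], p))


print(engine_a(PAGES, "apple"))  # ['p1', 'p2', 'p3']
print(engine_b(PAGES, "apple"))  # ['p3', 'p2', 'p1']
```

Both engines find the same three matching pages, so the reversed orderings come entirely from the ranking algorithm, not from the index.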
Google developers:
Larry Page, Co-Founder & President, Google Products
Sergey Brin, Co-Founder & President, Google Technology
Google’s Hardware:
To provide sufficient service capacity, Google's physical structure consists of clusters of computers, known as server farms, situated around the world. These server farms consist of a large number of commodity-level computers running Linux-based systems that operate with GFS, the Google File System. It has been speculated that Google has the world's largest computer; one estimate credits Google with up to:
- 899 racks
- 79,112 machines
- 158,224 CPUs
- 316,448 GHz of processing power
- 158,224 GB of RAM
- 6,180 TB of hard drive space
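Dividing these totals gives per-unit averages, which makes the estimate easier to picture. This assumes (hypothetically) that every machine is configured the same way, reads the RAM and disk figures as gigabytes and terabytes in decimal units (1 TB = 1000 GB), and inherits whatever error is in the speculative totals. Under those assumptions the figures work out to 88 machines per rack, 2 CPUs per machine, 2 GHz per CPU, 2 GB of RAM per machine, and roughly 78 GB of disk per machine.

```python
# Speculative totals quoted above.
RACKS = 899
MACHINES = 79_112
CPUS = 158_224
TOTAL_GHZ = 316_448
RAM_GB = 158_224
DISK_TB = 6_180

# Averages, assuming (hypothetically) that every machine is configured the same way.
machines_per_rack = MACHINES / RACKS
cpus_per_machine = CPUS / MACHINES
ghz_per_cpu = TOTAL_GHZ / CPUS
ram_gb_per_machine = RAM_GB / MACHINES
disk_gb_per_machine = DISK_TB * 1000 / MACHINES  # decimal units: 1 TB = 1000 GB

print(f"{machines_per_rack:.0f} machines per rack")         # 88
print(f"{cpus_per_machine:.0f} CPUs per machine")           # 2
print(f"{ghz_per_cpu:.0f} GHz per CPU")                     # 2
print(f"{ram_gb_per_machine:.0f} GB of RAM per machine")    # 2
print(f"{disk_gb_per_machine:.1f} GB of disk per machine")  # 78.1
```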
How Google Handles Search Queries
When a user enters a query into the search box at Google.com, it is randomly sent to one of many Google clusters, and the query is then handled solely by that cluster. A load balancer that monitors the cluster spreads the request over the servers in the cluster so that the load on the hardware stays even. The cluster then carries out the following steps:
- Determine the documents pointed to by the keywords.
- Sort these documents by each one's PageRank.
- Provide links to these documents on the Web.
- Provide a link to view the cached version of each document in the doc server farm.
- Pull an excerpt from the page, using its cached version, to give a quick idea of what it is about.
- Return an initial result set of document excerpts and links, rendered as HTML, with links to retrieve further sets of matches.
- By default, Google returns results in sets of ten matches (as an HTML page).
- You can change the number of results you want to see on the Google Preferences page.

Google prides itself on the fact that most queries are answered in less than half a second. Considering the number of steps involved in answering a query, this is quite a technological feat.
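The query-handling steps above can be sketched as a single Python function. Everything here is a hypothetical stand-in: the cluster and server names, the tiny `INDEX`, the `PAGERANK` scores, the `CACHE` texts, and the URL formats are invented, and picking the server that has handled the fewest requests is one assumed balancing policy, not a documented Google one.

```python
import random

# Hypothetical data for illustration only; none of these names or values come from Google.
CLUSTERS = {  # cluster -> {server: requests handled so far}
    "cluster-a": {"srv1": 0, "srv2": 0, "srv3": 0},
    "cluster-b": {"srv4": 0, "srv5": 0},
}
INDEX = {"google": {"d1", "d2", "d3"}, "search": {"d2", "d3", "d4"}}  # keyword -> documents
PAGERANK = {"d1": 0.10, "d2": 0.40, "d3": 0.25, "d4": 0.25}           # precomputed scores
CACHE = {  # document -> cached copy held in the doc server farm
    "d1": "Notes on the google company history",
    "d2": "How a google search query is answered",
    "d3": "Tips for google search users",
    "d4": "Search engines compared",
}


def snippet(doc, words, width=40):
    """Excerpt of the cached copy around the first keyword occurrence."""
    text = CACHE.get(doc, "")
    hits = [text.lower().find(w) for w in words if w in text.lower()]
    start = max(min(hits) - width // 2, 0) if hits else 0
    return text[start:start + width]


def handle_query(query, page=0, per_page=10, rng=random):
    # Send the query to a randomly chosen cluster; only that cluster handles it.
    cluster = rng.choice(sorted(CLUSTERS))
    servers = CLUSTERS[cluster]
    # Load balancer (assumed policy): pick the server that has handled the fewest requests.
    server = min(sorted(servers), key=servers.get)
    servers[server] += 1

    words = query.lower().split()
    # 1. Documents pointed to by every keyword.
    docs = set.intersection(*(INDEX.get(w, set()) for w in words)) if words else set()
    # 2. Sort by PageRank, highest first (ties broken by document id).
    ranked = sorted(docs, key=lambda d: (-PAGERANK.get(d, 0.0), d))
    # 3-6. Links, cached-copy links, and excerpts for one set of results.
    start = page * per_page
    results = [
        {"url": f"https://{d}.example/", "cached": f"/cache/{d}", "snippet": snippet(d, words)}
        for d in ranked[start:start + per_page]
    ]
    return {
        "cluster": cluster,
        "server": server,
        "results": results,
        "more": start + per_page < len(ranked),  # whether further result sets exist
    }


response = handle_query("google search", rng=random.Random(0))
for hit in response["results"]:
    print(hit["url"], "|", hit["snippet"])  # d2 is listed before d3 (higher PageRank)
```

The `per_page` parameter plays the role of the result-count preference, with ten as the default; rendering the result set as HTML is left out to keep the sketch short.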
