You are on page 1of 16

HOW SEARCH ENGINES WORK?

WHICH ONE IS THE BEST


SEARCH ENGINE

Submitted to Submitted
by
Mr. Rakesh Pandya Ameya
Ranadive
Submitted to Vivek
Bhadane
MIS faculty Tejas
Sonvane
PIET MBA Ishan
Valodkar
Dhara
shah
Darshita
Vasava

PARUL INSTITUTE OF ENGINEERING AND TECHNOLOGY (MBA)


BATCH-2009-11
Brief Introduction about the topic

This topic gives an overall idea about what are the search engines,
how do they really work and which is the best search engine in today’s
world.
In this discussion we will talk about how the search engines have
evolved through the different stages of history and have evolved as
the best one’s even though the competition in front of them. Also how
search engines work and the best among them.
Detail Description about the topic

1) What is a search engine?


Ans: A web search engine is a tool designed to search for
information on the World Wide Web. The search results are usually
presented in a list and are commonly called hits. The information
may consist of web pages, images, information and other types of
files. Some search engines also mine data available in databases or
open directories Unlike Web directories, which are maintained by
human editors, search engines operate algorithmically or are a
mixture of algorithmic and human input.

2) History of the search engines:


Ans:
Timeline (full list)
Year Engine Event
Aliweb Launch
1993
JumpStation Launch
WebCrawler Launch
1994 Infoseek Launch
Lycos Launch
AltaVista Launch
Open Text Web Index Launch
1995 Magellan Launch
Excite Launch
SAPO Launch
Dogpile Launch
Inktomi Founded
1996
HotBot Founded
Ask Jeeves Founded
Northern Light Launch
1997
Yandex Launch
1998 Google Launch
AlltheWeb Launch
GenieKnows Founded
1999 Naver Launch
Teoma Founded
Vivisimo Founded
Baidu Founded
2000
Exalead Founded
2003 Info.com Launch
Yahoo! Search Final launch
2004 A9.com Launch
Sogou Launch
MSN Search Final launch
Ask.com Launch
2005
GoodSearch Launch
SearchMe Founded
wikiseek Founded
Quaero Founded
Ask.com Launch
2006
Live Search Launch
ChaCha Beta Launch
Guruji.com Beta Launch
wikiseek Launched
Sproose Launched
2007
Wikia Search Launched
Blackle.com Launched
Powerset Launched
Picollator Launched
2008
Viewzi Launched
Cuil Launched
Boogami Launched
LeapFish Beta Launch
Forestle Launched
VADLO Launched
Sperse! Search Launched
Duck Duck Go Launched
Bing Launched
Feald Beta Launch
2009
Yebol Beta Launch
Mugurdy Launched

i. Before there were web search engines there was a complete list
of all webservers. The list was edited by Tim Berners-Lee and
hosted on the CERN webserver. One historical snapshot from
1992 remains. As more and more webservers went online the
central list could not keep up. On the NCSA site new servers
were announced under the title "What's New!" but no complete
listing existed any more.

ii. The very first tool used for searching on the (pre-web) Internet
was Archie. The name stands for "archive" without the "v." It
was created in 1990 by Alan Emtage, a student at McGill
University in Montreal. The program downloaded the directory
listings of all the files located on public anonymous FTP (File
Transfer Protocol) sites, creating a searchable database of file
names; however, Archie did not index the contents of these sites

iii. The rise of Gopher (created in 1991 by Mark McCahill at the


University of Minnesota) led to two new search programs,
Veronica and Jughead. Like Archie, they searched the file
names and titles stored in Gopher index systems. Veronica (Very
Easy Rodent-Oriented Net-wide Index to Computerized Archives)
provided a keyword search of most Gopher menu titles in the
entire Gopher listings. Jughead (Jonzy's Universal Gopher
Hierarchy Excavation And Display) was a tool for obtaining menu
information from specific Gopher servers. While the name of the
search engine "Archie" was not a reference to the Archie comic
book series, "Veronica" and "Jughead" are characters in the
series, thus referencing their predecessor.

iv. In June 1993, Matthew Gray, then at MIT, produced what was
probably the first web robot, the Perl-based World Wide Web
Wanderer, and used it to generate an index called 'Wandex'.
The purpose of the Wanderer was to measure the size of the
World Wide Web, which it did until late 1995. The web's first
search engine Aliweb appeared in November 1993. Aliweb did
not use a web robot, but instead depended on being notified by
website administrators of the existence at each site of an index
file in a particular format.

v. JumpStation (released in December 1993) used a web robot to


find web pages and to build its index, and used a web form as
the interface to its query program. It was thus the first WWW
resource-discovery tool to combine the three essential features
of a web search engine (crawling, indexing, and searching) as
described below. Because of the limited resources available on
the platform on which it ran, its indexing and hence searching
were limited to the titles and headings found in the web pages
the crawler encountered.

vi. One of the first "full text" crawler-based search engines was
WebCrawler, which came out in 1994. Unlike its predecessors,
it let users search for any word in any webpage, which has
become the standard for all major search engines since. It was
also the first one to be widely known by the public. Also in 1994
Lycos (which started at Carnegie Mellon University) was
launched, and became a major commercial endeavor

vii. Soon after, many search engines appeared and vied for
popularity. These included Magellan, Excite, Infoseek,
Inktomi, Northern Light, and AltaVista. Yahoo! was among
the most popular ways for people to find web pages of interest,
but its search function operated on its web directory, rather
than full-text copies of web pages. Information seekers could
also browse the directory instead of doing a keyword-based
search.
viii. In 1996, Netscape was looking to give a single search engine an
exclusive deal to be their featured search engine. There was so
much interest that instead a deal was struck with
Netscape by 5 of the major search engines, where for
$5Million per year each search engine would be in a
rotation on the Netscape search engine page. These five
engines were: Yahoo!, Magellan, Lycos, Infoseek and Excite.

ix. Search engines were also known as some of the brightest stars
in the Internet investing frenzy that occurred in the late 1990s.
Several companies entered the market spectacularly, receiving
record gains during their initial public offerings. Some have
taken down their public search engine, and are marketing
enterprise-only editions, such as Northern Light. Many search
engine companies were caught up in the dot-com bubble, a
speculation-driven market boom that peaked in 1999 and ended
in 2001

x. Around 2000, the Google search engine rose to prominence.


The company achieved better results for many searches with an
innovation called PageRank. This iterative algorithm ranks
web pages based on the number and PageRank of other web
sites and pages that link there, on the premise that good or
desirable pages are linked to more than others. Google also
maintained a minimalist interface to its search engine. In
contrast, many of its competitors embedded a search engine in a
web portal.

xi. By 2000, Yahoo was providing search services based on


Inktomi's search engine. Yahoo! acquired Inktomi in 2002, and
Overture (which owned AlltheWeb and AltaVista) in 2003.
Yahoo! switched to Google's search engine until 2004, when it
launched its own search engine based on the combined
technologies of its acquisitions.
xii. Microsoft first launched MSN Search in the fall of 1998 using
search results from Inktomi. In early 1999 the site began to
display listings from Looksmart blended with results from
Inktomi except for a short time in 1999 when results from
AltaVista were used instead. In 2004, Microsoft began a
transition to its own search technology, powered by its own web
crawler (called msnbot).
xiii. Microsoft's rebranded search engine, Bing, was launched on
June 1, 2009. On July 29, 2009, Yahoo! and Microsoft finalized
a deal in which Yahoo! Search would be powered by Microsoft
Bing technology.
xiv. According to Hitbox, Google's worldwide popularity peaked at
82.7% in December, 2008. July 2009 rankings showed Google
(78.4%) losing traffic to Baidu (8.87%), and Bing (3.17%). The
market share of Yahoo! Search (7.16%) and AOL (0.6%) were
also declining
xv. In the United States, Google held a 63.2% market share in May
2009, according to Nielsen NetRatings.In the People's Republic
of China, Baidu held a 61.6% market share for web search in
July 2009.
3) How do the search engines work?

Ans: A search engine operates, in the following order

A)Web crawling

B)Indexing

C)Searching

There are two types of search engines. The first is using robots
called crawlers or spiders.

A) Web crawling:

(i) Search Engines use spiders to index websites. The


spiders will index your entire site when you submit your
site to a search engine. A spider is an automated bot
that is used by a search engine. A spider reads a site
and looks at the content, the meta tags and any links
that are on the site. The spider then returns all that
information back to a central depository, where the data
is indexed. All links on your site will be visited and
indexed also. Some engines limit how many pages of a
site they index.
(ii) The spider will periodically return to the sites to check
for any information that has changed. The frequency of
these visits depends on the moderators of the search
engines.
(iii) A spider is almost like a book where it contains the table
of contents, the actual content and the links and
references for all the websites it finds during its search,
and it may index up to a million pages a day.
(iv) When you make a query on a search engine, it searches
through its index and not the actual web. Different
search engine rank sites differently due to their
algorithms.
(v) Search engines look heavily on the keywords on a
webpage, however, they can detect when a page is
being ‘stuffed’ with keywords. The algorithms then look
at the way the pages link to other web pages. This is so
a search engine can check if the links are relative to
site.

B) Indexing:

(vi) The contents of each page are then analyzed to


determine how it should be indexed (for example,
words are extracted from the titles, headings, or special
fields called meta tags). Data about web pages are
stored in an index database for use in later queries. A
query can be a single word. The purpose of an index is
to allow information to be found as quickly as possible.
Some search engines, such as Google, store all or part
of the source page (referred to as a cache) as well as
information about the web pages, whereas others, such
as AltaVista, store every word of every page they find.
This cached page always holds the actual search text
since it is the one that was actually indexed, so it can be
very useful when the content of the current page has
been updated and the search terms are no longer in it.
This problem might be considered to be a mild form of
linkrot, and Google's handling of it increases usability
by satisfying user expectations that the search terms
will be on the returned webpage. This satisfies the
principle of least astonishment since the user
normally expects the search terms to be on the
returned pages. Increased search relevance makes
these cached pages very useful, even beyond the fact
that they may contain data that may no longer be
available elsewhere

C) Searching:

(vii) When a user enters a query into a search engine


(typically by using key words), the engine examines its
index and provides a listing of best-matching web
pages according to its criteria, usually with a short
summary containing the document's title and
sometimes parts of the text. The index is built from the
information stored with the data and the method by
which the information is indexed. Most search engines
support the use of the boolean operators AND, OR
and NOT to further specify the search query. Boolean
operators are for literal searches that allow the user to
refine and extend the terms of the search. The engine
looks for the words or phrases exactly as entered. Some
search engines provide an advanced feature called
proximity search which allows users to define the
distance between keywords. There is also concept-
based searching where the research involves using
statistical analysis on pages containing the words or
phrases you search for. As well, natural language
queries allow the user to type a question in the same
form one would ask it to a human. A site like this would
be Askjeeves.com.
(viii) The usefulness of a search engine depends on the
relevance of the result set it gives back. While there
may be millions of web pages that include a particular
word or phrase, some pages may be more relevant,
popular, or authoritative than others. Most search
engines employ methods to rank the results to provide
the "best" results first. How a search engine decides
which pages are the best matches, and what order the
results should be shown in, varies widely from one
engine to another. The methods also change over time
as Internet usage changes and new techniques evolve.
There are two main types of search engine that have
evolved: one is a system of predefined and
hierarchically ordered keywords that humans have
programmed extensively. The other is a system that
generates an "inverted index" by analyzing texts it
locates. This second form relies much more heavily on
the computer itself to do the bulk of the work.
(ix) Most Web search engines are commercial ventures
supported by advertising revenue and, as a result,
some employ the practice of allowing advertisers to pay
money to have their listings ranked higher in
search results. Those search engines which do not
accept money for their search engine results make
money by running search related ads alongside the
regular search engine results. The search engines make
money every time someone clicks on one of these ads.

4) Which search engine is the best?

• According to me Google is the best. It is the best because of


the following reasons
A)Google's Search Engine Home Page

• Google's home page is extremely clean and


simple, loads quickly, and delivers arguably
the best results of any search engine out
there, mostly due to its PageRank
technology and massive listings (more than
8 billion at the time of this writing).

B)Google's Search Engine Options

• Searchers have more than one option on


Google's home page; there is the capacity
to search for images, comments in UseNet
discussion forums, and Google’s unique
news hub, and many more choices. Google’s
own shopping search engine Froogle, is
also available for searchers to tap into
Google's sizable shopping listings.

C)How to use Google's search engine

• Be specific. Google is not an “intuitive”


search engine (unfortunately, there aren't
any!), and therefore cannot read your mind.
Try to be as concise as possible; instead of
"jeans", try "Levi 501 jeans".

• Search for phrases. For example, if you’re


searching for a specific quote, type in “to be
or not to be". Google will search for the
entire phrase just how it appears in between
the quotes. For more information on how
use phrases in your searches, check out
Looking for a Specific Phrase.
• Be selective. Use “common words”, such
as and, if, not and numbers ONLY if you
want them included in the search. Google
excludes them otherwise. If you want them
included, use a phrase search by putting
quotations around your search query, or
include the common word by putting a
space and a plus sign right in front of it. For
example, if you are looking for the season
five DVD of “Sex and the City”, type in “sex
and the city dvd season +5”.

• Exclude extra results. If you want to


narrow down your searches even further,
focus your search by placing a "-" (negative
sign) in front of words you want to avoid.
For example, if you're searching for "coffee"
and want to avoid Starbucks, you would
type in "coffee -Starbucks" (without quotes)

D)Google Search Tips

• All you need to do is just enter a word


or phrase and hit “enter”. Google will
only come up with results that contain all
the words in the search word or phrase;so
refining your search just means adding or
subtracting words to the search terms
you’ve already submitted. Google's search
results can easily be narrowed down by
using phrases instead of just one word; for
example, when looking for "coffee" search
for "Starbucks coffee" instead and you'll get
much better results.

• Google doesn't care about capitalized


words and will even suggest correct
spellings of words or phrases. Google
also excludes common words such as
"where" and "how", and since Google will
return results that include all of the words
you enter in, there's no need to include the
word "and", as in "coffee and starbucks."

E) Google Advanced Search

• For more advanced Google search tips,


you'll definitely want to check out my
Google Cheat Sheet. Google is just way
too big for only one article to cover all it has
to offer in the way of search.

F) Google Toolbar and Google Desktop Search

• What else is great about Google?


Personally, I'm a big fan of both the Google
Toolbar and the Google Desktop Search.
The best feature of the toolbar is that it
blocks annoying pop-ups, and offers instant
access to Google's search technology. The
desktop search has its critics but I've had no
problems with it. Both downloads are free
and take up very little space on your
system; give them a try and see what you
think.

G)Google Search Engine Extras: There's so many extra search


options on Google that it's difficult to find space to list them all. Here
are few special features:

• Search for Books: If you're looking for text


from a specific book, type in the name of
the book (in quotes), or if you're looking for
books about a particular subject, type in
"books about xxx". Google will return results
that contain content either in the book itself,
and will offer links to Book Results at the top
of the search page.

• Google Calculator: Use Google's calculator


by just typing in whatever calculation you'd
like Google to figure out. For example: half a
quart in tablespoons.

• Google Definitions: Ask Google to define


something by typing in define (insert term).

Bibliography
1) en.wikipedia.org/wiki/web_search_engine
2) www.monash.com/ spidap 4.html

You might also like