
How Search Engine Works

COMSATS INSTITUTE OF INFORMATION TECHNOLOGY Islamabad

Assignment Number 1
Subject: ICT
Title: How Search Engine Works
Submitted to: Sir Ashfaq Hussain Farooqi
Submitted by: Muhammad Imran Taj
Reg Number: SP18-BCS-116
Date of Submission: 26 Feb, 2018

Department of Computer Science,


2018

Contents
1 Introduction
   1.1 History
2 Working
   2.1 Crawling
   2.2 Indexing
      2.2.1 Design Factor
      2.2.2 Data Structures:
   2.3 Retrieval
      2.3.1 The Information Retrieving Cycle:
   2.4 Advantages and Disadvantages:
      2.4.1 Advantages:
      2.4.2 Disadvantages:
3 Important Search Engines:
   3.1 Market Share:
4 Search Engine Submission:
5 Following table contains information about different elements of MS Word, formatting patterns or styles used in the report:

1 Introduction
A search engine is a software program, available through the internet,
that searches documents and files for keywords and returns the results
of any files containing those keywords.


• The most popular and well-known search engine today is Google.
• Other popular search engines include AOL, Ask.com, Baidu,
Bing and Yahoo.

Figure 1: Different Search Engines

1.1 History
The following table shows the history of search engines from the first search engine up to Google.
Archie was the first tool ever made as a search engine on the internet. The name stands for
“archive” without the “v”. It was created by Alan Emtage, Bill Heelan and J. Peter Deutsch, computer
science students at McGill University in Canada.
Google adopted the idea of selling search terms in 1998 from a small search engine
company named goto.com. This move had a significant effect on the search engine business. [Pur18]

Table 1: History of Search Engines Till Google

Year   Engine           Current status
1993   W3Catalog        Inactive
1993   Aliweb           Inactive
1993   JumpStation      Inactive
1993   WWW Worm         Inactive
1994   WebCrawler       Active, Aggregator
1994   Go.com           Inactive
1994   Lycos            Active
1994   Infoseek         Inactive
1995   AltaVista        Inactive
1995   Daum             Active
1995   Magellan         Inactive
1995   Excite           Active
1995   SAPO             Active
1995   Yahoo!           Active
1996   Dogpile          Active
1996   Inktomi          Inactive
1996   HotBot           Active
1996   Ask Jeeves       Active
1997   Northern Light   Inactive
1997   Yandex           Active
1998   Google           Active
1998   MSN Search       Active as Bing

2 Working
Every search engine has three main functions:
• Crawling (to discover content).
• Indexing (to process and store content).
• Retrieval (to fetch relevant content when users query the search engine).
Each of them is discussed in some detail below:

2.1 Crawling
• This involves scanning sites and collecting details about each page: titles,
images, keywords, other linked pages, etc.
• Different crawlers may look for different details, like page layouts, where
advertisements are placed, where links are crammed in, etc.

How is a website crawled?

• An automated bot (called a “spider”) visits page after page as quickly as possible,
using page links to find where to go next. Even in the earliest days, Google’s spiders
could reach several hundred pages per second. Nowadays, it’s in the thousands.
• When a web crawler visits a page, it collects every link on the page and adds them
to its list of next pages to visit. It then goes to the next page on the list, collects
the links on that page, and repeats.
• Web crawlers also revisit previously crawled pages to see whether anything has changed.
• Some sites are crawled more frequently, and some are crawled to greater depths, but
sometimes a crawler may give up if a site’s page hierarchy is too complex.
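
The crawl loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "web" is a hypothetical in-memory dictionary (`TOY_WEB`) instead of real pages, and `crawl`, `fetch_links` and `max_depth` are names invented for this sketch. It is not how any real search engine's spider is implemented.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the links found on it.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["b.example/deep/1"],
    "b.example/deep/1": ["b.example/deep/2"],
    "b.example/deep/2": [],
}

def crawl(seed, fetch_links, max_depth=2):
    """Breadth-first crawl; returns pages in the order they were visited."""
    frontier = deque([(seed, 0)])  # list of next pages to visit
    seen = {seed}                  # pages already queued, to avoid repeats
    visited = []
    while frontier:
        url, depth = frontier.popleft()
        visited.append(url)
        if depth == max_depth:     # the crawler "gives up" below this depth
            continue
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append((link, depth + 1))
    return visited

order = crawl("a.example/", lambda url: TOY_WEB.get(url, []))
print(order)
```

With `max_depth=2`, the page `b.example/deep/2` is never reached, which mirrors the point that a crawler may stop when a site's hierarchy is too deep.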

2.2 Indexing
Indexing is when the data from a crawl is processed and placed in a database.
After crawling is done, the results are put into Google’s index. In layman’s
terms, indexing is the process of adding webpages to Google search.
• By default, every WordPress post and page is indexed.
The actual search engine index is the place where all the data the search engine
has collected is stored. It provides the results for search queries, and the pages
stored within the search index are the ones that appear on the search results page.
Here is a sneak peek at one of Google’s search centers.

Figure 2: One of Google’s search centers

Parts of search index

A search index has two parts: design factors and data structures.

2.2.1 Design Factor


The design factors of a search index determine the architecture of the index and how
the index actually works. Together they make up the working search index,
and they include:
• Merge factors: decide how information enters the index.
• Index size: the amount of computer storage necessary to
support the index.
• Storage techniques: how the information is stored.
• Fault tolerance: how important it is for the search index
to be reliable.
• Lookup speed: how quickly a word can be found when the data is
searched in the inverted index.
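
As a rough illustration of the merge factor, the sketch below adds a batch of newly crawled postings to an existing index. It assumes the index is a plain Python dictionary that maps each word to a sorted list of document numbers; the function name `merge_batch` and the sample data are hypothetical.

```python
def merge_batch(index, batch):
    """Merge newly crawled postings into an existing index, in place."""
    for term, doc_ids in batch.items():
        existing = index.get(term, [])
        # Keep each list of document numbers sorted and free of duplicates.
        index[term] = sorted(set(existing) | set(doc_ids))
    return index

index = {"search": [1, 2], "engine": [1]}
batch = {"engine": [3], "crawler": [3]}   # postings from a newly crawled page
merge_batch(index, batch)
print(index)
```

Because the index is a dictionary, finding the list for one word is a single lookup, which relates to the lookup speed factor above.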

2.2.2 Data Structures:


When a search engine index is being built, there are also many different types of data
structures to choose from. These data structures can be:

• Suffix tree: supports linear-time lookup and is structured like a tree.
• Trie: an ordered tree data structure that stores an associative array.
• Inverted index: stores a list of occurrences of each word, in the form of a hash table.
• Citation index: stores citations between documents.
• Term-document matrix: stores the occurrences of words in documents
in a two-dimensional sparse matrix.
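
Two of the structures listed above, the inverted index and the term-document matrix, can be sketched with ordinary Python dictionaries. The two sample documents are made up for illustration, and a real index would also normalise words (lower-casing, removing punctuation), which this sketch skips.

```python
from collections import defaultdict

# Hypothetical mini-collection: document number -> text.
docs = {
    0: "search engines crawl the web",
    1: "engines index the web",
}

# Inverted index: word -> list of documents that contain it.
inverted = defaultdict(list)
for doc_id, text in docs.items():
    for term in sorted(set(text.split())):
        inverted[term].append(doc_id)

# Term-document matrix stored sparsely: only non-zero counts are kept,
# keyed by (word, document number).
matrix = defaultdict(int)
for doc_id, text in docs.items():
    for term in text.split():
        matrix[(term, doc_id)] += 1

print(inverted["web"], inverted["crawl"])
```

The dictionary plays the role of the hash table: looking up a word returns its documents directly, without scanning every document.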

2.3 Retrieval
Information retrieval is the activity of obtaining information relevant to an
information need from a collection of information resources.
It is the science of searching for information in a document, searching for
documents themselves, and also searching for metadata that describes
data, and for databases of texts.
Searching and communication are the two most popular uses of computers.
An information retrieval process begins when a user enters a query into the
system. In information retrieval, a query does not uniquely identify a single
object in the collection. Instead, several objects may match the query,
perhaps with different degrees of relevancy.
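
The idea that several documents can match one query with different degrees of relevancy can be illustrated with a simple TF-IDF style score. This is a swapped-in textbook technique chosen for the example, not the ranking method of any particular search engine; the documents, the `rank` function and the smoothing constant are all assumptions of this sketch.

```python
import math
from collections import Counter

# Hypothetical mini-collection of documents.
docs = {
    "d1": "how search engines crawl the web",
    "d2": "search engines rank search results",
    "d3": "cooking pasta at home",
}

def rank(query, docs):
    """Return (document, score) pairs for matching documents, best first."""
    n = len(docs)
    counts = {d: Counter(text.split()) for d, text in docs.items()}
    scores = {}
    for term in query.split():
        # Document frequency: how many documents contain the term.
        df = sum(1 for c in counts.values() if term in c)
        if df == 0:
            continue
        idf = math.log(n / df) + 1.0  # rarer terms weigh more
        for d, c in counts.items():
            if term in c:
                scores[d] = scores.get(d, 0.0) + c[term] * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

results = rank("search results", docs)
print(results)
```

Here `d2` and `d1` both match the query, `d2` scores higher because it contains both query words, and `d3` does not appear at all.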

2.3.1 The Information Retrieving Cycle:


• Source Selection.
• Query Formulation.
• Query Search.
• Ranked List Selection.
• Documents Examination.
• Documents Delivery.

Figure 3: The IR Cycle

2.4 Advantages and Disadvantages:


Search engines provide some popular ways of finding information on the
Internet.
They have the following advantages:

2.4.1 Advantages:
2.4.1.1 Variety:
• An Internet search can generate a variety of sources for information.
• This variety allows anyone searching for information to choose the
types of sources they would like to use, or to use a variety of sources
to gain a greater understanding of a subject.

2.4.1.2 Precision:
• Search engines have the ability to provide refined or more precise results.
• Some search engines, such as Google or Yahoo, enable you to specify
the type of web sources to be searched.
• Being able to search more precisely allows you to cut down on the amount
of information generated by your search.
• Search engines within a website allow you to search information only on
that website, filtering out information from other web sources and giving
more precision in a user's search for information.
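
Restricting a search to one website, as described in the last point, amounts to filtering results by host name. The sketch below shows one possible way to do that with Python's standard `urllib.parse` module; the function name `restrict_to_site` and the example URLs are hypothetical.

```python
from urllib.parse import urlparse

def restrict_to_site(results, site):
    """Keep only result URLs whose host is `site` or a subdomain of it."""
    kept = []
    for url in results:
        host = urlparse(url).hostname or ""
        if host == site or host.endswith("." + site):
            kept.append(url)
    return kept

hits = [
    "https://example.edu/admissions",
    "https://news.example.edu/search-engines",
    "https://example.com/search-engines",
]
on_site = restrict_to_site(hits, "example.edu")
print(on_site)
```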


2.4.1.3 Organization:
• Internet search engines help to organize the Internet and individual websites.
• Search engines aid in organizing the vast amount of information that can sometimes
be scattered in various places on the same web page into an organized list that can be
used more easily.

2.4.2 Disadvantages:
• Search engines show far too much useless information on our screens.
• Sometimes you cannot find anything useful in the search results.
• Picking out useful information from a sea of search results wastes a lot of time.
• Those who use search engines frequently may become lazy, or even stupid.
• Search engines may lead people to pornographic websites, which are
especially harmful for children, and search engines have not yet found a reliable way to keep
such websites away from children.

3 Important Search Engines:


• Google is the world’s most popular search engine.
• Bing comes after Google in popularity, followed by Baidu and Yahoo!.

3.1 Market Share:


Google is the world's most popular search engine, with a market share of 74.52 percent as of
February 2018.
The world's most popular search engines (with >1% market share) are:

Table 2: Market Share

Search Engine   Market Share in February 2018
Google          74.52%
Bing            38.49%
Baidu           10.98%
Yahoo!          8.41%

4 Search Engine Submission:


• Search engine submission is a process in which a webmaster submits a website
directly to a search engine. While search engine submission is sometimes
presented as a way to promote a website, it generally is not necessary because
the major search engines use web crawlers that will eventually find most web
sites on the Internet without assistance.
• A webmaster can either submit one web page at a time or submit the entire
site using a sitemap, but it is normally only necessary to submit the home page
of a web site, as search engines are able to crawl a well-designed website.
There are two remaining reasons to submit a web site or web page to a search engine:

• To add an entirely new web site without waiting for a search engine to discover it.
• To have a web site's record updated after a substantial redesign.
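
For submitting an entire site, a sitemap is usually a small XML file that lists the site's page addresses. The sketch below builds a minimal one with Python's standard `xml.etree.ElementTree` module, using the `urlset`, `url` and `loc` elements of the sitemaps.org format; the function name `build_sitemap` and the example addresses are hypothetical, and optional fields such as last-modified dates are left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org XML format.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return sitemap XML (as a string) listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page
    return ET.tostring(urlset, encoding="unicode")

xml_text = build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/about",
])
print(xml_text)
```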




5 Following table contains information about different elements of MS Word, formatting patterns or styles used in the report:

Sr #   Name             Line No            Page No
1      Underline text   4,5,6              3
2      Underline text   1,2,3              4
3      bullets          4,5                3
4      bullets          2,3,4,6,7,8,9,10   4
5      bullets          5 to 9             6
6      bullets          1 to 5,17,18       7
7      bullets          3 to 8             8
8      bullets          1 to 7             9
9      bullets          1 to 4             10
10     header           All pages          All pages
11     Page numbering   All pages          All pages
12     italic           All captions       All captions
