Clements
1
Search Engines
2
What is a Search Engine?
• A client/server application
• A document retrieval system
• Uses regularly updated indexes to operate quickly and efficiently
• Designed to help find information stored:
  • On a computer system, such as on the World Wide Web
  • Inside a corporate or proprietary network
  • On a personal computer
• Different selection and relevance criteria can apply in different environments, or for different uses
• Allows one to ask for content meeting specific criteria
  • Typically those containing a given word or phrase
• Retrieves a list of items that match those criteria
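The index-then-lookup idea on this slide can be illustrated with a small inverted index. This is only a minimal sketch, not how any particular search engine is built: the function names (`tokenize`, `build_index`, `search`) and the three sample documents are made up for illustration, and the query matches documents containing every query word rather than an exact phrase.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    # index.get avoids adding empty entries to the defaultdict
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical sample documents, for illustration only
documents = {
    "doc1": "Archie indexed filenames on public FTP sites",
    "doc2": "Gopher indexed plain text documents",
    "doc3": "Wandex was the first Web search engine",
}
index = build_index(documents)
print(search(index, "indexed"))        # ['doc1', 'doc2']
print(search(index, "indexed plain"))  # ['doc2']
```

The index is built once and reused for every query, so a lookup does not have to rescan the documents. Matching an exact phrase would also require storing word positions, which this sketch leaves out.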
3
What else?
• Some also mine data available in:
  • Newsgroups
  • Large databases
  • Open directories like DMOZ.org
  • What about the text of books?
• Web directories – maintained by human editors
• Search engines – operate algorithmically
• Many website “search engines” are actually front ends to search engines of others
4
History
• Archie – first search tool for the Internet
  • "Archive" without the "v", not the character from the "Archie" comic book
  • Created (1990) by Alan Emtage, a student at McGill University, Montreal
  • Downloaded the directory listings of all files located on public anonymous FTP sites
  • Creating a searchable database of filenames, not file contents
• Gopher – indexed plain text documents
  • Created (1991) by Mark McCahill at the University of Minnesota
  • Named after the school's mascot
  • Most of the Gopher sites became websites after the creation of the WWW
• Veronica – searched the files stored in Gopher index systems
  • Very Easy Rodent-Oriented Net-wide Index to Computerized Archives
  • Provided a keyword search of most Gopher menu titles
• Jughead – searched the files stored in Gopher index systems
  • Jonzy's Universal Gopher Hierarchy Excavation And Display
  • Tool for obtaining menu information from various Gopher servers
• Wandex – first Web search engine
  • Used index collected by the World Wide Web Wanderer, a web crawler developed by Matthew Gray at MIT in 1993
5
Google
• 2001 – rose to prominence
• Currently the most popular search engine
• Success based on the concept of link popularity and PageRank
  • PageRank – a score based on the number (and importance) of websites and webpages that link to a page
  • Possible to order its results by how many websites link to each found page
• PageRank is based on citation analysis developed (1950s) by Eugene Garfield at the University of Pennsylvania
• Minimalist user interface was very popular with users
• Uses more than 150 criteria to determine relevancy
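The link popularity idea can be sketched with the simplified textbook form of PageRank, computed by repeated iteration. This is an illustrative sketch only, not Google's production ranking: the damping factor of 0.85, the fixed iteration count, the `pagerank` function name, and the four-page toy web are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    The damping and iterations defaults are assumed values.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly over all pages
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split across its outgoing links
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: pages a, b and d all link to c; c links back to a
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
ordered = sorted(ranks, key=ranks.get, reverse=True)
print(ordered[0])  # c
```

Page c comes first because three pages link to it. Page a outranks b and d because its single incoming link comes from the highly ranked c, which is how PageRank differs from a plain count of incoming links.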
6
Others
• Yahoo! Search
  • Founders David Filo and Jerry Yang, Ph.D. candidates at Stanford University
  • Started in a campus trailer (February 1994) to keep track of their personal interests on the Internet
  • 2002, Yahoo! acquired Inktomi
  • 2003, Yahoo! acquired Overture, which owned AlltheWeb and AltaVista
  • 2004, launched its own search engine
• Microsoft’s Windows Live Search
  • Most recent major search engine
  • Powered by its own web crawler (called msnbot)
  • 2006, Microsoft migrated to the new search platform
• Ask.com
  • February 2006, rebranded from Ask Jeeves
  • Maps (with walking directions and dynamic address generation)
  • "Smart Answers" were added
  • Algorithmic engine using relevance ranking originally developed for Teoma
  • Features generally unavailable elsewhere to help narrow, expand, and select related names
  • Page previews
  • "Zoom"
7
Other Search Indexes
8
Advanced Search Link
9
About Google Link
10
Google Help
11
3rd Party Resources
12
Different from Search Engines….
13
Search Engines vs. Directories
Search Engines
• Automated: no human intervention
• Paid advertisers top results lists
Search Directories
• Indexed by humans
• Examples:
  • Yahoo
14
Computers are quick but they don’t think….
~D.A.
15