Welcome to Scribd!

Introducing Indexes

Uploaded by

0% found this document useful (0 votes)

12 views8 pages

This document introduces indexes and how they enable efficient searching of large collections of documents. It explains that indexes allow words or topics in a document to be looked up quickly, similar to a book's index. The key insight is that search engines like Google build and search indexes to handle their huge collections of web pages. The challenge with indexing the web versus a static book is that the web is constantly changing, so a search engine's index must also be able to dynamically update as pages are added or modified.

Original Description:

Original Title

1. Introducing Indexes

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

0% found this document useful (0 votes)

12 views8 pages

Introducing Indexes

Uploaded by

triva

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 8

Search inside document

Introducing Indexes

Powered by @ Brilliant App

• In the previous module, we have explored a series of
ever-more-powerful search engines that searched
through a relatively large set of documents: Arthur
Conan Doyle’s Sherlock Holmes stories.
• The complete works of Arthur Conan Doyle are
pretty big! Even so, as you saw in the
exploration Searching the Whole World, they’re just
small enough that you could get away with searching
the simplest possible way: reading through the
whole document from beginning to end, looking for
the right words.
• That’s not how the Sherlock search engine you used
worked, and it wouldn’t be remotely possible for
Google to work this way!
• There's "one weird trick" behind any search engine:
building and using an index.
• A book’s index allows you to access a book in an "inside
out" manner. Instead of scanning the book to find
references to a topic, the index is an alphabetical listing of
topics that explains where in the book that topic appears.
• Even huge internet search engines containing many
trillions of documents ultimately rely on storing all that
information in an index that is stored in a computer.
• The most fundamental operation on an index
is looking up a word. If the index is built just like a
book’s index, then a word can be found by looking in
the middle of the index, and then (depending on the
word) looking at the word in the middle of the first
half or the middle of the second half.
• By always looking in the middle, you can look up a word
in an index containing trillions of web pages by just
checking a few dozen entries! That’s the magic of binary
search!
• If your goal is to create a search engine for
Arthur Conan Doyle's Sherlock Holmes
stories, looking up a word in an index is the
only thing you need to do.
• If you want to create a search engine for
web pages, this is not enough. Why?
Correct answer: The web keeps changing, so an index for it
would have to keep changing as well .
Explanation
• The web does have a lot of data, but that in itself
does not present an insurmountable barrier to
building an index. Also, although web pages do
not have page numbers, they do have unique
identifiers — the URL you use to access them!
• The fundamental difference between a book and
the web is that a book is published and printed
once and for all, whereas the web is constantly
changing. Anyone can add a new web page at any
time, and anyone can remove or modify a web
page that they own. So an index for the web
would itself need to be able to change as needed.
• In a book, an index is a static document, created one time
for one book. There’s no space in a book’s index to add new
topics. That won’t work for a search engine: the web is
changing constantly!
• A search engine’s index has to support one other critical
operation: adding a new word. You can think of an
index that supports addition as a huge organized wall full
of sticky notes. It’s always possible to add a new sticky
note.
• In further lecture, you’re going to assume
that you have an index that supports these
two critical operations:
• quickly looking up words and
• quickly inserting new words.
• Based on these two operations, you’ll learn
how to build a index for search engine.
Search Engine that efficiently supports
complex queries over many millions of
documents.

Assignment 1
Document23 pages
Assignment 1
Balawal Rasool
No ratings yet
Unit 4 Search Strategies
Document11 pages
Unit 4 Search Strategies
Malvern Muzarabani
No ratings yet
The Quick Way To Write Your First Ebook
From Everand
The Quick Way To Write Your First Ebook
Mr. Wordweaver
No ratings yet
Searching For Journal Articles: Jack Hyland
Document25 pages
Searching For Journal Articles: Jack Hyland
BusinessLibrarianDCU
No ratings yet
Search Engines: by Bhaswanth 16311A0507
Document23 pages
Search Engines: by Bhaswanth 16311A0507
Bhaswanth Gudimella
No ratings yet
Search Engines: by Bhaswanth 16311A0507
Document23 pages
Search Engines: by Bhaswanth 16311A0507
Bhaswanth Gudimella
No ratings yet
How To Write An Essay
Document26 pages
How To Write An Essay
api-3756381
100% (2)
Word Search With An Index
Document12 pages
Word Search With An Index
triva
No ratings yet
The Easy Way To Write Your First Ebook
From Everand
The Easy Way To Write Your First Ebook
MUHAMMAD NUR WAHID ANUAR
No ratings yet
How Search Engines Work Mike Grehan
Document57 pages
How Search Engines Work Mike Grehan
jayashree99
No ratings yet
E-Resources in Health Sciences: by Sukhdev Singh
Document95 pages
E-Resources in Health Sciences: by Sukhdev Singh
ade_lia
No ratings yet
Google Scholar: Carter Matherly, PH.D
Document17 pages
Google Scholar: Carter Matherly, PH.D
Dean Giustini
No ratings yet
How To Write An Essay
Document26 pages
How To Write An Essay
Adel Amin
0% (1)
MODULE 5 Research Writing Part 1
Document8 pages
MODULE 5 Research Writing Part 1
Mosedeil Herbert Tabios
No ratings yet
01 Data
Document20 pages
01 Data
Hend maarof
No ratings yet
SPPM 1002 Web Searching
Document12 pages
SPPM 1002 Web Searching
Izlaikha Aziz
No ratings yet
Internet Search Tools Search Engines Meta-Search Engines Metasites Directories
Document10 pages
Internet Search Tools Search Engines Meta-Search Engines Metasites Directories
Jowjie TV
No ratings yet
Lee Esta Guía en Español.: Background
Document8 pages
Lee Esta Guía en Español.: Background
Vagdo
No ratings yet
Guide To Finding Articles-Books
Document7 pages
Guide To Finding Articles-Books
somemeat
No ratings yet
How to Get a Great Job: A Library How-To Handbook
From Everand
How to Get a Great Job: A Library How-To Handbook
Editors of the American Library Association
Rating: 3 out of 5 stars
3/5 (1)
Internet Research: Finding Materials On The Internet
Document8 pages
Internet Research: Finding Materials On The Internet
Anshu Arora
No ratings yet
Database & Search Engine
Document17 pages
Database & Search Engine
bhavgifee
No ratings yet
Lesson 1: Literature Search: Bmeezykg
Document15 pages
Lesson 1: Literature Search: Bmeezykg
ERIKA ANNE CADAWAN
No ratings yet
Guide To Finding Articles - Books
Document7 pages
Guide To Finding Articles - Books
Minecraft Server
No ratings yet
Search Engines: Sara Khalid Suliman
Document34 pages
Search Engines: Sara Khalid Suliman
Magnon Be7wak
No ratings yet
Effective Web Searching
Document13 pages
Effective Web Searching
Steve Evans
No ratings yet
Will My e-Book Look Just Like My Printed Book?: A Quick Guide to Understanding the Differences Between E-books and Traditional Books
From Everand
Will My e-Book Look Just Like My Printed Book?: A Quick Guide to Understanding the Differences Between E-books and Traditional Books
Bo Bennett
No ratings yet
Guide To Finding Articles - Books
Document7 pages
Guide To Finding Articles - Books
asdasda
No ratings yet
Literature Review Web Browser
Document8 pages
Literature Review Web Browser
c5nj94qn
100% (1)
Searching The Internet
Document49 pages
Searching The Internet
anbreengondal
No ratings yet
Academic Writing With Ai Apps
Document245 pages
Academic Writing With Ai Apps
Salma Ayman
No ratings yet
How To Write An Essay
Document26 pages
How To Write An Essay
maximus29
No ratings yet
Manage Your Research Using Mendeley
Document46 pages
Manage Your Research Using Mendeley
AMOEB
No ratings yet
Research in Philosophy: A Guide For Students: 1.1 What Does Your Instructor Expect?
Document32 pages
Research in Philosophy: A Guide For Students: 1.1 What Does Your Instructor Expect?
Malkish Rajkumar
No ratings yet
W11 - Reading and Reviewing
Document49 pages
W11 - Reading and Reviewing
Vũ Xuân Khánh
No ratings yet
Elasticsearch Blueprints - Sample Chapter
Document24 pages
Elasticsearch Blueprints - Sample Chapter
Packt Publishing
No ratings yet
Lesson 3 A Internet As Nursing Resource
Document64 pages
Lesson 3 A Internet As Nursing Resource
beer_ettaa
No ratings yet
English: Quarter 2 - Module 2: The Search Engine
Document6 pages
English: Quarter 2 - Module 2: The Search Engine
joy sumerbang
No ratings yet
Pro SQL Server Relational Database Design and Implementation: Best Practices for Scalability and Performance
From Everand
Pro SQL Server Relational Database Design and Implementation: Best Practices for Scalability and Performance
Louis Davidson
No ratings yet
Search It, Find It: The Translator's Minimalist Guide to Online Search
From Everand
Search It, Find It: The Translator's Minimalist Guide to Online Search
Carlos Djomo
No ratings yet
Scien&fic Literacy: TPB Bahasa Inggris Dr. Susi Yuliawa&, M.Hum
Document15 pages
Scien&fic Literacy: TPB Bahasa Inggris Dr. Susi Yuliawa&, M.Hum
Fathian Nur Dalillah
No ratings yet
Making The Case
Document10 pages
Making The Case
Naxief Ismail
No ratings yet
Basic Research Methods: Annie Gabriel Library
Document5 pages
Basic Research Methods: Annie Gabriel Library
ramizahmedkapadia
No ratings yet
Evernote for Self-Publishing: How to Write Your Book in Evernote from Start to Finish
From Everand
Evernote for Self-Publishing: How to Write Your Book in Evernote from Start to Finish
Jose John
Rating: 5 out of 5 stars
5/5 (2)
Effective Internet Searching
Document14 pages
Effective Internet Searching
R-jay Enriquez Guinto
No ratings yet
Effective Internet Searching
Document14 pages
Effective Internet Searching
Stephanie Nicole Pablo
No ratings yet
The Easy Way To Write Effective Ebook: A Beginners Guide To Making Ebook
From Everand
The Easy Way To Write Effective Ebook: A Beginners Guide To Making Ebook
Richard Smith
No ratings yet
Lesson 2 - Constructing A Search Query
Document12 pages
Lesson 2 - Constructing A Search Query
Antony
No ratings yet
Index: Back Matter Book Library Catalogue
Document9 pages
Index: Back Matter Book Library Catalogue
sai krishna chakka
No ratings yet
The Anatomy of A Large-Scale Hypertextual
Document41 pages
The Anatomy of A Large-Scale Hypertextual
Shivani
No ratings yet
Research Methodologies: Chapter Three Literature Review and Referencing
Document72 pages
Research Methodologies: Chapter Three Literature Review and Referencing
Milion Nugusie
No ratings yet
Basic Research Methods: Annie Gabriel Library
Document5 pages
Basic Research Methods: Annie Gabriel Library
Shanu Khan
No ratings yet
Google Thesis
Document6 pages
Google Thesis
afodfexskdfrxb
100% (1)
Library Research Notebook
Document19 pages
Library Research Notebook
api-645491299
No ratings yet
Title: Library Book Search System Using Zigbee Technology Author: Arnold C. Paglinawan, Marloun Sejera, Christopher Fernan, Michael Latiza, Yuri
Document2 pages
Title: Library Book Search System Using Zigbee Technology Author: Arnold C. Paglinawan, Marloun Sejera, Christopher Fernan, Michael Latiza, Yuri
Deign Sese
No ratings yet
Book Indexing: A Step-by-Step Guide
From Everand
Book Indexing: A Step-by-Step Guide
Stephen Ullstrom
No ratings yet
Kako Napisati Esej
Document26 pages
Kako Napisati Esej
Boris Mitić
No ratings yet
Information Retrieval
Document21 pages
Information Retrieval
Mukx Mukaronda
No ratings yet
The Internet - Your Research Library
Document11 pages
The Internet - Your Research Library
Felipe Ens
No ratings yet
Literature Review Zeitform
Document5 pages
Literature Review Zeitform
eygyabvkg
100% (1)
Lecture 2
Document57 pages
Lecture 2
triva
No ratings yet
Lecture 1
Document66 pages
Lecture 1
triva
No ratings yet
Queries With AND and OR Operators
Document29 pages
Queries With AND and OR Operators
triva
No ratings yet
Word Search With An Index
Document12 pages
Word Search With An Index
triva
No ratings yet
Magic Words That Bring You Riches PDF
Document122 pages
Magic Words That Bring You Riches PDF
Anthony Walker
No ratings yet
Madu Assignment
Document12 pages
Madu Assignment
Sevgi Atıcı
No ratings yet
NZ Maths Homework Books
Document8 pages
NZ Maths Homework Books
caraqpc2
100% (1)
Conner, Gary - Six Sigma and Other Continuous Improvement Tools For The Small Shop-Society of Manufacturing Engineers (2002)
Document377 pages
Conner, Gary - Six Sigma and Other Continuous Improvement Tools For The Small Shop-Society of Manufacturing Engineers (2002)
Anibal Daza
No ratings yet
B1 CHP 7
Document12 pages
B1 CHP 7
JasmineVeena
No ratings yet
Thesis Border Around Page
Document6 pages
Thesis Border Around Page
catherineaguirresaltlakecity
100% (2)
(PDF) Library Space and Place - Nature, Use and Impact On Academic Library
Document11 pages
(PDF) Library Space and Place - Nature, Use and Impact On Academic Library
gt
No ratings yet
Maniac Magee Final Project
Document3 pages
Maniac Magee Final Project
api-235656654
No ratings yet
Katherine M. Stott (2008) - Why Did They Write This Way? Reflections On References To Written Documents in The Hebrew Bible and Ancient Literature. T&T Clark (Ebook, I)
Document177 pages
Katherine M. Stott (2008) - Why Did They Write This Way? Reflections On References To Written Documents in The Hebrew Bible and Ancient Literature. T&T Clark (Ebook, I)
OLEStar
No ratings yet
Sensors 22 02991 v2
Document20 pages
Sensors 22 02991 v2
Daniel
No ratings yet
C++ Reactive Programming 2018 PDF
Document491 pages
C++ Reactive Programming 2018 PDF
batong96
75% (4)
Coptic Stitch Bookbinding
Document3 pages
Coptic Stitch Bookbinding
api-233374589
No ratings yet
Accidental Cure Extraordinary Medicine Patients
Document5 pages
Accidental Cure Extraordinary Medicine Patients
Ahmet
0% (1)
rEADING & wRITING 2ND eXAM
Document3 pages
rEADING & wRITING 2ND eXAM
Menjie Antiporta
No ratings yet
Ted Nicholas
Document6 pages
Ted Nicholas
Júlio César Caesar
No ratings yet
Book Covers Lesson Plan
Document3 pages
Book Covers Lesson Plan
api-545455548
No ratings yet
Goulish, Matthew - 39 Microlectures in Proximity of Performance
Document225 pages
Goulish, Matthew - 39 Microlectures in Proximity of Performance
Ben Zucker
100% (2)
Ethical Use of Media: Dictionary
Document2 pages
Ethical Use of Media: Dictionary
RUS SELL
No ratings yet
Semester I Lesson 1 Family
Document60 pages
Semester I Lesson 1 Family
Puspita Purnama Sari
No ratings yet
MED300 Test 3
Document3 pages
MED300 Test 3
Mark Adrian
No ratings yet
Effects and Implications of E-Books On Students
Document34 pages
Effects and Implications of E-Books On Students
Pannaga Jamadagni
No ratings yet
(Cambridge Studies in Constitutional Law) Roberto Gargarella - The Law As A Conversation Among Equals-Cambridge University Press (2022)
Document378 pages
(Cambridge Studies in Constitutional Law) Roberto Gargarella - The Law As A Conversation Among Equals-Cambridge University Press (2022)
Rodrigo Cordero
No ratings yet
Ursing Atalogue: Anatomy & Physiology
Document12 pages
Ursing Atalogue: Anatomy & Physiology
Muhammad Ali
No ratings yet
Tahun 6 - Semakan RPT Bi
Document24 pages
Tahun 6 - Semakan RPT Bi
Diana Ada
No ratings yet
Knowledge For The People-Understanding
Document38 pages
Knowledge For The People-Understanding
khadarcabdi
No ratings yet
Polymer Composites
Document100 pages
Polymer Composites
Nouna
No ratings yet
TLE ACTIVITY SHEET LAS-for-Q2-Template - with-Guide-Statements
Document6 pages
TLE ACTIVITY SHEET LAS-for-Q2-Template - with-Guide-Statements
Puche Mara
No ratings yet
Remedial Activities in Media and Info Literacy 12 Stem 2
Document14 pages
Remedial Activities in Media and Info Literacy 12 Stem 2
Angelica Maganis
No ratings yet
Masterminds and Wingmen by Rosalind Wiseman Excerpt
Document40 pages
Masterminds and Wingmen by Rosalind Wiseman Excerpt
Crown Publishing Group
33% (3)
A Dissertation Upon Roast Pig
Document8 pages
A Dissertation Upon Roast Pig
CollegePaperServiceCanada
100% (1)