
Word Search with an Index

Powered by @ Brilliant App


• In the previous explorations, you considered a collection of documents, like
this:

• And you transformed that collection of documents into an organized search
index, like this:

• In this exploration, you'll use this search index to answer simple search
queries. This question should be at the back of your mind: was the
extra work of building a search index worth it?
• A single-word query can be performed with a
single lookup operation:
– Looking up "springs" identifies the two documents that contain
the word "springs."
– Looking up "against" identifies the one document that contains the
word "against."
– Looking up "water" identifies that no documents contain the word
"water."
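The three lookups above can be sketched in a few lines of Python. This is only an illustration: the three documents are hypothetical stand-ins (the original quotes are not reproduced in this text), and build_index and lookup are names chosen for this sketch.

```python
# Hypothetical stand-in documents; the original framed quotes are not shown here.
documents = {
    1: "hope springs eternal",
    2: "the river springs from the hills",
    3: "lean against the wind",
}


def build_index(docs):
    """Map each word to the sorted list of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def lookup(index, word):
    """Answer a single-word query with one lookup in the index."""
    return index.get(word.lower(), [])


index = build_index(documents)
print(lookup(index, "springs"))  # [1, 2]
print(lookup(index, "against"))  # [3]
print(lookup(index, "water"))    # []
```

With these stand-in documents, "springs" is found in two documents, "against" in one, and "water" in none, mirroring the three cases above.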
• If your goal is to answer single-word queries, it
isn't necessary to build an index or perform lookup operations. A
single-word query can also be performed by reading each of
the words in the collection of documents, one by one, and
reporting back where the word of interest was found.
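For comparison, here is a minimal sketch of the word-by-word approach with no index at all. The documents are again hypothetical stand-ins, and scan is a name chosen for this sketch.

```python
# Hypothetical stand-in documents; the original framed quotes are not shown here.
documents = {
    1: "hope springs eternal",
    2: "the river springs from the hills",
    3: "lean against the wind",
}


def scan(docs, word):
    """Read the documents word by word and report which ones contain the word."""
    target = word.lower()
    found = []
    for doc_id, text in docs.items():
        for candidate in text.lower().split():
            if candidate == target:
                found.append(doc_id)
                break  # this document is already reported; move to the next one
    return found


print(scan(documents, "springs"))  # [1, 2]
print(scan(documents, "water"))    # []
```

Nothing is prepared ahead of time and nothing extra is stored, but every query has to read through the documents again.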
• Building an index instead of just reading documents
involves tradeoffs. What are some of the tradeoffs?
(Select all that apply.)
Correct answers: "Compared to reading word by word, creating an index
requires more work ahead of time" and "Compared to reading
word by word, creating an index requires more sticky notes."

Explanation
• Compared to reading word by word, creating an index requires
more sticky notes.
• This is an example of a time-space tradeoff. If you don't have
sticky notes or wall space for organizing the sticky notes, then it's
not possible to create a search index at all, and reading word by
word is the only option.
• Similarly, a search index on a computer takes up space in the
computer's memory.
• Compared to reading word by word, creating an index requires
more work ahead of time.
• If you end up never needing to answer any query about the
documents at all, then the time spent building the search index is
wasted.
• Sometimes it helps to make up some plausible numbers to help think about a tradeoff
problem.

• If you don't have the three documents memorized, it might take about five seconds to
re-read through them word by word in order to see where a single word appears.
• It takes about five minutes (300 seconds) to create the search index out of
sticky notes.

• Once the search index is created, it's a bit faster to find a word in the index, just taking
four and a half seconds.
Fill in the blank
If space and sticky note usage isn't a concern,
it's worth creating the index if you'll be
performing a single-word search more than
about ________.
Correct answer: 600 or 700 times

Explanation
• Using the small index is only half a second faster than
scanning through the documents (5 seconds versus 4.5
seconds). For the 300-second investment of creating the
index to pay off, you'd need to use the index
300 / 0.5 = 600 times.
• After reading the three quotes 600 times, you'd
probably have them memorized! It's unlikely to
be helpful to make an index in this situation.
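The break-even arithmetic can be written as a small helper. This is a sketch using the made-up numbers from this example; break_even_searches is a name chosen here for illustration.

```python
def break_even_searches(build_seconds, scan_seconds, lookup_seconds):
    """Number of searches at which the time spent building the index is repaid."""
    saving_per_search = scan_seconds - lookup_seconds
    if saving_per_search <= 0:
        raise ValueError("the index never pays off unless lookups are faster")
    return build_seconds / saving_per_search


# 300 s to build, 5 s to re-read the documents, 4.5 s to use the index.
print(break_even_searches(300, 5, 4.5))  # 600.0
```

At exactly 600 searches the two approaches have cost the same total time; the index only comes out ahead after that.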
• With a tiny collection of documents, the index wasn't really that helpful.

• Imagine growing the size of your collection to 150 framed quotes.
Now it takes about four minutes, or 240 seconds, to read through
all the documents word by word.
• Looking up a word in the index, which is sorted alphabetically, isn't
much harder than it was for the smaller index: it takes about 6
seconds.
• Creating the index with sticky notes is now a much more arduous
task than before. It takes 15,000 seconds, or about 4 hours!
Fill in the blank
If space and sticky note usage isn't a concern,
it's worth creating the index if you'll be
performing a single-word search more than
about ________.
Correct answer: 60 or 70 times
Explanation
• Instead of saving only a fraction of a second, using the
index is now 234 seconds faster per search than reading
word by word. This saving is, very approximately, about
100 times smaller than the investment needed to create
the index.
• Estimating in this way would be close enough to get
the right answer.
• The precise answer is that 15,000 / (240 - 6) ≈
64.103, so the investment in creating an index will
pay off if you use the index 65 times or more.
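Another way to see the same result is to compare total time spent after n searches and count up until the index comes out ahead. This sketch uses the made-up numbers from this example; the function and constant names are chosen here for illustration.

```python
BUILD_SECONDS = 15_000   # one-time cost of creating the index
SCAN_SECONDS = 240       # cost of one search by reading word by word
LOOKUP_SECONDS = 6       # cost of one search using the index


def total_with_index(n):
    """Total seconds spent after n searches when the index is built first."""
    return BUILD_SECONDS + LOOKUP_SECONDS * n


def total_by_scanning(n):
    """Total seconds spent after n searches with no index."""
    return SCAN_SECONDS * n


# Find the first number of searches at which the index is strictly cheaper.
n = 1
while total_with_index(n) >= total_by_scanning(n):
    n += 1

print(n)  # 65
```

At 64 searches, scanning is still slightly cheaper in total (15,360 seconds versus 15,384); at 65 searches the index wins (15,390 seconds versus 15,600).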
• Even though the numbers in this example were made up,
they match what you should expect from a search index
built with good data structures.
– If your collection of documents grows by 50 times, you should
expect that a computer will take 50 times longer to read all the
documents word by word.
– If your collection of documents grows by 50 times, you should
expect that a computer will take 50 times longer, or maybe just a bit
more than 50 times longer, to put all the documents into
an index.
– If your collection of documents grows by 50 times, you should
expect that it will take only a tiny bit longer to perform a
single lookup operation.
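The third point can be illustrated with one common way of searching an alphabetically sorted index: binary search. This is an assumption made for the sketch, not something this lesson specifies, and the word lists are synthetic placeholders. The sketch counts how many entries are examined when the index grows by 50 times.

```python
def binary_search_steps(sorted_words, target):
    """Binary search; returns (found, number of entries examined)."""
    lo, hi, steps = 0, len(sorted_words), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_words[mid] == target:
            return True, steps
        if sorted_words[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return False, steps


def make_words(n):
    # Synthetic words, zero-padded so alphabetical order matches numeric order.
    return [f"word{i:07d}" for i in range(n)]


small = make_words(1_000)
large = make_words(50_000)  # 50 times as many index entries

# Searching for an absent word means the search runs until nothing is left.
_, small_steps = binary_search_steps(small, "zzz")
_, large_steps = binary_search_steps(large, "zzz")

print("entries to read word by word:", len(small), "vs", len(large))
print("entries examined per lookup:", small_steps, "vs", large_steps)
```

Reading word by word scales with the full size of the collection, while the number of entries examined per lookup grows by only a handful of steps.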
• The key thing to remember is that, as there are more
and more documents to consider, it rapidly becomes
more and more worthwhile to invest in an index if
those documents will be searched multiple times.
