01 Intro
Introduction to
Information Retrieval
Take-away
Administrivia
Boolean Retrieval: Design and data structures of a simple
information retrieval system
What topics will be covered in this class?
Outline
❶ Introduction
❷ Inverted index
❸ Processing Boolean queries
❹ Query optimization
Boolean retrieval
Does Google use the Boolean model?
Incidence vectors
result: 1 0 0 1 0 0
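
To make the incidence-vector idea concrete, here is a minimal sketch. The terms and the 0/1 vectors are hypothetical, chosen only so that the outcome matches the result line above; the helper names `bit_and` and `bit_not` are also assumptions, not part of the slides.

```python
# Hypothetical term incidence vectors over six documents (illustrative data).
# Position i holds 1 if the term occurs in document i, else 0.
incidence = {
    "brutus":    [1, 1, 0, 1, 0, 0],
    "caesar":    [1, 1, 0, 1, 1, 1],
    "calpurnia": [0, 1, 0, 0, 0, 0],
}

def bit_and(a, b):
    """Element-wise AND of two incidence vectors."""
    return [x & y for x, y in zip(a, b)]

def bit_not(a):
    """Element-wise complement of an incidence vector."""
    return [1 - x for x in a]

# Query: brutus AND caesar AND NOT calpurnia
result = bit_and(
    bit_and(incidence["brutus"], incidence["caesar"]),
    bit_not(incidence["calpurnia"]),
)
print("result:", *result)
```

Each 1 in the resulting vector marks a document that satisfies the whole query.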
Answers to query
Bigger collections
Inverted Index
dictionary
postings
Generate posting
Sort postings
dictionary
postings
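
The construction steps named above (generate postings, sort postings, then group them into a dictionary with postings lists) could be sketched roughly as follows. The function name `build_inverted_index`, the whitespace tokenization, the lowercasing, and the three sample documents are all assumptions made for illustration.

```python
from itertools import groupby

def build_inverted_index(docs):
    """Map each term to a sorted list of docIDs.

    docs: dict of docID (int) -> document text.
    """
    # Step 1: generate (term, docID) postings.
    # Whitespace tokenization and lowercasing are simplifying assumptions.
    pairs = [
        (token.lower(), doc_id)
        for doc_id, text in docs.items()
        for token in text.split()
    ]
    # Step 2: sort postings by term, then by docID.
    pairs.sort()
    # Step 3: group by term into postings lists, dropping duplicate docIDs.
    index = {}
    for term, group in groupby(pairs, key=lambda pair: pair[0]):
        index[term] = sorted({doc_id for _, doc_id in group})
    return index

# Hypothetical three-document collection.
docs = {1: "new home sales", 2: "home sales rise", 3: "new sales forecasts"}
index = build_inverted_index(docs)
```

The dictionary is the set of keys of `index`; the length of each postings list gives that term's document frequency.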
Boolean queries
The Boolean retrieval model can answer any query that is a
Boolean expression.
Boolean queries are queries that use AND, OR and NOT to join
query terms.
The model views each document as a set of terms.
It is precise: a document either matches the condition or it does not.
Primary commercial retrieval tool for 3 decades
Many professional searchers (e.g., lawyers) still like Boolean
queries.
You know exactly what you are getting.
Many search systems you use are also Boolean: Spotlight,
email, intranet, etc.
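
As a minimal sketch of this model, each document below is a set of terms, and AND, OR and NOT map onto set intersection, union and difference. The three documents and the helper `matching` are hypothetical.

```python
# Hypothetical collection: docID -> set of terms in that document.
docs = {
    1: {"boolean", "retrieval", "model"},
    2: {"inverted", "index", "retrieval"},
    3: {"boolean", "query", "index"},
}

def matching(term):
    """Return the set of docIDs whose term set contains `term`."""
    return {doc_id for doc_id, terms in docs.items() if term in terms}

all_docs = set(docs)

# boolean AND NOT index
hits_and_not = matching("boolean") & (all_docs - matching("index"))

# retrieval OR query
hits_or = matching("retrieval") | matching("query")

print(sorted(hits_and_not), sorted(hits_or))
```

A document is either in the result set or not; there is no ranking, which is the sense in which the model is precise.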
Westlaw: Comments
Proximity operators: /3 = within 3 words, /s = within a
sentence, /p = within a paragraph
Space is disjunction, not conjunction! (This was the default
in search pre-Google.)
Long, precise queries: incrementally developed, not like
web search
Why professional searchers often like Boolean search:
precision, transparency, control
When are Boolean queries the best way of searching?
Depends on: information need, searcher, document
collection, . . .
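
To illustrate the idea behind a proximity operator such as /3, here is a rough sketch that checks whether two terms occur within k words of each other. The function name `within_k_words`, the whitespace tokenization, and the way distance is counted (difference of token positions) are assumptions; Westlaw's exact semantics may differ.

```python
def within_k_words(text, term_a, term_b, k):
    """True if some occurrence of term_a lies within k token positions of term_b."""
    tokens = text.lower().split()
    positions_a = [i for i, tok in enumerate(tokens) if tok == term_a]
    positions_b = [i for i, tok in enumerate(tokens) if tok == term_b]
    return any(abs(i - j) <= k for i in positions_a for j in positions_b)

# Hypothetical sentence used only for illustration.
sentence = "the statute of limitations applies to federal tort claims"

near = within_k_words(sentence, "limitations", "federal", 3)
far = within_k_words(sentence, "statute", "claims", 3)
print(near, far)
```

The /s and /p variants would need sentence and paragraph boundaries instead of a fixed word window, which this sketch does not attempt.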
Query optimization
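
The slide body for this topic did not survive extraction, so the following is only a sketch of one widely used heuristic for conjunctive queries, assumed here rather than taken from the slides: intersect postings lists in order of increasing length, so intermediate results stay small. The names `intersect`, `intersect_many` and the sample postings are hypothetical.

```python
def intersect(p1, p2):
    """Merge-intersect two sorted docID lists in linear time."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out

def intersect_many(postings, terms):
    """AND together the postings of all terms, shortest list first."""
    lists = sorted((postings.get(t, []) for t in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for plist in lists[1:]:
        if not result:
            break  # an empty intermediate result cannot grow again
        result = intersect(result, plist)
    return result

# Hypothetical postings lists (term -> sorted docIDs).
postings = {
    "a": [1, 2, 3, 4, 5, 8],
    "b": [2, 4, 8, 16],
    "c": [4, 8],
}
answer = intersect_many(postings, ["a", "b", "c"])
```

Starting with the shortest list means every later intersection works on at most that many candidates.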