You are on page 1of 11

INFORMATION

RETRIEVAL TECHNIQUES
Lecture # 18

• Wild Card Queries


• B Tree

1
ACKNOWLEDGEMENTS
The presentation of this lecture has been taken from the underline
sources
1. “Introduction to information retrieval” by Prabhakar Raghavan,
Christopher D. Manning, and Hinrich Schütze
2. “Managing gigabytes” by Ian H. Witten, Alistair Moffat, Timothy C.
Bell
3. “Modern information retrieval” by Baeza-Yates Ricardo,  
4. “Web Information Retrieval” by Stefano Ceri, Alessandro Bozzon,
Marco Brambilla
Outline
• How to Handle Wild-Card Queries
• Wild-card queries: *
• B-Tree
• B+ Tree

3
How to Handle Wild-Card Queries
• B-Trees
• Permuterm Index
• K-Grams
• Soundex Algorithms

4
Wild-card queries: *
• mon*: find all docs containing any word beginning with “mon”.
• Easy with binary tree (or B-tree) lexicon: retrieve all words in range:
mon ≤ w < moo
• *mon: find words ending in “mon”: harder
• Maintain an additional B-tree for terms backwards.
Can retrieve all words in range: nom ≤ w < non.

Exercise: from this, how can we enumerate all terms


meeting the wild-card query pro*cent ?
5
B-Tree

6
B-Tree
Level 0
A

Level 1
B C D

Level 2
E F G H I

Level 3
J K

n = # of pairs.
# of external nodes = n + 1.

7
Wild-card queries: *
• mon*: find all docs containing any word beginning with “mon”.
• Easy with binary tree (or B-tree) lexicon: retrieve all words in range:
mon ≤ w < moo
• *mon: find words ending in “mon”: harder
• Maintain an additional B-tree for terms backwards.
Can retrieve all words in range: nom ≤ w < non.

Exercise: from this, how can we enumerate all terms


meeting the wild-card query pro*cent ?
8
B+ Tree

9
Wild-card queries example
• Query
• Sanat* AND Jayasur*

Sanat AND Jayasur

• Query
mo*y …yad*
*day
• Queries:
• X lookup on X$ X* lookup on $X*
• *X lookup on X$* *X* lookup on X*
• X*Y lookup on Y$X* X*Y*Z ??? Exercise! 10
Resources
• IIR 3, MG 4.2
• Efficient spell retrieval:
• K. Kukich. Techniques for automatically correcting words in text. ACM Computing Surveys
24(4), Dec 1992.
• J. Zobel and P. Dart.  Finding approximate matches in large lexicons.  Software - practice and
experience 25(3), March 1995. http://citeseer.ist.psu.edu/zobel95finding.html
• Mikael Tillenius: Efficient Generation and Ranking of Spelling Error Corrections. Master’s thesis at
Sweden’s Royal Institute of Technology. http://citeseer.ist.psu.edu/179155.html
• Nice, easy reading on spell correction:
• Peter Norvig: How to write a spelling corrector
http://norvig.com/spell-correct.html

11

You might also like