You are on page 1of 8

INFORMATION

RETRIEVAL TECHNIQUES
Lecture # 15

• Merge Sort

1
ACKNOWLEDGEMENTS
The presentation of this lecture has been taken from the following
sources
1. “Introduction to information retrieval” by Prabhakar Raghavan,
Christopher D. Manning, and Hinrich Schütze
2. “Managing gigabytes” by Ian H. Witten, Alistair Moffat, Timothy C.
Bell
3. “Modern information retrieval” by Baeza-Yates Ricardo,  
4. “Web Information Retrieval” by Stefano Ceri, Alessandro Bozzon,
Marco Brambilla
Outline
• Two-Way Merge Sort
• Single-pass in-memory indexing
• SPIMI-Invert

3
Two-Way Merge Sort Example
Explain the improvement in 2
way merge by incorporating n-
way merge

4
SPIMI:
Single-pass in-memory indexing
• Key idea 1: Generate separate dictionaries for each block – no need to
maintain term-termID mapping across blocks.
• Key idea 2: Don’t sort. Accumulate postings in postings lists as they
occur.
• With these two ideas we can generate a complete inverted index for
each block.
• These separate indexes can then be merged into one big index.

5
SPIMI-Invert

• Merging of blocks is analogous to BSBI.


Summary
• So far
• Static Collections (corpus is fixed)
• Linked List based indexing
• Array based indexing
• Data fits into the HD of a single machine
• Next
• Data does not fit in a single machine
• Requires more machines (clusters of machines).
Ch. 4

Resources for today’s lecture


• Chapter 4 of IIR
• MG Chapter 5
• Original publication on MapReduce: Dean and Ghemawat (2004)
• Original publication on SPIMI: Heinz and Zobel (2003)

You might also like