Table of Contents

Tree: binary tree
Tree: B-tree
Wild-card queries: *
Query processing
B-trees handle *’s at the end of a query term
Permuterm index
Permuterm query processing
Bigram (k-gram) indexes
Bigram index example
Processing wild-cards
Processing wild-card queries
Spell correction
Document correction
Query mis-spellings
Edit distance
Using edit distances
n-gram overlap
Example with trigrams
One option – Jaccard coefficient
Matching trigrams
Context-sensitive spell correction
Context-sensitive correction
Another approach
General issues in spell correction
Soundex continued
What queries can we process?
Index construction
Hardware assumptions
RCV1: Our collection for this lecture
A Reuters RCV1 document
Reuters RCV1 statistics
Recall IIR 1 index construction
Key step
Scaling index construction
Sort-based index construction
Use the same algorithm for disk?
BSBI: Blocked sort-based Indexing (Sorting with fewer disk seeks)
Sorting 10 blocks of 10M records
Remaining problem with sort-based algorithm
SPIMI: Compression
Distributed indexing
Parallel tasks
Data flow
Schema for index construction in MapReduce
Dynamic indexing
Simplest approach
Issues with main and auxiliary indexes
Further issues with multiple indexes
Dynamic indexing at search engines
Compressing Indexes
Why compression (in general)?
Recall Reuters RCV1
Index parameters vs. what we index (details IIR Table 5.1, p.80)
Lossless vs. lossy compression
Heaps’ Law
Zipf’s law
Zipf consequences
Zipf’s law for Reuters RCV1
Why compress the dictionary?
Dictionary storage - first cut
Fixed-width terms are wasteful
Space for dictionary as a string
Dictionary search with blocking
Front coding
Postings compression
Postings: two conflicting forces
Postings file entry
Variable Byte (VB) codes
Other variable unit codes
Unary code
Gamma codes
Gamma code examples
Gamma code properties
Gamma seldom used in practice
RCV1 compression
Ranked retrieval
Problem with Boolean search: feast or famine
Ranked retrieval models
Feast or famine: not a problem in ranked retrieval
Scoring as the basis of ranked retrieval
Query-document matching scores
Take 1: Jaccard coefficient
Jaccard coefficient: Scoring example
Issues with Jaccard for scoring
Recall (Lecture 1): Binary term-document incidence matrix
Term-document count matrices
Term frequency tf
Log-frequency weighting
Document frequency
Document frequency, continued
idf weight
idf example, suppose N = 1 million
Effect of idf on ranking
Collection vs. Document frequency
tf-idf weighting
Final ranking of documents for a query
Binary → count → weight matrix
Documents as vectors
Queries as vectors
Formalizing vector space proximity
Why distance is a bad idea
Use angle instead of distance
Length normalization
Cosine for length-normalized vectors
Cosine similarity illustrated
3 documents example contd
Computing cosine scores
tf-idf weighting has many variants
Weighting may differ in queries vs documents
tf-idf example: lnc.ltc
Summary – vector space ranking
Published by brightday87 on Aug 07, 2011
Copyright: Attribution Non-commercial