Note:
1. Please follow all the Instructions to Candidates given on the cover page of the answer book.
2. All parts of a question should be answered consecutively. Each answer should start from a fresh page.
3. Assumptions made if any, should be stated clearly at the beginning of your answer.
Q1 – 2+5+3+5=15 Marks
A) Give an example of uncertainty and vagueness issues in information retrieval. [2 Marks]
B) Explain the merge algorithm for the query “Information Retrieval”. What is the best order
for query processing for the query “BITS AND Information AND Retrieval”? Which
documents will be returned as output from the 15 documents? [5 Marks]
BITS 2 4 6 7 10 11 15
Information 1 3 5 11
Retrieval 3 7 11
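The merge in part B can be sketched as a linear walk over two sorted postings lists. This is a minimal illustration, not a model answer: the names `intersect` and `and_query` are hypothetical, and the sketch assumes the usual heuristic of processing the shortest postings list first.

```python
def intersect(p1, p2):
    """Merge two sorted postings lists, keeping doc IDs present in both."""
    i, j, out = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the pointer that sits on the smaller doc ID
        else:
            j += 1
    return out


def and_query(terms, index):
    """AND the terms together, starting with the shortest postings list."""
    ordered = sorted(terms, key=lambda t: len(index[t]))
    result = index[ordered[0]]
    for term in ordered[1:]:
        result = intersect(result, index[term])
    return ordered, result


# Postings lists copied from the question.
postings = {
    "BITS": [2, 4, 6, 7, 10, 11, 15],
    "Information": [1, 3, 5, 11],
    "Retrieval": [3, 7, 11],
}

order, docs = and_query(["BITS", "Information", "Retrieval"], postings)
print(order, docs)
```

Sorting by list length keeps every intermediate result no larger than the smallest list, which is why that order is generally preferred for conjunctive queries.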
C) Generate the codes for the similar-sounding words “Plain” and “Plane” using the Soundex algorithm.
[3 Marks]
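One way to check a hand-computed answer is a small sketch of the common four-character Soundex variant (keep the first letter, map the remaining letters to digits, collapse repeated digits, drop zeros, pad to four characters). The function name `soundex` and the table `CODES` are illustrative, and other Soundex variants treat H and W slightly differently.

```python
# Letter-to-digit table; vowels and H, W, Y map to "0" and are dropped later.
CODES = {}
for letters, digit in [
    ("AEIOUHWY", "0"),
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
]:
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return a four-character Soundex code (textbook variant, assumed here)."""
    word = word.upper()
    if not word:
        return ""
    digits = [CODES[ch] for ch in word[1:] if ch in CODES]
    collapsed = []
    for d in digits:
        # Keep only one digit out of each run of identical digits.
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)
    tail = "".join(d for d in collapsed if d != "0")
    return (word[0] + tail + "000")[:4]


print(soundex("Plain"), soundex("Plane"))
```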
D) Build an inverted index using blocked sort-based indexing (BSBI) for 50 million records. Explain
the algorithm in detail with respect to indexing 50 million records. [5 Marks]
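The block-sort-merge structure of BSBI can be sketched at toy scale as follows. This is a simplified illustration under stated assumptions: `bsbi_index`, `block_size`, and `workdir` are hypothetical names, term strings are used in place of term IDs, and the merged index is kept in a dictionary, whereas an index over 50 million records would stream the merged postings back to disk.

```python
import heapq
import itertools
import os
import tempfile


def bsbi_index(pairs, block_size, workdir):
    """pairs: iterable of (term, doc_id); block_size: pairs that fit in memory."""
    run_paths = []
    it = iter(pairs)
    while True:
        block = list(itertools.islice(it, block_size))
        if not block:
            break
        block.sort()  # in-memory sort of one block by (term, doc_id)
        path = os.path.join(workdir, f"run{len(run_paths)}.txt")
        with open(path, "w") as f:  # write the sorted run to disk
            for term, doc_id in block:
                f.write(f"{term}\t{doc_id}\n")
        run_paths.append(path)

    def read_run(path):
        with open(path) as f:
            for line in f:
                term, doc_id = line.rstrip("\n").split("\t")
                yield term, int(doc_id)

    # Multi-way merge of the sorted runs, reading each run sequentially.
    index = {}
    for term, doc_id in heapq.merge(*(read_run(p) for p in run_paths)):
        postings = index.setdefault(term, [])
        if not postings or postings[-1] != doc_id:
            postings.append(doc_id)
    return index


with tempfile.TemporaryDirectory() as workdir:
    pairs = [("retrieval", 3), ("bits", 2), ("information", 1),
             ("bits", 4), ("retrieval", 7), ("information", 3)]
    index = bsbi_index(pairs, block_size=2, workdir=workdir)
print(index)
```

With a block size chosen to fit in memory, the same three phases (parse a block, sort and write it, merge all runs) apply unchanged to a much larger collection.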
Q2 – 5+5+5=15 Marks
A) Assume a corpus of 1000 documents. The following table gives the TF and DF values for
the 3 terms in the corpus of documents. Calculate the logarithmic TF-IDF values. [5 Marks]
Term df_t
bits 2000
pilani 1500
mtech 500
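The logarithmic weighting is commonly defined as (1 + log10 tf) x log10(N / df). The sketch below assumes that definition; the function name `log_tf_idf` is illustrative, and because the TF column is not present in the table above, the numbers in the usage line are hypothetical rather than taken from the question.

```python
import math


def log_tf_idf(tf, df, n_docs):
    """Logarithmic TF-IDF weight: (1 + log10(tf)) * log10(N / df)."""
    if tf == 0:
        return 0.0  # a term that does not occur gets zero weight
    return (1 + math.log10(tf)) * math.log10(n_docs / df)


# Hypothetical values for illustration only: tf = 10, df = 100, N = 1000.
print(log_tf_idf(10, 100, 1000))
```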
B) Classify the test document d6 into c1 or c2 using a Naïve Bayes classifier. The documents in
the training set and their class labels are given below. [5 Marks]
d5 negative No Yes
C) A search engine's ranked results have the following scores on a 0-5 relevance scale: 2, 2, 3, 0, 5.
Calculate the NDCG metric for this ranking. [5 Marks]
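NDCG has more than one formulation in use. The sketch below assumes the variant DCG = sum of rel_i / log2(i + 1) over ranks i = 1..n, normalised by the DCG of the same scores sorted in descending order; the names `dcg` and `ndcg` are illustrative. A course that uses a different gain or discount would give a different number.

```python
import math


def dcg(scores):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(scores, start=1))


def ndcg(scores):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(scores, reverse=True))
    return dcg(scores) / ideal if ideal > 0 else 0.0


ranking = [2, 2, 3, 0, 5]  # relevance scores from the question, in rank order
print(round(ndcg(ranking), 4))
```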