
Birla Institute of Technology & Science, Pilani

Work-Integrated Learning Programmes Division


Mid-Semester Test
(EC-1 Regular)

Course No.    : DSECLZG537               No. of Pages     = 2
Course Title  : INFORMATION RETRIEVAL    No. of Questions = 2
Weightage     : 30%

Note:
1. Please follow all the Instructions to Candidates given on the cover page of the answer book.
2. All parts of a question should be answered consecutively. Each answer should start from a fresh page.
3. Assumptions made, if any, should be stated clearly at the beginning of your answer.

Q1 – 2+5+3+5=15 Marks
A) Give an example of the uncertainty and vagueness issues in information retrieval. [2 Marks]

B) Explain the merge algorithm for the query “Information Retrieval”. What is the best order
of query processing for the query “BITS AND Information AND Retrieval”? Which of the
15 documents will be returned as output? [5 Marks]

Term          Postings list
BITS          2, 4, 6, 7, 10, 11, 15
Information   1, 3, 5, 11
Retrieval     3, 7, 11
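
For reference, the linear merge of sorted postings lists can be sketched as below. This is a minimal sketch, not a model answer: the function names intersect and intersect_many are hypothetical, and it assumes the usual heuristic of processing AND terms in increasing order of postings-list length.

```python
def intersect(p1, p2):
    """Linear merge of two sorted postings lists; keeps docIDs found in both."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the pointer that sits on the smaller docID
        else:
            j += 1
    return answer


def intersect_many(postings):
    """AND several terms, starting with the shortest postings lists."""
    ordered = sorted(postings.values(), key=len)
    result = ordered[0]
    for plist in ordered[1:]:
        result = intersect(result, plist)
    return result


# Postings lists as given in the question.
postings = {
    "BITS": [2, 4, 6, 7, 10, 11, 15],
    "Information": [1, 3, 5, 11],
    "Retrieval": [3, 7, 11],
}

two_term = intersect(postings["Information"], postings["Retrieval"])
three_term = intersect_many(postings)
```

Starting with the shortest lists keeps every intermediate result no longer than the smallest list, which bounds the work of the later merges.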

C) Generate the codes for the similar-sounding words “Plain” and “Plane” using the Soundex
algorithm. [3 Marks]
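
The Soundex steps can be sketched as follows. This assumes one common four-character variant (vowels plus H, W, Y map to 0, runs of identical digits are collapsed, zeros are dropped, and the result is padded with zeros); other variants treat H and W slightly differently. The names CODES and soundex are illustrative only.

```python
# Letter-to-digit table for the assumed Soundex variant.
CODES = {}
for letters, digit in [("AEIOUHWY", "0"), ("BFPV", "1"), ("CGJKQSXZ", "2"),
                       ("DT", "3"), ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return a 4-character Soundex code: first letter plus three digits."""
    word = "".join(ch for ch in word.upper() if ch in CODES)
    if not word:
        return ""
    digits = [CODES[ch] for ch in word]
    # Collapse each run of identical consecutive digits to a single digit.
    collapsed = [digits[0]]
    for d in digits[1:]:
        if d != collapsed[-1]:
            collapsed.append(d)
    # The first position is represented by the letter itself; drop zeros after it.
    tail = [d for d in collapsed[1:] if d != "0"]
    return (word[0] + "".join(tail) + "000")[:4]


code_plain = soundex("Plain")
code_plane = soundex("Plane")
```

Under this variant both words receive the same code, which is the behaviour phonetic matching relies on.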

D) Build an inverted index using Blocked Sort-Based Indexing (BSBI) for 50 million records.
Explain the algorithm in detail with respect to indexing 50 million records. [5 Marks]
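
The overall shape of blocked sort-based indexing can be sketched as below. This is a simplified, in-memory illustration under stated assumptions: a real implementation would map terms to termIDs and write each sorted block to disk before merging, and the function name bsbi_index and the block_size parameter are hypothetical.

```python
import heapq
from itertools import groupby


def bsbi_index(pairs, block_size):
    """Build an inverted index from (term, docID) pairs, block by block."""
    runs = []   # sorted runs; stand-ins for the block files BSBI writes to disk
    block = []
    for pair in pairs:
        block.append(pair)
        if len(block) == block_size:
            runs.append(sorted(set(block)))  # sort (and dedupe) one full block
            block = []
    if block:
        runs.append(sorted(set(block)))      # final, partially filled block

    # Multi-way merge of the sorted runs into one sorted stream of pairs.
    index = {}
    for term, group in groupby(heapq.merge(*runs), key=lambda p: p[0]):
        docs = []
        for _, doc_id in group:
            if not docs or docs[-1] != doc_id:  # skip duplicates across blocks
                docs.append(doc_id)
        index[term] = docs
    return index


# Tiny hypothetical input, only to exercise the sketch.
sample_pairs = [("bits", 1), ("pilani", 1), ("bits", 2),
                ("mtech", 2), ("bits", 1), ("pilani", 3)]
sample_index = bsbi_index(sample_pairs, block_size=2)
```

For a collection of 50 million records the same structure applies, with the block size chosen so that one block fits in main memory.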

Q2 – 5+5+5=15 Marks
A) Assume a corpus of 1000 documents. The following tables give the term frequency (TF) and
document frequency (DF) values for the 3 terms in the corpus. Calculate the logarithmic
TF-IDF values. [5 Marks]

Term     Doc1   Doc2   Doc3
bits     15     5      20
pilani   2      20     0
mtech    0      20     15

Term     df_t
bits     2000
pilani   1500
mtech    500
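
The logarithmic weighting can be sketched as below, assuming the common form w = (1 + log10 tf) x log10(N / df_t), with w = 0 when tf = 0. The helper names log_tf and idf are illustrative. With N = 1000 as stated, the listed df values for bits and pilani exceed N, so their idf comes out negative; the sketch simply applies the formula to the figures as given.

```python
import math

N = 1000  # corpus size stated in the question

tf = {
    "bits":   {"Doc1": 15, "Doc2": 5,  "Doc3": 20},
    "pilani": {"Doc1": 2,  "Doc2": 20, "Doc3": 0},
    "mtech":  {"Doc1": 0,  "Doc2": 20, "Doc3": 15},
}
df = {"bits": 2000, "pilani": 1500, "mtech": 500}


def log_tf(count):
    """Log-scaled term frequency: 1 + log10(tf), or 0 when the term is absent."""
    return 1 + math.log10(count) if count > 0 else 0.0


def idf(df_t, n_docs):
    """Inverse document frequency: log10(N / df_t)."""
    return math.log10(n_docs / df_t)


weights = {
    term: {doc: log_tf(count) * idf(df[term], N) for doc, count in docs.items()}
    for term, docs in tf.items()
}

for term, docs in weights.items():
    print(term, {doc: round(w, 4) for doc, w in docs.items()})
```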

B) Classify the test document d6 into c1 or c2 using a Naïve Bayes classifier. The documents in
the training set and their class labels are given below. [5 Marks]

Set        Docid   Words in document                 c = c1   c = c2
Training   d1      positive                          Yes      No
Training   d2      Very positive                     Yes      No
Training   d3      Positive very positive            Yes      No
Training   d4      very negative                     No       Yes
Training   d5      negative                          No       Yes
Test       d6      Negative positive very positive   ?        ?
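
One way to work this classification is sketched below. It assumes a multinomial Naïve Bayes model with add-one (Laplace) smoothing and case-insensitive tokens; the question does not fix these choices, and the function names train_nb and score are hypothetical.

```python
import math
from collections import Counter

# Training documents and labels as given in the question.
train = [
    ("positive", "c1"),
    ("Very positive", "c1"),
    ("Positive very positive", "c1"),
    ("very negative", "c2"),
    ("negative", "c2"),
]


def tokenize(text):
    """Lowercase and split on whitespace (an assumption of this sketch)."""
    return text.lower().split()


def train_nb(docs):
    """Estimate class priors and per-class term counts."""
    class_docs = Counter(label for _, label in docs)
    term_counts = {label: Counter() for label in class_docs}
    for text, label in docs:
        term_counts[label].update(tokenize(text))
    vocab = {t for counts in term_counts.values() for t in counts}
    priors = {c: n / len(docs) for c, n in class_docs.items()}
    return priors, term_counts, vocab


def score(text, priors, term_counts, vocab):
    """Log posterior score per class, using add-one smoothing."""
    scores = {}
    for c in priors:
        total = sum(term_counts[c].values())
        log_p = math.log(priors[c])
        for t in tokenize(text):
            if t in vocab:  # terms unseen in training are ignored
                log_p += math.log((term_counts[c][t] + 1) / (total + len(vocab)))
        scores[c] = log_p
    return scores


priors, term_counts, vocab = train_nb(train)
scores = score("Negative positive very positive", priors, term_counts, vocab)
predicted = max(scores, key=scores.get)
```

Working in log space avoids underflow on longer documents; the ranking of the classes is the same as with the raw products.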

C) A search engine returned ranked results whose relevance, judged on a 0-5 scale, is
2, 2, 3, 0, 5 in rank order. Calculate the NDCG metric for this ranking. [5 Marks]
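
The NDCG computation can be sketched as below. Several DCG definitions are in use; this sketch assumes DCG = sum of rel_i / log2(i + 1) over ranks i = 1..n, and takes the ideal ranking to be the same judgments sorted in decreasing order. A different discount or gain function would give a different number.

```python
import math


def dcg(rels):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels, start=1))


def ndcg(rels):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0


ranking = [2, 2, 3, 0, 5]  # relevance judgments in rank order, from the question
result = ndcg(ranking)
print(round(result, 4))
```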

