
 

Information Retrieval 
ASSIGNMENT-2  
 

DEVANSHU MODI 
179301064 

Sub: Information Retrieval (CS1759)  

   

 
 

Q1.) Explain the concept of the Probability Ranking Principle in Information Retrieval.

If the Information Retrieval system’s response to each query is a ranking of the documents in the collection in order of decreasing probability of relevance to the query, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, then the overall effectiveness of the system to its users will be the best that is obtainable on the basis of those data.

● Let d represent a document in the collection.
● Let R represent the relevance of a document with respect to a query q.
● Let R = 1 represent relevant and R = 0 not relevant.
● Our goal is to estimate:
   ○ p(r = 1 | q, d) = p(d, q | r = 1) · p(r = 1) / p(d, q)
   ○ p(r = 0 | q, d) = p(d, q | r = 0) · p(r = 0) / p(d, q)
● PRP in action: rank all documents by p(r = 1 | q, d).
   ○ Theorem: Using the PRP is optimal, in that it minimizes the loss (Bayes risk) under 1/0 loss.
   ○ Provable if all probabilities are correct, etc.
● Using odds, we reach a more convenient formulation of ranking (illustrated in the sketch below):
   ○ O(R | q, d) = p(r = 1 | q, d) / p(r = 0 | q, d)
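
Below is a minimal Python sketch of the PRP in action, assuming the per-document estimates of p(r = 1 | q, d) are already available (the probability values used are illustrative placeholders, not the output of a real model): documents are ranked by decreasing estimated probability of relevance, and the odds formulation is computed from the same estimates.

# A sketch of the PRP: rank documents by the estimated probability of
# relevance p(r = 1 | q, d). The estimates below are assumed placeholders.

def rank_by_prp(prob_relevant):
    """prob_relevant maps a document id to an estimated p(r = 1 | q, d)."""
    return sorted(prob_relevant, key=prob_relevant.get, reverse=True)

def odds(p):
    """Odds formulation: O(R | q, d) = p(r = 1 | q, d) / p(r = 0 | q, d)."""
    return p / (1.0 - p)

estimates = {"d1": 0.82, "d2": 0.35, "d3": 0.64}        # assumed estimates
print(rank_by_prp(estimates))                            # ['d1', 'd3', 'd2']
print({d: round(odds(p), 2) for d, p in estimates.items()})

Ranking by the odds gives exactly the same ordering as ranking by p(r = 1 | q, d), since the odds are a monotonic function of the probability.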

Q2.) Explain Language Modeling versus other approaches in information retrieval.


 

The Language Modeling Approach provides a different way of scoring matches between queries and documents, and the hope is that
the probabilistic language modeling foundation improves the weights 
that are used, and hence the performance of the model. The major 
issue is the estimation of the document model, such as choices of how 
to smooth it effectively. The model has achieved very good retrieval 
results. Compared to other probabilistic approaches, such as BIM, the 
main difference initially appears to be that the LM approach does away 
with explicitly modeling relevance (whereas this is the central variable 
evaluated in the BIM approach). But this may not be the correct way to 
think about things. The LM approach assumes that documents and 
expressions of information needs are objects of the same type, and 
assesses their match by importing the tools and methods of language 
modeling from speech and natural language processing. The resulting 
model is mathematically precise, conceptually simple, computationally 
tractable, and intuitively appealing. This seems similar to the situation 
with XML retrieval: there the approaches that assume queries and 
documents are objects of the same type are also among the most 
successful. 

On the other hand, like all IR models, you can also raise objections to 
the model. The assumption of equivalence between document and 
information need representation is unrealistic. Current LM approaches 
use very simple models of language, usually unigram models. Without 
an explicit notion of relevance, relevance feedback is difficult to 
integrate into the model, as are user preferences. It also seems 
necessary to move beyond a unigram model to accommodate notions 
of phrase or passage matching or Boolean retrieval operators. 


 

Subsequent work in the LM approach has looked at addressing some of these concerns, including putting relevance back into the model and
allowing a language mismatch between the query language and the 
document language. 

The model has significant relations to traditional tf-idf models. Term frequency is directly represented in tf-idf models, and much recent
work has recognized the importance of document length normalization. 
The effect of doing a mixture of document generation probability with 
collection generation probability is a little like idf: terms rare in the 
general collection but common in some documents will have a greater 
influence on the ranking of documents. In most concrete realizations, 
the models share treating terms as if they were independent. On the 
other hand, the intuitions are probabilistic rather than geometric, the 
mathematical models are more principled rather than heuristic, and
the details of how statistics like term frequency and document length 
are used differ. 
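
As an illustration of the mixture just described, the following is a minimal query-likelihood sketch in Python, assuming Jelinek-Mercer style smoothing with an equal-weight mixture and a two-document toy collection (the documents, the query, and the mixture weight lambda are all made up for illustration; query terms are assumed to occur somewhere in the collection so the logarithm is defined):

import math
from collections import Counter

def lm_score(query, doc, collection, lam=0.5):
    """log p(q | d), mixing the document model p(t | d) with the collection model p(t | C)."""
    doc_tf, coll_tf = Counter(doc), Counter(collection)
    score = 0.0
    for t in query:
        p_doc = doc_tf[t] / len(doc)              # relative frequency of t in the document
        p_coll = coll_tf[t] / len(collection)     # relative frequency of t in the collection
        score += math.log(lam * p_doc + (1 - lam) * p_coll)
    return score

docs = {"d1": "the price of the book is low".split(),
        "d2": "the book describes retrieval models".split()}
collection = [t for d in docs.values() for t in d]
query = "book price".split()
ranking = sorted(docs, key=lambda d: lm_score(query, docs[d], collection), reverse=True)
print(ranking)   # d1 ranks first: it contains both query terms, d2 only one

The collection component rewards terms that are rare in the collection overall, which is the idf-like effect described above.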


 

Q3.) Explain the following with an example:
a. Text Classification and Naive Bayes.
b. Vector Space Classification and k-nearest neighbor.

a.) The Naive Bayes classifier is a simple probabilistic classifier: it classifies based on the probabilities of events, and it is commonly applied to text classification. Though it is a simple algorithm, it performs well on many text classification problems.

Example:

Our training set consists of four labelled sentences: two statements (class Stmt) and two questions (class Question). We need to find out whether a new sentence, say ‘What is the price of the book’, is a question or not.
Bayes’ Theorem:

P(class | sentence) = P(sentence | class) · P(class) / P(sentence)

We need to find out which class has the bigger probability for the new sentence, i.e., whether P(Stmt | What is the price of the book) or P(Question | What is the price of the book) is larger.

P(Stmt | What is the price of the book) = P(What is the price of the book | Stmt) · P(Stmt) / P(What is the price of the book)

P(Question | What is the price of the book) = P(What is the price of the book | Question) · P(Question) / P(What is the price of the book)

P(Stmt) = Number of sentences in the Stmt class / Total number of sentences = 0.5

P(Question) = Number of sentences in the Question class / Total number of sentences = 0.5

Since the denominator P(What is the price of the book) is the same for both classes, only the numerators need to be compared. Under the Naive Bayes independence assumption, each likelihood factors over the words of the sentence:

P(What is the price of the book | Stmt) = P(What | Stmt) × P(is | Stmt) × P(the | Stmt) × P(price | Stmt) × P(of | Stmt) × P(the | Stmt) × P(book | Stmt)

P(What is the price of the book | Question) = P(What | Question) × P(is | Question) × P(the | Question) × P(price | Question) × P(of | Question) × P(the | Question) × P(book | Question)

Using the frequencies of the words, we can calculate their respective probabilities in the Statement class and in the Question class.
Therefore, 
P(What is the price of the book | Stmt) = 1.2583314328
P(What is the price of the book | Question) = 1.7624289971
P(Stmt | What is the price of the book) ∝ 1.2583314328 × 0.5 = 0.6291657164
P(Question | What is the price of the book) ∝ 1.7624289971 × 0.5 = 0.8812144986

Therefore the new sentence ‘What is the price of the book’ will be 
classified as ‘Question’. 
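
A minimal sketch of this calculation in Python is given below. Since the original four training sentences are not reproduced above, the training set here is a hypothetical one (two statements, two questions), and add-one (Laplace) smoothing is used so that words unseen in a class, such as ‘of’, do not zero out the product:

from collections import Counter

# Hypothetical training set: two questions and two statements.
train = [("what is the price", "Question"),
         ("where is the book", "Question"),
         ("the book is on the table", "Stmt"),
         ("this table is red", "Stmt")]

classes = {"Question", "Stmt"}
vocab = {w for sent, _ in train for w in sent.split()}
word_counts = {c: Counter(w for sent, lbl in train if lbl == c for w in sent.split())
               for c in classes}
priors = {c: sum(1 for _, lbl in train if lbl == c) / len(train) for c in classes}

def score(sentence, c):
    """P(c) multiplied by P(w | c) for every word w, with add-one smoothing."""
    total = sum(word_counts[c].values())
    p = priors[c]
    for w in sentence.split():
        p *= (word_counts[c][w] + 1) / (total + len(vocab))
    return p

sentence = "what is the price of the book"
print(max(classes, key=lambda c: score(sentence, c)))    # prints: Question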

  


 

Q4.) How Machine Learning Methods are used in ad hoc Information Retrieval

Rather than coming up with term and document weighting functions by 
hand, we can view different sources of relevance signal (cosine score, 
title match, etc.) as features in a learning problem. A classifier that has 
been fed examples of relevant and nonrelevant documents for each of 
a set of queries can then figure out the relative weights of these 
signals. If we configure the problem so that there are pairs of a 
document and a query which are assigned a relevance judgment of 
relevant or nonrelevant, then we can think of this problem too as a text 
classification problem. Taking such a classification approach is not necessarily best, and there are alternatives. Nevertheless, given
the material we have covered, the simplest place to start is to approach 
this problem as a classification problem, by ordering the documents 
according to the confidence of a two-class classifier in its relevance 
decision. And this move is not purely pedagogical; exactly this 
approach is sometimes used in practice. 
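
As a sketch of this idea, the following Python code trains a two-class logistic regression classifier on judged (query, document) pairs described by two relevance signals, a cosine score and a title-match flag, and then orders new documents by the classifier’s confidence that they are relevant. The feature values and relevance judgments are assumptions made up for illustration, not real data.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row describes one judged (query, document) pair: [cosine score, title match].
X_train = np.array([[0.90, 1], [0.75, 0], [0.40, 1], [0.20, 0], [0.10, 0]])
y_train = np.array([1, 1, 1, 0, 0])              # 1 = relevant, 0 = nonrelevant

clf = LogisticRegression().fit(X_train, y_train)

# Documents retrieved for a new query: order them by P(relevant | features).
X_new = np.array([[0.55, 1], [0.65, 0], [0.15, 1]])
confidence = clf.predict_proba(X_new)[:, 1]      # classifier's confidence in relevance
order = np.argsort(-confidence)                  # document indices, most confident first
print(order, confidence[order])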
