
INFORMATION RETRIEVAL

AND SEMANTIC WEB


16B1NCI648
• Probabilistic Information Retrieval
WHY PROBABILITIES IN IR?

[Figure] Information need → user query → query representation: the understanding of the user's need is uncertain.
Documents → document representation: an uncertain guess of whether a document has relevant content.
How to match the two?

In the vector space model (VSM), matching between each document and the query is attempted in a semantically imprecise space of index terms.
Probabilities provide a principled foundation for uncertain reasoning. Can we use
probabilities to quantify our uncertainties?
PROBABILISTIC IR TOPICS

• Classical probabilistic retrieval model


• Probability ranking principle (PRP), etc.
• Binary independence model (BIM) (≈ Naïve Bayes text categorization)
• Bayesian networks for text retrieval

• Probabilistic methods are among the oldest approaches in IR, yet also among the most actively studied today.
THE DOCUMENT RANKING PROBLEM

• We have a collection of documents


• User issues a query
• A list of documents needs to be returned
• Ranking method is the core of an IR system:
• In what order do we present documents to the user?
• We want the “best” document to be first, the second best second, and so on.

• Idea: Rank by the probability of relevance of the document w.r.t. the information need
  • P(R = 1 | document_i, query)
PROBABILISTIC RETRIEVAL

• Information Need: Taj Mahal


• Let a query q be “Taj”
• Let the results be:
• d1: Taj
• d2: Taj Mahal
• d3: Taj Tea
• Two judges were asked to provide relevance judgments:
Document     Judge 1   Judge 2
Taj          R         N
Taj Mahal    R         R
Taj Tea      N         N
PROBABILITY OF RELEVANCE

• A document can have a probability of being relevant and a probability of being non-relevant at the same time.
• Example:
• Documents in our collection :
R = 0 → Non-Relevant, R = 1 → Relevant

Document     P(R=0|d,q)   P(R=1|d,q)
Taj          0.5          0.5
Taj Mahal    ?            ?
Taj Tea      ?            ?
PROBABILITY OF RELEVANCE

• A document can have a probability of being relevant and a probability of being non-relevant at the same time.
• Example:
• Documents in our collection :
R = 0 → Non-Relevant, R = 1 → Relevant

Document     P(R=0|d,q)   P(R=1|d,q)
Taj          0.5          0.5
Taj Mahal    0            1
Taj Tea      1            0
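As a quick illustration, these probabilities can be read off the relevance judgments above: the fraction of judges who marked each document relevant. A minimal sketch in Python (the data layout and function name are illustrative, not from the slides):

# Relevance judgments from the two judges for query "Taj" (R = relevant, N = non-relevant)
judgments = {
    "Taj":       ["R", "N"],
    "Taj Mahal": ["R", "R"],
    "Taj Tea":   ["N", "N"],
}

def p_relevant(doc):
    """Estimate P(R=1|d,q) as the fraction of judges who marked d relevant."""
    marks = judgments[doc]
    return sum(1 for m in marks if m == "R") / len(marks)

for doc in judgments:
    p1 = p_relevant(doc)
    print(doc, "P(R=1|d,q) =", p1, "P(R=0|d,q) =", 1 - p1)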
Probability Ranking Principle (PRP)

Rank documents by the probability of relevance, P(R=1|q,d), where R ∈ {0,1}.
THE PROBABILITY RANKING PRINCIPLE
(PRP)

• Goal: overall effectiveness should be the best obtainable on the basis of the available data.
• Approach: rank the documents in the collection in order of decreasing probability of relevance to the user who submitted the request.
  – Assumption: the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system.
• Documents are ranked based on the probability of their being relevant to the query:
  – P(R|D) – the probability of relevance given a document D.
• Assumes that the probability depends only on the query and document representations.
PROBABILITY RANKING PRINCIPLE (PRP)

Rank documents by the probability of relevance, P(R=1|q,d), where R ∈ {0,1}.

R = 0 → Non-Relevant, R = 1 → Relevant

Document     P(R=0|d,q)   P(R=1|d,q)
Taj          0.5          0.5
Taj Mahal    0            1
Taj Tea      1            0

Search result (ranked by P(R=1|d,q)):
1. Taj Mahal
2. Taj
3. Taj Tea
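A minimal sketch of ranking under the PRP, reusing illustrative probabilities like those above (the dictionary and values are my own example data):

probs = {"Taj": 0.5, "Taj Mahal": 1.0, "Taj Tea": 0.0}   # P(R=1|d,q)

# PRP: present documents in order of decreasing probability of relevance.
ranking = sorted(probs, key=probs.get, reverse=True)
print(ranking)    # ['Taj Mahal', 'Taj', 'Taj Tea']

# Bayes optimal decision rule (next slide): return only documents with
# P(R=1|d,q) > P(R=0|d,q), i.e. P(R=1|d,q) > 0.5.
relevant = [d for d, p in probs.items() if p > 0.5]
print(relevant)   # ['Taj Mahal']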
BAYES OPTIMAL DECISION RULE

Bayes Decision rule


Document is relevant if P(R=1|d,q) > P(R=0|d,q)

Document     P(R=0|d,q)   P(R=1|d,q)
Taj          0.5          0.5
Taj Mahal    0            1
Taj Tea      1            0

Search result: 1. Taj Mahal
PREDICTING RELEVANCE

Document     P(R=0|d,q)   P(R=1|d,q)
Taj          0.5          0.5
Taj Mahal    0            1
Taj Tea      1            0

These probabilities are user-given relevance judgments. Can we instead predict relevance from text-based evidence, i.e. term occurrences in the documents?
BINARY INDEPENDENCE MODEL
(BIM)
• The BIM is based on the following assumptions:
  • Each document is represented as a binary vector of terms (no term weights).
  • Occurrences of terms are mutually independent (term independence).
• The probability of a query term occurring in the relevant set is often estimated from the size of the vocabulary and the number of documents that contain the query term.
• P(D|R) can then be estimated as the product of the individual term probabilities, as sketched below.
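A minimal sketch of the independence assumption in practice, with made-up term probabilities (the numbers and names are illustrative only):

# Illustrative per-term probabilities P(term present | R=1) for the query terms.
p_term_given_relevant = {"taj": 0.9, "mahal": 0.6}

# Binary document representation: which query terms the document contains.
doc_terms = {"taj": 1, "mahal": 0}

# Under term independence, P(D|R=1) is the product over query terms of
# p_t if the term is present and (1 - p_t) if it is absent.
p_doc_given_relevant = 1.0
for term, p_t in p_term_given_relevant.items():
    p_doc_given_relevant *= p_t if doc_terms.get(term, 0) else (1 - p_t)

print(p_doc_given_relevant)   # 0.9 * (1 - 0.6) = 0.36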
BAYESIAN CLASSIFICATION
• A statistical classifier: performs probabilistic prediction, i.e., predicts
class membership probabilities

• Foundation: Based on Bayes’ Theorem.

• Performance: A simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable to decision tree and selected neural network classifiers.

• Standard: Even when Bayesian methods are computationally


intractable, they can provide a standard of optimal decision making
against which other methods can be measured
BAYES’ THEOREM: BASICS
• Total probability theorem: P(B) = Σ_{i=1..M} P(B|A_i) P(A_i)

• Bayes’ theorem: P(H|X) = P(X|H) P(H) / P(X)

• Let X be a data sample (“evidence”): class label is unknown


• Let H be a hypothesis that X belongs to class C
• Classification is to determine P(H|X), (i.e., posteriori probability): the probability that
the hypothesis holds given the observed data sample X
• P(H) (prior probability): the initial probability
• E.g., X will buy computer, regardless of age, income, …
• P(X): probability that sample data is observed
• P(X|H) (likelihood): the probability of observing the sample X, given that the hypothesis
holds
• E.g., given that X will buy a computer, the probability that X is 31..40 with medium income
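A minimal sketch of Bayes’ theorem as code, in the form used by the worked examples later in these slides (function and variable names are my own, not from the slides):

def posterior(likelihood, prior, evidence):
    """P(H|X) = P(X|H) * P(H) / P(X)."""
    return likelihood * prior / evidence

def evidence_two_hypotheses(likelihood_h, prior_h, likelihood_not_h):
    """P(X) = P(X|H) P(H) + P(X|~H) P(~H), total probability over H and ~H."""
    return likelihood_h * prior_h + likelihood_not_h * (1 - prior_h)

# Example: the rain-forecast question later in these slides.
p_x = evidence_two_hypotheses(0.8, 0.2, 0.1)   # P(forecast) = 0.24
print(posterior(0.8, 0.2, p_x))                # P(rain | forecast) ≈ 0.667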
PREDICTION BASED ON BAYES’ THEOREM

• Given training data X, posteriori probability of a hypothesis H, P(H|


X), follows the Bayes’ theorem

P(H|X) = P(X|H) P(H) / P(X)

• Informally, this can be viewed as


posteriori = likelihood x prior/evidence

• Predicts X belongs to Ci iff the probability P(Ci|X) is the highest


among all the P(Ck|X) for all the k classes

• Practical difficulty: It requires initial knowledge of many


probabilities, involving significant computational cost
NAÏVE BAYES CLASSIFIER: TRAINING DATASET

Class:
  C1: buys_computer = ‘yes’
  C2: buys_computer = ‘no’

Data to be classified:
  X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
31…40    high     no        fair            yes
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
31…40    low      yes       excellent       yes
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
31…40    medium   no        excellent       yes
31…40    high     yes       fair            yes
>40      medium   no        excellent       no
NAÏVE BAYES CLASSIFIER: AN EXAMPLE (using the training data above)

• P(Ci):
  P(buys_computer = “yes”) = 9/14 = 0.643
  P(buys_computer = “no”) = 5/14 = 0.357

• Compute P(X|Ci) for each class:
  P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
  P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6
  P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
  P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
  P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
  P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
  P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
  P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

• X = (age <= 30 , income = medium, student = yes, credit_rating = fair)

P(X|Ci) :
P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

P(X|Ci)*P(Ci) :
P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007

Therefore, X belongs to class (“buys_computer = yes”)
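A minimal sketch of the same calculation in code: a hand-rolled naïve Bayes over the training table above, with no smoothing, exactly as in the worked example (the data encoding and function name are my own):

# Training data: (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31…40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31…40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no", "excellent", "yes"),
    ("31…40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

x = ("<=30", "medium", "yes", "fair")   # sample to classify

def score(label):
    """P(X|Ci) * P(Ci) under the naive independence assumption."""
    rows = [r for r in data if r[-1] == label]
    prior = len(rows) / len(data)
    likelihood = 1.0
    for i, value in enumerate(x):
        likelihood *= sum(1 for r in rows if r[i] == value) / len(rows)
    return prior * likelihood

print(score("yes"))                    # ≈ 0.028
print(score("no"))                     # ≈ 0.007
print(max(["yes", "no"], key=score))   # "yes"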


Example 1
10% of patients in a clinic have liver disease. Five percent of the clinic’s
patients are alcoholics. Amongst those patients diagnosed with liver
disease, 7% are alcoholics. You are interested in knowing the probability of
a patient having liver disease, given that he is an alcoholic.
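A worked solution using Bayes’ theorem (my own working from the numbers above):
P(liver | alcoholic) = P(alcoholic | liver) × P(liver) / P(alcoholic)
                     = 0.07 × 0.10 / 0.05
                     = 0.14, i.e. a 14% chance.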
Example 2

A disease occurs in 0.5% of the population


A diagnostic test gives a positive result in:
◦ 99% of people with the disease
◦ 5% of people without the disease (false positive)
A person receives a positive result
What is the probability of them having the disease, given a
positive result?
We know:
P(positive | disease) = 0.99
P(positive | no disease) = 0.05
P(disease | positive) = ???

Where:
P(disease) = 0.005 (chance of having the disease)
P(no disease) = 0.995 (chance of not having the disease)

Remember:
P(positive | disease) = chance of a positive test given that the disease is present
P(positive | no disease) = chance of a positive test given that the disease isn’t present

Therefore:
P(disease | positive) = P(positive | disease) P(disease) / [P(positive | disease) P(disease) + P(positive | no disease) P(no disease)]
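Plugging in the numbers (my own arithmetic):
P(disease | positive) = (0.99 × 0.005) / (0.99 × 0.005 + 0.05 × 0.995)
                      = 0.00495 / 0.0547
                      ≈ 0.09
So even with a positive result, the chance of actually having the disease is only about 9%, because the disease is so rare.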
Question 3:
It rains on 20% of days.
When it rains, rain was forecast 80% of the time.
When it doesn’t rain, rain was erroneously forecast 10% of the time.

The weatherman forecasts rain. What’s the probability of it actually raining?



A = forecast rain
B = it rains

What information is given in the story?

P(B) = 0.2 (prior)


P(A|B) = 0.8 (likelihood)
P(A|~B) = 0.1

P(B|A) = P(A|B) * P(B) / P(A)

What is P(A), the probability of a rain forecast? Calculate it over all possible values of B (marginal probability):
P(A|B) * P(B) + P(A|~B) * P(~B) = 0.8 * 0.2 + 0.1 * 0.8 = 0.24

P(B|A) = 0.8 * 0.2 / 0.24


= 0.67

So before you knew anything, you thought P(rain) was 0.2. Now that you have heard the weather forecast, you adjust your expectation upwards: P(rain | forecast) ≈ 0.67.
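A minimal numeric check of this calculation (self-contained; variable names are my own):

p_rain = 0.2
p_forecast_given_rain = 0.8
p_forecast_given_no_rain = 0.1

# P(forecast) by total probability, then Bayes' theorem.
p_forecast = p_forecast_given_rain * p_rain + p_forecast_given_no_rain * (1 - p_rain)
print(p_forecast_given_rain * p_rain / p_forecast)   # ≈ 0.667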
PREDICTING RELEVANCE
Discrimination

• The odds of relevance O(R|d,q) are easier to calculate than P(R=1|d,q) itself; the prior-odds factor is a constant term for a given query.
• In BIM, we assume that term occurrences are mutually independent, as sketched below.
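A sketch of the odds derivation, following the standard BIM treatment in Manning, Raghavan and Schütze (ch. 11):

$$ O(R \mid d, q) = \frac{P(R=1 \mid d, q)}{P(R=0 \mid d, q)} = \underbrace{\frac{P(R=1 \mid q)}{P(R=0 \mid q)}}_{\text{constant for a query}} \cdot \frac{P(d \mid R=1, q)}{P(d \mid R=0, q)}, \qquad P(d \mid R, q) \approx \prod_{t} P(x_t \mid R, q) $$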
RETRIEVAL STATUS VALUE

• The prior-odds factor is not document specific: it is constant for a given query, so it can be dropped for ranking.

• “We can manipulate this expression by including the query terms found in the document into the right product, but simultaneously dividing through by them in the left product, so the value is unchanged” – CPS.

• The resulting quantity is the Retrieval Status Value (RSV), which is used for ranking documents.

• Read Section 11.3.1 of CPS (Manning, Raghavan and Schütze).
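For reference, the RSV as given in Section 11.3.1 of Manning, Raghavan and Schütze, with p_t = P(x_t = 1 | R = 1, q) and u_t = P(x_t = 1 | R = 0, q):

$$ \mathrm{RSV}_d = \sum_{t \,:\, x_t = q_t = 1} \log \frac{p_t (1 - u_t)}{u_t (1 - p_t)} $$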


DISCRIMINATION
• Consider 5 documents and a query Q having terms t1 and t2. Rank the documents based on discrimination.
REFERENCES

• Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, “An Introduction to Information Retrieval”, Cambridge University Press, 2013.
