
INFORMATION RETRIEVAL

AND SEMANTIC WEB


16B1NCI648
• Probabilistic Information Retrieval
WHY PROBABILITIES IN IR?

[Figure] Information need → user query → query representation: the understanding of the user's need is uncertain.
Documents → document representation: an uncertain guess of whether a document has relevant content.
How to match the two?

In the vector space model (VSM), matching between each document and the query is attempted in a semantically imprecise space of index terms.
Probabilities provide a principled foundation for uncertain reasoning. Can we use
probabilities to quantify our uncertainties?
PROBABILISTIC IR TOPICS

• Classical probabilistic retrieval model


• Probability ranking principle (PRP), etc.
• Binary independence model (BIM) (≈ Naïve Bayes text categorization)
• Bayesian networks for text retrieval

• Probabilistic methods are among the oldest approaches in IR, yet also among the most actively studied today.
THE DOCUMENT RANKING PROBLEM

• We have a collection of documents


• User issues a query
• A list of documents needs to be returned
• Ranking method is the core of an IR system:
• In what order do we present documents to the user?
• We want the “best” document to be first, the second best second, and so on.

• Idea: Rank by the probability of relevance of the document w.r.t. the information need
  • P(R = 1 | document_i, query)
PROBABILISTIC RETRIEVAL

• Information Need: Taj Mahal


• Let a query q be “Taj”
• Let the results be:
• d1: Taj
• d2: Taj Mahal
• d3: Taj Tea
• Two judges were asked to provide relevance judgments:
Document     Judge 1   Judge 2
Taj          R         N
Taj Mahal    R         R
Taj Tea      N         N
PROBABILITY OF RELEVANCE

• A document can have a probability of being relevant and a probability of being non-relevant at the same time.
• Example:
• Documents in our collection :
R = 0 → Non-Relevant, R = 1 → Relevant

Document     P(R=0|d,q)   P(R=1|d,q)
Taj          0.5          0.5
Taj Mahal    ?            ?
Taj Tea      ?            ?
PROBABILITY OF RELEVANCE

• A document can have a probability of being relevant and a probability of being non-relevant at the same time.
• Example:
• Documents in our collection :
R = 0 → Non-Relevant, R = 1 → Relevant

Document     P(R=0|d,q)   P(R=1|d,q)
Taj          0.5          0.5
Taj Mahal    0            1
Taj Tea      1            0
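As a quick illustration, these probabilities can be read off the relevance judgments above: the fraction of judges who marked each document relevant. A minimal sketch in Python (the data layout and function name are illustrative, not from the slides):

# Relevance judgments from the two judges for query "Taj" (R = relevant, N = non-relevant)
judgments = {
    "Taj":       ["R", "N"],
    "Taj Mahal": ["R", "R"],
    "Taj Tea":   ["N", "N"],
}

def p_relevant(doc):
    """Estimate P(R=1|d,q) as the fraction of judges who marked d relevant."""
    marks = judgments[doc]
    return sum(1 for m in marks if m == "R") / len(marks)

for doc in judgments:
    p1 = p_relevant(doc)
    print(doc, "P(R=1|d,q) =", p1, "P(R=0|d,q) =", 1 - p1)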
Probability Ranking Principle (PRP)

Rank documents by the probability of relevance, P(R=1|q,d), where R ∈ {0,1}.
THE PROBABILITY RANKING PRINCIPLE
(PRP)

• Goal: overall effectiveness should be the best obtainable on the basis of the available data.
• Approach: rank the documents in the collection in order of decreasing probability of relevance to the user who submitted the request.
  – Assumption: the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system.
• Documents are ranked based on the probability of their being relevant to the query:
  – P(R|D) – the probability of relevance given a document D.
• Assumes that the probability depends only on the query and document representations.
PROBABILITY RANKING PRINCIPLE (PRP)

Rank documents by the probability of relevance, P(R=1|q,d), where R ∈ {0,1}.

R = 0 → Non-Relevant, R = 1 → Relevant

Document     P(R=0|d,q)   P(R=1|d,q)
Taj          0.5          0.5
Taj Mahal    0            1
Taj Tea      1            0

Search result (ranked by P(R=1|d,q)):
1. Taj Mahal
2. Taj
3. Taj Tea
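A minimal sketch of ranking under the PRP, reusing illustrative probabilities like those above (the dictionary and values are my own example data):

probs = {"Taj": 0.5, "Taj Mahal": 1.0, "Taj Tea": 0.0}   # P(R=1|d,q)

# PRP: present documents in order of decreasing probability of relevance.
ranking = sorted(probs, key=probs.get, reverse=True)
print(ranking)    # ['Taj Mahal', 'Taj', 'Taj Tea']

# Bayes optimal decision rule (next slide): return only documents with
# P(R=1|d,q) > P(R=0|d,q), i.e. P(R=1|d,q) > 0.5.
relevant = [d for d, p in probs.items() if p > 0.5]
print(relevant)   # ['Taj Mahal']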
BAYES OPTIMAL DECISION RULE

Bayes Decision rule


Document is relevant if P(R=1|d,q) > P(R=0|d,q)

Document     P(R=0|d,q)   P(R=1|d,q)
Taj          0.5          0.5
Taj Mahal    0            1
Taj Tea      1            0

Search result: 1. Taj Mahal
PREDICTING RELEVANCE

Document     P(R=0|d,q)   P(R=1|d,q)
Taj          0.5          0.5
Taj Mahal    0            1
Taj Tea      1            0

These probabilities are user-given relevance judgments. Can we instead predict relevance from text-based evidence, i.e. term occurrences in the documents?
BINARY INDEPENDENCE MODEL
(BIM)
• The BIM is based on the following assumptions:
  • Each document is represented as a binary vector of terms (no term weights).
  • Occurrences of terms are mutually independent (term independence).
• The probability of a query term occurring in the relevant set is often estimated from the size of the vocabulary and the number of documents that contain the query term.
• P(D|R) can then be estimated as the product of the individual term probabilities, as sketched below.
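A minimal sketch of the independence assumption in practice, with made-up term probabilities (the numbers and names are illustrative only):

# Illustrative per-term probabilities P(term present | R=1) for the query terms.
p_term_given_relevant = {"taj": 0.9, "mahal": 0.6}

# Binary document representation: which query terms the document contains.
doc_terms = {"taj": 1, "mahal": 0}

# Under term independence, P(D|R=1) is the product over query terms of
# p_t if the term is present and (1 - p_t) if it is absent.
p_doc_given_relevant = 1.0
for term, p_t in p_term_given_relevant.items():
    p_doc_given_relevant *= p_t if doc_terms.get(term, 0) else (1 - p_t)

print(p_doc_given_relevant)   # 0.9 * (1 - 0.6) = 0.36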
BAYESIAN CLASSIFICATION
• A statistical classifier: performs probabilistic prediction, i.e., predicts
class membership probabilities

• Foundation: Based on Bayes’ Theorem.

• Performance: A simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable to decision tree and selected neural network classifiers.

• Standard: Even when Bayesian methods are computationally


intractable, they can provide a standard of optimal decision making
against which other methods can be measured
BAYES’ THEOREM: BASICS
• Total probability theorem: P(B) = Σ_{i=1..M} P(B|A_i) P(A_i)

• Bayes’ theorem: P(H|X) = P(X|H) P(H) / P(X)

• Let X be a data sample (“evidence”): class label is unknown


• Let H be a hypothesis that X belongs to class C
• Classification is to determine P(H|X), (i.e., posteriori probability): the probability that
the hypothesis holds given the observed data sample X
• P(H) (prior probability): the initial probability
• E.g., X will buy computer, regardless of age, income, …
• P(X): probability that sample data is observed
• P(X|H) (likelihood): the probability of observing the sample X, given that the hypothesis
holds
• E.g., given that X will buy a computer, the probability that X is 31..40 with medium income
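A minimal sketch of Bayes’ theorem as code, in the form used by the worked examples later in these slides (function and variable names are my own, not from the slides):

def posterior(likelihood, prior, evidence):
    """P(H|X) = P(X|H) * P(H) / P(X)."""
    return likelihood * prior / evidence

def evidence_two_hypotheses(likelihood_h, prior_h, likelihood_not_h):
    """P(X) = P(X|H) P(H) + P(X|~H) P(~H), total probability over H and ~H."""
    return likelihood_h * prior_h + likelihood_not_h * (1 - prior_h)

# Example: the rain-forecast question later in these slides.
p_x = evidence_two_hypotheses(0.8, 0.2, 0.1)   # P(forecast) = 0.24
print(posterior(0.8, 0.2, p_x))                # P(rain | forecast) ≈ 0.667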
PREDICTION BASED ON BAYES’ THEOREM

• Given training data X, posteriori probability of a hypothesis H, P(H|


X), follows the Bayes’ theorem

P(H|X) = P(X|H) P(H) / P(X)

• Informally, this can be viewed as


posteriori = likelihood x prior/evidence

• Predicts X belongs to Ci iff the probability P(Ci|X) is the highest


among all the P(Ck|X) for all the k classes

• Practical difficulty: It requires initial knowledge of many


probabilities, involving significant computational cost
NAÏVE BAYES CLASSIFIER: TRAINING DATASET

Class:
  C1: buys_computer = ‘yes’
  C2: buys_computer = ‘no’

Data to be classified:
  X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
31…40    high     no        fair            yes
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
31…40    low      yes       excellent       yes
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
31…40    medium   no        excellent       yes
31…40    high     yes       fair            yes
>40      medium   no        excellent       no
NAÏVE BAYES CLASSIFIER: AN EXAMPLE (using the training data above)

• P(Ci):
  P(buys_computer = “yes”) = 9/14 = 0.643
  P(buys_computer = “no”) = 5/14 = 0.357

• Compute P(X|Ci) for each class:
  P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
  P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6
  P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
  P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
  P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
  P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
  P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
  P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

• X = (age <= 30 , income = medium, student = yes, credit_rating = fair)

P(X|Ci) :
P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

P(X|Ci)*P(Ci) :
P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007

Therefore, X belongs to class (“buys_computer = yes”)
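A minimal sketch of the same calculation in code: a hand-rolled naïve Bayes over the training table above, with no smoothing, exactly as in the worked example (the data encoding and function name are my own):

# Training data: (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31…40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31…40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no", "excellent", "yes"),
    ("31…40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

x = ("<=30", "medium", "yes", "fair")   # sample to classify

def score(label):
    """P(X|Ci) * P(Ci) under the naive independence assumption."""
    rows = [r for r in data if r[-1] == label]
    prior = len(rows) / len(data)
    likelihood = 1.0
    for i, value in enumerate(x):
        likelihood *= sum(1 for r in rows if r[i] == value) / len(rows)
    return prior * likelihood

print(score("yes"))                    # ≈ 0.028
print(score("no"))                     # ≈ 0.007
print(max(["yes", "no"], key=score))   # "yes"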


Example 1
10% of patients in a clinic have liver disease. Five percent of the clinic’s
patients are alcoholics. Amongst those patients diagnosed with liver
disease, 7% are alcoholics. You are interested in knowing the probability of
a patient having liver disease, given that he is an alcoholic.
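A worked solution using Bayes’ theorem (my own working from the numbers above):
P(liver | alcoholic) = P(alcoholic | liver) × P(liver) / P(alcoholic)
                     = 0.07 × 0.10 / 0.05
                     = 0.14, i.e. a 14% chance.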
Example 2

A disease occurs in 0.5% of the population


A diagnostic test gives a positive result in:
◦ 99% of people with the disease
◦ 5% of people without the disease (false positive)
A person receives a positive result
What is the probability of them having the disease, given a
positive result?
We know:
P(positive | disease) = 0.99
P(positive | no disease) = 0.05
P(disease | positive) = ???

Where:
P(disease) = 0.005 (chance of having the disease)
P(no disease) = 0.995 (chance of not having the disease)

Remember:
P(positive | disease) = chance of a positive test given that the disease is present
P(positive | no disease) = chance of a positive test given that the disease isn’t present

Therefore:
P(disease | positive) = P(positive | disease) P(disease) / [P(positive | disease) P(disease) + P(positive | no disease) P(no disease)]
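Plugging in the numbers (my own arithmetic):
P(disease | positive) = (0.99 × 0.005) / (0.99 × 0.005 + 0.05 × 0.995)
                      = 0.00495 / 0.0547
                      ≈ 0.09
So even with a positive result, the chance of actually having the disease is only about 9%, because the disease is so rare.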
Question 3:
It rains on 20% of days.
When it rains, rain was forecast 80% of the time.
When it doesn’t rain, rain was erroneously forecast 10% of the time.

The weatherman forecasts rain. What’s the probability of it actually raining?



A = forecast rain
B = it rains

What information is given in the story?

P(B) = 0.2 (prior)


P(A|B) = 0.8 (likelihood)
P(A|~B) = 0.1

P(B|A) = P(A|B) * P(B) / P(A)

What is P(A), the probability of a rain forecast? Calculate it over all possible values of B (marginal probability):
P(A|B) * P(B) + P(A|~B) * P(~B) = 0.8 * 0.2 + 0.1 * 0.8 = 0.24

P(B|A) = 0.8 * 0.2 / 0.24


= 0.67

So before you knew anything, you thought P(rain) was 0.2. Now that you have heard the weather forecast, you adjust your expectation upwards: P(rain | forecast) ≈ 0.67.
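A minimal numeric check of this calculation (self-contained; variable names are my own):

p_rain = 0.2
p_forecast_given_rain = 0.8
p_forecast_given_no_rain = 0.1

# P(forecast) by total probability, then Bayes' theorem.
p_forecast = p_forecast_given_rain * p_rain + p_forecast_given_no_rain * (1 - p_rain)
print(p_forecast_given_rain * p_rain / p_forecast)   # ≈ 0.667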
PREDICTING RELEVANCE
Discrimination

• The odds of relevance O(R|d,q) are easier to calculate than P(R=1|d,q) itself; the prior-odds factor is a constant term for a given query.
• In BIM, we assume that term occurrences are mutually independent, as sketched below.
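A sketch of the odds derivation, following the standard BIM treatment in Manning, Raghavan and Schütze (ch. 11):

$$ O(R \mid d, q) = \frac{P(R=1 \mid d, q)}{P(R=0 \mid d, q)} = \underbrace{\frac{P(R=1 \mid q)}{P(R=0 \mid q)}}_{\text{constant for a query}} \cdot \frac{P(d \mid R=1, q)}{P(d \mid R=0, q)}, \qquad P(d \mid R, q) \approx \prod_{t} P(x_t \mid R, q) $$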
RETRIEVAL STATUS VALUE

• The prior-odds factor is not document specific: it is constant for a given query, so it can be dropped for ranking.

• “We can manipulate this expression by including the query terms found in the document into the right product, but simultaneously dividing through by them in the left product, so the value is unchanged” – CPS.

• The resulting quantity is the Retrieval Status Value (RSV), which is used for ranking documents.

• Read Section 11.3.1 of CPS (Manning, Raghavan and Schütze).
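For reference, the RSV as given in Section 11.3.1 of Manning, Raghavan and Schütze, with p_t = P(x_t = 1 | R = 1, q) and u_t = P(x_t = 1 | R = 0, q):

$$ \mathrm{RSV}_d = \sum_{t \,:\, x_t = q_t = 1} \log \frac{p_t (1 - u_t)}{u_t (1 - p_t)} $$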


DISCRIMINATION
• Consider 5 documents and a query Q having terms t1 and t2. Rank the documents based on discrimination.
REFERENCES

• Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, “An Introduction to Information Retrieval”, Cambridge University Press, 2013.
