Figure: the user's information need is expressed as a query representation (understanding of the user need is uncertain), and documents as document representations (an uncertain guess of whether a document has relevant content). How to match them?
In the vector space model (VSM), matching between each document and the query is attempted in a semantically imprecise space of index terms.
Probabilities provide a principled foundation for uncertain reasoning. Can we use
probabilities to quantify our uncertainties?
PROBABILISTIC IR TOPICS
• Probabilistic methods are among the oldest, but also among the currently hottest, topics in IR.
THE DOCUMENT RANKING PROBLEM
Taj R N
Taj Mahal R R
Taj Tea N N
PROBABILITY OF RELEVANCE
P(X|Ci) :
P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) :
P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007
Since 0.028 > 0.007, X is assigned to the class buys_computer = “yes”.
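The products above are a naive Bayes classification: multiply the per-attribute likelihoods P(x_k|C_i), weight by the class prior P(C_i), and pick the larger score. The sketch below reproduces the arithmetic. The slide does not state the priors; the values 9/14 and 5/14 used here are an assumption, chosen because they are consistent with the products shown (0.044 × 9/14 ≈ 0.028, 0.019 × 5/14 ≈ 0.007). The helper name `naive_bayes_scores` is hypothetical.

```python
from math import prod

# Per-attribute likelihoods P(x_k | C_i), copied from the slide
likelihoods = {
    "yes": [0.222, 0.444, 0.667, 0.667],
    "no": [0.6, 0.4, 0.2, 0.4],
}
# ASSUMPTION: class priors P(C_i) are not given on the slide; 9/14 and 5/14
# are consistent with the P(X|C_i) * P(C_i) values it reports.
priors = {"yes": 9 / 14, "no": 5 / 14}


def naive_bayes_scores(likelihoods, priors):
    """Hypothetical helper: return (P(X|C), P(X|C) * P(C)) for each class."""
    scores = {}
    for c, probs in likelihoods.items():
        # Naive assumption: attributes are conditionally independent given C
        p_x_given_c = prod(probs)
        scores[c] = (p_x_given_c, p_x_given_c * priors[c])
    return scores


scores = naive_bayes_scores(likelihoods, priors)
for c, (lik, joint) in scores.items():
    print(f"{c}: P(X|C)={lik:.3f}, P(X|C)P(C)={joint:.3f}")

# Predict the class with the largest unnormalized posterior
predicted = max(scores, key=lambda c: scores[c][1])
print("predicted class:", predicted)  # predicted class: yes
```

The denominator P(X) is skipped because it is the same for both classes and does not change which score is larger.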
= 0.99
= 0.05
= ???
Where:
chance of having the disease
chance of not having the disease
Remember:
chance of a positive test given that the disease is present
chance of a positive test given that the disease isn’t present
Therefore:
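For reference, the quantities listed above combine through Bayes' rule. Writing D for "has the disease" and + for "positive test" (symbols chosen here, not taken from the slide), the probability of disease given a positive test takes the standard form below; the denominator is the marginal P(+), summed over both cases exactly as P(A) is in Question 3.

```latex
P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)}
```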
Question 3:
It rains on 20% of days.
When it rains, rain was forecast 80% of the time.
When it doesn’t rain, rain was erroneously forecast 10% of the time.
A = forecast rain
B = it rains
What is P(A), the probability of a rain forecast? Sum over all possible values of B (marginal probability):
P(A|B) * P(B) + P(A|~B) * P(~B) = 0.8 * 0.2 + 0.1 * 0.8 = 0.24
So before you knew anything, you thought P(rain) was 0.2. Now that you have heard the weather forecast, Bayes' rule adjusts your
expectation upwards: P(rain|forecast) = P(A|B) * P(B) / P(A) = 0.16 / 0.24 ≈ 0.67
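The two steps of Question 3, the marginal P(A) and then the posterior P(B|A), can be checked with a few lines of Python. The helper names `marginal` and `posterior` are hypothetical, introduced only for this illustration; the numbers are the ones given in the question.

```python
def marginal(p_a_given_b, p_a_given_not_b, p_b):
    """Law of total probability: P(A) = P(A|B)P(B) + P(A|~B)P(~B)."""
    return p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)


def posterior(p_a_given_b, p_a_given_not_b, p_b):
    """Bayes' rule: P(B|A) = P(A|B)P(B) / P(A)."""
    return p_a_given_b * p_b / marginal(p_a_given_b, p_a_given_not_b, p_b)


# Question 3: B = it rains, A = rain is forecast
p_rain = 0.2            # P(B)
p_fc_given_rain = 0.8   # P(A|B)
p_fc_given_dry = 0.1    # P(A|~B)

p_forecast = marginal(p_fc_given_rain, p_fc_given_dry, p_rain)
p_rain_given_fc = posterior(p_fc_given_rain, p_fc_given_dry, p_rain)
print(f"P(A)   = {p_forecast:.2f}")       # P(A)   = 0.24
print(f"P(B|A) = {p_rain_given_fc:.2f}")  # P(B|A) = 0.67
```

The same two helpers apply to any binary test: supply the prior and the two conditional rates, and the posterior follows.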
PREDICTING RELEVANCE
Discrimination