
Probabilistic thinking
Nur Aini Masruroh
Events

§ An event is a distinction about some state of the world
§ Examples:
§ Whether the next person entering the room is a heavy smoker
§ The date of the next general election
§ Whether it will rain tonight
§ Our next head of department
§ Etc.
Clarity test

§ When we identify an event, we have in mind what we mean. But will
other people know precisely what we mean?
§ Even we ourselves may not have a precise definition of what we have in mind
§ To avoid ambiguity, every event should pass the clarity test
§ Clarity test: ensures that we are absolutely clear and precise about the
definition of every event we are dealing with in a decision problem
§ The clarity test is conducted by submitting our definition of each
event to a clairvoyant
§ A clairvoyant is a hypothetical being who:
§ Is competent and trustworthy
§ Knows the outcome of any past and future event
§ Knows the value of any physically defined quantity, both past and future
§ Has infinite computational (mental) power and can perform any reasoning
and computation instantly and without any effort
Clarity test (cont’d)

§ Passing the clarity test:
§ An event passes if and only if the clairvoyant can tell its outcome
without making any further judgment
§ Example:
§ The next person entering this room is a heavy smoker
• What is a heavy smoker?
§ The next person entering this room is a graduate
• What is a graduate?
Possibility tree

§ Single-event tree
§ Example: the event “the next person entering this room is
a businessman”
§ Suppose B represents the outcome “businessman” and B’
represents otherwise
Possibility tree

§ Two-event trees
§ Simultaneously consider several events
§ Example: the event “the next person entering this room is
a businessman” and the event “the next person entering
this room is a graduate” can be jointly considered, as in the sketch below
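
To make the two-event tree concrete, here is a minimal Python sketch of the joint structure; the probability values are illustrative assumptions, not numbers from the slides.

```python
# Two-event possibility tree in the order B (businessman) then G (graduate).
# All probability values below are illustrative assumptions.
tree = {
    "B":  {"p": 0.3, "G": 0.8, "G'": 0.2},   # p(B), then p(G|B) and p(G'|B)
    "B'": {"p": 0.7, "G": 0.4, "G'": 0.6},   # p(B'), then p(G|B') and p(G'|B')
}

# Elemental (joint) probabilities at the leaves: p(B, G) = p(B) * p(G|B)
joint = {
    (b, g): branch["p"] * branch[g]
    for b, branch in tree.items()
    for g in ("G", "G'")
}
print(joint)  # the four leaf probabilities sum to 1
```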
Reversing the order of events in a tree

§ In the previous example, we considered the distinctions in the
order “businessman” then “graduate”, i.e., B to G.
§ The same information can be expressed with the events
in the reverse order, i.e., G to B.
Multiple event trees

§ We can jointly consider three events: businessman,
graduate, and gender.
Using probability to represent uncertainty

Probability:
§ Frequentist view
§ Probabilities are fundamentally dispositional properties of non-
deterministic physical systems
§ Probabilities are viewed as long-run frequencies of events
§ This is the standard interpretation used in classical statistics
§ Subjective (Bayesian) view
§ Probabilities are representations of our subjective degree of
belief
§ Probabilities in general are not necessarily tied to any physical
process which can be repeated indefinitely
Assigning probabilities to events

§ The probabilities we assign depend on our state of information about
the event
§ Example: information relevant to assessing the likelihood that
the next person entering the room is a businessman might include
the following:
§ There is an alumni meeting outside the room and most of the alumni are
businessmen
§ You have arranged to meet a friend here and she, to your
knowledge, is not a businesswoman. She is going to show up any moment.
§ Etc.
§ After considering all relevant background information, we assign the
likelihood that the next person entering the room is a businessman
by assigning a probability value to each of the possibilities or
outcomes
Marginal and conditional probabilities

§ In general, given information about the outcome of some events, we
may revise our probabilities of other events
§ We do this through the use of conditional probabilities
§ The probability of an event X given a specific outcome of another
event Y is called the conditional probability of X given Y
§ The conditional probability of event X given event Y and other
background information ξ is denoted by p(X|Y, ξ) and is given by

p(X|Y, ξ) = p(X ∩ Y|ξ) / p(Y|ξ), for p(Y|ξ) ≠ 0
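
As a quick numerical check of this definition, here is a minimal Python sketch; the joint probabilities are illustrative assumptions.

```python
# p(X|Y) = p(X ∩ Y) / p(Y); the joint values below are illustrative assumptions.
p_joint = {("x1", "y1"): 0.2, ("x1", "y2"): 0.1,
           ("x2", "y1"): 0.3, ("x2", "y2"): 0.4}

def p_marginal_y(y):
    # p(Y = y): sum the joint over all outcomes of X
    return sum(p for (x, yy), p in p_joint.items() if yy == y)

def p_x_given_y(x, y):
    py = p_marginal_y(y)
    if py == 0:
        raise ValueError("conditional probability undefined when p(Y) = 0")
    return p_joint[(x, y)] / py

print(p_x_given_y("x1", "y1"))  # 0.2 / 0.5 = 0.4
```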
Factorization rule for joint probability

§ Any joint probability can be factored into a product of conditionals;
for example, for three events,
p(A, B, C|ξ) = p(A|B, C, ξ) p(B|C, ξ) p(C|ξ)

Changing the order of conditioning

§ Suppose we have the tree from the previous example, drawn in the
order B to G
§ There is no reason why we should always condition G on B;
suppose we want to draw the tree in the order G to B
§ We need to flip the tree!
Flipping the tree

§ Graphical approach
§ Change the ordering of the underlying possibility tree
§ Transfer the elemental (joint) probabilities from the original tree
to the new tree
§ Compute the marginal probability for the first variable in the new
tree, i.e., G. We add the elemental probabilities that are related
to G1 and G2 respectively.
§ Compute conditional probabilities for B given G

§ Bayes’ theorem
§ Doing the above tree flipping is already applying Bayes’ theorem
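
The graphical procedure can be carried out numerically; a minimal sketch, with the original B-to-G tree probabilities assumed for illustration.

```python
# Flip a tree from the order B -> G to the order G -> B.
# The original tree's numbers are illustrative assumptions: p(B) and p(G|B).
p_B = {"B1": 0.3, "B2": 0.7}
p_G_given_B = {("G1", "B1"): 0.8, ("G2", "B1"): 0.2,
               ("G1", "B2"): 0.4, ("G2", "B2"): 0.6}

# Step 1: the elemental (joint) probabilities carry over unchanged.
joint = {(g, b): p_B[b] * p_G_given_B[(g, b)]
         for b in p_B for g in ("G1", "G2")}

# Step 2: marginal probability of the new first variable G.
p_G = {g: sum(p for (gg, _), p in joint.items() if gg == g)
       for g in ("G1", "G2")}

# Step 3: conditional probabilities of B given G.
p_B_given_G = {(b, g): joint[(g, b)] / p_G[g] for (g, b) in joint}

print(p_G)          # marginals of G
print(p_B_given_G)  # flipped conditionals
```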
Bayes’ Theorem

§ Given two uncertain events X and Y, suppose the
probabilities p(X|ξ) and p(Y|X, ξ) are known. Then

p(X|Y, ξ) = p(X|ξ) p(Y|X, ξ) / p(Y|ξ)

where

p(Y|ξ) = ∑_X p(X|ξ) p(Y|X, ξ)
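
A direct transcription of the theorem in Python; the input probabilities p(X|ξ) and p(Y|X, ξ) below are illustrative assumptions.

```python
# Bayes' theorem: p(X|Y) = p(X) * p(Y|X) / p(Y),
# with p(Y) obtained by total probability over the outcomes of X.
# The input numbers are illustrative assumptions.
p_X = {"x1": 0.25, "x2": 0.75}
p_Y_given_X = {("y1", "x1"): 0.10, ("y1", "x2"): 0.01}

def bayes(x, y):
    p_y = sum(p_X[xx] * p_Y_given_X[(y, xx)] for xx in p_X)  # p(Y|ξ)
    return p_X[x] * p_Y_given_X[(y, x)] / p_y

print(round(bayes("x1", "y1"), 3))  # posterior p(x1|y1) ≈ 0.769
```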
Probabilistic dependency or relevance

§ Let
§ A be an event with n possible outcomes ai, i=1,…,n
§ B be an event with m possible outcomes bj, j=1,…,m
§ Event A is said to be probabilistically dependent on event B if
p(A|bj, ξ) ≠ p(A|bk, ξ) for some j ≠ k
§ The conditional probability of A given B is different for different
outcomes or realizations of event B. We also say that B is relevant to A
§ Event A is said to be probabilistically independent of event B if
p(A|bj, ξ) = p(A|bk, ξ) for all j and k
§ The conditional probability of A given B is the same for all outcomes or
realizations of event B. We also say that B is irrelevant to A
§ In fact, if A is independent of B, then p(A|B, ξ) = p(A|ξ)
§ Intuitively, independence means knowing the outcome of one event
does not provide any information on the probability of outcomes of
the other event
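
This test is easy to mechanize; a minimal sketch, with the conditional probability table assumed for illustration.

```python
# A depends on B if the distribution of A differs across outcomes of B.
# The CPT values below are illustrative assumptions.
p_A_given_B = {"b1": {"a1": 0.9, "a2": 0.1},
               "b2": {"a1": 0.5, "a2": 0.5}}

def is_independent(cpt):
    # Independent iff every row (distribution of A) is identical.
    rows = list(cpt.values())
    return all(row == rows[0] for row in rows)

print(is_independent(p_A_given_B))  # False: B is relevant to A
```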
Joint probability distribution of independent events

§ In general, the joint probability distribution for any two
uncertain events A and B is
p(A, B|ξ) = p(A|B, ξ) p(B|ξ)
§ If A and B are independent, then since p(A|B, ξ) = p(A|ξ),
we have
p(A, B|ξ) = p(A|ξ) p(B|ξ)
§ The joint probability of A and B is simply the product of
their marginal probabilities
§ In general, the joint probability for n mutually
independent events is
p(X1, X2, …, Xn|ξ) = p(X1|ξ) p(X2|ξ) … p(Xn-1|ξ) p(Xn|ξ)
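
Under mutual independence the joint probability collapses to a product of marginals; a one-line sketch with assumed values.

```python
from functools import reduce
from operator import mul

# Marginals of n mutually independent events; the values are assumptions.
marginals = [0.5, 0.2, 0.9]       # p(X1), p(X2), p(X3)
p_joint = reduce(mul, marginals)  # p(X1, X2, X3) = p(X1) p(X2) p(X3)
print(round(p_joint, 2))          # 0.09
```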
Conditional independence or relevance

§ Suppose two events A and B are found not to be
independent
§ Introduce an event C with two outcomes, c1 and c2
§ If C=c1 is true, we have
p(A|B, c1, ξ)=p(A|c1, ξ)
§ If C=c2 is true, we have
p(A|B, c2, ξ)=p(A|c2, ξ)
§ Then we say that event A is conditionally independent of event B
given event C
§ Definition (Conditional Independence):
given 3 distinct events A, B, and C, if p(A|B, ck, ξ)=p(A|ck, ξ) for all k,
that is, the conditional probability table (CPT) for A given B and C
repeats for all possible realizations of C, then we say that A and B
are conditionally independent given C, denoted A ⊥ B | C
Conditional independence (cont’d)

§ If A ⊥ B | C then p(A|B, C, ξ)=p(A|C, ξ)


§ Example:
Given the following conditional probabilities:
p(a1|b1, c1)= 0.9 p(a2|b1, c1)= 0.1
p(a1|b2, c1)= 0.9 p(a2|b2, c1)= 0.1
p(a1|b1, c2)= 0.8 p(a2|b1, c2)= 0.2
p(a1|b2, c2)= 0.8 p(a2|b2, c2)= 0.2

we conclude that A ⊥ B | C

§ Note that A is not (marginally) independent of B unless we can show,
with more information, that p(a1|b1) = p(a1|b2)
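
The conditional-independence check can be automated; this sketch uses the CPT numbers from the example above.

```python
# CPT for A given B and C, using the numbers from the example above.
cpt = {("b1", "c1"): {"a1": 0.9, "a2": 0.1},
       ("b2", "c1"): {"a1": 0.9, "a2": 0.1},
       ("b1", "c2"): {"a1": 0.8, "a2": 0.2},
       ("b2", "c2"): {"a1": 0.8, "a2": 0.2}}

def cond_independent(cpt, bs=("b1", "b2"), cs=("c1", "c2")):
    # A ⊥ B | C iff, for each fixed c, the distribution of A is the
    # same for every outcome of B.
    return all(cpt[(b, c)] == cpt[(bs[0], c)] for c in cs for b in bs)

print(cond_independent(cpt))  # True: A ⊥ B | C
```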
Joint probability distribution under conditional
independence

§ Recall, by the factorization rule, the joint probability for A, B,
and C is
p(A, B, C|ξ) = p(A|B, C, ξ) p(B|C, ξ) p(C|ξ)
§ If A is independent of B given C, then since
p(A|B, C, ξ) = p(A|C, ξ), we have
p(A, B, C|ξ) = p(A|C, ξ) p(B|C, ξ) p(C|ξ)
Application of conditional probability

Direct conditioning: relevance of smoking to lung cancer

§ Suppose:
S: A person is a heavy smoker, defined as having smoked
at least two packs of cigarettes per day for a period of at least 10
years during a lifetime
L: A person has lung cancer according to the standard medical
definition

§ A doctor not associated with lung cancer treatment assigned the
following probabilities:
Relevance of smoking to lung cancer (cont’d)

§ A lung cancer specialist remarked: “The probability p(L1|S1, ξ) = 0.1
is too low”
§ When asked to explain why, he said:
“Because in all these years as a lung cancer specialist, whenever I
visited my lung cancer ward, it is always full of smokers.”
§ What’s wrong with the above statement?
§ The answer can be found by flipping the tree:
Relevance of smoking to lung cancer (cont’d)

§ What the specialist referred to as “high” is actually the
probability of a person being a smoker given that he has
lung cancer, i.e., p(S1|L1, ξ) = 0.769
§ He has confused p(S1|L1, ξ) with p(L1|S1, ξ)
§ Notice that p(L1|S1, ξ) << p(S1|L1, ξ)
§ Hence even a highly trained professional can fall victim
to wrong reasoning
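
The flip can be reproduced numerically. The slides give p(L1|S1, ξ) = 0.1 and the flipped value p(S1|L1, ξ) = 0.769; the prior p(S1) = 0.25 and the rate p(L1|S2) = 0.01 used below are assumptions chosen to be consistent with those two figures, since the doctor's full table is not reproduced here.

```python
# Flip p(L|S) into p(S|L) via Bayes' theorem.
# p(L1|S1) = 0.1 is from the slides; p(S1) = 0.25 and p(L1|S2) = 0.01
# are assumed values consistent with the quoted p(S1|L1) = 0.769.
p_S1 = 0.25
p_L1_given_S = {"S1": 0.10, "S2": 0.01}

p_L1 = p_S1 * p_L1_given_S["S1"] + (1 - p_S1) * p_L1_given_S["S2"]
p_S1_given_L1 = p_S1 * p_L1_given_S["S1"] / p_L1

# The ward looks "full of smokers" (p(S1|L1) is high) even though
# p(L1|S1) is only 0.1 -- the two conditionals answer different questions.
print(round(p_S1_given_L1, 3))  # 0.769
```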
Let’s Make a Deal Game Show

Rules:
§ Consider the TV game show where the contestant is shown, on stage,
three boxes, one of which contains a valuable prize; the other two
are empty
§ The rules of the game are that the contestant first chooses one of
the boxes. Then the game show host, who knows the location of the
prize, opens one of the remaining two boxes, making sure to open
an empty one.
§ The contestant then gets to decide if he wants to stick with his initial
selection or switch to the remaining unopened box.
§ If the prize is in the box that he finally chooses, he wins the prize
Question:
If at the start of the game the contestant chose box A and the host
opened box B, should the contestant keep box A or switch to
box C?
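
One quick way to settle the stick-or-switch question is to simulate the game; a minimal sketch.

```python
import random

def play(switch, n=100_000):
    wins = 0
    for _ in range(n):
        prize = random.choice("ABC")
        pick = "A"  # the contestant's initial selection
        # The host opens an empty box that is neither the pick nor the prize.
        opened = random.choice([b for b in "ABC" if b not in (pick, prize)])
        if switch:
            # Switch to the one remaining unopened box.
            pick = next(b for b in "ABC" if b not in (pick, opened))
        wins += (pick == prize)
    return wins / n

print("stick: ", play(switch=False))   # ~1/3
print("switch:", play(switch=True))    # ~2/3
```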
Updating probabilities based on new evidence or information

Recall:
§ p(A|ξ) is the probability of event A based on our subjective assessment
of its likelihood using any information we have → the prior
probability
§ If new information E arrives, then the probability of A is updated
using Bayes’ theorem:

p(A|E) = p(A) p(E|A) / p(E)

where p(E|A) is called the likelihood function for the evidence E and

p(E) = ∑_j p(Aj) p(E|Aj)
Example: weather forecast

§ Suppose the prior probability that it will rain tonight (R1) is 0.6 and that it
will not rain (R2) is 0.4
§ Suppose we use information from a weather forecaster whose
performance is as follows:

§ If the weather station announces that it will rain tonight, what
probability should you assign to the outcome that it will indeed rain
tonight?
Weather forecast (cont’d)

Try to use Bayes’ theorem!
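
The forecaster's performance table from the slide is not reproduced here, so the reliability numbers below are placeholder assumptions; the Bayes update itself is the point of the exercise.

```python
# Prior from the slide: p(R1) = 0.6 (rain), p(R2) = 0.4 (no rain).
# The forecast reliabilities are PLACEHOLDER assumptions; substitute
# the values from the forecaster's performance table.
p_rain = 0.6
p_says_rain_given_rain = 0.8   # assumed p(forecast "rain" | R1)
p_says_rain_given_dry = 0.3    # assumed p(forecast "rain" | R2)

p_says_rain = (p_rain * p_says_rain_given_rain
               + (1 - p_rain) * p_says_rain_given_dry)
posterior = p_rain * p_says_rain_given_rain / p_says_rain
print(round(posterior, 3))     # p(R1 | forecast says rain) = 0.8
```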


Another example

§ In the city, there are only two taxicab companies, the Blue and the
Green. The Blue company operates 90% of all cabs in the city and
the Green company operates the rest. One dark evening, a
pedestrian is killed by a hit-and-run taxicab.
§ There is one witness to the accident. In court, the witness’ ability to
distinguish cab colors in the dark is questioned so he is tested under
conditions similar to those in which the accident occurred. If he is
shown a green cab, he says it is green 80% of the time and blue
20% of the time. If he is shown a blue cab, he says it is blue 80% of
the time and green 20% of the time.
§ The judge believes that the test accurately represents the witness’
performance at the time of the accident.
§ Construct the probability tree representing the judge’s state of
information!
§ If the witness says “The cab involved in the accident was green,”
what probability should the judge assign to the cab involved in the
accident being green?
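
All the required numbers are given in the problem statement, so the judge's posterior can be computed directly; a minimal sketch.

```python
# Taxicab problem: all inputs are taken from the problem statement.
p_green = 0.1                     # Green operates 10% of the cabs
p_blue = 0.9                      # Blue operates 90%
p_says_green_given_green = 0.8    # witness is right on green cabs 80% of the time
p_says_green_given_blue = 0.2     # witness calls a blue cab green 20% of the time

p_says_green = (p_green * p_says_green_given_green
                + p_blue * p_says_green_given_blue)
posterior = p_green * p_says_green_given_green / p_says_green
print(round(posterior, 3))  # p(green | witness says green) ≈ 0.308
```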
Thank you