
Basics on Probability

Jingrui He
09/11/2007

Coin Flips

- You flip a coin
  - Heads with probability 0.5
- You flip 100 coins
  - How many heads would you expect?

Coin Flips cont.

- You flip a coin
  - Heads with probability p
  - Binary random variable
  - Bernoulli trial with success probability p
- You flip k coins
  - How many heads would you expect?
  - Number of heads X: discrete random variable
  - Binomial distribution with parameters k and p

Discrete Random Variables

- Random variables (RVs) which may take on only a countable number of distinct values
  - E.g. the total number of heads X you get if you flip 100 coins
- X is a RV with arity k if it can take on exactly one value out of $\{x_1, \ldots, x_k\}$
  - E.g. the possible values that X can take on are 0, 1, 2, ..., 100

Probability of Discrete RV

- Probability mass function (pmf): $P(X = x_i)$
- Easy facts about pmf
  - $\sum_i P(X = x_i) = 1$
  - $P(X = x_i \cap X = x_j) = 0$ if $i \neq j$
  - $P(X = x_i \cup X = x_j) = P(X = x_i) + P(X = x_j)$ if $i \neq j$
  - $P(X = x_1 \cup X = x_2 \cup \ldots \cup X = x_k) = 1$
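These facts are easy to check numerically. A minimal sketch in Python, using a fair six-sided die as an assumed example distribution (not from the slides):

```python
# Sanity-check the pmf facts with a fair six-sided die (assumed example).
from fractions import Fraction

pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Fact: the probabilities of all outcomes sum to 1.
assert sum(pmf.values()) == 1

# Fact: distinct outcomes are mutually exclusive, so the probability
# of the union {X=1 or X=2} is just the sum of the two probabilities.
p_1_or_2 = pmf[1] + pmf[2]
assert p_1_or_2 == Fraction(1, 3)

print("pmf facts verified")
```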

Common Distributions

- Uniform: $X \sim U[1, \ldots, N]$
  - X takes values 1, 2, ..., N
  - $P(X = i) = 1/N$
  - E.g. picking balls of different colors from a box
- Binomial: $X \sim \text{Bin}(n, p)$
  - X takes values 0, 1, ..., n
  - $P(X = i) = \binom{n}{i} p^i (1 - p)^{n - i}$
  - E.g. coin flips
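A short sketch evaluating both pmfs in Python (assuming scipy is available; N = 6 is an illustrative choice, while n = 100, p = 0.5 echoes the coin-flip slide):

```python
# Evaluate the uniform and binomial pmfs numerically.
from scipy.stats import binom, randint

N = 6                        # uniform on {1, ..., N} (illustrative choice)
n, p = 100, 0.5              # 100 fair coin flips, as in the earlier slide

# Uniform: P(X = i) = 1/N for every i.
uniform = randint(1, N + 1)  # scipy's randint is uniform on {low, ..., high-1}
print(uniform.pmf(3))        # 1/6 ~ 0.1667

# Binomial: P(X = i) = C(n, i) p^i (1-p)^(n-i).
print(binom.pmf(50, n, p))   # probability of exactly 50 heads ~ 0.0796
assert abs(sum(binom.pmf(i, n, p) for i in range(n + 1)) - 1) < 1e-9
```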

Coin Flips of Two Persons

- Your friend and you both flip coins
  - Heads with probability 0.5
  - You flip 50 times; your friend flips 100 times
- How many heads will each of you get?

Joint Distribution

- Given two discrete RVs X and Y, their joint distribution is the distribution of X and Y together
  - E.g. P(you get 21 heads AND your friend gets 70 heads)
- $\sum_x \sum_y P(X = x \cap Y = y) = 1$
  - E.g. $\sum_{i=0}^{50} \sum_{j=0}^{100} P(\text{you get } i \text{ heads AND your friend gets } j \text{ heads}) = 1$
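Because the two flippers do not influence each other, this joint pmf factors into a product of two binomials; a minimal sketch verifying the normalization (independence is assumed here, anticipating the later slide):

```python
# Joint pmf of (your heads, friend's heads) for independent fair-coin flips.
from scipy.stats import binom

total = sum(
    binom.pmf(i, 50, 0.5) * binom.pmf(j, 100, 0.5)
    for i in range(51)       # your heads: 0..50
    for j in range(101)      # friend's heads: 0..100
)
print(total)  # ~ 1.0: the joint distribution sums to 1
```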

Conditional Probability

- $P(X = x \mid Y = y)$ is the probability of $X = x$, given the occurrence of $Y = y$
  - E.g. you get 0 heads, given that your friend gets 61 heads
- $P(X = x \mid Y = y) = \dfrac{P(X = x \cap Y = y)}{P(Y = y)}$

Law of Total Probability

Given two discrete RVs X and Y, which take values in $\{x_1, \ldots, x_m\}$ and $\{y_1, \ldots, y_n\}$, we have:

- Marginalization (marginal probability from the joint probability):
  $P(X = x_i) = \sum_j P(X = x_i \cap Y = y_j)$
- Law of total probability (marginal probability from the conditional probability):
  $P(X = x_i) = \sum_j P(X = x_i \mid Y = y_j)\, P(Y = y_j)$

Bayes Rule

X and Y are discrete RVs:

- $P(X = x \mid Y = y) = \dfrac{P(X = x \cap Y = y)}{P(Y = y)}$
- $P(X = x_i \mid Y = y_j) = \dfrac{P(Y = y_j \mid X = x_i)\, P(X = x_i)}{\sum_k P(Y = y_j \mid X = x_k)\, P(X = x_k)}$
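A minimal numeric sketch of the second formula. The setup (a fair vs. biased coin, and the probabilities used) is a made-up illustration, not from the slides:

```python
# Bayes rule for a discrete X: posterior is likelihood x prior, normalized.
# Hypothetical setup: X = which coin was picked (fair or biased),
# Y = the picked coin came up heads.
prior = {"fair": 0.5, "biased": 0.5}        # P(X = x_i), assumed
likelihood = {"fair": 0.5, "biased": 0.9}   # P(Y = heads | X = x_i), assumed

# Denominator: sum_k P(Y | X = x_k) P(X = x_k)
evidence = sum(likelihood[x] * prior[x] for x in prior)

posterior = {x: likelihood[x] * prior[x] / evidence for x in prior}
print(posterior)  # {'fair': ~0.357, 'biased': ~0.643}
```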

Independent RVs

- Intuition: X and Y are independent means that $X = x$ neither makes it more nor less probable that $Y = y$
- Definition: X and Y are independent iff
  $P(X = x \cap Y = y) = P(X = x)\, P(Y = y)$

More on Independence

- $P(X = x \cap Y = y) = P(X = x)\, P(Y = y)$ implies
  - $P(X = x \mid Y = y) = P(X = x)$
  - $P(Y = y \mid X = x) = P(Y = y)$
- E.g. no matter how many heads you get, your friend will not be affected, and vice versa

Conditionally Independent RVs

- Intuition: X and Y are conditionally independent given Z means that once Z is known, the value of X does not add any additional information about Y
- Definition: X and Y are conditionally independent given Z iff
  $P(X = x \cap Y = y \mid Z = z) = P(X = x \mid Z = z)\, P(Y = y \mid Z = z)$

More on Conditional Independence

- $P(X = x \cap Y = y \mid Z = z) = P(X = x \mid Z = z)\, P(Y = y \mid Z = z)$ implies
  - $P(X = x \mid Y = y, Z = z) = P(X = x \mid Z = z)$
  - $P(Y = y \mid X = x, Z = z) = P(Y = y \mid Z = z)$

Monty Hall Problem

- You're given the choice of three doors: behind one door is a car; behind the others, goats.
- You pick a door, say No. 1.
- The host, who knows what's behind the doors, opens another door, say No. 3, which has a goat.
- Do you want to pick door No. 2 instead?

[Figure: if your door hides the car, the host reveals Goat A or Goat B; if your door hides Goat A, the host must reveal Goat B; if your door hides Goat B, the host must reveal Goat A]

Monty Hall Problem: Bayes Rule

- $C_i$: the car is behind door i, i = 1, 2, 3
  - $P(C_i) = 1/3$
- $H_{ij}$: the host opens door j after you pick door i

$$P(H_{ij} \mid C_k) = \begin{cases} 0 & i = j \text{ (the host never opens your door)} \\ 0 & j = k \text{ (the host never reveals the car)} \\ 1/2 & i = k \text{ (two goat doors to choose from)} \\ 1 & i \neq k,\ j \neq k \text{ (only one goat door left)} \end{cases}$$

Monty Hall Problem: Bayes Rule cont.

WLOG, take i = 1, j = 3:

$$P(C_1 \mid H_{13}) = \frac{P(H_{13} \mid C_1)\, P(C_1)}{P(H_{13})}$$

$$P(H_{13} \mid C_1)\, P(C_1) = \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{6}$$

Monty Hall Problem: Bayes Rule cont.

$$P(H_{13}) = P(H_{13}, C_1) + P(H_{13}, C_2) + P(H_{13}, C_3)$$
$$= P(H_{13} \mid C_1)\, P(C_1) + P(H_{13} \mid C_2)\, P(C_2) = \frac{1}{6} + 1 \cdot \frac{1}{3} = \frac{1}{2}$$

(the $C_3$ term vanishes because $P(H_{13} \mid C_3) = 0$: the host never reveals the car)

$$P(C_1 \mid H_{13}) = \frac{1/6}{1/2} = \frac{1}{3}$$

Monty Hall Problem: Bayes Rule cont.

$$P(C_1 \mid H_{13}) = \frac{1/6}{1/2} = \frac{1}{3}$$
$$P(C_2 \mid H_{13}) = 1 - \frac{1}{3} = \frac{2}{3} > P(C_1 \mid H_{13})$$

You should switch!
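A quick Monte Carlo sketch confirming the 1/3 vs. 2/3 split (pure Python, assuming only the standard rules of the game):

```python
# Monte Carlo simulation of the Monty Hall problem.
import random

def play(switch: bool) -> bool:
    """Return True if the player wins the car."""
    car = random.randrange(3)
    pick = random.randrange(3)
    # Host opens a door that is neither the player's pick nor the car.
    host = random.choice([d for d in range(3) if d != pick and d != car])
    if switch:
        # Switch to the one remaining unopened door.
        pick = next(d for d in range(3) if d != pick and d != host)
    return pick == car

trials = 100_000
print(sum(play(switch=False) for _ in range(trials)) / trials)  # ~ 1/3
print(sum(play(switch=True) for _ in range(trials)) / trials)   # ~ 2/3
```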

Continuous Random Variables

- What if X is continuous?
- Probability density function (pdf) instead of probability mass function (pmf)
- A pdf is any function $f(x)$ that describes the probability density in terms of the input variable x

PDF

- Properties of pdf
  - $f(x) \geq 0,\ \forall x$
  - $\int_{-\infty}^{\infty} f(x)\, dx = 1$
  - $f(x) \leq 1$??? Not necessarily: a density may exceed 1 at some points, as long as it integrates to 1
- Actual probability can be obtained by taking the integral of the pdf
  - E.g. the probability of X being between 0 and 1 is
    $P(0 \leq X \leq 1) = \int_0^1 f(x)\, dx$
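A small sketch evaluating such an integral numerically (assuming scipy; the standard normal is an illustrative choice of f, not from the slides):

```python
# P(0 <= X <= 1) for a standard normal X, by numerical integration.
from scipy.integrate import quad
from scipy.stats import norm

prob, _err = quad(norm.pdf, 0, 1)    # integrate the pdf over [0, 1]
print(prob)                          # ~ 0.3413
print(norm.cdf(1) - norm.cdf(0))     # same value via the CDF (next slide)
```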

Cumulative Distribution Function

- $F_X(v) = P(X \leq v)$
- Discrete RVs:
  $F_X(v) = \sum_{v_i \leq v} P(X = v_i)$
- Continuous RVs:
  $F_X(v) = \int_{-\infty}^{v} f(x)\, dx$
  $\dfrac{d}{dx} F_X(x) = f(x)$

Common Distributions

- Normal: $X \sim N(\mu, \sigma^2)$
  - $f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\dfrac{(x - \mu)^2}{2\sigma^2} \right),\ x \in \mathbb{R}$
  - E.g. the height of the entire population

[Figure: standard normal density f(x), plotted for x from -5 to 5]

Common Distributions cont.

- Beta: $X \sim \text{Beta}(\alpha, \beta)$
  - $f(x; \alpha, \beta) = \dfrac{1}{B(\alpha, \beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1},\ x \in [0, 1]$
  - $\alpha = \beta = 1$: uniform distribution between 0 and 1
  - E.g. the conjugate prior for the parameter p in the Binomial distribution

[Figure: Beta density f(x), plotted for x from 0 to 1]
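A minimal sketch of the conjugacy claim (assuming scipy): starting from a Beta(α, β) prior on p and observing h heads in n flips, the posterior over p is Beta(α + h, β + n − h). The numbers below are illustrative, not from the slides:

```python
# Beta-Binomial conjugacy: the posterior over p is again a Beta.
from scipy.stats import beta

a, b = 1.0, 1.0        # Beta(1, 1) prior = uniform on [0, 1]
n, heads = 10, 7       # illustrative data: 7 heads in 10 flips

posterior = beta(a + heads, b + n - heads)   # Beta(8, 4)
print(posterior.mean())                      # ~ 0.667, posterior mean of p
```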

Joint Distribution

- Given two continuous RVs X and Y, the joint pdf can be written as $f_{X,Y}(x, y)$
- $\int_x \int_y f_{X,Y}(x, y)\, dx\, dy = 1$

Multivariate Normal

- Generalization to higher dimensions of the one-dimensional normal
- $f_{\vec{X}}(x_1, \ldots, x_d) = \dfrac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left( -\dfrac{1}{2} (\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu}) \right)$
  where $\vec{\mu}$ is the mean vector and $\Sigma$ is the covariance matrix
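A short sketch evaluating this density (assuming scipy; the 2-D mean and covariance are made-up values):

```python
# Evaluate a 2-D multivariate normal density.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])           # mean vector (illustrative)
sigma = np.array([[1.0, 0.5],       # covariance matrix (illustrative)
                  [0.5, 2.0]])

mvn = multivariate_normal(mean=mu, cov=sigma)
print(mvn.pdf([0.0, 0.0]))   # density at the mean ~ 0.1203
```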

Moments

- Mean (expectation): $\mu = E(X)$
  - Discrete RVs: $E(X) = \sum_{v_i} v_i\, P(X = v_i)$
  - Continuous RVs: $E(X) = \int_{-\infty}^{\infty} x f(x)\, dx$
- Variance: $V(X) = E\left( (X - \mu)^2 \right)$
  - Discrete RVs: $V(X) = \sum_{v_i} (v_i - \mu)^2\, P(X = v_i)$
  - Continuous RVs: $V(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx$

Properties of Moments

- Mean
  - $E(aX) = a\, E(X)$
  - $E(X + Y) = E(X) + E(Y)$
  - If X and Y are independent, $E(XY) = E(X)\, E(Y)$
- Variance
  - $V(aX + b) = a^2\, V(X)$
  - If X and Y are independent, $V(X + Y) = V(X) + V(Y)$
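These properties can be spot-checked by simulation; a minimal numpy sketch with independent draws (the distribution choices are arbitrary):

```python
# Spot-check E(X+Y) = E(X)+E(Y), V(aX+b) = a^2 V(X), and E(XY) = E(X)E(Y).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=1_000_000)    # X ~ N(1, 4), arbitrary choice
y = rng.exponential(3.0, size=1_000_000)    # Y ~ Exp(mean 3), independent of X

print((x + y).mean(), x.mean() + y.mean())  # both ~ 4.0
print(np.var(5 * x + 7), 25 * np.var(x))    # both ~ 100
print((x * y).mean(), x.mean() * y.mean())  # both ~ 3.0 (uses independence)
```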

Moments of Common Distributions

- Uniform $X \sim U[1, \ldots, N]$
  - Mean $(1 + N)/2$; variance $(N^2 - 1)/12$
- Binomial $X \sim \text{Bin}(n, p)$
  - Mean $np$; variance $np(1 - p)$
- Normal $X \sim N(\mu, \sigma^2)$
  - Mean $\mu$; variance $\sigma^2$
- Beta $X \sim \text{Beta}(\alpha, \beta)$
  - Mean $\alpha / (\alpha + \beta)$; variance $\alpha\beta \,/\, \left( (\alpha + \beta)^2 (\alpha + \beta + 1) \right)$
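These formulas can be cross-checked against scipy's built-in moments; a brief sketch (the parameter values are illustrative):

```python
# Cross-check the mean/variance formulas against scipy.stats.
from scipy.stats import randint, binom, beta

N, n, p, a, b = 6, 100, 0.5, 2.0, 3.0           # illustrative parameters

print(randint(1, N + 1).mean(), (1 + N) / 2)    # 3.5 and 3.5
print(randint(1, N + 1).var(), (N**2 - 1) / 12) # ~ 2.9167 both
print(binom(n, p).var(), n * p * (1 - p))       # 25.0 and 25.0
print(beta(a, b).mean(), a / (a + b))           # 0.4 and 0.4
```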

Probability of Events

- X denotes an event that could possibly happen
  - E.g. X = "you will fail in this course"
- P(X) denotes the likelihood that X happens, or X = true
  - What's the probability that you will fail in this course?
- $\Omega$ denotes the entire event set
  - $\Omega = \{X, \neg X\}$

The Axioms of Probabilities

- $0 \leq P(X) \leq 1$
- $P(\Omega) = 1$
- $P(X_1 \cup X_2 \cup \ldots) = \sum_i P(X_i)$, where the $X_i$ are disjoint events
- Useful rules
  - $P(\neg X) = 1 - P(X)$
  - $P(X_1 \cup X_2) = P(X_1) + P(X_2) - P(X_1 \cap X_2)$

Interpreting the Axioms

[Figure: Venn diagram of two events X1 and X2]
