• A pmf must satisfy:
  $p(x, y) \ge 0$ for all $(x, y)$, and $\sum_x \sum_y p(x, y) = 1$.
• For any event A,
  $P[(X, Y) \in A] = \sum_{(x, y) \in A} p(x, y)$
Joint Probability Table:
Table presenting the joint probability distribution p(x, y):

            y = 1   y = 2   y = 3
  x = 1      .10     .15     .22
  x = 2      .30     .10     .13

• Entries: p(x, y)
• P(X = 2, Y = 3) = .13
• P(Y = 3) = .22 + .13 = .35
• P(Y = 2 or 3) = .15 + .10 + .35 = .60
• The marginal pmfs of X and Y are
  $p_X(x) = \sum_y p(x, y)$ and $p_Y(y) = \sum_x p(x, y)$
Summing the rows and columns of the table:

            y = 1   y = 2   y = 3   pX(x)
  x = 1      .10     .15     .22     .47
  x = 2      .30     .10     .13     .53
  pY(y)      .40     .25     .35

  x        1     2            y        1     2     3
  pX(x)   .47   .53           pY(y)   .40   .25   .35
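These row and column sums are easy to check numerically. A minimal sketch (not from the slides; assumes numpy is available):

```python
import numpy as np

# Joint pmf table: rows are x = 1, 2; columns are y = 1, 2, 3.
p = np.array([[0.10, 0.15, 0.22],
              [0.30, 0.10, 0.13]])

p_X = p.sum(axis=1)   # sum over y: [0.47, 0.53]
p_Y = p.sum(axis=0)   # sum over x: [0.40, 0.25, 0.35]
print(p_X, p_Y, p.sum())   # grand total should be 1.0
```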
II. Both continuous (p. 186)
A joint probability density function (pdf) of X and Y is a function f(x, y) such that
• $f(x, y) \ge 0$ everywhere
• $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, dx\, dy = 1$
and $P[(X, Y) \in A] = \iint_A f(x, y)\, dx\, dy$
6
pdf f is a surface above the (x,y)-plane
• A is a set in the (x,y)-plane.
• P[( X , Y ) A] is the volume of the region over
A under f. (Note: It is not the area of A.)
y
x A
Ex) X and Y have joint pdf
  f(x, y) = c x y² if 0 < x < y < 1, and = 0 elsewhere.
• Find c. First, draw the region where f > 0.
[Figure: the triangular region 0 < x < y < 1, above the line y = x.]
Set the total probability equal to 1; either order of integration works, but the limits differ:
  $\int_0^1 \int_0^y c x y^2\, dx\, dy = 1$   or   $\int_0^1 \int_x^1 c x y^2\, dy\, dx = 1$
(the second set of limits goes with dy dx, not dx dy).
Computing the first:
  $\int_0^1 \int_0^y c x y^2\, dx\, dy = c \int_0^1 y^2 \left[ .5\, x^2 \right]_0^y dy = c \int_0^1 .5\, y^4\, dy = c/10$
so c = 10.
• Find P(X + Y < 1).
First, add the graph of x + y = 1 to the sketch of the region.
[Figure: the region 0 < x < y < 1 cut by the line x + y = 1.]
Integrating dx dy, the region splits at y = .5:
  $P(X + Y < 1) = \int_0^{.5} \int_0^y 10 x y^2\, dx\, dy + \int_{.5}^1 \int_0^{1-y} 10 x y^2\, dx\, dy$
Integrating dy dx instead needs no split:
  $P(X + Y < 1) = \int_0^{.5} \int_x^{1-x} 10 x y^2\, dy\, dx = \frac{10}{3} \int_0^{.5} x \left[ (1 - x)^3 - x^3 \right] dx = 11/96 \approx .115$
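As a numerical sanity check of both results (a minimal sketch, not from the slides; assumes scipy), note that scipy's dblquad integrates its integrand's first argument over the inner limits:

```python
from scipy import integrate

# f(x, y) = 10*x*y**2 on 0 < x < y < 1, 0 elsewhere.
# Normalization: inner x from 0 to y, outer y from 0 to 1.
total, _ = integrate.dblquad(lambda x, y: 10 * x * y**2,
                             0, 1, lambda y: 0, lambda y: y)
print(total)  # ~1.0, so c = 10 is right

# P(X + Y < 1): inner y from x to 1 - x, outer x from 0 to 1/2.
prob, _ = integrate.dblquad(lambda y, x: 10 * x * y**2,
                            0, 0.5, lambda x: x, lambda x: 1 - x)
print(prob)   # ~0.1146 (= 11/96)
```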
Marginal pdf (p. 188)
Marginal pdf of Y:
  $f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx$
Ex) X and Y have joint pdf
  f(x, y) = 10 x y² if 0 < x < y < 1, and 0 elsewhere.
For 0 < y < 1:
  $f_Y(y) = \int_0^y 10 x y^2\, dx = 5 y^4$
and fY(y) = 0 otherwise.
So the marginal pdf of Y is
  fY(y) = 5y⁴ for 0 < y < 1, and is 0 otherwise.
Notes:
1. x cannot appear in fY(y) (and y can't be in fX(x)).
2. You must give the ranges; writing fY(y) = 5y⁴ alone is not enough.
   Math convention: writing fY(y) = 5y⁴ with no range means it holds for all y, which is very wrong in this example.
Remark: Distribution Functions
• For any pair of jointly distributed RVs, the joint distribution function (cdf) of X and Y is
  $F(x, y) = P(X \le x,\ Y \le y)$
defined for all (x, y).
• For X, Y both continuous:
  $f(x, y) = \frac{\partial^2}{\partial x\, \partial y} F(x, y)$
wherever the derivative exists.
Extensions for 3 or more RVs, by example:
X, Y, Z are discrete RVs with joint pmf
  $p(x, y, z) = P(X = x,\ Y = y,\ Z = z)$
The marginal pmf of X is
  $p_X(x) = \sum_y \sum_z p(x, y, z)$   (= P(X = x))
X, Y, Z are continuous RVs with joint pdf f(x, y, z):
The marginal pdf of X is
  $f_X(x) = \int\!\!\int f(x, y, z)\, dy\, dz$
The (joint) marginal pdf of X and Y is
  $f_{XY}(x, y) = \int f(x, y, z)\, dz$
Conditional Distributions & Independence
Marginal pdf of X: $f_X(x) = \int f(x, y)\, dy$
Marginal pdf of Y: $f_Y(y) = \int f(x, y)\, dx$
Conditional pdf of X given Y = y (where fY(y) > 0):
  $f_{X|Y}(x \mid y) = f(x, y) \big/ f_Y(y)$
• Note that the idea is the same as in Chap. 2:
  $P(Y = y \mid X = x) = \frac{P(X = x,\ Y = y)}{P(X = x)}$
as long as $P(X = x) > 0$.
• However, keep in mind that we are defining a (conditional) prob. dist. for Y for a fixed x.
Example:

            y = 1   y = 2   y = 3   pX(x)
  x = 1      .10     .15     .22     .47
  x = 2      .30     .10     .13     .53
  pY(y)      .40     .25     .35

In general,
  $f_{Y|X}(y \mid x) = \frac{f(x, y)}{f_X(x)} = \frac{\text{joint pdf}}{\text{marginal pdf of the condition}}$
as long as $f_X(x) > 0$.
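For the discrete table above, conditioning is just dividing each row of the joint table by its row total. A minimal numpy sketch (not from the slides):

```python
import numpy as np

p = np.array([[0.10, 0.15, 0.22],   # x = 1
              [0.30, 0.10, 0.13]])  # x = 2
p_X = p.sum(axis=1)                 # [.47, .53]

# p_{Y|X}(y | x) = p(x, y) / p_X(x), row by row
p_Y_given_X = p / p_X[:, None]
print(p_Y_given_X)                  # e.g., first row: [.10, .15, .22] / .47
print(p_Y_given_X.sum(axis=1))      # each conditional pmf sums to 1
```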
Remarks
• ALWAYS: for a cont. RV, the prob. it's in a set A is the integral of its pdf over A:
  - no condition: use the marginal pdf
  - with a condition: use the right cond'l pdf
• Interpretation: For cont. X, P(X = x) = 0, so by Chap. 2 rules, $P(Y \in A \mid X = x)$ is meaningless.
  - There is a lot of theory that makes sense of this.
  - For our purposes, think of it as an approximation to $P(Y \in A \mid X \approx x)$, that is, "given X lies in a tiny interval around x".
Ex) X, Y have pdf
  f(x, y) = 10 x y² if 0 < x < y < 1, and = 0 elsewhere.
• Conditional pdf of X given Y = y:
  $f_{X|Y}(x \mid y) = f(x, y) \big/ f_Y(y)$
We found fY(y) = 5y⁴ for 0 < y < 1, = 0 elsewhere. So
  $f_{X|Y}(x \mid y) = \frac{10 x y^2}{5 y^4} = \frac{2x}{y^2}$ if 0 < x < y < 1.
Final answer: for a fixed y, 0 < y < 1,
  fX|Y(x | y) = 2x / y² if 0 < x < y, and = 0 elsewhere.

( Recall: f(x | y) = 2x / y² for 0 < x < y, and = 0 elsewhere. )
• $P(X < .2 \mid Y = .3) = \int_0^{.2} \frac{2x}{(.3)^2}\, dx = (.2/.3)^2 \approx .444$
• $P(X < .35 \mid Y = .3) = 1$   (the conditional dist. lives on 0 < x < .3)
• $F_{X|Y}(x \mid y) = \int_0^x \frac{2t}{y^2}\, dt = \frac{x^2}{y^2}$ if 0 < x < y
[Figure: FX|Y(x | y) rising from 0 at x = 0 to 1 at x = y.]
Last Time: X, Y have pdf f(x, y)
Marginal pdf of X: $f_X(x) = \int f(x, y)\, dy$
Marginal pdf of Y: $f_Y(y) = \int f(x, y)\, dx$
Conditional pdf of X given Y = y (where fY(y) > 0):
  $f_{X|Y}(x \mid y) = f(x, y) \big/ f_Y(y)$
Three More Topics
1. Multiplication Rule for pdfs:
  $f(x, y) = f_{X|Y}(x \mid y)\, f_Y(y) = f_{Y|X}(y \mid x)\, f_X(x)$
[For events: P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B)]
• Extension, by example:
  $f(x, y, z) = f_X(x)\, f_{Y|X}(y \mid x)\, f_{Z|XY}(z \mid x, y)$
[Chap. 2: P(A ∩ B ∩ C) = P(A) P(B|A) P(C|A ∩ B)]
2. Independence
Chap. 2: A, B are independent if P(A|B) = P(A), or equivalently, P(A ∩ B) = P(A) P(B).
• X and Y are independent RVs if and only if
  $f_{X|Y}(x \mid y) = f_X(x)$
for all (x, y) for which f(x, y) > 0, or
  $f(x, y) = f_X(x)\, f_Y(y)$
for all (x, y) for which f(x, y) > 0.
• i.e., the joint is the product of the marginals.
2. Independence (cont'd)
More generally: X1, X2, …, Xn are independent if, for every subset of the n variables, their joint pdf is the product of their marginal pdfs:
  $f(x_1, x_2, \ldots, x_n) = f_1(x_1)\, f_2(x_2) \cdots f_n(x_n)$
and, e.g., $f(x_1, x_7, x_{28}) = f_1(x_1)\, f_7(x_7)\, f_{28}(x_{28})$, etc.
3. Bayes' Thm
Chap. 2:
  $P(B_r \mid A) = \frac{P(A \mid B_r)\, P(B_r)}{\sum_{i=1}^k P(A \mid B_i)\, P(B_i)}$
• For pdfs, you don't need the normalizing constant c:
  $f(y \mid x) = f(x, y) \Big/ \int f(x, y)\, dy$
Conditional Independence & Learning
(beyond the text)
• Ex) Very large population of people. Let Y be the unknown proportion having a disease (D). Sample 2 people "at random", without replacement, and check for D.
• Define X1 = 1 if person 1 has D, X1 = 0 if not. Define X2 for person 2 the same way.
• Note: the X's are discrete, Y is continuous.
Model assumptions
1. Given Y = y, X1 and X2 are conditionally independent, with pmfs
  $p_1(x_1 \mid y) = y^{x_1} (1 - y)^{1 - x_1}, \quad x_1 = 0, 1$
  $p_2(x_2 \mid y) = y^{x_2} (1 - y)^{1 - x_2}, \quad x_2 = 0, 1$
Hence,
  $p_{12}(x_1, x_2 \mid y) = p_1(x_1 \mid y)\, p_2(x_2 \mid y) = y^{x_1 + x_2} (1 - y)^{2 - (x_1 + x_2)}, \quad x_1, x_2 = 0, 1$
2. Suppose we know little about Y: assume fY(y) = 1 for 0 < y < 1, and 0 elsewhere.
Learn about Y after observing X1 & X2?
Answer:
  $f_{Y \mid X_1 X_2}(y \mid x_1, x_2) = p_{12}(x_1, x_2 \mid y)\, f_Y(y) \big/ p_{12}(x_1, x_2)$
where
  $p_{12}(x_1, x_2) = \int_0^1 y^{x_1 + x_2} (1 - y)^{2 - (x_1 + x_2)}\, dy, \quad x_1, x_2 = 0, 1$
Ex) Observe X1 = X2 = 1. Then
  $p_{12}(1, 1) = \int_0^1 y^2\, dy = 1/3$
and
  $f(y \mid 1, 1) = \frac{y^2}{\int_0^1 y^2\, dy} = 3 y^2, \quad 0 < y < 1$
Summary
• Before data, we had no idea about Y; our "prior pdf" was fY(y) = 1, 0 < y < 1. After seeing 2 out of 2 people sampled have D, we update to the "posterior pdf" fY(y | 1, 1) = 3y², 0 < y < 1.
[Figure: the flat prior fY(y) = 1 and the posterior fY(y | 1, 1) = 3y², which rises to 3 at y = 1.]
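The same prior-to-posterior update can be reproduced on a grid (a minimal sketch, not from the slides; assumes numpy; the exact posterior is 3y², a Beta(3, 1) pdf):

```python
import numpy as np

y = np.linspace(0, 1, 100_001)
dy = y[1] - y[0]
prior = np.ones_like(y)   # f_Y(y) = 1 on (0, 1)
like = y**2               # p12(1, 1 | y) = y^2, since X1 = X2 = 1

marginal = np.sum(like * prior) * dy      # p12(1, 1) ~ 1/3
posterior = like * prior / marginal       # ~ 3*y**2
print(marginal, posterior[-1])            # ~0.3333, ~3.0
```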
Sect. 5.2 Expected Values
(p. 197) For discrete RVs X, Y with joint pmf p, the expected value of h(X, Y) is
  $E[h(X, Y)] = \sum_x \sum_y h(x, y)\, p(x, y)$, if finite.
For continuous RVs X, Y with joint pdf f, the expected value of h(X, Y) is
  $E[h(X, Y)] = \int\!\!\int h(x, y)\, f(x, y)\, dx\, dy$, if finite.
• Ex) X and Y have pdf f(x, y) = x + y, 0 < x < 1, 0 < y < 1, and 0 elsewhere. Find E(XY²).
  $E(XY^2) = \int_0^1 \int_0^1 x y^2 (x + y)\, dx\, dy = 17/72$
(Check my integration.)
• Extensions, by example:
  $E[h(X, Y, Z)] = \iiint h(x, y, z)\, f(x, y, z)\, dx\, dy\, dz$
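Taking up the slide's "check my integration" invitation, here is a symbolic check (a minimal sketch, assuming sympy):

```python
import sympy as sp

x, y = sp.symbols('x y')
# E[X*Y^2] with f(x, y) = x + y on the unit square
E = sp.integrate(x * y**2 * (x + y), (x, 0, 1), (y, 0, 1))
print(E)  # 17/72
```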
Important Result:
If X1 and X2 are independent, then
  $E[h_1(X_1)\, h_2(X_2)] = \int\!\!\int h_1(x_1)\, h_2(x_2)\, f(x_1, x_2)\, dx_1\, dx_2$
  $= \int\!\!\int h_1(x_1)\, h_2(x_2)\, f_1(x_1)\, f_2(x_2)\, dx_1\, dx_2$   (by indep.)
  $= \int h_1(x_1)\, f_1(x_1)\, dx_1 \cdot \int h_2(x_2)\, f_2(x_2)\, dx_2$
  $= E[h_1(X_1)]\, E[h_2(X_2)]$
Ex: X and Y have pdf
  $f(x, y) = (1/8)\, x\, e^{-0.50(x + y)}, \quad x > 0,\ y > 0$
and 0 elsewhere. Find E(Y/X).
• Note that f(x, y) "factors" (and the ranges on x and y are OK). Hence X and Y are independent, with
  $f(x, y) = [(.25)\, x\, e^{-0.50 x}] \cdot [(.50)\, e^{-0.50 y}] = f_X(x)\, f_Y(y), \quad x > 0,\ y > 0$
Remark: If $f(x, y) = c\, x\, e^{-0.50(x + y)}$ for $0 < x < y$,
then X and Y are dependent (the range couples x and y).
E(Y/X) = E(Y) E(1/X)
  $= \int_0^\infty y\, (.50)\, e^{-0.50 y}\, dy \cdot \int_0^\infty x^{-1} (.25)\, x\, e^{-0.50 x}\, dx = (2)(.50) = 1$
since
• Y ~ Exp(λ = .5), and the mean of an exponential is 1/λ = 2
• X has a Gamma pdf, but
  $\int_0^\infty x^{-1} (.25)\, x\, e^{-0.50 x}\, dx = \int_0^\infty (.25)\, e^{-0.50 x}\, dx = (.25)/(.50) = .50$
Fact: $\mathrm{Cov}(X, Y) = E(XY) - \mu_X \mu_Y$
Point: Cov(X, Y) measures dependence between X & Y.
• If X and Y are independent, then Cov(X, Y) = 0.
  Why: $E(XY) = E(X)\, E(Y) = \mu_X \mu_Y$
  (indep. implies E(product) = product of E's).
• But Cov(X, Y) = 0 does not imply independence.
Intuition
We observe and graph many pairs of (X, Y).
[Figure: scatterplots of (X, Y) pairs with E(Y) marked, illustrating cov(X, Y) < 0, cov(X, Y) = 0 (& indep.), and cov(X, Y) = 0 (not indep.).]
2. Correlation between X and Y = Corr(X, Y) = ρXY:
  $\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\, \sigma_Y}$
Theorem: $-1 \le \rho_{XY} \le 1$
Point: ρXY measures dependence between X & Y in a fashion that does not depend on the units of measurement.
• The sign of ρXY indicates the direction of the relationship.
• The magnitude |ρXY| indicates the strength of the linear relationship between X and Y.
[Figure: scatterplots with ρ = .5 and ρ = .9.]
Results for Linear Combinations of RVs
1. Recall: $V(aX + b) = a^2 \sigma_X^2$, so $\mathrm{S.D.}(aX + b) = |a|\, \sigma_X$.
2. Extensions:
  $\mathrm{Cov}(aX + b,\ cY + d) = ac\, \mathrm{Cov}(X, Y)$
  Hence $\mathrm{Corr}(aX + b,\ cY + d) = \mathrm{Corr}(X, Y)$ when ac > 0 (and = -Corr(X, Y) when ac < 0).
  $V(X \pm Y) = V(X) + V(Y) \pm 2\, \mathrm{Cov}(X, Y)$
  So, if X & Y are indep., $V(X \pm Y) = V(X) + V(Y)$.
[ Y = 0.50 (X1 - X2), with μ1 = 80.95 cm, σ1 = .03 cm; μ2 = 80.85 cm, σ2 = .02 cm ]
1. μY = E[.50 (X1 - X2)] = .50 [μ1 - μ2] = .05 cm
2. S.D.: find V(Y), then take the square root (there's no general shortcut):
   V(Y) = V[.50 (X1 - X2)]
        = (.5)² V(X1 - X2)
        = .25 [(.03)² + (.02)²] = 3.25 × 10⁻⁴
   so σY ≈ .018 cm (not .5 (.03 + .02) = .025).
Ex), Cont'd
If Y is too small, the piston can't move freely; if Y is too big, the piston isn't tight enough to combust efficiently. Designers say a piston-cylinder pair will work if .01 < Y < .09.
Assuming Y has a normal dist., find P(.01 < Y < .09).
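Under the normality assumption, this is a two-line computation (a minimal sketch, assuming scipy; the ≈ .974 value is my calculation, not stated on the slides):

```python
from scipy.stats import norm

mu, sigma = 0.05, 0.018
p = norm.cdf(0.09, mu, sigma) - norm.cdf(0.01, mu, sigma)
print(p)  # ~0.974, i.e., P(|Z| < 0.04/0.018) = P(|Z| < 2.22)
```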
FYI (not HW or Exams)
If the normality assumption can't be claimed, you can get a bound:
• P(.01 < Y < .09) = P(.01 - .05 < Y - .05 < .09 - .05)
                   = P(|Y - .05| < .04)
                   = P(|Y - μY| < .04)
• Chebyshev's Inequality: for any constant k > 0,
  $P(|X - \mu| < k\sigma) \ge 1 - k^{-2}$
• Set .04 = kσY, so $k^{-2} = (.018 / .04)^2 = .2025$
• Conclusion:
  P(.01 < Y < .09) = P(|Y - .05| < .04) ≥ .797
no matter what the dist. of Y is!
FYI (not for HW or Exam): Conditional Expectation
Recall: X, Y have pdf f(x, y). Then fX|Y(x | y) = f(x, y)/fY(y) and
  $P(X \in A \mid Y = y) = \int_A f_{X|Y}(x \mid y)\, dx$
(p. 156) Conditional expectation of h(X) given Y = y:
Discrete: $E[h(X) \mid y] = \sum_x h(x)\, p_{X|Y}(x \mid y)$
Contin.: $E[h(X) \mid y] = \int_{-\infty}^{\infty} h(x)\, f_{X|Y}(x \mid y)\, dx$
if they exist.
Special cases (continuous case; discrete is similar)
1. Conditional mean of X given Y = y:
  $\mu_{X|y} = \int x\, f_{X|Y}(x \mid y)\, dx$
2. Conditional variance of X given Y = y:
  $\sigma^2_{X|y} = E[(X - \mu_{X|y})^2 \mid y] = E(X^2 \mid y) - \mu^2_{X|y}$
X and Y have pdf
  $f(x, y) = \lambda^2 e^{-\lambda y}, \quad 0 < x < y$
Find $\sigma^2_{X|y}$.
1. Need $f_{X|Y}(x \mid y) = f(x, y) / f_Y(y)$:
  $f_Y(y) = \int_0^y \lambda^2 e^{-\lambda y}\, dx = \lambda^2 e^{-\lambda y}\, y, \quad y > 0$
so
  $f_{X|Y}(x \mid y) = \frac{\lambda^2 e^{-\lambda y}}{y\, \lambda^2 e^{-\lambda y}} = 1/y, \quad 0 < x < y$
(i.e., given Y = y, X is Uniform(0, y)).
2. $E(X \mid y) = \int_0^y x\, (1/y)\, dx = y/2$ and $E(X^2 \mid y) = \int_0^y x^2 (1/y)\, dx = y^2/3$
3. So $\sigma^2_{X|y} = y^2/3 - (y/2)^2 = y^2/12$
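A symbolic check of these conditional moments (a minimal sketch, assuming sympy):

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f_cond = 1 / y  # f_{X|Y}(x | y) on 0 < x < y

EX = sp.integrate(x * f_cond, (x, 0, y))        # y/2
EX2 = sp.integrate(x**2 * f_cond, (x, 0, y))    # y**2/3
print(sp.simplify(EX2 - EX**2))                 # y**2/12
```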
Sec 5.3 - 5.4
Last material for this course. Lead-in to
• statistical inference: drawing conclusions about a population based on sample data
• stating our inferences and judging their value, based on probability
Key Definitions and Notions
A. Target of Statistical Inference:
1. Population: Collection of units or objects of
interest.
2. Pop Random Variable (RV): Numerical value X
associated with each unit.
3. Pop Dist.: Dist. of X over the pop.
4. Parameters: Numerical summaries of pop.
(mean, variance, proportion, …)
B. Inputs to Statistical Inference
1. Sample: Subset of the population
2. Sample Data: X1, X2, …, Xn for the n units in the sample
3. Statistic: Function of the data
Main ex) the "sample mean" $\bar{X} = \sum_{i=1}^n X_i \big/ n$
[Diagram: Sample 1's X's give Stat 1; Sample 2's X's give Stat 2; …; Population → Sampling Dist. of the Stat.]
Key Point: Sampling Design
• The form of the sampling dist. depends on how we pick samples.
• In most cases, we want samples to be representative of pop. (i.e., not biased or special in some way).
If X1, X2, …, Xn are independent and identically distributed (i.i.d.), each having the pop. dist., they form a random sample from the population.
• Finding the sampling dist.: (1) simulation and (2) prob. theory results.
Remark
Alternative notion of representative samples:
Simple random sample (SRS):
Sample of n units chosen in a way that all samples
of n units have equal chance of being chosen
• Sampling without replacement: observations are
dependent.
• When sampling from huge populations, SRS are
approximately Random Samples
Sampling dist. via grunt work
X = the result of the roll of a fair die:
  x:     1    2    3    4    5    6
  p(x): 1/6  1/6  1/6  1/6  1/6  1/6
Let X1, X2 be the results of 2 indep. rolls. Dist. of $\bar{X}$:
  x̄:      1    1.5    2    2.5    3    3.5    4    4.5    5    5.5    6
  p(x̄): 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Values of x̄ = (x1 + x2)/2 for each outcome:
         x2=1  x2=2  x2=3  x2=4  x2=5  x2=6
  x1=1    1    1.5    2    2.5    3    3.5
  x1=2   1.5    2    2.5    3    3.5    4
  x1=3    2    2.5    3    3.5    4    4.5
  x1=4   2.5    3    3.5    4    4.5    5
  x1=5    3    3.5    4    4.5    5    5.5
  x1=6   3.5    4    4.5    5    5.5    6
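The same "grunt work" can be delegated to a few lines of Python (a minimal sketch, not from the slides), enumerating all 36 equally likely outcomes:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Tabulate the exact distribution of the sample mean of two fair dice.
pmf = Counter()
for x1, x2 in product(range(1, 7), repeat=2):
    pmf[Fraction(x1 + x2, 2)] += Fraction(1, 36)

for xbar in sorted(pmf):
    print(float(xbar), pmf[xbar])  # 1.0: 1/36, 1.5: 2/36, ..., 3.5: 6/36, ...
```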
Simulation: Fundamental in modern applications
Math/Stat has developed numerical methods to "generate" realizations of RVs from a specified dist.
• Select a dist. and a statistic
• Simulate many random samples
• Compute the statistic for each
• Examine the dist. of the simulated statistics: histograms, estimates of their density, etc.
Simple Ex)
• Pop. dist: X ~ N(10, 4). Statistic: $\bar{X}$.
• Draw k = 1000 random samples (in practice, use a much larger k); compute the means and make histograms for four different sample sizes.
[Figures: histograms of X̄ for n = 5, 10, 20, 30 (and n = 10 with k = 50,000), for normal, Weibull, and lognormal population distributions.]
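A sketch of the kind of simulation behind those histograms (not the original code; assumes numpy and matplotlib, and recall N(10, 4) means variance 4, i.e., sd 2):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
k = 1000  # number of simulated samples (use a much larger k in practice)

fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax, n in zip(axes, [5, 10, 20, 30]):
    # k samples of size n from N(10, sd=2); one sample mean per row
    means = rng.normal(10, 2, size=(k, n)).mean(axis=1)
    ax.hist(means, bins=30)
    ax.set_title(f"n = {n}")
plt.show()
```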
Last Time
1. Population: Collection of objects of interest.
2. Population RV: X (value for each object in pop.)
3. Population Dist.: Dist. of RV X over the pop.
4. Parameters: Numerical summaries of pop. (e.g., μ, σ²)
5. Sample: Subset of the population
6. Data: X1, X2, …, Xn for the objects in the sample
7. Statistic: Function of the data
Key: Sampling variability: different samples give different values of a statistic. Statistics are RVs.
8. Sampling distribution: Distribution of a statistic.
Key: The distribution of a statistic depends on how the sample is taken.
Sampling Design
In most cases, we want samples to be representative
of pop. (i.e., not biased or special in some way).
Remark
Alternative notion of representative samples:
Simple random sample (SRS):
Sample of n units chosen in a way that all samples
of n units have equal chance of being chosen
• Sampling without replacement: observations are
dependent.
• When sampling from huge populations, SRS are
approximately Random Samples
Main Statistics (Sect. 5.4; pp. 229-230)
1. Sample mean, or "X-bar":
  $\bar{X} = \frac{\sum_{i=1}^n X_i}{n}$
2. Sample variance, or "S-squared":
  $S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n - 1}$
Sect. 5.4: Dist. of X-bar
Proposition (p. 213):
If X1, X2, …, Xn is an (iid) random sample from a population distribution with mean μ and variance σ², then
1. $\mu_{\bar{X}} \stackrel{\text{def}}{=} E(\bar{X}) = \mu$
2. $\sigma^2_{\bar{X}} \stackrel{\text{def}}{=} V(\bar{X}) = \frac{\sigma^2}{n}$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
Proof:
1. $E(\bar{X}) = E\left( \sum_{i=1}^n X_i \big/ n \right) = \frac{1}{n} \sum_{i=1}^n E(X_i) = \frac{n\mu}{n} = \mu$
(Constants come outside the expectation, and E(sum) = sum of E's.)
2. $V(\bar{X}) = V\left( \sum_{i=1}^n X_i \big/ n \right) = \frac{1}{n^2} \sum_{i=1}^n V(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$
(Constants come out of V squared, and for indep. RVs, V(sum) = sum of V's.)
Remarks:
1. In statistics, we typically use X̄ to estimate μ and S² to estimate σ².
When E(estimator) = target, we say the estimator is unbiased.
(Note: S is not unbiased for σ.)
2. Independence of the Xi's is only needed for the variance result.
Large-sample (large n) properties of X-bar
Assume X1, X2, …, Xn are a random sample (iid) from a population with mean μ and variance σ². (Normality is not assumed.)
1. Law of Large Numbers
Recall that $\mu_{\bar{X}} = \mu$, and note that
  $\sigma^2_{\bar{X}} = \frac{\sigma^2}{n} \to 0$ as $n \to \infty$.
We can prove that, with probability 1,
  $\bar{X} \to \mu$ as $n \to \infty$.
2. Central Limit Theorem (CLT) (p. 215)
Under the above assumptions (iid random sample),
  $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \xrightarrow{D} N(0, 1)$
("converges in dist. to", or "has limiting dist.")
Last Time:
Dist. of "X-bar": $\bar{X} = \sum_{i=1}^n X_i \big/ n$
Three Main Results. Assumption common to all:
X1, X2, …, Xn is an (iid) random sample from a pop. dist. with mean μ and variance σ².
1. $\mu_{\bar{X}} = \mu$
2. $\sigma^2_{\bar{X}} = \sigma^2 / n$
3. If n is large,
  $\bar{X} \sim N(\mu,\ \sigma^2/n)$, approximately.
Restate for the sum $T_0 = \sum_{i=1}^n X_i$:
1. $\mu_{T_0} = n\mu$
2. $\sigma^2_{T_0} = n\sigma^2$
Ex) Estimate the average height of men in some population. Assume σ_pop = 2.5 in. We will collect an iid sample of 100 men. Find the prob. that their sample mean will be within .5 in. of the pop. mean.
• Let μ be the population mean. Find $P(|\bar{X} - \mu| \le .5)$.
• Applying the CLT,
  $\bar{X} \sim N\!\left(\mu,\ \sigma^2_{\bar{X}} = \frac{2.5^2}{100} = \frac{2.5^2}{10^2}\right)$ approximately,
so $\sigma_{\bar{X}} = .25$ and
  $P(|\bar{X} - \mu| \le .5) = P(|Z| \le .5/.25) = P(|Z| \le 2) \approx .95$
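A quick check of the two numbers used here (a minimal sketch, assuming scipy):

```python
from scipy.stats import norm

sd_xbar = 2.5 / 100 ** 0.5          # sigma / sqrt(n) = 0.25
p = norm.cdf(2) - norm.cdf(-2)      # P(|Z| <= .5 / .25) = P(|Z| <= 2)
print(sd_xbar, p)                   # 0.25, ~0.9545 (rounded to .95 above)
```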
Ex) Application for "sums": the Binomial dist.
X ~ Bin(n, p) is a sum of n iid Bernoulli(p) RVs, and
1. $\mu_X = np$
2. $\sigma^2_X = np(1 - p)$
3. CLT: For n large,
  $X \sim N(np,\ np(1 - p))$ approximately.
Remark: This form of the CLT doesn't include the "continuity correction".
Ex) Continued. In practice, we may not know p; for the sample proportion,
  $\hat{p} \sim N\!\left(p,\ \frac{p(1 - p)}{n}\right)$ approximately.
Ex) Service times for checkout at a store are indep., average 1.5 min., and have variance 1.0 min². Find the prob. that 100 customers are served in less than 2 hours.
• Let Xi = service time of the ith customer.
• Service time for 100 customers: $T = \sum_{i=1}^{100} X_i$
• Applying the CLT,
  $T \sim N(n\mu = 100(1.5) = 150,\ n\sigma^2 = 100(1) = 100)$ approximately.
• So,
  $P(T \le 120) = P\!\left( \frac{T - 150}{10} \le \frac{120 - 150}{10} \right) \approx P(Z \le -3) = .0013$
Ex) Common Class of Applications
• In many cases, we know the distribution of a sum of independent RVs, but if that dist. is complicated, we may still want to use CLT approximations.
• Example:
1) Theorem. If X1 ~ Poi(λ1), …, Xk ~ Poi(λk) are indep. Poisson RVs, then
  $T = \sum_{i=1}^k X_i \sim \mathrm{Poi}\!\left( \lambda_T = \sum_{i=1}^k \lambda_i \right)$
Choosing n so that the margin of error $z_{\alpha/2}\, \sigma/\sqrt{n}$ is at most M:
α = .05: $1.96\, \sigma/\sqrt{n} \le M$, so we need $n \ge (1.96\, \sigma / M)^2$
α = .01: $2.576\, \sigma/\sqrt{n} \le M$, so we need $n \ge (2.576\, \sigma / M)^2$
Notes
1. Of course, round n up to an integer.
2. The procedure requires a guess at σ.
3. The analysis is valid for all n if the population is normal.
4. Otherwise, if the answer turns out to be small, the CLT does not apply, so the analysis failed.
To estimate E[h(X)] by simulation, compute
  $\bar{h} = \sum_{i=1}^n h(X_i) \big/ n$
Most use
  $s_h^2 = \sum_{i=1}^n \left( h(X_i) - \bar{h} \right)^2 \big/ (n - 1)$
By the CLT,
  $\bar{h} \sim N\!\left( E[h(X)],\ \sigma_h^2 / n \right)$ approximately.
Proceed as above.
3. Numerical integration. Estimate the integral
  $I = \int_0^1 h(x)\, dx$
Note that $I = E[h(X)]$ where X ~ Uniform(0, 1).
i. Simulate X1, X2, …, Xn iid Uniform(0, 1)
ii. Compute h(X1), h(X2), …, h(Xn) and find
  $\hat{I} = \bar{h} = \sum_{i=1}^n h(X_i) \big/ n$
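A minimal Monte Carlo integration sketch (not from the slides; the test function h(x) = eˣ is my choice, so the true answer e - 1 is known):

```python
import numpy as np

rng = np.random.default_rng(0)

def h(x):
    # any integrand on (0, 1); this one has a known integral, e - 1
    return np.exp(x)

n = 100_000
xs = rng.uniform(0, 1, n)     # step i: X_1, ..., X_n iid Uniform(0, 1)
hs = h(xs)                    # step ii: h(X_1), ..., h(X_n)
I_hat = hs.mean()             # estimate of I = E[h(X)]
se = hs.std(ddof=1) / n**0.5  # standard error, using s_h as above
print(I_hat, se)              # ~1.71828 (= e - 1), with a small SE
```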