# MIT 14.32, Spring 2009
J. Angrist
Preliminaries
Ultimately, we’re interested in measuring causal relationships. Alas, we have to pay some prob & stats dues
before we learn how. But causality is a big and deep concept, so we should start thinking about it now.
We make sense of causal relationships using potential outcomes. These capture “what ifs” about the world.
For example,
$Y_{1i}$ = my health if I go to the hospital
$Y_{0i}$ = my health if I stay away
(Here, we’re using an explicit notation for potential outcomes. Sometimes we’ll keep this in the
background.)
My friend Mike, who runs emergency medicine at Hartford Hospital, describes the causal effect of
hospitalization like this:
“People come to the ER and they want to be admitted. They figure they’ll just get admitted to the hospital
and we’ll take over and make them better. They don’t realize that the hospital can be a pretty dangerous
place. Unless you’re really sick, you’re really better off going home.”
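How does $Y_{1i}$ compare with $Y_{0i}$? We can never know for sure, so we try to look at expectations or averages. Letting $D_i = 1$ if person $i$ is hospitalized and $D_i = 0$ otherwise:
$$E[Y_{1i} - Y_{0i} \mid D_i = 1] = E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 1]$$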
(In general, “$E[Y_i \mid X_i]$” means the “population average of a random variable, $Y_i$, holding the random variable
$X_i$ fixed.”)
The table on page 13 of MHE shows some British data on hospitalization and health. Taken at face value,
these data suggest Mike is right. The problem with a causal interpretation of this table is selection bias. Let
$y_i$ denote the observed outcome (say, an index of health), so $y_i = Y_{1i}$ if $D_i = 1$ and $y_i = Y_{0i}$ if $D_i = 0$. Adding and subtracting $E[Y_{0i} \mid D_i = 1]$, we have:

$$E[y_i \mid D_i = 1] - E[y_i \mid D_i = 0] = E[Y_{1i} - Y_{0i} \mid D_i = 1] + \{E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]\}$$
= the average causal effect on the hospitalized + selection bias
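Hospitalization may help you or hurt you (on average), but as a rule, it’s the sick who seek treatment, so selection bias here is likely negative: the hospitalized would have been less healthy than the non-hospitalized even without treatment.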
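The decomposition is an identity, so it can be checked in a simulation where, unlike in real data, both potential outcomes are visible. The sketch below is hypothetical throughout: it assumes $Y_{0i}$ is standard normal, that hospitalization raises health by a constant 0.5, and that people with $Y_{0i} < -0.5$ (the sick) are the ones who go to the hospital. None of these numbers comes from MHE.

```python
import random

random.seed(1)

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical population: in a simulation we can see both potential outcomes.
units = []
for _ in range(10_000):
    y0 = random.gauss(0.0, 1.0)   # health if not hospitalized (Y_0i)
    y1 = y0 + 0.5                 # assumed constant causal effect of +0.5
    d = 1 if y0 < -0.5 else 0     # assumption: the sick (low Y_0i) seek treatment
    units.append((y0, y1, d))

treated = [(y0, y1) for y0, y1, d in units if d == 1]
control = [(y0, y1) for y0, y1, d in units if d == 0]

# Observed outcome: y_i = Y_1i for the hospitalized, Y_0i for everyone else
observed_diff = mean([y1 for _, y1 in treated]) - mean([y0 for y0, _ in control])
effect_on_treated = mean([y1 - y0 for y0, y1 in treated])
selection_bias = mean([y0 for y0, _ in treated]) - mean([y0 for y0, _ in control])

print(f"observed difference: {observed_diff:.3f}")
print(f"effect on treated:   {effect_on_treated:.3f}")
print(f"selection bias:      {selection_bias:.3f}")
```

In this made-up population the observed difference comes out negative even though hospitalization helps everyone, because selection bias is large and negative.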
Random assignment of $D_i$ fixes selection bias because $D_i$ and $Y_{0i}$ are then independent, so $E[Y_{0i} \mid D_i = 1] = E[Y_{0i} \mid D_i = 0]$. Experiments are
therefore said to have “internal validity,” though they may not have “external validity,” which is predictive
value for a time or context other than the one evaluated.
You can’t randomize everything, of course (perhaps not hospitalization for routine medical complaints), but
you can try to use the data you have (or collect more) in an effort to come close to the desired experiment.
The details of how this is done are what most of econometrics is about.
Lecture Note 1
Probability and Distribution
Reading: Wooldridge Appendices A and B
A. Probability
“A system for quantifying chance and making predictions about future events”
Concepts
Sample space: $S = \{a_1, a_2, a_3, \ldots, a_J\}$, “the basic elements of the experiment”
example: toss two coins (J=4)
(to make this interesting, we could place bets)
Random variable: X(a) “the data”
A function that assigns a numerical value to each element (outcome) of the sample space
example: number of heads in two coin tosses
Probability: a function defined over events or random variables.
When defined over events, probability satisfies the axioms:
$0 \le P(A) \le 1$
$P(S) = 1$
$P\left(\bigcup_j A_j\right) = \sum_j P(A_j)$ for disjoint events $A_j$
and has the properties:
$P(\emptyset) = 0$
$P(A^c) = 1 - P(A)$
$A \subset B \Rightarrow P(A) \le P(B)$
$P(A \cup B) = P(A) + P(B) - P(AB)$
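The axioms and properties can be checked directly on the two-coin example by treating events as sets of outcomes. This sketch assumes a fair coin, so each of the four outcomes has probability 1/4; the events `A`, `B`, and `C` are illustrative choices, and Python's set methods stand in for union, intersection, and complement.

```python
from fractions import Fraction
from itertools import product

S = frozenset(product("HT", repeat=2))   # two tosses: HH, HT, TH, TT

def P(event):
    """Probability of an event (a subset of S) when all outcomes are equally likely."""
    return Fraction(len(event.intersection(S)), len(S))

A = frozenset(a for a in S if a[0] == "H")   # first toss is heads
B = frozenset(a for a in S if "H" in a)      # at least one head
C = frozenset({("T", "T")})                  # no heads (disjoint from A)
empty = frozenset()

# Axioms
assert 0 <= P(A) <= 1
assert P(S) == 1
assert P(A.union(C)) == P(A) + P(C)
# Properties
assert P(empty) == 0
assert P(S.difference(A)) == 1 - P(A)                            # complement
assert A.issubset(B) and P(A) <= P(B)                            # monotonicity
assert P(A.union(B)) == P(A) + P(B) - P(A.intersection(B))       # inclusion-exclusion
print("all checks pass:", P(A), P(B), P(C))
```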
When we write $P(x)$ for a discrete r.v., this is shorthand for P(the union of all events $a_j$ such that $X(a_j) = x$).
For a continuous r.v., we write $P(X \le x)$ to mean P(the union of all events $a_j$ such that $X(a_j) \le x$).
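For the two-coin example, the shorthand can be spelled out literally: $P(X = x)$ adds up the probabilities of all outcomes $a_j$ with $X(a_j) = x$, and $P(X \le x)$ does the same for $X(a_j) \le x$. A small sketch, assuming a fair coin and taking $X$ to be the number of heads:

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=2))            # sample space for two tosses (J = 4)
prob = {a: Fraction(1, 4) for a in S}        # assumes a fair coin

def X(a):
    """Random variable: number of heads in outcome a."""
    return a.count("H")

# P(X = x) adds up the probabilities of all outcomes a_j with X(a_j) = x
pmf = defaultdict(Fraction)
for a in S:
    pmf[X(a)] += prob[a]

def cdf(x):
    """P(X <= x): add up the probabilities of outcomes a_j with X(a_j) <= x."""
    return sum((prob[a] for a in S if X(a) <= x), Fraction(0))

print(dict(pmf))
```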
But what is probability, really?
The relative frequency of an event in many (∞) repeated trials.
A personal and subjective assessment of the likelihood of an event, where the assessment obeys the
axioms of probability
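The relative-frequency view can be illustrated by simulation: repeat the two-coin experiment many times and track how often the event “at least one head” (probability 3/4 for a fair coin) occurs. A sketch using Python's `random` module with an arbitrary seed:

```python
import random

random.seed(0)

def at_least_one_head():
    """One trial: toss two fair coins; True if either lands heads."""
    first = random.random() < 0.5
    second = random.random() < 0.5
    return first or second

freqs = {}
for trials in (100, 10_000, 100_000):
    freqs[trials] = sum(at_least_one_head() for _ in range(trials)) / trials
    print(trials, freqs[trials])   # relative frequency settles near 0.75
```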
Probability (cont.)
Conditional probability: $P(A \mid B) \equiv P(AB)/P(B)$, defined for $P(B) > 0$
Conditional probability has the properties of, and obeys the axioms of, probability
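Bayes’ Rule: Let the set $\{C_i;\ i = 1, \ldots, I\}$ be a partition of the sample space. Then:
$$P(C_i \mid A) = \frac{P(A \mid C_i)\,P(C_i)}{\sum_{k=1}^{I} P(A \mid C_k)\,P(C_k)}$$
Proof: use $P(C_i \mid A) = P(A \mid C_i)P(C_i)/P(A)$ and the fact that $\{C_k;\ k = 1, \ldots, I\}$ is a partition, so that $P(A) = \sum_{k=1}^{I} P(AC_k) = \sum_{k=1}^{I} P(A \mid C_k)P(C_k)$.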
Bayes’ rule is useful for reversing conditional probability statements.
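A small numerical sketch of Bayes’ rule with a two-set partition. The numbers are hypothetical, chosen only to echo the hospital example: 10% of people are sick, the sick go to the ER with probability 0.8, and the healthy with probability 0.1. The rule reverses $P(\text{ER} \mid \text{sick})$ into $P(\text{sick} \mid \text{ER})$.

```python
from fractions import Fraction

def bayes(prior, likelihood):
    """Posterior P(C_i | A) from priors P(C_i) and likelihoods P(A | C_i) over a partition."""
    p_a = sum(prior[c] * likelihood[c] for c in prior)   # P(A) by total probability
    return {c: prior[c] * likelihood[c] / p_a for c in prior}

# Hypothetical numbers: A = "goes to the ER", partition = {sick, healthy}
prior = {"sick": Fraction(1, 10), "healthy": Fraction(9, 10)}
p_er = {"sick": Fraction(8, 10), "healthy": Fraction(1, 10)}

posterior = bayes(prior, p_er)
print(posterior)
```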
Independence: A is said to be independent of B iff $P(AB) = P(A)P(B)$
Sometimes we write: $A \perp B$
Note: for $P(B) > 0$, $A \perp B \Leftrightarrow P(A \mid B) = P(A)$
Note: r.v.s are independent if their distribution or density functions factor (more below)
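A quick check of the definition on the two-coin example (fair coin assumed): the two tosses are independent, but “first toss is heads” and “at least one head” are not, since the first event guarantees the second.

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=2))  # two fair coin tosses, equally likely outcomes

def P(event):
    """Probability of the event described by the predicate `event`."""
    return Fraction(sum(1 for a in S if event(a)), len(S))

def first_heads(a):
    return a[0] == "H"

def second_heads(a):
    return a[1] == "H"

def any_heads(a):
    return "H" in a

def independent(e1, e2):
    """Check the definition P(AB) = P(A)P(B)."""
    return P(lambda a: e1(a) and e2(a)) == P(e1) * P(e2)

print(independent(first_heads, second_heads))  # separate tosses
print(independent(first_heads, any_heads))     # first event implies the second
```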
B. Distribution and density functions (how we characterize r.v.s)
For the rest of the course, our probability statements will apply directly to r.v.s
1. Discrete random variables
Empirical distribution functions
Example: years of schooling
Probability mass function (pmf)
Parametric examples: Bernoulli, binomial, multinomial, geometric
Cumulative distribution functions (cdf) -- discrete r.v.
Obtain by summation: $F(x) = P(X \le x) = \sum_{x_j \le x} f(x_j)$
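As an illustration of obtaining a discrete cdf by summation, here is a sketch for the binomial, one of the parametric examples listed: the pmf is $\binom{n}{k}p^k(1-p)^{n-k}$ and $F(x)$ sums it over support points $k \le x$. Exact fractions are used so the results can be compared exactly.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(x, n, p):
    """F(x) = P(X <= x): sum the pmf over support points k <= x."""
    return sum(binomial_pmf(k, n, p) for k in range(n + 1) if k <= x)

half = Fraction(1, 2)
print([binomial_pmf(k, 2, half) for k in range(3)])
print([binomial_cdf(x, 2, half) for x in (-1, 0, 1, 1.5, 2)])
```

The cdf is a step function: it is flat between support points, so $F(1.5) = F(1)$.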
2. Continuous random variables
Probability density functions (pdf); note: $P(X = x) = 0$
Parametric examples: uniform, exponential, normal
Cumulative distribution functions -- continuous r.v.
Obtain by integration
$P(X \le c) = F(c) = \int_{-\infty}^{c} f(t)\,dt$
$P(a \le X \le b) = \int_{a}^{b} f(t)\,dt = F(b) - F(a)$
$P(X > c) = 1 - F(c)$
Relationship between cdf and pdf:
$F'(x) = f(x)$
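The relations between $f$ and $F$ can be checked numerically. This sketch assumes an exponential distribution with rate 2 (an arbitrary choice), integrates the pdf with the trapezoid rule, and compares a central-difference derivative of $F$ with $f$.

```python
import math

LAM = 2.0  # rate parameter, chosen only for illustration

def f(t):
    """Exponential pdf with rate LAM (zero for negative t)."""
    return LAM * math.exp(-LAM * t) if t >= 0 else 0.0

def F(c):
    """Closed-form exponential cdf."""
    return 1.0 - math.exp(-LAM * c) if c >= 0 else 0.0

def integrate(func, a, b, n=10_000):
    """Trapezoid-rule approximation to the integral of func over [a, b]."""
    h = (b - a) / n
    inner = sum(func(a + i * h) for i in range(1, n))
    return h * (0.5 * func(a) + inner + 0.5 * func(b))

a, b = 0.5, 1.5
prob_ab = integrate(f, a, b)                   # P(a <= X <= b) from the pdf
tail = integrate(f, 1.0, 40.0)                 # P(X > 1), truncating the far tail
x, eps = 0.7, 1e-6
slope = (F(x + eps) - F(x - eps)) / (2 * eps)  # numerical F'(x)

print(prob_ab, F(b) - F(a))
print(tail, 1 - F(1.0))
print(slope, f(x))
```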
3. Functions of random variables
Mantra: “A function of a random variable is a random variable and therefore has a distribution”
Discrete r.v.
$Y = r(X)$; $P(X = x_j) = f(x_j)$; then
$$g(y) = P[r(X) = y] = \sum_{\{x:\, r(x) = y\}} f(x)$$
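A sketch of the discrete formula with a made-up example: $X$ uniform on $\{-2, -1, 0, 1, 2\}$ and $Y = r(X) = X^2$. Values of $x$ that map to the same $y$ have their probabilities pooled.

```python
from collections import defaultdict
from fractions import Fraction

f = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}  # pmf of X (hypothetical)

def r(x):
    return x * x

g = defaultdict(Fraction)
for x, px in f.items():
    g[r(x)] += px          # g(y) pools f(x) over all x with r(x) = y

print(dict(g))
```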
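Continuous r.v.
Examples: (i) $Y = \ln X$; $X > 0$ with cdf $F$
$G(y) = P(Y \le y) = P(\ln X \le y) = P(X \le e^{y}) = F(e^{y})$
$g(y) = f(e^{y})\,e^{y}$
(ii) $Y = 1/X$; $X > 0$ (so $y > 0$)
$G(y) = P(Y \le y) = P(1/X \le y) = P(X \ge 1/y) = 1 - F(1/y)$
$g(y) = -f(1/y)(-y^{-2}) = f(1/y)/y^{2}$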
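Both examples can be sanity-checked by differentiating $G$ numerically and comparing with $g$. The sketch assumes $X \sim$ Exponential(1), a choice made only so that $F$ and $f$ have closed forms.

```python
import math

# Assumption for illustration: X ~ Exponential(1), so X > 0.
def F(x):
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def f(x):
    return math.exp(-x) if x > 0 else 0.0

# (i) Y = ln X: G(y) = F(e^y), g(y) = f(e^y) e^y
def G_log(y):
    return F(math.exp(y))

def g_log(y):
    return f(math.exp(y)) * math.exp(y)

# (ii) Y = 1/X, y > 0: G(y) = 1 - F(1/y), g(y) = -f(1/y)(-y^-2)
def G_recip(y):
    return 1.0 - F(1.0 / y)

def g_recip(y):
    return -f(1.0 / y) * (-(y ** -2))

def slope(G, y, eps=1e-6):
    """Central-difference approximation to G'(y)."""
    return (G(y + eps) - G(y - eps)) / (2 * eps)

for y in (0.5, 1.0, 2.0):
    print(y, slope(G_log, y), g_log(y), slope(G_recip, y), g_recip(y))
```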
In general: for $Y = r(X)$ and $X = r^{-1}(Y) \equiv s(Y)$, where $r(X)$ is continuous and invertible (and $s$ is differentiable):
if $r$ is increasing, $G(y) = F[s(y)]$ and $g(y) = f[s(y)]\,s'(y)$
if $r$ is decreasing, $G(y) = 1 - F[s(y)]$ and $g(y) = -f[s(y)]\,s'(y)$
In both cases, $g(y) = f[s(y)]\,|s'(y)|$.
Important special case: $Y = r(X) = a + bX$; $b > 0$
$X = (Y - a)/b = s(Y)$
$G(y) = F[(y - a)/b]$
$g(y) = f[(y - a)/b]\,(1/b)$
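For a concrete case of the linear formula, take $X \sim N(0, 1)$ (our choice for illustration); then $g(y) = f[(y - a)/b](1/b)$ should coincide with the $N(a, b^2)$ density. A sketch:

```python
import math

def std_normal_pdf(x):
    """Density of X ~ N(0, 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2)."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def linear_transform_pdf(f, a, b):
    """Density g of Y = a + bX (b > 0), given the density f of X."""
    if b <= 0:
        raise ValueError("this formula assumes b > 0")
    return lambda y: f((y - a) / b) * (1.0 / b)

g = linear_transform_pdf(std_normal_pdf, a=3.0, b=2.0)
for y in (-1.0, 3.0, 5.5):
    print(y, g(y), normal_pdf(y, 3.0, 2.0))   # Y = 3 + 2X should be N(3, 4)
```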
Standardize r.v. X by setting $a = -E(X)/\sigma_X$ and $b = 1/\sigma_X$, so that $Y = (X - E(X))/\sigma_X$ has mean 0 and standard deviation 1.
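In data, $E(X)$ and $\sigma_X$ are unknown, so a common practical version replaces them with the sample mean and standard deviation. This sketch does that for a small made-up data set, using the population (divide-by-$n$) standard deviation so the standardized values have standard deviation exactly 1 under that same formula.

```python
import statistics

def standardize(xs):
    """Apply Y = a + bX with a = -mean/sd and b = 1/sd, using sample moments for E(X) and sigma_X."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    a, b = -mu / sigma, 1.0 / sigma
    return [a + b * x for x in xs]

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical sample
zs = standardize(data)
print(zs)
```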
C. Bivariate distribution functions: how r.v.s move together
For discrete r.v.s: f(x,y) = P(X=x, Y=y)
For continuous r.v.s: f(x,y) is the joint density
Probability statements for jointly continuous r.v.s use the cdf:
$$F(x, y) = P(X \le x, Y \le y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(s, t)\,dt\,ds$$
Marginal distributions
Marginal for X: $f_1(x)$; obtain by integrating the joint density or summing the joint pmf over Y
Marginal for Y: $f_2(y)$; obtain by integrating the joint density or summing the joint pmf over X
Conditional distributions
Divide the joint density or pmf by the marginal density or pmf:
$f_2(y \mid x) = f(x, y)/f_1(x)$; $f_1(x \mid y) = f(x, y)/f_2(y)$
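A sketch of marginals and conditionals for a discrete case, using a hypothetical 2x2 joint pmf: marginals come from summing over the other variable, and each conditional pmf divides the joint by the relevant marginal.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf f(x, y) = P(X = x, Y = y) with X, Y in {0, 1}
joint = {
    (0, 0): Fraction(3, 10), (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10), (1, 1): Fraction(4, 10),
}

f1 = defaultdict(Fraction)  # marginal pmf of X (sum over y)
f2 = defaultdict(Fraction)  # marginal pmf of Y (sum over x)
for (x, y), p in joint.items():
    f1[x] += p
    f2[y] += p

def f2_given(y, x):
    """Conditional pmf f2(y | x) = f(x, y) / f1(x)."""
    return joint[(x, y)] / f1[x]

def f1_given(x, y):
    """Conditional pmf f1(x | y) = f(x, y) / f2(y)."""
    return joint[(x, y)] / f2[y]

print(dict(f1), dict(f2), f2_given(1, 1), f1_given(1, 1))
```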
Example:
Joint normal: marginal and conditional are also normal
$$f(x, y) = \left[(2\pi)^2 (1 - \rho^2)\right]^{-1/2} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[(x - \mu_x)^2 - 2\rho(x - \mu_x)(y - \mu_y) + (y - \mu_y)^2\right]\right\}$$
X and Y are normally distributed with means $\mu_x$ and $\mu_y$, standard deviation 1,
and correlation $\rho$
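The claims about the joint normal can be checked numerically: integrating the joint density over $y$ should give the $N(\mu_x, 1)$ density, and dividing by that marginal should give a normal density for $Y$ given $X = x$ with mean $\mu_y + \rho(x - \mu_x)$ and variance $1 - \rho^2$ (a standard result for unit variances). The parameter values in this sketch are arbitrary.

```python
import math

def bvn_pdf(x, y, mx, my, rho):
    """Joint normal density with unit variances, means mx, my and correlation rho."""
    q = (x - mx) ** 2 - 2 * rho * (x - mx) * (y - my) + (y - my) ** 2
    return math.exp(-q / (2 * (1 - rho ** 2))) / (2 * math.pi * math.sqrt(1 - rho ** 2))

def normal_pdf(v, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-(v - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def marginal_x(x, mx, my, rho, lo=-12.0, hi=12.0, n=4000):
    """f_1(x): integrate the joint density over y with the trapezoid rule."""
    h = (hi - lo) / n
    total = 0.5 * (bvn_pdf(x, lo, mx, my, rho) + bvn_pdf(x, hi, mx, my, rho))
    total += sum(bvn_pdf(x, lo + i * h, mx, my, rho) for i in range(1, n))
    return total * h

mx, my, rho = 1.0, -0.5, 0.6   # illustrative parameter values
x = 1.8
f1 = marginal_x(x, mx, my, rho)
print(f1, normal_pdf(x, mx, 1.0))   # marginal of X is N(mx, 1)

for y in (-1.0, 0.3):
    conditional = bvn_pdf(x, y, mx, my, rho) / f1
    print(conditional, normal_pdf(y, my + rho * (x - mx), 1 - rho ** 2))
```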
Example:
Roof distribution
$f(x, y) = x + y$ for $0 < x < 1$ and $0 < y < 1$ (and 0 elsewhere)
$f_1(x) = x + 1/2$
$f_2(y) = y + 1/2$
$f_2(y \mid x) = 2(x + y)/(2x + 1)$
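The roof calculations can be verified with midpoint-rule integration, which is exact for functions linear in the integration variable, so the numerical answers should match $x + 1/2$ and 1 up to rounding. The evaluation point $x = 0.3$ is arbitrary.

```python
def f(x, y):
    """Roof density: x + y on the unit square, 0 elsewhere."""
    return x + y if 0 < x < 1 and 0 < y < 1 else 0.0

def integrate(func, a, b, n=1000):
    """Midpoint-rule approximation to the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

x = 0.3
f1_x = integrate(lambda y: f(x, y), 0.0, 1.0)                          # compare with x + 1/2
cond_mass = integrate(lambda y: 2 * (x + y) / (2 * x + 1), 0.0, 1.0)   # f2(y|x) should integrate to 1
total = integrate(lambda s: integrate(lambda t: f(s, t), 0.0, 1.0, 200), 0.0, 1.0, 200)
print(f1_x, cond_mass, total)
```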
D. Example: the effect of a wage voucher (Burtless, 1985)
Simple conditional distributions for Bernoulli outcomes in a randomized trial