Week 1.
1 Some Examples
Statistical inference concerns learning from experience: we observe a random sample
x = (x1 , x2 , ..., xn ) and wish to infer properties of the distribution our data come
from. Probability theory goes in the opposite direction: from the distribution we
deduce the properties of a random sample x, and of statistics calculated from x.
Statistical inference as a mathematical science has been developed almost exclusively
in terms of probability theory.
Question: Is there an association between taking aspirin and having a heart attack
(for men)?
1 Source: Preliminary Report: Findings from the Aspirin Component of the Ongoing Physicians
Example 1.3.

                          Women                    Men
                     applied accepted  %    applied accepted  %
Computer Science          26        7  27       228       58  25
Economics                240       63  26       512      112  22
Engineering              164       52  32       972      252  26
Medicine                 416       99  24       578      140  24
Veterinary Medicine      338       53  16       180       22  12
Total                   1184      274  23      2470      584  24

Figure 1: The Simpson paradox for sex bias.
Example 1.4.

                     Treatment A             Treatment B
                treated success  %      treated success  %
Small stones         87      81  93         270     234  87
Large stones        263     192  73          80      55  69
Total               350     273  78         350     289  83

Figure 2: The Simpson paradox for kidney surgeries.
2 Data
Definition 2.1. (Data) A data set is a tuple (x_1, ..., x_n), x_i ∈ S, i = 1, ..., n. The set S (the sample space) is usually R or R^d, though it can be any abstract set; for us S will usually be R.
[Figure: relative-frequency histogram of student points; relative frequency (0.0 to 0.4) on the vertical axis, student points (0 to 60) on the horizontal axis.]
3 Statistical Model
Definition 3.1. (Sample) In statistics, our data are often modeled by a vector of IID (independent, identically distributed) random variables X = (X_1, X_2, ..., X_n) (called the sample, of size n), where the X_i take values in Z or R.
If X ∼ N(µ, σ²) then we write
\[
f(x \mid \mu, \sigma) = \frac{1}{\sigma (2\pi)^{1/2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
\]
for its probability density function (PDF), indicating that the PDF depends on µ and σ. In this case we can model the problem as {f(x | µ, σ) : (µ, σ) ∈ R × R₊} (since the PDF determines the distribution), or, if X = (X_1, X_2, ..., X_n) with X_i ∼ N(µ, σ²), the X_i being independent, we write
\[
f(x \mid \mu, \sigma) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left( -\frac{\| x - \mu \mathbf{1} \|^2}{2\sigma^2} \right)
\]
where x = (x_1, ..., x_n) and 1 = (1, 1, ..., 1) (n ones).
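As a sanity check, the vector form of the joint density should agree with the product of the one-dimensional N(µ, σ²) densities, since the X_i are independent. A minimal sketch with the standard library only; the sample x and the parameter values below are arbitrary illustrations.

```python
# Check that the n-dimensional normal density (vector form) equals the
# product of the one-dimensional N(mu, sigma^2) densities.
import math

def norm_pdf(x, mu, sigma):
    """One-dimensional N(mu, sigma^2) density."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def joint_pdf(xs, mu, sigma):
    """Joint density of an IID N(mu, sigma^2) sample, in the vector form above."""
    n = len(xs)
    sq_norm = sum((x - mu) ** 2 for x in xs)  # ||x - mu*1||^2
    return math.exp(-sq_norm / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2) ** (n / 2)

xs = [0.3, -1.2, 2.0, 0.7]
prod = math.prod(norm_pdf(x, 0.5, 1.3) for x in xs)
print(joint_pdf(xs, 0.5, 1.3), prod)  # the two values agree up to rounding
```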
In general, we write f(x; θ), f_θ(x), or f(x | θ) for the probability density function or the probability mass function. The PDFs or PMFs (or laws) form the parametric family {f(x | θ) : θ ∈ Θ}. The statistician observes x from f(x | θ) and infers the value of θ.
[Figures: empirical distribution functions of samples of size 10, 100, 500 and 3000 from a standard normal variable, and of samples of size 10, 100, 500 and 3000 from a uniform(0,1) variable.]
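The empirical distribution function behind those plots is F_n(t) = (number of observations ≤ t) / n. A minimal implementation with the standard library; the seed and sample sizes are arbitrary, chosen to mirror the panels above.

```python
# Empirical distribution function: F_n(t) = #{i : x_i <= t} / n.
import bisect
import random

def ecdf(sample):
    """Return the empirical distribution function F_n of the sample."""
    sorted_x = sorted(sample)
    n = len(sorted_x)
    # bisect_right counts how many sorted values are <= t
    return lambda t: bisect.bisect_right(sorted_x, t) / n

random.seed(1)
for n in (10, 100, 3000):
    f_n = ecdf([random.gauss(0, 1) for _ in range(n)])
    # F_n(0) should approach the true value Phi(0) = 0.5 as n grows
    print(n, f_n(0.0))
```

As the plots suggest, F_n approaches the true distribution function as n grows (uniformly, by the Glivenko-Cantelli theorem).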
5 Statistics
Definition 5.1. (Statistic) A statistic is any function of the sample. For example,
\[
\bar{X} := \tfrac{1}{n}(X_1 + \cdots + X_n), \quad \max(X_1, \ldots, X_n), \quad 5(X_1, \ldots, X_n) \text{ (the constant 5)}, \quad S^2 := \tfrac{1}{n-1} \sum_{i=1}^{n} (\bar{X} - X_i)^2,
\]
or (\bar{X}, S^2) are statistics.
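The statistics listed above are straightforward to compute on a concrete sample; a plain-Python sketch, with no external libraries.

```python
# The sample mean and sample variance from Definition 5.1, on a toy sample.
def x_bar(xs):
    """Sample mean (1/n) * sum of the observations."""
    return sum(xs) / len(xs)

def s_squared(xs):
    """Sample variance with the 1/(n-1) normalization."""
    m = x_bar(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

sample = [1.0, 2.0, 3.0, 4.0]
print(x_bar(sample), max(sample), s_squared(sample))
```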
Definition 5.2. (Order statistic) If X_1, X_2, ..., X_n are IID rvs, then the statistic
(X_{(1)}, X_{(2)}, ..., X_{(n)}) (1)
is called the order statistic; it consists of the same values arranged in increasing order, so X_{(1)} is the least and X_{(n)} is the greatest.
Example 5.3. Let the sample (X_1, ..., X_n) come from a uniform distribution on (0, 1). Then X_{(n)} has distribution function G(x) = P(X_{(n)} < x) = P(X_1 < x, ..., X_n < x) = x^n, x ∈ (0, 1), and therefore its probability density function (PDF) is g_{X_{(n)}}(x) = n x^{n-1}, x ∈ (0, 1). Later on we will need its expectation and variance:
\[
E X_{(n)} = \int_0^1 x \cdot n x^{n-1} \, dx = \frac{n}{n+1},
\]
\[
\operatorname{Var} X_{(n)} = E X_{(n)}^2 - (E X_{(n)})^2 = \frac{n}{n+2} - \left( \frac{n}{n+1} \right)^2 = \frac{n}{(n+1)^2 (n+2)}.
\]
Remark 6.2. It is not hard to see that when p is in the range of the distribution function F (of X), then the set of p-quantiles is the closure of the (possibly degenerate) interval on which F(x) = p; and it is