A Very Brief Summary of Statistical Inference, and Examples
1. Likelihood and Sufficiency
Let $X_1, \ldots, X_n$ be i.i.d. with density $f(x, \theta)$; then the likelihood is
\[ L(\theta, x) = \prod_{i=1}^n f(x_i, \theta). \]
Factorization Theorem: $T = t(X)$ is sufficient for $\theta$ if and only if there exist functions $g(t, \theta)$ and $h(x)$ such that for all $x$ and $\theta$,
\[ f(x, \theta) = g(t(x), \theta)\, h(x). \]
Moreover, $T = t(X)$ is minimal sufficient if, for all $x$ and $y$,
\[ \frac{f(x, \theta)}{f(y, \theta)} \ \text{is constant in } \theta \iff t(x) = t(y). \]
Example. Let $X_1, \ldots, X_n$ be i.i.d. from the shifted exponential distribution with density $f(x, \theta) = e^{\theta - x}$ for $x > \theta$, where
\[ F(x) = \int_\theta^x e^{\theta - z}\,dz = e^\theta (e^{-\theta} - e^{-x}) = 1 - e^{\theta - x}. \]
Now let $T = X_{(1)} = \min_i X_i$. Then
\[ P(T > t) = \prod_{i=1}^n (1 - F(t)) = (1 - F(t))^n = e^{n(\theta - t)}, \quad t > \theta, \]
so $T$ has density $f_T(t) = n e^{n(\theta - t)}$ for $t > \theta$. The conditional density of the sample given $T = t$ is
\[ \frac{e^{\theta - x_1} e^{\theta - x_2} \cdots e^{\theta - x_n}}{n e^{n(\theta - t)}} = \frac{e^{-\sum x_i}}{n e^{-nt}}, \quad x_i \ge t,\ i = 1, \ldots, n, \]
which does not depend on $\theta$ for each fixed $t = \min(x_i)$. Note that this shows $T = X_{(1)}$ is sufficient for $\theta$.
Alternatively, use the Factorization Theorem:
\[ f_X(x) = \prod_{i=1}^n e^{\theta - x_i} I_{(\theta,\infty)}(x_i) = I_{(\theta,\infty)}(x_{(1)}) \prod_{i=1}^n e^{\theta - x_i} = e^{n\theta} I_{(\theta,\infty)}(x_{(1)})\, e^{-\sum_{i=1}^n x_i}, \]
with
\[ g(t, \theta) = e^{n\theta} I_{(\theta,\infty)}(t) \quad \text{and} \quad h(x) = e^{-\sum_{i=1}^n x_i}. \]
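A quick numeric check of this factorization can be reassuring; the following is a minimal sketch (function names and simulated data are illustrative, not from the notes):

```python
import numpy as np

# Sketch: check f_X(x) = g(t(x), theta) * h(x) numerically for the
# shifted exponential f(x, theta) = exp(theta - x), x > theta.

def f_joint(x, theta):
    # joint density of the sample
    return np.exp(np.sum(theta - x)) if x.min() > theta else 0.0

def g(t, theta, n):
    # g(t, theta) = e^{n theta} * 1{t > theta}
    return np.exp(n * theta) * (t > theta)

def h(x):
    # h(x) = e^{-sum x_i}
    return np.exp(-np.sum(x))

rng = np.random.default_rng(0)
theta = 1.5
x = theta + rng.exponential(size=5)   # sample with all x_i > theta
print(f_joint(x, theta), g(x.min(), theta, len(x)) * h(x))  # the two agree
```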
2. Point Estimation
Estimate $\theta$ by maximizing the likelihood (the m.l.e. solves $\partial \ell / \partial \theta = 0$), or by the method of moments. Under regularity conditions,
\[ \hat\theta \approx N(\theta, I_n^{-1}(\theta)), \]
where
\[ I_n(\theta) = E\left[ \left( \frac{\partial \ell(\theta, X)}{\partial \theta} \right)^2 \right] = -E\left[ \frac{\partial^2 \ell(\theta, X)}{\partial \theta^2} \right] \]
is the Fisher information of the sample, with $I_1(\theta) = I(\theta)$ for a single observation (so $I_n(\theta) = n I(\theta)$ for an i.i.d. sample); thus the m.l.e. is an asymptotically efficient estimator.
Example. Let $X_1, \ldots, X_n$ be a random sample from the Gamma distribution with density
\[ f(x; c, \beta) = \frac{x^{c-1}}{\Gamma(c)\beta^c}\, e^{-x/\beta}, \quad x > 0, \]
so that the log-likelihood is
\[ \ell(c, \beta) = -n \log \Gamma(c) - nc \log \beta + (c - 1) \log\Big(\prod x_i\Big) - \frac{1}{\beta} \sum x_i. \]
Put $D_1(c) = \frac{\partial}{\partial c} \log \Gamma(c)$; then
\[ \frac{\partial \ell}{\partial \beta} = -\frac{nc}{\beta} + \frac{1}{\beta^2} \sum x_i, \qquad \frac{\partial \ell}{\partial c} = -n D_1(c) - n \log \beta + \log\Big(\prod x_i\Big). \]
Setting these equal to zero yields
\[ \hat\beta = \bar{x}/\hat{c}, \]
where $\hat{c}$ solves
\[ D_1(\hat{c}) - \log(\hat{c}) = \log\Big( \Big[\prod x_i\Big]^{1/n} \big/\, \bar{x} \Big). \]
(One can check that the sufficient statistic is indeed $(\sum x_i, \prod x_i)$, and that it is minimal sufficient.)
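The equation for $\hat{c}$ has no closed form, but it is easy to solve numerically; a sketch under the parametrization above, with simulated data standing in for a real sample:

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

# Sketch: solve D1(c) - log(c) = log(geometric mean / xbar) numerically,
# then beta_hat = xbar / c_hat; data simulated with c = 3, beta = 2.
rng = np.random.default_rng(1)
x = rng.gamma(shape=3.0, scale=2.0, size=500)
xbar = x.mean()
rhs = np.mean(np.log(x)) - np.log(xbar)   # log of (geometric mean / xbar)
c_hat = brentq(lambda c: digamma(c) - np.log(c) - rhs, 1e-8, 1e8)
beta_hat = xbar / c_hat
print(c_hat, beta_hat)   # close to (3, 2) for a large sample
```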
Calculate the Fisher information: put $D_2(c) = \frac{\partial^2}{\partial c^2} \log \Gamma(c)$; then
\[ \frac{\partial^2 \ell}{\partial \beta^2} = \frac{nc}{\beta^2} - \frac{2}{\beta^3} \sum x_i, \qquad \frac{\partial^2 \ell}{\partial \beta \partial c} = -\frac{n}{\beta}, \qquad \frac{\partial^2 \ell}{\partial c^2} = -n D_2(c). \]
Use that $EX_i = c\beta$ to obtain
\[ I(\theta) = n \begin{pmatrix} c/\beta^2 & 1/\beta \\ 1/\beta & D_2(c) \end{pmatrix}. \]
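Inverting this matrix gives the approximate covariance of $(\hat\beta, \hat{c})$; a small sketch with illustrative parameter values (here $D_2$ is the trigamma function):

```python
import numpy as np
from scipy.special import polygamma

# Sketch: invert the information matrix to get the approximate covariance
# of (beta_hat, c_hat); n, c, beta are illustrative values.
n, c, beta = 500, 3.0, 2.0
D2 = polygamma(1, c)   # trigamma function, d^2/dc^2 log Gamma(c)
I = n * np.array([[c / beta**2, 1.0 / beta],
                  [1.0 / beta, D2]])
print(np.linalg.inv(I))   # approx. covariance matrix of the m.l.e.
```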
3. Hypothesis Testing
The Neyman-Pearson Lemma says that the most powerful tests are likelihood-ratio tests.
Example. For the shifted exponential, test $H_0 : \theta = \theta_0$ against $H_1 : \theta = \theta_1$ with $\theta_1 > \theta_0$. The likelihood ratio
\[ \frac{f(x, \theta_1)}{f(x, \theta_0)} = e^{n(\theta_1 - \theta_0)} I_{(\theta_1, \infty)}(x_{(1)}) \]
is increasing in $x_{(1)}$, so the most powerful test rejects $H_0$ for large values of $T = X_{(1)}$.
So for $t > \theta_0$,
\[ P(T > t) = \int_t^\infty n e^{n(\theta_0 - s)}\,ds = e^{n(\theta_0 - t)}. \]
Setting this equal to $\alpha$ gives the critical value
\[ t = \theta_0 - \frac{\ln \alpha}{n}. \]
The LR test rejects $H_0$ if $X_{(1)} > \theta_0 - \frac{\ln \alpha}{n}$. The test is the same for every $\theta_1 > \theta_0$, so it is uniformly most powerful.
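The size of this test is easy to confirm by simulation; a sketch with illustrative values:

```python
import numpy as np

# Sketch: check by simulation that "reject H0 if X_(1) > theta0 - ln(alpha)/n"
# has size alpha under H0 (shifted exponential with theta = theta0).
rng = np.random.default_rng(2)
theta0, n, alpha = 1.0, 10, 0.05
thresh = theta0 - np.log(alpha) / n
reps = 100_000
xmin = (theta0 + rng.exponential(size=(reps, n))).min(axis=1)
print((xmin > thresh).mean())   # close to alpha = 0.05
```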
For a general null hypothesis and general alternative,
\[ H_0 : \theta \in \Theta_0, \qquad H_1 : \theta \in \Theta_1 = \Theta \setminus \Theta_0, \]
use the generalized likelihood ratio
\[ T = \frac{\max_{\theta \in \Theta} L(\theta; X)}{\max_{\theta \in \Theta_0} L(\theta; X)} \]
and reject $H_0$ for large values of $T$; under regularity conditions, $2 \log T$ is asymptotically chi-squared with $\dim \Theta - \dim \Theta_0$ degrees of freedom. An alternative is score tests, which are based on the score function $\partial \ell / \partial \theta$, with asymptotics $\partial \ell / \partial \theta(\theta_0) \approx N(0, I_n(\theta_0))$ under $H_0$.
Example. Let $X_1, \ldots, X_n$ be a random sample from the geometric distribution with parameter $p$, $P(X = x) = p(1-p)^x$ for $x = 0, 1, 2, \ldots$. Then
\[ L(p) = p^n (1 - p)^{\sum x_i} \]
and $\ell(p) = n \log p + \sum x_i \log(1 - p)$, so that
\[ \partial \ell / \partial p = n \left( \frac{1}{p} - \frac{\bar{x}}{1 - p} \right) \]
and
\[ \partial^2 \ell / \partial p^2 = n \left( -\frac{1}{p^2} - \frac{\bar{x}}{(1 - p)^2} \right). \]
We know that $EX_1 = (1 - p)/p$. Calculate the information:
\[ I(p) = \frac{n}{p^2 (1 - p)}. \]
For the data of the original example, with observed score $-62.7$ and information $1045.8$, the standardized score statistic is
\[ Z = -62.7/\sqrt{1045.8} = -1.9388. \]
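As a sketch of how such a score test is computed (simulated data standing in for the original example's data; $p_0$ is an illustrative null value):

```python
import numpy as np

# Sketch: score test of H0: p = p0 for geometric data (x = number of
# failures before the first success); data simulated, p0 illustrative.
rng = np.random.default_rng(3)
x = rng.geometric(p=0.25, size=200) - 1   # numpy counts trials, so shift by 1
n, xbar, p0 = len(x), x.mean(), 0.3
score = n * (1.0 / p0 - xbar / (1.0 - p0))   # dl/dp evaluated at p0
info = n / (p0**2 * (1.0 - p0))              # Fisher information I_n(p0)
print(score / np.sqrt(info))                 # approx N(0, 1) under H0
```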
Example continued. Setting the score to zero gives the m.l.e.
\[ \hat{p} = \frac{1}{1 + \bar{x}}. \]
Not to forget about: profile likelihood. Suppose $\theta = (\psi, \lambda)$, where $\lambda$ is a nuisance parameter. To test $H_0 : \psi = \psi_0$ against $H_{1+} : \psi > \psi_0$ or $H_{1-} : \psi < \psi_0$, use the test statistic
\[ T = \frac{\partial \ell(\psi_0, \hat\lambda_0; X)}{\partial \psi}, \]
where $\hat\lambda_0$ is the m.l.e. of $\lambda$ under $H_0$. Then
\[ T \approx \ell'_\psi - I_{\psi,\lambda} I_{\lambda,\lambda}^{-1} \ell'_\lambda \approx N(0, 1/I^{\psi,\psi}), \]
where $I^{\psi,\psi} = (I_{\psi,\psi} - I_{\psi,\lambda}^2 I_{\lambda,\lambda}^{-1})^{-1}$ is the top left element of $I^{-1}$. Standardizing,
\[ Z = \frac{T}{\sqrt{\operatorname{Var}(T)}} \approx \ell'_\psi(\psi, \hat\lambda_\psi)\, [I^{\psi,\psi}(\psi, \hat\lambda_\psi)]^{1/2} \approx N(0, 1). \]
Bias and variance approximations: the delta method. Suppose $T = g(S)$, where $S \approx N(\beta, V)$; then
\[ \operatorname{Var}(T) \approx g'(\beta)^T V g'(\beta) \]
and
\[ ET \approx g(\beta) + \frac{1}{2} \operatorname{trace}[g''(\beta) V]. \]
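A quick simulation check of both approximations, with the illustrative choice $g(s) = e^s$ and $S$ the mean of $n$ i.i.d. unit-variance normals:

```python
import numpy as np

# Sketch: check the delta method for T = g(S) = exp(S), where S is the mean
# of n i.i.d. N(mu, 1) observations, so S ~ N(mu, V) with V = 1/n.
rng = np.random.default_rng(4)
mu, n, reps = 0.3, 50, 200_000
S = rng.normal(mu, 1.0, size=(reps, n)).mean(axis=1)
T, V = np.exp(S), 1.0 / n
print(T.var(), np.exp(2 * mu) * V)           # Var T vs g'(mu)^2 * V
print(T.mean(), np.exp(mu) * (1 + 0.5 * V))  # E T vs g(mu) + g''(mu) V / 2
```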
Exponential family
A Very Brief Summary of Bayesian Inference, and Examples
1. Priors, Posteriors, Likelihood, and Sufficiency
The posterior distribution of $\theta$ given $x$ is
\[ \pi(\theta|x) = \frac{f(x|\theta)\pi(\theta)}{\int f(x|\theta)\pi(\theta)\,d\theta}. \]
The denominator
\[ p(x) = \int f(x|\theta)\pi(\theta)\,d\theta \]
is the prior predictive (marginal) distribution of the data, and the posterior predictive distribution is
\[ p(x_2|x_1) = \int f(x_2|\theta)\pi(\theta|x_1)\,d\theta. \]
Example. Suppose that $y_i = \beta x_i + \epsilon_i$ with $\epsilon_i$ i.i.d. $N(0, 1)$, where $x_1, x_2, \ldots, x_n$ are known constants. Suppose prior $\pi(\beta) = N(\mu, \alpha^2)$. Then
\[ L(\beta) = (2\pi)^{-n/2} \prod_{i=1}^n e^{-\frac{(y_i - \beta x_i)^2}{2}} \]
and
\[ \pi(\beta) = \frac{1}{\sqrt{2\pi\alpha^2}}\, e^{-\frac{(\beta - \mu)^2}{2\alpha^2}} \propto \exp\left( -\frac{\beta^2}{2\alpha^2} + \frac{\beta\mu}{\alpha^2} \right), \]
and the posterior of $\beta$ given $y_1, y_2, \ldots, y_n$ can be calculated as
\[ \pi(\beta|y) \propto \exp\left( -\frac{1}{2} \sum (y_i - \beta x_i)^2 - \frac{1}{2\alpha^2}(\beta - \mu)^2 \right) \propto \exp\left( \beta \sum x_i y_i - \frac{1}{2}\beta^2 \sum x_i^2 - \frac{\beta^2}{2\alpha^2} + \frac{\beta\mu}{\alpha^2} \right). \]
Abbreviate $s_{xx} = \sum x_i^2$ and $s_{xy} = \sum x_i y_i$; then
\[ \pi(\beta|y) \propto \exp\left( \beta s_{xy} - \frac{1}{2}\beta^2 s_{xx} - \frac{\beta^2}{2\alpha^2} + \frac{\beta\mu}{\alpha^2} \right) = \exp\left( -\frac{\beta^2}{2}\left( s_{xx} + \frac{1}{\alpha^2} \right) + \beta\left( s_{xy} + \frac{\mu}{\alpha^2} \right) \right), \]
which we recognize as
\[ N\left( \frac{s_{xy} + \frac{\mu}{\alpha^2}}{s_{xx} + \frac{1}{\alpha^2}},\ \left( s_{xx} + \frac{1}{\alpha^2} \right)^{-1} \right). \]
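A direct sketch of this conjugate update on simulated data (all names and parameter values illustrative):

```python
import numpy as np

# Sketch of the conjugate update: y_i = beta x_i + eps_i, eps_i ~ N(0, 1),
# prior beta ~ N(mu, alpha^2); simulated data, illustrative values.
rng = np.random.default_rng(5)
n, beta_true, mu, alpha2 = 40, 2.0, 0.0, 10.0
x = rng.uniform(0.0, 1.0, n)
y = beta_true * x + rng.normal(size=n)
sxx, sxy = np.sum(x * x), np.sum(x * y)
post_var = 1.0 / (sxx + 1.0 / alpha2)
post_mean = (sxy + mu / alpha2) * post_var
print(post_mean, post_var)   # posterior N(post_mean, post_var) for beta
```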
Summarize information about $\theta$: find a minimal sufficient statistic $t(x)$. $T = t(X)$ is sufficient for $\theta$ if and only if for all $x$ and $\theta$
\[ \pi(\theta|x) = \pi(\theta|t(x)); \]
this is the Bayesian characterization of a sufficient statistic.
Choice of prior. When the posterior is in the same family of models as the prior, i.e. when one has a conjugate prior, updating the distribution under new data is particularly convenient. Exponential families,
\[ f(x|\theta) = f(x)\, g(\theta) \exp\Big\{ \sum_{i=1}^k c_i \phi_i(\theta) h_i(x) \Big\}, \quad x \in X, \]
admit conjugate priors.
Non-informative priors favour no particular values of the parameter over others. For a scale density $f(x|\sigma) = \frac{1}{\sigma} f\left(\frac{x}{\sigma}\right)$, $\sigma > 0$, the scale-invariant prior is $\pi(\sigma) \propto 1/\sigma$ (improper). The Jeffreys prior is $\pi(\theta) \propto I(\theta)^{1/2}$ if the information $I(\theta)$ exists; it is invariant under reparametrization, and may or may not be proper.
Example continued. Suppose that $y_1, y_2, \ldots, y_n$ are independent, $y_i \sim N(\beta x_i, 1)$, with $x_1, \ldots, x_n$ known constants. Then
\[ I(\beta) = -E\left[ \frac{\partial^2}{\partial \beta^2} \log L(\beta, y) \right] = -E\left[ \frac{\partial}{\partial \beta} \sum (y_i - \beta x_i) x_i \right] = s_{xx}, \]
and so the Jeffreys prior is $\propto \sqrt{s_{xx}}$: constant, and improper. With this prior,
\[ \pi(\beta|y) \ \text{is} \ N\left( \frac{s_{xy}}{s_{xx}},\ (s_{xx})^{-1} \right). \]
If $\Theta$ is discrete, and we have constraints
\[ E^\pi g_k(\theta) = \mu_k, \quad k = 1, \ldots, m, \]
choose the prior distribution with maximum entropy under these constraints:
\[ \tilde\pi(\theta_i) = \frac{\exp\left( \sum_{k=1}^m \lambda_k g_k(\theta_i) \right)}{\sum_i \exp\left( \sum_{k=1}^m \lambda_k g_k(\theta_i) \right)}, \]
where the $\lambda_k$ are chosen so that the constraints are satisfied.
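Solving for the $\lambda_k$ is a root-finding problem; a sketch with one illustrative constraint, $E[\theta] = 4.5$ on $\Theta = \{1, \ldots, 6\}$:

```python
import numpy as np
from scipy.optimize import brentq

# Sketch: maximum-entropy prior on Theta = {1,...,6} under the single
# illustrative constraint E[theta] = 4.5; pi(theta_i) ~ exp(lambda*theta_i).
theta = np.arange(1, 7)

def mean_for(lam):
    w = np.exp(lam * theta)
    return (w * theta).sum() / w.sum()

lam = brentq(lambda l: mean_for(l) - 4.5, -10.0, 10.0)
pi = np.exp(lam * theta)
pi /= pi.sum()
print(pi, (pi * theta).sum())   # tilted prior with mean 4.5
```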
2. Point Estimation
Bayes estimators are constructed to minimize the posterior expected loss. Recall from Decision Theory: $\Theta$ is the set of all possible states of nature, $D$ is the set of all possible decisions, and a loss function
\[ L : \Theta \times D \to [0, \infty) \]
gives the loss $L(\theta, d)$ incurred by taking decision $d$ when $\theta$ is the true state.
Bayesian: For a prior $\pi$ and data $x \in X$, the posterior expected loss of a decision $d$ is
\[ \rho(\pi, d|x) = \int_\Theta L(\theta, d)\pi(\theta|x)\,d\theta, \]
and the integrated risk of a decision rule $\delta$ is
\[ r(\pi, \delta) = \int_\Theta \int_X L(\theta, \delta(x)) f(x|\theta)\,dx\,\pi(\theta)\,d\theta. \]
This is valid for proper priors, and for improper priors if $r(\pi) < \infty$. If the loss is quadratic, the Bayes estimator associated with prior $\pi$ is the posterior mean; for absolute error loss it is the posterior median.
Example. Suppose that $x_1, \ldots, x_n$ is a random sample from a Poisson distribution with unknown mean $\theta$. Two models for the prior are entertained: model 1 with $\pi_1(\theta) = e^{-\theta}$ (an Exponential(1) prior), and model 2 with $\pi_2(\theta) = \theta e^{-\theta}$ (a Gamma(2, 1) prior). Under quadratic loss the Bayes estimator is the posterior mean. Using model 1,
\[ \pi_1(\theta|x) \propto e^{-n\theta} \theta^{\sum x_i} e^{-\theta} = e^{-(n+1)\theta} \theta^{\sum x_i}, \]
which we recognize as Gamma$(\sum x_i + 1,\ n+1)$. The Bayes estimator is therefore
\[ \frac{\sum x_i + 1}{n + 1}. \]
For model 2,
\[ \pi_2(\theta|x) \propto e^{-n\theta} \theta^{\sum x_i} e^{-\theta} \theta = e^{-(n+1)\theta} \theta^{\sum x_i + 1}, \]
which we recognize as Gamma$(\sum x_i + 2,\ n+1)$. The Bayes estimator is
\[ \frac{\sum x_i + 2}{n + 1}. \]
Note that the first model puts greater weight on smaller values of $\theta$, which is reflected in the smaller estimator; as $n$ grows, the data dominate both prior distributions.
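A sketch computing both estimators on simulated Poisson data (sample size and mean are illustrative):

```python
import numpy as np

# Sketch: the two Bayes estimators (posterior means) on simulated Poisson data.
rng = np.random.default_rng(6)
x = rng.poisson(3.0, size=30)
n, s = len(x), x.sum()
print((s + 1) / (n + 1),    # model 1: mean of Gamma(s + 1, n + 1)
      (s + 2) / (n + 1))    # model 2: mean of Gamma(s + 2, n + 1)
```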
3. Hypothesis Testing
To test $H_0 : \theta \in \Theta_0$ against $H_1 : \theta \in \Theta_1$, use the loss function that equals 0 for a correct decision, $a_0$ if $H_0$ is rejected although $\theta \in \Theta_0$, and $a_1$ if $H_0$ is accepted although $\theta \in \Theta_1$. Under this loss function, the Bayes decision rule associated with a prior distribution $\pi$ is
\[ \varphi^\pi(x) = \begin{cases} 1 & \text{if } P^\pi(\theta \in \Theta_0 \mid x) > \frac{a_1}{a_0 + a_1} \\ 0 & \text{otherwise,} \end{cases} \]
where $\varphi = 1$ means accepting $H_0$.
The Bayes factor for testing $H_0 : \theta \in \Theta_0$ against $H_1 : \theta \in \Theta_1$ is
\[ B^\pi(x) = \frac{P^\pi(\theta \in \Theta_0 \mid x) / P^\pi(\theta \in \Theta_1 \mid x)}{P^\pi(\theta \in \Theta_0) / P^\pi(\theta \in \Theta_1)} = \frac{p(x \mid \theta \in \Theta_0)}{p(x \mid \theta \in \Theta_1)}, \]
the ratio of how likely the data are under $H_0$ to how likely they are under $H_1$; it measures the extent to which the data $x$ will change the odds of $\Theta_0$ relative to $\Theta_1$. Note that we have to make sure that our prior distribution puts positive mass on both $\Theta_0$ and $\Theta_1$; for a point null hypothesis this means choosing a prior that has some point mass on $H_0$ and otherwise lives on $H_1$.
Example revisited. Suppose that $x_1, \ldots, x_n$ is a random sample from a Poisson distribution with mean $\theta$, and compare the two prior models above via their Bayes factor:
\[ B(x) = \frac{\int_0^\infty e^{-n\theta} \theta^{\sum x_i} e^{-\theta}\,d\theta}{\int_0^\infty e^{-n\theta} \theta^{\sum x_i} e^{-\theta} \theta\,d\theta} = \frac{\Gamma(\sum x_i + 1)/(n+1)^{\sum x_i + 1}}{\Gamma(\sum x_i + 2)/(n+1)^{\sum x_i + 2}} = \frac{n+1}{\sum x_i + 1}. \]
Note that in this setting (with equal prior weight on the two models)
\[ B(x) = \frac{P(\text{model 1} \mid x)}{P(\text{model 2} \mid x)} = \frac{P(\text{model 1} \mid x)}{1 - P(\text{model 1} \mid x)}, \]
so that $P(\text{model 1} \mid x) = B(x)/(1 + B(x))$. Hence
\[ P(\text{model 1} \mid x) = \left( 1 + \frac{\sum x_i + 1}{n + 1} \right)^{-1} = \left( 1 + \frac{\bar{x} + \frac{1}{n}}{1 + \frac{1}{n}} \right)^{-1}. \]
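Continuing the Poisson sketch above, the Bayes factor and the posterior probability of model 1:

```python
import numpy as np

# Sketch: Bayes factor B(x) = (n+1)/(sum x_i + 1) between the two prior
# models, and the posterior probability of model 1 under equal prior odds.
rng = np.random.default_rng(6)
x = rng.poisson(3.0, size=30)
n, s = len(x), x.sum()
B = (n + 1) / (s + 1)
print(B, B / (1 + B))    # B(x) and P(model 1 | x) = B/(1+B)
```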
For robustness: least favourable Bayesian answers. Suppose $H_0 : \theta = \theta_0$, and let $G$ be a class of prior distributions on the alternative; put
\[ B(x, G) = \inf_{g \in G} \frac{f(x|\theta_0)}{\int_\Theta f(x|\theta) g(\theta)\,d\theta}. \]
The posterior probability of $H_0$ is then at least
\[ P(x, G) = \left( 1 + \frac{1}{B(x, G)} \right)^{-1} \]
on $H_0$ (for prior weight $\rho_0 = 1/2$ on $H_0$). If $\hat\theta(x)$ is the m.l.e. of $\theta$ and $G_A$ is the class of all prior distributions, then
\[ B(x, G_A) = \frac{f(x|\theta_0)}{f(x|\hat\theta(x))} \]
and
\[ P(x, G_A) = \left( 1 + \frac{f(x|\hat\theta(x))}{f(x|\theta_0)} \right)^{-1}. \]
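As a sketch, for a single $N(\theta, 1)$ observation with point null $\theta_0 = 0$ and observed $x = 1.96$ (illustrative values, not from the notes):

```python
from scipy.stats import norm

# Sketch: lower bound over all priors for a single N(theta, 1) observation,
# point null theta0 = 0 and observed x = 1.96 (illustrative values).
x, theta0 = 1.96, 0.0
B = norm.pdf(x, loc=theta0) / norm.pdf(x, loc=x)   # f(x|theta0) / f(x|mle)
P = 1 / (1 + 1 / B)                                # lower bound on P(H0 | x)
print(B, P)                                        # approx 0.146 and 0.128
```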
4. Credible intervals
A $(1 - \alpha)$ credible interval (region) is an interval (region) in which $\theta$ lies with posterior probability $1 - \alpha$.
Example continued. Suppose that $y_1, y_2, \ldots, y_n$ are independent, $y_i \sim N(\beta x_i, 1)$, with unknown parameter $\beta$, and $x_1, x_2, \ldots, x_n$ are known constants. Under the Jeffreys prior the posterior is
\[ N\left( \frac{s_{xy}}{s_{xx}},\ (s_{xx})^{-1} \right), \]
so a 95%-credible interval for $\beta$ is
\[ \frac{s_{xy}}{s_{xx}} \pm 1.96 \sqrt{\frac{1}{s_{xx}}}. \]
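A sketch computing this interval, reusing the simulated regression data from the earlier sketch:

```python
import numpy as np

# Sketch: 95% credible interval s_xy/s_xx +/- 1.96/sqrt(s_xx) under the
# Jeffreys (flat) prior, on simulated data (names illustrative).
rng = np.random.default_rng(5)
n, beta_true = 40, 2.0
x = rng.uniform(0.0, 1.0, n)
y = beta_true * x + rng.normal(size=n)
sxx, sxy = np.sum(x * x), np.sum(x * y)
centre, half = sxy / sxx, 1.96 / np.sqrt(sxx)
print(centre - half, centre + half)   # interval containing beta_true w.h.p.
```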
5. Not to forget about: Nuisance parameters
With $\theta = (\psi, \lambda)$, where $\lambda$ is a nuisance parameter, base inference for $\psi$ on the marginal posterior of $\psi$:
\[ \pi(\psi|x) = \int \pi(\psi, \lambda|x)\,d\lambda. \]
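Numerically, this marginalization is just integration over $\lambda$; a toy sketch on a grid, with an invented joint posterior purely for illustration:

```python
import numpy as np

# Toy sketch (joint posterior invented for illustration): marginal posterior
# of psi obtained by integrating out lambda on a grid.
psi = np.linspace(-3, 5, 400)
lam = np.linspace(-6, 8, 400)
P, L = np.meshgrid(psi, lam, indexing="ij")
joint = np.exp(-(P - 1) ** 2 - (L - P) ** 2)   # unnormalized pi(psi, lambda | x)
marg = joint.sum(axis=1) * (lam[1] - lam[0])   # integrate over lambda
marg /= marg.sum() * (psi[1] - psi[0])         # normalize to a density
print(psi[np.argmax(marg)])                    # posterior mode, near psi = 1
```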