Outline
Parameter estimation
  Beta-binomial example
  Point estimation
  Interval estimation
  Simulation from the posterior
Priors
  Subjective
  Conjugate
  Default
  Improper
Parameter estimation
p(θ|y) = p(y|θ)p(θ) / p(y) = p(y|θ)p(θ) / ∫ p(y|θ)p(θ) dθ ∝ p(y|θ)p(θ)
where
p(θ) is the prior distribution for the parameter,
p(θ|y) is the posterior distribution for the parameter,
p(y|θ) is the statistical model (or likelihood), and
p(y) is the prior predictive distribution (or marginal likelihood).
Definition
The kernel of a probability density (mass) function is the form of the pdf
(pmf) with any terms not involving the random variable omitted.
Combining the Be(a, b) prior with the binomial likelihood results in the posterior

p(θ|y) ∝ p(y|θ)p(θ)
       ∝ θ^y (1 − θ)^(n−y) · θ^(a−1) (1 − θ)^(b−1)
       = θ^(a+y−1) (1 − θ)^(b+n−y−1),

which is the kernel of a Be(a + y, b + n − y) distribution.
Example data
Assume Y ∼ Bin(n, θ) and θ ∼ Be(1, 1) (which is equivalent to Unif(0, 1)), and suppose we observe three successes (y = 3) out of ten attempts (n = 10). Then our posterior is θ|y ∼ Be(1 + 3, 1 + 10 − 3) = Be(4, 8).
The posterior mean is a weighted average of the prior mean and the MLE:

E[θ|y] = 2/12 × 1/2 + 10/12 × 3/10 = 4/12.
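This conjugate update is easy to check numerically; the sketch below reproduces the example's numbers (variable names are illustrative):

```python
# Beta-binomial update for the slide's example:
# y = 3 successes in n = 10 trials with a Be(1, 1) prior.
a, b = 1, 1      # prior Be(a, b)
y, n = 3, 10     # observed data

# Conjugate update: theta | y ~ Be(a + y, b + n - y)
a_post, b_post = a + y, b + n - y          # Be(4, 8)

# Posterior mean, and the same mean written as a weighted average
# of the prior mean a/(a+b) and the MLE y/n.
post_mean = a_post / (a_post + b_post)
weighted = (a + b) / (a + b + n) * (a / (a + b)) + n / (a + b + n) * (y / n)

print(a_post, b_post)          # 4 8
print(round(post_mean, 4))     # 0.3333
print(round(weighted, 4))      # 0.3333
```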
Remark: Note that a Be(1, 1) is equivalent to p(θ) = I(0 < θ < 1), i.e. a Unif(0, 1) prior.
Posterior distribution

[Figure: density of the Be(4, 8) posterior.]
Point estimation
Define a loss (or utility) function L(θ, θ̂) = −U(θ, θ̂) where
θ is the parameter of interest and
θ̂ = θ̂(y) is the estimator of θ.

Find the estimator that minimizes the expected loss:

θ̂_Bayes = argmin_θ̂ E[L(θ, θ̂) | y]
Mode:

θ̂_Bayes = argmax_θ p(θ|y)

is obtained by minimizing L(θ, θ̂) = −I(|θ − θ̂| < ε) as ε → 0; it is also called the maximum a posteriori (MAP) estimator.
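For the running Be(4, 8) example, the MAP estimate can be computed from the closed-form beta mode and confirmed by maximizing the log kernel on a grid (a sketch; the grid resolution is an arbitrary choice):

```python
# MAP estimate for the Be(4, 8) posterior from the running example.
# The mode of a Be(a, b) with a, b > 1 is (a - 1)/(a + b - 2); here we
# also confirm it by maximizing the log kernel on a grid.
import math

a, b = 4, 8
analytic_mode = (a - 1) / (a + b - 2)      # 3/10

grid = [i / 10000 for i in range(1, 10000)]
log_kernel = lambda t: (a - 1) * math.log(t) + (b - 1) * math.log(1 - t)
grid_mode = max(grid, key=log_kernel)

print(analytic_mode)           # 0.3
print(round(grid_mode, 3))     # 0.3
```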
Jarad Niemi (STAT544@ISU) Parameter estimation February 18, 2019 12 / 40
Theorem
The mean minimizes expected squared-error loss.
Proof.
Suppose L(θ, θ̂) = (θ − θ̂)² = θ² − 2θθ̂ + θ̂². Then

E[L(θ, θ̂) | y] = E[θ²|y] − 2θ̂ E[θ|y] + θ̂²

d/dθ̂ E[L(θ, θ̂) | y] = −2E[θ|y] + 2θ̂ =(set) 0 ⟹ θ̂ = E[θ|y]

d²/dθ̂² E[L(θ, θ̂) | y] = 2 > 0,

so θ̂ = E[θ|y] is a minimum.
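A Monte Carlo sketch of this result, using draws from the running Be(4, 8) posterior: the expected squared-error loss, scanned over a grid of candidate estimators, bottoms out at the posterior mean (sample size and grid are arbitrary choices):

```python
# Numeric check that the posterior mean minimizes expected
# squared-error loss, using Monte Carlo draws from Be(4, 8).
import random
random.seed(1)

draws = [random.betavariate(4, 8) for _ in range(10_000)]
post_mean = sum(draws) / len(draws)

def expected_loss(est):
    # Monte Carlo estimate of E[(theta - est)^2 | y]
    return sum((t - est) ** 2 for t in draws) / len(draws)

# Scan candidate estimators on a coarse grid; the minimizer should
# sit at (the grid point nearest) the posterior mean.
candidates = [i / 100 for i in range(101)]
best = min(candidates, key=expected_loss)

print(round(post_mean, 2))     # about 0.33
print(best)                    # about 0.33
```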
Point estimation
3
2
posterior
Interval estimation
Definition
A 100(1 − a)% credible interval is any interval (L, U) such that

1 − a = ∫_L^U p(θ|y) dθ.
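Because any (L, U) holding 1 − a of the posterior mass qualifies, credible intervals are not unique. A simulation sketch with the running Be(4, 8) posterior, showing two different intervals that each hold roughly 95%:

```python
# Two distinct 95% credible intervals from posterior simulations:
# an equal-tail interval and a shifted one with the same coverage.
import random
random.seed(2)

draws = sorted(random.betavariate(4, 8) for _ in range(20_000))
n = len(draws)

def quantile(p):
    # empirical quantile from the sorted draws
    return draws[min(n - 1, int(p * n))]

equal_tail = (quantile(0.025), quantile(0.975))
shifted = (quantile(0.010), quantile(0.960))   # also holds 95% mass

def coverage(lo, hi):
    # fraction of posterior draws inside (lo, hi)
    return sum(lo <= t <= hi for t in draws) / n

print(round(coverage(*equal_tail), 2))   # about 0.95
print(round(coverage(*shifted), 2))      # about 0.95
```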
Interval estimation

[Figure: posterior density with credible intervals of each type: equal-tail, lower tail, higher tail, and highest posterior density (HPD).]
Simulation from the posterior

[Figure: density of simulations from the posterior.]
We can also obtain point and interval estimates using these simulations:

round(c(mean = mean(sim$x), median = median(sim$x)), 2)

 mean median
 0.34   0.33

The 2.5% and 97.5% quantiles, 0.11 and 0.61, give an equal-tail 95% credible interval; the 5% quantile gives the one-sided interval (0.13, 1.00) and the 95% quantile gives (0.00, 0.57).
Definition
A prior probability distribution, often called simply the prior, of an
uncertain quantity θ is the probability distribution that would express one’s
uncertainty about θ before the “data” is taken into account.
http://en.wikipedia.org/wiki/Prior_distribution
Priors
Definition
A prior p(θ) is conjugate if for p(θ) ∈ P and p(y|θ) ∈ F, p(θ|y) ∈ P
where F and P are families of distributions.
Definition
A natural conjugate prior is a conjugate prior that has the same functional
form as the likelihood.
Proof.
Suppose p(θ) is discrete, i.e.

P(θ = θ_i) = p_i,  Σ_{i=1}^I p_i = 1,

and p(y|θ) is the model. Then P(θ = θ_i | y) = p′_i is the posterior with

p′_i = p_i p(y|θ_i) / Σ_{j=1}^I p_j p(y|θ_j) ∝ p_i p(y|θ_i).
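A small numeric sketch of this calculation, using a hypothetical uniform prior over five θ values and the binomial likelihood from the earlier example (y = 3, n = 10):

```python
# Discrete-prior posterior: with prior weights p_i and a binomial
# likelihood, the posterior weights are p'_i ∝ p_i p(y | theta_i).
from math import comb

thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
prior = [0.2] * 5                        # p_i, summing to 1
y, n = 3, 10

lik = [comb(n, y) * t**y * (1 - t)**(n - y) for t in thetas]
unnorm = [p * l for p, l in zip(prior, lik)]
post = [u / sum(unnorm) for u in unnorm]   # normalized p'_i

print([round(p, 3) for p in post])
print(round(sum(post), 6))               # 1.0
```

With y/n = 0.3 the posterior mass concentrates on θ = 0.3, as expected.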
Discrete prior

[Figure: discrete prior probabilities over a grid of θ values.]
Proof.
Let p_i = P(H_i) and p_i(θ) = p(θ|H_i),

θ ∼ Σ_{i=1}^I p_i p_i(θ),  Σ_{i=1}^I p_i = 1,

and p_i(y) = ∫ p(y|θ) p_i(θ) dθ. Then

p(θ|y) = [1/p(y)] p(y|θ) p(θ) = [1/p(y)] p(y|θ) Σ_{i=1}^I p_i p_i(θ)
       = [1/p(y)] Σ_{i=1}^I p_i p(y|θ) p_i(θ) = [1/p(y)] Σ_{i=1}^I p_i p_i(y) p_i(θ|y)
       = Σ_{i=1}^I [p_i p_i(y) / p(y)] p_i(θ|y) = Σ_{i=1}^I [p_i p_i(y) / Σ_{j=1}^I p_j p_j(y)] p_i(θ|y)
Bottom line: if

θ ∼ Σ_{i=1}^I p_i p_i(θ),  Σ_{i=1}^I p_i = 1,

and p_i(y) = ∫ p(y|θ) p_i(θ) dθ, then

θ|y ∼ Σ_{i=1}^I p′_i p_i(θ|y),  p′_i ∝ p_i p_i(y).
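A sketch of this update for a hypothetical two-component beta mixture prior with binomial data; each p_i(y) is then a beta-binomial marginal likelihood:

```python
# Mixture-prior update: each component's posterior weight is
# p'_i ∝ p_i p_i(y), where p_i(y) is the component's marginal
# likelihood (beta-binomial for a Be(a, b) component).
from math import comb, lgamma, exp

def log_beta(a, b):
    # log of the beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def marginal(y, n, a, b):
    # beta-binomial marginal likelihood p_i(y) for a Be(a, b) component
    return comb(n, y) * exp(log_beta(a + y, b + n - y) - log_beta(a, b))

y, n = 3, 10
components = [(0.5, 1, 1), (0.5, 10, 10)]   # (p_i, a_i, b_i), hypothetical

unnorm = [p * marginal(y, n, a, b) for p, a, b in components]
weights = [u / sum(unnorm) for u in unnorm]  # p'_i

print([round(w, 3) for w in weights])
print(round(sum(weights), 6))               # 1.0
```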
Mixture priors

[Figure: mixture prior and resulting posterior densities over θ.]
Default priors
Definition
A default prior is used when a data analyst is unable or unwilling to specify
an informative prior distribution.
Default priors
Suppose we use φ = log(θ/[1 − θ]), the log odds, as our parameter and set p(φ) ∝ 1. Then the implied prior on θ is

p_θ(θ) ∝ |d/dθ log(θ/[1 − θ])|
       = (1 − θ)/θ × [1/(1 − θ) + θ/(1 − θ)²]
       = (1 − θ)/θ × ([1 − θ] + θ)/(1 − θ)²
       = θ⁻¹ (1 − θ)⁻¹,

i.e. a Be(0, 0), if that were a proper distribution, and is different from setting p(θ) ∝ 1, which results in the Be(1, 1) prior. Thus, the constant prior is not invariant to the parameterization used.
Jeffreys prior
Definition
Jeffreys prior is a prior that is invariant to parameterization and is obtained via

p(θ) ∝ √det I(θ)

where I(θ) is the Fisher information.

For example, for a binomial distribution I(θ) = n/(θ[1 − θ]), so

p(θ) ∝ θ^(−1/2) (1 − θ)^(−1/2),

a Be(1/2, 1/2) distribution.
Fisher information
Theorem
The Fisher information for Y ∼ Bin(n, θ) is I(θ) = n/(θ(1 − θ)).

Proof.
Since the binomial is an exponential family,

I(θ) = −E_{y|θ}[∂²/∂θ² log p(y|θ)]
     = −E_{y|θ}[∂²/∂θ² (log (n choose y) + y log θ + (n − y) log(1 − θ))]
     = −E_{y|θ}[∂/∂θ (y/θ − (n − y)/(1 − θ))]
     = −E_{y|θ}[−y/θ² − (n − y)/(1 − θ)²]
     = −[−nθ/θ² − (n − nθ)/(1 − θ)²] = n/θ + n/(1 − θ)
     = n/(θ(1 − θ))
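The theorem can be checked numerically by averaging a finite-difference second derivative of the log likelihood over all y (h is an arbitrary step size):

```python
# Numeric check: for Y ~ Bin(n, theta),
# -E[d^2/dtheta^2 log p(y|theta)] should equal n / (theta (1 - theta)).
from math import comb, log

def info(n, theta, h=1e-4):
    total = 0.0
    for y in range(n + 1):
        prob = comb(n, y) * theta**y * (1 - theta)**(n - y)
        ll = lambda t: log(comb(n, y) * t**y * (1 - t)**(n - y))
        # central finite-difference approximation of the second derivative
        d2 = (ll(theta + h) - 2 * ll(theta) + ll(theta - h)) / h**2
        total += prob * d2
    return -total

n, theta = 10, 0.3
print(round(info(n, theta), 2))   # about 47.62 = 10 / (0.3 * 0.7)
```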
Non-conjugate priors
Options:
Plot f(θ) (possibly multiplying by a constant).
Find i = ∫ f(θ) dθ, so that p(θ|y) = f(θ)/i.
Evaluate f(θ) on a grid and normalize by the grid spacing.
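The second and third options can be sketched together: evaluate f(θ) on a grid, estimate the integral by a Riemann sum, and normalize (the triangular prior here is a hypothetical non-conjugate choice):

```python
# Grid evaluation and normalization of an unnormalized posterior
# f(theta) = likelihood x prior, with a hypothetical triangular prior.
def f(theta, y=3, n=10):
    prior = 2 * (1 - theta)                    # triangular on (0, 1)
    return theta**y * (1 - theta)**(n - y) * prior

m = 10_000
width = 1 / m
grid = [(i + 0.5) * width for i in range(m)]   # midpoints of each cell

vals = [f(t) for t in grid]
i_hat = sum(vals) * width                      # Riemann estimate of ∫ f
post = [v / i_hat for v in vals]               # normalized density on grid

print(round(sum(post) * width, 6))             # 1.0
```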
Plot of f(θ)

[Figure: unnormalized posterior density f(θ), proportional scale.]
Numerical integration
Find i = ∫ f(θ) dθ, so that p(θ|y) = f(θ)/i:

(i = integrate(f, 0, 1))
[Figure: prior and posterior densities over θ.]
[Figure: prior and posterior densities over θ (grid evaluation).]
Improper priors
Definition
An unnormalized density f(θ) is proper if ∫ f(θ) dθ = c < ∞; otherwise it is improper.

If f(θ) is proper, then

p(θ|y) = f(θ)/c.

To see that p(θ|y) is a proper normalized density, note that c = ∫ f(θ) dθ is not a function of θ; then

∫ p(θ|y) dθ = ∫ f(θ)/[∫ f(θ) dθ] dθ = ∫ f(θ)/c dθ = (1/c) ∫ f(θ) dθ = c/c = 1.
Be(0,0) prior