Data Analysis and Statistical Arbitrage
Lecture 2: Point Estimation
Oli Atlason
Outline
• Parametric Statistical Models
• Method of Moments and Maximum Likelihood
• Bias and MSE
• Asymptotic Properties
Parametric Statistical Models
Example. We observe $X_1, \dots, X_n$, where $X_i$ is the number of alpha particles emitted by a sample during the $i$-th time interval of an experiment.

Natural model: $X_i \sim \text{Poisson}(\lambda)$ with $X_1, \dots, X_n$ independent.

Poisson distribution:

$$P_\lambda(X_i = k) = p^{\text{Poisson}}_\lambda(k) = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k \in \mathbb{N}, \ \lambda > 0$$

The formal model consists, for a given $n$, of the family $\{p^{\text{Poisson}}_\lambda\}_{\lambda \in \mathbb{R}_+}$; here the parameter space is $\mathbb{R}_+$.
Parametric Statistical Models
General definition. A parametric statistical model for observations $X = (X_1, \dots, X_n)$ is a family $\{f_\theta(x)\}_{\theta \in \Theta}$ of probability distributions. We want to know which $f_\theta$ is responsible for the data.

Point estimator: a function $\tilde\theta(x)$ of the observations. This is a guess at which $\theta$ was used to generate the data.

Sampling distribution: $\tilde\theta(X)$ takes random values; it has a distribution derived from that of $X$.
Method of Moments
The method of moments is a simple way of finding an estimator.

$k$th sample moment: $\hat\mu_k = \frac{1}{n} \sum_i (X_i)^k$

$k$th population moment: $\mu_k(\theta) = E_\theta[X^k]$

Now solve for $\theta$ in the system $(\mu_k(\theta) = \hat\mu_k)_{k=1,\dots,p}$. Usually we need as many equations ($p$) as there are parameters.
Method of Moments: Examples
Poisson. For the Poisson, $\theta = \lambda$. Also

$$E_\lambda[X] = \sum_{k=0}^{\infty} k\,P(X = k) = \sum_{k=0}^{\infty} k\,\frac{\lambda^k}{k!} e^{-\lambda} = \lambda,$$

thus the method of moments estimator is $\tilde\lambda(X) = \hat\mu_1 = \frac{1}{n}\sum_i X_i$.

Normal. The parameters are two, $\theta = (\mu, \sigma^2)$, so we need two equations. We know that $E[X] = \mu$ and $\mathrm{Var}[X] = \sigma^2$. Thus

$$\mu_1 = \mu, \qquad \mu_2 = \mu^2 + \sigma^2.$$

The method of moments estimators are $\tilde\mu = \hat\mu_1$ and

$$\tilde\sigma^2 = \hat\mu_2 - \hat\mu_1^2 = \frac{1}{n}\sum_i x_i^2 - \bar{x}^2 = \frac{1}{n}\sum_i (x_i - \bar{x})^2,$$

where the last equality follows by expanding $\frac{1}{n}\sum_i (x_i - \bar{x})^2 = \frac{1}{n}\sum_i x_i^2 - 2\bar{x}\cdot\frac{1}{n}\sum_i x_i + \bar{x}^2 = \frac{1}{n}\sum_i x_i^2 - \bar{x}^2$.
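As a concrete illustration, here is a minimal numpy sketch of both moment estimators; the true parameter values, sample sizes, and seed are arbitrary choices for illustration, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed fixed only for reproducibility

# Poisson: one parameter, one moment equation  mu_1(lambda) = lambda = mu_hat_1.
x = rng.poisson(lam=3.0, size=1000)
lam_mom = x.mean()                      # method of moments estimate of lambda

# Normal: two parameters, two moment equations  mu_1 = mu,  mu_2 = mu^2 + sigma^2.
y = rng.normal(loc=1.0, scale=2.0, size=1000)
mu_hat_1, mu_hat_2 = y.mean(), (y**2).mean()   # first two sample moments
mu_mom = mu_hat_1
sigma2_mom = mu_hat_2 - mu_hat_1**2     # equals ((y - y.mean())**2).mean()

print(lam_mom, mu_mom, sigma2_mom)      # near 3.0, 1.0, 4.0 for these choices
```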
Maximum Likelihood
The likelihood is the density of the data viewed as a function of $\theta$. When the data are i.i.d. (independent and identically distributed) with density $f_\theta$, then

$$L(\theta) = f_\theta(X_1, \dots, X_n) = \prod_{i=1}^{n} f_\theta(X_i).$$

The maximum likelihood estimate (MLE), $\hat\theta$, is the $\theta$ that maximizes the likelihood. The idea is that $\hat\theta$ is the value of $\theta$ for which the observed sample is most likely.
Finding MLE
• Helpful to take the log
• Usually use calculus ($L'(\theta) = 0$)
• Remember to check values at boundaries and second derivatives
MLE: Example
Poisson.

$$L(\theta) = \prod_{i=1}^{n} \frac{\lambda^{X_i} e^{-\lambda}}{X_i!}$$

Take logs ($\arg\max_x \log f(x) = \arg\max_x f(x)$):

$$l(\theta) = \log L(\theta) = \sum_{i=1}^{n} \big(X_i \log \lambda - \lambda - \log X_i!\big)$$

$$l'(\theta) = \frac{1}{\lambda} \sum_i X_i - n,$$

i.e. the MLE is $\hat\lambda = \frac{1}{n} \sum_i X_i$. In this example, the method of moments and the MLE give the same answer.

Note that we don't really need all of $X = (X_1, \dots, X_n)$ here, only the sum. $T(X) = \sum_i X_i$ is called a sufficient statistic.
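The closed form is easy to check numerically. A minimal sketch, assuming scipy is available: minimize the negative log-likelihood on simulated data (the seed and parameter values are illustrative) and compare with the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(1)
x = rng.poisson(lam=4.0, size=500)

def neg_log_lik(lam):
    # -l(lambda) = -[ (sum_i X_i) log(lambda) - n*lambda - sum_i log(X_i!) ]
    return -(x.sum() * np.log(lam) - x.size * lam - gammaln(x + 1).sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 100.0), method="bounded")
print(res.x, x.mean())   # the numerical maximizer matches (1/n) sum_i X_i
```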
Bias and MSE
Suppose $\tilde\theta(X)$ is an estimator of $\theta$. Then

$$\text{Bias}_{\tilde\theta} = E_\theta[\tilde\theta(X)] - \theta, \qquad \text{MSE}_{\tilde\theta} = E_\theta\big[(\tilde\theta(X) - \theta)^2\big].$$

Note:
• $\tilde\theta(X)$ is random (a function of the sample)
• Bias and MSE are nonrandom
• Bias and MSE are functions of $\theta$

When $E[\tilde\theta] = \theta$, the estimator is called unbiased.
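Since bias and MSE are expectations, they can be approximated by simulation. Below is a sketch with a helper whose name (mc_bias_mse) and defaults are my own invention for illustration, applied to the Poisson mean estimator from the previous slide.

```python
import numpy as np

def mc_bias_mse(estimator, sampler, theta, n, reps=50_000, seed=0):
    """Approximate Bias and MSE of an estimator at a fixed theta by simulation."""
    rng = np.random.default_rng(seed)
    est = np.array([estimator(sampler(rng, n)) for _ in range(reps)])
    return est.mean() - theta, ((est - theta) ** 2).mean()

# Example: the sample mean as an estimator of lambda in the Poisson(2) model.
bias, mse = mc_bias_mse(
    estimator=np.mean,
    sampler=lambda rng, n: rng.poisson(lam=2.0, size=n),
    theta=2.0,
    n=50,
)
print(bias, mse)   # bias ~ 0 (unbiased); MSE ~ Var[Xbar] = lambda/n = 0.04
```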
Bias Example: Sample variance
For an i.i.d. sample $X_1, \dots, X_n$ from $N(\mu, \sigma^2)$, the method of moments and maximum likelihood estimators coincide and are

$$\hat\sigma^2 = \frac{1}{n} \sum_i (X_i - \bar{X})^2.$$

This estimator is biased:

$$\begin{aligned}
E[\hat\sigma^2] &= \frac{1}{n}\, E\Big[\sum_i \big(X_i^2 - 2 X_i \bar{X} + \bar{X}^2\big)\Big] \\
&= \frac{1}{n}\, E\Big[\sum_i X_i^2 - n \bar{X}^2\Big] = E[X_1^2] - E[\bar{X}^2] \\
&= (\sigma^2 + \mu^2) - \Big(\frac{\sigma^2}{n} + \mu^2\Big) = \sigma^2\,\frac{n-1}{n},
\end{aligned}$$

which implies

$$\text{Bias} = E[\hat\sigma^2] - \sigma^2 = -\frac{\sigma^2}{n}.$$
Bias Example: Sample variance
From last lecture, the sample variance is

$$S_n^2 = \frac{1}{n-1} \sum_i (X_i - \bar{X})^2.$$

By the preceding derivation, $S_n^2$ is an unbiased estimator of $\sigma^2$.
Frequently method of moments estimators and MLEs are biased and can be made slightly better by a small change.
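A quick simulation (all parameter choices illustrative) confirms the bias just derived: the $1/n$ estimator underestimates $\sigma^2$ by the factor $(n-1)/n$, while $S_n^2$ is centered correctly.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, n, reps = 4.0, 10, 200_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)  # sum_i (X_i - Xbar)^2

print((ss / n).mean())        # ~ sigma^2 (n-1)/n = 3.6   (biased MLE sigma_hat^2)
print((ss / (n - 1)).mean())  # ~ sigma^2 = 4.0           (unbiased S_n^2)
```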
Bias-variance tradeoff

We would like no bias and low variance. Often there is a choice:
• low bias, high variance
• some bias, some variance
• high bias, low variance

[Illustration]
Mean Squared Error
The MSE combines the bias and the variance of the estimator.
$$\begin{aligned}
\text{MSE}_{\tilde\theta} &= E\big[(\tilde\theta - \theta)^2\big] \\
&= E\big[(\tilde\theta - E[\tilde\theta] + E[\tilde\theta] - \theta)^2\big] \\
&= E\big[(\tilde\theta - E[\tilde\theta])^2\big] + \big(E[\tilde\theta] - \theta\big)^2 + 2\,E\big[\tilde\theta - E[\tilde\theta]\big]\,\big(E[\tilde\theta] - \theta\big) \\
&= \mathrm{Var}[\tilde\theta] + \text{Bias}^2,
\end{aligned}$$

since the cross term vanishes: $E[\tilde\theta - E[\tilde\theta]] = 0$.
Low bias and low variance are sometimes called accuracy and precision, respectively.
MSE Example: Sample Variance
To compute the MSE of $S_n^2$, recall that under independence and normality

$$\frac{\sum_i (X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{n-1},$$

which has mean $n-1$ and variance $2(n-1)$. Thus

$$\text{MSE}_{S_n^2} = \text{Bias}^2 + \mathrm{Var}[S_n^2] = \mathrm{Var}\Big[\frac{1}{n-1} \sum_i (X_i - \bar{X})^2\Big] = \frac{\sigma^4}{(n-1)^2} \times 2(n-1) = \frac{2\sigma^4}{n-1}.$$

For the MLE, however,

$$\text{MSE}_{\hat\sigma^2} = \text{Bias}^2 + \mathrm{Var}[\hat\sigma^2] = \Big(-\frac{\sigma^2}{n}\Big)^2 + \mathrm{Var}\Big[\frac{1}{n} \sum_i (X_i - \bar{X})^2\Big] = \frac{\sigma^4}{n^2} + \sigma^4\,\frac{2(n-1)}{n^2} = \sigma^4\,\frac{2n-1}{n^2} < \text{MSE}_{S_n^2}.$$
MSE Example: Sample Variance
MSE is perhaps not the most natural criterion for scale parameters. In fact, the minimum-MSE estimator (among multiples of $\sum_i (X_i - \bar{X})^2$) is

$$\frac{1}{n+1} \sum_i (X_i - \bar{X})^2.$$

All three estimators are asymptotically identical ($\lim_{n\to\infty} \frac{n+1}{n-1} = 1$).
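A sketch of the MSE comparison for the three divisors $n-1$, $n$, and $n+1$, on normal data with $\sigma^2 = 1$ (all choices illustrative); the simulated MSEs should approach $2/(n-1)$, $(2n-1)/n^2$, and $2/(n+1)$, respectively.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2, n, reps = 1.0, 10, 500_000

x = rng.normal(0.0, 1.0, size=(reps, n))
ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)  # sum_i (X_i - Xbar)^2

for d in (n - 1, n, n + 1):            # unbiased, MLE, minimum-MSE divisors
    print(d, (((ss / d) - sigma2) ** 2).mean())
# theory for n = 10: 2/(n-1) = 0.2222, (2n-1)/n^2 = 0.19, 2/(n+1) = 0.1818
```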
Method of moments and MLE estimators are rarely unbiased.
However, MLE has nice asymptotic properties.
Example: Gamma distribution, one observation
Recall that $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt$. Important property: $\Gamma(\alpha) = (\alpha - 1)\,\Gamma(\alpha - 1)$ for $\alpha > 1$.

The Gamma distribution with parameters $\alpha, \beta > 0$ has density

$$f_{\alpha,\beta}(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}, \quad x > 0.$$

We have one observation, $X \in \mathbb{R}_+$, and we know $\alpha$. Find the MLE for $\beta$; compute its bias and MSE.

Solution:

$$l(\beta) = \alpha \log \beta - \log \Gamma(\alpha) + (\alpha - 1) \log x - \beta x$$
$$l'(\beta) = \frac{\alpha}{\beta} - x = 0,$$

so our candidate is $\hat\beta = \frac{\alpha}{x}$.
Example: Gamma distribution, one observation (2)
Check:

$$l''(\beta) = -\frac{\alpha}{\beta^2} < 0$$

and $\frac{\alpha}{x} > 0$, i.e. $\hat\beta$ is in the parameter space.

Moments (assuming $\alpha > 2$ so the integrals converge):

$$\begin{aligned}
E_\beta[\hat\beta] &= \int_0^\infty \frac{\alpha}{x}\, \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}\,dx \\
&= \frac{\alpha \beta\, \Gamma(\alpha-1)}{\Gamma(\alpha)} \int_0^\infty \frac{\beta^{\alpha-1}}{\Gamma(\alpha-1)}\, x^{(\alpha-1)-1} e^{-\beta x}\,dx \\
&= \beta\,\frac{\alpha}{\alpha - 1},
\end{aligned}$$

since the last integrand is the Gamma$(\alpha-1, \beta)$ density and integrates to 1. In the same way,

$$E_\beta[\hat\beta^2] = \alpha^2 \beta^2\, \frac{\Gamma(\alpha - 2)}{\Gamma(\alpha)} = \beta^2\, \frac{\alpha^2}{(\alpha-1)(\alpha-2)}.$$
Example: Gamma distribution, one observation (3)
We found $E_\beta[\hat\beta] = \beta\,\frac{\alpha}{\alpha-1}$ and $E_\beta[\hat\beta^2] = \beta^2\,\frac{\alpha^2}{(\alpha-1)(\alpha-2)}$, so

$$\text{Bias} = E[\hat\beta] - \beta = \Big(\frac{\alpha}{\alpha - 1} - 1\Big)\beta = \frac{\beta}{\alpha - 1}$$

$$\begin{aligned}
\text{MSE} &= \text{Bias}^2 + \mathrm{Var}[\hat\beta] \\
&= \beta^2 \Big(\frac{\alpha}{\alpha-1} - 1\Big)^2 + E[\hat\beta^2] - E[\hat\beta]^2 \\
&= \frac{\beta^2}{(\alpha-1)^2} + \beta^2\Big(\frac{\alpha^2}{(\alpha-1)(\alpha-2)} - \frac{\alpha^2}{(\alpha-1)^2}\Big) \\
&= \beta^2\, \frac{\alpha + 2}{(\alpha-1)(\alpha-2)}.
\end{aligned}$$
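These formulas are easy to sanity-check by simulation, one observation per replication ($\alpha$, $\beta$, and the seed are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, beta, reps = 5.0, 2.0, 1_000_000

# One observation per replication: X ~ Gamma(alpha, rate beta); numpy uses scale = 1/beta.
x = rng.gamma(shape=alpha, scale=1.0 / beta, size=reps)
beta_hat = alpha / x

print(beta_hat.mean() - beta, beta / (alpha - 1))   # bias: simulated vs beta/(alpha-1) = 0.5
print(((beta_hat - beta) ** 2).mean(),
      beta**2 * (alpha + 2) / ((alpha - 1) * (alpha - 2)))  # MSE: simulated vs 7/3
```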
Properties of MLE
Invariance. MLEs are invariant under transformations. Ex: in the Poisson model, $\theta = \frac{1}{\lambda}$ measures the waiting time between observations. By invariance,

$$\hat\theta = \frac{1}{\hat\lambda} = \frac{n}{\sum_i X_i}.$$

Consistency. Let $\hat\theta_n$ be the MLE obtained from $X_1, \dots, X_n$. Then, under minimal technical conditions,

$$\hat\theta_n \xrightarrow{P} \theta.$$

Compare with the statements of:
• unbiasedness: $E[\hat\theta] = \theta$,
• strong consistency: $\hat\theta_n \to \theta$ a.s.

Method of moments estimators are often also consistent.
Properties of MLE (2)
Theorem (Cramér-Rao). Under regularity conditions, notably

$$\frac{\partial}{\partial \theta}\, E\,\tilde\theta(X) = \int \tilde\theta(x)\, \frac{\partial}{\partial \theta} f(x \mid \theta)\, dx,$$

we have

$$\mathrm{Var}_\theta[\tilde\theta(X)] \ge \frac{\big(1 + \text{Bias}'(\theta)\big)^2}{n\,I(\theta)},$$

where

$$I(\theta) = E_\theta\Big[\Big(\frac{\partial}{\partial \theta} \log f(X \mid \theta)\Big)^2\Big] = -E_\theta\Big[\frac{\partial^2}{\partial \theta^2} \log f(X \mid \theta)\Big]$$

is the Fisher information of $f_\theta$.

Theorem (Fisher). The MLE achieves the bound asymptotically, and

$$\sqrt{n}\,\big(\hat\theta - \theta\big) \xrightarrow{D} N\Big(0, \frac{1}{I(\theta)}\Big).$$

The MLE is asymptotically efficient, i.e. it attains the lowest possible variance.
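A small simulation illustrating Fisher's theorem in the Poisson model, where $I(\lambda) = 1/\lambda$ (parameter values and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
lam, n, reps = 3.0, 200, 50_000

# The MLE in each replication is the sample mean; I(lambda) = 1/lambda for Poisson.
lam_hat = rng.poisson(lam, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (lam_hat - lam)

print(z.mean(), z.std())   # ~ 0 and ~ sqrt(1/I(lambda)) = sqrt(3) = 1.732
```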
Example: Gamma, n observations
Let $X_1, \dots, X_n$ be i.i.d. Gamma$(\alpha, \beta)$ with both $\alpha$ and $\beta$ unknown.

$$L(\alpha, \beta) = \prod_{i=1}^{n} f_{\alpha,\beta}(x_i) = \prod_{i=1}^{n} \frac{\beta^\alpha}{\Gamma(\alpha)}\, x_i^{\alpha-1} e^{-\beta x_i}$$

$$l(\alpha, \beta) = \sum_i \big(\alpha \log \beta - \log \Gamma(\alpha) + (\alpha - 1) \log x_i - \beta x_i\big)$$

$$l_\alpha(\alpha, \beta) = n \log \beta + \sum_i \log x_i - n\,\frac{\Gamma'(\alpha)}{\Gamma(\alpha)}$$

$$l_\beta(\alpha, \beta) = n\,\frac{\alpha}{\beta} - \sum_i x_i$$

The last equation gives $\hat\beta = \frac{n \hat\alpha}{\sum_i x_i}$; we must then solve $l_\alpha(\alpha, \beta) = 0$ numerically.

It is hard to compute the small-sample bias and MSE here, so we use asymptotic methods.
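The numerical solve is short with scipy: substituting $\beta = \alpha/\bar{x}$ into $l_\alpha = 0$ leaves a single profile equation in $\alpha$, which (being decreasing in $\alpha$) a bracketing root-finder handles. The simulated data, seed, and bracket below are illustrative choices.

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(6)
# Simulated data with alpha = 3, beta = 2 (numpy parameterizes by scale = 1/beta).
x = rng.gamma(shape=3.0, scale=0.5, size=2000)

xbar, mean_log = x.mean(), np.log(x).mean()

# Substituting beta = alpha / xbar into l_alpha(alpha, beta) = 0 leaves
#   log(alpha / xbar) + mean(log x_i) - psi(alpha) = 0,
# where psi = Gamma'/Gamma is the digamma function.
def profile_score(a):
    return np.log(a / xbar) + mean_log - digamma(a)

alpha_hat = brentq(profile_score, 1e-6, 1e6)   # bracket chosen generously
beta_hat = alpha_hat / xbar
print(alpha_hat, beta_hat)   # near (3, 2) at this sample size
```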
Example: Gamma, n observations (2)
Fisher information: $n I(\theta) = -E[l''(\theta)]$ (here a $2 \times 2$ matrix, since $\theta = (\alpha, \beta)$). The second derivatives are

$$l_{\alpha,\alpha}(\alpha, \beta) = -n\,\frac{d}{d\alpha}\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} = -n\,\frac{\Gamma''(\alpha)\Gamma(\alpha) - \Gamma'(\alpha)\Gamma'(\alpha)}{\Gamma(\alpha)^2}$$

$$l_{\alpha,\beta}(\alpha, \beta) = \frac{n}{\beta}$$

$$l_{\beta,\beta}(\alpha, \beta) = -n\,\frac{\alpha}{\beta^2}$$

We can use e.g. the approximation

$$\sqrt{n}\,\big(\hat\beta - \beta\big) \sim N\Big(0, \frac{\beta^2}{\alpha}\Big), \quad \text{i.e.} \quad \hat\beta \sim N\Big(\beta, \frac{\beta^2}{n\alpha}\Big).$$