
Data Analysis and Statistical Arbitrage

Lecture 2: Point estimation

Oli Atlason

Outline

Parametric Statistical Models

Method of Moments and Maximum Likelihood

Bias and MSE

Asymptotic Properties

Parametric Statistical Models

Example: We observe $X_1, \dots, X_n$.

$X_i$: the number of alpha particles emitted by a sample during the $i$-th time interval of an experiment.

Natural model: $X_i \sim \mathrm{Poisson}(\lambda)$, with $X_1, \dots, X_n$ independent.

Poisson distribution

$$P_\lambda(X_i = k) = p^{\text{Poisson}}_\lambda(k) = \frac{\lambda^k}{k!}\,e^{-\lambda}, \quad k \in \mathbb{N},\ \lambda > 0$$

The formal model consists, for a given $n$, of the family

$$\{p^{\text{Poisson}}_\lambda(X_i)\}_{\lambda \in \mathbb{R}_+};$$

here the parameter space is $\mathbb{R}_+$.

Parametric Statistical Models

General definition: A parametric statistical model for observations $X = (X_1, \dots, X_n)$ is a family $\{f_\theta(x)\}_{\theta \in \Theta}$ of probability distributions.

We want to know which $f_\theta$ is responsible for the data.

Point estimator: a function $\tilde\theta(x)$ of the observations. This is a guess at which $\theta$ was used to generate the data.

Sampling distribution: $\tilde\theta(X)$ takes random values; it has a distribution derived from that of $X$.

Method of Moments

The method of moments is a simple way of finding an estimator.

$k$-th sample moment: $\hat\mu_k = \frac{1}{n}\sum_{i=1}^n (X_i)^k$

$k$-th population moment: $\mu_k(\theta) = E_\theta[X^k]$

Now solve for $\theta$ in the system $(\mu_k(\theta) = \hat\mu_k)_{k=1,\dots,p}$.

Usually one needs as many equations ($p$) as there are parameters.

Method of Moments: Examples

Poisson: For the Poisson model, $\theta = \lambda$. Also

$$E_\lambda[X] = \sum_{k=0}^\infty k\,P(X = k) = \sum_{k=0}^\infty k\,\frac{\lambda^k e^{-\lambda}}{k!} = \lambda,$$

thus the method of moments estimator is $\tilde\lambda(X) = \hat\mu_1 = \frac{1}{n}\sum_i X_i$.

Normal: The parameters are two, $\theta = (\mu, \sigma^2)$, so we need two equations. We know that $E[X] = \mu$ and $\mathrm{Var}[X] = \sigma^2$. Thus

$$\mu_1 = \mu, \qquad \mu_2 = \mu^2 + \sigma^2.$$

The method of moments estimator is $\tilde\mu = \hat\mu_1$ and

$$\tilde\sigma^2 = \hat\mu_2 - \hat\mu_1^2 = \frac{1}{n}\sum_i x_i^2 - \bar x^2 = \frac{1}{n}\sum_i \left((x_i - \bar x)^2 + 2 x_i \bar x - \bar x^2\right) - \bar x^2 = \frac{1}{n}\sum_i (x_i - \bar x)^2.$$
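As a quick check of these formulas, here is a minimal Python sketch (assuming NumPy is available) that computes the method of moments estimates for simulated Poisson and Normal samples; the simulated parameters and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Poisson: the first sample moment estimates lambda
x_pois = rng.poisson(lam=3.0, size=1000)
lam_mom = x_pois.mean()                           # hat{mu}_1

# Normal: match the first two moments to (mu, mu^2 + sigma^2)
x_norm = rng.normal(loc=1.0, scale=2.0, size=1000)
mu_mom = x_norm.mean()                            # hat{mu}_1
sigma2_mom = (x_norm ** 2).mean() - mu_mom ** 2   # hat{mu}_2 - hat{mu}_1^2

print(lam_mom, mu_mom, sigma2_mom)
```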

Maximum Likelihood

The likelihood is the density of the data viewed as a function of $\theta$.

When the data are i.i.d. (independent and identically distributed) with density $f_\theta$, then

$$L(\theta) = f_\theta(X_1, \dots, X_n) = \prod_{i=1}^n f_\theta(X_i).$$

The maximum likelihood estimate (MLE), $\hat\theta$, is the value of $\theta$ that maximizes the likelihood.

The idea is that $\hat\theta$ is the value of $\theta$ for which the observed sample is most likely.

Finding MLE

- Helpful to take the log
- Usually use calculus ($L'(\theta) = 0$)
- Remember to check values at boundaries and second derivatives

MLE: Example

Poisson

$$L(\lambda) = \prod_{i=1}^n \frac{\lambda^{X_i} e^{-\lambda}}{X_i!}$$

Take logs ($\mathrm{argmax}_x \log f(x) = \mathrm{argmax}_x f(x)$):

$$l(\lambda) = \log L(\lambda) = \sum_{i=1}^n \left(X_i \log\lambda - \lambda - \log X_i!\right)$$

$$l'(\lambda) = \frac{1}{\lambda}\sum_i X_i - n,$$

i.e. the MLE is $\hat\lambda = \frac{1}{n}\sum_i X_i$.

In this example, the method of moments and maximum likelihood give the same answer.

Note that we don't really need $X = (X_1, \dots, X_n)$ here, only the sum. $T(X) = \sum_i X_i$ is called a sufficient statistic.
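A minimal sketch (assuming NumPy and SciPy are available) that checks the closed-form MLE $\hat\lambda = \bar X$ against a direct numerical maximization of the log-likelihood; the simulated data and bounds are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(1)
x = rng.poisson(lam=2.5, size=500)

def neg_loglik(lam):
    # minus the Poisson log-likelihood; gammaln(k+1) = log(k!)
    return -np.sum(x * np.log(lam) - lam - gammaln(x + 1))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded")
print(res.x, x.mean())   # numerical MLE vs closed-form sample mean
```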

Bias and MSE

$\tilde\theta(X)$ is an estimator of $\theta$. Then

$$\mathrm{Bias}_{\tilde\theta} = E_\theta[\tilde\theta(X)] - \theta, \qquad \mathrm{MSE}_{\tilde\theta} = E_\theta\left[(\tilde\theta(X) - \theta)^2\right].$$

Note:

- $\tilde\theta(X)$ is random (a function of the sample)
- Bias and MSE are non-random
- Bias and MSE are functions of $\theta$

When $E_\theta[\tilde\theta] = \theta$, the estimator is called unbiased.

Bias Example: Sample variance

For an i.i.d. sample $X_1, \dots, X_n$ from $N(\mu, \sigma^2)$, the method of moments and maximum likelihood estimators coincide and are

$$\hat\sigma^2 = \frac{1}{n}\sum_i (X_i - \bar X)^2.$$

This estimator is biased:

$$E[\hat\sigma^2] = \frac{1}{n}\sum_i E\left[X_i^2 - 2 X_i \bar X + \bar X^2\right] = \frac{1}{n}\left(\sum_i E[X_i^2] - n\,E[\bar X^2]\right) = E[X_i^2] - E[\bar X^2] = (\sigma^2 + \mu^2) - \left(\frac{\sigma^2}{n} + \mu^2\right) = \sigma^2\,\frac{n-1}{n},$$

which implies

$$\mathrm{Bias} = E[\hat\sigma^2] - \sigma^2 = -\frac{\sigma^2}{n}.$$

Bias Example: Sample variance

From last lecture, the sample variance is

$$S_n^2 = \frac{1}{n-1}\sum_i (X_i - \bar X)^2;$$

by the preceding derivation, $S_n^2$ is an unbiased estimator of $\sigma^2$.

Frequently, method of moments estimators and MLEs are biased and can be made slightly better by a small change.
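A minimal simulation sketch (assuming NumPy) comparing the average of $\hat\sigma^2$ and $S_n^2$ over many samples; the sample size, number of replications, and true variance are chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, sigma2 = 10, 100_000, 4.0

x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
sigma2_mle = x.var(axis=1, ddof=0)    # divide by n   (biased)
s2 = x.var(axis=1, ddof=1)            # divide by n-1 (unbiased)

print(sigma2_mle.mean())   # approx sigma2 * (n-1)/n = 3.6
print(s2.mean())           # approx sigma2 = 4.0
```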

Bias-variance trade-off

We would like no bias and low variance.

Often there is a choice:

- low bias, high variance

- some bias, some variance

- high bias, low variance

[Illustration: bias-variance trade-off]

Mean Squared Error

The MSE combines the bias and the variance of the estimator.

$$\begin{aligned}
\mathrm{MSE}_{\tilde\theta} &= E\left[(\tilde\theta - \theta)^2\right] \\
&= E\left[\left(\tilde\theta - E[\tilde\theta] + E[\tilde\theta] - \theta\right)^2\right] \\
&= E\left[(\tilde\theta - E[\tilde\theta])^2\right] + \left(E[\tilde\theta] - \theta\right)^2 + 2\,E\left[\tilde\theta - E[\tilde\theta]\right]\left(E[\tilde\theta] - \theta\right) \\
&= \mathrm{Var}[\tilde\theta] + \mathrm{Bias}^2
\end{aligned}$$

Variance and bias are sometimes referred to as precision and accuracy, respectively.
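A minimal sketch (assuming NumPy) that estimates the bias, variance, and MSE of the Normal variance MLE by simulation and checks the decomposition $\mathrm{MSE} \approx \mathrm{Var} + \mathrm{Bias}^2$; all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, sigma2 = 20, 200_000, 2.0

x = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
est = x.var(axis=1, ddof=0)          # MLE of sigma^2, divides by n

bias = est.mean() - sigma2
var = est.var()
mse = np.mean((est - sigma2) ** 2)

print(mse, var + bias ** 2)          # the two should agree closely
```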

MSE Example: Sample Variance

To compute the MSE of $S_n^2$, recall that under independence and normality

$$\frac{\sum_i (X_i - \bar X)^2}{\sigma^2} \sim \chi^2_{n-1},$$

which has mean $n-1$ and variance $2(n-1)$. Thus

$$\mathrm{MSE}_{S_n^2} = \mathrm{Bias}^2 + \mathrm{Var}[S_n^2] = \mathrm{Var}\left[\frac{1}{n-1}\sum_i (X_i - \bar X)^2\right] = \sigma^4\,\frac{1}{(n-1)^2}\times 2(n-1) = \frac{2\sigma^4}{n-1}.$$

For the MLE, however,

$$\mathrm{MSE}_{\hat\sigma^2} = \mathrm{Bias}^2 + \mathrm{Var}[\hat\sigma^2] = \left(\frac{\sigma^2}{n}\right)^2 + \mathrm{Var}\left[\frac{1}{n}\sum_i (X_i - \bar X)^2\right] = \frac{\sigma^4 + 2(n-1)\sigma^4}{n^2} = \sigma^4\,\frac{2n-1}{n^2} < \mathrm{MSE}_{S_n^2}.$$

MSE Example: Sample Variance

MSE is perhaps not a natural criterion for scale parameters. In fact, the minimum-MSE estimator among those proportional to $\sum_i (X_i - \bar X)^2$ is

$$\frac{1}{n+1}\sum_i (X_i - \bar X)^2.$$

All three estimators are asymptotically identical ($\lim_{n\to\infty} \frac{n+1}{n-1} = 1$).

Method of moments and MLE estimators are rarely unbiased. However, the MLE has nice asymptotic properties.
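A small simulation sketch (assuming NumPy) comparing the Monte Carlo MSE of the three scalings $1/(n-1)$, $1/n$, and $1/(n+1)$ for Normal data; parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, sigma2 = 10, 200_000, 1.0

x = rng.normal(size=(reps, n))
ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)  # sum of squared deviations

for c, label in [(n - 1, "1/(n-1)"), (n, "1/n"), (n + 1, "1/(n+1)")]:
    est = ss / c
    print(label, np.mean((est - sigma2) ** 2))  # MSE; smallest for 1/(n+1)
```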

Example: Gamma distribution, one observation

Recall that $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt$.

Important property: $\Gamma(\alpha) = (\alpha - 1)\,\Gamma(\alpha - 1)$ for $\alpha > 1$.

The Gamma distribution with parameters $\alpha, \beta > 0$ has density

$$f_{\alpha,\beta}(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}\,x^{\alpha-1} e^{-\beta x}, \quad x > 0.$$

We have one observation, $X \in \mathbb{R}_+$, and we know $\alpha$. Find the MLE for $\beta$; compute its bias and MSE.

Solution:

$$l(\beta) = \alpha\log\beta - \log\Gamma(\alpha) + (\alpha - 1)\log x - \beta x, \qquad l'(\beta) = \frac{\alpha}{\beta} - x = 0,$$

so our candidate is $\hat\beta = \alpha / x$.

Example: Gamma distribution, one observation (2)

Check:

$$l''(\beta) = -\frac{\alpha}{\beta^2} < 0$$

and $\alpha/x > 0$, i.e. $\hat\beta$ is in the parameter space.

Moments:

$$E_\beta[\hat\beta] = \int_0^\infty \frac{\alpha}{x}\,\frac{\beta^\alpha}{\Gamma(\alpha)}\,x^{\alpha-1} e^{-\beta x}\,dx = \frac{\alpha\beta\,\Gamma(\alpha-1)}{\Gamma(\alpha)}\int_0^\infty \frac{\beta^{\alpha-1}}{\Gamma(\alpha-1)}\,x^{(\alpha-1)-1} e^{-\beta x}\,dx = \beta\,\frac{\alpha}{\alpha-1},$$

and in the same way

$$E_\beta[\hat\beta^2] = \alpha^2\beta^2\,\frac{\Gamma(\alpha-2)}{\Gamma(\alpha)} = \beta^2\,\frac{\alpha^2}{(\alpha-1)(\alpha-2)}.$$

Example: Gamma distribution, one observation (3)

We found $E_\beta[\hat\beta] = \beta\,\frac{\alpha}{\alpha-1}$ and $E_\beta[\hat\beta^2] = \beta^2\,\frac{\alpha^2}{(\alpha-1)(\alpha-2)}$, so

$$\mathrm{Bias} = E_\beta[\hat\beta] - \beta = \left(\frac{\alpha}{\alpha-1} - 1\right)\beta = \frac{\beta}{\alpha-1}$$

$$\begin{aligned}
\mathrm{MSE} &= \mathrm{Bias}^2 + \mathrm{Var}[\hat\beta] \\
&= \beta^2\left(\frac{\alpha}{\alpha-1} - 1\right)^2 + E[\hat\beta^2] - E[\hat\beta]^2 \\
&= \beta^2\left(\frac{1}{(\alpha-1)^2} + \frac{\alpha^2}{(\alpha-1)(\alpha-2)} - \frac{\alpha^2}{(\alpha-1)^2}\right) \\
&= \beta^2\,\frac{\alpha+2}{(\alpha-1)(\alpha-2)}
\end{aligned}$$
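A minimal simulation sketch (assuming NumPy) that checks these bias and MSE formulas for $\hat\beta = \alpha/X$ against Monte Carlo estimates; $\alpha$ and $\beta$ are arbitrary values with $\alpha > 2$ so the formulas apply.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, beta, reps = 5.0, 2.0, 500_000

# NumPy parameterizes the Gamma by shape and scale = 1/beta
x = rng.gamma(shape=alpha, scale=1.0 / beta, size=reps)
beta_hat = alpha / x

print(beta_hat.mean() - beta, beta / (alpha - 1))                     # bias
print(np.mean((beta_hat - beta) ** 2),
      beta ** 2 * (alpha + 2) / ((alpha - 1) * (alpha - 2)))          # MSE
```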

Properties of MLE

Invariance

MLEs are invariant under transformations.

Example: In the Poisson model, $\theta = \lambda^{-1}$ measures the waiting time between observations. By invariance, $\hat\theta = 1/\hat\lambda = n / \sum_i X_i$.

Consistency

$\hat\theta_n$ is the MLE obtained from $X_1, \dots, X_n$. Then, under minimal technical conditions,

$$\hat\theta_n \xrightarrow{P} \theta.$$

Compare with the statements of:

- unbiasedness ($E[\hat\theta] = \theta$),
- strong consistency, $\hat\theta_n \to \theta$ a.s.

Method of moments estimators are often also consistent.
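A small sketch (assuming NumPy) illustrating consistency: the Poisson MLE $\hat\lambda_n = \bar X_n$, and by invariance $\hat\theta_n = 1/\hat\lambda_n$, settle down around the true values as $n$ grows; the true $\lambda$ and sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
lam_true = 2.0

for n in [10, 100, 1_000, 10_000, 100_000]:
    x = rng.poisson(lam=lam_true, size=n)
    lam_hat = x.mean()                 # MLE of lambda
    theta_hat = 1.0 / lam_hat          # MLE of theta = 1/lambda by invariance
    print(n, lam_hat, theta_hat)       # approaches (2.0, 0.5)
```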

Properties of MLE (2)

Theorem (Cramer-Rao). Under regularity conditions, notably $\frac{\partial}{\partial\theta} E_\theta[\tilde\theta(X)] = \int \tilde\theta(x)\,\frac{\partial}{\partial\theta} f(x|\theta)\,dx$,

$$\mathrm{Var}_\theta[\tilde\theta(X)] \ \geq\ \frac{\left(1 + \mathrm{Bias}'(\theta)\right)^2}{n\,I(\theta)},$$

where

$$I(\theta) = E_\theta\left[\left(\frac{\partial}{\partial\theta}\log f(X|\theta)\right)^2\right] = -E_\theta\left[\frac{\partial^2}{\partial\theta^2}\log f(X|\theta)\right]$$

is the Fisher information of $f_\theta$.

Theorem (Fisher). The MLE achieves the bound asymptotically, and

$$\sqrt{n}\,(\hat\theta - \theta) \xrightarrow{D} N\!\left(0,\, I(\theta)^{-1}\right).$$

The MLE is asymptotically efficient, i.e. it attains the lowest possible asymptotic variance.

Example: Gamma, n observations

Let $X_1, \dots, X_n$ be i.i.d. $\mathrm{Gamma}(\alpha, \beta)$, with $\alpha$ and $\beta$ unknown.

$$L(\alpha, \beta) = \prod_{i=1}^n f_{\alpha,\beta}(x_i) = \prod_{i=1}^n \frac{\beta^\alpha}{\Gamma(\alpha)}\,x_i^{\alpha-1} e^{-\beta x_i}$$

$$l(\alpha, \beta) = \sum_i \left(\alpha\log\beta - \log\Gamma(\alpha) + (\alpha-1)\log x_i - \beta x_i\right)$$

$$l_\alpha(\alpha, \beta) = n\log\beta + \sum_i \log x_i - n\,\frac{\Gamma'(\alpha)}{\Gamma(\alpha)}$$

$$l_\beta(\alpha, \beta) = \frac{n\alpha}{\beta} - \sum_i x_i$$

The last equation gives $\hat\beta = n\hat\alpha / \sum_i x_i$; one must then solve $l_\alpha(\alpha, \hat\beta(\alpha)) = 0$ numerically.

Small-sample bias and MSE are hard to compute; use asymptotic methods.
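A minimal sketch (assuming NumPy and SciPy) of this two-step procedure: substitute $\hat\beta(\alpha) = n\alpha/\sum_i x_i = \alpha/\bar x$ into $l_\alpha$ and solve the resulting one-dimensional equation numerically; scipy.special.digamma plays the role of $\Gamma'(\alpha)/\Gamma(\alpha)$, and the simulated data and bracketing interval are illustrative.

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(7)
x = rng.gamma(shape=3.0, scale=1.0 / 2.0, size=2000)   # true alpha = 3, beta = 2
n, xbar, mean_log = len(x), x.mean(), np.log(x).mean()

def score_alpha(a):
    # l_alpha(alpha, beta_hat(alpha)) / n, with beta_hat(alpha) = alpha / xbar
    return np.log(a / xbar) + mean_log - digamma(a)

alpha_hat = brentq(score_alpha, 1e-3, 100.0)   # root of the profile score
beta_hat = alpha_hat / xbar
print(alpha_hat, beta_hat)
```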

Example: Gamma, n observations (2)

Fisher information: $nI(\theta) = -E[l''(\theta)]$. The second derivatives are

$$l_{\alpha,\alpha}(\alpha, \beta) = -n\,\frac{d}{d\alpha}\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} = -n\,\frac{\Gamma''(\alpha)\Gamma(\alpha) - \Gamma'(\alpha)^2}{\Gamma(\alpha)^2}$$

$$l_{\alpha,\beta}(\alpha, \beta) = \frac{n}{\beta}$$

$$l_{\beta,\beta}(\alpha, \beta) = -\frac{n\alpha}{\beta^2}$$

We can use, e.g., the approximation

$$\sqrt{n}\,(\hat\beta - \beta) \approx N\!\left(0, \frac{\beta^2}{\alpha}\right), \quad \text{i.e.} \quad \hat\beta \approx N\!\left(\beta, \frac{\beta^2}{n\alpha}\right).$$
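A small simulation sketch (assuming NumPy) that checks this normal approximation. For simplicity it treats $\alpha$ as known, so that $\hat\beta = \alpha/\bar X$, the setting in which the variance $\beta^2/(n\alpha)$ applies directly; all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
alpha, beta, n, reps = 3.0, 2.0, 200, 50_000

x = rng.gamma(shape=alpha, scale=1.0 / beta, size=(reps, n))
beta_hat = alpha / x.mean(axis=1)                  # MLE of beta with alpha known

print(beta_hat.mean(), beta)                       # approx beta
print(beta_hat.var(), beta ** 2 / (n * alpha))     # approx beta^2 / (n * alpha)
```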