
**STA705 Spring 2006**

Let f(x) be a function (possibly multivariate) and suppose we are interested in determining the maximum of f and, often more importantly, the value of x which maximizes f. The most common statistical application of this problem is finding a Maximum Likelihood Estimate (MLE). This document discusses the Newton-Raphson method.

1 Motivation

Newton-Raphson maximization is based on a Taylor series expansion of the function f(x). Specifically, if we expand f(x) about a point a,

f(x) ≈ f(a) + (x − a)^T f'(a) + (1/2)(x − a)^T f''(a)(x − a)

where f'(·) is the gradient vector and f''(·) is the Hessian matrix of second derivatives. This creates a quadratic approximation for f. We know how to maximize a quadratic function (take derivatives, set equal to zero, and solve):

d/dx [ f(a) + (x − a)^T f'(a) + (1/2)(x − a)^T f''(a)(x − a) ] = f'(a) + f''(a)(x − a) = 0

x = a − [f''(a)]^{-1} f'(a)

The Newton-Raphson process iterates this equation. Specifically, let x_0 be a starting point for the algorithm and define successive estimates x_1, x_2, ... recursively through the equation

x_{i+1} = x_i − [f''(x_i)]^{-1} f'(x_i)

If the function f(x) is quadratic, then of course the quadratic "approximation" is exact and the Newton-Raphson method converges to the maximum in one iteration. If the function is concave, then the Newton-Raphson method is guaranteed to converge to the correct answer. If the function is convex for some values of x, then the algorithm may or may not converge. The NR algorithm may converge to a local maximum rather than the global maximum, it might converge to a local minimum, or it might cycle between two points. Starting the algorithm near the global maximum is the best practical method for helping convergence to the global maximum.

Fortunately, loglikelihoods are typically approximately quadratic (the reason asymptotic normality occurs for many random variables). Thus, the NR algorithm is an obvious choice for finding MLEs. The starting value for the algorithm is often a "simpler" estimate (in terms of ease of computation) of the parameter, such as a method of moments estimator.
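As a minimal sketch of the iteration above (in Python, with function names of my choosing, not taken from these notes), each step amounts to solving one linear system with the Hessian:

```python
import numpy as np

def newton_raphson(grad, hess, x0, tol=1e-8, max_iter=100):
    """Iterate x_{i+1} = x_i - [f''(x_i)]^{-1} f'(x_i) until the step is negligible."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # Solve f''(x) * step = f'(x) instead of forming the inverse explicitly.
        step = np.linalg.solve(hess(x), grad(x))
        x = x - step
        if np.max(np.abs(step)) < tol:
            break
    return x

# For a quadratic such as f(x) = -(x1 - 1)^2 - 2(x2 + 3)^2 the "approximation"
# is exact, so one iteration lands on the maximum from any starting point.
g = lambda x: np.array([-2.0 * (x[0] - 1.0), -4.0 * (x[1] + 3.0)])
H = lambda x: np.array([[-2.0, 0.0], [0.0, -4.0]])
x_hat = newton_raphson(g, H, [10.0, 10.0])  # converges to (1, -3)
```

Starting at (10, 10), a single step reaches the maximizer (1, −3) exactly, illustrating the one-iteration convergence claimed above for quadratic f.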


Example

Let X_1, ..., X_n ∼ Gamma(α, β) and suppose we want the joint maximum likelihood estimate of (α, β). The loglikelihood is

ln f(x|α, β) = ln [ (β^{nα}/Γ^n(α)) (∏ x_i)^{α−1} exp(−β Σ x_i) ]
             = nα ln β − n ln Γ(α) + (α − 1) Σ ln x_i − β Σ x_i

Solving for the MLE analytically requires solving the equations

∂/∂α ln f(x|α, β) = n ln β − n ∂ ln Γ(α)/∂α + Σ ln x_i = 0

∂/∂β ln f(x|α, β) = nα/β − Σ x_i = 0

These two equations cannot be solved analytically (the Gamma function is difficult to work with). The two equations do provide us with the gradient

f'(α, β) = [ n ln β − n ∂ ln Γ(α)/∂α + Σ ln x_i ,  nα/β − Σ x_i ]^T

The Hessian matrix is

f''(α, β) = [ −n ∂² ln Γ(α)/∂α²    n/β     ]
            [ n/β                  −nα/β²  ]

The starting values of the algorithm may be found using the method of moments. Since E[X_i] = α/β and V[X_i] = α/β², the method of moments estimators are α_M = x̄²/s² and β_M = x̄/s².

Suppose we have data (in truth actually generated with α = 2 and β = 3) such that n = 1000, Σ ln X_i = −646.0951, X̄ = 0.6809364, and s² = 0.2235679. The algorithm begins at α_M = 2.073976 and β_M = 3.04577. After 3 iterations, the NR algorithm stabilizes at α̂ = 2.060933 and β̂ = 3.026616.
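Using only the summary statistics above (and noting that Σ x_i = n X̄), the two-parameter iteration can be sketched as follows. This is an illustrative Python sketch, not code from the notes; scipy's digamma and polygamma supply the derivatives of ln Γ(α):

```python
import numpy as np
from scipy.special import digamma, polygamma

# Summary statistics quoted in the text.
n, sum_log_x, xbar, s2 = 1000, -646.0951, 0.6809364, 0.2235679

def grad(theta):
    a, b = theta
    return np.array([n * np.log(b) - n * digamma(a) + sum_log_x,
                     n * a / b - n * xbar])          # sum(x_i) = n * xbar

def hess(theta):
    a, b = theta
    return np.array([[-n * polygamma(1, a), n / b],  # polygamma(1, .) is trigamma
                     [n / b, -n * a / b ** 2]])

theta = np.array([xbar ** 2 / s2, xbar / s2])        # method-of-moments start
for _ in range(10):
    theta = theta - np.linalg.solve(hess(theta), grad(theta))
a_hat, b_hat = theta
```

Starting from (α_M, β_M) = (2.073976, 3.04577), the iterates settle near the (α̂, β̂) reported above, with β̂ = α̂/X̄ holding at convergence.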

Note that there is no need to treat this situation as a multivariate problem. The partial derivative for β may be solved in terms of α:

∂/∂β ln f(x|α, β) = nα/β − Σ x_i = 0

β = nα/Σ x_i = α/x̄

Thus, for each value of α, the maximum over β is attained at β = nα/Σ x_i, so we can reduce the problem to the one-dimensional problem of maximizing

ln f(x|α, β = α/x̄) = nα ln(α/x̄) − n ln Γ(α) + (α − 1) Σ ln x_i − (α/x̄) Σ x_i
                   = nα ln α − nα ln x̄ − n ln Γ(α) + (α − 1) Σ ln x_i − nα

This is called the profile loglikelihood. The first and second derivatives are

∂/∂α = n[1 + ln α] − n ln x̄ − n ∂ ln Γ(α)/∂α + Σ ln x_i − n = n ln α − n ln x̄ − n ∂ ln Γ(α)/∂α + Σ ln x_i

∂²/∂α² = n/α − n ∂² ln Γ(α)/∂α²

Again starting at α_M = 2.073976, the algorithm converges to α̂ = 2.060933 after ???? iterations.
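The one-dimensional iteration can be sketched the same way (again an illustrative Python sketch under the same assumptions as before, not code from the notes):

```python
import numpy as np
from scipy.special import digamma, polygamma

n, sum_log_x, xbar = 1000, -646.0951, 0.6809364  # summary statistics from the text

def d1(a):  # first derivative of the profile loglikelihood
    return n * np.log(a) - n * np.log(xbar) - n * digamma(a) + sum_log_x

def d2(a):  # second derivative
    return n / a - n * polygamma(1, a)

a = 2.073976                 # alpha_M, the method-of-moments start
for _ in range(10):
    a = a - d1(a) / d2(a)    # scalar Newton-Raphson step
b = a / xbar                 # recover beta from the profile relation
```

This recovers the same (α̂, β̂) as the two-parameter run, while only ever inverting a scalar second derivative.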

