Asymptotic Properties of the Maximum Likelihood Estimator

In this part of the course, we will consider the asymptotic properties of the maximum likelihood estimator. In particular, we will study issues of consistency, asymptotic normality, and efficiency. Many of the proofs will be rigorous, to display more generally useful techniques also for later chapters.

We suppose that X^n = (X_1, ..., X_n), where the X_i are i.i.d. with common density p(x; θ_0) ∈ P = {p(x; θ) : θ ∈ Θ}. We assume that θ_0 is identified in the sense that if θ ≠ θ_0 and θ ∈ Θ, then p(x; θ) ≠ p(x; θ_0) with respect to the dominating measure μ.

For fixed θ, the joint density of X^n is equal to the product of the individual densities, i.e.,

p(x^n; θ) = ∏_{i=1}^n p(x_i; θ)

As usual, when we think of p(x^n; θ) as a function of θ with x^n held fixed, we refer to the resulting function as the likelihood function, L(θ; x^n). The maximum likelihood estimate for observed x^n is the value θ̂(x^n) which maximizes L(θ; x^n). Prior to observation, x^n is unknown, so we consider the maximum likelihood estimator (MLE), θ̂(X^n), to be the value of θ which maximizes L(θ; X^n). Equivalently, the MLE can be taken to maximize the standardized log-likelihood,

l(θ; X^n)/n = (1/n) log L(θ; X^n) = (1/n) ∑_{i=1}^n log p(X_i; θ) = (1/n) ∑_{i=1}^n l(θ; X_i)
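To make the definitions concrete, here is a small sketch (my own illustration, not from the notes: the exponential model and all numerical values are invented) that maximizes the standardized log-likelihood over a grid and compares the result against the closed-form MLE; for p(x; θ) = θe^{−θx}, the maximizer is θ̂ = 1/x̄.

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = 2.0                                     # "true" parameter (illustrative choice)
x = rng.exponential(scale=1/theta0, size=5_000)  # X_i i.i.d. from p(x; theta0)

def std_loglik(theta, x):
    """Standardized log-likelihood (1/n) sum_i log p(X_i; theta)
    for the exponential density p(x; theta) = theta * exp(-theta * x)."""
    return np.log(theta) - theta * np.mean(x)

grid = np.linspace(0.1, 10.0, 2_000)             # compact grid standing in for Theta
mle_grid = grid[np.argmax([std_loglik(t, x) for t in grid])]
mle_closed = 1.0 / np.mean(x)                    # analytic maximizer for this model
print(mle_grid, mle_closed)
```

The grid maximizer and the analytic maximizer agree up to the grid spacing, and both sit near θ_0, which previews the consistency argument developed below.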

We will show that the MLE is

1. consistent: θ̂(X^n) →^P θ_0,

2. asymptotically normal: √n(θ̂(X^n) − θ_0) →^D Normal R.V.,

3. asymptotically efficient, i.e., if we want to estimate θ_0 by any other estimator within a reasonable class, the MLE is the most precise.

To show 1-3, we will have to provide some regularity conditions on the probability model and (for 3) on the class of estimators that will be considered.

Section 8.1 Consistency

We first want to show that if we have a sample of i.i.d. data from a common distribution which belongs to a probability model, then under some regularity conditions on the form of the density, the sequence of estimators {θ̂(X^n)} will converge in probability to θ_0. So far, we have not discussed the issue of whether a maximum likelihood estimator exists or, if one does, whether it is unique. We will get to this, but first we start with a heuristic proof of consistency.

Heuristic Proof: The MLE is the value of θ that maximizes Q(θ; X^n) := (1/n) ∑_{i=1}^n l(θ; X_i). By the WLLN, we know that for each θ,

Q(θ; X^n) = (1/n) ∑_{i=1}^n log p(X_i; θ) →^P E_{θ_0}[log p(X; θ)] =: Q_0(θ)

We expect that, on average, the log-likelihood will be close to the expected log-likelihood. Therefore, we expect that the maximum likelihood estimator will be close to the maximizer of the expected log-likelihood. We will show that the expected log-likelihood, Q_0(θ), is maximized at θ_0 (i.e., the truth).
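The heuristic can be checked by simulation. In this sketch (my own construction, not from the notes: an N(θ, 1) model with invented θ_0, for which Q_0(θ) = −(1/2)log(2π) − (1/2)(1 + (θ − θ_0)²) is available in closed form), the sample criterion stays uniformly close to Q_0 and its maximizer lands near θ_0.

```python
import numpy as np

rng = np.random.default_rng(1)
theta0, n = 1.5, 10_000
x = rng.normal(theta0, 1.0, size=n)
m1, m2 = x.mean(), (x**2).mean()          # sufficient statistics

grid = np.linspace(-2.0, 5.0, 1_401)
# Q(theta; X^n) = (1/n) sum_i log p(X_i; theta) for the N(theta, 1) density,
# written in terms of m1 and m2 since sum (X_i - t)^2 = n*(m2 - 2*t*m1 + t^2)
Qn = -0.5*np.log(2*np.pi) - 0.5*(m2 - 2*grid*m1 + grid**2)
# Q_0(theta) = E_{theta0}[log p(X; theta)], closed form for this model
Q0 = -0.5*np.log(2*np.pi) - 0.5*(1.0 + (grid - theta0)**2)

theta_hat = grid[np.argmax(Qn)]           # maximizer of the sample criterion
gap = np.max(np.abs(Qn - Q0))             # how far Q is from Q_0 over the grid
print(theta_hat, gap)
```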

Lemma 8.1: If θ_0 is identified and E_{θ_0}[|log p(X; θ)|] < ∞ for all θ ∈ Θ, then Q_0(θ) is uniquely maximized at θ = θ_0.

Proof: By Jensen's inequality, we know that for any strictly convex function g(·) and nondegenerate Y, E[g(Y)] > g(E[Y]). Take g(y) = −log(y). So, for θ ≠ θ_0,

E_{θ_0}[−log(p(X; θ)/p(X; θ_0))] > −log(E_{θ_0}[p(X; θ)/p(X; θ_0)])

Note that

E_{θ_0}[p(X; θ)/p(X; θ_0)] = ∫ (p(x; θ)/p(x; θ_0)) p(x; θ_0) dμ(x) = ∫ p(x; θ) dμ(x) = 1

so the right-hand side equals −log(1) = 0, and therefore

Q_0(θ_0) = E_{θ_0}[log p(X; θ_0)] > E_{θ_0}[log p(X; θ)] = Q_0(θ)

This inequality holds for all θ ≠ θ_0.
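Since Q_0(θ_0) − Q_0(θ) is exactly the Kullback-Leibler divergence between p(·; θ_0) and p(·; θ), the lemma can be sanity-checked numerically. This sketch (my own, not from the notes: a Bernoulli model with an invented θ_0) evaluates the closed-form Q_0 on a grid and confirms it peaks at the truth.

```python
import numpy as np

theta0 = 0.3   # "true" Bernoulli success probability (illustrative choice)

def Q0(theta):
    # Expected log-likelihood E_{theta0}[log p(X; theta)] for Bernoulli(theta):
    # log p(x; theta) = x*log(theta) + (1-x)*log(1-theta), E_{theta0}[X] = theta0
    return theta0*np.log(theta) + (1 - theta0)*np.log(1 - theta)

grid = np.linspace(0.01, 0.99, 9_801)
maximizer = grid[np.argmax(Q0(grid))]
kl = Q0(theta0) - Q0(grid)   # = KL(p_{theta0} || p_theta) >= 0, zero only at theta0
print(maximizer)
```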

Under technical conditions for the limit of the maximum to be the maximum of the limit, θ̂(X^n) should converge in probability to θ_0. Sufficient conditions for the limit of the maximum to be the maximum of the limit are that the convergence is uniform and the parameter space is compact.

The discussion so far only allows for a compact parameter space. In theory, compactness requires that one know bounds on the true parameter value, although this constraint is often ignored in practice. It is possible to drop this assumption if the function Q(θ; X^n) cannot rise too much as θ becomes unbounded. We will discuss this later.

Uniform convergence in probability means that for every ε > 0,

P_{θ_0}[sup_{θ∈Θ} |Q(θ; X^n) − Q_0(θ)| > ε] → 0

Why isn't pointwise convergence enough? Uniform convergence guarantees that for almost all realizations, the paths in θ are in the ε-sleeve. This ensures that the maximum is close to θ_0. For pointwise convergence, we know that at each θ, most of the realizations are in the ε-sleeve, but there is no guarantee that for another value of θ the same set of realizations are in the sleeve. Thus, the maximum need not be near θ_0.
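The sup-distance itself can be watched shrinking. In this sketch (my own Bernoulli illustration, not from the notes, with Θ taken as the compact interval [0.01, 0.99]), the supremum of |Q(θ; X^n) − Q_0(θ)| over a fine grid becomes small as n grows.

```python
import numpy as np

rng = np.random.default_rng(2)
theta0 = 0.3
grid = np.linspace(0.01, 0.99, 981)   # compact grid standing in for Theta

def Q0(theta):
    # closed-form expected log-likelihood for the Bernoulli model
    return theta0*np.log(theta) + (1 - theta0)*np.log(1 - theta)

sups = []
for n in [100, 1_000, 10_000, 100_000]:
    x = rng.binomial(1, theta0, size=n)
    xbar = x.mean()
    # Q(theta; X^n) depends on the data only through xbar for this model
    Qn = xbar*np.log(grid) + (1 - xbar)*np.log(1 - grid)
    sups.append(np.max(np.abs(Qn - Q0(grid))))
print(sups)
```

Here the sup-distance is proportional to |x̄ − θ_0|, so it inherits the WLLN rate; the dominance condition of Lemma 8.3 below explains why this uniformity is typical on compact Θ.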

Theorem 8.2: Suppose that Q(θ; X^n) is continuous in θ and there exists a function Q_0(θ) such that

1. Q_0(θ) is uniquely maximized at θ_0,

2. Θ is compact,

3. Q_0(θ) is continuous in θ,

4. Q(θ; X^n) converges uniformly in probability to Q_0(θ).

Then θ̂(X^n), defined for each X^n = x^n as the value of θ which maximizes the objective function Q(θ; X^n), satisfies θ̂(X^n) →^P θ_0.

Proof: For a positive ε, define the ε-neighborhood about θ_0 to be

N(ε) = {θ : ‖θ − θ_0‖ < ε}

We want to show that P_{θ_0}[θ̂(X^n) ∈ N(ε)] → 1 as n → ∞. Since N(ε) is an open set, N(ε)^C ∩ Θ is a compact set (Assumption 2). Since Q_0(θ) is a continuous function (Assumption 3), sup_{θ ∈ N(ε)^C ∩ Θ} {Q_0(θ)} is achieved at some θ* in this compact set. Since θ_0 is the unique maximizer, we may let δ = Q_0(θ_0) − Q_0(θ*) > 0.

Now for any θ ∈ Θ, we distinguish between two cases.

Case 1: θ ∈ N(ε)^C. Let A_n be the event that sup_{θ ∈ N(ε)^C ∩ Θ} |Q(θ; X^n) − Q_0(θ)| < δ/2. Then, for all θ ∈ N(ε)^C ∩ Θ,

A_n ⟹ Q(θ; X^n) < Q_0(θ) + δ/2 ≤ Q_0(θ*) + δ/2 = Q_0(θ_0) − δ + δ/2 = Q_0(θ_0) − δ/2

Case 2: θ ∈ N(ε). Let B_n be the event that sup_{θ ∈ N(ε)} |Q(θ; X^n) − Q_0(θ)| < δ/2. Then,

B_n ⟹ Q(θ; X^n) > Q_0(θ) − δ/2 for all θ ∈ N(ε) ⟹ Q(θ_0; X^n) > Q_0(θ_0) − δ/2

By comparing the last expressions for each of cases 1 and 2, we conclude that if both A_n and B_n hold then θ̂(X^n) ∈ N(ε). But by uniform convergence, pr(A_n ∩ B_n) → 1, so pr(θ̂(X^n) ∈ N(ε)) → 1.

A key element of the above proof is that Q(θ; X^n) converges uniformly in probability to Q_0(θ). This is often difficult to prove. A useful condition is given by the following lemma:

Lemma 8.3: If X_1, ..., X_n are i.i.d. p(x; θ_0) ∈ {p(x; θ) : θ ∈ Θ}, Θ is compact, log p(x; θ) is continuous in θ for all θ ∈ Θ and all x ∈ X, and if there exists a function d(x) such that |log p(x; θ)| ≤ d(x) for all θ ∈ Θ and x ∈ X, with E_{θ_0}[d(X)] < ∞, then

i. Q_0(θ) = E_{θ_0}[log p(X; θ)] is continuous in θ, and

ii. sup_{θ∈Θ} |Q(θ; X^n) − Q_0(θ)| →^P 0.

Example: Suicide seasonality and the von Mises distribution (in class)
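The in-class suicide-seasonality example is not reproduced here, but the following sketch (my own, with invented parameter values) shows the ingredients for the von Mises model p(x; μ, κ) = exp(κ cos(x − μ))/(2π I_0(κ)): on a compact Θ with κ ≤ K, |log p(x; μ, κ)| ≤ K + log(2π I_0(K)), so a constant dominating function d(x) works and Lemma 8.3 applies, and the MLE can be found by grid search.

```python
import numpy as np

rng = np.random.default_rng(3)
mu0, kappa0 = 1.0, 2.0                 # invented "true" direction and concentration
x = rng.vonmises(mu0, kappa0, size=2_000)

def std_loglik(mu, kappa, x):
    # (1/n) sum_i log p(X_i; mu, kappa) with p = exp(kappa*cos(x-mu)) / (2*pi*I0(kappa));
    # np.i0 is the modified Bessel function of the first kind, order 0
    return kappa*np.mean(np.cos(x - mu)) - np.log(2*np.pi*np.i0(kappa))

mus = np.linspace(-np.pi, np.pi, 361)   # compact grid for the mean direction
kappas = np.linspace(0.1, 10.0, 100)    # compact grid for the concentration
vals = np.array([[std_loglik(m, k, x) for k in kappas] for m in mus])
i, j = np.unravel_index(np.argmax(vals), vals.shape)
mu_hat, kappa_hat = mus[i], kappas[j]
print(mu_hat, kappa_hat)
```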

Proof: We first prove the continuity of Q_0(θ). For any θ, choose a sequence θ_k which converges to θ. By the continuity of log p(x; θ), we know that log p(x; θ_k) → log p(x; θ). Since |log p(x; θ_k)| ≤ d(x), the dominated convergence theorem tells us that Q_0(θ_k) = E_{θ_0}[log p(X; θ_k)] → E_{θ_0}[log p(X; θ)] = Q_0(θ). This implies that Q_0(θ) is continuous.

Next, we work to establish the uniform convergence in probability. We need to show that for any ε, η > 0 there exists N(ε, η) such that for all n > N(ε, η),

P_{θ_0}[sup_{θ∈Θ} |Q(θ; X^n) − Q_0(θ)| > ε] < η

Since log p(x; θ) is continuous in θ and since Θ is compact, we know that log p(x; θ) is uniformly continuous in θ (see Rudin, page 90). Uniform continuity says that for all ε > 0 there exists a δ(ε) > 0 such that |log p(x; θ_1) − log p(x; θ_2)| < ε for all θ_1, θ_2 for which ‖θ_1 − θ_2‖ < δ(ε). This is a property of a function defined on a set of points. In contrast, continuity is defined relative to a particular point. Continuity of a function at a point θ* says that for all ε > 0, there exists a δ(ε, θ*) such that |log p(x; θ) − log p(x; θ*)| < ε for all θ for which ‖θ − θ*‖ < δ(ε, θ*). For continuity, δ depends on ε and θ*. For uniform continuity, δ depends only on ε. In general, uniform continuity is stronger than continuity; however, they are equivalent on compact sets.

Aside: Uniform Continuity vs. Continuity

Consider f(x) = 1/x for x ∈ (0, 1). This function is continuous for each x ∈ (0, 1). However, it is not uniformly continuous. Suppose this function were uniformly continuous. Then, we know that for any ε > 0, we can find a δ(ε) > 0 such that |1/x_1 − 1/x_2| < ε for all x_1, x_2 such that |x_1 − x_2| < δ(ε). Given an ε > 0, consider the points x_1 and x_2 = x_1 + δ(ε)/2. Then, we know that

|1/x_1 − 1/x_2| = |1/x_1 − 1/(x_1 + δ(ε)/2)| = (δ(ε)/2) / (x_1(x_1 + δ(ε)/2))

Take x_1 sufficiently small so that x_1 < min(δ(ε)/2, 1/(2ε)). This implies that

(δ(ε)/2) / (x_1(x_1 + δ(ε)/2)) > (δ(ε)/2) / (x_1 δ(ε)) = 1/(2x_1) > ε

This is a contradiction.
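A numeric companion to the aside (my own illustration): holding the pair-distance fixed and pushing x_1 toward 0 makes |f(x_1) − f(x_2)| blow up, which is exactly why no single δ(ε) can work.

```python
# For f(x) = 1/x on (0, 1), fix a candidate delta and look at pairs
# (x1, x1 + delta/2): as x1 shrinks, the function gap grows without bound,
# so f is not uniformly continuous on (0, 1).
delta = 0.01
gaps = []
for x1 in [0.5, 0.1, 0.02, 0.005]:
    x2 = x1 + delta/2
    gaps.append(abs(1/x1 - 1/x2))
print(gaps)
```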

Returning to the proof of Lemma 8.3, define

Δ(x, δ) = sup_{(θ_1,θ_2): ‖θ_1−θ_2‖<δ} |log p(x; θ_1) − log p(x; θ_2)|   (1)

By uniform continuity, Δ(x, δ) → 0 as δ → 0. By the assumption of Lemma 8.3, we know that Δ(x, δ) ≤ 2d(x) for all δ. By the dominated convergence theorem, we know that E_{θ_0}[Δ(X, δ)] → 0 as δ → 0. Now, consider open balls of radius δ about each θ ∈ Θ, i.e., B(θ, δ) = {θ̃ : ‖θ − θ̃‖ < δ}. The union of these open balls contains Θ. This union is an open cover of Θ. Since Θ is a compact set, we know that there exists a finite subcover, which we denote by {B(θ_j, δ), j = 1, ..., J}.

Take any θ ∈ Θ; it lies in some ball B(θ_j, δ) of the subcover. By the triangle inequality, we know that

|Q(θ; X^n) − Q_0(θ)| ≤ |Q(θ; X^n) − Q(θ_j; X^n)|   (2)
                     + |Q(θ_j; X^n) − Q_0(θ_j)|    (3)
                     + |Q_0(θ_j) − Q_0(θ)|         (4)

Term (2) equals |(1/n) ∑_{i=1}^n {log p(X_i; θ) − log p(X_i; θ_j)}|, and this is less than or equal to

(1/n) ∑_{i=1}^n |log p(X_i; θ) − log p(X_i; θ_j)| ≤ (1/n) ∑_{i=1}^n Δ(X_i, δ)

Term (4) is bounded above by

sup_{(θ_1,θ_2): ‖θ_1−θ_2‖<δ} |Q_0(θ_1) − Q_0(θ_2)|

Since Q_0(θ) is uniformly continuous, we know that this bound can be made arbitrarily small by choosing δ to be small; that is, it can be made less than ε/3 for all δ below some δ_2 > 0. We also know that (3) is less than or equal to

max_{j=1,...,J} |Q(θ_j; X^n) − Q_0(θ_j)|

Combining these bounds, for every θ ∈ Θ,

|Q(θ; X^n) − Q_0(θ)| ≤ (1/n) ∑_{i=1}^n Δ(X_i, δ) + max_{j=1,...,J} |Q(θ_j; X^n) − Q_0(θ_j)| + ε/3

Taking the supremum over θ ∈ Θ and noting that the right-hand side no longer depends on θ,

P_{θ_0}[sup_{θ∈Θ} |Q(θ; X^n) − Q_0(θ)| > ε] ≤ P_{θ_0}[(1/n) ∑_{i=1}^n Δ(X_i, δ) > ε/3]   (5)
                                            + P_{θ_0}[max_{j=1,...,J} |Q(θ_j; X^n) − Q_0(θ_j)| > ε/3]   (6)

Now, we can show that we can take n sufficiently large so that (5) and (6) can be made small.

We already have demonstrated that E_{θ_0}[Δ(X, δ)] → 0 as δ → 0. Choose δ_1 small enough that E_{θ_0}[Δ(X, δ_1)] < ε/6, and take δ < min(δ_1, δ_2), where δ_2 is the value of δ needed earlier to make (4) less than ε/3. Then (5) is less than

P_{θ_0}[(1/n) ∑_{i=1}^n Δ(X_i, δ) − E_{θ_0}[Δ(X, δ)] > ε/6]

By the WLLN, we know that there exists N_1(ε, η) so that for all n > N_1(ε, η), the above term is less than η/2. So, (5) is less than η/2.

For the δ considered so far, find the finite subcover {B(θ_j, δ), j = 1, ..., J}. Now, (6) is equal to

P_{θ_0}[∪_{j=1}^J {|Q(θ_j; X^n) − Q_0(θ_j)| > ε/3}] ≤ ∑_{j=1}^J P_{θ_0}[|Q(θ_j; X^n) − Q_0(θ_j)| > ε/3]

By the WLLN, we know that for each j and for any η > 0 there exists N_{2j}(ε, η) so that for all n > N_{2j}(ε, η),

P_{θ_0}[|Q(θ_j; X^n) − Q_0(θ_j)| > ε/3] ≤ η/(2J)

Let N_2(ε, η) = max_{j=1,...,J} {N_{2j}}. Then, for all n > N_2(ε, η), we know that

∑_{j=1}^J P_{θ_0}[|Q(θ_j; X^n) − Q_0(θ_j)| > ε/3] ≤ η/2

Combining the results for (5) and (6), we have demonstrated that there exists an N(ε, η) = max(N_1(ε, η), N_2(ε, η)) so that for all n > N(ε, η),

P_{θ_0}[sup_{θ∈Θ} |Q(θ; X^n) − Q_0(θ)| > ε] < η

Q.E.D.

Other conditions that might be useful to establish uniform convergence in probability are given in the lemmas below. Lemma 8.4 may be useful when the data are not independent.

Lemma 8.4: If Θ is compact, Q_0(θ) is continuous in θ, Q(θ; X^n) →^P Q_0(θ) for all θ ∈ Θ, and there are an α > 0 and a C(X^n), bounded in probability, such that for all θ̃, θ ∈ Θ,

|Q(θ̃; X^n) − Q(θ; X^n)| ≤ C(X^n) ‖θ̃ − θ‖^α

then sup_{θ∈Θ} |Q(θ; X^n) − Q_0(θ)| →^P 0.

Aside: C(X^n) is bounded in probability if for every η > 0 there exist N(η) and M(η) > 0 such that, for all n > N(η), P_{θ_0}[|C(X^n)| > M(η)] < η.

Θ is Not Compact

Suppose that we are not willing to assume that Θ is compact. One way around this is to bound the objective function from above, uniformly in parameters that are far away from the truth. For example, suppose that there is a compact set D such that

E_{θ_0}[sup_{θ∈D^C} {log p(X; θ)}] < Q_0(θ_0) = E_{θ_0}[log p(X; θ_0)]

By the law of large numbers, we know that with probability approaching one,

sup_{θ∈D^C} {Q(θ; X^n)} ≤ (1/n) ∑_{i=1}^n sup_{θ∈D^C} {log p(X_i; θ)} < Q(θ_0; X^n)

since the middle term converges in probability to E_{θ_0}[sup_{θ∈D^C} log p(X; θ)] while Q(θ_0; X^n) converges in probability to the strictly larger Q_0(θ_0).

Therefore, with probability approaching one, we know that θ̂(X^n) is in the compact set D. Then, we can apply the previous theory to show consistency.

The following lemma can be used in cases where the objective function is concave.

Lemma 8.5: If there is a function Q_0(θ) such that

i. Q_0(θ) is uniquely maximized at θ_0,

ii. θ_0 is an element of the interior of a convex set Θ (which does not have to be bounded),

iii. Q(θ; x^n) is concave in θ for each x^n, and

iv. Q(θ; X^n) →^P Q_0(θ) for all θ ∈ Θ,

then θ̂(X^n) exists with probability approaching one and θ̂(X^n) →^P θ_0.
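As an illustration of Lemma 8.5 (my own sketch, not from the notes): the Bernoulli log-likelihood in its natural (logit) parameterization η = log(p/(1−p)) is concave on the unbounded convex set Θ = ℝ, and the MLE η̂ = logit(x̄) exists whenever 0 < x̄ < 1, i.e., with probability approaching one.

```python
import numpy as np

rng = np.random.default_rng(4)
eta0 = 0.8                          # invented true logit parameter; Theta = R
p0 = 1/(1 + np.exp(-eta0))
x = rng.binomial(1, p0, size=20_000)

def Q(eta, xbar):
    # (1/n) log-likelihood of Bernoulli in its natural (logit) parameterization:
    # Q(eta) = xbar*eta - log(1 + e^eta), concave in eta on all of R
    return xbar*eta - np.log1p(np.exp(eta))

xbar = x.mean()
eta_hat = np.log(xbar/(1 - xbar))   # closed-form maximizer: logit of the sample mean

# second differences over a wide grid confirm concavity numerically
grid = np.linspace(-10, 10, 2001)
vals = Q(grid, xbar)
second_diff = vals[2:] - 2*vals[1:-1] + vals[:-2]
print(eta_hat, second_diff.max())
```

Concavity is what removes the need for compactness here: a concave criterion that converges pointwise converges uniformly on compacta, and its maximizer cannot escape to infinity.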
