

See updates and corrections at http://www.stat.cmu.edu/~cshalizi/mreg/

**Lecture 6: The Method of Maximum Likelihood for Simple Linear Regression**

36-401, Fall 2015, Section B

17 September 2015

1 Recapitulation

We introduced the method of maximum likelihood for simple linear regression

in the notes for two lectures ago. Let’s review.

We start with the statistical model, which is the Gaussian-noise simple linear

regression model, defined as follows:

1. The distribution of X is arbitrary (and perhaps X is even non-random).

2. If X = x, then Y = β0 + β1 x + ε, for some constants ("coefficients", "parameters") β0 and β1, and some random noise variable ε.

3. ε ∼ N(0, σ²), and is independent of X.

4. ε is independent across observations.

A consequence of these assumptions is that the response variable Y is independent across observations, conditional on the predictor X, i.e., Y1 and Y2 are independent given X1 and X2 (Exercise 1).

As you'll recall, this is a special case of the simple linear regression model: the first two assumptions are the same, but we are now assuming much more about the noise variable ε: it is not just mean zero with constant variance, but it has a particular distribution (Gaussian), and everything we said was uncorrelated before we now strengthen to independence.[1]

Because of these stronger assumptions, the model tells us the conditional pdf of Y for each x, p(y | X = x; β0, β1, σ²). (This notation separates the random variables from the parameters.) Given any data set (x1, y1), (x2, y2), ..., (xn, yn), we can now write down the probability density, under the model, of seeing that data:

$$\prod_{i=1}^{n} p(y_i \mid x_i; \beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - (\beta_0 + \beta_1 x_i))^2}{2\sigma^2}\right)$$

[1] See the notes for lecture 1 for a reminder, with an explicit example, of how uncorrelated random variables can nonetheless be strongly statistically dependent.
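As a quick numerical sanity check (my own sketch, not part of the notes; the helper name `loglik` and the simulated data are made up), this density can be evaluated in R by summing Gaussian log-densities, and at the least-squares fit it agrees with the log-likelihood that `lm` reports via `logLik`:

```r
# Sketch: evaluate log prod_i p(y_i | x_i; b0, b1, s^2) for the
# Gaussian-noise simple linear regression model.
set.seed(1)
x <- rexp(20)
y <- 5 - 2*x + rnorm(20, sd=sqrt(3))

loglik <- function(b0, b1, s.sq, x, y) {
  # Sum of Gaussian log-densities = log of the product of densities
  sum(dnorm(y, mean = b0 + b1*x, sd = sqrt(s.sq), log = TRUE))
}

fit <- lm(y ~ x)
s.sq.hat <- mean(residuals(fit)^2)  # MLE of sigma^2: divide by n, not n-2
c(ours = loglik(coefficients(fit)[1], coefficients(fit)[2], s.sq.hat, x, y),
  lm = as.numeric(logLik(fit)))
```

Any other parameter values give a strictly smaller value, which is exactly what maximizing the likelihood exploits.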


When we see the data, we do not know the true parameters, but any guess at them, say (b0, b1, s²), gives us a probability density:

$$\prod_{i=1}^{n} p(y_i \mid x_i; b_0, b_1, s^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi s^2}} \exp\!\left(-\frac{(y_i - (b_0 + b_1 x_i))^2}{2 s^2}\right)$$

In multiplying together the probabilities like this, we are using the independence of the Yi.

This is the likelihood, a function of the parameter values. It's just as informative, and much more convenient, to work with the log-likelihood:

$$L(b_0, b_1, s^2) = \log \prod_{i=1}^{n} p(y_i \mid x_i; b_0, b_1, s^2) \quad (1)$$
$$= \sum_{i=1}^{n} \log p(y_i \mid x_i; b_0, b_1, s^2) \quad (2)$$
$$= -\frac{n}{2}\log 2\pi - n \log s - \frac{1}{2s^2} \sum_{i=1}^{n} (y_i - (b_0 + b_1 x_i))^2 \quad (3)$$

In the method of maximum likelihood, we pick the parameter values which maximize the likelihood, or, equivalently, maximize the log-likelihood. After some calculus (see notes for lecture 5), this gives us the following estimators:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{c_{XY}}{s_X^2} \quad (4)$$
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \quad (5)$$
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i))^2 \quad (6)$$

As you will recall, the estimators for the slope and the intercept exactly match the least squares estimators. This is a special property of assuming independent Gaussian noise. Similarly, σ̂² is exactly the in-sample mean squared error.

2 Sampling Distributions

We may seem not to have gained much from the Gaussian-noise assumption, because our point estimates are just the same as they were from least squares. What makes the Gaussian noise assumption important is that it gives us an exact conditional distribution for each Yi, and this in turn gives us a distribution (the sampling distribution) for the estimators. Remember, from the notes from last time, that we can write β̂1 and β̂0 in the form "constant plus sum of noise variables". For instance,

$$\hat{\beta}_1 = \beta_1 + \sum_{i=1}^{n} \frac{x_i - \bar{x}}{n s_X^2} \epsilon_i$$
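Before going on with the sampling distributions, here is a quick numerical check (my own sketch, not part of the notes; the variable names and simulated data are arbitrary) that the closed-form maximum likelihood estimators in equations (4)-(6) really do match what `lm` computes by least squares:

```r
# Sketch: the MLE formulas (4)-(6) vs. least squares via lm().
set.seed(2)
x <- rexp(50)
y <- 5 - 2*x + rnorm(50, sd=sqrt(3))

beta1.hat <- cov(x, y) / var(x)             # eq. (4); the n-1 factors cancel
beta0.hat <- mean(y) - beta1.hat * mean(x)  # eq. (5)
sigma.sq.hat <- mean((y - (beta0.hat + beta1.hat*x))^2)  # eq. (6)

fit <- lm(y ~ x)
# Slope and intercept match least squares exactly;
# sigma.sq.hat is the in-sample mean squared error of the fit.
all.equal(as.numeric(coefficients(fit)), c(beta0.hat, beta1.hat))
```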

Because the εi are independent Gaussians, a weighted sum of them is also Gaussian. Therefore, β̂1 is also Gaussian. Since we worked out its mean and variance last time, we can just say

$$\hat{\beta}_1 \sim N\!\left(\beta_1, \frac{\sigma^2}{n s_X^2}\right)$$

Similarly, we saw that the fitted value at an arbitrary point x, m̂(x), is a constant plus a weighted sum of the εi:

$$\hat{m}(x) = \beta_0 + \beta_1 x + \frac{1}{n} \sum_{i=1}^{n} \left[1 + (x - \bar{x}) \frac{x_i - \bar{x}}{s_X^2}\right] \epsilon_i$$

Once again, the εi are all independent Gaussians, so a weighted sum of them is Gaussian, and we can just say

$$\hat{m}(x) \sim N\!\left(\beta_0 + \beta_1 x,\; \frac{\sigma^2}{n}\left(1 + \frac{(x - \bar{x})^2}{s_X^2}\right)\right)$$

Slightly more complicated manipulation of the εi makes it possible to show that

$$\frac{n \hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}$$

These are all important, because when we come to doing statistical inference on the parameters (forming confidence intervals, or testing hypotheses) we need to know these sampling distributions. When we come to making predictions of new Y's, these sampling distributions will let us give confidence intervals for the expected values, m̂(x), as well as give prediction intervals (of the form "when X = 5, Y will be between l and u with 95% probability") or full distributional forecasts. We will derive these inferential formulas in later lectures.

2.1 Illustration

To make the idea of these sampling distributions more concrete, I present a small simulation. Figure 1 provides code which simulates a particular Gaussian-noise linear model: β0 = 5, β1 = −2, σ² = 3, with twenty X's initially randomly drawn from an exponential distribution, but thereafter held fixed through all the simulations. The theory above lets us calculate just what the distribution of β̂1 should be, in repeated simulations, and the distribution of m̂(−1). (By construction, we have no observations where x = −1; this is an example of using the model to extrapolate beyond the data.) Figure 2 compares the theoretical sampling distributions to what we actually get by repeated simulation, i.e., by repeating the experiment.

# Fix x values for all runs of the simulation; draw from an exponential
n <- 20 # So we don't have magic #s floating around
beta.0 <- 5
beta.1 <- -2
sigma.sq <- 3
fixed.x <- rexp(n=n)

# Simulate from the model Y = \beta_0 + \beta_1*x + N(0,\sigma^2)
# Inputs: intercept; slope; noise variance; vector of x;
#   return sample or estimated linear model?
# Outputs: data frame with columns x and y OR linear model fit to simulated y
#   regressed on x
sim.lin.gauss <- function(intercept=beta.0, slope=beta.1,
                          noise.variance=sigma.sq, x=fixed.x,
                          model=FALSE) {
  # Make up y by adding Gaussian noise to the linear function
  y <- rnorm(length(x), intercept + slope*x, sd=sqrt(noise.variance))
  # Do we want to fit a model to this simulation and return that
  # model? Or do we want to just return the simulated values?
  if (model) {
    return(lm(y~x))
  } else {
    return(data.frame(x=x, y=y))
  }
}

Figure 1: Function to simulate a Gaussian-noise simple linear regression model, together with some default parameter values. Since, in this lecture, we'll always be estimating a linear model on the simulated values, it makes sense to build that into the simulator, but I included a switch to turn that off.

0 −2.freq=FALSE. predict(sim.sample <.gauss(model=TRUE))["x"]) hist(slope.5 −1.8 Density 0.main="") curve(dnorm(x.sample <.0 −0. newdata=data.5 −3.0 4 6 8 10 ^ (− 1) m par(mfrow=c(2. add=TRUE. 08:48 Saturday 19th September.replicate(1e4.frame(x=-1))) hist(pred.x)))).breaks=50.5 −2.x)))). freq=FALSE.1 Illustration 0.-2.lin. 2015 .replicate(1e4.main="") curve(dnorm(x.0 −1.col="blue") Figure 2: Theoretical sampling distributions for βˆ1 and m(−1) ˆ (blue curves) versus the distribution in 104 simulations (black histograms).0+beta. xlab=expression(hat(m)(-1)).1*(-1).4 0.1)) slope.gauss(model=TRUE).lin. coefficients(sim.sq/n)*(1+(-1-mean(fixed. col="blue") pred. breaks=50. add=TRUE.sd=sqrt(3/(n*var(fixed.x))^2/var(fixed.2 0.sample.5 2. mean=beta.4 Density 0.5 ^ β1 0.sample.xlab=expression(hat(beta)[1]).0 −3. sd=sqrt((sigma.

3 Virtues and Limitations of Maximum Likelihood

The method of maximum likelihood does not always work; there are models where it gives poor or even pathological estimates. For Gaussian-noise linear models, however, as for many other "regular" statistical models, it actually works very well. Indeed, in more advanced statistics classes, one proves that for such models, maximum likelihood is asymptotically efficient, meaning that its parameter estimates converge on the truth as quickly as possible.[2] This is on top of having exact sampling distributions for the estimators.

Of course, all these wonderful abilities come at a cost, which is the Gaussian noise assumption. If that is wrong, then so are the sampling distributions I gave above, and so are the inferential calculations which rely on those sampling distributions. Before we begin to do those inferences on any particular data set, and especially before we begin to make grand claims about the world on the basis of those inferences, we should really check all those modeling assumptions. That, however, brings us into the topics for next week.

[2] Very roughly: writing θ for the true parameter, θ̂ for the MLE, and θ̃ for any other consistent estimator, asymptotic efficiency means $\lim_{n\to\infty} E\left[n\|\hat\theta - \theta\|^2\right] \le \lim_{n\to\infty} E\left[n\|\tilde\theta - \theta\|^2\right]$. (This way of formulating it takes it for granted that the MSE of estimation goes to zero like 1/n, which it typically does in parametric problems.) For more precise statements, see, for instance, Cramér (1945), Pitman (1979) or van der Vaart (1998).

Exercises

To think through or to practice on, not to hand in.

1. Let Y1, Y2, ..., Yn be generated from the Gaussian-noise simple linear regression model, with the corresponding values of the predictor variable being X1, ..., Xn. Show that if i ≠ j, then Yi and Yj are conditionally independent given (Xi, Xj). Hint: If U and V are independent, then f(U) and g(V) are also independent, for any functions f and g.

2. In many practical fields (e.g., finance and geology) it is common to encounter noise whose distribution has much heavier tails than any Gaussian could give us. One way to model this is with t distributions. Consider, for instance, the statistical model where Y = β0 + β1 X + ε, with ε independent of X and independent across observations, and ε/σ ∼ t_ν, i.e., the noise follows a t distribution with ν degrees of freedom (after scaling), rather than having a Gaussian distribution. Note: Most students find most parts after (a) quite challenging.

(a) Write down the log-likelihood function. Use an explicit formula for the density of the t distribution.

(b) Find the derivatives of this log-likelihood with respect to the four parameters β0, β1, σ (or σ², if more convenient) and ν. Simplify as much as possible. (It is legitimate to use derivatives of the gamma function here, since that's another special function.)

(c) Can you solve for the maximum likelihood estimators of β0 and β1 without knowing σ and ν? If not, why not? If you can, do they match the least-squares estimators again? If they don't match, how do they differ?

(d) Can you solve for the MLE of all four parameters at once? (Again, you may have to express your answer in terms of the gamma function and its derivatives.)

3. Refer to the previous problem.

(a) In R, write a function to calculate the log-likelihood, taking as arguments a data frame with columns named y and x, and the vector of the four model parameters. Hint: use the dt function.

(b) In R, write a function to find the MLE of this model on a given data set, using optim, from an arbitrary starting vector of guesses at the parameters. This should call your function from part (a).

(c) In R, write a function which gives an unprincipled but straightforward initial estimate of the parameters by (i) calculating the slope and intercept using least squares, and (ii) fitting a t distribution to the residuals. Hint: call lm, and fitdistr from the package MASS.

(d) Combine your functions to write a function which takes as its only argument a data frame containing columns called x and y, and returns the MLE for the model parameters.

(e) Write another function which will simulate data from the model, taking as arguments the four parameters and a vector of x's. It should return a data frame with appropriate column names.

(f) Run the output of your simulation function through your MLE function. How well does the MLE recover the parameters? Does it get better as n grows? As the variance of your x's increases? How does it compare to your unprincipled estimator?

References

Cramér, Harald (1945). Mathematical Methods of Statistics. Uppsala: Almqvist and Wiksells.

Pitman, E. J. G. (1979). Some Basic Theory for Statistical Inference. London: Chapman and Hall.

van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge, England: Cambridge University Press.
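For a feel of where exercise 3 is headed, here is one possible shape for the t-noise log-likelihood and its maximization (a sketch, not a worked solution: the log-parameterization, the function name `tloglik`, the simulated data, and the starting values are all my own choices):

```r
# Sketch: log-likelihood for Y = b0 + b1*X + eps, with eps/s ~ t_nu.
# theta = (b0, b1, log s, log nu); the logs keep s and nu positive for optim.
tloglik <- function(theta, df) {
  b0 <- theta[1]; b1 <- theta[2]
  s <- exp(theta[3]); nu <- exp(theta[4])
  z <- (df$y - (b0 + b1*df$x)) / s
  # Density of a scaled t is dt(z, nu)/s, so subtract log(s) per observation
  sum(dt(z, df=nu, log=TRUE) - log(s))
}

set.seed(4)
df <- data.frame(x = rexp(100))
df$y <- 5 - 2*df$x + 1.5 * rt(100, df=4)

# optim() minimizes, so flip the sign of the log-likelihood
fit <- optim(c(0, 0, 0, log(5)), function(th) -tloglik(th, df))
fit$par[1:2]  # rough estimates of (beta0, beta1); truth here is (5, -2)
```

Whether the slope and intercept estimates here match least squares, and whether they can be found without knowing σ and ν, is exactly what exercise 2(c) asks you to think through.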
