— than others. If we give the points $x_i$ near that region big weights $w_i$, and points elsewhere smaller weights, the regression will be pulled towards matching the data in that region.
2. Discounting imprecision. Ordinary least squares is the maximum likelihood estimate when the noise $\epsilon$ in the model $Y = X\beta + \epsilon$ is IID Gaussian white noise. This means that the variance of $\epsilon$ has to be constant, and we measure the regression curve with the same precision everywhere. This situation, of constant noise variance, is called homoskedasticity. Often however the magnitude of the noise is not constant, and the data are heteroskedastic.
When we have heteroskedasticity, even if each noise term is still Gaussian, ordinary least squares is no longer the maximum likelihood estimate, and so no longer efficient. If however we know the noise variance $\sigma^2_i$ at each observation, and set $w_i = 1/\sigma^2_i$, we get the heteroskedastic MLE, and recover efficiency. (A short simulated illustration appears after this list.)

To say the same thing slightly differently, there's just no way that we can estimate the regression function as accurately where the noise is large as we can where the noise is small. Trying to give equal attention to all parts of the input space is a waste of time; we should be more concerned about fitting well where the noise is small, and expect to fit poorly where the noise is big.
3. Doing something else.
There are a number of other optimization problems which can be transformed into, or approximated by, weighted least squares. The most important of these arises from generalized linear models, where the mean response is some nonlinear function of a linear predictor. (Logistic regression is an example; a sketch of this reduction also appears below.)

In the first case, we decide on the weights to reflect our priorities. In the third case, the weights come from the optimization problem we'd really rather be solving. What about the second case, of heteroskedasticity?
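To make the second case concrete, here is a minimal sketch of weighted least squares on simulated heteroskedastic data with known noise variances, setting $w_i = 1/\sigma^2_i$. The code is illustrative, not from the text; the simulation setup, the Python/NumPy choice, and the helper name weighted_least_squares are all assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Simulated data: a straight line plus Gaussian noise whose scale grows with x,
# so the data are heteroskedastic by construction (illustrative setup).
n = 200
x = np.linspace(0.0, 10.0, n)
sigma = 0.5 + 0.3 * x                         # known noise standard deviations sigma_i
y = 1.0 + 2.0 * x + sigma * rng.standard_normal(n)

X = np.column_stack([np.ones(n), x])          # design matrix with an intercept column

def weighted_least_squares(X, y, w):
    # Minimize sum_i w_i * (y_i - x_i . beta)^2 via the weighted normal equations.
    XtW = X.T * w                             # equivalent to X.T @ diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)

beta_ols = weighted_least_squares(X, y, np.ones(n))        # equal weights: ordinary least squares
beta_wls = weighted_least_squares(X, y, 1.0 / sigma**2)    # w_i = 1/sigma_i^2: heteroskedastic MLE
print("OLS:", beta_ols)
print("WLS:", beta_wls)

Both fits estimate the same line, but the inverse-variance weights downweight the noisy right-hand end of the data, which is the heteroskedastic MLE described above. The same solver also covers the first case: there the weights are chosen to reflect priorities (large $w_i$ near the region we care about) rather than known variances.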
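For the third case, the usual reduction from a generalized linear model to weighted least squares is iteratively reweighted least squares (IRLS), which fits, for instance, a logistic regression by repeatedly solving a weighted least squares problem with working weights and working responses. The sketch below is again an illustrative assumption rather than the text's own development; it omits convergence checks and numerical safeguards.

import numpy as np

def irls_logistic(X, y, n_iter=25):
    # Fit a logistic regression by iteratively reweighted least squares.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                        # linear predictor
        p = 1.0 / (1.0 + np.exp(-eta))        # mean response: nonlinear function of eta
        w = p * (1.0 - p)                     # working weights
        z = eta + (y - p) / w                 # working response
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ z)   # one weighted least squares step
    return beta

# Tiny usage example with simulated binary responses (illustrative values).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.standard_normal(500)])
true_beta = np.array([-0.5, 1.5])
p_true = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = (rng.random(500) < p_true).astype(float)
print("IRLS logistic fit:", irls_logistic(X, y))

Each pass through the loop is one weighted least squares step; the weights $p_i(1-p_i)$ and the working response come from a quadratic approximation to the logistic log-likelihood at the current coefficients.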