Extending Linear Regression: Weighted Least Squares, Heteroskedasticity, Local Polynomial Regression

36-350, Data Mining
23 October 2009
1 Weighted Least Squares
Instead of minimizing the residual sum of squares,

\[ RSS(\beta) = \sum_{i=1}^{n} (y_i - x_i \cdot \beta)^2 \qquad (1) \]

we could minimize the weighted sum of squares,

\[ WSS(\beta, w) = \sum_{i=1}^{n} w_i (y_i - x_i \cdot \beta)^2 \qquad (2) \]

This includes ordinary least squares as the special case where all the weights w_i = 1. We can solve it by the same kind of algebra we used to solve the ordinary linear least squares problem.
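To make "the same kind of algebra" concrete: writing W = diag(w), the minimizer of (2) is beta-hat = (X^T W X)^{-1} X^T W y. Here is a minimal R sketch of that calculation (the simulated data, weights, and seed are illustrative assumptions, not from the notes), checked against lm(), whose weights argument minimizes exactly the weighted sum of squares:

    set.seed(1)
    n <- 100
    x <- runif(n, -4, 4)
    y <- 3 - 2 * x + rnorm(n)        # any data will do for this check
    w <- runif(n, 0.5, 2)            # arbitrary positive weights (illustrative)
    X <- cbind(1, x)                 # design matrix with an intercept column
    W <- diag(w)
    beta.wls <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)  # (X'WX)^{-1} X'W y
    fit <- lm(y ~ x, weights = w)    # same criterion, fit via lm()
    rbind(normal.equations = as.vector(beta.wls), lm = coef(fit))

The two rows should agree to numerical precision.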
But why would we want to solve it? For three reasons.

1. Focusing accuracy. We may care more strongly about predicting the response for certain values of the input (ones we expect to see often again, ones where mistakes are especially costly or embarrassing or painful, etc.) than for others. If we give the points x_i near that region big weights w_i, and points elsewhere smaller weights, the regression will be pulled towards matching the data in that region.

2. Discounting imprecision. Ordinary least squares is the maximum likelihood estimate when the ε in Y = X · β + ε is IID Gaussian white noise. This means that the variance of ε has to be constant, and we measure the regression curve with the same degree of precision everywhere. This situation, of constant noise variance, is called homoskedasticity. Often, however, the magnitude of the noise is not constant, and the data are heteroskedastic. When we have heteroskedasticity, even if each noise term is still Gaussian, ordinary least squares is no longer the maximum likelihood estimate, and so no longer efficient. If however we know the noise variance σ_i^2 at each measurement i, and set w_i = 1/σ_i^2, we get the heteroskedastic MLE, and recover efficiency (a short derivation is given below).

To say the same thing slightly differently, there's just no way that we can estimate the regression function as accurately where the noise is large as we can where the noise is small. Trying to give equal attention to all parts of the input space is a waste of time; we should be more concerned about fitting well where the noise is small, and expect to fit poorly where the noise is big.

3. Doing something else. There are a number of other optimization problems which can be transformed into, or approximated by, weighted least squares. The most important of these arises from generalized linear models, where the mean response is some nonlinear function of a linear predictor. (Logistic regression is an example.)

In the first case, we decide on the weights to reflect our priorities. In the third case, the weights come from the optimization problem we'd really rather be solving. What about the second case, of heteroskedasticity?
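To spell out the claim in the second reason (the derivation below is standard, and added here for completeness): if the noise terms are independent Gaussians with known variances σ_i^2, the log-likelihood of β is

\[ L(\beta) = -\frac{n}{2}\log 2\pi - \sum_{i=1}^{n} \log \sigma_i - \frac{1}{2}\sum_{i=1}^{n} \frac{(y_i - x_i \cdot \beta)^2}{\sigma_i^2} \]

Only the last sum depends on β, so maximizing the likelihood over β is the same as minimizing the weighted sum of squares (2) with w_i = 1/σ_i^2. This is the sense in which those weights give the heteroskedastic MLE.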
 
[Figure 1: Black line: linear response function (y = 3 - 2x). Grey curve: standard deviation as a function of x (σ(x) = 1 + x^2/2). Axes: x (horizontal), y (vertical).]
2 Heteroskedasticity
Suppose the noise variance is itself variable. For example, Figure 1 shows a simple linear relationship between the input x and the response y, but also a nonlinear relationship between x and Var[y].

In this particular case, the ordinary least squares estimate of the regression line is 2.72 - 1.30x, with R reporting standard errors in the coefficients of ±0.52 and 0.20, respectively. Those are however calculated under the assumption that the noise is homoskedastic, which it isn't. And in fact we can see, pretty much, that there is heteroskedasticity: if looking at the scatter-plot didn't convince us, we could always plot the residuals against x, which we should do anyway.

To see whether that makes a difference, let's re-do this many times with
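For concreteness, here is a minimal R sketch of the setup drawn in Figure 1: data with mean 3 - 2x and noise standard deviation σ(x) = 1 + x^2/2, fit by ordinary least squares, with residuals plotted against x. The sample size, seed, and design distribution are assumptions for illustration (the notes only report the fitted line 2.72 - 1.30x and its nominal standard errors), so the numbers will not reproduce exactly.

    set.seed(2)
    n <- 100                                  # assumed sample size
    x <- runif(n, -4, 4)                      # inputs over the plotted range
    sigma.x <- 1 + x^2 / 2                    # noise sd grows away from x = 0
    y <- 3 - 2 * x + rnorm(n, sd = sigma.x)   # linear mean, heteroskedastic noise
    ols.fit <- lm(y ~ x)                      # reported SEs assume homoskedasticity
    summary(ols.fit)$coefficients
    plot(x, residuals(ols.fit), ylab = "residuals")  # spread widens with |x|
    abline(h = 0, lty = 2)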
