# Chapter 11


11/30/2012


Chap. 11, page 1
Math 445
Chapter 11 Model Checking and Refinement
Rainfall data

In the rainfall data, we ended up leaving out case 28 (Death Valley) because it had a large residual and its altitude was the lowest in the data set. The resulting model is therefore not applicable to such low-altitude locations. If case 28 had not been unusual in this way, we would not have been justified in omitting it.
Without #28 (Death Valley)

Coefficients (Dependent Variable: Log10(Precipitation))

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | -2.074 | .525 | | -3.951 | .001 |
| Altitude (ft) | .000725 | .000241 | 4.647 | 3.012 | .006 |
| Latitude (degrees) | .093924 | .014285 | .773 | 6.575 | .000 |
| Rainshadow | -.431176 | .059929 | -.662 | -7.195 | .000 |
| Altitude*Latitude | -.000019 | .000006 | -4.620 | -2.959 | .007 |
$R^2 = .80$

Since there is an interaction between Altitude and Latitude, interpreting the coefficients of these variables is a little complicated. However, we can interpret the effect of the Rainshadow variable in this model.
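To make the Rainshadow interpretation concrete: because the response is $\log_{10}$ of precipitation, a coefficient $b$ corresponds to a multiplicative factor of $10^{b}$ on the original precipitation scale, holding altitude and latitude fixed. This sketch assumes (the notes do not state the coding) that Rainshadow is a 0/1 indicator equal to 1 for locations in a rainshadow.

```python
import math

# Estimated Rainshadow coefficient from the table (model without case 28).
# ASSUMPTION: Rainshadow is coded 0/1, with 1 = location is in a rainshadow.
b_rainshadow = -0.431176

# The response is log10(precipitation), so a change of b on the log10 scale
# is a multiplicative change of 10**b on the original precipitation scale.
factor = 10 ** b_rainshadow
print(f"Multiplicative effect of Rainshadow: {factor:.3f}")
print(f"Equivalent natural-log coefficient: {b_rainshadow * math.log(10):.3f}")
```

Under that coding assumption the factor is about 0.37: precipitation at a rainshadow location is estimated to be roughly 37% of that at a non-rainshadow location with the same altitude and latitude (strictly, a ratio of medians if the errors on the log scale are symmetric).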

Case 28 is an example of an outlier, a case for which the model does not fit well. Outliers have large residuals. We are also interested in influential cases: cases whose omission changes the fitted model substantially. Influential cases are not necessarily outliers. Least squares is sensitive to unusual cases, and an influential case may "pull" the regression plane toward itself so strongly that it does not have a large residual. In simple linear regression, we could often identify influential cases simply from a scatterplot. In multiple regression, influential cases may not be visible in pairwise scatterplots, and we need additional tools.
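A small numerical illustration of a case that pulls the least squares line toward itself. The data below are made up for illustration (they are not the rainfall data): five points lie exactly on the line $Y = X$, and one extra case sits far to the right and well below that line.

```python
import numpy as np

def fit_line(x, y):
    """Return (intercept, slope) of the least squares line."""
    slope, intercept = np.polyfit(x, y, deg=1)  # polyfit returns highest power first
    return intercept, slope

# Hypothetical data: five points exactly on Y = X, plus one far-away case.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 0.0])

b0_without, b1_without = fit_line(x[:-1], y[:-1])  # omit the unusual case
b0_with, b1_with = fit_line(x, y)                  # include it

residuals = y - (b0_with + b1_with * x)
print(f"slope without case 6: {b1_without:.3f}")
print(f"slope with case 6:    {b1_with:.3f}")
print(f"|residual| of case 6:            {abs(residuals[-1]):.3f}")
print(f"largest |residual| of cases 1-5: {np.abs(residuals[:-1]).max():.3f}")
```

With the extra case included, the fitted slope changes from 1 to a negative value, yet that case's residual is smaller in magnitude than the residuals of some of the ordinary points, so looking at residuals alone would not flag it.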
Case-Influence statistics
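Leverage: The leverage of a case is based only on the values of the explanatory variables. It measures the distance of the case from the mean of the explanatory variables (in multidimensional space). For one explanatory variable, the leverage is

$$h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{(n-1)\,s_X^2} = \frac{1}{n} + \frac{1}{n-1}\left(\frac{X_i - \bar{X}}{s_X}\right)^2,$$

where $\bar{X}$ and $s_X$ are the sample mean and sample standard deviation of the explanatory variable.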
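As a check on the one-variable formula, the sketch below computes leverages two ways: directly from the formula, and as the diagonal of the hat matrix $H = X(X^\top X)^{-1}X^\top$, where $X$ is the design matrix with a column of ones for the intercept. The hat-matrix form is the standard general definition (it also covers several explanatory variables) but is not given in these notes; the data values are hypothetical.

```python
import numpy as np

def leverage_formula(x):
    """h_i = 1/n + (x_i - xbar)^2 / ((n - 1) s_x^2), one explanatory variable."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s2 = x.var(ddof=1)  # sample variance s_X^2 (divides by n - 1)
    return 1.0 / n + (x - x.mean()) ** 2 / ((n - 1) * s2)

def leverage_hat(X):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X' for design matrix X."""
    X = np.asarray(X, dtype=float)
    H = X @ np.linalg.solve(X.T @ X, X.T)  # solve avoids forming the inverse
    return np.diag(H)

# Hypothetical explanatory variable with one case far from the others.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
X = np.column_stack([np.ones_like(x), x])  # intercept column plus x

h_formula = leverage_formula(x)
h_hat = leverage_hat(X)
print(np.round(h_formula, 3))
print("formula matches hat matrix:", np.allclose(h_formula, h_hat))
```

The far-right case ($X = 20$) has leverage close to 1, while the cases near $\bar{X}$ have leverages close to the minimum possible value, $1/n$.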
With more than one explanatory variable, the leverage is a measure of distance in higher-dimensional space. The distance takes into account the joint variability of the variables (see Display 11.10 on p. 316). High-leverage cases are easy to identify visually with only one explanatory variable, but become increasingly difficult to identify visually as the number of explanatory variables grows.

Leverages are always between $1/n$ and 1. The average of all the leverages in a data set is always $p/n$, where $p$ is the number of regression coefficients in the model (the number of explanatory variables plus one for the intercept). SPSS computes centered leverages (under the Linear Regression… Save button), even though it calls them simply "leverages." The centered leverage is $h_i - 1/n$. Therefore, the centered leverage is between 0 and $1 - 1/n$.

Leverage measures the *potential* influence of a case. High-leverage cases have the potential to change the least squares fit substantially.
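The stated properties of leverages (each between $1/n$ and 1, average $p/n$) and the relation to SPSS's centered leverage can be checked numerically with several explanatory variables. This is a sketch on randomly generated, hypothetical explanatory variables; $p$ counts the intercept along with the $k$ explanatory variables, and leverages are again taken as the diagonal of the hat matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 3                          # hypothetical: 30 cases, 3 explanatory variables
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept column first
p = X.shape[1]                        # number of regression coefficients (k + 1)

# Leverages are the diagonal of the hat matrix H = X (X'X)^{-1} X'.
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))
centered = h - 1.0 / n                # the quantity SPSS labels "leverage"

tol = 1e-12                           # allowance for floating-point rounding
print(f"mean leverage = {h.mean():.4f}, p/n = {p / n:.4f}")
print("all h_i in [1/n, 1]:", bool(h.min() >= 1 / n - tol and h.max() <= 1 + tol))
print("centered in [0, 1 - 1/n]:",
      bool(centered.min() >= -tol and centered.max() <= 1 - 1 / n + tol))
```

The average equals $p/n$ exactly (up to rounding) because the leverages sum to $p$, whatever the values of the explanatory variables.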

Studentized residuals

While the true residuals (what we called the $\varepsilon_i$) all have the same standard deviation $\sigma$ in the regression model, the observed residuals $e_i$ don't. Why not? Consider simple linear regression:

True residual:

$$\varepsilon_i = Y_i - (\beta_0 + \beta_1 X_i)$$

Observed residual:

$$e_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)$$
First, we already know that the observed residuals tend to be smaller in size than the true residuals. That's why we divide by $n - 2$ when we compute the standard deviation of the observed residuals to get an estimate of the standard deviation of the true residuals. The observed residuals tend to be smaller because the least squares line is the line that best fits the data, so the deviations from this line tend to be smaller than the deviations from the true line.

What do we mean when we say that the residuals do not all have the same standard deviation? How can a single value have a standard deviation?

What we mean is this: if we simulated many sets of data from the linear regression model, always with the same fixed set of $X_i$'s, what would the standard deviation of the residuals be at each $X_i$? To carry out this simulation, we would follow these steps. The $X_i$'s remain the same for every simulation.

1. Generate a set of $Y_i$'s, where each $Y_i$ is drawn from a normal distribution with mean $\beta_0 + \beta_1 X_i$ and standard deviation $\sigma$. That gives a set of $n$ pairs of values $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$.
2. Fit the least squares line.
3. Compute the residuals.
4. Repeat steps 1-3 many times with a new set of $Y_i$'s each time.

Now look at the distribution of the observed residuals at each $X_i$ and, in particular, compute the standard deviation of the observed residuals at each $X_i$. You will find that the standard deviations differ, and that the standard deviation of the residuals is smaller for $X_i$'s far from $\bar{X}$ (high-leverage values) than for $X_i$'s near $\bar{X}$ (low-leverage values). In fact, it can be shown that the standard deviation of the residual at $X_i$ is

$$\mathrm{SD}(\text{Residual}_i) = \sigma\sqrt{1 - h_i},$$

where $h_i$ is the leverage. This formula applies to any multiple regression model, not just the simple linear regression model.
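The four-step simulation can be sketched directly. The values of $\beta_0$, $\beta_1$, $\sigma$ and the fixed $X_i$'s below are hypothetical (none come from the notes); the sketch repeats steps 1-3 many times and compares the simulated standard deviation of each residual with $\sigma\sqrt{1 - h_i}$, using the hat-matrix diagonal for $h_i$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model settings (not from the notes).
beta0, beta1, sigma = 2.0, 0.5, 1.0
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])  # fixed X's, one far from the mean
n = x.size
n_sims = 10_000

# Leverages from the hat matrix H = X (X'X)^{-1} X'.
X = np.column_stack([np.ones(n), x])
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))

residuals = np.empty((n_sims, n))
for s in range(n_sims):
    # Step 1: generate Y_i ~ Normal(beta0 + beta1 * X_i, sigma), X's held fixed
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
    # Step 2: fit the least squares line
    slope, intercept = np.polyfit(x, y, deg=1)
    # Step 3: compute the observed residuals
    residuals[s] = y - (intercept + slope * x)
# Step 4 is the loop itself.

empirical_sd = residuals.std(axis=0, ddof=1)  # SD across simulations, per X_i
theoretical_sd = sigma * np.sqrt(1.0 - h)
for xi, e_sd, t_sd in zip(x, empirical_sd, theoretical_sd):
    print(f"X = {xi:5.1f}  simulated SD = {e_sd:.3f}  sigma*sqrt(1-h) = {t_sd:.3f}")
```

The case at $X = 20$ has leverage near 1, so its residual varies far less from one simulated data set to the next than the residuals of cases near $\bar{X}$.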
