
FIGURE 9.1 Strategy for Building a Regression Model.

[Flowchart not reproduced. Recoverable stage labels: Data collection and preparation (including preliminary checks on data quality); Reduction of number of explanatory variables, for exploratory observational studies (determine several potentially useful subsets of explanatory variables; include known essential variables); Model refinement and selection (investigate curvature and interaction effects more fully; study residuals and other diagnostics).]

350

Part

Two

Multiple Linear Regression

Model Validation

Model validity refers to the stability and reasonableness of the regression coefficients, the plausibility and usability of the regression function, and the ability to generalize inferences drawn from the regression analysis. Validation is a useful and necessary part of the model-building process. Several methods of assessing model validity are described in Section 9.6.

9.2 Surgical Unit Example

With the completion of this overview of the model-building process, we next present an example [...]. A hospital surgical unit was interested in predicting survival in patients undergoing a particular type of liver operation. [...] From each patient record, the following information was extracted from the preoperation evaluation:

$X_1$  blood clotting score
$X_2$  prognostic index
$X_3$  enzyme function test score
$X_4$  liver function test score
$X_5$  age, in years
$X_6$  indicator variable for gender (1 = female)
$X_7$ and $X_8$  indicator variables for history of alcohol use:

Alcohol Use   $X_7$   $X_8$
None            0       0
Moderate        1       0
Severe          0       1

TABLE 9.1

chapter

Building the Regression Model I: Model selection and validation

3sl

To illustrate the model-building procedures discussed in this and the next section, we will use only the first four explanatory variables. [...] is possible at this stage of data preparation. [...] the predictor variables (not shown). [...] The investigator was thereby alerted to examine [...] and the correlation matrix were also obtained (not shown).

A first-order regression model based on all predictor variables was fitted to serve as a starting point. A plot of residuals against predicted values for this fitted model is shown in Figure 9.2a. The plot suggests that both curvature and nonconstant error variance are present. In addition, some departure from normality is suggested by the normal probability plot of residuals in Figure 9.2b. To make the distribution of the error terms more nearly normal and to see if the same transformation would also reduce the apparent curvature, the investigator examined the

FIGURE 9.2 Some Preliminary Residual Plots - Surgical Unit Example.

[Plots not reproduced. Panels: (a) Residual Plot for Y; (b) Normal Plot for Y; (c) Residual Plot for ln Y; (d) Normal Plot for ln Y. Panels (a) and (c) plot the residual against the predicted value; panels (b) and (d) plot the residual against the expected value under normality.]


logarithmic transformation $Y' = \ln Y$. Data for the [...] the distribution of the error terms is [...]. The investigator also obtained a scatter plot matrix and the correlation matrix with the transformed $Y$ variable; these are presented in Figure 9.3. In addition, various scatter and

FIGURE 9.3 JMP Scatter Plot Matrix and Correlation Matrix when Response Variable Is Y' - Surgical Unit Example.

Multivariate Correlations

             LnSurvival  Bloodclot  Progindex    Enzyme     Liver
LnSurvival       1.0000     0.2462     0.4699    0.6539    0.6493
Bloodclot        0.2462     1.0000     0.0901   -0.1496    0.5024
Progindex        0.4699     0.0901     1.0000   -0.0236    0.3690
Enzyme           0.6539    -0.1496    -0.0236    1.0000    0.4164
Liver            0.6493     0.5024     0.3690    0.4164    1.0000

[Scatter plot matrix of LnSurvival, Bloodclot, Progindex, Enzyme, and Liver not reproduced.]


residual plots were obtained (not shown here). All of these plots indicate that each of the predictor variables is linearly associated with $Y'$, with $X_3$ and $X_4$ showing the highest degrees of association and $X_1$ the lowest. The scatter plot matrix and the correlation matrix further show intercorrelations among the potential predictor variables. In particular, $X_4$ has moderately high pairwise correlations with $X_1$, $X_2$, and $X_3$.

On the basis of these analyses, the investigator decided, at this stage of the model-building process, to use $Y' = \ln Y$ as the response variable, to represent the predictor variables in linear terms, and not to include any interaction terms. The next stage in the model-building process is to examine whether all of the potential predictor variables are needed or whether a subset of them is adequate. A number of useful measures have been developed to assess the adequacy of the various subsets. We now turn to a discussion of those measures.

9.3 Criteria for Model Selection


From any set of $p - 1$ predictors, $2^{p-1}$ alternative models can be constructed. This calculation is based on the fact that each predictor can be either included or excluded from the model. For example, the $2^4 = 16$ different possible subset models that can be formed from the pool of four $X$ variables in the surgical unit example are listed in Table 9.2. First, there is the regression model with no $X$ variables, i.e., the model $Y_i = \beta_0 + \varepsilon_i$. Then there are the regression models with one $X$ variable ($X_1$, $X_2$, $X_3$, $X_4$), with two $X$ variables ($X_1$ and $X_2$, $X_1$ and $X_3$, $X_1$ and $X_4$, $X_2$ and $X_3$, $X_2$ and $X_4$, $X_3$ and $X_4$), and so on.
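The count of candidate models can be checked by direct enumeration. The sketch below is illustrative only: the helper name `all_subsets` and the string labels are assumptions made for this example, not part of the text.

```python
from itertools import combinations

def all_subsets(predictors):
    """Return every subset of the predictor pool, smallest subsets first."""
    subsets = []
    for size in range(len(predictors) + 1):
        subsets.extend(combinations(predictors, size))
    return subsets

pool = ["X1", "X2", "X3", "X4"]  # four-variable pool, as in the example
models = all_subsets(pool)

print(len(models))  # number of subset models, including the one with no X variables
for subset in models[:6]:
    print(subset if subset else "None")
```

Each predictor is either in or out of a subset, so the list has $2^{p-1}$ entries for a pool of $p - 1$ predictors; the empty tuple stands for the intercept-only model.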

TABLE 9.2 SSE_p, R^2_p, R^2_{a,p}, C_p, AIC_p, SBC_p, and PRESS_p Values for All Possible Regression Models - Surgical Unit Example.

Variables        (1)    (2)     (3)      (4)        (5)       (6)        (7)       (8)
in Model          p    SSE_p   R^2_p  R^2_{a,p}    C_p       AIC_p      SBC_p    PRESS_p
None              1   12.808   0.000    0.000    151.498    -75.703    -73.714   13.296
X1                2   12.031   0.061    0.043    141.164    -77.079    -73.101   13.512
X2                2    9.979   0.221    0.206    108.556    -87.178    -83.200   10.744
X3                2    7.332   0.428    0.417     66.489   -103.827    -99.849    8.327
X4                2    7.409   0.422    0.410     67.715   -103.262    -99.284    8.025
X1, X2            3    9.443   0.263    0.234    102.031    -88.162    -82.195   11.062
X1, X3            3    5.781   0.549    0.531     43.852   -114.658   -108.691    6.988
X1, X4            3    7.299   0.430    0.408     67.972   -102.067    -96.100    8.472
X2, X3            3    4.312   0.663    0.650     20.520   -130.483   -124.516    5.065
X2, X4            3    6.622   0.483    0.463     57.215   -107.324   -101.357    7.476
X3, X4            3    5.130   0.599    0.584     33.504   -121.113   -115.146    6.121
X1, X2, X3        4    3.109   0.757    0.743      3.391   -146.161   -138.205    3.914
X1, X2, X4        4    6.570   0.487    0.456     58.392   -105.748    -97.792    7.903
X1, X3, X4        4    4.968   0.612    0.589     32.932   -120.844   -112.888    6.207
X2, X3, X4        4    3.614   0.718    0.701     11.424   -138.023   -130.067    4.597
X1, X2, X3, X4    5    3.084   0.759    0.740      5.000   -144.590   -134.645    4.069
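Most columns of Table 9.2 can be recomputed from the $SSE_p$ column alone. The sketch below assumes the standard definitions of the criteria (they are not reproduced in this excerpt) and takes $n = 54$, the sample size reported in the stepwise printout; the function name `criteria` is hypothetical. $PRESS_p$ is omitted because it requires the case-level data.

```python
import math

N = 54                        # number of cases (assumed from the N = 54 printout)
SSTO = 12.808                 # SSE of the model with no X variables
SSE_FULL, P_FULL = 3.084, 5   # model containing X1, X2, X3, X4

def criteria(sse, p, n=N, ssto=SSTO, sse_full=SSE_FULL, p_full=P_FULL):
    """Selection criteria for a subset model with p parameters and error sum sse."""
    mse_full = sse_full / (n - p_full)
    return {
        "R2": 1 - sse / ssto,
        "R2_adj": 1 - (n - 1) / (n - p) * sse / ssto,
        "Cp": sse / mse_full - (n - 2 * p),
        "AIC": n * math.log(sse) - n * math.log(n) + 2 * p,
        "SBC": n * math.log(sse) - n * math.log(n) + math.log(n) * p,
    }

row = criteria(sse=3.109, p=4)   # the (X1, X2, X3) row of the table
for name, value in row.items():
    print(f"{name:7s} {value:10.3f}")
```

Because the tabled $SSE_p$ values are rounded to three decimals, the recomputed $C_p$, $AIC_p$, and $SBC_p$ agree with the table only to about two decimals.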


considered sufficiently helpful to enter the regression model. Since the degrees of freedom associated with MSE vary depending on the number of $X$ variables in the model, and since repeated tests on the same data are undertaken, fixed $t^*$ limits for adding or deleting a variable have no precise probabilistic meaning. For this reason, software programs often favor the use of predetermined $\alpha$ limits.

2. Assume $X_7$ is the variable entered at step 1. The stepwise regression routine now fits all regression models with two $X$ variables, where $X_7$ is one of the pair. For each such regression model, the $t^*$ test statistic corresponding to the newly added predictor $X_k$ is obtained. This is the statistic for testing whether or not $\beta_k = 0$ when $X_7$ and $X_k$ are the variables in the model. The $X$ variable with the largest $t^*$ value (or equivalently, the smallest P-value) is the candidate for addition at the second stage. If this $t^*$ value exceeds a predetermined level (i.e., the P-value falls below a predetermined level), the second $X$ variable is added. Otherwise, the program terminates.

3. Suppose $X_3$ is added at the second stage. Now the stepwise regression routine examines whether any of the other $X$ variables already in the model should be dropped. For our illustration, there is at this stage only one other $X$ variable in the model, $X_7$, so that only one $t^*$ test statistic is obtained:

$$t_7^* = \frac{b_7}{s\{b_7\}} \qquad (9.19)$$

At later stages, there would be a number of these $t^*$ statistics, one for each of the variables in the model besides the one last added. The variable for which this $t^*$ value is smallest (or equivalently the variable for which the P-value is largest) is the candidate for deletion. If this $t^*$ value falls below (or the P-value exceeds) a predetermined limit, the variable is dropped from the model; otherwise, it is retained.

4. Suppose $X_7$ is retained so that both $X_3$ and $X_7$ are now in the model. The stepwise regression routine now examines which $X$ variable is the next candidate for addition, then examines whether any of the variables already in the model should now be dropped, and so on until no further $X$ variables can either be added or deleted, at which point the search terminates.

Note that the stepwise regression algorithm allows an $X$ variable, brought into the model at an earlier stage, to be dropped subsequently if it is no longer helpful in conjunction with variables added at later stages.

Figure 9.7 shows the computer printout for the forward stepwise regression procedure for the surgical unit example. The maximum acceptable $\alpha$ limit for adding a variable is 0.10 and the minimum acceptable $\alpha$ limit for removing a variable is 0.15, as shown at the top of Figure 9.7. We now follow through the steps.

1. At the start of the stepwise search, no $X$ variable is in the model so that the model to be fitted is $Y_i' = \beta_0 + \varepsilon_i$. In step 1, the $t^*$ statistics (9.18) and corresponding P-values are calculated for each potential $X$ variable, and the predictor having the smallest P-value (largest $t^*$ value) is chosen to enter the equation. We see that Enzyme ($X_3$) had the largest


test statistic:

$$t_3^* = \frac{b_3}{s\{b_3\}} = \frac{.015124}{.002427} = 6.23$$

FIGURE 9.7 MINITAB Forward Stepwise Regression Output - Surgical Unit Example.

Alpha-to-Enter: 0.1   Alpha-to-Remove: 0.15
Response is lnSurviv on 8 predictors, with N = 54

Step             1        2        3        4
Constant     5.264    4.351    4.291    3.852

Enzyme      0.0151   0.0154   0.0145   0.0155
T-Value       6.23     8.19     9.33    11.07
P-Value      0.000    0.000    0.000    0.000

ProgInde             0.0141   0.0149   0.0142
T-Value                5.98     7.68     8.20
P-Value               0.000    0.000    0.000

Histheav                       0.429    0.353
T-Value                         5.08     4.57
P-Value                        0.000    0.000

Bloodclo                                0.073
T-Value                                  3.86
P-Value                                 0.000

S            0.375    0.291    0.238    0.211
R-Sq         42.76    66.33    77.80    82.99
R-Sq(adj)    41.66    65.01    76.47    81.60
C-p          117.4     50.5     18.9      5.8

The P-value for this test statistic is 0.000, which falls below the maximum acceptable $\alpha$-to-enter value of 0.10; hence Enzyme ($X_3$) is added to the model.

2. At this stage, step 1 has been completed. The current regression model contains Enzyme ($X_3$), and the printout displays, near the top of the column labeled "Step 1," the regression coefficient for Enzyme, the $t^*$ value for this coefficient, and the corresponding P-value (0.000). At the bottom of column 1, a number of variables-selection criteria, including $R^2$ (42.76), $R_a^2$ (41.66), and $C_p$ (117.4), are also provided. Next, all regression models containing $X_3$ and another $X$ variable are fitted, and the $t^*$ statistics calculated. They are now:

$$t_k^* = \sqrt{\frac{MSR(X_k \mid X_3)}{MSE(X_3, X_k)}}$$

Progindex ($X_2$) has the highest $t^*$ value, and its P-value (0.000) falls below 0.10, so that $X_2$ now enters the model.


3. The column labeled Step 2 in Figure 9.7 summarizes the situation at this point. Enzyme and Progindex ($X_3$ and $X_2$) are now in the model, and information about this model is provided. At this point, a test whether Enzyme ($X_3$) should be dropped is undertaken, but because the P-value (0.000) corresponding to $X_3$ is not above 0.15, this variable is retained.

4. Next, all regression models containing $X_2$, $X_3$, and one of the remaining potential $X$ variables are fitted. The appropriate $t^*$ statistics now are:

$$t_k^* = \sqrt{\frac{MSR(X_k \mid X_2, X_3)}{MSE(X_2, X_3, X_k)}}$$

The predictor labeled Histheavy ($X_8$) had the largest $t_k^*$ value (P-value = 0.000) and was next added to the model.

5. The column labeled Step 3 in Figure 9.7 summarizes the situation at this point. $X_2$, $X_3$, and $X_8$ are now in the model. Next, a test is undertaken to determine whether $X_2$ or $X_3$ should be dropped. Since both of the corresponding P-values are less than 0.15, neither predictor is dropped from the model.

6. At step 4 Bloodclot ($X_1$) is added, and no terms previously included were dropped. The right-most column of Figure 9.7 summarizes the addition of variable $X_1$ into the model containing variables $X_2$, $X_3$, and $X_8$. Next, a test is undertaken to determine whether either $X_2$, $X_3$, or $X_8$ should be dropped. Since all P-values are less than 0.15 (all are 0.000), all variables are retained.

7. Finally, the stepwise regression routine considers adding one of $X_4$, $X_5$, $X_6$, or $X_7$ to the model containing $X_1$, $X_2$, $X_3$, and $X_8$. In each case, the P-values are greater than 0.10 (not shown); therefore, no additional variables can be added to the model and the search process is terminated.

Thus, the stepwise search algorithm identifies ($X_1$, $X_2$, $X_3$, $X_8$) as the "best" subset of $X$ variables. This model also happens to be the model identified by both the $SBC_p$ and $PRESS_p$ criteria in our previous analyses based on an assessment of "best" subset selection.
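The add-then-test-for-deletion cycle followed in these steps can be sketched in a few lines. This is a minimal illustration, not the MINITAB routine: it uses fixed $|t^*|$ limits instead of $\alpha$ limits (so that no $t$ distribution routine is needed), the names `t_statistics` and `forward_stepwise` and the default limits are hypothetical, and the data are simulated rather than the surgical unit data.

```python
import numpy as np

def t_statistics(X, y):
    """t* = b_k / s{b_k} for each column of X; an intercept is added here."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    mse = resid @ resid / (n - A.shape[1])
    se = np.sqrt(mse * np.diag(np.linalg.inv(A.T @ A)))
    return (b / se)[1:]  # drop the intercept

def forward_stepwise(X, y, t_enter=2.0, t_remove=1.5, max_steps=100):
    """Forward stepwise search with |t*| limits (assumed, illustrative defaults)."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(max_steps):
        # Addition step: try each remaining variable alongside those selected.
        candidates = [(abs(t_statistics(X[:, selected + [k]], y)[-1]), k)
                      for k in remaining]
        if not candidates:
            break
        best_t, best_k = max(candidates)
        if best_t < t_enter:
            break
        selected.append(best_k)
        remaining.remove(best_k)
        # Deletion step: test every variable except the one just added.
        if len(selected) > 1:
            t_old = np.abs(t_statistics(X[:, selected], y))[:-1]
            j = int(np.argmin(t_old))
            if t_old[j] < t_remove:
                remaining.append(selected.pop(j))
    return selected

# Simulated data: only columns 0 and 2 truly affect the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.5, size=60)
chosen = forward_stepwise(X, y)
print(sorted(chosen))
```

Keeping the entry limit at least as strict as the removal limit (here `t_enter >= t_remove`) plays the same role as requiring the $\alpha$-to-enter value not to exceed the $\alpha$-to-remove value: a variable that was just dropped cannot immediately re-enter.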

Comments

1. The choice of $\alpha$-to-enter and $\alpha$-to-remove values essentially represents a balancing of opposing tendencies. Simulation studies have shown that for large pools of uncorrelated predictor variables that have been generated to be uncorrelated with the response variable, use of large or moderately large $\alpha$-to-enter values as the entry criterion results in a procedure that is too liberal; that is, it allows too many predictor variables into the model. On the other hand, models produced by an automatic selection procedure with small $\alpha$-to-enter values are often underspecified, resulting in $\sigma^2$ being badly overestimated and the procedure being too conservative (see, for example, References 9.2 and 9.3).

2. The maximum acceptable $\alpha$-to-enter value should never be larger than the minimum acceptable $\alpha$-to-remove value; otherwise, cycling is possible where a variable is continually entered and removed.

3. The order in which variables enter the regression model does not reflect their importance. At times, a variable may enter the model, only to be dropped at a later stage because it can be predicted well from the other predictors that have been subsequently added.

Other Stepwise Procedures

Other stepwise procedures are available to find a "best" subset of predictor variables. We mention two of these.


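It is interesting to note that $(VIF)_3 = 105$ despite the fact that both $r_{13}^2$ and $r_{23}^2$ (see Figure 7.3b) are not large. This is an instance where $X_3$ is strongly related to $X_1$ and $X_2$ together, even though the pairwise coefficients of simple determination are not large. Examination of the pairwise correlations does not disclose this multicollinearity.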

Comments

1. Some computer regression programs use the reciprocal of the variance inflation factor to detect instances where an $X$ variable should not be allowed into the fitted regression model because of excessively high interdependence between this variable and the other $X$ variables in the model. Tolerance limits for $1/(VIF)_k = 1 - R_k^2$ frequently used are .01, .001, or .0001, below which the variable is not entered into the model.

2. A limitation of variance inflation factors for detecting multicollinearities is that they cannot distinguish between several simultaneous multicollinearities.

3. A number of other formal methods for detecting multicollinearity have been proposed. These are more complex than variance inflation factors and are discussed in specialized texts such as References 10.5 and 10.6.
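The relation $1/(VIF)_k = 1 - R_k^2$ in Comment 1 translates directly into code. The sketch below is an illustration under stated assumptions: the function names are hypothetical, $R_k^2$ is obtained by regressing each predictor on the others with an intercept, and the small data set is made up so that one column is nearly a linear combination of the other two.

```python
import numpy as np

def variance_inflation_factors(X):
    """(VIF)_k = 1 / (1 - R_k^2), regressing column k on the other columns."""
    n, m = X.shape
    vifs = np.empty(m)
    for k in range(m):
        A = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        b, *_ = np.linalg.lstsq(A, X[:, k], rcond=None)
        resid = X[:, k] - A @ b
        total = np.sum((X[:, k] - X[:, k].mean()) ** 2)
        r2_k = 1.0 - resid @ resid / total
        vifs[k] = 1.0 / (1.0 - r2_k)
    return vifs

def passes_tolerance(X, limit=0.01):
    """True where the tolerance 1/(VIF)_k is at or above the chosen limit."""
    return 1.0 / variance_inflation_factors(X) >= limit

# Made-up data: a and b are orthogonal; c is almost exactly a + b.
a = np.array([1.0, -1.0, 1.0, -1.0, 0.0, 0.0])
b = np.array([1.0, 1.0, -1.0, -1.0, 0.0, 0.0])
c = a + b + 0.1 * np.array([1.0, 0.0, 0.0, 0.0, 0.0, -1.0])

vif_orth = variance_inflation_factors(np.column_stack([a, b]))
vif_coll = variance_inflation_factors(np.column_stack([a, b, c]))
print(vif_orth)
print(vif_coll)
print(passes_tolerance(np.column_stack([a, b, c])))
```

With the two orthogonal columns alone, each $R_k^2$ is zero and the factors equal 1. Adding the near-combination `c` inflates its factor far past the usual warning levels and pushes its tolerance below the .01 limit.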

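10.6 Surgical Unit Example-Continued

In Chapter 9 we developed a regression model for the surgical unit example (Table 9.1). Recall that validation studies in Section 9.6 led to the selection of the model containing variables $X_1$, $X_2$, $X_3$, and $X_8$. We now use this regression model to demonstrate a more in-depth study of curvature, interaction effects, and other diagnostics.

To examine interaction effects further, a regression model containing first-order terms in $X_1$, $X_2$, $X_3$, and $X_8$ was fitted, and added-variable plots for the six two-factor interaction terms were examined. These plots (not shown) did not suggest that any strong two-variable interactions are present and need to be included in the model. The absence of any strong interactions was also noted by fitting a regression model containing $X_1$, $X_2$, $X_3$, and $X_8$ in first-order terms together with the two-variable interaction terms. The P-value of the formal test for dropping all of the interaction terms from the model containing both the first-order effects and the interaction effects is .35, indicating that interaction effects are not present.

Figure 10.9 contains some of the additional diagnostic plots that were generated to check on the adequacy of the first-order model:

$$Y_i' = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_8 X_{i8} + \varepsilon_i \qquad (10.45)$$

where $Y_i' = \ln Y_i$. The following points are worth noting:

1. The residual plot against the fitted values in Figure 10.9a shows no evidence of serious departures from the model.

2. One of the three candidate models subjected to validation studies in Section 9.6, model (9.23), contained $X_5$ (patient age) as a predictor. The regression coefficient for age ($b_5$) was negative in model (9.23), but when the same model was fitted to the validation data, the sign of $b_5$ became positive. We will now use a residual plot and an added-variable plot to study graphically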

Chapter

10

Building the Regression Model II: Diagnostics 411

FIGURE 10.9 Residual and Added-Variable Plots for Surgical Unit Example - Regression Model (10.45).

[Plots not reproduced. Panels: (a) Residual Plot against Predicted (residual vs. predicted value); (b) Residual Plot against X5 (residual vs. X5); (c) Added-Variable Plot for X5 (horizontal axis e(X5 | X1, X2, X3, X8)); (d) Normal Probability Plot (residual vs. expected value).]

the strength of the marginal relationship between $X_5$ and the response, when $X_1$, $X_2$, $X_3$, and $X_8$ are already in the model. Figure 10.9b shows the plot of the residuals for the model containing $X_1$, $X_2$, $X_3$, and $X_8$ against $X_5$, the predictor variable not in the model. This plot shows no need to include patient age ($X_5$) in the model to predict logarithm of survival time. A better view of this marginal relationship is provided by the added-variable plot in Figure 10.9c. The slope coefficient $b_5$ can be seen again to be slightly negative, as depicted by the solid line in the added-variable plot. Overall, however, the marginal relationship between $X_5$ and $Y'$ is weak. The P-value of the formal $t$ test (9.18) for dropping $X_5$ from the model containing $X_1$, $X_2$, $X_3$, $X_5$, and $X_8$ is 0.194. In addition, the plot shows that the negative slope is driven largely by one or two outliers: one in the upper left region of the plot, and one in the lower right region. In this way the added-variable plot provides additional support for dropping $X_5$.

3. The normal probability plot of the residuals in Figure 10.9d shows little departure from linearity. The coefficient of correlation between the ordered residuals and their expected values under normality is .982, which is larger than the critical value for significance level .05 in Table B.6.
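The two sets of residuals behind an added-variable plot, and the slope of the line drawn through them, can be computed as follows. This is a sketch on simulated data, not the surgical unit data; the names `ols_residuals` and `added_variable` are hypothetical.

```python
import numpy as np

def ols_residuals(A, v):
    """Residuals from the least squares regression of v on the columns of A."""
    b, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ b

def added_variable(X_in, x_new, y):
    """Residual pairs and slope for the added-variable plot of x_new."""
    n = len(y)
    A = np.column_stack([np.ones(n), X_in])
    e_y = ols_residuals(A, y)        # e(Y | variables already in the model)
    e_x = ols_residuals(A, x_new)    # e(X_new | variables already in the model)
    slope = (e_x @ e_y) / (e_x @ e_x)
    return e_x, e_y, slope

# Simulated data: three predictors in the model, one candidate predictor.
rng = np.random.default_rng(2)
X_in = rng.normal(size=(40, 3))
x_new = 0.5 * X_in[:, 0] + rng.normal(size=40)
y = 1.0 + X_in @ np.array([1.0, -1.0, 0.5]) - 0.2 * x_new + rng.normal(size=40)

e_x, e_y, slope = added_variable(X_in, x_new, y)

# Coefficient of x_new when it is added to the full regression.
A_full = np.column_stack([np.ones(40), X_in, x_new])
b_full, *_ = np.linalg.lstsq(A_full, y, rcond=None)
print(slope, b_full[-1])
```

Plotting `e_y` against `e_x` gives the added-variable plot. The slope through the origin equals the coefficient the candidate variable would receive in the enlarged model, which is why the line in such a plot shows the sign and strength of the marginal relationship.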


Multicollinearity was studied by calculating the variance inflation factors:

Variable   $(VIF)_k$
$X_1$       1.10
$X_2$       1.02
$X_3$       1.05
$X_8$       1.09

As may be seen from these results, multicollinearity among the four predictor variables is not a problem.

Figure 10.10 contains index plots of four key regression diagnostics: the deleted studentized residuals $t_i$ in Figure 10.10a, the leverage values $h_{ii}$ in Figure 10.10b, Cook's distances $D_i$ in Figure 10.10c, and $DFFITS_i$ values in Figure 10.10d. These plots suggest further study of cases 17, 28, and 38. Table 10.6 lists numerical diagnostic values for these and other cases. The measures presented in columns 1-5 are the residuals $e_i$, the studentized deleted residuals $t_i$ in (10.24), the leverage values $h_{ii}$, the Cook's distance measures $D_i$, and the $(DFFITS)_i$ values in (10.30). The following are noteworthy points about the diagnostics in Table 10.6:

1. Case 17 was identified as outlying with regard to its $Y$ value according to its studentized deleted residual, outlying by more than three standard deviations. We test formally whether case 17 is outlying by means of the Bonferroni test procedure. For a family significance level of $\alpha = .05$ and sample size $n = 54$, we require $t(1 - \alpha/2n;\, n - p - 1) = t(.99954;\, 48) = 3.528$. Since $|t_{17}| = 3.3696 \le 3.528$, the formal outlier test indicates that case 17 is not an outlier. Still, $t_{17}$ is very close to the critical value, and although this case does not appear to be outlying to any substantial extent, we may wish to investigate the influence of case 17 to remove any doubts.

2. With $2p/n = 2(5)/54 = .185$ as a guide for identifying outlying $X$ observations, cases 23, 28, 32, 38, 42, and 52 were identified as outlying according to their leverage values. The dot plots identify only cases 28 and 38 as outlying. Here, we see the value of multivariable outlier identification.

3. To determine the influence of cases 17, 23, 28, 32, 38, 42, and 52, we consider their Cook's distance and $(DFFITS)_i$ values. According to each of these measures, case 17 is the most influential. Referring to the $F$ distribution with 5 and 49 degrees of freedom, we note that the Cook's distance value for case 17 corresponds to the 11th percentile. Case 17 thus does not appear to be unduly influential, and the other outlying cases also do not appear to be overly influential.

A direct measure of the influence of case 17 on the inferences of interest was also conducted. Here, the inferences of primary interest are in the fit of the regression model because the model is intended to be used for making predictions. Hence, each fitted value $\hat{Y}_i$ based on all 54 observations was compared with the fitted value $\hat{Y}_{i(17)}$ when case 17 is deleted in fitting the regression model. The average of the absolute percent differences

$$\left| \frac{\hat{Y}_{i(17)} - \hat{Y}_i}{\hat{Y}_i} \right| \times 100$$


is small, as is the largest absolute percent difference (which is for case 17). Thus, case 17 does not have such a disproportionate influence on the fitted values that remedial action would be required.

4. In summary, the diagnostic analyses identified a number of potential problems, but none of these was considered to be serious enough to require further remedial action.
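The case diagnostics discussed in points 1-3 can all be obtained from a single fit through the hat matrix. The sketch below assumes the usual single-fit formulas for the studentized deleted residual, Cook's distance, and DFFITS; the function names are hypothetical and the data are simulated, so the numbers it prints are not those of Table 10.6.

```python
import numpy as np

def influence_measures(X, y):
    """Leverage, studentized deleted residuals, Cook's D, and DFFITS."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])     # design matrix with intercept
    p = A.shape[1]
    H = A @ np.linalg.inv(A.T @ A) @ A.T      # hat matrix
    h = np.diag(H)                            # leverage values h_ii
    e = y - H @ y                             # ordinary residuals
    sse = e @ e
    mse = sse / (n - p)
    t = e * np.sqrt((n - p - 1) / (sse * (1 - h) - e ** 2))
    cooks = e ** 2 * h / (p * mse * (1 - h) ** 2)
    dffits = t * np.sqrt(h / (1 - h))
    return {"leverage": h, "t_deleted": t, "cooks_d": cooks, "dffits": dffits}

def leverage_guide(p, n):
    """Rule-of-thumb cutoff 2p/n for flagging outlying X observations."""
    return 2 * p / n

def bonferroni_level(alpha, n):
    """Percentile 1 - alpha/(2n) used in the Bonferroni outlier test."""
    return 1 - alpha / (2 * n)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=30)
d = influence_measures(X, y)

flagged = np.where(d["leverage"] > leverage_guide(p=4, n=30))[0]
print(flagged)
print(leverage_guide(5, 54), bonferroni_level(0.05, 54))
```

The last line reproduces the two reference values used in the text: the leverage guide $2(5)/54$ and the percentile $1 - .05/108$ at which the $t$ distribution with 48 degrees of freedom is evaluated.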

Cited References

10.1. Atkinson, A. C. Plots, Transformations, and Regression. Oxford: Clarendon Press, 1987.
10.2. Mansfield, E. R., and M. D. Conerly. "Diagnostic Value of Residual and Partial Residual Plots," The American Statistician 41 (1987), pp. 107-16.
10.3. Cook, R. D. "Exploring Partial Residual Plots," Technometrics 35 (1993), pp. 351-62.
10.4. Rousseeuw, P. J., and A. M. Leroy. Robust Regression and Outlier Detection. New York: John Wiley & Sons, 1987.
10.5. Belsley, D. A.; E. Kuh; and R. E. Welsch. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: John Wiley & Sons, 1980.
10.6. Belsley, D. A. Conditioning Diagnostics: Collinearity and Weak Data in Regression. New York: John Wiley & Sons, 1991.

Problems

10.1. A student asked: "Why is it necessary to perform diagnostic checks of the fit when $R^2$ is large?" Comment.

10.2. A researcher stated: "One good thing about added-variable plots is that they are extremely useful for identifying model adequacy even when the predictor variables are not properly specified in the regression model." Comment.

10.3. A student suggested: "If extremely influential outlying cases are detected in a data set, simply discard these cases from the data set." Comment.

10.4. Describe several informal methods that can be helpful in identifying multicollinearity among the $X$ variables in a multiple regression model.

10.5. Refer to Brand preference Problem 6.5b.
a. Prepare an added-variable plot for each of the predictor variables.
b. Do your plots in part (a) suggest that the regression relationships in the fitted regression function in Problem 6.5b are inappropriate for any of the predictor variables? Explain.
c. Obtain the fitted regression function in Problem 6.5b by separately regressing both $Y$ and $X_2$ on $X_1$, and then regressing the residuals in an appropriate fashion.

10.6. Refer to Grocery retailer Problem 6.9.
a. Fit regression model (6.1) to the data using $X_1$ and $X_2$ only.
b. Prepare an added-variable plot for each of the predictor variables $X_1$ and $X_2$.
c. Do your plots in part (b) suggest that the regression relationships in the fitted regression function in part (a) are inappropriate for any of the predictor variables? Explain.
d. Obtain the fitted regression function in part (a) by separately regressing both $Y$ and $X_2$ on $X_1$, and then regressing the residuals in an appropriate fashion.

10.7. Refer to Patient satisfaction Problem 6.15c.
a. Prepare an added-variable plot for each of the predictor variables.
b. Do your plots in part (a) suggest that the regression relationships in the fitted regression function in Problem 6.15c are inappropriate for any of the predictor variables? Explain.