USING STATISTICS @ Sunflowers Apparel
13.1 TYPES OF REGRESSION MODELS
13.2 DETERMINING THE SIMPLE LINEAR REGRESSION EQUATION
   The Least-Squares Method
   Visual Explorations: Exploring Simple Linear Regression Coefficients
   Predictions in Regression Analysis: Interpolation Versus Extrapolation
   Computing the Y Intercept, b₀, and the Slope, b₁
13.3 MEASURES OF VARIATION
   Computing the Sum of Squares
   The Coefficient of Determination
   Standard Error of the Estimate
13.4 ASSUMPTIONS
13.5 RESIDUAL ANALYSIS
   Evaluating the Assumptions
13.6 MEASURING AUTOCORRELATION: THE DURBIN-WATSON STATISTIC
   Residual Plots to Detect Autocorrelation
   The Durbin-Watson Statistic
13.7 INFERENCES ABOUT THE SLOPE AND CORRELATION COEFFICIENT
   t Test for the Slope
   F Test for the Slope
   Confidence Interval Estimate of the Slope (β₁)
   t Test for the Correlation Coefficient
13.8 ESTIMATION OF MEAN VALUES AND PREDICTION OF INDIVIDUAL VALUES
   The Confidence Interval Estimate
   The Prediction Interval
13.9 PITFALLS IN REGRESSION AND ETHICAL ISSUES
EXCEL COMPANION TO CHAPTER 13
   E13.1 Performing Simple Linear Regression Analyses
   E13.2 Creating Scatter Plots and Adding a Prediction Line
   E13.3 Performing Residual Analyses
   E13.4 Computing the Durbin-Watson Statistic
   E13.5 Estimating the Mean of Y and Predicting Y Values
   E13.6 Example: Sunflowers Apparel Data
In this chapter, you learn:
• To use regression analysis to predict the value of a dependent variable based on an independent variable
• The meaning of the regression coefficients b₀ and b₁
• To evaluate the assumptions of regression analysis and know what to do if the assumptions are violated
• To make inferences about the slope and correlation coefficient
• To estimate mean values and predict individual values
CHAPTER THIRTEEN Simple Linear Regression
USING STATISTICS @ Sunflowers Apparel
The sales for Sunflowers Apparel, a chain of upscale clothing stores for women, have increased during the past 12 years as the chain has expanded the number of stores open. Until now, Sunflowers managers selected sites based on subjective factors, such as the availability of a good lease or the perception that a location seemed ideal for an apparel store. As the new director of planning, you need to develop a systematic approach that will lead to making better decisions during the site-selection process. As a starting point, you believe that the size of the store significantly contributes to store sales, and you want to use this relationship in the decision-making process. How can you use statistics so that you can forecast the annual sales of a proposed store based on the size of that store?
In this chapter and the next two chapters, you learn how regression analysis enables you to develop a model to predict the values of a numerical variable, based on the value of other variables. In regression analysis, the variable you wish to predict is called the dependent variable. The variables used to make the prediction are called independent variables. In addition to predicting values of the dependent variable, regression analysis also allows you to identify the type of mathematical relationship that exists between a dependent and an independent variable, to quantify the effect that changes in the independent variable have on the dependent variable, and to identify unusual observations. For example, as the director of planning, you may wish to predict sales for a Sunflowers store, based on the size of the store. Other examples include predicting the monthly rent of an apartment, based on its size, and predicting the monthly sales of a product in a supermarket, based on the amount of shelf space devoted to the product.

This chapter discusses simple linear regression, in which a single numerical independent variable, X, is used to predict the numerical dependent variable Y, such as using the size of the store to predict the annual sales of the store. Chapters 14 and 15 discuss multiple regression models, which use several independent variables to predict a numerical dependent variable, Y. For example, you could use the amount of advertising expenditures, price, and the amount of shelf space devoted to a product to predict its monthly sales.
13.1 TYPES OF REGRESSION MODELS
In Section 2.5, you used a scatter plot (also known as a scatter diagram) to examine the relationship between an X variable on the horizontal axis and a Y variable on the vertical axis. The nature of the relationship between two variables can take many forms, ranging from simple to extremely complicated mathematical functions. The simplest relationship consists of a straight-line, or linear, relationship. An example of this relationship is shown in Figure 13.1.
FIGURE 13.1 A positive straight-line relationship (ΔY = "change in Y"; ΔX = "change in X")
Equation (13.1) represents the straight-line (linear) model.

SIMPLE LINEAR REGRESSION MODEL

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad (13.1)$$

where
β₀ = Y intercept for the population
β₁ = slope for the population
εᵢ = random error in Y for observation i
Yᵢ = dependent variable (sometimes referred to as the response variable) for observation i
Xᵢ = independent variable (sometimes referred to as the explanatory variable) for observation i
The portion β₀ + β₁Xᵢ of the simple linear regression model expressed in Equation (13.1) is a straight line. The slope of the line, β₁, represents the expected change in Y per unit change in X. It represents the mean amount that Y changes (either positively or negatively) for a one-unit change in X. The Y intercept, β₀, represents the mean value of Y when X equals 0. The last component of the model, εᵢ, represents the random error in Y for each observation, i. In other words, εᵢ is the vertical distance of the actual value of Yᵢ above or below the predicted value of Yᵢ on the line.

The selection of the proper mathematical model depends on the distribution of the X and Y values on the scatter plot. In Panel A of Figure 13.2 on page 514, the values of Y are generally increasing linearly as X increases. This panel is similar to Figure 13.3 on page 515, which illustrates the positive relationship between the square footage of the store and the annual sales at branches of the Sunflowers Apparel women's clothing store chain.

Panel B is an example of a negative linear relationship. As X increases, the values of Y are generally decreasing. An example of this type of relationship might be the price of a particular product and the amount of sales.

The data in Panel C show a positive curvilinear relationship between X and Y. The values of Y increase as X increases, but this increase tapers off beyond certain values of X. An example of a positive curvilinear relationship might be the age and maintenance cost of a machine. As a machine gets older, the maintenance cost may rise rapidly at first, but then level off beyond a certain number of years.

Panel D shows a U-shaped relationship between X and Y. As X increases, at first Y generally decreases; but as X continues to increase, Y not only stops decreasing but actually increases above its minimum value. An example of this type of relationship might be the number of errors per hour at a task and the number of hours worked. The number of errors per hour decreases as the individual becomes more proficient at the task, but then it increases beyond a certain point because of factors such as fatigue and boredom.

Panel E indicates an exponential relationship between X and Y. In this case, Y decreases very rapidly as X first increases, but then it decreases much less rapidly as X increases further. An example of an exponential relationship could be the resale value of an automobile and its age. In the first year, the resale value drops drastically from its original price; however, the resale value then decreases much less rapidly in subsequent years.

Finally, Panel F shows a set of data in which there is very little or no relationship between X and Y. High and low values of Y appear at each value of X.

FIGURE 13.2 Examples of types of relationships found in scatter plots (Panel A: positive linear relationship; Panel B: negative linear relationship; Panel C: positive curvilinear relationship; Panel D: U-shaped curvilinear relationship; Panel E: exponential relationship; Panel F: no relationship between X and Y)

In this section, a variety of different models that represent the relationship between two variables were briefly examined. Although scatter plots are useful in visually displaying the mathematical form of a relationship, more sophisticated statistical procedures are available to determine the most appropriate model for a set of variables. The rest of this chapter discusses the model used when there is a linear relationship between variables.
13.2 DETERMINING THE SIMPLE LINEAR REGRESSION EQUATION

In the Using Statistics scenario on page 512, the stated goal is to forecast annual sales for all new stores, based on store size. To examine the relationship between the store size in square feet and its annual sales, a sample of 14 stores was selected. Table 13.1 summarizes the results for these 14 stores, which are stored in the file @[!.
TABLE 13.1
Square Footage (in Thousands of Square Feet) and Annual Sales (in Millions of Dollars) for a Sample of 14 Branches of Sunflowers Apparel

Store   Square Feet (Thousands)   Annual Sales (Millions of Dollars)
1       1.7                       3.7
2       1.6                       3.9
3       2.8                       6.7
4       5.6                       9.5
5       1.3                       3.4
6       2.2                       5.6
7       1.3                       3.7
8       1.1                       2.7
9       3.2                       5.5
10      1.5                       2.9
11      5.2                       10.7
12      4.6                       7.6
13      5.8                       11.8
14      3.0                       4.1
Figure 13.3 displays the scatter plot for the data in Table 13.1. Observe the increasing relationship between square feet (X) and annual sales (Y). As the size of the store increases, annual sales increase approximately as a straight line. Thus, you can assume that a straight line provides a useful mathematical model of this relationship. Now you need to determine the specific straight line that is the best fit to these data.
FIGURE 13.3 Excel scatter plot for the Sunflowers Apparel data (chart title: Scatter Plot for Site Selection; horizontal axis: Square Feet (000))
See Section E2.12 to create this.
The Least-Squares Method
In the preceding section, a statistical model is hypothesized to represent the relationship between two variables, square footage and sales, in the entire population of Sunflowers Apparel stores. However, as shown in Table 13.1, the data are from only a random sample of stores. If certain assumptions are valid (see Section 13.4), you can use the sample Y intercept, b₀, and the sample slope, b₁, as estimates of the respective population parameters, β₀ and β₁. Equation (13.2) uses these estimates to form the simple linear regression equation. This straight line is often referred to as the prediction line.
SIMPLE LINEAR REGRESSION EQUATION: THE PREDICTION LINE
The predicted value of Y equals the Y intercept plus the slope times the value of X.

$$\hat{Y}_i = b_0 + b_1 X_i \qquad (13.2)$$

where
Ŷᵢ = predicted value of Y for observation i
Xᵢ = value of X for observation i
b₀ = sample Y intercept
b₁ = sample slope
Equation (13.2) requires the determination of two regression coefficients: b₀ (the sample Y intercept) and b₁ (the sample slope). The most common approach to finding b₀ and b₁ is the method of least squares. This method minimizes the sum of the squared differences between the actual values (Yᵢ) and the predicted values (Ŷᵢ) using the simple linear regression equation [that is, the prediction line; see Equation (13.2)]. This sum of squared differences is equal to

$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

Because Ŷᵢ = b₀ + b₁Xᵢ,

$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} [Y_i - (b_0 + b_1 X_i)]^2$$

Because this equation has two unknowns, b₀ and b₁, the sum of squared differences depends on the sample Y intercept, b₀, and the sample slope, b₁. The least-squares method determines the values of b₀ and b₁ that minimize the sum of squared differences. Any values for b₀ and b₁ other than those determined by the least-squares method result in a greater sum of squared differences between the actual values (Yᵢ) and the predicted values (Ŷᵢ). In this book, Microsoft Excel is used to perform the computations involved in the least-squares method. For the data of Table 13.1, Figure 13.4 presents results from Microsoft Excel.
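The least-squares criterion can be checked numerically. The following is a minimal Python sketch, assuming the 14 observations of Table 13.1 and the rounded coefficients reported in Figure 13.4; the function name `sse` is hypothetical, introduced only for illustration. It evaluates the sum of squared differences at the reported coefficients and at a few nudged alternatives, each of which should give a larger sum.

```python
# Table 13.1: store size (thousands of square feet) and annual sales ($ millions)
SQ_FEET = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
SALES = [3.7, 3.9, 6.7, 9.5, 3.4, 5.6, 3.7, 2.7, 5.5, 2.9, 10.7, 7.6, 11.8, 4.1]


def sse(b0, b1, xs=SQ_FEET, ys=SALES):
    """Sum of squared differences between actual Y and predicted b0 + b1 * X."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


best = sse(0.9645, 1.6699)  # rounded least-squares coefficients (Figure 13.4)
print(f"SSE at b0=0.9645, b1=1.6699: {best:.4f}")

# Moving either coefficient away from the least-squares values increases the sum.
for b0, b1 in [(1.0645, 1.6699), (0.8645, 1.6699), (0.9645, 1.7199), (0.9645, 1.6199)]:
    print(f"SSE at b0={b0}, b1={b1}: {sse(b0, b1):.4f}")
```

Because the coefficients are rounded to four decimals, the first sum matches the true minimum only approximately. A comparison like this illustrates the criterion; it is not how the coefficients are actually found.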
FIGURE 13.4 Microsoft Excel results for the Sunflowers Apparel data
See Section E13.1 to create this.
To understand how the results are computed, many of the computations involved are illustrated in Examples 13.3 and 13.4 on pages 520-521 and 526-527. In Figure 13.4, observe that b₀ = 0.9645 and b₁ = 1.6699. Thus, the prediction line [see Equation (13.2) on page 515] for these data is
$$\hat{Y}_i = 0.9645 + 1.6699 X_i$$
The slope, b₁, is +1.6699. This means that for each increase of 1 unit in X, the mean value of Y is estimated to increase by 1.6699 units. In other words, for each increase of 1.0 thousand square feet in the size of the store, the mean annual sales are estimated to increase by 1.6699 millions of dollars. Thus, the slope represents the portion of the annual sales that are estimated to vary according to the size of the store.

The Y intercept, b₀, is +0.9645. The Y intercept represents the mean value of Y when X equals 0. Because the square footage of the store cannot be 0, this Y intercept has no practical interpretation. Also, the Y intercept for this example is outside the range of the observed values of the X variable, and therefore interpretations of the value of b₀ should be made cautiously. Figure 13.5 displays the actual observations and the prediction line. To illustrate a situation in which there is a direct interpretation for the Y intercept, b₀, see Example 13.1.
FIGURE 13.5 Excel scatter plot and prediction line for the Sunflowers Apparel data (chart title: Scatter Diagram for Site Selection; trendline label: y = 1.6699x + 0.9645)
See Section E13.2 to create this.
EXAMPLE 13.1
INTERPRETING THE Y INTERCEPT, b₀, AND THE SLOPE, b₁
A statistics professor wants to use the number of hours a student studies for a statistics final exam (X) to predict the final exam score (Y). A regression model was fit based on data collected for a class during the previous semester, with the following results:

$$\hat{Y}_i = 35.0 + 3X_i$$

What is the interpretation of the Y intercept, b₀, and the slope, b₁?

SOLUTION The Y intercept b₀ = 35.0 indicates that when the student does not study for the final exam, the mean final exam score is 35.0. The slope b₁ = 3 indicates that for each increase of one hour in studying time, the mean change in the final exam score is predicted to be +3.0. In other words, the final exam score is predicted to increase by 3 points for each one-hour increase in studying time.
VISUAL EXPLORATIONS Exploring Simple Linear Regression Coefficients

Use the Visual Explorations Simple Linear Regression procedure to produce a prediction line that is as close as possible to the prediction line defined by the least-squares solution. Open the add-in workbook and select Visual Explorations → Simple Linear Regression (Excel 97-2003) or Add-ins → Visual Explorations → Simple Linear Regression (Excel 2007). (See Section E1.6 to learn about using add-ins.)

When a scatter plot of the Sunflowers Apparel data of Table 13.1 on page 515 with an initial prediction line appears, click the spinner buttons to change the values for b₁, the slope of the prediction line, and b₀, the Y intercept of the prediction line.

Try to produce a prediction line that is as close as possible to the prediction line defined by the least-squares estimates, using the chart display and the Difference from Target SSE value as feedback (see page 525 for an explanation of SSE). Click Finish when you are done with this exploration.

At any time, click Reset to reset the b₁ and b₀ values, Help for more information, or Solution to reveal the prediction line defined by the least-squares method.

Using Your Own Regression Data
To use Visual Explorations to find a prediction line for your own data, open the add-in workbook and select Visual Explorations → Simple Linear Regression with your worksheet data (97-2003) or Add-ins → Visual Explorations → Simple Linear Regression with your worksheet data (2007). In the procedure's dialog box, enter your Y variable cell range as the Y Variable Cell Range and your X variable cell range as the X Variable Cell Range. Click First cells in both ranges contain a label, enter a title as the Title, and click OK. When the scatter plot with an initial prediction line appears, use the instructions in the first part of this section to try to produce the prediction line defined by the least-squares method.
Return to the Using Statistics scenario concerning the Sunflowers Apparel stores. Example 13.2 illustrates how you use the prediction equation to predict the mean annual sales.
EXAMPLE 13.2
PREDICTING MEAN ANNUAL SALES, BASED ON SQUARE FOOTAGE
Use the prediction line to predict the mean annual sales for a store with 4,000 square feet.

SOLUTION You can determine the predicted value by substituting X = 4 (thousands of square feet) into the simple linear regression equation:

$$\hat{Y}_i = 0.9645 + 1.6699 X_i$$

$$\hat{Y}_i = 0.9645 + 1.6699(4) = 7.644 \text{ or } \$7{,}644{,}000$$

Thus, the predicted mean annual sales of a store with 4,000 square feet is $7,644,000.
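The substitution in Example 13.2 amounts to a one-line function. This is a minimal Python sketch assuming the rounded coefficients b₀ = 0.9645 and b₁ = 1.6699; the name `predict_mean_sales` is hypothetical.

```python
B0 = 0.9645  # sample Y intercept (Figure 13.4)
B1 = 1.6699  # sample slope (Figure 13.4)


def predict_mean_sales(sq_feet_thousands):
    """Predicted mean annual sales, in millions of dollars, from Equation (13.2)."""
    return B0 + B1 * sq_feet_thousands


y_hat = predict_mean_sales(4)  # a 4,000-square-foot store means X = 4
print(f"Predicted mean annual sales: {y_hat:.4f} million dollars")
```

Carried to four decimals the result is 7.6441; the example reports it rounded as 7.644, or $7,644,000.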
Predictions in Regression Analysis: Interpolation Versus Extrapolation

When using a regression model for prediction purposes, you need to consider only the relevant range of the independent variable in making predictions. This relevant range includes all values from the smallest to the largest X used in developing the regression model. Hence, when predicting Y for a given value of X, you can interpolate within this relevant range of the X values, but you should not extrapolate beyond the range of X values. When you use the square footage to predict annual sales, the square footage (in thousands of square feet) varies from 1.1 to 5.8 (see Table 13.1 on page 515). Therefore, you should predict annual sales only for stores whose size is between 1.1 and 5.8 thousands of square feet. Any prediction of annual sales for stores outside this range assumes that the observed relationship between sales and store size for store sizes from 1.1 to 5.8 thousand square feet is the same as for stores outside this range. For example, you cannot extrapolate the linear relationship beyond 5,800 square feet in Example 13.2. It would be improper to use the prediction line to forecast the sales for a new store containing 8,000 square feet. It is quite possible that store size has a point of diminishing returns. If that is true, as square footage increases beyond 5,800 square feet, the effect on sales might become smaller and smaller.
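One way to enforce the relevant-range rule in code is to refuse predictions outside the observed X values. This is a minimal Python sketch, assuming the Table 13.1 data and the rounded coefficients from Figure 13.4; `predict_within_range` is a hypothetical helper, and raising an error is only one possible policy (a warning or a flag would also work).

```python
# Table 13.1 store sizes, in thousands of square feet
SQ_FEET = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
B0, B1 = 0.9645, 1.6699  # rounded least-squares coefficients

# Relevant range: smallest to largest X used to fit the model (1.1 to 5.8)
X_MIN, X_MAX = min(SQ_FEET), max(SQ_FEET)


def predict_within_range(x):
    """Interpolate within [X_MIN, X_MAX]; refuse to extrapolate beyond it."""
    if not X_MIN <= x <= X_MAX:
        raise ValueError(f"X = {x} is outside the relevant range {X_MIN} to {X_MAX}")
    return B0 + B1 * x


print(predict_within_range(4.0))  # 4,000 sq ft: interpolation, allowed
try:
    predict_within_range(8.0)  # 8,000 sq ft: extrapolation, rejected
except ValueError as err:
    print(err)
```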
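Computing the Y Intercept, b₀, and the Slope, b₁

For small data sets, you can use a hand calculator to compute the least-squares regression coefficients. Equations (13.3) and (13.4) give the values of b₀ and b₁, which minimize

$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} [Y_i - (b_0 + b_1 X_i)]^2$$

COMPUTATIONAL FORMULA FOR THE SLOPE, b₁

$$b_1 = \frac{SSXY}{SSX} \qquad (13.3)$$

where

$$SSXY = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}$$

$$SSX = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}$$

COMPUTATIONAL FORMULA FOR THE Y INTERCEPT, b₀

$$b_0 = \bar{Y} - b_1 \bar{X} \qquad (13.4)$$

where

$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n} \qquad \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$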
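Equations (13.3) and (13.4) translate directly into code. This is a minimal Python sketch, assuming plain lists of equal length; `least_squares` is a hypothetical name, and SSXY and SSX are computed here in their deviation form rather than with the shortcut sums.

```python
def least_squares(xs, ys):
    """Return (b0, b1) with b1 = SSXY / SSX and b0 = Ybar - b1 * Xbar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ssxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ssx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ssxy / ssx  # Equation (13.3)
    b0 = y_bar - b1 * x_bar  # Equation (13.4)
    return b0, b1


# Table 13.1: square feet (thousands) and annual sales ($ millions)
sq_feet = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
sales = [3.7, 3.9, 6.7, 9.5, 3.4, 5.6, 3.7, 2.7, 5.5, 2.9, 10.7, 7.6, 11.8, 4.1]

b0, b1 = least_squares(sq_feet, sales)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

The deviation form avoids subtracting two large, nearly equal totals, so it tends to be numerically safer than the shortcut sums when the X values are large relative to their spread; for data this small, both forms agree with b₀ = 0.9645 and b₁ = 1.6699 to four decimals.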
EXAMPLE 13.3
COMPUTING THE Y INTERCEPT, b₀, AND THE SLOPE, b₁
Compute the Y intercept, b₀, and the slope, b₁, for the Sunflowers Apparel data.

SOLUTION Examining Equations (13.3) and (13.4), you see that five quantities must be calculated to determine b₁ and b₀. These are n, the sample size; ΣXᵢ, the sum of the X values; ΣYᵢ, the sum of the Y values; ΣXᵢ², the sum of the squared X values; and ΣXᵢYᵢ, the sum of the product of X and Y. For the Sunflowers Apparel data, the number of square feet is used to predict the annual sales in a store. Table 13.2 presents the computations of the various sums needed for the site selection problem, plus ΣYᵢ², the sum of the squared Y values that will be used to compute SST in Section 13.3.

TABLE 13.2
Computations for the Sunflowers Apparel Data

Store    Square Feet (X)   Annual Sales (Y)   X²       Y²       XY
1        1.7               3.7                2.89     13.69    6.29
2        1.6               3.9                2.56     15.21    6.24
3        2.8               6.7                7.84     44.89    18.76
4        5.6               9.5                31.36    90.25    53.20
5        1.3               3.4                1.69     11.56    4.42
6        2.2               5.6                4.84     31.36    12.32
7        1.3               3.7                1.69     13.69    4.81
8        1.1               2.7                1.21     7.29     2.97
9        3.2               5.5                10.24    30.25    17.60
10       1.5               2.9                2.25     8.41     4.35
11       5.2               10.7               27.04    114.49   55.64
12       4.6               7.6                21.16    57.76    34.96
13       5.8               11.8               33.64    139.24   68.44
14       3.0               4.1                9.00     16.81    12.30
Totals   40.9              81.8               157.41   594.90   302.30
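Using Equations (13.3) and (13.4), you can compute the values of b₀ and b₁:

$$b_1 = \frac{SSXY}{SSX}$$

$$SSXY = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}$$

$$SSXY = 302.3 - \frac{(40.9)(81.8)}{14} = 302.3 - 238.97285 = 63.32715$$

$$SSX = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n} = 157.41 - \frac{(40.9)^2}{14} = 157.41 - 119.48642 = 37.92358$$

so that

$$b_1 = \frac{63.32715}{37.92358} = 1.6699$$

and

$$b_0 = \bar{Y} - b_1 \bar{X}$$

$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n} = \frac{81.8}{14} = 5.842857 \qquad \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{40.9}{14} = 2.92143$$

$$b_0 = 5.842857 - (1.6699)(2.92143) = 0.9645$$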
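The hand computation in Example 13.3 can be reproduced from the summary quantities of Table 13.2. This is a minimal Python sketch, assuming the Table 13.1 data; the variable names are illustrative only.

```python
# Table 13.1 / 13.2 data
X = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
Y = [3.7, 3.9, 6.7, 9.5, 3.4, 5.6, 3.7, 2.7, 5.5, 2.9, 10.7, 7.6, 11.8, 4.1]

n = len(X)
sum_x = sum(X)  # Table 13.2 total: 40.9
sum_y = sum(Y)  # Table 13.2 total: 81.8
sum_x2 = sum(x * x for x in X)  # Table 13.2 total: 157.41
sum_xy = sum(x * y for x, y in zip(X, Y))  # Table 13.2 total: 302.30

ssxy = sum_xy - sum_x * sum_y / n  # 302.3 - (40.9)(81.8)/14
ssx = sum_x2 - sum_x ** 2 / n  # 157.41 - (40.9)^2/14
b1 = ssxy / ssx  # Equation (13.3)
b0 = sum_y / n - b1 * sum_x / n  # Equation (13.4): Ybar - b1 * Xbar

print(f"SSXY = {ssxy:.5f}, SSX = {ssx:.5f}")
print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
```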
Learning the Basics

13.1 Fitting a straight line to a set of data yields the following prediction line:

$$\hat{Y}_i = 2 + 5X_i$$

a. Interpret the meaning of the Y intercept, b₀.
b. Interpret the meaning of the slope, b₁.
c. Predict the mean value of Y for X = 3.

13.2 If the values of X in Problem 13.1 range from 2 to 25, should you use this model to predict the mean value of Y when X equals
a. 3?
b. -3?
c. 0?
d. 24?

13.3 Fitting a straight line to a set of data yields the following prediction line:

$$\hat{Y}_i = 16 - 0.5X_i$$

a. Interpret the meaning of the Y intercept, b₀.
b. Interpret the meaning of the slope, b₁.
c. Predict the mean value of Y for X = 6.

Applying the Concepts

13.4 The marketing manager of a large supermarket chain would like to use shelf space to predict the sales of pet food. A random sample of 12 equal-sized stores is selected, with the following results (stored in the file E!E!!E@):

Store   Shelf Space (X) (Feet)   Weekly Sales (Y) ($)
1       5                        160
2       5                        220
3       5                        140
4       10                       190
5       10                       240
6       10                       260
7       15                       230
8       15                       270
9       15                       280
10      20                       260
11      20                       290
12      20                       310

a. Construct a scatter plot.
For these data, b₀ = 145 and b₁ = 7.4.
b. Interpret the meaning of the slope, b₁, in this problem.
c. Predict the mean weekly sales (in hundreds of dollars) of pet food for stores with 8 feet of shelf space for pet food.

13.5 Circulation is the lifeblood of the publishing business. The larger the sales of a magazine, the more it can charge advertisers. Recently, a circulation gap has appeared between the publishers' reports of magazines' sales and subsequent audits by the Audit Bureau of Circulations. The data in the file @@ represent the reported and audited newsstand sales (in thousands) in 2001 for the following 10 magazines:

Magazine      Reported   Audited
YM            621.0      299.6
CosmoGirl     359.7      207.7
Rosie         530.0      325.0
Playboy       492.1      336.3
Esquire       70.5       48.6
Teen People   567.0      400.3
More          125.5      91.2
Spin          50.6       39.1
Vogue         353.3      268.6
Elle          263.6      214.3

Source: Extracted from M. Rose, "In Fight for Ads, Publishers Overstate Their Sales," The Wall Street Journal, August 6, 2003, pp. A1, A10.

a. Construct a scatter plot.
For these data, b₀ = 26.724 and b₁ = 0.5719.
b. Interpret the meaning of the slope, b₁, in this problem.
c. Predict the mean audited newsstand sales for a magazine that reports newsstand sales of 400,000.

13.6 The owner of a moving company typically has his most experienced manager predict the total number of labor hours that will be required to complete a move. This approach has proved useful in the past, but the owner would like to be able to develop a more accurate method of predicting labor hours by using the number of cubic feet moved. In a preliminary effort to provide a more accurate method, he has collected data for 36 moves in which the origin and destination were within the borough of Manhattan in New York City and in which the travel time was an insignificant portion of the hours worked. The data are stored in the file @!@f[.
a. Construct a scatter plot.
b. Assuming a linear relationship, use the least-squares method to find the regression coefficients b₀ and b₁.
c. Interpret the meaning of the slope, b₁, in this problem.
d. Predict the mean labor hours for moving 500 cubic feet.

13.7 A large mail-order house believes that there is a linear relationship between the weight of the mail it receives and the number of orders to be filled. It would like to investigate the relationship in order to predict the number of orders, based on the weight of the mail. From an operational perspective, knowledge of the number of orders will help in the planning of the order-fulfillment process. A sample of 25 mail shipments is selected that range from 200 to 700 pounds. The results (stored in the file @[@) are as follows:

Weight of Mail (Pounds)   Orders (Thousands)
216                       6.1
283                       9.1
237                       7.2
203                       7.5
259                       6.9
374                       11.5
342                       10.3
301                       9.5
365                       9.2
384                       10.6
404                       12.5
426                       12.9
482                       14.5
432                       13.6
409                       12.8
553                       16.5
572                       17.1
506                       15.0
528                       16.2
501                       15.8
628                       19.0
677                       19.4
602                       19.1
630                       18.0
652                       20.2

a. Construct a scatter plot.
b. Assuming a linear relationship, use the least-squares method to find the regression coefficients b₀ and b₁.
c. Interpret the meaning of the slope, b₁, in this problem.
d. Predict the mean number of orders when the weight of the mail is 500 pounds.

13.8 The value of a sports franchise is directly related to the amount of revenue that a franchise can generate. The data in the file EEEE@ represent the value in 2005 (in millions of dollars) and the annual revenue (in millions of dollars) for 30 baseball franchises. Suppose you want to develop a simple linear regression model to predict franchise value based on annual revenue generated.
a. Construct a scatter plot.
b. Use the least-squares method to find the regression coefficients b₀ and b₁.
c. Interpret the meaning of b₀ and b₁ in this problem.
d. Predict the mean value of a baseball franchise that generates $150 million of annual revenue.

13.9 An agent for a residential real estate company in a large city would like to be able to predict the monthly rental cost for apartments, based on the size of the apartment, as defined by square footage. A sample of 25 apartments (stored in the file [[l$) in a particular residential neighborhood was selected, and the information gathered revealed the following:

Apartment   Monthly Rent ($)   Size (Square Feet)
1           950                850
2           1,600              1,450
3           1,200              1,085
4           1,500              [illegible]
5           950                718
6           1,700              1,485
7           1,650              1,136
8           935                726
9           875                700
10          1,150              956
11          1,400              1,100
12          1,650              1,285
13          2,300              1,985
14          1,800              1,369
15          1,400              1,115
16          1,450              1,225
17          1,100              1,245
18          1,700              1,259
19          1,200              1,150
20          1,150              896
21          1,600              1,361
22          1,650              1,040
23          1,200              755
24          800                1,000
25          1,750              1,200

a. Construct a scatter plot.
b. Use the least-squares method to find the regression coefficients b₀ and b₁.
c. Interpret the meaning of b₀ and b₁ in this problem.
d. Predict the mean monthly rent for an apartment that has 1,000 square feet.
e. Why would it not be appropriate to use the model to predict the monthly rent for apartments that have 500 square feet?
f. Your friends Jim and Jennifer are considering signing a lease for an apartment in this residential neighborhood. They are trying to decide between two apartments, one with 1,000 square feet for a monthly rent of $1,275 and the other with 1,200 square feet for a monthly rent of $1,425. What would you recommend to them based on (a) through (d)?

13.10 The data in the file ftII$EE provide measurements on the hardness and tensile strength for 35 specimens of die-cast aluminum. It is believed that hardness (measured in Rockwell E units) can be used to predict tensile strength (measured in thousands of pounds per square inch).
a. Construct a scatter plot.
b. Assuming a linear relationship, use the least-squares method to find the regression coefficients b₀ and b₁.
c. Interpret the meaning of the slope, b₁, in this problem.
d. Predict the mean tensile strength for die-cast aluminum that has a hardness of 30 Rockwell E units.
13.3 MEASURES OF VARIATION

When using the least-squares method to determine the regression coefficients for a set of data, you need to compute three important measures of variation. The first measure, the total sum of squares (SST), is a measure of variation of the Yᵢ values around their mean, Ȳ. In a regression analysis, the total variation or total sum of squares is subdivided into explained variation and unexplained variation. The explained variation, or regression sum of squares (SSR), is due to the relationship between X and Y, and the unexplained variation, or error sum of squares (SSE), is due to factors other than the relationship between X and Y. Figure 13.6 shows the different measures of variation.
FIGURE 13.6 Measures of variation (diagram labels: total sum of squares, SST = Σ(Yᵢ - Ȳ)²; regression sum of squares, SSR = Σ(Ŷᵢ - Ȳ)²; error sum of squares, SSE = Σ(Yᵢ - Ŷᵢ)²; prediction line Ŷᵢ = b₀ + b₁Xᵢ)
Computing the Sum of Squares
The regression sum of squares (SSR) is based on the difference between Ŷᵢ (the predicted value of Y from the prediction line) and Ȳ (the mean value of Y). The error sum of squares (SSE) represents the part of the variation in Y that is not explained by the regression. It is based on the difference between Yᵢ and Ŷᵢ. Equations (13.5), (13.6), (13.7), and (13.8) define these measures of variation.
MEASURES OF VARIATION IN REGRESSION
The total sum of squares is equal to the regression sum of squares plus the error sum of squares.

$$SST = SSR + SSE \qquad (13.5)$$

TOTAL SUM OF SQUARES (SST)
The total sum of squares (SST) is equal to the sum of the squared differences between each observed Y value and Ȳ, the mean value of Y.

$$SST = \text{Total sum of squares} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \qquad (13.6)$$

REGRESSION SUM OF SQUARES (SSR)
The regression sum of squares (SSR) is equal to the sum of the squared differences between the predicted value of Y and Ȳ, the mean value of Y.

$$SSR = \text{Explained variation or regression sum of squares} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \qquad (13.7)$$

ERROR SUM OF SQUARES (SSE)
The error sum of squares (SSE) is equal to the sum of the squared differences between the observed value of Y and the predicted value of Y.

$$SSE = \text{Unexplained variation or error sum of squares} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \qquad (13.8)$$
Figure 13.7 shows the sum of squares area of the worksheet containing the Microsoft Excel results for the Sunflowers Apparel data. The total variation, SST, is equal to 116.9543. This amount is subdivided into the sum of squares explained by the regression (SSR), equal to 105.7476, and the sum of squares unexplained by the regression (SSE), equal to 11.2067. From Equation (13.5) on page 524:

$$SST = SSR + SSE$$

$$116.9543 = 105.7476 + 11.2067$$
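Equations (13.5) through (13.8) can be checked directly from their definitions. This is a minimal Python sketch, assuming the Table 13.1 data and unrounded least-squares coefficients; the variable names are illustrative only.

```python
# Table 13.1: square feet (thousands) and annual sales ($ millions)
X = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
Y = [3.7, 3.9, 6.7, 9.5, 3.4, 5.6, 3.7, 2.7, 5.5, 2.9, 10.7, 7.6, 11.8, 4.1]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum((x - x_bar) ** 2 for x in X)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in X]  # predicted values on the prediction line

sst = sum((y - y_bar) ** 2 for y in Y)  # Equation (13.6): total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)  # Equation (13.7): explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(Y, y_hat))  # Equation (13.8): unexplained variation

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")  # Equation (13.5)
```

The identity SST = SSR + SSE holds exactly only for the least-squares line fitted with an intercept; for arbitrary b₀ and b₁ the two pieces generally do not add up to SST.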
FIGURE 13.7 Microsoft Excel sum of squares for the Sunflowers Apparel data
See Section E13.1 to create the worksheet that contains this area.
In a data set that has a large number of significant digits, the results of a regression analysis are sometimes displayed using a numerical format known as scientific notation. This type of format is used to display very small or very large values. The number after the letter E represents the number of digits that the decimal point needs to be moved to the left (for a negative number) or to the right (for a positive number). For example, the number 3.7431E+02 means that the decimal point should be moved two places to the right, producing the number 374.31. The number 3.7431E-02 means that the decimal point should be moved two places to the left, producing the number 0.037431. When scientific notation is used, fewer significant digits are usually displayed and the numbers may appear to be rounded.
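Python reads and writes the same E-notation, which makes it easy to confirm how far the decimal point moves. A minimal sketch using only built-in `float` parsing and the `E` format specifier:

```python
# Parsing: the exponent after E says how many places to move the decimal point.
print(float("3.7431E+02"))  # 374.31   (two places to the right)
print(float("3.7431E-02"))  # 0.037431 (two places to the left)

# Formatting: four digits after the decimal point in the mantissa
print(f"{374.31:.4E}")  # 3.7431E+02
print(f"{0.037431:.4E}")  # 3.7431E-02
```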
The Coefficient of Determination
By themselves, S,SR, SSE,andS,ST"provide information. little However, ratioof the regresthe (SSR) the total sum of squares (SSf) measures proportion variasion sum of squares to the of tion in I/ that is explained the independent by variable in the regression X model.This ratio is ( the 12, called coefficient determination, andis defined Equation 13.9). of in NATION COEFFICIENT DETERMI OF (thatis, The coefficientof determination equalto the regression is sum of squares explained variation) dividedby the total sumofsquares(thatis, total variation).
$$r^2 = \frac{\text{Regression sum of squares}}{\text{Total sum of squares}} = \frac{SSR}{SST} \qquad (13.9)$$
The coefficient of determination measures the proportion of variation in Y that is explained by the independent variable X in the regression model. For the Sunflowers Apparel data, with SSR = 105.7476, SSE = 11.2067, and SST = 116.9543,

$$r^2 = \frac{105.7476}{116.9543} = 0.9042$$
Therefore, 90.42% of the variation in annual sales is explained by the variability in the size of the store, as measured by the square footage. This large $r^2$ indicates a strong positive linear relationship between the two variables because the use of a regression model has reduced the variability in predicting annual sales by 90.42%. Only 9.58% of the sample variability in annual sales is due to factors other than what is accounted for by the linear regression model that uses square footage.

Figure 13.8 presents the coefficient of determination portion of the Microsoft Excel results for the Sunflowers Apparel data.

FIGURE 13.8
Partial Microsoft Excel regression results for the Sunflowers Apparel data
Regression Statistics
Multiple R            0.9509
R Square              0.9042
Adjusted R Square     0.8962
Standard Error        0.9664

See Section E13.1 to create the worksheet that contains this area.
EXAMPLE 13.4
COMPUTING THE COEFFICIENT OF DETERMINATION

Compute the coefficient of determination, $r^2$, for the Sunflowers Apparel data.
SOLUTION You can compute SST, SSR, and SSE, which are defined in Equations (13.6), (13.7), and (13.8) on pages 524 and 525, by using Equations (13.10), (13.11), and (13.12).

COMPUTATIONAL FORMULA FOR SST
$$SST = \sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - \frac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n} \qquad (13.10)$$
COMPUTATIONAL FORMULA FOR SSR

$$SSR = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 = b_0\sum_{i=1}^{n} Y_i + b_1\sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n} \qquad (13.11)$$
COMPUTATIONAL FORMULA FOR SSE

$$SSE = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} Y_i^2 - b_0\sum_{i=1}^{n} Y_i - b_1\sum_{i=1}^{n} X_i Y_i \qquad (13.12)$$
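Using the summary results from Table 13.2 on page 520,

$$SST = \sum_{i=1}^{n} Y_i^2 - \frac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n} = 594.9 - \frac{(81.8)^2}{14} = 594.9 - 477.94571 = 116.95429$$

$$SSR = b_0\sum_{i=1}^{n} Y_i + b_1\sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n} = (0.964478)(81.8) + (1.66986)(302.3) - \frac{(81.8)^2}{14} = 105.74726$$

$$SSE = \sum_{i=1}^{n} Y_i^2 - b_0\sum_{i=1}^{n} Y_i - b_1\sum_{i=1}^{n} X_i Y_i = 594.9 - (0.964478)(81.8) - (1.66986)(302.3) = 11.20702$$

(Figure 13.7 reports SSR = 105.7476 and SSE = 11.2067; the small differences arise because $b_0$ and $b_1$ are rounded here.)

Therefore,

$$r^2 = \frac{SSR}{SST} = \frac{105.74726}{116.95429} = 0.9042$$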
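As a cross-check on the hand computation, Equations (13.10) through (13.12) can be evaluated directly from the summary sums. This is a minimal Python sketch; the function name `sums_of_squares` is our own, and the inputs are the summary values quoted in this example.

```python
def sums_of_squares(n, sum_y, sum_y2, sum_xy, b0, b1):
    """Return SST, SSR, and SSE from summary sums, following Equations (13.10)-(13.12)."""
    correction = sum_y ** 2 / n                    # (sum of Y)^2 / n
    sst = sum_y2 - correction                      # Equation (13.10)
    ssr = b0 * sum_y + b1 * sum_xy - correction    # Equation (13.11)
    sse = sum_y2 - b0 * sum_y - b1 * sum_xy        # Equation (13.12)
    return sst, ssr, sse

# Summary values for the Sunflowers Apparel data, as used in Example 13.4.
sst, ssr, sse = sums_of_squares(n=14, sum_y=81.8, sum_y2=594.9,
                                sum_xy=302.3, b0=0.964478, b1=1.66986)
r2 = ssr / sst                                     # Equation (13.9)
print(f"SST = {sst:.5f}, SSR = {ssr:.5f}, SSE = {sse:.5f}, r^2 = {r2:.4f}")
```

Whatever values of $b_0$ and $b_1$ are supplied, SST = SSR + SSE holds exactly with these formulas, because the $b_0$ and $b_1$ terms cancel; only the split between SSR and SSE is affected by rounding.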
Standard Error of the Estimate
Although the least-squares method results in the line that fits the data with the minimum amount of error, unless all the observed data points fall on a straight line, the prediction line is not a perfect predictor. Just as all data values cannot be expected to be exactly equal to their mean, neither can they be expected to fall exactly on the prediction line. An important statistic, called the standard error of the estimate, measures the variability of the actual Y values from the predicted Y values in the same way that the standard deviation in Chapter 3 measures the variability of each value around the sample mean. In other words, the standard error of the estimate is the standard deviation around the prediction line, whereas the standard deviation in Chapter 3 is the standard deviation around the sample mean.

Figure 13.5 on page 517 illustrates the variability around the prediction line for the Sunflowers Apparel data. Observe that although many of the actual values of Y fall near the prediction line, none of the values are exactly on the line.

The standard error of the estimate, represented by the symbol $S_{YX}$, is defined in Equation (13.13).

STANDARD ERROR OF THE ESTIMATE
$$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n-2}} \qquad (13.13)$$
where
$Y_i$ = actual value of Y for a given $X_i$
$\hat{Y}_i$ = predicted value of Y for a given $X_i$
SSE = error sum of squares

From Equation (13.8) and Figure 13.4 on page 516, SSE = 11.2067. Thus,

$$S_{YX} = \sqrt{\frac{11.2067}{14 - 2}} = 0.9664$$
This standard error of the estimate, equal to 0.9664 millions of dollars (that is, $966,400), is labeled Standard Error in the Microsoft Excel results shown in Figure 13.8 on page 526. The standard error of the estimate represents a measure of the variation around the prediction line. It is measured in the same units as the dependent variable Y. The interpretation of the standard error of the estimate is similar to that of the standard deviation. Just as the standard deviation measures variability around the mean, the standard error of the estimate measures variability around the prediction line. For Sunflowers Apparel, the typical difference between actual annual sales at a store and the predicted annual sales using the regression equation is approximately $966,400.
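Equation (13.13) translates into a one-line function. The sketch below is a minimal illustration, assuming SSE and n are already known; the function name `standard_error_of_estimate` is our own, and the inputs are the Sunflowers Apparel values quoted above.

```python
import math

def standard_error_of_estimate(sse, n):
    """S_YX = sqrt(SSE / (n - 2)), Equation (13.13)."""
    if n <= 2:
        raise ValueError("need more than two observations (n - 2 degrees of freedom)")
    return math.sqrt(sse / (n - 2))

# Sunflowers Apparel: SSE = 11.2067 with n = 14 stores.
s_yx = standard_error_of_estimate(sse=11.2067, n=14)
print(f"S_YX = {s_yx:.4f} millions of dollars")
```

The divisor is n - 2 rather than n because two quantities, $b_0$ and $b_1$, are estimated from the data before the residuals are computed.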
Learning the Basics

13.11 How do you interpret a coefficient of determination, $r^2$, equal to 0.80?

13.12 If SSR = 36 and SSE = 4, determine SST and then compute the coefficient of determination, $r^2$, and interpret its meaning.

13.13 If SSR = 66 and SST = 88, compute the coefficient of determination, $r^2$, and interpret its meaning.

13.14 If SSE = 10 and SSR = 30, compute the coefficient of determination, $r^2$, and interpret its meaning.

13.15 If SSR = 120, why is it impossible for SST to equal 110?

Applying the Concepts

13.16 In Problem 13.4 on page 522, the marketing manager used shelf space for pet food to predict weekly sales (stored in the file …). For that data, SSR = 20,535 and SST = 30,025.
a. Determine the coefficient of determination, $r^2$, and interpret its meaning.
b. Determine the standard error of the estimate.
c. How useful do you think this regression model is for predicting sales?

13.17 In Problem 13.5 on page 522, you used reported magazine newsstand sales to predict audited sales (stored in the file …). For that data, SSR = 130,301.41 and SST = 144,538.64.
a. Determine the coefficient of determination, $r^2$, and interpret its meaning.
b. Determine the standard error of the estimate.
c. How useful do you think this regression model is for predicting audited sales?

13.18 In Problem 13.6 on page 522, an owner of a moving company wanted to predict labor hours, based on the cubic feet moved (stored in the file …). Using the results of that problem,
a. determine the coefficient of determination, $r^2$, and interpret its meaning.
b. determine the standard error of the estimate.
c. How useful do you think this regression model is for predicting labor hours?

13.19 In Problem 13.7 on page 523, you used the weight of mail to predict the number of orders received (stored in the file …). Using the results of that problem,
a. determine the coefficient of determination, $r^2$, and interpret its meaning.
b. find the standard error of the estimate.
c. How useful do you think this regression model is for predicting the number of orders?

13.20 In Problem 13.8 on page 523, you used annual revenues to predict the value of a baseball franchise (stored in the file …). Using the results of that problem,
a. determine the coefficient of determination, $r^2$, and interpret its meaning.
b. determine the standard error of the estimate.
c. How useful do you think this regression model is for predicting the value of a baseball franchise?

13.21 In Problem 13.9 on page 523, an agent for a real estate company wanted to predict the monthly rent for apartments, based on the size of the apartment (stored in the file …). Using the results of that problem,
a. determine the coefficient of determination, $r^2$, and interpret its meaning.
b. determine the standard error of the estimate.
c. How useful do you think this regression model is for predicting the monthly rent?

13.22 In Problem 13.10 on page 523, you used hardness to predict the tensile strength of die-cast aluminum (stored in the file …). Using the results of that problem,
a. determine the coefficient of determination, $r^2$, and interpret its meaning.
b. find the standard error of the estimate.
c. How useful do you think this regression model is for predicting the tensile strength of die-cast aluminum?
13.4 ASSUMPTIONS
The discussion of hypothesis testing and the analysis of variance emphasized the importance of the assumptions to the validity of any conclusions reached. The assumptions necessary for regression are similar to those of the analysis of variance because both topics fall in the general category of linear models (reference 4). The four assumptions of regression (known by the acronym LINE) are as follows:
• Linearity
• Independence of errors
• Normality of error
• Equal variance
The first assumption, linearity, states that the relationship between variables is linear. Relationships between variables that are not linear are discussed in Chapter 15.

The second assumption, independence of errors, requires that the errors ($\varepsilon_i$) are independent of one another. This assumption is particularly important when data are collected over a period of time. In such situations, the errors for a specific time period are sometimes correlated with those of the previous time period.
The third assumption, normality, requires that the errors ($\varepsilon_i$) are normally distributed at each value of X. Like the t test and the ANOVA F test, regression analysis is fairly robust against departures from the normality assumption. As long as the distribution of the errors at each level of X is not extremely different from a normal distribution, inferences about $\beta_0$ and $\beta_1$ are not seriously affected.

The fourth assumption, equal variance or homoscedasticity, requires that the variance of the errors ($\varepsilon_i$) be constant for all values of X. In other words, the variability of Y values is the same when X is a low value as when X is a high value. The equal-variance assumption is important when making inferences about $\beta_0$ and $\beta_1$. If there are serious departures from this assumption, you can use either data transformations or weighted least-squares methods (see reference 4).
13.5 RESIDUAL ANALYSIS
In Section 13.1, regression analysis was introduced. In Sections 13.2 and 13.3, a regression model was developed using the least-squares approach for the Sunflowers Apparel data. Is this the correct model for these data? Are the assumptions introduced in Section 13.4 valid? In this section, a graphical approach called residual analysis is used to evaluate the assumptions and determine whether the regression model selected is an appropriate model.

The residual, or estimated error value, $e_i$, is the difference between the observed ($Y_i$) and predicted ($\hat{Y}_i$) values of the dependent variable for a given value of $X_i$. Graphically, a residual appears on a scatter plot as the vertical distance between an observed value of Y and the prediction line. Equation (13.14) defines the residual.
RESIDUAL
The residual is equal to the difference between the observed value of Y and the predicted value of Y.

$$e_i = Y_i - \hat{Y}_i \qquad (13.14)$$
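The residuals of Equation (13.14) follow directly once the least-squares coefficients are known. The sketch below is a minimal illustration on a small hypothetical data set (not from the text); the helper names `fit_line` and `residuals` are our own.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(x, y, b0, b1):
    """e_i = Y_i - Y_hat_i, where Y_hat_i = b0 + b1 * X_i (Equation 13.14)."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(x, y)
e = residuals(x, y, b0, b1)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print("residuals:", [round(v, 2) for v in e])
```

When the model includes an intercept, least-squares residuals always sum to zero (up to rounding), which makes a quick sanity check on a hand or spreadsheet computation.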
Evaluating the Assumptions
Recall from Section 13.4 that the four assumptions of regression (known by the acronym LINE) are linearity, independence, normality, and equal variance.
Linearity To evaluate linearity, you plot the residuals on the vertical axis against the corresponding $X_i$ values of the independent variable on the horizontal axis. If the linear model is appropriate for the data, there is no apparent pattern in this plot. However, if the linear model is not appropriate, there is a relationship between the $X_i$ values and the residuals, $e_i$. You can see such a pattern in Figure 13.9. Panel A shows a situation in which, although there is an increasing trend in Y as X increases, the relationship seems curvilinear because the upward trend decreases for increasing values of X. This quadratic effect is highlighted in Panel B, where there is a clear relationship between $X_i$ and $e_i$. By plotting the residuals, the linear trend of X with Y has been removed, thereby exposing the lack of fit in the simple linear model. Thus, a quadratic model is a better fit and should be used in place of the simple linear model. (See Section 15.1 for further discussion of fitting quadratic models.)

To determine whether the simple linear regression model is appropriate, return to the evaluation of the Sunflowers Apparel data. Figure 13.10 provides the predicted and residual values of the response variable (annual sales) computed by Microsoft Excel.
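The curvilinear pattern of Figure 13.9 can be reproduced numerically. The sketch below fits a straight line to hypothetical data whose upward trend slows down as X increases, then prints the sign of each residual in X order; the data and the helper `fit_line` are our own illustration, not taken from the text.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx
    return y_bar - b1 * x_bar, b1

# Hypothetical concave data: Y increases with X, but more slowly for larger X.
x = list(range(1, 11))
y = [10 * xi - 0.5 * xi ** 2 for xi in x]

b0, b1 = fit_line(x, y)
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
signs = "".join("+" if v > 0 else "-" for v in e)
print("residual signs in X order:", signs)
```

Negative residuals at both ends and positive residuals in the middle are the numerical counterpart of the curved band in Panel B: the straight line overshoots at the extremes and undershoots in between, which points toward a quadratic model.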
FIGURE 13.9
Studying the appropriateness of the simple linear regression model (Panel A: Y plotted against X; Panel B: residuals plotted against X)
FIGURE 13.10
Microsoft Excel residual statistics for the Sunflowers Apparel data
See Section E13.3 to create the worksheet that contains this area.

Observation   Predicted Annual Sales    Residuals
 1                 3.803239598         -0.103239598
 2                 3.636253367          0.263746633
 3                 5.640088147          1.059911853
 4                10.31570263          -0.815702635
 5                 3.135294672          0.264705328
 6                 4.638170757          0.961829243
 7                 3.135294672          0.564705328
 8                 2.801322208         -0.101322208
 9                 6.308033074         -0.808033074
10                 3.469267135         -0.569267135
11                 9.647757708          1.052242292
12                 8.645840318         -1.045840318
13                10.64967508           1.15032492
14                 5.974060611         -1.874060611
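The residual column of Figure 13.10 ties back to the earlier results: the residuals should sum to approximately zero, and their sum of squares should equal SSE. This is a minimal Python check, assuming the residual values exactly as listed in the figure.

```python
import math

# Residuals from Figure 13.10 (Sunflowers Apparel), in observation order.
residuals = [-0.103239598, 0.263746633, 1.059911853, -0.815702635,
             0.264705328, 0.961829243, 0.564705328, -0.101322208,
             -0.808033074, -0.569267135, 1.052242292, -1.045840318,
             1.15032492, -1.874060611]

n = len(residuals)
total = sum(residuals)                     # close to 0 for least-squares residuals
sse = sum(e ** 2 for e in residuals)       # error sum of squares, Equation (13.8)
s_yx = math.sqrt(sse / (n - 2))            # standard error of the estimate, Equation (13.13)
print(f"sum = {total:.6f}, SSE = {sse:.4f}, S_YX = {s_yx:.4f}")
```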
To assess linearity, the residuals are plotted against the independent variable (store size, in thousands of square feet) in Figure 13.11. Although there is widespread scatter in the residual plot, there is no apparent pattern or relationship between the residuals and $X_i$. The residuals appear to be evenly spread above and below 0 for the differing values of X. You can conclude that the linear model is appropriate for the Sunflowers Apparel data.

FIGURE 13.11
Microsoft Excel plot of residuals against the square footage of a store for the Sunflowers Apparel data
Square Feet Residual Plot (residuals versus Square Feet)
See Section E2.12 to create this.
Independence You can evaluate the assumption of independence of the errors by plotting the residuals in the order or sequence in which the data were collected. Data collected over periods of time sometimes exhibit an autocorrelation effect among successive observations. In these instances, there is a relationship between consecutive residuals. If this relationship exists (which violates the assumption of independence), it is apparent in the plot of the residuals versus the time in which the data were collected. You can also test for autocorrelation by using the Durbin-Watson statistic, which is the subject of Section 13.6. Because the Sunflowers Apparel data were collected during the same time period, you do not need to evaluate the independence assumption.

Normality You can evaluate the assumption of normality in the errors by tallying the residuals into a frequency distribution and displaying the results in a histogram (see Section …). For the Sunflowers Apparel data, the residuals have been tallied into a frequency distribution in Table 13.3. (There are too few values, however, to construct a histogram.) You can also evaluate the normality assumption by comparing the actual versus theoretical values of the residuals or by constructing a normal probability plot of the residuals (see Section 6.3). Figure 13.12 is a normal probability plot of the residuals for the Sunflowers Apparel data.

TABLE 13.3
Frequency Distribution of 14 Residual Values for the Sunflowers Apparel Data

Residuals                         Frequency
-2.25 but less than -1.75             1
-1.75 but less than -1.25             0
-1.25 but less than -0.75             3
-0.75 but less than -0.25             1
-0.25 but less than +0.25             2
+0.25 but less than +0.75             3
+0.75 but less than +1.25             4
Total                                14
FIGURE 13.12
Microsoft Excel normal probability plot of the residuals for the Sunflowers Apparel data
Normal Probability Plot of the Residuals (residuals versus Z value)
See Section E6.2 to create this.
It is difficult to evaluate the normality assumption for a sample of only 14 values, regardless of whether you use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot. You can see from Figure 13.12 that the data do not appear to depart substantially from a normal distribution. The robustness of regression analysis with modest departures from normality enables you to conclude that you should not be overly concerned about departures from this normality assumption in the Sunflowers Apparel data.
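Both normality checks can be computed from the Figure 13.10 residuals with the Python standard library. The sketch below tallies the residuals into the classes of Table 13.3 and produces the coordinates of a normal probability plot. The plotting positions $i/(n+1)$ are an assumption on our part; other conventions exist and give slightly different Z values.

```python
import math
from statistics import NormalDist

# Residuals from Figure 13.10 (Sunflowers Apparel).
residuals = [-0.103239598, 0.263746633, 1.059911853, -0.815702635,
             0.264705328, 0.961829243, 0.564705328, -0.101322208,
             -0.808033074, -0.569267135, 1.052242292, -1.045840318,
             1.15032492, -1.874060611]

# Table 13.3 classes: "-2.25 but less than -1.75", ..., "+0.75 but less than +1.25".
lower, width, n_classes = -2.25, 0.5, 7
counts = [0] * n_classes
for e in residuals:
    k = math.floor((e - lower) / width)   # index of the class containing e
    counts[k] += 1
print("class frequencies:", counts)

# Normal probability plot coordinates: ordered residuals against standard normal quantiles.
n = len(residuals)
ordered = sorted(residuals)
z = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
for zi, ei in zip(z, ordered):
    print(f"Z = {zi:6.3f}   residual = {ei:7.3f}")
```

If the points (Z, residual) lie roughly along a straight line, the residuals are consistent with a normal distribution; with only 14 values, modest wiggles are expected.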
Equal Variance You can evaluate the assumption of equal variance from a plot of the residuals with $X_i$. For the Sunflowers Apparel data of Figure 13.11 on page 531, there do not appear to be major differences in the variability of the residuals for different $X_i$ values. Thus, you can conclude that there is no apparent violation in the assumption of equal variance at each level of X.

To examine a case in which the equal-variance assumption is violated, observe Figure 13.13, which is a plot of the residuals with $X_i$ for a hypothetical set of data. In this plot, the variability of the residuals increases dramatically as X increases, demonstrating the lack of homogeneity in the variances of $Y_i$ at each level of X. For these data, the equal-variance assumption is invalid.
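A residual plot remains the primary tool, but a rough numerical companion can make a fan shape like the one in Figure 13.13 easier to spot. The sketch below compares the standard deviation of the residuals for the lower and upper halves of X. The residual values are hypothetical, and this spread ratio is an informal heuristic of our own, not a formal test of equal variance.

```python
from statistics import stdev

# Hypothetical residuals whose spread grows with X (compare Figure 13.13).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
e = [0.1, -0.2, 0.15, -0.1, 0.3, -0.8, 1.2, -1.5, 2.0, -2.4]

pairs = sorted(zip(x, e))            # order residuals by X
half = len(pairs) // 2
low = [ei for _, ei in pairs[:half]]    # residuals for the smaller X values
high = [ei for _, ei in pairs[half:]]   # residuals for the larger X values

ratio = stdev(high) / stdev(low)
print(f"spread ratio (high X / low X) = {ratio:.1f}")
```

A ratio near 1 is consistent with equal variance; a ratio far above 1, as here, echoes the widening band of residuals seen in the plot.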
FIGURE 13.13
Violation of equal variance
Learning the Basics

13.23 The results below provide the X values, residuals, and a residual plot from a regression analysis:
(residual plot)
Is there any evidence of a pattern in the residuals? Explain.

13.24 The results below show the X values, residuals, and a residual plot from a regression analysis:
(residual plot)
Is there any evidence of a pattern in the residuals? Explain.
Applying the Concepts
13.25 In Problem 13.5 on page 522, you used reported magazine newsstand sales to predict audited sales (data stored in the file …). Perform a residual analysis for these data.
a. Determine the adequacy of the fit of the model.
b. Evaluate whether the assumptions of regression have been seriously violated.

13.26 In Problem 13.4 on page 522, the marketing manager used shelf space for pet food to predict weekly sales (data stored in the file …). Perform a residual analysis for these data.
a. Determine the adequacy of the fit of the model.
b. Evaluate whether the assumptions of regression have been seriously violated.

13.27 In Problem 13.7 on page 523, you used the weight of mail to predict the number of orders received (data stored in the file …). Perform a residual analysis for these data. Based on these results,
a. determine the adequacy of the fit of the model.
b. evaluate whether the assumptions of regression have been seriously violated.

13.28 In Problem 13.6 on page 522, the owner of a moving company wanted to predict labor hours based on the cubic feet moved (data stored in the file …). Perform a residual analysis for these data. Based on these results,
a. determine the adequacy of the fit of the model.
b. evaluate whether the assumptions of regression have been seriously violated.

13.29 In Problem 13.9 on page 523, an agent for a real estate company wanted to predict the monthly rent for apartments, based on the size of the apartments (data stored in the file …). Perform a residual analysis for these data. Based on these results,
a. determine the adequacy of the fit of the model.
b. evaluate whether the assumptions of regression have been seriously violated.

13.30 In Problem 13.8 on page 523, you used annual revenues to predict the value of a baseball franchise (data stored in the file …). Perform a residual analysis for these data. Based on these results,
a. determine the adequacy of the fit of the model.
b. evaluate whether the assumptions of regression have been seriously violated.

13.31 In Problem 13.10 on page 523, you used hardness to predict the tensile strength of die-cast aluminum (data stored in the file …). Perform a residual analysis for these data. Based on these results,
a. determine the adequacy of the fit of the model.
b. evaluate whether the assumptions of regression have been seriously violated.
13.6 MEASURING AUTOCORRELATION: THE DURBIN-WATSON STATISTIC
One of the basic assumptions of the regression model is the independence of the errors. This assumption is sometimes violated when data are collected over sequential time periods because a residual at any one time period may tend to be similar to residuals at adjacent time periods. This pattern in the residuals is called autocorrelation. When a set of data has substantial autocorrelation, the validity of a regression model can be in serious doubt.
Residual Plots to Detect Autocorrelation
As mentioned in Section 13.5, one way to detect autocorrelation is to plot the residuals in time order. If a positive autocorrelation effect is present, there will be clusters of residuals with the same sign, and you will readily detect an apparent pattern. If negative autocorrelation exists, residuals will tend to jump back and forth from positive to negative to positive, and so on. This type of pattern is very rarely seen in regression analysis. Thus, the focus of this section is on positive autocorrelation. To illustrate positive autocorrelation, consider the following example. The manager of a package delivery store wants to predict weekly sales, based on the number of customers making purchases for a period of 15 weeks. In this situation, because data are collected over a period of 15 consecutive weeks at the same store, you need to determine whether autocorrelation is present. Table 13.4 presents these data (stored in the file …). Figure 13.14 illustrates Microsoft Excel results for these data.
TABLE 13.4
Customers and Sales for a Period of 15 Consecutive Weeks

Week   Customers   Sales (Thousands of Dollars)
 1        794            9.33
 2        799            8.26
 3        837            7.48
 4        855            9.08
 5        845            9.83
 6        844           10.09
 7        863           11.01
 8        875           11.49
 9        880           12.07
10        905           12.55
11        886           11.92
12        843           10.27
13        904           11.80
14        950           12.15
15        841            9.64
FIGURE 13.14
Microsoft Excel results for the package delivery store data of Table 13.4
See Section E13.1 to create this.
From Figure 13.14, observe that $r^2$ is 0.6574, indicating that 65.74% of the variation in sales is explained by variation in the number of customers. In addition, the Y intercept, $b_0$, is -16.0322, and the slope, $b_1$, is 0.0308. However, before using this model for prediction, you must undertake proper analyses of the residuals. Because the data have been collected over a consecutive period of 15 weeks, in addition to checking the linearity, normality, and equal-variance assumptions, you must investigate the independence-of-errors assumption. You can plot the residuals versus time to help you see whether a pattern exists. In Figure 13.15, you can see that the residuals tend to fluctuate up and down in a cyclical pattern. This cyclical pattern provides strong cause for concern about the autocorrelation of the residuals and, hence, a violation of the independence-of-errors assumption.
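The regression results and the residual pattern for the package delivery store can be reproduced from Table 13.4. This minimal Python sketch refits the least-squares line and prints the sign of each residual in week order; the helper `fit_line` is our own, and the output is meant only as a cross-check of Figures 13.14 and 13.15.

```python
# Table 13.4: number of customers and weekly sales ($000) for 15 consecutive weeks.
customers = [794, 799, 837, 855, 845, 844, 863, 875, 880, 905, 886, 843, 904, 950, 841]
sales = [9.33, 8.26, 7.48, 9.08, 9.83, 10.09, 11.01, 11.49,
         12.07, 12.55, 11.92, 10.27, 11.80, 12.15, 9.64]

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx
    return y_bar - b1 * x_bar, b1

b0, b1 = fit_line(customers, sales)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(customers, sales)]

y_bar = sum(sales) / len(sales)
sst = sum((yi - y_bar) ** 2 for yi in sales)
r2 = 1 - sum(e ** 2 for e in resid) / sst          # r^2 = 1 - SSE/SST = SSR/SST

signs = "".join("+" if e > 0 else "-" for e in resid)  # residual signs in week order
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, r^2 = {r2:.4f}")
print("residual signs by week:", signs)
```

Long runs of the same sign (several negative weeks followed by eight positive weeks) are the clusters that signal positive autocorrelation.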
FIGURE 13.15
Microsoft Excel residual plot for the package delivery store data of Table 13.4
Package Delivery Store Sales Analysis Residual Plot
See Section E13.3 to create this.
The Durbin-Watson Statistic
The Durbin-Watson statistic is used to measure autocorrelation. This statistic measures the correlation between each residual and the residual for the time period immediately preceding the one of interest. Equation (13.15) defines the Durbin-Watson statistic.
DURBIN-WATSON STATISTIC

$$D = \frac{\sum_{i=2}^{n}(e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} \qquad (13.15)$$

where $e_i$ = residual at the time period i
To better understand the Durbin-Watson statistic, D, you can examine Equation (13.15). The numerator, $\sum_{i=2}^{n}(e_i - e_{i-1})^2$, represents the squared difference between two successive residuals, summed from the second value to the nth value. The denominator, $\sum_{i=1}^{n} e_i^2$, represents the sum of the squared residuals. When successive residuals are positively autocorrelated, the value of D approaches 0. If the residuals are not correlated, the value of D will be close to 2. (If there is negative autocorrelation, D will be greater than 2 and could even approach its maximum value of 4.) For the package delivery store data, as shown in the Microsoft Excel results of Figure 13.16, the Durbin-Watson statistic, D, is 0.8830.
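Equation (13.15) is short enough to compute directly. This minimal Python sketch defines a `durbin_watson` helper (our own name), refits the package delivery regression from Table 13.4, and evaluates D on its residuals; the two small calls at the end illustrate the extreme cases described above.

```python
def durbin_watson(residuals):
    """D = sum of squared successive differences / sum of squared residuals (Equation 13.15)."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Table 13.4: customers (X) and weekly sales in $000 (Y), in week order.
x = [794, 799, 837, 855, 845, 844, 863, 875, 880, 905, 886, 843, 904, 950, 841]
y = [9.33, 8.26, 7.48, 9.08, 9.83, 10.09, 11.01, 11.49,
     12.07, 12.55, 11.92, 10.27, 11.80, 12.15, 9.64]

# Least-squares fit, then residuals in time order.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

d = durbin_watson(resid)
print(f"D = {d:.4f}")

# Extreme cases: identical successive residuals give D = 0;
# residuals that alternate in sign push D above 2.
print(durbin_watson([1.0, 1.0, 1.0, 1.0]), durbin_watson([1.0, -1.0, 1.0, -1.0]))
```

The residuals must be in time order; sorting them (for example, by X) would destroy exactly the sequence information the statistic is designed to measure.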
FIGURE 13.16
Microsoft Excel results of the Durbin-Watson statistic for the package delivery store data
See Section E13.4 to create this.
You need to determine when the autocorrelation is large enough to make the Durbin-Watson statistic, D, fall sufficiently below 2 to conclude that there is significant positive autocorrelation. After computing D, you compare it to the critical values of the Durbin-Watson statistic found in Table E.10, a portion of which is presented in Table 13.5. The critical values depend on $\alpha$, the significance level chosen; n, the sample size; and k, the number of independent variables in the model (in simple linear regression, k = 1).
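TABLE 13.5
Finding Critical Values of the Durbin-Watson Statistic ($\alpha$ = .05)

        k = 1         k = 2         k = 3         k = 4         k = 5
 n    dL    dU      dL    dU      dL    dU      dL    dU      dL    dU
15   1.08  1.36     .95  1.54     .82  1.75     .69  1.97     .56  2.21
16   1.10  1.37     .98  1.54     .86  1.73     .74  1.93     .62  2.15
17   1.13  1.38    1.02  1.54     .90  1.71     .78  1.90     .67  2.10
18   1.16  1.39    1.05  1.53     .93  1.69     .82  1.87     .71  2.06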
In Table 13.5, two values are shown for each combination of $\alpha$ (level of significance), n (sample size), and k (number of independent variables in the model). The first value, $d_L$, represents the lower critical value. If D is below $d_L$, you conclude that there is evidence of positive autocorrelation among the residuals. If this occurs, the least-squares method used in this chapter is inappropriate, and you should use alternative methods (see reference 4). The second value, $d_U$, represents the upper critical value of D, above which you would conclude that there is no evidence of positive autocorrelation among the residuals. If D is between $d_L$ and $d_U$, you are unable to arrive at a definite conclusion.

For the package delivery store data, with one independent variable (k = 1) and 15 values (n = 15), $d_L$ = 1.08 and $d_U$ = 1.36. Because D = 0.8830 < 1.08, you conclude that there is positive autocorrelation among the residuals. The least-squares regression analysis of the data is inappropriate because of the presence of significant positive autocorrelation among the residuals. In other words, the independence-of-errors assumption is invalid. You need to use alternative approaches, discussed in reference 4.
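The three-way decision rule above can be written as a small function. This is a minimal sketch; the function name `dw_decision` is hypothetical, the critical values must still be looked up in Table E.10 (here, the n = 15, k = 1 row of Table 13.5), and treating values exactly equal to $d_L$ or $d_U$ as inconclusive is our own choice, since the text does not address ties.

```python
def dw_decision(d, d_lower, d_upper):
    """Apply the Durbin-Watson decision rule for positive autocorrelation."""
    if d < d_lower:
        return "evidence of positive autocorrelation"
    if d > d_upper:
        return "no evidence of positive autocorrelation"
    return "inconclusive"

# Package delivery store: D = 0.8830, with d_L = 1.08 and d_U = 1.36 (alpha = .05, n = 15, k = 1).
print(dw_decision(0.8830, d_lower=1.08, d_upper=1.36))
```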
Learning the Basics

13.32 The residuals for 10 consecutive time periods are as follows:
Time Period:  1  2  3  4  5  6  7  8  9  10
Residual:     …
a. Plot the residuals over time. What conclusion can you reach about the pattern of the residuals over time?
b. Based on (a), what conclusion can you reach about the autocorrelation of the residuals?

13.33 The residuals for 15 consecutive time periods are as follows:
Time Period:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Residual:     +4  -6  -1  -5  +2  +5  -2  +7  +6  -3  +1  +3   0  -4  -7
a. Plot the residuals over time. What conclusion can you reach about the pattern of the residuals over time?
b. Compute the Durbin-Watson statistic. At the 0.05 level of significance, is there evidence of positive autocorrelation among the residuals?
c. Based on (a) and (b), what conclusion can you reach about the autocorrelation of the residuals?

Applying the Concepts

13.34 In Problem 13.4 on page 522 concerning pet food sales, the marketing manager used shelf space for pet food to predict weekly sales.
a. Is it necessary to compute the Durbin-Watson statistic in this case? Explain.
b. Under what circumstances is it necessary to compute the Durbin-Watson statistic before proceeding with the least-squares method of regression analysis?

13.35 The owner of a single-family home in a suburban county in the northeastern United States would like to develop a model to predict electricity consumption in his all-electric house (lights, fans, heat, appliances, and so on), based on average atmospheric temperature (in degrees Fahrenheit). Monthly kilowatt usage and temperature data are available for a period of 24 consecutive months (stored in the file …).
a. Assuming a linear relationship, use the least-squares method to find the regression coefficients, $b_0$ and $b_1$.
b. Predict the mean kilowatt usage when the average atmospheric temperature is 50°F.
c. Plot the residuals versus the time period.
d. Compute the Durbin-Watson statistic. At the 0.05 level of significance, is there evidence of positive autocorrelation among the residuals?
e. Based on the results of (c) and (d), is there reason to question the validity of the model?

13.36 A mail-order catalog business that sells personal computer supplies, software, and hardware maintains a centralized warehouse for the distribution of products ordered. Management is currently examining the process of distribution from the warehouse and is interested in studying the factors that affect warehouse distribution costs. Currently, a small handling fee is added to the order, regardless of the amount of the order. Data have been collected over the past 24 months, indicating the warehouse distribution costs and the number of orders received (stored in the file …). The results are as follows:

Month   Distribution Cost (Thousands of Dollars)   Number of Orders
 1              52.95                                   4,015
 2              71.66                                   3,806
 3              85.58                                   5,309
 4              63.69                                   4,262
 5              72.81                                   4,296
 6              68.44                                   4,097
 7              52.46                                   3,213
 8              70.77                                   4,809
 9              82.03                                   5,237
10              74.39                                   4,732
11              70.84                                   4,413
12              54.08                                   2,921
13              62.98                                   3,977
14              72.30                                   4,428
15              58.99                                   3,964
16              79.38                                   4,592
17              94.44                                   5,582
18              59.74                                   3,450
19              90.50                                   5,079
20              93.24                                   5,735
21              69.33                                   4,269
22              53.71                                   3,708
23              89.81                                   5,387
24              66.80                                   4,161

a. Assuming a linear relationship, use the least-squares method to find the regression coefficients, $b_0$ and $b_1$.
b. Predict the monthly warehouse distribution costs when the number of orders is 4,500.
c. Plot the residuals versus the time period.
d. Compute the Durbin-Watson statistic. At the 0.05 level of significance, is there evidence of positive autocorrelation among the residuals?
e. Based on the results of (c) and (d), is there reason to question the validity of the model?

13.37 A freshly brewed shot of espresso has three distinct components: the heart, body, and crema. The separation of these three components typically lasts only 10 to 20 seconds. To use the espresso shot in making a latte, a cappuccino, or other drinks, the shot must be poured into the beverage during the separation of the heart, body, and crema. If the shot is used after the separation occurs, the drink becomes excessively bitter and acidic, ruining the final drink. Thus, a longer separation time allows the drink-maker more time to pour the shot and ensure that the beverage will meet expectations. An employee at a coffee shop hypothesized that the harder the espresso grounds were tamped down into the portafilter before brewing, the longer the separation time would be. An experiment using 24 observations was conducted to test this relationship. The independent variable Tamp measures the distance, in inches, between the espresso grounds and the top of the portafilter (that is, the harder the tamp, the larger the distance). The dependent variable Time is the number of seconds the heart, body, and crema are separated (that is, the amount of time after the shot is poured before it must be used for the customer's beverage). The data (stored in the file …) are as follows:

Shot:   1     2     3     4     5     6     7     8     9     10    11    12
Tamp:  0.20  0.50  0.50  0.20  0.20  0.50  0.20  0.35  0.50  0.35  0.50  0.50
Time:   14    14    18    16    16    13    12    15     9    15    11    16

Shot:   13    14    15    16    17    18    19    20    21    22    23    24
Tamp:  0.50  0.50  0.35  0.35  0.20  0.20  0.20  0.20  0.35  0.35  0.35  0.35
Time:  13, 19, 19, 17, 18, 15, 16, 18, 16, 14, 16 (incomplete: one value is missing)

a. Determine the prediction line, using Time as the dependent variable and Tamp as the independent variable.
b. Predict the mean separation time for a Tamp distance of 0.50 inch.
c. Plot the residuals versus the time order of experimentation. Are there any noticeable patterns?
d. Compute the Durbin-Watson statistic. At the 0.05 level of significance, is there evidence of positive autocorrelation among the residuals?
e. Based on the results of (c) and (d), is there reason to question the validity of the model?

13.38 The owner of a chain of ice cream stores would like to study the effect of atmospheric temperature on sales during the summer season. A sample of 21 consecutive days is selected (results stored in the file …). (Hint: Determine which are the independent and dependent variables.)
a. Assuming a linear relationship, use the least-squares method to find the regression coefficients, $b_0$ and $b_1$.
b. Predict the sales per store for a day in which the temperature is 83°F.
c. Plot the residuals versus the time period.
d. Compute the Durbin-Watson statistic. At the 0.05 level of significance, is there evidence of positive autocorrelation among the residuals?
e. Based on the results of (c) and (d), is there reason to question the validity of the model?
13.7 INFERENCES ABOUT THE SLOPE AND CORRELATION COEFFICIENT
In Sections 13.1 through 13.3, regression was used solely for descriptive purposes. You learned how the least-squares method determines the regression coefficients and how to predict Y for a given value of X. In addition, you learned how to compute and interpret the standard error of the estimate and the coefficient of determination. When residual analysis, as discussed in Section 13.5, indicates that the assumptions of a least-squares regression model are not seriously violated and that the straight-line model is appropriate, you can make inferences about the linear relationship between the variables in the population.
t Test for the Slope
To determine the existence of a significant linear relationship between the X and Y variables, you test whether \(\beta_1\) (the population slope) is equal to 0. The null and alternative hypotheses are as follows:

\(H_0: \beta_1 = 0\) (There is no linear relationship.)
\(H_1: \beta_1 \neq 0\) (There is a linear relationship.)

If you reject the null hypothesis, you conclude that there is evidence of a linear relationship. Equation (13.16) defines the test statistic.

TESTING A HYPOTHESIS FOR A POPULATION SLOPE, \(\beta_1\), USING THE t TEST
The t statistic equals the difference between the sample slope and hypothesized value of the population slope divided by the standard error of the slope.

\[ t = \frac{b_1 - \beta_1}{S_{b_1}} \qquad (13.16) \]

where

\[ S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}}, \qquad SSX = \sum_{i=1}^{n} (X_i - \bar{X})^2 \]

The test statistic t follows a t distribution with \(n - 2\) degrees of freedom.
Return to the Using Statistics scenario concerning Sunflowers Apparel. To test whether there is a significant linear relationship between the size of the store and the annual sales at the 0.05 level of significance, refer to the Microsoft Excel worksheet for the t test presented in Figure 13.17.
FIGURE 13.17 Microsoft Excel t test for the slope for the Sunflowers Apparel data

               Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept      0.9645         0.5262           1.8329    0.0917    -0.1820     2.1110
Square Feet    1.6699         0.1569           10.6411   0.0000    1.3280      2.0118
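(See Section E13.1 to create the worksheet that contains this area.)

From Figure 13.17,

\[ b_1 = +1.6699, \qquad n = 14, \qquad S_{b_1} = 0.1569 \]

and

\[ t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{1.6699 - 0}{0.1569} = 10.6411 \]

(The value 10.6411 is Excel's result from unrounded inputs; dividing the rounded values shown gives about 10.643.)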
Microsoft Excel labels this t statistic t Stat (see Figure 13.17). Using the 0.05 level of significance, the critical value of t with \(n - 2 = 12\) degrees of freedom is 2.1788. Because \(t = 10.6411 > 2.1788\), you reject \(H_0\) (see Figure 13.18). Using the p-value, you reject \(H_0\) because the p-value is approximately 0, which is less than \(\alpha = 0.05\). Hence, you can conclude that there is a significant linear relationship between mean annual sales and the size of the store.
FIGURE 13.18 Testing a hypothesis about the population slope at the 0.05 level of significance, with 12 degrees of freedom (two-tailed test: regions of rejection below the critical value -2.1788 and above the critical value +2.1788; region of nonrejection between them)
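A minimal Python sketch of the arithmetic in Equation (13.16), assuming only the rounded values read from Figure 13.17. The critical value 2.1788 is hard-coded from Table E.3 rather than computed, and the helper name `slope_t_stat` is illustrative, not part of any library. Because the inputs are rounded to four decimals, the result (about 10.643) differs slightly from Excel's 10.6411.

```python
# Values read from Figure 13.17 (Sunflowers Apparel), rounded to four decimals
b1 = 1.6699       # sample slope
s_b1 = 0.1569     # standard error of the slope
n = 14            # number of stores in the sample
beta1_h0 = 0.0    # population slope under H0
t_crit = 2.1788   # upper-tail critical t with n - 2 = 12 df, alpha = 0.05 (Table E.3)


def slope_t_stat(b1: float, beta1: float, s_b1: float) -> float:
    """Equation (13.16): t = (b1 - beta1) / S_b1."""
    return (b1 - beta1) / s_b1


t_stat = slope_t_stat(b1, beta1_h0, s_b1)
# Two-tailed decision rule: reject H0 if t < -t_crit or t > +t_crit
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.4f} with {n - 2} df; reject H0: {reject_h0}")
```

The same function works for any hypothesized slope, not only 0, which is why \(\beta_1\) is passed in explicitly.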
F Test for the Slope
As an alternative to the t test, you can use an F test to determine whether the slope in simple linear regression is statistically significant. In Section 10.4, you used the F distribution to test the ratio of two variances. Equation (13.17) defines the F test for the slope as the ratio of the variance that is due to the regression (MSR) divided by the error variance (\(MSE = S_{YX}^2\)).

TESTING A HYPOTHESIS FOR A POPULATION SLOPE, \(\beta_1\), USING THE F TEST
The F statistic is equal to the regression mean square (MSR) divided by the error mean square (MSE).

\[ F = \frac{MSR}{MSE} \qquad (13.17) \]
where

\[ MSR = \frac{SSR}{k}, \qquad MSE = \frac{SSE}{n - k - 1} \]

k = number of independent variables in the regression model

The test statistic F follows an F distribution with k and \(n - k - 1\) degrees of freedom. Using a level of significance \(\alpha\), the decision rule is

Reject \(H_0\) if \(F > F_U\); otherwise, do not reject \(H_0\).

Table 13.6 organizes the complete set of results into an ANOVA table.
TABLE 13.6 ANOVA Table for Testing the Significance of a Regression Coefficient

Source       df           Sum of Squares   Mean Square (Variance)    F
Regression   k            SSR              MSR = SSR / k             F = MSR / MSE
Error        n - k - 1    SSE              MSE = SSE / (n - k - 1)
Total        n - 1        SST
The completed ANOVA table is also part of the Microsoft Excel results shown in Figure 13.19. Figure 13.19 shows that the computed F statistic is 113.2335 and the p-value is approximately 0.
FIGURE 13.19 Microsoft Excel F test for the Sunflowers Apparel data
(See Section E13.1 to create the worksheet that contains this area.)

ANOVA
              df    SS          MS          F           Significance F
Regression     1    105.7476    105.7476    113.2335    0.0000
Residual      12     11.2067      0.9339
Total         13    116.9543
Using a level of significance of 0.05, from Table E.5, the critical value of the F distribution, with 1 and 12 degrees of freedom, is 4.75 (see Figure 13.20). Because \(F = 113.2335 > 4.75\), or because the p-value \(= 0.0000 < 0.05\), you reject \(H_0\) and conclude that the size of the store is significantly related to annual sales. Because the F test in Equation (13.17) on page 540 is equivalent to the t test on page 539, you reach the same conclusion.
FIGURE 13.20 Regions of rejection and nonrejection when testing for the significance of the slope at the 0.05 level of significance, with 1 and 12 degrees of freedom (critical value 4.75: region of nonrejection below it, region of rejection above it)
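A minimal Python sketch of the ANOVA arithmetic in Table 13.6, using the SSR and SSE values from Figure 13.19. The helper name `anova_f` is hypothetical, and the critical value 4.75 is copied from Table E.5 rather than computed. The last lines check numerically that, with a single independent variable, the square root of F reproduces the t statistic for the slope.

```python
def anova_f(ssr: float, sse: float, n: int, k: int = 1) -> tuple[float, float, float]:
    """Return (MSR, MSE, F) following Table 13.6 and Equation (13.17)."""
    msr = ssr / k              # regression mean square
    mse = sse / (n - k - 1)    # error mean square
    return msr, mse, msr / mse


# SSR and SSE from Figure 13.19; n = 14 stores, k = 1 independent variable
msr, mse, f_stat = anova_f(ssr=105.7476, sse=11.2067, n=14)
f_crit = 4.75  # upper-tail F with 1 and 12 df, alpha = 0.05 (Table E.5)

print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, F = {f_stat:.4f}")
print(f"Reject H0: {f_stat > f_crit}")

# In simple linear regression, F equals the square of the slope's t statistic
t_from_f = f_stat ** 0.5
print(f"sqrt(F) = {t_from_f:.4f}")
```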
Confidence Interval Estimate of the Slope (\(\beta_1\))
As an alternative to testing for the existence of a linear relationship between the variables, you can construct a confidence interval estimate of \(\beta_1\) and determine whether the hypothesized value (\(\beta_1 = 0\)) is included in the interval. Equation (13.18) defines the confidence interval estimate of \(\beta_1\).

CONFIDENCE INTERVAL ESTIMATE OF THE SLOPE, \(\beta_1\)
The confidence interval estimate for the slope can be constructed by taking the sample slope, \(b_1\), and adding and subtracting the critical t value multiplied by the standard error of the slope.

\[ b_1 \pm t_{n-2} S_{b_1} \qquad (13.18) \]
From the Microsoft Excel results of Figure 13.17 on page 540,

\[ b_1 = 1.6699, \qquad n = 14, \qquad S_{b_1} = 0.1569 \]

To construct a 95% confidence interval estimate, \(\alpha/2 = 0.025\), and from Table E.3, \(t_{12} = 2.1788\). Thus,

\[ b_1 \pm t_{n-2} S_{b_1} = 1.6699 \pm (2.1788)(0.1569) = 1.6699 \pm 0.3419 \]
\[ 1.3280 \le \beta_1 \le 2.0118 \]

Therefore, you estimate with 95% confidence that the population slope is between 1.3280 and 2.0118. Because these values are above 0, you conclude that there is a significant linear relationship between annual sales and the size of the store. Had the interval included 0, you would have concluded that no significant relationship exists between the variables. The confidence interval indicates that for each increase of 1,000 square feet, mean annual sales are estimated to increase by at least $1,328,000 but no more than $2,011,800.
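The interval in Equation (13.18) can be sketched in a few lines of Python. This assumes the rounded Figure 13.17 values and the tabled \(t_{12} = 2.1788\); the function name is illustrative. The last line converts the slope's units (millions of dollars of sales per 1,000 square feet) to dollars, which lands within rounding of the figures quoted in the text.

```python
def slope_confidence_interval(b1: float, s_b1: float, t_crit: float) -> tuple[float, float]:
    """Equation (13.18): b1 +/- t_{n-2} * S_b1."""
    half_width = t_crit * s_b1
    return b1 - half_width, b1 + half_width


# b1 and S_b1 from Figure 13.17; t_12 = 2.1788 from Table E.3 (95% confidence)
lower, upper = slope_confidence_interval(1.6699, 0.1569, 2.1788)
contains_zero = lower <= 0.0 <= upper

print(f"95% CI for beta1: {lower:.4f} to {upper:.4f}")
print(f"Interval contains 0: {contains_zero}")
# Slope units: millions of dollars of annual sales per 1,000 square feet
print(f"Per 1,000 sq ft: ${lower * 1e6:,.0f} to ${upper * 1e6:,.0f}")
```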
t Test for the Correlation Coefficient
In Section 3.5 on page 130, the strength of the relationship between two numerical variables was measured using the correlation coefficient, r. You can use the correlation coefficient to determine whether there is a statistically significant linear relationship between X and Y. To do so, you hypothesize that the population correlation coefficient, \(\rho\), is 0. Thus, the null and alternative hypotheses are

\(H_0: \rho = 0\) (no correlation)
\(H_1: \rho \neq 0\) (correlation)

Equation (13.19) defines the test statistic for determining the existence of a significant correlation.

TESTING FOR THE EXISTENCE OF CORRELATION

\[ t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \qquad (13.19) \]

where

\[ r = +\sqrt{r^2} \ \text{if} \ b_1 > 0, \qquad r = -\sqrt{r^2} \ \text{if} \ b_1 < 0 \]

The test statistic t follows a t distribution with \(n - 2\) degrees of freedom.

In the Sunflowers Apparel problem, \(r^2 = 0.9042\) and \(b_1 = +1.6699\) (see Figure 13.4 on page 516). Because \(b_1 > 0\), the correlation coefficient for annual sales and store size is the positive square root of \(r^2\), that is, \(r = +\sqrt{0.9042} = +0.9509\). Testing the null hypothesis that there is no correlation between these two variables results in the following observed t statistic:

\[ t = \frac{r - 0}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.9509 - 0}{\sqrt{\dfrac{1 - (0.9509)^2}{14 - 2}}} = 10.6411 \]
Using the 0.05 level of significance, because \(t = 10.6411 > 2.1788\), you reject the null hypothesis. You conclude that there is evidence of an association between annual sales and store size. This t statistic is equivalent to the t statistic found when testing whether the population slope, \(\beta_1\), is equal to zero (see Figure 13.17 on page 540). When inferences concerning the population slope were discussed, confidence intervals and tests of hypothesis were used interchangeably. However, developing a confidence interval for the correlation coefficient is more complicated because the shape of the sampling distribution of the statistic r varies for different values of the population correlation coefficient. Methods for developing a confidence interval estimate for the correlation coefficient are presented in reference 4.
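A short Python sketch of Equation (13.19), assuming the reported \(r^2 = 0.9042\) and \(b_1 = +1.6699\). The sign of r is copied from the sign of \(b_1\) with `math.copysign`, mirroring the rule stated with the equation; the function name is hypothetical. Rounding in \(r^2\) gives t of about 10.642 rather than exactly 10.6411.

```python
import math


def correlation_t_test(r_squared: float, b1: float, n: int, rho: float = 0.0) -> tuple[float, float]:
    """Return (r, t) for Equation (13.19); the sign of r follows the sign of b1."""
    r = math.copysign(math.sqrt(r_squared), b1)
    t = (r - rho) / math.sqrt((1 - r ** 2) / (n - 2))
    return r, t


# r^2 = 0.9042 and b1 = +1.6699 for the Sunflowers Apparel data; n = 14 stores
r, t_stat = correlation_t_test(0.9042, 1.6699, 14)
t_crit = 2.1788  # two-tailed critical t with 12 df, alpha = 0.05 (Table E.3)

print(f"r = {r:.4f}, t = {t_stat:.4f}, reject H0: {abs(t_stat) > t_crit}")
```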
Learning the Basics

13.39 You are testing the null hypothesis that there is no relationship between two variables, X and Y. From your sample of n = 10, you determine that r = 0.80.
a. What is the value of the t test statistic?
b. At the \(\alpha = 0.05\) level of significance, what are the critical values?
c. Based on your answers to (a) and (b), what statistical decision should you make?

13.40 You are testing the null hypothesis that there is no relationship between two variables, X and Y. From your sample of n = 18, you determine that \(b_1 = +4.5\) and \(S_{b_1} = 1.5\).
a. What is the value of the t test statistic?
b. At the \(\alpha = 0.05\) level of significance, what are the critical values?
c. Based on your answers to (a) and (b), what statistical decision should you make?
d. Construct a 95% confidence interval estimate of the population slope, \(\beta_1\).

13.41 You are testing the null hypothesis that there is no relationship between two variables, X and Y. From your sample of n = 20, you determine that SSR = 60 and SSE = 40.
a. What is the value of the F test statistic?
b. At the \(\alpha = 0.05\) level of significance, what is the critical value?
c. Based on your answers to (a) and (b), what statistical decision should you make?
d. Compute the correlation coefficient by first computing \(r^2\) and assuming that \(b_1\) is negative.
e. At the 0.05 level of significance, is there a significant correlation between X and Y?

Applying the Concepts

13.42 In Problem 13.4 on page 522, the marketing manager used shelf space for pet food to predict weekly sales. The data are stored in the accompanying data file. From the results of that problem, \(b_1 = 7.4\) and \(S_{b_1} = 1.59\).
a. At the 0.05 level of significance, is there evidence of a linear relationship between shelf space and sales?
b. Construct a 95% confidence interval estimate of the population slope, \(\beta_1\).

13.43 In Problem 13.5 on page 522, you used reported magazine newsstand sales to predict audited sales. The data are stored in the accompanying data file. Using the results of that problem, \(b_1 = 0.5719\) and \(S_{b_1} = 0.0668\).
a. At the 0.05 level of significance, is there evidence of a linear relationship between reported sales and audited sales?
b. Construct a 95% confidence interval estimate of the population slope, \(\beta_1\).

13.44 In Problem 13.6 on pages 522–523, the owner of a moving company wanted to predict labor hours, based on the number of cubic feet moved. The data are stored in the accompanying data file. Using the results of that problem,
a. at the 0.05 level of significance, is there evidence of a linear relationship between the number of cubic feet moved and labor hours?
b. construct a 95% confidence interval estimate of the population slope, \(\beta_1\).

13.45 In Problem 13.7 on page 523, you used the weight of mail to predict the number of orders received. The data are stored in the accompanying data file. Using the results of that problem,
a. at the 0.05 level of significance, is there evidence of a linear relationship between the weight of mail and the number of orders received?
b. construct a 95% confidence interval estimate of the population slope, \(\beta_1\).

13.46 In Problem 13.8 on page 523, you used annual revenues to predict the value of a baseball franchise. The data are stored in the accompanying data file. Using the results of that problem,
a. at the 0.05 level of significance, is there evidence of a linear relationship between annual revenue and franchise value?
b. construct a 95% confidence interval estimate of the population slope, \(\beta_1\).

13.47 In Problem 13.9 on page 523, an agent for a real estate company wanted to predict the monthly rent for apartments, based on the size of the apartment. The data are stored in the accompanying data file. Using the results of that problem,
a. at the 0.05 level of significance, is there evidence of a linear relationship between the size of the apartment and the monthly rent?
b. construct a 95% confidence interval estimate of the population slope, \(\beta_1\).

13.48 In Problem 13.10 on page 523, you used hardness to predict the tensile strength of die-cast aluminum. The data are stored in the accompanying data file. Using the results of that problem,
a. at the 0.05 level of significance, is there evidence of a linear relationship between hardness and strength?
b. construct a 95% confidence interval estimate of the population slope, \(\beta_1\).
13.49 The volatility of a stock is often measured by its beta value. You can estimate the beta value of a stock by developing a simple linear regression model, using the percentage weekly change in the stock as the dependent variable and the percentage weekly change in a market index as the independent variable. The S&P 500 Index is one index you can use. For example, if you wanted to estimate the beta for IBM, you could use the following model, which is sometimes referred to as a market model:

(% weekly change in IBM) = \(\beta_0\) + \(\beta_1\)(% weekly change in S&P 500 index) + \(\varepsilon\)

The least-squares regression estimate of the slope \(b_1\) is the estimate of the beta value for IBM. A stock with a beta value of 1.0 tends to move the same as the overall market. A stock with a beta value of 1.5 tends to move 50% more than the overall market, and a stock with a beta value of 0.6 tends to move only 60% as much as the overall market. Stocks with negative beta values tend to move in a direction opposite that of the overall market. The following table gives beta values for some widely held stocks:

Company        Ticker Symbol   Beta
AT&T           T               0.80
IBM            IBM             1.20
Walt Disney    DIS             1.40
Alcoa          AA              2.26
LSI Logic      LSI             3.61

Source: Extracted from finance.yahoo.com, May 31, 2006.

a. For each of the five companies, interpret the beta value.
b. How can investors use the beta value as a guide for investing?

13.50 Index funds are mutual funds that try to mimic the movement of leading indexes, such as the S&P 500 Index, the NASDAQ 100 Index, or the Russell 2000 Index. The beta values for these funds (as described in Problem 13.49) are therefore approximately 1.0. The estimated market models for these funds are approximately

(% weekly change in index fund) = 0.0 + 1.0(% weekly change in the index)

Leveraged index funds are designed to magnify the movement of major indexes. An article in Mutual Funds (O'Shaughnessy, "Reach for Higher Returns," Mutual Funds, July 1999, pp. 44–49) described some of the risks and rewards associated with these funds and gave details of some of the most popular leveraged funds, including those in the following table:

Fund (Ticker Symbol)                Description
Small Cap (POSCX)                   125% of Russell 2000 Index
"Inv" Nova Fund (RYNVX)             150% of the S&P 500 Index
ProFund UltraOTC "Inv" (UOPIX)      Double (200%) the NASDAQ 100 Index

Thus, the estimated market models for these funds are approximately

(% weekly change in POSCX) = 0.0 + 1.25(% weekly change in the Russell 2000 Index)
(% weekly change in RYNVX) = 0.0 + 1.50(% weekly change in the S&P 500 Index)
(% weekly change in UOPIX fund) = 0.0 + 2.0(% weekly change in the NASDAQ 100 Index)

Thus, if the Russell 2000 Index gains 10% over a period of time, the leveraged mutual fund POSCX gains approximately 12.5%. On the downside, if the same index loses 20%, POSCX loses approximately 25%.
a. Consider the leveraged mutual fund ProFund UltraOTC "Inv" (UOPIX), whose description is 200% of the performance of the S&P 500 Index. What is its approximate market model?
b. If the NASDAQ gains 30% in a year, what return do you expect UOPIX to have?
c. If the NASDAQ loses 35% in a year, what return do you expect UOPIX to have?
d. What type of investors should be attracted to leveraged funds? What type of investors should stay away from these funds?

13.51 The data in the accompanying data file represent the calories and fat (in grams) of 16-ounce iced coffee drinks at Dunkin' Donuts and Starbucks:

Product                                                                   Calories   Fat
Dunkin' Donuts Iced Mocha Swirl latte (whole milk)                        240        8.0
Starbucks Coffee Frappuccino blended coffee (cream)                       260        3.5
Dunkin' Donuts Coffee Coolatta                                            350        22.0
Starbucks Iced Coffee Mocha Espresso (whole milk and whipped cream)       350        20.0
Starbucks Mocha Frappuccino blended coffee (whipped cream)                420        16.0
Starbucks Chocolate Brownie Frappuccino blended coffee (whipped cream)    510        22.0
Starbucks Chocolate Frappuccino Blended Crème (whipped cream)             530        19.0

Source: Extracted from "Coffee as Candy at Dunkin' Donuts and Starbucks," Consumer Reports, June 2004, p. 9.
a. Compute and interpret the coefficient of correlation, r.
b. At the 0.05 level of significance, is there a significant linear relationship between the calories and fat?

13.52 There are several methods for calculating fuel economy. The following table (contained in the accompanying data file) indicates the mileage as calculated by owners and by current government standards:

Vehicle                      Owner   Government Standards
2005 Ford F-150              14.3    16.8
2005 Chevrolet Silverado     15.0    17.8
2002 Honda Accord LX         27.8    26.2
2002 Honda Civic             27.9    34.2
2004 Honda Civic Hybrid      48.8    47.6
2002 Ford Explorer           16.8    18.3
2005 Toyota Camry            23.7    28.5
2003 Toyota Corolla          32.8    33.1
2005 Toyota Prius            ...     56.0
a. Compute and interpret the coefficient of correlation, r.
b. At the 0.05 level of significance, is there a significant linear relationship between the mileage as calculated by owners and by current government standards?

13.53 College basketball is big business, with coaches' salaries, revenues, and expenses in millions of dollars. The data in the accompanying data file represent the coaches' salaries and revenues for college basketball at selected schools in a recent year (extracted from R. Adams, "Pay for Playoffs," The Wall Street Journal, March 11–12, 2006, pp. P1, P8).
a. Compute and interpret the coefficient of correlation, r.
b. At the 0.05 level of significance, is there a significant linear relationship between a coach's salary and revenue?

13.54 College football players trying out for the NFL are given the Wonderlic standardized intelligence test. The data in the accompanying data file represent the average Wonderlic score of football players trying out for the NFL and the graduation rates for football players at selected schools (extracted from S. Walker, "The NFL's Smartest Team," The Wall Street Journal, September 30, 2005, pp. W1, W10).
a. Compute and interpret the coefficient of correlation, r.
b. At the 0.05 level of significance, is there a significant linear relationship between the average Wonderlic score of football players trying out for the NFL and the graduation rates for football players at selected schools?
c. What conclusions can you reach about the relationship between the average Wonderlic score of football players trying out for the NFL and the graduation rates for football players at selected schools?
13.8 ESTIMATION OF MEAN VALUES AND PREDICTION OF INDIVIDUAL VALUES
This section presents methods of making inferences about the mean of Y and predicting individual values of Y.
The Confidence Interval Estimate
In Example 13.2 on page 519, you used the prediction line to predict the value of Y for a given X. The mean annual sales for stores with 4,000 square feet was predicted to be 7.644 millions of dollars ($7,644,000). This estimate, however, is a point estimate of the population mean. In Chapter 8, you studied the concept of the confidence interval as an estimate of the population mean. In a similar fashion, Equation (13.20) defines the confidence interval estimate for the mean response for a given X.

CONFIDENCE INTERVAL ESTIMATE FOR THE MEAN OF Y

\[ \hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{h_i} \]
\[ \hat{Y}_i - t_{n-2} S_{YX} \sqrt{h_i} \le \mu_{Y|X=X_i} \le \hat{Y}_i + t_{n-2} S_{YX} \sqrt{h_i} \qquad (13.20) \]

where

\[ h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX}, \qquad SSX = \sum_{i=1}^{n} (X_i - \bar{X})^2 \]

\(\hat{Y}_i\) = predicted value of Y; \(\hat{Y}_i = b_0 + b_1 X_i\)
\(S_{YX}\) = standard error of the estimate
n = sample size
\(X_i\) = given value of X
\(\mu_{Y|X=X_i}\) = mean value of Y when \(X = X_i\)
13.8: Estimation of Mean Values and Prediction of Individual Values 547
The width of the confidence interval in Equation (13.20) depends on several factors. For a given level of confidence, increased variation around the prediction line, as measured by the standard error of the estimate, results in a wider interval. However, as you would expect, increased sample size reduces the width of the interval. In addition, the width of the interval also varies at different values of X. When you predict Y for values of X close to \(\bar{X}\), the interval is narrower than for predictions for X values more distant from \(\bar{X}\).

In the Sunflowers Apparel example, suppose you want to construct a 95% confidence interval estimate of the mean annual sales for the entire population of stores that contain 4,000 square feet (X = 4). Using the simple linear regression equation,

\[ \hat{Y}_i = 0.9645 + 1.6699 X_i = 0.9645 + 1.6699(4) = 7.6439 \ \text{(millions of dollars)} \]

(With the coefficients rounded to four decimals, the arithmetic gives 7.6441; the reported 7.6439 reflects the unrounded coefficients.) Also, given the following:

\[ \bar{X} = 2.9214, \qquad S_{YX} = 0.9664, \qquad SSX = \sum_{i=1}^{n} (X_i - \bar{X})^2 = 37.9236 \]

From Table E.3, \(t_{12} = 2.1788\). Thus,

\[ \hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{h_i} \]

where

\[ h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX} \]

so that

\[ \hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX}} = 7.6439 \pm (2.1788)(0.9664) \sqrt{\frac{1}{14} + \frac{(4 - 2.9214)^2}{37.9236}} = 7.6439 \pm 0.6728 \]

so

\[ 6.9711 \le \mu_{Y|X=4} \le 8.3167 \]

Therefore, the 95% confidence interval estimate is that the mean annual sales are between $6,971,100 and $8,316,700 for the population of stores with 4,000 square feet.
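To see Equation (13.20) and the width argument in numbers, here is a small Python sketch using the Sunflowers Apparel summary values quoted above. It takes the reported \(\hat{Y}_i = 7.6439\) as given, and the helper names are hypothetical. The final loop evaluates only the width of the interval, which does not depend on \(\hat{Y}_i\), at \(\bar{X}\) and at two points farther away.

```python
import math


def mean_response_interval(y_hat, t_crit, s_yx, n, x_i, x_bar, ssx):
    """Equation (13.20): y_hat +/- t_{n-2} * S_YX * sqrt(h_i)."""
    h_i = 1 / n + (x_i - x_bar) ** 2 / ssx
    half_width = t_crit * s_yx * math.sqrt(h_i)
    return y_hat - half_width, y_hat + half_width


# Sunflowers Apparel summary values from the text
n, x_bar, s_yx, ssx, t_crit = 14, 2.9214, 0.9664, 37.9236, 2.1788
y_hat = 7.6439  # predicted mean annual sales ($ millions) at X = 4, as reported

lower, upper = mean_response_interval(y_hat, t_crit, s_yx, n, 4.0, x_bar, ssx)
print(f"95% CI for mean sales at X = 4: {lower:.4f} to {upper:.4f}")


def interval_width(x):
    """Width of the Equation (13.20) interval at X = x (center is irrelevant)."""
    lo, hi = mean_response_interval(0.0, t_crit, s_yx, n, x, x_bar, ssx)
    return hi - lo


# Narrowest at X-bar, wider as X moves away from it
for x in (x_bar, 4.0, 6.0):
    print(f"X = {x:.4f}: width = {interval_width(x):.4f}")
```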
The Prediction Interval
In addition to the need for a confidence interval estimate for the mean value, you often want to predict the response for an individual value. Although the form of the prediction interval is similar to that of the confidence interval estimate of Equation (13.20), the prediction interval is predicting an individual value, not estimating a parameter. Equation (13.21) defines the prediction interval for an individual response, Y, at a particular value, \(X_i\), denoted by \(Y_{X=X_i}\).
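PREDICTION INTERVAL FOR AN INDIVIDUAL RESPONSE, Y

\[ \hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{1 + h_i} \]
\[ \hat{Y}_i - t_{n-2} S_{YX} \sqrt{1 + h_i} \le Y_{X=X_i} \le \hat{Y}_i + t_{n-2} S_{YX} \sqrt{1 + h_i} \qquad (13.21) \]

where \(h_i\), \(\hat{Y}_i\), \(S_{YX}\), n, and \(X_i\) are defined as in Equation (13.20) on page 546 and \(Y_{X=X_i}\) is a future value of Y when \(X = X_i\).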
To construct a 95% prediction interval of the annual sales for an individual store that contains 4,000 square feet (X = 4), you first compute \(\hat{Y}_i\). Using the prediction line:

\[ \hat{Y}_i = 0.9645 + 1.6699 X_i = 0.9645 + 1.6699(4) = 7.6439 \ \text{(millions of dollars)} \]

Also, given the following:

\[ \bar{X} = 2.9214, \qquad S_{YX} = 0.9664, \qquad SSX = \sum_{i=1}^{n} (X_i - \bar{X})^2 = 37.9236 \]

From Table E.3, \(t_{12} = 2.1788\). Thus,

\[ \hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{1 + h_i} \]

where

\[ h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX} \]

so that

\[ \hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX}} = 7.6439 \pm (2.1788)(0.9664) \sqrt{1 + \frac{1}{14} + \frac{(4 - 2.9214)^2}{37.9236}} = 7.6439 \pm 2.2104 \]

so

\[ 5.4335 \le Y_{X=4} \le 9.8543 \]

Therefore, with 95% confidence, you predict that the annual sales for an individual store with 4,000 square feet are between $5,433,500 and $9,854,300.
Figure 13.21 is a Microsoft Excel worksheet that illustrates the confidence interval estimate and the prediction interval for the Sunflowers Apparel problem. If you compare the results of the confidence interval estimate and the prediction interval, you see that the prediction interval for an individual store is much wider than the confidence interval estimate for the mean. Remember that there is much more variation in predicting an individual value than in estimating a mean value.
FIGURE 13.21 Microsoft Excel confidence interval estimate and prediction interval for the Sunflowers Apparel data
(See Section E13.5 to create this worksheet.)
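The comparison above can be checked with a short Python sketch that computes both half-widths from the same \(h_i\): \(\sqrt{h_i}\) for Equation (13.20) and \(\sqrt{1 + h_i}\) for Equation (13.21). It assumes the Sunflowers Apparel summary values and the reported \(\hat{Y}_i = 7.6439\); the helper name is hypothetical. The extra 1 under the square root, which represents the variation of an individual store around the mean, is why the prediction interval is wider.

```python
import math


def interval_half_widths(t_crit, s_yx, n, x_i, x_bar, ssx):
    """Half-widths for Equation (13.20) (mean of Y) and Equation (13.21) (individual Y)."""
    h_i = 1 / n + (x_i - x_bar) ** 2 / ssx
    ci_half = t_crit * s_yx * math.sqrt(h_i)       # confidence interval for the mean
    pi_half = t_crit * s_yx * math.sqrt(1 + h_i)   # prediction interval for one store
    return ci_half, pi_half


y_hat = 7.6439  # predicted annual sales ($ millions) for X = 4 (4,000 square feet)
ci_half, pi_half = interval_half_widths(2.1788, 0.9664, 14, 4.0, 2.9214, 37.9236)

print(f"Mean:       {y_hat - ci_half:.4f} to {y_hat + ci_half:.4f}")
print(f"Individual: {y_hat - pi_half:.4f} to {y_hat + pi_half:.4f}")
print(f"The prediction interval is {pi_half / ci_half:.1f} times as wide")
```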
Learning the Basics

13.55 Based on a sample of n = 20, the least-squares method was used to develop the following prediction line: \(\hat{Y}_i = 5 + 3X_i\). In addition,

\[ S_{YX} = 1.0, \qquad \bar{X} = 2, \qquad \sum_{i=1}^{n} (X_i - \bar{X})^2 = 20 \]

a. Construct a 95% confidence interval estimate of the mean response for X = 2.
b. Construct a 95% prediction interval of an individual response for X = 2.

13.56 Based on a sample of n = 20, the least-squares method was used to develop the following prediction line: \(\hat{Y}_i = 5 + 3X_i\). In addition,

\[ S_{YX} = 1.0, \qquad \bar{X} = 2, \qquad \sum_{i=1}^{n} (X_i - \bar{X})^2 = 20 \]

a. Construct a 95% confidence interval estimate of the mean response for X = 4.
b. Construct a 95% prediction interval of an individual response for X = 4.
c. Compare the results of (a) and (b) with those of Problem 13.55 (a) and (b). Which interval is wider? Why?

Applying the Concepts

13.57 In Problem 13.5 on page 522, you used reported sales to predict audited sales of magazines. The data are stored in the accompanying data file. For these data, \(S_{YX} = 42.186\) and \(h_i = 0.108\) when X = 400.
a. Construct a 95% confidence interval estimate of the mean audited sales for magazines that report newsstand sales of 400,000.
b. Construct a 95% prediction interval of the audited sales for an individual magazine that reports newsstand sales of 400,000.
c. Explain the difference in the results in (a) and (b).

13.58 In Problem 13.4 on page 522, the marketing manager used shelf space for pet food to predict weekly sales. The data are stored in the accompanying data file. For these data, \(S_{YX} = 30.81\) and \(h_i = 0.1373\) when X = 8.
a. Construct a 95% confidence interval estimate of the mean weekly sales for all stores that have 8 feet of shelf space for pet food.
b. Construct a 95% prediction interval of the weekly sales of an individual store that has 8 feet of shelf space for pet food.
c. Explain the difference in the results in (a) and (b).

13.59 In Problem 13.7 on page 523, you used the weight of mail to predict the number of orders received. The data are stored in the accompanying data file.
a. Construct a 95% confidence interval estimate of the mean number of orders received for all packages with a weight of 500 pounds.
b. Construct a 95% prediction interval of the number of orders received for an individual package with a weight of 500 pounds.
c. Explain the difference in the results in (a) and (b).

13.60 In Problem 13.6 on page 522, the owner of a moving company wanted to predict labor hours based on the number of cubic feet moved. The data are stored in the accompanying data file.
a. Construct a 95% confidence interval estimate of the mean labor hours for all moves of 500 cubic feet.
b. Construct a 95% prediction interval of the labor hours of an individual move that has 500 cubic feet.
c. Explain the difference in the results in (a) and (b).

13.61 In Problem 13.9 on page 523, an agent for a real estate company wanted to predict the monthly rent for apartments, based on the size of the apartment. The data are stored in the accompanying data file.
a. Construct a 95% confidence interval estimate of the mean monthly rental for all apartments that are 1,000 square feet in size.
b. Construct a 95% prediction interval of the monthly rental of an individual apartment that is 1,000 square feet in size.
c. Explain the difference in the results in (a) and (b).

13.62 In Problem 13.8 on page 523, you predicted the value of a baseball franchise, based on current revenue. The data are stored in the accompanying data file.
a. Construct a 95% confidence interval estimate of the mean value of all baseball franchises that generate $150 million of annual revenue.
b. Construct a 95% prediction interval of the value of an individual baseball franchise that generates $150 million of annual revenue.
c. Explain the difference in the results in (a) and (b).

13.63 In Problem 13.10 on page 523, you used hardness to predict the tensile strength of die-cast aluminum. The data are stored in the accompanying data file.
a. Construct a 95% confidence interval estimate of the mean tensile strength for all specimens with a hardness of 30 Rockwell E units.
b. Construct a 95% prediction interval of the tensile strength for an individual specimen that has a hardness of 30 Rockwell E units.
c. Explain the difference in the results in (a) and (b).
13.9 PITFALLS IN REGRESSION AND ETHICAL ISSUES
Some of the pitfalls involved in using regression analysis are as follows:

• Lacking an awareness of the assumptions of least-squares regression
• Not knowing how to evaluate the assumptions of least-squares regression
• Not knowing what the alternatives to least-squares regression are if a particular assumption is violated
• Using a regression model without knowledge of the subject matter
• Extrapolating outside the relevant range
• Concluding that a significant relationship identified in an observational study is due to a cause-and-effect relationship
The widespread availability of spreadsheet and statistical software has made regression analysis much more feasible. However, for many users, this enhanced availability of software has not been accompanied by an understanding of how to use regression analysis. Someone who is not familiar with either the assumptions of regression or how to evaluate the assumptions cannot be expected to know what the alternatives to least-squares regression are if a particular assumption is violated.

The data in Table 13.7 (stored in the accompanying data file) illustrate the importance of using scatter plots and residual analysis to go beyond the basic number crunching of computing the Y intercept, the slope, and \(r^2\).
13.9: Pitfalls in Regression and Ethical Issues 551
TABLE 13.7 Four Sets of Artificial Data

Data Set A        Data Set B        Data Set C        Data Set D
X     Y           X     Y           X     Y           X     Y
10    8.04        10    9.14        10    7.46        8     6.58
14    9.96        14    8.10        14    8.84        8     5.76
5     5.68        5     4.74        5     5.73        8     7.71
8     6.95        8     8.14        8     6.77        8     8.84
9     8.81        9     8.77        9     7.11        8     8.47
12    10.84       12    9.13        12    8.15        8     7.04
4     4.26        4     3.10        4     5.39        8     5.25
7     4.82        7     7.26        7     6.42        19    12.50
11    8.33        11    9.26        11    7.81        8     5.56
13    7.58        13    8.74        13    12.74       8     7.91
6     7.24        6     6.13        6     6.08        8     6.89

Source: Extracted from F. J. Anscombe, "Graphs in Statistical Analysis," American Statistician, Vol. 27 (1973), pp. 17–21.

Anscombe (reference 1) showed that all four data sets given in Table 13.7 have the following identical results:
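\[ \hat{Y}_i = 3.0 + 0.5 X_i \]
\[ S_{YX} = 1.237, \qquad S_{b_1} = 0.118, \qquad r^2 = 0.667 \]
\[ SSR = \text{Explained variation} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = 27.51 \]
\[ SSE = \text{Unexplained variation} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = 13.76 \]
\[ SST = \text{Total variation} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = 41.27 \]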
Thus, with respect to these statistics associated with a simple linear regression analysis, the four data sets are identical. Were you to stop the analysis at this point, you would fail to observe the important differences among the four data sets. By examining the scatter plots for the four data sets in Figure 13.22 on page 552 and their residual plots in Figure 13.23 on page 552, you can clearly see that each of the four data sets has a different relationship between X and Y. The only data set that seems to follow an approximate straight line is data set A. The residual plot for data set A does not show any obvious patterns or outlying residuals. This is certainly not true for data sets B, C, and D. The scatter plot for data set B shows that a quadratic regression model (see Section 15.1) is more appropriate. This conclusion is reinforced by the residual plot for data set B. The scatter plot and the residual plot for data set C clearly show an outlying observation. If this is the case, you may want to remove the outlier and re-estimate the regression model (see reference 4). Similarly, the scatter plot for data set D represents the situation in which the model is heavily dependent on the outcome of a single response (\(X_8 = 19\) and \(Y_8 = 12.50\)). You would have to cautiously evaluate any regression model whose regression coefficients are heavily dependent on a single observation.
FIGURE 13.22 Scatter plots for four data sets

FIGURE 13.23 Residual plots for four data sets (Panels A through D)
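The "identical results" claim is easy to reproduce. The Python sketch below fits a least-squares line to each data set in Table 13.7 (Anscombe's published values, in the order the table lists them) using only the definitions of \(b_0\), \(b_1\), and \(r^2 = 1 - SSE/SST\). The function and variable names are illustrative. Every set prints essentially the same line and \(r^2\), which is exactly why the numbers alone cannot reveal the differences the plots show.

```python
def least_squares(xs, ys):
    """Return (b0, b1, r_squared) for a simple linear regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / ssx
    b0 = y_bar - b1 * x_bar
    sst = sum((y - y_bar) ** 2 for y in ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return b0, b1, 1 - sse / sst


# Table 13.7, in the order listed there
X_ABC = [10, 14, 5, 8, 9, 12, 4, 7, 11, 13, 6]
DATA_SETS = {
    "A": (X_ABC, [8.04, 9.96, 5.68, 6.95, 8.81, 10.84, 4.26, 4.82, 8.33, 7.58, 7.24]),
    "B": (X_ABC, [9.14, 8.10, 4.74, 8.14, 8.77, 9.13, 3.10, 7.26, 9.26, 8.74, 6.13]),
    "C": (X_ABC, [7.46, 8.84, 5.73, 6.77, 7.11, 8.15, 5.39, 6.42, 7.81, 12.74, 6.08]),
    "D": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: least_squares(xs, ys) for name, (xs, ys) in DATA_SETS.items()}
for name, (b0, b1, r2) in results.items():
    print(f"Data set {name}: Y-hat = {b0:.2f} + {b1:.3f}X, r^2 = {r2:.3f}")
```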
In summary, scatter plots and residual plots are of vital importance to a complete regression analysis. The information they provide is so basic to a credible analysis that you should always include these graphical methods as part of a regression analysis. Thus, a strategy that you can use to help avoid the pitfalls of regression is as follows:

1. Start with a scatter plot to observe the possible relationship between X and Y.
2. Check the assumptions of regression before moving on to using the results of the model.
3. Plot the residuals versus the independent variable to determine whether the linear model is appropriate and to check the equal-variance assumption.
4. Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to check the normality assumption.
5. If you collected the data over time, plot the residuals versus time and use the Durbin-Watson test to check the independence assumption.
6. If there are violations of the assumptions, use alternative methods to least-squares regression or alternative least-squares models.
7. If there are no violations of the assumptions, carry out tests for the significance of the regression coefficients and develop confidence and prediction intervals.
8. Avoid making predictions and forecasts outside the relevant range of the independent variable.
9. Keep in mind that the relationships identified in observational studies may or may not be due to cause-and-effect relationships. Remember that while causation implies correlation, correlation does not imply causation.
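Step 3 of this strategy can be sketched without plotting software. The Python example below fits data set C from Table 13.7, prints a crude text residual "plot" ordered by X, and flags residuals larger than \(2 S_{YX}\) in absolute value. That cutoff is an assumed rule of thumb for this sketch only, not a rule given in this chapter, and the function name is hypothetical. It singles out the observation at X = 13, the outlier visible in the data set C plots.

```python
import math


def residuals_and_syx(xs, ys):
    """Fit Y = b0 + b1*X by least squares; return the residuals and S_YX."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ssx
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s_yx = math.sqrt(sum(e * e for e in residuals) / (n - 2))  # standard error of the estimate
    return residuals, s_yx


# Data set C from Table 13.7
xs = [10, 14, 5, 8, 9, 12, 4, 7, 11, 13, 6]
ys = [7.46, 8.84, 5.73, 6.77, 7.11, 8.15, 5.39, 6.42, 7.81, 12.74, 6.08]
residuals, s_yx = residuals_and_syx(xs, ys)

# Crude text version of a residual plot, ordered by the independent variable
for x, e in sorted(zip(xs, residuals)):
    print(f"X = {x:2d}  e = {e:+.2f}  {'#' * round(abs(e) * 5)}")

# Assumed rule of thumb for this sketch: flag |e| > 2 * S_YX as a possible outlier
flagged = [(x, round(e, 2)) for x, e in zip(xs, residuals) if abs(e) > 2 * s_yx]
print(f"S_YX = {s_yx:.3f}; flagged (X, residual): {flagged}")
```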
$
r.V
fa\
a
^h l*
$
Perhaps you are familiar with the TV competition organized by model Tyra Banks to find "America's top model." You may be less familiar with another set of top models that are emerging from the business world. In a Business Week article from its January 23, 2006, edition (S. Baker, "Why Math Will Rock Your World: More Math Geeks Are Calling the Shots in Business. Is Your Industry Next?" Business Week, pp. 54-62), Stephen Baker talks about how "quants" turned finance upside down and are moving to other business fields. The name quants derives from the fact that "math geeks" develop models and forecasts by using "quantitative methods." These methods are built on the principles of regression analysis discussed in this chapter, although the actual models are much more complicated than the simple linear models discussed in this chapter. Regression-based models have become the top models for many types of business analyses. Some examples include:

Advertising and marketing: Managers use econometric models (in other words, regression models) to determine the effect of an advertisement on sales, based on a set of factors. Also, managers use data mining to predict patterns of behavior of what customers will buy in the future, based on historic information about the consumer.

Finance: Any time you read about a financial "model," you should understand that some type of regression model is being used. For example, a New York Times article on June 18, 2006, titled "An Old Formula That Points to New Worry" by Mark Hulbert (p. BU8) discusses a market timing model that predicts the return of stocks in the next three to five years, based on the dividend yield of the stock market and the interest rate of 90-day Treasury bills.

Food and beverage: Believe it or not, Enologix, a California consulting company, has developed a "formula" (a regression model) that predicts a wine's quality index, based on a set of chemical compounds found in the wine (see D. Darlington, "The Chemistry of a 90+ Wine," The New York Times Magazine, August 7, 2005, pp. 36-39).
Publishing: A study of the effect of price changes at Amazon.com and BN.com on sales (again, regression analysis) found that a 1% price change at BN.com pushed sales down 4%, but it pushed sales down only 0.5% at Amazon.com. (You can download the paper at http://gsbadg.uchicago.edu/vitae.htm.)

Transportation: Farecast.com uses data mining and predictive technologies to objectively predict airfare pricing (see D. Darlin, "Airfares Made Easy (Or Easier)," The New York Times, July 1, 2006, pp. C1, C6).

Real estate: Zillow.com uses information about the features contained in a home and its location to develop estimates about the market value of the home, using a "formula" built with a proprietary algorithm.

In the article, Baker stated that statistics and probability will become core skills for businesspeople and consumers. Those who are successful will know how to use statistics, whether they are building financial models or making marketing plans. He also strongly endorsed the need for everyone in business to have knowledge of Microsoft Excel to be able to produce statistical analysis and reports.
As you can see from the chapter roadmap in Figure 13.24, this chapter develops the simple linear regression model and discusses the assumptions and how to evaluate them. Once you are assured that the model is appropriate, you can predict values by using the prediction line and test for the significance of the slope.
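To make "test for the significance of the slope" concrete, here is a minimal Python sketch, assuming a small hypothetical data set, of the t test in Equation (13.16) with H0: β1 = 0 and the F test in Equation (13.17). The function name is illustrative only. The critical value 3.1824 is assumed to come from a t table: it is the upper 2.5% point of the t distribution with n - 2 = 3 degrees of freedom, as used in a two-tail test at α = 0.05.

```python
import math
from statistics import fmean

def slope_significance(x, y):
    """Return (b1, t, F) for testing H0: beta1 = 0 in simple linear regression."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ssxy / ssx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sst - sse                  # SST = SSR + SSE
    msr = ssr / 1                    # one independent variable
    mse = sse / (n - 2)
    s_yx = math.sqrt(mse)            # standard error of the estimate
    s_b1 = s_yx / math.sqrt(ssx)     # standard error of the slope
    t_stat = (b1 - 0) / s_b1         # hypothesized beta1 = 0
    f_stat = msr / mse
    return b1, t_stat, f_stat

# Hypothetical data (n = 5).
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
b1, t_stat, f_stat = slope_significance(x, y)
T_CRIT = 3.1824   # t table value, 3 degrees of freedom, two-tail alpha = 0.05
print(f"b1 = {b1:.2f}, t = {t_stat:.2f}, F = {f_stat:.1f}")
print("Reject H0" if abs(t_stat) > T_CRIT else "Do not reject H0")
```

With a single independent variable, F equals t squared, so the two tests always reach the same conclusion about the slope.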
[Figure 13.24 Roadmap for simple linear regression. Correlation branch: correlation coefficient; testing H0: ρ = 0. Regression branch (primary focus): least-squares regression analysis (scatter plot, prediction line, residual analysis); if the data were collected in sequential order, plot the residuals over time and compute the Durbin-Watson statistic; if autocorrelation is present or the model is not appropriate (see assumptions), use an alternative to least-squares regression; otherwise test H0: β1 = 0 and, if the model is significant, use it for prediction and estimation (estimate β1, estimate μY|X, predict YX).]
In this chapter, you learned how the director of planning for a chain of stores can use regression analysis to investigate the relationship between the size of a store and its annual sales. You have used this analysis to make better decisions when selecting new sites for stores as well as to forecast sales for existing stores. In Chapter 14, regression analysis is extended to situations in which more than one independent variable is used to predict the value of a dependent variable.
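KEY EQUATIONS

Simple Linear Regression Model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad (13.1)$$

Simple Linear Regression Equation: The Prediction Line
$$\hat{Y}_i = b_0 + b_1 X_i \qquad (13.2)$$

Computational Formula for the Slope, b1
$$b_1 = \frac{SSXY}{SSX} \qquad (13.3)$$
where
$$SSXY = \sum_{i=1}^{n}X_iY_i - \frac{\left(\sum_{i=1}^{n}X_i\right)\left(\sum_{i=1}^{n}Y_i\right)}{n}, \qquad SSX = \sum_{i=1}^{n}X_i^2 - \frac{\left(\sum_{i=1}^{n}X_i\right)^2}{n}$$

Computational Formula for the Y Intercept, b0
$$b_0 = \bar{Y} - b_1\bar{X} \qquad (13.4)$$

Measures of Variation in Regression
$$SST = SSR + SSE \qquad (13.5)$$

Total Sum of Squares (SST)
$$SST = \sum_{i=1}^{n}(Y_i - \bar{Y})^2 \qquad (13.6)$$

Regression Sum of Squares (SSR)
$$SSR = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 \qquad (13.7)$$

Error Sum of Squares (SSE)
$$SSE = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 \qquad (13.8)$$

Coefficient of Determination
$$r^2 = \frac{\text{Regression sum of squares}}{\text{Total sum of squares}} = \frac{SSR}{SST} \qquad (13.9)$$

Computational Formula for SST
$$SST = \sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}Y_i^2 - \frac{\left(\sum_{i=1}^{n}Y_i\right)^2}{n} \qquad (13.10)$$

Computational Formula for SSR
$$SSR = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 = b_0\sum_{i=1}^{n}Y_i + b_1\sum_{i=1}^{n}X_iY_i - \frac{\left(\sum_{i=1}^{n}Y_i\right)^2}{n} \qquad (13.11)$$

Computational Formula for SSE
$$SSE = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n}Y_i^2 - b_0\sum_{i=1}^{n}Y_i - b_1\sum_{i=1}^{n}X_iY_i \qquad (13.12)$$

Standard Error of the Estimate
$$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n-2}} \qquad (13.13)$$

Residual
$$e_i = Y_i - \hat{Y}_i \qquad (13.14)$$

Durbin-Watson Statistic
$$D = \frac{\sum_{i=2}^{n}(e_i - e_{i-1})^2}{\sum_{i=1}^{n}e_i^2} \qquad (13.15)$$

Testing a Hypothesis for a Population Slope, β1, Using the t Test
$$t = \frac{b_1 - \beta_1}{S_{b_1}} \qquad (13.16)$$
where $S_{b_1} = \dfrac{S_{YX}}{\sqrt{SSX}}$

Testing a Hypothesis for a Population Slope, β1, Using the F Test
$$F = \frac{MSR}{MSE} \qquad (13.17)$$
where $MSR = SSR/1$ and $MSE = SSE/(n-2)$

Confidence Interval Estimate of the Slope, β1
$$b_1 \pm t_{n-2}S_{b_1}$$
$$b_1 - t_{n-2}S_{b_1} \le \beta_1 \le b_1 + t_{n-2}S_{b_1} \qquad (13.18)$$

Testing for the Existence of Correlation
$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n-2}}} \qquad (13.19)$$
where $r = +\sqrt{r^2}$ if $b_1 > 0$ and $r = -\sqrt{r^2}$ if $b_1 < 0$

Confidence Interval Estimate for the Mean of Y
$$\hat{Y}_i \pm t_{n-2}S_{YX}\sqrt{h_i}$$
$$\hat{Y}_i - t_{n-2}S_{YX}\sqrt{h_i} \le \mu_{Y|X=X_i} \le \hat{Y}_i + t_{n-2}S_{YX}\sqrt{h_i} \qquad (13.20)$$
where $h_i = \dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{SSX}$

Prediction Interval for an Individual Response, Y
$$\hat{Y}_i \pm t_{n-2}S_{YX}\sqrt{1 + h_i}$$
$$\hat{Y}_i - t_{n-2}S_{YX}\sqrt{1 + h_i} \le Y_{X=X_i} \le \hat{Y}_i + t_{n-2}S_{YX}\sqrt{1 + h_i} \qquad (13.21)$$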
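The computational formulas (13.3), (13.4), and (13.9) to (13.13) need only five running sums: ΣX, ΣY, ΣXY, ΣX², and ΣY². A minimal Python sketch, assuming a small hypothetical data set and an illustrative function name, computes the summary measures that way and then cross-checks them against the definitional forms (13.6) and (13.8).

```python
import math

def summary_from_sums(x, y):
    """Least-squares coefficients and variation measures from running sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)

    ssx = sum_x2 - sum_x ** 2 / n
    ssxy = sum_xy - sum_x * sum_y / n
    b1 = ssxy / ssx                                   # (13.3)
    b0 = sum_y / n - b1 * sum_x / n                   # (13.4)
    sst = sum_y2 - sum_y ** 2 / n                     # (13.10)
    ssr = b0 * sum_y + b1 * sum_xy - sum_y ** 2 / n   # (13.11)
    sse = sum_y2 - b0 * sum_y - b1 * sum_xy           # (13.12)
    return {
        "b0": b0, "b1": b1, "SST": sst, "SSR": ssr, "SSE": sse,
        "r2": ssr / sst,                              # (13.9)
        "S_YX": math.sqrt(sse / (n - 2)),             # (13.13)
    }

x = [1, 2, 3, 4, 5]                 # hypothetical data
y = [2.0, 4.1, 5.9, 8.2, 9.8]
s = summary_from_sums(x, y)

# Cross-check against the definitional forms (13.6) and (13.8).
y_bar = sum(y) / len(y)
sst_def = sum((yi - y_bar) ** 2 for yi in y)
sse_def = sum((yi - (s["b0"] + s["b1"] * xi)) ** 2 for xi, yi in zip(x, y))
print(round(s["SST"], 6), round(sst_def, 6))
print(round(s["SSE"], 6), round(sse_def, 6))
print(round(s["r2"], 4), round(s["S_YX"], 4))
```

The running-sum forms are convenient by hand, but with large or poorly scaled data the subtraction of nearly equal sums can lose precision in floating point, which is one reason software often works with deviations from the mean instead.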
KEY TERMS

assumptions of regression
autocorrelation
coefficient of determination
confidence interval estimate for the mean response
correlation coefficient
dependent variable
Durbin-Watson statistic
error sum of squares (SSE)
equal variance
explained variation
explanatory variable
homoscedasticity
independence of errors
independent variable
least-squares method
linear relationship
normality
prediction interval for an individual response, Y
prediction line
regression analysis
regression coefficient
regression sum of squares (SSR)
relevant range
residual
residual analysis
response variable
scatter diagram
scatter plot
simple linear regression
simple linear regression equation
slope
standard error of the estimate
total sum of squares (SST)
total variation
unexplained variation
Y intercept
CHAPTER REVIEW PROBLEMS

Checking Your Understanding
13.64 What is the interpretation of the Y intercept and the slope in the simple linear regression equation?
13.65 What is the interpretation of the coefficient of determination?
13.66 When is the unexplained variation (that is, error sum of squares) equal to 0?
13.67 When is the explained variation (that is, regression sum of squares) equal to 0?
13.68 Why should you always carry out a residual analysis as part of a regression model?
13.69 What are the assumptions of regression analysis?
13.70 How do you evaluate the assumptions of regression analysis?
13.71 When and how do you use the Durbin-Watson statistic?
13.72 What is the difference between a confidence interval estimate of the mean response, μY|X=Xi, and a prediction interval of YX=Xi?
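Question 13.72 turns on Equations (13.20) and (13.21), which differ only by the extra 1 under the square root. A minimal Python sketch, assuming hypothetical data and an illustrative function name, shows the effect: both intervals are centered on the same Ŷi, but the prediction interval for an individual response is always wider than the confidence interval for the mean response. The t critical value is passed in rather than computed, on the assumption that you look it up in a t table (3.1824 for 3 degrees of freedom at 95% confidence).

```python
import math
from statistics import fmean

def mean_ci_and_pi(x, y, x_i, t_crit):
    """Return (Y-hat, CI for the mean of Y at x_i, PI for an individual Y at x_i)."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
    b0 = y_bar - b1 * x_bar
    y_hat = b0 + b1 * x_i
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))
    h_i = 1 / n + (x_i - x_bar) ** 2 / ssx
    ci_half = t_crit * s_yx * math.sqrt(h_i)        # (13.20)
    pi_half = t_crit * s_yx * math.sqrt(1 + h_i)    # (13.21)
    return (y_hat,
            (y_hat - ci_half, y_hat + ci_half),
            (y_hat - pi_half, y_hat + pi_half))

x = [1, 2, 3, 4, 5]                 # hypothetical data
y = [2.0, 4.1, 5.9, 8.2, 9.8]
y_hat, ci, pred = mean_ci_and_pi(x, y, x_i=4.5, t_crit=3.1824)
print(f"Y-hat = {y_hat:.3f}")
print(f"95% CI for the mean response: {ci[0]:.3f} to {ci[1]:.3f}")
print(f"95% PI for an individual Y:   {pred[0]:.3f} to {pred[1]:.3f}")
```

Because h_i grows as Xi moves away from the mean of X, both intervals widen toward the edges of the relevant range.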
Applying the Concepts
13.73 Researchers from the Lubin School of Business at Pace University in New York City conducted a study on Internet-supported courses. In one part of the study, four numerical variables were collected on 108 students in an introductory management course that met once a week for an entire semester. One variable collected was hit consistency. To measure hit consistency, the researchers did the following: If a student did not visit the Internet site between classes, the student was given a 0 for that time period. If a student visited the Internet site one or more times between classes, the student was given a 1 for that time period. Because there were 13 time periods, a student's score on hit consistency could range from 0 to 13. The other three variables included the student's course average, the student's cumulative grade point average (GPA), and the total number of hits the student had on the Internet site supporting the course. The following table gives the correlation coefficient for all pairs of variables. Note that correlations marked with an * are statistically significant, using α = 0.001:

Variable                            Correlation
Course Average, Cumulative GPA        0.72*
Course Average, Total Hits            0.08
Course Average, Hit Consistency       0.37*
Cumulative GPA, Total Hits            0.12
Cumulative GPA, Hit Consistency       0.32*
Total Hits, Hit Consistency           0.64*

Source: Extracted from D. Baugher, A. Varanelli, and E. Weisbord, "Student Hits in an Internet-Supported Course: How Can Instructors Use Them and What Do They Mean?" Decision Sciences Journal of Innovative Education, Fall 2003, 1(2), pp. 159-179.

What conclusions can you reach from this correlation analysis? Are you surprised by the results, or are they consistent with your own observations and experiences?

13.74 Management of a soft-drink bottling company wants to develop a method for allocating delivery costs to customers. Although one cost clearly relates to travel time within a particular route, another variable cost reflects the time required to unload the cases of soft drink at the delivery point. A sample of 20 deliveries within a territory was selected. The delivery times and the numbers of cases delivered were recorded in a data file:

Customer   Number of Cases   Delivery Time (Minutes)
   1             52                 32.1
   2              ?                 34.8
   3             73                 36.2
   4             85                 37.8
   5             95                 37.8
   6            103                 39.7
   7            116                 38.5
   8            121                 41.9
   9            143                 44.2
  10            157                 47.1
  11            161                 43.0
  12            184                 49.4
  13            202                 57.2
  14            218                 56.8
  15            243                 60.6
  16            254                 61.2
  17            267                 58.2
  18            275                 63.1
  19            287                 65.6
  20            298                 67.3

You want to develop a regression model to predict delivery time, based on the number of cases delivered.
a. Use the least-squares method to compute the regression coefficients b0 and b1.
b. Interpret the meaning of b0 and b1 in this problem.
c. Predict the delivery time for 150 cases of soft drink.
d. Should you use the model to predict the delivery time for a customer who is receiving 500 cases of soft drink? Why or why not?
e. Determine the coefficient of determination, r², and explain its meaning in this problem.
f. Perform a residual analysis. Is there any evidence of a pattern in the residuals? Explain.
g. At the 0.05 level of significance, is there evidence of a linear relationship between delivery time and the number of cases delivered?
h. Construct a 95% confidence interval estimate of the mean delivery time for 150 cases of soft drink.
i. Construct a 95% prediction interval of the delivery time for a single delivery of 150 cases of soft drink.
j. Construct a 95% confidence interval estimate of the population slope.
k. Explain how the results in (a) through (j) can help allocate delivery costs to customers.

13.75 A brokerage house wants to predict the number of trade executions per day, using the number of incoming phone calls as a predictor variable. Data were collected over a period of 35 days and are stored in a data file.
a. Use the least-squares method to compute the regression coefficients b0 and b1.
b. Interpret the meaning of b0 and b1 in this problem.
c. Predict the number of trades executed for a day in which the number of incoming calls is 2,000.
d. Should you use the model to predict the number of trades executed for a day in which the number of incoming calls is 5,000? Why or why not?
e. Determine the coefficient of determination, r², and explain its meaning in this problem.
f. Plot the residuals against the number of incoming calls and also against days. Is there any evidence of a pattern in the residuals with either of these variables? Explain.
g. Determine the Durbin-Watson statistic for these data.
h. Based on the results of (f) and (g), is there reason to question the validity of the model? Explain.
i. At the 0.05 level of significance, is there evidence of a linear relationship between the volume of trade executions and the number of incoming calls?
j. Construct a 95% confidence interval estimate of the mean number of trades executed for days in which the number of incoming calls is 2,000.
k. Construct a 95% prediction interval of the number of trades executed for a particular day in which the number of incoming calls is 2,000.
l. Construct a 95% confidence interval estimate of the population slope.
m. Based on the results of (a) through (l), do you think the brokerage house should focus on a strategy of increasing the total number of incoming calls or on a strategy that relies on trading by a small number of heavy traders? Explain.

13.76 You want to develop a model to predict the selling price of homes based on assessed value. A sample of 30
recently sold single-family houses in a small city is selected to study the relationship between selling price (in thousands of dollars) and assessed value (in thousands of dollars). The houses in the city had been reassessed at full value one year prior to the study. The results are in a data file. (Hint: First, determine which are the independent and dependent variables.)
a. Construct a scatter plot and, assuming a linear relationship, use the least-squares method to compute the regression coefficients b0 and b1.
b. Interpret the meaning of the Y intercept, b0, and the slope, b1, in this problem.
c. Use the prediction line developed in (a) to predict the selling price for a house whose assessed value is $170,000.
d. Determine the coefficient of determination, r², and interpret its meaning in this problem.
e. Perform a residual analysis on your results and determine the adequacy of the fit of the model.
f. At the 0.05 level of significance, is there evidence of a linear relationship between selling price and assessed value?
g. Construct a 95% confidence interval estimate of the mean selling price for houses with an assessed value of $170,000.
h. Construct a 95% prediction interval of the selling price of an individual house with an assessed value of $170,000.
i. Construct a 95% confidence interval estimate of the population slope.

13.77 You want to develop a model to predict the assessed value of houses, based on heating area. A sample of 15 single-family houses in a city is selected. The assessed value (in thousands of dollars) and the heating area of the houses (in thousands of square feet) are recorded, with the following results, stored in a data file:

House   Assessed Value ($000)   Heating Area of Dwelling (Thousands of Square Feet)
  1            184.4                    2.00
  2            177.4                    1.7?
  3              ?                      1.45
  4            185.9                    1.76
  5            179.1                    1.93
  6            170.4                    1.20
  7            175.8                    1.55
  8            185.9                    1.93
  9            178.5                    1.59
 10            179.2                    1.50
 11            186.7                    1.90
 12            179.3                    1.39
 13            174.5                    1.54
 14            183.8                    1.89
 15            176.8                    1.59

a. Construct a scatter plot and, assuming a linear relationship, use the least-squares method to compute the regression coefficients b0 and b1.
b. Interpret the meaning of the Y intercept, b0, and the slope, b1, in this problem.
c. Use the prediction line developed in (a) to predict the assessed value for a house whose heating area is 1,750 square feet.
d. Determine the coefficient of determination, r², and interpret its meaning in this problem.
e. Perform a residual analysis on your results and determine the adequacy of the fit of the model.
f. At the 0.05 level of significance, is there evidence of a linear relationship between assessed value and heating area?
g. Construct a 95% confidence interval estimate of the mean assessed value for houses with a heating area of 1,750 square feet.
h. Construct a 95% prediction interval of the assessed value of an individual house with a heating area of 1,750 square feet.
i. Construct a 95% confidence interval estimate of the population slope.

13.78 The director of graduate studies at a large college of business would like to predict the grade point average (GPA) of students in an MBA program based on the Graduate Management Admission Test (GMAT) score. A sample of 20 students who had completed 2 years in the program is selected. The results are stored in a data file:

Observation   GMAT Score   GPA
     1           688       3.72
     2           647       3.44
     3           652       3.21
     4           608       3.29
     5           680       3.91
     6           617       3.28
     7           557       3.02
     8           599       3.13
     9           616       3.45
    10           594       3.33
    11           567       3.07
    12           542       2.86
    13           551       2.91
    14           573       2.79
    15           536       3.00
    16           639       3.55
    17           619        ?
    18           694       3.60
    19           718       3.88
    20           759       3.76

(Hint: First, determine which are the independent and dependent variables.)
a. Construct a scatter plot and, assuming a linear relationship, use the least-squares method to compute the regression coefficients b0 and b1.
b. Interpret the meaning of the Y intercept, b0, and the slope, b1, in this problem.
c. Use the prediction line developed in (a) to predict the GPA for a student with a GMAT score of 600.
d. Determine the coefficient of determination, r², and interpret its meaning in this problem.
e. Perform a residual analysis on your results and determine the adequacy of the fit of the model.
f. At the 0.05 level of significance, is there evidence of a linear relationship between GMAT score and GPA?
g. Construct a 95% confidence interval estimate of the mean GPA of students with a GMAT score of 600.
h. Construct a 95% prediction interval of the GPA for a student with a GMAT score of 600.
i. Construct a 95% confidence interval estimate of the population slope.

13.79 The manager of the purchasing department of a banking organization would like to develop a model to predict the amount of time it takes to process invoices. Data are collected from a sample of 30 days, and the number of invoices processed and completion time, in hours, are stored in a data file. (Hint: First, determine which are the independent and dependent variables.)
a. Assuming a linear relationship, use the least-squares method to compute the regression coefficients b0 and b1.
b. Interpret the meaning of the Y intercept, b0, and the slope, b1, in this problem.
c. Use the prediction line developed in (a) to predict the amount of time it would take to process 150 invoices.
d. Determine the coefficient of determination, r², and interpret its meaning.
e. Plot the residuals against the number of invoices processed and also against time.
f. Based on the plots in (e), does the model seem appropriate?
g. Compute the Durbin-Watson statistic and, at the 0.05 level of significance, determine whether there is any autocorrelation in the residuals.
h. Based on the results of (e) through (g), what conclusions can you reach concerning the validity of the model?
i. At the 0.05 level of significance, is there evidence of a linear relationship between the amount of time and the number of invoices processed?
j. Construct a 95% confidence interval estimate of the mean amount of time it would take to process 150 invoices.
k. Construct a 95% prediction interval of the amount of time it would take to process 150 invoices on a particular day.

13.80 On January 28, 1986, the space shuttle Challenger exploded, and seven astronauts were killed. Prior to the launch, the predicted atmospheric temperature was for freezing weather at the launch site. Engineers for Morton Thiokol (the manufacturer of the rocket motor) prepared charts to make the case that the launch should not take place due to the cold weather. These arguments were rejected, and the launch tragically took place. Upon investigation after the tragedy, experts agreed that the disaster occurred because of leaky rubber O-rings that did not seal properly due to the cold temperature. Data indicating the atmospheric temperature at the time of 23 previous launches and the O-ring damage index are stored in a data file:

Flight Number   Temperature (°F)   O-Ring Damage Index
     1                66                  0
     2                70                  4
     3                69                  0
     5                68                  0
     6                67                  0
     7                72                  0
     8                73                  0
     9                70                  0
   41-B               57                  4
   41-C               63                  2
   41-D               70                  4
   41-G               78                  0
   51-A               67                  0
   51-B               75                  0
   51-C               53                 11
   51-D               67                  0
   51-F               81                  0
   51-G               70                  0
   51-I               67                  0
   51-J               79                  0
   61-A               75                  4
   61-B               76                  0
   61-C               58                  4

Note: Data from flight 4 is omitted due to unknown O-ring condition.
Source: Extracted from Report of the Presidential Commission on the Space Shuttle Challenger Accident, Washington, DC, 1986, Vol. II (H1-H3) and Vol. IV (664), and Post-Challenger Evaluation of Space Shuttle Risk Assessment and Management, Washington, DC, 1988, pp. 135-136.

a. Construct a scatter plot for the seven flights in which there was O-ring damage (O-ring damage index ≠ 0). What conclusions, if any, can you draw about the relationship between atmospheric temperature and O-ring damage?
b. Construct a scatter plot for all 23 flights.
c. Explain any differences in the interpretation of the relationship between atmospheric temperature and O-ring damage in (a) and (b).
d. Based on the scatter plot in (b), provide reasons why a prediction should not be made for an atmospheric temperature of 31°F, the temperature on the morning of the launch of the Challenger.
e. Although the assumption of a linear relationship may not be valid, fit a simple linear regression model to predict O-ring damage, based on atmospheric temperature.
f. Include the prediction line found in (e) on the scatter plot developed in (b).
g. Based on the results of (f), do you think a linear model is appropriate for these data? Explain.
h. Perform a residual analysis. What conclusions do you reach?
13.81 Crazy Dave, a well-known baseball analyst, would like to study various team statistics for the 2005 baseball season to determine which variables might be useful in predicting the number of wins achieved by teams during the season. He has decided to begin by using a team's earned run average (ERA), a measure of pitching performance, to predict the number of wins. The data for the 30 Major League Baseball teams are in a data file. (Hint: First, determine which are the independent and dependent variables.)
a. Assuming a linear relationship, use the least-squares method to compute the regression coefficients b0 and b1.
b. Interpret the meaning of the Y intercept, b0, and the slope, b1, in this problem.
c. Use the prediction line developed in (a) to predict the number of wins for a team with an ERA of 4.50.
d. Compute the coefficient of determination, r², and interpret its meaning.
e. Perform a residual analysis on your results and determine the adequacy of the fit of the model.
f. At the 0.05 level of significance, is there evidence of a linear relationship between the number of wins and the ERA?
g. Construct a 95% confidence interval estimate of the mean number of wins expected for teams with an ERA of 4.50.
h. Construct a 95% prediction interval of the number of wins for an individual team that has an ERA of 4.50.
i. Construct a 95% confidence interval estimate of the slope.
j. The 30 teams constitute a population. In order to use statistical inference, as in (f) through (i), the data must be assumed to represent a random sample. What "population" would this sample be drawing conclusions about?
k. What other independent variables might you consider for inclusion in the model?

13.82 College football players trying out for the NFL are given the Wonderlic standardized intelligence test. The data file contains the average Wonderlic scores of football players trying out for the NFL and the graduation rates for football players at selected schools (extracted from S. Walker, "The NFL's Smartest Team," The Wall Street Journal, September 30, 2005, pp. W1, W10). You plan to develop a regression model to predict the Wonderlic scores for football players trying out for the NFL, based on the graduation rate of the school they attended.
a. Assuming a linear relationship, use the least-squares method to compute the regression coefficients b0 and b1.
b. Interpret the meaning of the Y intercept, b0, and the slope, b1, in this problem.
c. Use the prediction line developed in (a) to predict the Wonderlic score for football players trying out for the NFL from a school that has a graduation rate of 50%.
d. Compute the coefficient of determination, r², and interpret its meaning.
e. Perform a residual analysis on your results and determine the adequacy of the fit of the model.
f. At the 0.05 level of significance, is there evidence of a linear relationship between the Wonderlic score for a football player trying out for the NFL from a school and the school's graduation rate?
g. Construct a 95% confidence interval estimate of the mean Wonderlic score for football players trying out for the NFL from a school that has a graduation rate of 50%.
h. Construct a 95% prediction interval of the Wonderlic score for a football player trying out for the NFL from a school that has a graduation rate of 50%.
i. Construct a 95% confidence interval estimate of the slope.

13.83 College basketball is big business, with coaches' salaries, revenues, and expenses in millions of dollars. The data file contains the coaches' salaries and revenues for college basketball at selected schools in a recent year (extracted from R. Adams, "Pay for Playoffs," The Wall Street Journal, March 11-12, 2006, pp. P1, P8). You plan to develop a regression model to predict a coach's salary based on revenue.
a. Assuming a linear relationship, use the least-squares method to compute the regression coefficients b0 and b1.
b. Interpret the meaning of the Y intercept, b0, and the slope, b1, in this problem.
c. Use the prediction line developed in (a) to predict the coach's salary for a school that has revenue of $7 million.
d. Compute the coefficient of determination, r², and interpret its meaning.
e. Perform a residual analysis on your results and determine the adequacy of the fit of the model.
f. At the 0.05 level of significance, is there evidence of a linear relationship between the coach's salary for a school and revenue?
g. Construct a 95% confidence interval estimate of the mean salary of coaches at schools that have revenue of $7 million.
h. Construct a 95% prediction interval of the coach's salary for a school that has revenue of $7 million.
i. Construct a 95% confidence interval estimate of the slope.

13.84 During the fall harvest season in the United States, pumpkins are sold in large quantities at farm stands. Instead of weighing the pumpkins prior to sale, the farm stand operator will just place the pumpkin in the appropriate circular cutout on the counter. When asked why this was done, one farmer replied, "I can tell the weight of the pumpkin from its circumference." To determine whether this was really true, a sample of 23 pumpkins were measured for circumference and weighed, with the following results, stored in a data file:

Pumpkins 1-12
Circumference (cm): 50, 54, 52, 37, 52, 53, 47, 51, 63, 33, 43 [one value illegible in the source]
Weight (grams): 1,200; 2,000; 1,500; 1,700; 500; 1,000; 1,500; 1,400; 1,500; 2,500; 500; 1,000

Pumpkins 13-23
Circumference (cm): 57, 66, 82, 83, 70, 34, 51, 50, 49, 60, 59
Weight (grams): 2,000; 2,500; 4,600; 4,600; 3,100; 600; 1,500; 1,500; 1,600; 2,300; 2,100

a. Assuming a linear relationship, use the least-squares method to compute the regression coefficients b0 and b1.
b. Interpret the meaning of the slope, b1, in this problem.
c. Predict the mean weight for a pumpkin that is 60 centimeters in circumference.
d. Do you think it is a good idea for the farmer to sell pumpkins by circumference instead of weight? Explain.
e. Determine the coefficient of determination, r², and interpret its meaning.
f. Perform a residual analysis for these data and determine the adequacy of the fit of the model.
g. At the 0.05 level of significance, is there evidence of a linear relationship between the circumference and weight of a pumpkin?
h. Construct a 95% confidence interval estimate of the population slope, β1.
i. Construct a 95% confidence interval estimate of the mean weight for pumpkins that have a circumference of 60 centimeters.
j. Construct a 95% prediction interval of the weight for an individual pumpkin that has a circumference of 60 centimeters.

13.85 Can demographic information be helpful in predicting sales at sporting goods stores? The data stored in a data file are the monthly sales totals from a random sample of 38 stores in a large chain of nationwide sporting goods stores. All stores in the franchise, and thus within the sample, are approximately the same size and carry the same merchandise. The county or, in some cases, counties in which the store draws the majority of its customers is referred to here as the customer base. For each of the 38 stores, demographic information about the customer base is provided. The data are real, but the name of the franchise is not used at the request of the company. The variables in the data set are
Sales: Latest one-month sales total (dollars)
Age: Median age of customer base (years)
HS: Percentage of customer base with a high school diploma
College: Percentage of customer base with a college diploma
Growth: Annual population growth rate of customer base over the past 10 years
Income: Median family income of customer base (dollars)
a. Construct a scatter plot, using sales as the dependent variable and median family income as the independent variable. Discuss the scatter diagram.
b. Assuming a linear relationship, use the least-squares method to compute the regression coefficients b0 and b1.
c. Interpret the meaning of the Y intercept, b0, and the slope, b1, in this problem.
d. Compute the coefficient of determination, r², and interpret its meaning.
e. Perform a residual analysis on your results and determine the adequacy of the fit of the model.
f. At the 0.05 level of significance, is there evidence of a linear relationship between the independent variable and the dependent variable?
g. Construct a 95% confidence interval estimate of the slope and interpret its meaning.

13.86 For the data of Problem 13.85, repeat (a) through (g), using median age as the independent variable.
13.87 For the data of Problem 13.85, repeat (a) through (g), using high school graduation rate as the independent variable.
13.88 For the data of Problem 13.85, repeat (a) through (g), using college graduation rate as the independent variable.
13.89 For the data of Problem 13.85, repeat (a) through (g), using population growth as the independent variable.

13.90 Zagat's publishes restaurant ratings for various locations in the United States. The data file contains the Zagat rating for food, decor, service, and the price per person for a sample of 50 restaurants located in an urban area (New York City) and 50 restaurants located in a suburb of New York City. Develop a regression model to predict the price per person, based on a variable that represents the sum of the ratings for food, decor, and service.
Source: Extracted from Zagat Survey 2002 New York City Restaurants and Zagat Survey 2001-2002, Long Island Restaurants.
a. Assuming a linear relationship, use the least-squares method to compute the regression coefficients b0 and b1.
b. Interpret the meaning of the Y intercept, b0, and the slope, b1, in this problem.
c. Use the prediction line developed in (a) to predict the price per person for a restaurant with a summated rating of 50.
d. Compute the coefficient of determination, r², and interpret its meaning.
e. Perform a residual analysis on your results and determine the adequacy of the fit of the model.
f. At the 0.05 level of significance, is there evidence of a linear relationship between the price per person and the summated rating?
g. Construct a 95% confidence interval estimate of the mean price per person for all restaurants with a summated rating of 50.
h. Construct a 95% prediction interval of the price per person for a restaurant with a summated rating of 50.
i. Construct a 95% confidence interval estimate of the slope.
j. How useful do you think the summated rating is as a predictor of price? Explain.

13.91 Refer to the discussion of beta values and market models in Problem 13.49 on pages 544-545. One hundred weeks of data, ending the week of May 22, 2006, for the S&P 500 and three individual stocks are included in a data file. Note that the weekly percentage change for both the S&P 500 and the individual stocks is measured as the percentage change from the previous week's closing value to the current week's closing value. The variables included are
Week: Current week
SP500: Weekly percentage change in the S&P 500 Index
WALMART: Weekly percentage change in stock price of Wal-Mart Stores, Inc.
TARGET: Weekly percentage change in stock price of the Target Corporation
SARALEE: Weekly percentage change in stock price of the Sara Lee Corporation
Source: Extracted from finance.yahoo.com, May 31, 2006.
a. Estimate the market model for Wal-Mart Stores, Inc. (Hint: Use the percentage change in the S&P 500 Index as the independent variable and the percentage change in Wal-Mart Stores, Inc.'s stock price as the dependent variable.)
b. Interpret the beta value for Wal-Mart Stores, Inc.
c. Repeat (a) and (b) for Target Corporation.
d. Repeat (a) and (b) for Sara Lee Corporation.
e. Write a brief summary of your findings.
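In the market model of Problem 13.91, the beta value is the slope b1 from regressing a stock's weekly percentage change (Y) on the S&P 500's weekly percentage change (X). A minimal Python sketch, using made-up weekly changes rather than the problem's data file and an illustrative function name, shows the calculation. Read b1 as the average change in the stock's weekly percentage change for each 1-point change in the index's weekly percentage change: a beta above 1 points to a stock that tends to move more than the market, and a beta below 1 to one that tends to move less.

```python
from statistics import fmean

def market_model(index_pct, stock_pct):
    """Fit stock_pct = b0 + b1 * index_pct by least squares; b1 is the beta value."""
    x_bar, y_bar = fmean(index_pct), fmean(stock_pct)
    ssx = sum((x - x_bar) ** 2 for x in index_pct)
    ssxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(index_pct, stock_pct))
    beta = ssxy / ssx
    return y_bar - beta * x_bar, beta

# Hypothetical weekly percentage changes (not the data from the problem).
sp500 = [1.0, -0.5, 2.0, 0.0, -1.5]
stock = [2.0, -0.25, 3.5, 0.5, -1.75]
b0, beta = market_model(sp500, stock)
print(f"b0 = {b0:.2f}, beta = {beta:.2f}")
```

These toy values lie exactly on a line, so the fit is perfect; real weekly returns scatter around the line, and r² then indicates how much of a stock's movement the market explains.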
13.92 The data file contains the stock prices of four companies, collected weekly for 53 consecutive weeks, ending May 22, 2006. The variables are
Week: Closing date for stock prices
MSFT: Stock price of Microsoft, Inc.
Ford: Stock price of Ford Motor Company
GM: Stock price of General Motors, Inc.
IAL: Stock price of International Aluminum, Inc.
Source: Extracted from finance.yahoo.com, May 31, 2006.
a. Calculate the correlation coefficient, r, for each pair of stocks. (There are six of them.)
b. Interpret the meaning of r for each pair.
c. Is it a good idea to have all the stocks in an individual's portfolio be strongly positively correlated among each other? Explain.

13.93 Is the daily performance of stocks and bonds correlated? The data file contains information concerning the closing value of the Dow Jones Industrial Average and the Vanguard Long-Term Bond Index Fund for 60 consecutive business days, ending May 30, 2006. The variables included are
Date: Current day
Bonds: Closing price of Vanguard Long-Term Bond Index Fund
Stocks: Closing price of the Dow Jones Industrial Average
Source: Extracted from finance.yahoo.com, May 31, 2006.
a. Compute and interpret the correlation coefficient, r, for the variables Stocks and Bonds.
b. At the 0.05 level of significance, is there a relationship between these two variables? Explain.
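Part (a) of Problem 13.92 asks for six correlation coefficients because four stocks give 4 × 3 / 2 = 6 distinct pairs, and Problem 13.93(b) calls for the t test for the existence of correlation, Equation (13.19) with H0: ρ = 0. A minimal Python sketch enumerates the pairs with itertools.combinations and computes each r; the price series are hypothetical toy values, and only the ticker labels come from Problem 13.92. Applied to the correlation of 0.72 reported in Problem 13.73 for 108 students, the same t function gives t ≈ 10.68 with 106 degrees of freedom.

```python
import math
from itertools import combinations
from statistics import fmean

def pearson_r(a, b):
    """Sample correlation coefficient r between two equal-length series."""
    a_bar, b_bar = fmean(a), fmean(b)
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

def correlation_t(r, n):
    """t statistic for H0: rho = 0 (Equation 13.19), with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

# Hypothetical toy prices; only the ticker labels come from Problem 13.92.
prices = {
    "MSFT": [1, 2, 3, 4],
    "Ford": [2, 4, 6, 8],
    "GM":   [4, 3, 2, 1],
    "IAL":  [1, 3, 2, 4],
}
pairs = {(p, q): pearson_r(prices[p], prices[q]) for p, q in combinations(prices, 2)}
for (p, q), r in pairs.items():
    print(f"{p}-{q}: r = {r:+.2f}")
print("t for r = 0.72, n = 108:", round(correlation_t(0.72, 108), 2))
```

Note that correlation_t is undefined when r is exactly +1 or -1, because the denominator becomes zero; with real price data that situation does not arise.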
Report Writing Exercises
13.94 In Problems 13.85-13.89 on page 561, you developed regression models to predict monthly sales at a sporting goods store. Now, write a report based on the models you developed. Append to your report all appropriate charts and statistical information.
Managing the Springville Herald
To ensure that as many trial subscriptions as possible are converted to regular subscriptions, the Herald marketing department works closely with the distribution department to accomplish a smooth initial delivery process for the trial subscription customers. To assist in this effort, the marketing department needs to accurately forecast the number of new regular subscriptions for the coming months. A team consisting of managers from the marketing and distribution departments was convened to develop a better method of forecasting new subscriptions. Previously, after examining new subscription data for the prior three months, a group of three managers would develop a subjective forecast of the number of new subscriptions. Lauren Hall, who was recently hired by the company to provide special skills in quantitative forecasting methods, suggested that the department look for factors that might help in predicting new subscriptions. Members of the team found that the forecasts in the past year had been particularly inaccurate because in some months, much more time was spent on telemarketing than in other months. In particular, in the past month, only 1,055 hours were completed because callers were busy during the first week of the month attending training sessions on the personal but formal greeting style and a new standard presentation guide (see "Managing the Springville Herald" in Chapter 11). Lauren collected data (stored in a data file) for the number of new subscriptions and hours spent on telemarketing for each month for the past two years.

EXERCISES
SH13.1 What criticism can you make concerning the method of forecasting that involved taking the new subscriptions for the prior three months as the basis for future projections?
SH13.2 What factors other than the number of telemarketing hours spent might be useful in predicting the number of new subscriptions? Explain.
SH13.3 a. Analyze the data and develop a regression model to predict the mean number of new subscriptions for a month, based on the number of hours spent on telemarketing for new subscriptions.
b. If you expect to spend 1,200 hours on telemarketing per month, estimate the mean number of new subscriptions for the month. Indicate the assumptions on which this prediction is based. Do you think these assumptions are valid? Explain.
c. What would be the danger of predicting the number of new subscriptions for a month in which 2,000 hours were spent on telemarketing?
Apply your knowledge of simple linear regression in this Web Case, which extends the Sunflowers Apparel Using Statistics scenario from this chapter.
Leasing agents from the Triangle Mall Management Corporation have suggested that Sunflowers consider several locations in some of Triangle's newly renovated lifestyle malls that cater to shoppers with higher-than-mean disposable income. Although the locations are smaller than the typical Sunflowers location, the leasing agents argue that higher-than-mean disposable income in the surrounding community is a better predictor of higher sales than store size. The leasing agents maintain that sample data from 14 Sunflowers stores prove that this is true. Review the leasing agents' proposal and supporting documents that describe the data at the company's Web site, www.prenhall.com/Springville/Triangle_Sunflower.htm (or open this Web Case file from the Student CD-ROM's Web Case folder), and then answer the following:
1. Should mean disposable income be used to predict sales based on the sample of 14 Sunflowers stores?
2. Should the management of Sunflowers accept the claims of Triangle's leasing agents? Why or why not?
3. Is it possible that the mean disposable income of the surrounding area is not an important factor in leasing new locations? Explain.
4. Are there any other factors not mentioned by the leasing agents that might be relevant to the store leasing decision?
EXCEL COMPANION to Chapter 13
E13.1 PERFORMING SIMPLE LINEAR REGRESSION ANALYSES
You perform a simple linear regression analysis either by using the PHStat2 Simple Linear Regression procedure or by using the ToolPak Regression procedure.

PHStat2 performs the regression analysis using the ToolPak Regression procedure. Therefore, the worksheet produced does not dynamically change if you change data. (Rerun the procedure to create revised results.) Three Output Options available in the PHStat2 dialog box enhance the ToolPak procedure and are explained in Sections E13.2, E13.4, and E13.5.
Using PHStat2 Simple Linear Regression
Open to the worksheet that contains the data for the regression analysis. Select PHStat > Regression > Simple Linear Regression. In the procedure's dialog box (shown below), enter the cell range of the Y variable as the Y Variable Cell Range and the cell range of the X variable as the X Variable Cell Range. Click First cells in both ranges contain label and enter a value for the Confidence level for regression coefficients. Click the Regression Statistics Table and the ANOVA and Coefficients Table Regression Tool Output Options, enter a title as the Title, and click OK.
Using ToolPak Regression
Open to the worksheet that contains the data for the regression analysis. Select Tools > Data Analysis (97-2003) or Data > Data Analysis (2007). Select Regression from the Data Analysis list and click OK. In the procedure's dialog box (shown below), enter the cell range of the Y variable data as the Input Y Range and enter the cell range of the X variable data as the Input X Range. Click Labels, click Confidence Level and enter a value in its box, and then click OK. Results appear on a new worksheet.
[Dialog boxes: Simple Linear Regression (PHStat2) and Regression (ToolPak)]
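Both procedures report the least-squares coefficients b0 and b1, the coefficient of determination r^2 (R Square), the standard error of the estimate S_YX, and, when requested, the residuals. The Python sketch below computes those quantities directly from their textbook definitions (b1 = SSXY / SSX, b0 = Y-bar - b1 X-bar, r^2 = 1 - SSE / SST, S_YX = sqrt(SSE / (n - 2))); it is an illustration under those formulas, not a description of how Excel computes them internally, and the function name and data are hypothetical.

```python
from math import sqrt


def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x plus common summary measures."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length, with at least 3 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ssxy / ssx                                  # slope
    b0 = y_bar - b1 * x_bar                          # intercept
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)             # error sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)         # total sum of squares
    r_squared = 1 - sse / sst                        # coefficient of determination
    s_yx = sqrt(sse / (n - 2))                       # standard error of the estimate
    return {"b0": b0, "b1": b1, "r_squared": r_squared, "s_yx": s_yx, "residuals": residuals}


# Hypothetical data, not the Sunflowers Apparel data.
fit = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"Y-hat = {fit['b0']:.2f} + {fit['b1']:.2f}X, r^2 = {fit['r_squared']:.2f}")
```

The equation and r^2 printed here correspond to what an Excel linear trendline displays when you select its equation and R-squared options.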
E13.2 CREATING SCATTER PLOTS AND ADDING A PREDICTION LINE
You use Excel charting features to create a scatter plot and add a prediction line to that plot. If you select the Scatter Diagram output option of the PHStat2 Simple Linear Regression procedure (see Section E13.1), you can skip to the "Adding a Prediction Line" section that applies to the Excel version you use.
Creating a Scatter Plot
Use either the Section E2.12 instructions to create a scatter plot (see page 93) or use the Section E13.1 instructions in "Using PHStat2 Simple Linear Regression," but click Scatter Diagram before you click OK.
Adding a Prediction Line (97-2003)

Open to the chart sheet that contains your scatter plot and select Chart > Add Trendline. In the Add Trendline dialog box (see Figure E13.1), click the Type tab and then click Linear. Click the Options tab and select the Automatic option. Click Display equation on chart and Display R-squared value on chart and then click OK. If you have included a label as part of your data range, you will see that label displayed in place of Series1 in this dialog box.

FIGURE E13.1 Add Trendline dialog box (97-2003)

Adding a Prediction Line (2007)

Open to the chart sheet that contains your scatter plot and select Layout > Trendline and, in the Trendline gallery, select More Trendline Options. In the Trendline Options panel of the Format Trendline dialog box (see Figure E13.2), select the Linear option, click Display equation on chart and Display R-squared value on chart, and click Close.

FIGURE E13.2 Format Trendline dialog box (2007)

Relocating an X Axis

If there are Y values on a residual plot or scatter plot that are less than zero, Microsoft Excel places the X axis at the point Y = 0, possibly obscuring some of the data points. To relocate the X axis to the bottom of the chart, open to the chart, right-click the Y axis, and select Format Axis from the shortcut menu. If you use Excel 97-2003, select the Scale tab in the Format Axis dialog box (see Figure E13.3) and enter the value found in the Minimum box (-6 in Figure E13.3) as the Value (X) axis Crosses at value and click OK. (As you enter this value, the check box for this entry is cleared automatically.)

FIGURE E13.3 Format Axis dialog box (97-2003)
If you use Excel 2007, in the Axis Options panel of the Format Axis dialog box (see Figure E13.4), select the Axis value option, change its default value of 0.0 (shown in Figure E13.4) to a value less than the minimum Y value, and click Close.

FIGURE E13.4 Format Axis dialog box (2007)

E13.3 PERFORMING RESIDUAL ANALYSES

You modify the procedures of Section E13.1 to perform a residual analysis. If you use the PHStat2 Simple Linear Regression procedure, click all the Regression Tool output options (Regression Statistics Table, ANOVA and Coefficients Table, Residuals Table, and Residual Plot). If you use the ToolPak Regression procedure, click Residuals and Residual Plots before clicking OK. If you need to relocate an X axis to the bottom of a residual plot, review the "Relocating an X Axis" part of Section E13.2.

E13.4 COMPUTING THE DURBIN-WATSON STATISTIC

You compute the Durbin-Watson statistic either by using the PHStat2 Simple Linear Regression procedure or by using a several-step process that uses the Durbin-Watson.xls workbook.

Using PHStat2 Simple Linear Regression

Use the Section E13.1 instructions in "Using PHStat2 Simple Linear Regression," but click Durbin-Watson Statistic before you click OK. Choosing the Durbin-Watson Statistic output option causes PHStat2 to create a residuals table, even if you did not check the Residuals Table Regression Tool output option. The Durbin-Watson Statistic output option creates a new Durbin-Watson worksheet similar to the one shown in Figure 13.16 on page 536. This worksheet references cells in the regression results worksheet that is also created by the procedure. If you delete the regression results worksheet, the Durbin-Watson worksheet displays an error message.

Using Durbin-Watson.xls

Open to the Durbin-Watson worksheet of the Durbin-Watson.xls workbook. This worksheet (see Figure 13.16 on page 536) uses the SUMXMY2(cell range 1, cell range 2) function in cell B3 to compute the sum of squared differences of the residuals, and the SUMSQ(residuals cell range) function in cell B4 to compute the sum of squared residuals for the Section 13.6 package delivery store example. By setting cell range 1 to the cell range of the first residual through the second-to-last residual and cell range 2 to the cell range of the second residual through the last residual, you can get SUMXMY2 to compute the squared difference between each pair of successive residuals, which is the numerator term of Equation (13.15). Because the residuals appear in a regression results worksheet, the cell references used in the SUMXMY2 function must refer to the regression results worksheet name.

In the Durbin-Watson workbook, the SLR worksheet contains the simple linear regression analysis for the Section 13.6 package delivery example. The residuals appear in the cell range C25:C39. Therefore, cell range 1 is set to SLR!C25:C38, and cell range 2 is set to SLR!C26:C39. This makes the cell B3 formula =SUMXMY2(SLR!C26:C39, SLR!C25:C38). The cell B4 formula, which also must refer to the SLR worksheet, is =SUMSQ(SLR!C25:C39).

To adapt the Durbin-Watson workbook to other problems, first create a simple linear regression results worksheet that contains residual output and copy that worksheet to the Durbin-Watson workbook. Then open to the Durbin-Watson worksheet and edit the formulas in cells B3 and B4 so that they refer to the correct cell ranges on your regression worksheet. Finally, delete the no-longer-needed SLR worksheet.

E13.5 ESTIMATING THE MEAN OF Y AND PREDICTING Y VALUES

You compute a confidence interval estimate for the mean response and the prediction interval for an individual response either by selecting the PHStat2 Simple Linear Regression procedure or by making entries in the CIEandPIforSLR.xls workbook.
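Both approaches apply the standard simple linear regression interval formulas. With h_i = 1/n + (X_i - X-bar)^2 / SSX, the confidence interval estimate for the mean response is Y-hat_i +/- t * S_YX * sqrt(h_i), and the prediction interval for an individual response is Y-hat_i +/- t * S_YX * sqrt(1 + h_i), where t has n - 2 degrees of freedom. The Python sketch below assumes you supply that critical t value yourself (for example, the value Excel's TINV(1 - confidence level, n - 2) returns; for n = 14 at 95% confidence it is about 2.1788). The function name `interval_estimates` and the data are hypothetical.

```python
from math import sqrt


def interval_estimates(x, y, x_i, t_crit):
    """Return (y_hat, confidence_interval, prediction_interval) at X = x_i.

    t_crit is the upper critical value of t with n - 2 degrees of freedom.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xv - x_bar) ** 2 for xv in x)   # SSX: sum of squared differences
    b1 = sum((xv - x_bar) * (yv - y_bar) for xv, yv in zip(x, y)) / ssx
    b0 = y_bar - b1 * x_bar
    y_hat = b0 + b1 * x_i                     # fitted value, as TREND would give
    sse = sum((yv - (b0 + b1 * xv)) ** 2 for xv, yv in zip(x, y))
    s_yx = sqrt(sse / (n - 2))                # standard error of the estimate
    h_i = 1 / n + (x_i - x_bar) ** 2 / ssx
    ci_half = t_crit * s_yx * sqrt(h_i)       # half-width for the mean response
    pi_half = t_crit * s_yx * sqrt(1 + h_i)   # half-width for an individual response
    return (
        y_hat,
        (y_hat - ci_half, y_hat + ci_half),
        (y_hat - pi_half, y_hat + pi_half),
    )


# Hypothetical data with n = 5, so df = 3; t(0.025, 3) is about 3.1824.
y_hat, ci, pi = interval_estimates([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3, 3.1824)
print(y_hat, ci, pi)
```

Because sqrt(1 + h_i) is always larger than sqrt(h_i), the prediction interval is always wider than the confidence interval at the same X value.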
FIGURE E13.5 DataCopy worksheet (first six rows)
[Worksheet columns: A Square Feet, B Annual Sales, C (X - X-bar)^2; labels in column E with formulas in column F: Sample Size =COUNT(B:B), Sample Mean =AVERAGE(A:A), Sum of Squared Difference =SUM(C:C), Predicted Y (YHat) =TREND(B2:B15, A2:A15, CIEandPI!B4)]
Using PHStat2 Simple Linear Regression

Use the Section E13.1 instructions in "Using PHStat2 Simple Linear Regression," but before you click OK, click Confidence and Prediction Interval for X= and enter an X value in its box (see below). Then enter a value for the Confidence level for interval estimates and click OK.

[Dialog box: Simple Linear Regression (PHStat2), with the Confidence and Prediction Interval for X= option]

PHStat2 places the confidence interval estimate and prediction interval on a new worksheet similar to the one shown in Figure 13.21 on page 549. (PHStat2 also creates a DataCopy worksheet that is discussed in the next part of this section.)

Using CIEandPIforSLR.xls

Open to the CIEandPI worksheet of the CIEandPIforSLR.xls workbook. This worksheet (shown in Figure 13.21 on page 549) uses the function TINV(1 - confidence level, degrees of freedom) to determine the t value and compute the confidence interval estimate and prediction interval for the Section 13.8 Sunflowers Apparel example.

Cells B8, B11, B12, and B15 contain formulas that reference individual cells on a DataCopy worksheet. This worksheet, the first six rows of which are shown in Figure E13.5, contains a copy of the regression data in columns A and B and a formula in column C that squares the difference between each X and X-bar. The worksheet also computes the sample size, the sample mean, the sum of the squared differences [SSX in Equation (13.20) on page 546], and the predicted Y value in cells F2, F3, F4, and F5. The cell F5 formula uses the function TREND(Y variable cell range, X variable cell range, X value) to calculate the predicted Y value. Because the formula uses the X value that has been entered on the CIEandPI worksheet, the X value in the cell F5 formula is set to CIEandPI!B4. Because the DataCopy and CIEandPI worksheets reference each other, you should consider these worksheets a matched pair that should not be broken up.

To adapt these worksheets to other problems, first create a simple linear regression results worksheet. Then transfer the standard error value, always found in cell B7 of the regression results worksheet, to cell B13 of the CIEandPI worksheet. Change, as necessary, the X value and the confidence level in cells B4 and B5 of the CIEandPI worksheet. Next, open to the DataCopy worksheet, and if your sample size is not 14, follow the instructions found in the worksheet. Enter the problem's X values in column A and Y values in column B. Finally, return to the CIEandPI worksheet to examine its updated results.

E13.6 EXAMPLE: SUNFLOWERS APPAREL DATA

This section shows you how to use PHStat2 or Basic Excel to perform a regression analysis for Sunflowers Apparel using the square footage and annual sales data stored in the Sunflowers Apparel data workbook.
Using PHStat2

Open to the Data worksheet of the Sunflowers Apparel data workbook. Select PHStat > Regression > Simple Linear Regression. In the procedure's dialog box (see Figure E13.6), enter C1:C15 as the Y Variable Cell Range and B1:B15 as the X Variable Cell Range. Click First cells in both ranges contain label and enter a value for the Confidence level for regression coefficients. Click the Regression Statistics Table, ANOVA and Coefficients Table, Residuals Table, and Residual Plot Regression Tool Output Options. Enter Site Selection Analysis as the Title and click Scatter Diagram. Click Confidence and Prediction Interval for X= and enter 4 in its box. Enter 95 in the Confidence level for interval estimates box. Click OK to execute the procedure.

FIGURE E13.6 Completed Simple Linear Regression dialog box

To evaluate the assumption of linearity, review the Residual Plot for X1 chart sheet. Note that there is no apparent pattern or relationship between the residuals and the X variable. To evaluate the normality assumption, create a normal probability plot. With your workbook open to the SLR worksheet, select PHStat > Probability & Prob. Distributions > Normal Probability Plot. In the procedure's dialog box (see Figure E13.7), enter C24:C38 as the Variable Cell Range and click First cell contains label. Enter Normal Probability Plot as the Title and click OK. In the Normal Plot chart sheet, observe that the data do not appear to depart substantially from a normal distribution. To evaluate the assumption of equal variances, review the Residual Plot for X1 chart sheet. Note that there do not appear to be major differences in the variability of the residuals.

FIGURE E13.7 Completed Normal Probability Plot dialog box

You conclude that all assumptions are valid and that you can use this simple linear regression model for the Sunflowers Apparel data. You can now open to the SLR worksheet to view the details of the analysis or open to the Estimate worksheet to make inferences about the mean of Y and the prediction of individual values of Y.

Using Basic Excel

Open to the Data worksheet of the Sunflowers Apparel data workbook. Select Tools > Data Analysis (97-2003) or Data > Data Analysis (2007). Select Regression from the Data Analysis list, and click OK. In the procedure's dialog box (see Figure E13.8), enter C1:C15 as the Input Y Range and enter B1:B15 as the Input X Range. Click Labels, click Confidence Level and enter 95 in its box, and click Residuals. Click OK to execute the procedure.
FIGURE E13.8 Completed Regression dialog box
To evaluate the assumption of linearity, you plot the residuals against the square feet (independent) variable. To simplify creating this plot, open to the Data worksheet and copy the square feet cell range B1:B15 to cell E1. Then copy the cell range of the residuals, C24:C38 on the SLR worksheet, to cell F1 of the Data worksheet. With your workbook open to the Data worksheet, use the Section E13.2 instructions on pages 564-566 to create a scatter plot. (Use E1:F15 as the Data range (Excel 97-2003) or as the cell range of the X and Y variables (Excel 2007) when creating the scatter plot.) Review the scatter plot. Observe that there is no apparent pattern or relationship between the residuals and the X variable. You conclude that the linearity assumption holds.

You now evaluate the normality assumption by creating a normal probability plot, using a model Plot worksheet as your guide. In a new worksheet, enter Rank in cell A1 and then enter the series 1 through 14 in cells A2:A15. Enter Proportion in cell B1 and enter the formula =A2/15 in cell B2. Next, enter Z Value in cell C1 and the formula =NORMSINV(B2) in cell C2. Copy the residuals (including their column heading) to the cell range D1:D15. Select the formulas in cell range B2:C2 and copy them down through row 15. Open to the probability plot and observe that the data do not appear to depart substantially from a normal distribution.

To evaluate the assumption of equal variance, return to the scatter plot of the residuals and the X variable that you already developed. Observe that there do not appear to be major differences in the variability of the residuals.

You conclude that all assumptions are valid and that you can use this simple linear regression model for the Sunflowers Apparel data. You can now evaluate the details in the regression results worksheet. If you are interested in making inferences about the mean of Y and the prediction of individual values of Y, open the CIEandPIforSLR.xls workbook. (Usually, you would have to first make adjustments to the DataCopy worksheet, as discussed in Section E13.5, but this workbook already contains the entries for the Sunflowers Apparel analysis.) Open to the CIEandPI worksheet to make inferences about the mean of Y and the prediction of individual values of Y.
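The Rank, Proportion, and Z Value columns described above can be sketched in Python, with `statistics.NormalDist().inv_cdf` (Python 3.8 and later) playing the role of NORMSINV and rank / (n + 1) matching the =A2/15 formula when n = 14. A normal probability plot pairs each Z value with the residual of the same rank, so this sketch sorts the residuals in ascending order before pairing them. The function name and the residual values are hypothetical.

```python
from statistics import NormalDist


def normal_probability_points(residuals):
    """Return (z value, ordered residual) pairs for a normal probability plot."""
    n = len(residuals)
    standard_normal = NormalDist()               # mean 0, standard deviation 1
    points = []
    for rank, e in enumerate(sorted(residuals), start=1):
        proportion = rank / (n + 1)              # like =A2/15 when n = 14
        z = standard_normal.inv_cdf(proportion)  # like =NORMSINV(B2)
        points.append((z, e))
    return points


# Hypothetical residuals, not the Sunflowers Apparel residuals.
for z, e in normal_probability_points([-0.8, 0.6, 1.0, -0.6, -0.2]):
    print(f"{z:7.3f}  {e:5.2f}")
```

If the residuals are approximately normal, plotting these pairs gives points that lie close to a straight line.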