Inference About Regression Coefficients

BENDRIX.XLS

This is a continuation of the Bendrix manufacturing example from the previous chapter.
As before, the response variable is Overhead and the explanatory variables are MachHrs and ProdRuns. The data are contained in this file. What inferences can we make about the regression coefficients?

 

14.2 | 14.3 | 14.3a | 14.1a | 14.4 | 14.4a | 14.1b | 14.5 | 14.4b

Multiple Regression Output

We obtain the output using StatPro’s Multiple Regression procedure.


Multiple Regression Output -- continued

Regression coefficients estimate the true, but unobservable, population coefficients. The standard error of bi indicates the accuracy of these point estimates. For example, the effect on Overhead of a one-unit increase in MachHrs is 43.536. We are 95% confident that the coefficient is between 36.357 and 50.715. Similar statements can be made for the coefficient of ProdRuns and the intercept term.
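As a rough illustration of how such an interval is put together, the sketch below builds a confidence interval as the estimate plus or minus a t-multiple times the standard error, and backs the margin out of the reported limits. The function names are hypothetical; the standard error and t-multiple themselves are not restated on this slide, so only the reported limits are used.

```python
def coefficient_interval(estimate, std_error, t_multiple):
    """Return (lower, upper) = estimate -/+ t_multiple * std_error."""
    margin = t_multiple * std_error
    return estimate - margin, estimate + margin


def implied_margin(lower, upper):
    """Half-width of an interval, i.e. t_multiple * std_error."""
    return (upper - lower) / 2


# Reported 95% limits for the MachHrs coefficient.
lower, upper = 36.357, 50.715
margin = implied_margin(lower, upper)   # half-width of the interval
center = (lower + upper) / 2            # recovers the point estimate
print(center, margin)
```

The midpoint of the reported limits reproduces the point estimate, which is a quick consistency check on any regression output.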

Multicollinearity

The Problem

We want to explain a person’s height by means of foot length. The response variable is Height, and the explanatory variables are Right and Left, the length of the right foot and the left foot, respectively. What can occur when we regress Height on both Right and Left?

Multicollinearity

The relationship between the explanatory variable X and the response variable Y is not always accurately reflected in the coefficient of X; it depends on which other X’s are included or not included in the equation. This is especially true when there is a linear relationship between two or more explanatory variables, in which case we have multicollinearity. By definition, multicollinearity is the presence of a fairly strong linear relationship between two or more explanatory variables, and it can make estimation difficult.

Solution

Admittedly, there is no need to include both Right and Left in an equation for Height; either one would do, but we include both to make a point. It is likely that there is a large correlation between height and foot size, so we would expect this regression equation to do a good job. The R2 value will probably be large. But what about the coefficients of Right and Left? Here is a problem.

Solution -- continued

The coefficient of Right indicates the right foot’s effect on Height in addition to the effect of the left foot. This additional effect is probably minimal. That is, after the effect of Left on Height has already been taken into account, the extra information provided by Right is probably minimal. But it goes the other way also. The extra effect of Left, in addition to that provided by Right, is probably minimal.

HEIGHT.XLS

To show what can happen numerically, we generated a hypothetical data set of heights and left and right foot lengths in this file. We did this so that, except for random error, height is approximately 32 plus 3.2 times foot length (all expressed in inches). As shown in the table to the right, the correlations between Height and either Right or Left in our data set are quite large, and the correlation between Right and Left is very close to 1.

Solution -- continued

The regression output when both Right and Left are entered in the equation for Height appears in this table.

Solution -- continued

This output tells a somewhat confusing story. The multiple R and the corresponding R2 are about what we would expect, given the correlations between Height and either Right or Left. In particular, the multiple R is close to the correlation between Height and either Right or Left. Also, the se value is quite good. It implies that predictions of height from this regression equation will typically be off by only about 2 inches.

Solution -- continued

However, the coefficients of Right and Left are not at all what we might expect, given that we generated heights as approximately 32 plus 3.2 times foot length. In fact, the coefficient of Left has the wrong sign: it is negative! Besides this wrong sign, the tip-off that there is a problem is that the t-value of Left is quite small and the corresponding p-value is quite large.

Solution -- continued

Judging by this, we might conclude that Height and Left are either not related or are related negatively. But we know from the table of correlations that both of these are false. In contrast, the coefficient of Right has the “correct” sign, and its t-value and associated p-value do imply statistical significance, at least at the 5% level. However, this happened mostly by chance; slight changes in the data could change the results completely.

Solution -- continued

The problem is that, although both Right and Left are clearly related to Height, it is impossible for the least squares method to distinguish their separate effects. Note that the regression equation does estimate the combined effect fairly well: the sum of the coefficients is 3.178, which is close to the coefficient of 3.2 we used to generate the data. Therefore, the estimated equation will work well for predicting heights. It just does not have reliable estimates of the individual coefficients of Right and Left.

Solution -- continued

To see what happens when either Right or Left is excluded from the regression equation, we show the results of simple regression. When Right is the only variable in the equation, it becomes

Predicted Height = 31.546 + 3.195Right

The R2 and se values are 81.6% and 2.005, and the t-value and p-value for the coefficient of Right are now 21.34 and 0.000: very significant.

Solution -- continued

Similarly, when Left is the only variable in the equation, it becomes

Predicted Height = 31.526 + 3.197Left

The R2 and se values are 81.1% and 2.033, and the t-value and p-value for the coefficient of Left are 20.99 and 0.0000: again very significant. Clearly, both of these equations tell almost identical stories, and they are much easier to interpret than the equation with both Right and Left included.
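The behaviour described in this example can be reproduced with a small simulation. The sketch below is not the HEIGHT.XLS data: it generates its own hypothetical sample (the sample size, foot-length distribution, and noise levels are all assumptions) in which height is roughly 32 plus 3.2 times foot length and the two foot measurements are almost identical. It then fits the equation with both variables and with Right alone.

```python
import random


def fit_two(x1, x2, y):
    """Least squares fit of y on x1 and x2; returns (intercept, b1, b2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # close to 0 when x1 and x2 are nearly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def fit_one(x, y):
    """Simple regression of y on x; returns (intercept, slope)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (c - my) for a, c in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


rng = random.Random(0)  # fixed seed so the run is repeatable
foot = [rng.gauss(11.0, 1.0) for _ in range(100)]   # assumed foot lengths (inches)
right = [f + rng.gauss(0, 0.1) for f in foot]       # right and left differ only slightly
left = [f + rng.gauss(0, 0.1) for f in foot]
height = [32 + 3.2 * f + rng.gauss(0, 2.0) for f in foot]

_, b_right, b_left = fit_two(right, left, height)
_, b_right_only = fit_one(right, height)
print("both:", b_right, b_left, "sum:", b_right + b_left)
print("Right only:", b_right_only)
```

Changing the seed tends to move the two individual coefficients around considerably, while their sum and the single-variable slope stay near 3.2, which is the pattern the slides describe.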

Include/Exclude Decisions

CATALOGS1.XLS

This file contains data on 1000 customers who purchased mail-order products from the HyTex Company in 1998. Recall from Example 3.11 that HyTex is a direct marketer of stereo equipment, personal computers, and other electronic products. HyTex advertises entirely by mailing catalogs to its customers, and all of its orders are taken over the telephone. We want to estimate and interpret a regression equation for Spent98 based on all of these variables.

The Data

The company spends a great deal of money on its catalog mailings, and it wants to be sure that this is paying off in sales. For each customer there are data on the following variables:
– Age in years
– Gender: coded as 1 for males, 0 for females
– OwnHome: coded as 1 if customer owns a home, 0 otherwise
– Married: coded as 1 if customer is currently married, 0 otherwise

The Data -- continued

– Close: coded as 1 if customer lives reasonably close to a shopping area that sells similar merchandise, 2 otherwise
– Salary: combined annual salary of customer and spouse (if any)
– Children: number of children living with customer
– Customer97: coded as 1 if customer purchased from HyTex during 1997, 0 otherwise
– Spent97: total amount of purchases made from HyTex during 1997
– Catalogs: number of catalogs sent to the customer in 1998
– Spent98: total amount of purchases made from HyTex during 1998

The Data -- continued

With this much data, 1000 observations, we can certainly afford to set aside part of the data set for validation. Although any split could be used, let’s base the regression on the first 250 observations and use the other 750 for validation.

The Regression

We begin by entering all of the potential explanatory variables. Our goal then is to exclude variables that aren’t necessary, based on their t-values and p-values. To do this we follow the Guidelines for Including / Excluding Variables in a Regression Equation. The regression output with all explanatory variables included is provided on the following slide.


Analysis

This output indicates a fairly good fit. The R2 value is 79.1% and se is about $424. From the p-value column, we see that there are three variables, Age, Own_Home, and Married, that have p-values well above 0.05. These are the obvious candidates for exclusion. It is often best to exclude one variable at a time, starting with the variable with the highest p-value. The regression output with all insignificant variables excluded is seen in the output on the next slide.
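The one-at-a-time exclusion rule can be sketched as a short loop. This is only an outline of the idea, not StatPro's procedure: `fit_p_values` is a hypothetical callback that refits the regression on the variables it is given and returns their p-values, and the 0.05 cutoff is an assumption taken from the threshold mentioned above.

```python
def backward_eliminate(variables, fit_p_values, threshold=0.05):
    """Drop the variable with the highest p-value, refit, and repeat
    until every remaining p-value is at or below the threshold."""
    kept = list(variables)
    while kept:
        p_values = fit_p_values(kept)                 # refit with the current set
        worst = max(kept, key=lambda v: p_values[v])  # least significant variable
        if p_values[worst] <= threshold:
            break                                     # nothing left to exclude
        kept.remove(worst)
    return kept


# Usage with made-up p-values that do not change between refits
# (in a real analysis they would change after each exclusion).
made_up = {"Age": 0.60, "Married": 0.40, "Salary": 0.001}
result = backward_eliminate(made_up, lambda vs: {v: made_up[v] for v in vs})
print(result)
```

Refitting after every exclusion matters because p-values of the remaining variables can shift once a correlated variable leaves the equation.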


Interpretation of Final Regression Equation

The coefficient of Gender implies that an average male customer spent about $130 less than the average female customer. Similarly, an average customer living close to stores with this type of merchandise spent about $288 less than those customers living far from stores. The coefficient of Salary implies that, on average, about 1.5 cents of every salary dollar was spent on HyTex merchandise.

Interpretation of Final Regression Equation -- continued

The coefficient of Children implies that $158 less was spent for every extra child living at home. The Customer97 and Spent97 terms are somewhat more difficult to interpret.
– First, both of these terms are 0 for customers who didn’t purchase from HyTex in 1997.
– For those that did, the terms become -724 + 0.47Spent97.
– The coefficient 0.47 implies that each extra dollar spent in 1997 can be expected to contribute an extra 47 cents in 1998.

Interpretation of Final Regression Equation -- continued

– The median spender in 1997 spent about $900. So if we substitute this for Spent97 we obtain -301.
– Therefore, this “median” spender from 1997 can be expected to spend about $301 less in 1998 than the 1997 nonspender.

The coefficient of Catalogs implies that each extra catalog can be expected to generate about $43 in extra spending.
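The arithmetic behind the Customer97 and Spent97 interpretation can be checked in a few lines. The function name is hypothetical; the coefficients -724 and 0.47 and the $900 median are the rounded values quoted above, so results are approximate.

```python
def customer97_effect(spent97, customer97_coef=-724.0, spent97_coef=0.47):
    """Combined contribution of the Customer97 and Spent97 terms to predicted
    Spent98 for a customer who bought in 1997 (both terms are 0 for a nonbuyer)."""
    return customer97_coef + spent97_coef * spent97


median_effect = customer97_effect(900)   # about -301 dollars
break_even = 724.0 / 0.47                # 1997 spending at which the two terms cancel
print(round(median_effect), round(break_even))
```

Under these rounded coefficients, a 1997 customer would need to have spent roughly $1540 in 1997 before the two terms together stop lowering the 1998 prediction relative to a nonspender.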

Cautionary Notes

When we validate this final regression equation with the 750 customers, using the procedure from Section 11.7, we find R2 and se values of 75.7% and $485. These aren’t bad. They show little deterioration from the values based on the original 250 customers. We haven’t tried all possibilities yet. We haven’t tried nonlinear or interaction variables, nor have we looked at different coding schemes; we haven’t checked for nonconstant error variance or looked at potential effects of outliers.
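The validation step can be sketched as follows. The procedure from Section 11.7 is not restated on these slides, so this is one common approach and should be read as an assumption: the validation R2 is taken as the squared correlation between actual and predicted values in the holdout set, and the validation se as the root mean squared prediction error. The function names are hypothetical.

```python
import math


def split_rows(rows, n_fit=250):
    """First n_fit rows for estimation, the rest for validation."""
    return rows[:n_fit], rows[n_fit:]


def validation_summary(actual, predicted):
    """Return (R2, se) for a holdout set, where R2 is the squared correlation
    between actual and predicted values and se is the root mean squared error."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    saa = sum((a - ma) ** 2 for a in actual)
    spp = sum((p - mp) ** 2 for p in predicted)
    sap = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    r_squared = sap ** 2 / (saa * spp)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    return r_squared, rmse


fit_rows, holdout_rows = split_rows(list(range(1000)))
print(len(fit_rows), len(holdout_rows))
```

The predictions passed to `validation_summary` would come from applying the equation estimated on the first 250 customers to the other 750, without re-estimating it.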

The Partial F Test

BANK.XLS

Recall from Example 11.3 that the Fifth National Bank has 208 employees. The data for these employees are stored in this file. In the previous chapter we ran several regressions for Salary to see whether there is convincing evidence of salary discrimination against females. We will continue this analysis here.

Analysis Overview

First, we will regress Salary versus the Female dummy, YrsExper, and the interaction between Female and YrsExper, labeled Fem_YrsExper. This will be the reduced equation. Then we’ll see whether the JobGrade dummies Job_2 to Job_6 add anything significant to the reduced equation. If so, we will then see whether the interactions between the Female dummy and the JobGrade dummies, labeled Fem_Job2 to Fem_Job6, add anything significant to what we already have.

Analysis Overview -- continued

If so, we’ll finally see whether the education dummies Ed_2 to Ed_5 add anything significant to what we already have.

Solution

First, note that we created all of the dummies and interaction variables with StatPro’s Data Utilities procedures. Also, note that we have used three sets of dummies, for gender, job grade, and education level. When we use these in a regression equation, the dummy for one category of each should always be excluded; it is the reference category. The reference categories we have used are “male”, job grade 1, and education level 1.

Solution -- continued

The output for the “smallest” equation using Female, YrsExper, and Fem_YrsExper as explanatory variables is shown here.

Solution -- continued

We’re off to a good start. These three variables already explain 63.9% of the variation of Salary. The output for the next equation, which adds the explanatory variables Job_2 to Job_6, is on the next slide. This equation appears much better. For example, R2 has increased to 81.1%. We check whether it is significantly better with the partial F test in rows 26-30.


Solution -- continued

The degrees of freedom in cell C28 is the same as the value in cell C12, the degrees of freedom for SSE. Then we calculate the F-ratio in cell C29 with the formula =((Reduced!D12-D12)/C27)/E12, where Reduced!D12 refers to SSE for the reduced equation from the Reduced sheet. Finally, we calculate the corresponding p-value in cell C30 with the formula =FDIST(C29,C27,C28). It is practically 0, so there is no doubt that the job grade dummies add significantly to the explanatory power of the equation.
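For readers working outside Excel, the F-ratio can be sketched in Python. The first function mirrors the spreadsheet formula, reading E12 as the mean square error of the complete equation (an assumption about the sheet layout). The second uses the algebraically equivalent R2 form, which follows because SSE equals SST times (1 - R2). The degrees of freedom 208 - 8 - 1 = 199 are inferred from the 208 employees and the 8 explanatory variables in the complete equation, and the R2 values are the rounded ones quoted above, so the result is approximate.

```python
def partial_f(sse_reduced, sse_complete, num_extra, df_complete):
    """F-ratio for testing whether num_extra added variables help:
    ((SSE_reduced - SSE_complete) / num_extra) / (SSE_complete / df_complete)."""
    return ((sse_reduced - sse_complete) / num_extra) / (sse_complete / df_complete)


def partial_f_from_r2(r2_reduced, r2_complete, num_extra, df_complete):
    """Same ratio written in terms of R2 values."""
    return ((r2_complete - r2_reduced) / num_extra) / ((1 - r2_complete) / df_complete)


# Job grade dummies: R2 rises from 63.9% to 81.1% when 5 dummies are added.
f_job = partial_f_from_r2(0.639, 0.811, num_extra=5, df_complete=208 - 8 - 1)
print(f_job)
# The p-value is the upper-tail area of the F distribution with
# (num_extra, df_complete) degrees of freedom, e.g. scipy.stats.f.sf(f_job, 5, 199)
# if SciPy is available.
```

An F-ratio this far above 1 with 5 and 199 degrees of freedom is consistent with the p-value of practically 0 reported in the spreadsheet.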

Solution -- continued

Do the interactions between the Female dummy and the job dummies add anything more? We again use the partial F test, but now the previous complete equation becomes the new reduced equation, and the equation that includes the new interaction terms becomes the new complete equation. The output for this new complete equation is shown on the next slide. We perform the partial F test in rows 31-35 exactly as before. The formula in C34 is =((Complete!D12-D12)/C32)/E12.


Solution -- continued

Again the p-value is extremely small, so there is no doubt that the interaction terms add significantly to what we already had. Finally, we add the education dummies. The resulting output is shown on the next slide. This output now corresponds to the complete equation, and the previous output corresponds to the reduced equation. We see how the terms reduced and complete are relative.


Solution -- continued

The formula in cell C38 for the F-ratio is now =((MoreComplete!D12-D12)/C36)/E12. The R2 value increased from 84.0% to 84.7%. Also, the p-value is not extremely small. According to the partial F test, it is not quite enough to qualify for statistical significance at the 5% level. Based on this evidence, there is not much to gain from including the education dummies in the equation, so we would probably elect to exclude them.

Concluding Comments

First, the partial F test is the formal test of significance for an extra set of variables. Many users look only at the R2 and/or se values to check whether extra variables are doing a “good job”. Second, if the partial F test shows that a block of variables is significant, it does not imply that each variable in this block is significant. Some of these variables can have low t-values.

Concluding Comments -- continued

Third, producing all of these outputs and doing the partial F tests is a lot of work. Therefore, we included a “Block” option in StatPro to make life easier. To run the analysis in this example, use the StatPro/Regression analysis/Block menu item. After selecting Salary as the response variable, we see this dialog box.

Concluding Comments -- continued

We want four blocks of explanatory variables, and we want a given block to enter only if it passes the partial F test at the 5% level. In later dialog boxes we specify the explanatory variables. Once we have specified all this, the regression calculations are done in stages. The output, which spans two figures, appears on the next two slides. Note that the output for Block 4 has been left off because it did not pass the F test at 5%.


Concluding Comments -- continued

Finally, we have concentrated on the partial F test and statistical significance in this example. We don’t want you to lose sight, however, of the bigger picture. Once we have decided on a “final” regression equation, we need to analyze its implications for the problem at hand. In this case the bank is interested in possible salary discrimination against females, so we should interpret this final equation in these terms. Our point is simply that you shouldn’t get so caught up in the details of statistical significance that you lose sight of the original purpose of the analysis!

Outliers

Questions

Of the 208 employees at Fifth National Bank, are there any obvious outliers? In what sense are they outliers? Does it matter to the regression results, particularly those concerning gender discrimination, whether the outliers are removed?

BANK.XLS

There are several places we could look for outliers. An obvious place is the Salary variable. The boxplot shown here shows that there are several employees making substantially more in salary than most of the employees.

Solution

We could consider these outliers and remove them, arguing perhaps that these are senior managers who shouldn’t be included in the discrimination analysis. We leave it to you to check whether the regression results are any different with these high salary employees than without them. Another place to look is at the scatterplot of the residuals versus the fitted values. This type of plot shows points with abnormally large residuals.

Solution -- continued

For example, we ran the regression with Female, YrsExper, Fem_YrsExper, and the five job grade dummies, and we obtained the output and scatterplot shown here.


Solution -- continued

This scatterplot has several points that could be considered outliers, but we focus on the point identified in the figure. The residual for this point is approximately -21. Given that the se for this regression is approximately 5, this residual is over four standard errors below 0, which is quite a lot. This employee turns out to be unusual, and special circumstances can account for this.
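The "number of standard errors" check can be written as a short helper. This is a sketch under stated assumptions: the function name is hypothetical, the cutoff of 3 standard errors is an arbitrary choice rather than a rule from the slides, and of the residuals in the usage line only -21 comes from the example (the other two are made up).

```python
def flag_outliers(residuals, std_error, cutoff=3.0):
    """Return (index, residual / std_error) for every residual that lies
    more than `cutoff` standard errors from 0."""
    flagged = []
    for i, r in enumerate(residuals):
        z = r / std_error
        if abs(z) > cutoff:
            flagged.append((i, z))
    return flagged


# -21 is the residual discussed above; 2.5 and 4.0 are made-up companions.
flagged = flag_outliers([2.5, -21.0, 4.0], std_error=5.0)
print(flagged)
```

Only the middle point is flagged, at a little over four standard errors below 0.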

Solution -- continued

If we delete this employee and rerun the regression with the same variables, we obtain the output shown here.

Solution -- continued

Now, recalling that gender discrimination is the key issue in this example, we compare the coefficients of Female and Fem_YrsExper in the two outputs. The coefficient of Female has dropped from 6.063 to 4.353. In words, the Y-intercept for the female regression line used to be about $6000 higher than for the male line; now it’s only about $4350 higher. More importantly, the coefficient of Fem_YrsExper has changed from -1.021 to -0.721. This coefficient indicates how much less steep the female line for Salary versus YrsExper is than the male line.

Solution -- continued

So a change from -1.021 to -0.721 indicates less discrimination against females now than before. In other words, this unusual female employee accounts for a good bit of the discrimination argument, although a strong argument still exists even without her.
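To see what the two pairs of coefficients imply, the sketch below computes the predicted female-minus-male salary difference at a given number of years of experience, holding job grade fixed. It assumes Salary is measured in thousands of dollars (consistent with reading 6.063 as about $6000); the function names are hypothetical.

```python
def female_minus_male(years, female_coef, fem_yrs_coef):
    """Predicted female salary minus predicted male salary (in $1000s)
    at the same job grade and years of experience."""
    return female_coef + fem_yrs_coef * years


def crossover_years(female_coef, fem_yrs_coef):
    """Years of experience at which the two predicted lines cross."""
    return -female_coef / fem_yrs_coef


for label, b_female, b_fem_yrs in [("with outlier", 6.063, -1.021),
                                   ("without outlier", 4.353, -0.721)]:
    print(label,
          "gap at 12 years:", round(female_minus_male(12, b_female, b_fem_yrs), 3),
          "crossover:", round(crossover_years(b_female, b_fem_yrs), 2))
```

With either set of coefficients the lines cross at about six years of experience; beyond that point the predicted female salary falls below the male one, but the gap grows more slowly once the outlier is removed.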

Prediction

Questions

Consider the following three male employees at Fifth National:
– Employee 5 makes $29,000, is in job grade 1, and has 3 years of experience at the bank.
– Employee 156 makes $45,000, is in job grade 4, and has 6 years of experience at the bank.
– Employee 198 makes $60,000, is in job grade 6, and has 12 years of experience at the bank.

Using a regression equation for Salary that includes the explanatory variables Female, YrsExper, Fem_YrsExper, and the job grade dummies Job_2 to Job_6, check that the predicted salaries for these three employees are close to their actual salaries.

Questions -- continued

Then predict the salaries these employees would obtain if they were females. How large are the discrepancies? When estimating the equation, exclude the last employee, employee 208, whom we diagnosed as an outlier in Section 14.9.

Solution

The analysis appears on the next slide. The top part includes the variables we need for this analysis. Note how employee 208 has been separated from the rest of the data, so that she is not included in the regression analysis. The usual regression output is not shown, but the standard error of estimate and estimated coefficients have been copied to cell B216 and the range B218:J218.


Solution -- continued

The values for male employees 5, 156, and 198 have been copied to the range B222:B224. We can then substitute their values into the regression equation to obtain their predicted salaries in column A. The formula in cell A222, for example, is =$B$218+SUMPRODUCT($C$218:$J$218,C222:J222). Clearly the predictions are quite good for these three employees. The worst prediction is off by less than $2000.
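The spreadsheet prediction is just the intercept plus a sum of coefficient-times-value products. A minimal Python analogue is sketched below; the function name is hypothetical, and the numbers in the usage line are made up purely to show the call, not taken from the bank output.

```python
def predict(intercept, coefficients, values):
    """Python analogue of =intercept + SUMPRODUCT(coefficients, values)."""
    if len(coefficients) != len(values):
        raise ValueError("coefficients and values must have the same length")
    return intercept + sum(c * v for c, v in zip(coefficients, values))


# Made-up numbers: intercept 1, coefficients (2, 3), explanatory values (4, 5).
example = predict(1.0, [2.0, 3.0], [4.0, 5.0])
print(example)
```

In the spreadsheet, the coefficient range is anchored with dollar signs so the same formula can be copied down to the other employees; here the same effect comes from calling `predict` once per row of values.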

Solution -- continued

To see what would happen if these employees were females, we need to adjust the values of the explanatory variables Female and Fem_YrsExper. For each employee in rows 227-229, the value of Female becomes 1 and the value of Fem_YrsExper becomes the same as the value of YrsExper. Copying the formula in A222 down to these rows gives the predicted salary for the females. One way to compare females to males is to enter the formula =(A227-B222)/$B$216 in cell B227 and copy it down.

Solution -- continued

This is the number of standard errors the predicted female salary is above (if positive) or below (if negative) the actual male salary. As we discussed earlier with this data set, females with only a few years of experience actually tend to make more than males. But the opposite occurs for employees with many years of experience. For example, male employee 198 is earning just about what the regression equation predicts he should earn. But if he were female, we would predict a salary about $4500 below the male, almost a full standard error lower.
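The "what if this employee were female" comparison can be sketched in two small steps: switch the Female-related values, then express the gap in standard-error units. The function names are hypothetical. In the usage lines, the predicted female salary of 55.5 and the standard error of 5.0 (both in $1000s) are placeholders chosen only to resemble the example; the actual values sit in the spreadsheet and are not restated here.

```python
def as_female(row):
    """Copy of an employee's explanatory values with Female set to 1 and
    Fem_YrsExper set equal to YrsExper; other values are left unchanged."""
    changed = dict(row)
    changed["Female"] = 1
    changed["Fem_YrsExper"] = row["YrsExper"]
    return changed


def standardized_gap(predicted_female, actual_male, std_error):
    """Analogue of =(A227-B222)/$B$216: the gap in standard-error units."""
    return (predicted_female - actual_male) / std_error


row = as_female({"Female": 0, "YrsExper": 12, "Fem_YrsExper": 0})
gap = standardized_gap(55.5, 60.0, 5.0)   # placeholder inputs
print(row, gap)
```

A negative result means the predicted female salary is below the actual male salary; a value near -1 corresponds to "almost a full standard error lower".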

Prediction

The Problem

Besides the 50 regions in the data set, Pharmex does business in five other regions, which have promotional expense indexes of 114, 98, 84, 122, and 101. Find the predicted Sales and a 95% prediction interval for each of these regions. Also, find the mean Sales for all regions with each of these values of Promote, along with 95% confidence intervals for these means.

PHARMEX.XLS

This example cannot be solved with StatPro, but it is relatively easy with Excel’s built-in functions. We illustrate the procedure in this file, shown on the next slide. The original data appear in columns B and C. We use the range names SalesOld and PromoteOld for the data in these columns. The new regions appear in rows 9-13. Their given values of Promote are in the range G9:G13, which we name PromoteNew.


Solution

To obtain the predicted sales for these regions, we use Excel’s TREND function by highlighting the range H9:H13, typing the formula =TREND(SalesOld,PromoteOld,PromoteNew) and pressing Ctrl-Shift-Enter.
This substitutes the new values of the explanatory variable (in the third argument) into the regression equation based on the data from the first two arguments.
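For a single explanatory variable, what TREND does can be sketched in a few lines of Python: fit the least squares line to the old data, then evaluate it at the new values. The function name `trend` is only an illustrative stand-in, and the data in the usage line are made up (they lie exactly on the line y = 1 + 2x).

```python
def trend(known_y, known_x, new_x):
    """Fit y = intercept + slope * x by least squares on the known data
    and return the fitted values at each point in new_x."""
    n = len(known_x)
    mean_x = sum(known_x) / n
    mean_y = sum(known_y) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(known_x, known_y))
             / sum((x - mean_x) ** 2 for x in known_x))
    intercept = mean_y - slope * mean_x
    return [intercept + slope * x for x in new_x]


predictions = trend([3, 5, 7], [1, 2, 3], [4, 5])
print(predictions)
```

The argument order (y values first, then x values, then new x values) follows the order used in the Excel formula above.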


Solution -- continued

We can then use these same predictions in rows 19-23 for the mean sales values.
For example, we predict the same Sales value of 112.03 for a single region with Promote equal to 114 or for the mean of all regions with this value of Promote. The approximate standard error of prediction for any individual value is se, calculated in cell H6 with the formula =STEYX(SalesOld,PromoteOld)
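The standard error of estimate that STEYX returns can be sketched as the square root of the sum of squared residuals divided by n - 2. The function name below is an illustrative stand-in, and the three data points in the usage line are made up.

```python
import math


def steyx(known_y, known_x):
    """Standard error of estimate for a simple regression of y on x:
    sqrt(SSE / (n - 2)), where SSE is the sum of squared residuals."""
    n = len(known_x)
    mean_x = sum(known_x) / n
    mean_y = sum(known_y) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(known_x, known_y))
             / sum((x - mean_x) ** 2 for x in known_x))
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(known_x, known_y))
    return math.sqrt(sse / (n - 2))


se = steyx([1, 2, 2], [1, 2, 3])
print(se)
```

The divisor n - 2 reflects the two estimated quantities, the intercept and the slope.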

Solution -- continued

The more exact standard error of prediction depends on the value of Promote. To calculate it we enter the formula
=$H$6*SQRT(1+1/50+(G9-AVERAGE(PromoteOld))^2/(49*STDEV(PromoteOld)^2))

in cell I9 and copy it down through cell I13.

We then calculate the lower and upper limits of 95% prediction intervals in columns J and K. These use the t-multiple in cell I3, obtained with the formula =TINV(0.05,48).

Solution -- continued

The formulas in cells J9 and K9 are then =H9-$I$3*I9 and =H9+$I$3*I9, which we copy down to row 13. The calculations for the mean predictions in rows 19-23 are almost identical. The only difference is that the approximate standard error is se divided by the square root of 50, calculated in H16. The more exact standard errors in column I are then calculated by entering the formula

=$H$6*SQRT(1/50+(G9-AVERAGE(PromoteOld))^2/(49*STDEV(PromoteOld)^2))

in cell I19 and copying it down.
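The two "more exact" standard errors can be sketched together, since they differ only by the leading 1 under the square root. The sketch uses the standard simple-regression formulas, in which the sum of squared deviations of the explanatory variable equals n - 1 times its sample variance. The function names are hypothetical, the t-multiple is passed in rather than computed, and the usage data are made up.

```python
import math


def prediction_std_errors(x_old, se, x_new):
    """Return (individual, mean) standard errors of prediction at x_new:
    individual = se * sqrt(1 + 1/n + (x_new - mean_x)^2 / Sxx)
    mean       = se * sqrt(    1/n + (x_new - mean_x)^2 / Sxx)"""
    n = len(x_old)
    mean_x = sum(x_old) / n
    sxx = sum((x - mean_x) ** 2 for x in x_old)   # (n - 1) * sample variance
    leverage = 1 / n + (x_new - mean_x) ** 2 / sxx
    return se * math.sqrt(1 + leverage), se * math.sqrt(leverage)


def interval(center, std_error, t_multiple):
    """Lower and upper limits: center -/+ t_multiple * std_error."""
    return center - t_multiple * std_error, center + t_multiple * std_error


# Made-up data: three old x values, se = 2, new x at the mean of the old ones.
ind_se, mean_se = prediction_std_errors([1, 2, 3], se=2.0, x_new=2)
print(ind_se, mean_se, interval(10.0, ind_se, 2.0))
```

Both standard errors grow as the new value moves away from the mean of the old values, which is why the exact intervals are narrowest near the center of the data.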

Conclusions

We have gone through these rather tedious calculations to make several points.
– First, the approximate standard errors se and se/√n are usually quite accurate. This is fortunate because the exact standard errors are difficult to calculate and are not always given in statistical software packages.
– Second, a simple rule of thumb for calculating individual 95% prediction intervals is to go out an amount 2se on either side of the predicted value. Again this is not exactly correct, but as calculations in this example indicate, it works quite well.

Conclusions -- continued

Finally, we see from the wide prediction intervals how much uncertainty remains. The reason is the relatively large standard error of estimate, se. Contrary to what you may believe, this is not a sample size problem. The whole problem is that Promote is not highly correlated with Sales. The only way to decrease se and get more accurate predictions is to find other explanatory variables that are more closely related to Sales.
