Mix and Match
For each task or property of a regression model, indicate where or how to look to find the answer.

1. Test H0: β1 = β2 = 0
2. Test H0: β2 = 0
3. Test H0: α = 0
4. Effect of collinearity on se(b1)
5. Correlations among variables
6. Scatterplots among variables
7. Percentage of variation in residuals
8. Test whether adding X1 improves fit of model
9. All variation in X2
10. Unique variation in X2

a. t-statistic for b1
b. 1 - R²
c. $s_{x_2}^2$
d. VIF(X1)
e. t-statistic for b2
f. F-statistic
g. Scatterplot matrix
h. $s_{x_2}^2$ / VIF(X2)
i. t-statistic for b0
j. Correlation matrix
11. Excess percentage changes in the value of an investment subtract the cost of borrowing from the percentage changes in the value of the investment.
12. The Capital Asset Pricing Model indicates that the estimated intercept in a regression of excess percentage changes in a stock on those in the market is zero.
13. If a multiple regression has a large F-statistic but small t-statistics for each predictor (i.e., the t-statistics for the slopes are near zero), then collinearity is present in the model.
14. The F-statistic is statistically significant only if either t-statistic for a slope in multiple regression is statistically significant.
15. If the R² of a multiple regression with 2 predictors is larger than 80%, then the regression explains a statistically significant fraction of the variance in y.
16. If the t-statistic for X2 is larger than 2 in absolute size, then adding X2 to the simple regression containing X1 produces a significant improvement in the fit of the model.
17. We can detect outliers by reviewing the summary of the associations in the scatterplot matrix.
4/20/2007 25 Exercises

18. A correlation matrix summarizes the same information in the data as given in a scatterplot matrix.
19. In order to calculate the VIF for an explanatory variable, we need to use the values of the response.
20. If VIF(X2) = 1, then we can be sure that collinearity has not inflated the standard error of the estimated partial slope for X2.
21. The best remedy for a regression model that has collinear predictors is to remove one of those that are correlated.
22. It is not appropriate to ignore the presence of collinearity because it violates one of the assumptions of the MRM.

Think About It

23. Collinearity is sometimes described as a problem with the data, not the model. Rather than having data that fill the scatterplot of x1 on x2, the data are concentrated along a diagonal. For example, this plot shows monthly excess percentage changes in the whole stock market and the S&P 500. The data span the same 120 months considered in the text, running from 1996 through 2005.
[Scatterplot: Market Excess %Chg versus S&P 500 Excess %Chg]
a) Data for two months (February 2000 on the left and March 2000 at the right edge of the plot) deviate from the pattern evident in other months. What makes these months unusual?
b) If you were to use both returns on the market and those on the S&P 500 as explanatory variables in the same regression, are these two months leveraged?
c) Would you want to use these months in the regression or exclude these from the multiple regression?

24. Regression models that describe macro-economic properties in the US often have to deal with large amounts of collinearity. For example, suppose we want to use as explanatory variables the disposable income and the amount of household credit debt. Because the economy in the US continues to grow, both of these variables grow as well. Here’s a plot of the two. These data are quarterly, from 1959 through the first quarter of 2006. (Disposable income is in red and credit debt is green. Yes, indeed, credit debt passed disposable income in 2001.)
[Timeplot: Billions of Dollars versus Cal Date]
a) This plot shows timeplots of the two series. Do you think that they are correlated? Estimate the correlation.
b) If the variables are expressed on a log scale, will the transformation to logs increase, decrease or not affect the correlation between these series?
c) If both variables are used as explanatory variables in a multiple regression, will you be able to separate the two?
d) Suggest some alternative approaches to using the information in both series in a multiple regression that avoid some of the issues caused by collinearity.

25. You can appreciate some of the effects of collinearity by taking a visual 3-d look at a regression model. Simple regression fits a line to summarize the relationship between y and x. If you like geometry, then you can think of multiple regression with two predictors as fitting a 2-d plane to summarize the relationship between y and the pair x1 and x2. The data are points “floating” within a 3-d cube. The residuals are the vertical deviations above the plane surface. Here are two pictures of a multiple regression of the prices of used BMW cars on their age and mileage (see the 4M exercise in Chapter 24).
Estimated Price = 40,300 - 1850 Age - 0.12 Mileage
The left view shows the regression surface from the Age side of the cube, and the view on the right shows the fit from the Mileage side.
a) The intersection of two planes produces a line. What is the slope of the line where the regression plane intersects the Age × Price side of the cube (another plane)?
b) You can almost see a plot of the residuals on the fitted values by rotating this cube into the right position. What position approximates the scatterplot of e plotted on ŷ?

26. (See Exercise 23.) These views show the multiple regression for percentage changes in IBM stock considered in this chapter. The fitted equation is
Est IBM Excess %Chg = 0.62 + 0.46 Market Excess %Chg + 0.99 DJ Excess %Chg
The data lie in a cigar-shaped cylinder.
a) Because the data lie in a cigar-shaped cylinder, is the orientation of the plane well-determined, or can we rotate the surface while preserving the fit to the data?
b) If we can move the surface while keeping the same fit to the data, are the slopes well-determined, or do they have large standard errors?
c) If you wanted to “nail down” the slope to a specific position, where would you put a data point? How would adding this point affect the correlation between the two explanatory variables?

27. The version of the CAPM studied in this chapter specifies a simple regression model as
100 (St - rft) = α + 100 β (Mt - rft) + ε
where Mt are the returns on the market, St are the returns on the stock, and rft are the returns on risk-free investments. Hence, 100 (Mt - rft) are the excess percentage changes on the market and 100 (St - rft) are the excess percentage changes on the stock. What happens if we work with the excess returns themselves rather than the percentage changes? In particular, what happens to the t-statistic for the test of H0: α = 0?

28. The following histograms summarize monthly returns on IBM, the whole stock market, and the risk-free asset over the 10 years 1996-2005. Having seen this comparison, explain why it does not make a difference to subtract the risk-free rate from the variables in the CAPM regression.
[Histograms (counts): monthly returns on IBM, the whole stock market, and the risk-free asset]

29. The following correlation matrix and scatterplot matrix show the same data, only we scrambled the order of the variables in the two views. If the labels X, Y, Z, and T are as given in the scatterplot matrix, label the rows and columns in the correlation matrix.
[Correlation matrix: 4 × 4, rows and columns unlabeled; diagonal entries 1.0000, off-diagonal magnitudes 0.0204, 0.3180, 0.4038, 0.5913, 0.6384, and 0.8610]
[Scatterplot matrix of X, Y, Z, and T]

30. Identify the variable by matching the description to the data shown in the scatterplot matrix. The plot shows 75 observations.
[Scatterplot matrix of X, Y, Z, and T]
a) Which variable is the sequence 1, 2, 3, …, 75?
b) Which variable has mean 102?
c) Which pair of variables is most highly positively correlated?
d) Which variable is nearly uncorrelated with Y?
e) Identify any outliers in these data. If you don’t find any, then say so.
31. In the 4M exercise of this chapter, the data show correlation between the income and age of the customer. This produces collinearity and makes the analysis tricky to interpret. The marketing research group could have removed this collinearity by collecting data with these two factors made uncorrelated. For example, they could have found two customers of, say, 25, 35, 45, 55, and 65 years old, for incomes of $60,000, $70,000 through $120,000. That would have given them 70 observations.
a) Explain why Income and Age would be uncorrelated for these data.
b) Would the marginal slope be the same as the partial slope when analyzing these data?
c) Would the marginal slope for Age when estimated to these data have positive or negative sign?

32. To find out whether employees are interested in joining a union, a manufacturing company hired an employee relations firm to survey attitudes toward unionization. In addition to rating their agreement with the statement “I do not think we need a union at this company.” (on a 1-7 Likert scale), the firm also recorded the number of years of experience and the salary of the employee. Both of these are typically positively correlated with agreement with the statement.
a) In building a multiple regression of the agreement variable on years of experience and salary, would you expect to find collinearity? Why?
b) Would you expect to find the partial slope for salary to be about the same as the marginal slope, or would you expect it to be noticeably larger or smaller?

33. Modern steel mills are very automated and need to monitor their substantial energy costs carefully to be competitive. In making cold-rolled steel (as used in making bodies of cars), it is known that temperature during rolling and the amount of expensive additives (expensive metals like manganese and nickel give steel desired properties) affect the number of pits per 20-foot section. A pit is a small flaw in the surface. To save on costs, a manager suggested the following plan for testing the results at various temperatures and amounts of additives.
90º, 0.5% additive
95º, 1.0% additive
100º, 1.5% additive
105º, 2.0% additive
110º, 2.5% additive
Multiple sections of steel for each combination would be produced with the number of pits computed.
a) If Temperature and Additive are to be used as predictors together in a multiple regression, will this approach yield useful data?
b) Would you stick to this plan, or can you offer an alternative that you think is better? What would that approach be?

34. A builder is interested in which types of homes earn a higher price. For a given number of square feet, the builder gathered prices of homes that use the space differently. In addition to price, the homes vary in the number of rooms devoted to personal use (such as bathrooms or private bedrooms) and rooms devoted to social use (enclosed decks or game rooms). Because the homes are of roughly equal size (equal numbers of square feet), the more space devoted to private use, the less
devoted to social use. The variable Private denotes the number of square feet used for private space and Social the number of square feet for social rooms.
a) Would you expect to find collinearity in a multiple regression of Price on Private and Social? Explain.
b) Rather than use both Private and Social as two variables in a multiple regression for Price, suggest an alternative that might in the end be simpler to interpret as well?

You Do It

We investigated the use of the MRM for inference in these examples in Chapter 24. For answering the questions shown here, assume unless indicated otherwise that you can use the MRM for inference. If you’re concerned that it’s not appropriate, see the analyses of these data in Chapter 24. For each data set, if your software supports it, prepare a scatterplot matrix as a first step in your analysis. Otherwise, you might want to look back at the individual scatterplots of these data that were used in the exercises of Chapter 24.

prediction intervals: using regression for prediction. more than two predictors

35. Gold chains (introduced in Chapter 24) These data give the prices (in dollars) for gold link chains at the web site of a discount jeweler. The data include the length of the chain (in inches) and its width (in millimeters). All of the chains are 14 karat gold in a similar link style. Use the price as the response. For one explanatory variable, use the width of the chain. For the second, calculate the “volume” of the chain as π times its length times the square of half the width, Volume = π × Metric Length × (Width/2)². To make the units of volume mm³, first convert the length to millimeters (25.4 mm = 1 inch).
a) The explanatory variable Volume includes Width. Are these explanatory variables perfectly correlated? Can we use them both in the same multiple regression?
b) Fit the multiple regression of Price on Width and Volume. Do both explanatory variables improve the fit of the model?
c) Find the variance inflation factor and interpret the value that you obtain.
d) What’s the interpretation of the coefficient of Volume?
e) The marginal correlation between Width and Price is 0.95, but its slope in the multiple regression is negative. How can this be?

36. Convenience shopping (introduced in Chapter 19) These data describe sales over time at a franchise outlet of a major US oil company. (The data file has values for two stations. For this exercise, use only the 283 cases for site 1.) Each row summarizes sales for one day. This particular station sells gas, and it also has a convenience store and a car wash. The response Sales gives the dollar sales of the convenience store. The explanatory variable Volume gives the number of gallons of gasoline sold and Washes gives the number of car washes sold at the station.
a) Fit the multiple regression of Sales on Volume and Washes. Do both explanatory variables improve the fit of the model?
b) Which explanatory variable is more important to the success of sales at the convenience store: gasoline sales or car washes? Do the slopes of these variables in the multiple regression provide the full answer?
c) Find the variance inflation factor and interpret the value that you obtain.
d) One of the explanatory variables is just barely statistically significant. Assuming the same estimated value, would a complete lack of collinearity have made this explanatory variable noticeably more statistically significant?

37. Download (introduced in Chapter 19) Before plunging into videoconferencing, a company tested its current internal computer network. The tests measured how rapidly data moved through its network given the current demand on the network. Eighty files ranging in size from 20 to 100 megabytes (MB) were transmitted over the network at various times of day, and the time to send the files recorded. The time is given as the number of hours past 8 am on the day of the test.
a) Fit the multiple regression of Transfer Time on File Size and Hours past 8. Does the model, taken collectively, explain statistically significant variation in transfer time?
b) Does either explanatory variable improve the fit of the model that uses the other? Use a test statistic for each.
c) Find the variance inflation factors for both explanatory variables. Interpret the value that you obtain.
d) Can collinearity explain the paradoxical results found in “a” and “b”?
e) Would it have been possible to obtain data in this situation in a manner that would have avoided the effects of collinearity?

38. Production costs (introduced in Chapter 19) A manufacturer produces custom metal blanks that are used by its customers for computer-aided machining. The customer sends a design via computer, and the manufacturer replies with an estimated price per unit. This cost estimate determines a price for the customer. The data for the analysis were sampled from the accounting records of 195 orders that were filled during the previous 3 months.
a) Fit the multiple regression of Average Cost on Material Cost and Labor Hours. Both explanatory variables are per unit produced. Do both explanatory variables improve the fit of the model that uses the other?
b) The estimated slope for labor hours per unit is much larger than the slope for material cost per unit. Does this difference mean that labor costs form a larger proportion of production costs than material costs?
c) Find the variance inflation factors for both explanatory variables. Interpret the value that you obtain.
d) Suppose that one formulated this regression using total costs of each production run rather than average cost per unit. Would collinearity have been a problem in this model? Explain.

39. Home prices (introduced in Chapter 24) In order to help clients determine the price at which their house is likely to sell, a realtor gathered a sample of 150 purchase transactions in her area during a recent three-month period. The price of the home is measured in thousands of dollars. The number of square feet is also expressed in thousands, and the number of bathrooms is just that. Fit the multiple regression of Price on Square Feet and Bathrooms.
a) Thinking marginally for a moment, should there be a correlation between the square feet and the number of bathrooms in a home?
b) One of the two explanatory variables in this model does not explain statistically significant variation in the price. Had the two explanatory variables been uncorrelated (and produced these estimates), would it have been statistically significant? Use the VIF to see.
c) We can see the effects of collinearity by constructing a plot that shows the slope of the multiple regression. To do this, we have to remove the effect of one of the explanatory variables from the other variables. Here’s how to make a so-called partial regression leverage plot for these data. First, regress Price on Square Feet and save the residuals. Second, regress Bathrooms on Square Feet and save these residuals. Now, scatterplot the residuals from the regression of Price on Square Feet on the residuals from the regression of Bathrooms on Square Feet. Fit the simple regression for this scatterplot, and compare the slope in this fit to the partial slope for Bathrooms in the multiple regression. Are they different?
d) Compare the scatterplot of Price on Bathrooms to the partial regression plot constructed in “c”. What’s changed?

40. Leases (introduced in Chapter 19) This data table gives annual costs of 223 commercial leases. All of these leases provide office space in a Midwestern city in the US. The cost of the lease is measured in dollars per square foot, per year. The number of square feet is as labeled and Parking counts the number of parking spots in an adjacent garage that the realtor will build into the cost of the lease. Fit the multiple regression of Cost per SqFt on 1/SqFt and Parking/SqFt. (Recall that the slope of 1/SqFt captures fixed costs of the lease, those present regardless of the number of square feet.)
a) Thinking marginally for a moment, should there be a correlation between the number of parking spots and the fixed cost of a lease?
b) Interpret the coefficient of Parking/SqFt. Once you figure out the units of the slope, you should be able to get the interpretation.
c) One of the two explanatory variables just barely explains statistically significant variation in the price. Had the two explanatory variables been uncorrelated (and produced these estimates), would it have been more impressively statistically significant? Use the VIF to see.
d) We can see the effects of collinearity by constructing a plot that shows the slope of the multiple regression. To do this, we have to remove the effect of one of the explanatory variables from the other variables. Here’s how to make a so-called partial regression leverage plot for these data. First, regress Cost per SqFt on Parking/SqFt and save the residuals. Second, regress 1/SqFt on Parking/SqFt and save these residuals. Now, scatterplot the residuals from the regression of Cost per SqFt on Parking/SqFt on the residuals from the regression of 1/SqFt on Parking/SqFt. Fit the simple regression for this scatterplot, and compare the slope in this fit to the partial slope for 1/SqFt in the multiple regression. Are they different?
e) Compare the scatterplot of Cost per SqFt on 1/SqFt to the partial regression plot constructed in “d”. What’s changed?

41. R&D expenses (Introduced in Chapter 19) This data table contains accounting and financial data that describe 493 companies operating in technology industries: software, systems design, and semiconductor manufacturing. The variables include the expenses on research and development (R&D), total assets of the company, and net sales. (All columns are reported in millions of dollars, so 1000 = $1 billion.) Use the natural logs of all variables and fit the regression of log R&D Expenses on log Assets and log Net Sales.
a) Thinking marginally for a moment, would you expect to find a correlation between the log of the total assets and the log of net sales?
b) Does the correlation between the explanatory variables change if you work with the data on the original scale rather than on a log scale? In which case is the correlation between the explanatory variables larger?
c) In which case does correlation provide a more useful summary of the association between the two explanatory variables?
d) What is the impact of the collinearity on the standard errors in the multiple regression using the variables on a log scale? Does the size of the VIF tell you that the two explanatory variables are not statistically significant?
e) We can see the effects of collinearity by constructing a plot that shows the slope of the multiple regression. To do this, we have to remove the effect of one of the explanatory variables from the other variables. Here’s how to make a so-called partial regression leverage plot for these data. First, regress Log R&D Expenses on Log Net Sales and save the residuals. Second, regress Log Assets on Log Net Sales and save these residuals. Now, scatterplot the residuals from the regression of Log R&D Expenses on Log Net Sales on the residuals from the regression of Log Assets on Log Net Sales. Fit the simple regression for this scatterplot, and compare the slope in this fit to the partial slope for Log Assets in the multiple regression. Are they different?
f) Compare the scatterplot of Log R&D Expenses on Log Assets to the partial regression plot constructed in “e”. What’s changed?

42. Cars (Introduced in Chapter 19) This data table gives characteristics of 223 types of cars sold in the US during the 2003 and 2004 model years. Fit a multiple regression with the log10 of the base price
as the response and the log10 of the horsepower of the engine (HP) and the log10 of the weight of the car (given in thousands of pounds) as explanatory variables.
a) Does it seem natural to find correlation between these two explanatory variables, either on a log scale or in the original units?
b) How will collinearity on the log scale affect the standard errors of the slopes of these predictors in the multiple regression?
c) One of the explanatory variables in the multiple regression is not statistically significant. Can this be attributed to the effect of collinearity on the standard error of this estimate?
d) We can see the effects of collinearity by constructing a plot that shows the slope of the multiple regression. To do this, we have to remove the effect of one of the explanatory variables from the other variables. Here’s how to make a so-called partial regression leverage plot for these data. First, regress Log10 Price on Log10 HP and save the residuals. Second, regress Log10 Weight on Log10 HP and save these residuals. Now, scatterplot the residuals from the regression of Log10 Price on Log10 HP on the residuals from the regression of Log10 Weight on Log10 HP. Fit the simple regression for this scatterplot, and compare the slope in this fit to the partial slope for Log10 Weight in the multiple regression. Are they different?
e) Compare the scatterplot of Log10 Price on Log10 Weight to the partial regression plot constructed in “d”. What’s changed?

43. OECD (introduced in Chapter 19) An analyst at the UN is developing a model that describes GDP (gross domestic product per capita, a measure of the overall production in an economy per citizen) among developed countries. For this analysis, she uses national data for 30 countries from the 2005 report of the Organization for Economic Co-operation and Development (OECD). Her current equation is
Estimated per capita GDP = β0 + β1 Trade Balance + β2 Waste per capita
Trade balance is measured as a percentage of GDP. Exporting countries tend to have large positive trade balances. Importers have negative balances. The other explanatory variable is the annual number of kilograms of municipal waste per person.
a) Is there any natural reason to expect these explanatory variables to be correlated? Suppose she had formulated her model using national totals as
GDP = β0 + β1 Net Export ($) + β2 Total Waste (kg)
would this model have more or less collinearity? (You should not need to explicitly form these variables to answer this question.)
b) One nation is particularly leveraged in the marginal relationship between per capita GDP. Which is it?
c) Does collinearity exert a strong influence on the standard errors of the estimates in her multiple regression?
d) Because multiple regression estimates the partial effect of an explanatory variable rather than its marginal effect, we cannot judge the effect of outliers on the partial
slope from their position in the scatterplot of y on x. We can, however, see their effect by constructing a plot that shows the partial slope. To do this, we have to remove the effect of one of the explanatory variables from the other variables. Here’s how to make a so-called partial regression leverage plot for these data. First, regress per capita GDP on per capita Waste and save the residuals. Second, regress Trade Balance on per capita Waste and save these residuals. These regressions remove the effects of waste from the other two variables. Now, scatterplot the residuals from the regression of per capita GDP on per capita Waste on the residuals from the regression of Trade Balance on per capita Waste. Fit the simple regression for this scatterplot, and compare the slope in this fit to the partial slope for Trade Balance in the multiple regression. Are they different?
e) Which nation, if any, is leveraged in the partial regression leverage plot constructed in “d”? What would happen to the estimate for this partial slope if the outlier were excluded?

44. Hiring (Introduced in Chapter 19) A firm operates a large, direct-to-consumer sales force. The firm would like to build a system to monitor the progress of new agents. The goal is to identify “superstar agents” as rapidly as possible, offer them incentives, and keep them with the firm. A key task for agents is to open new accounts; an account is a new customer to the business. The response of interest is the profit to the firm (in dollars) of contracts sold by agents over their first year. Among the possible explanations of performance are the number of new accounts developed by the agent during the first 3 months of work and the commission earned on early sales activity. These data summarize the early performance of 464 agents. An analyst at the firm is using an equation of the form (with natural logs)
Log Profit = β0 + β1 Log Accounts + β2 Log Early Commission
For cases that have value 0 for early commission, the analyst replaced zero with $1.
a) The choice of the analyst to fill in the 0 values of early commission with 1 so as to be able to take the log is a common choice (you cannot take the log of 0). From the scatterplot of Log Profit on Log Early Commission, you can see the effect of what the analyst did. What’s the impact of these filled-in values on the marginal association?
b) Is there much collinearity between the explanatory variables? How does the presence of these filled-in values affect the collinearity?
c) Using all of the cases, does collinearity exert a strong influence on the standard errors of the estimates in her multiple regression?
d) Because multiple regression estimates the partial effect of an explanatory variable rather than its marginal effect, we cannot judge the effect of outliers on the partial slope from their position in the scatterplot of y on x. We can, however, see their effect by constructing a plot that shows the partial slope. To do this, we have to remove the effect of one of the explanatory variables from the other variables. Here’s how to make a so-called partial regression leverage plot for these data. First, regress Log Profit on Log Accounts and save the residuals. Second, regress Log Commission on Log Accounts and save these residuals. These regressions remove the effects of the number of accounts opened from the other two variables. Now,
scatterplot the residuals from the regression of Log Profit on Log Accounts on the residuals from the regression of Log Commission on Log Accounts. Fit the simple regression for this scatterplot, and compare the slope in this fit to the partial slope for Log Commission in the multiple regression. Are they different?
e) Do the filled-in cases remain leveraged in the partial regression leverage plot constructed in “d”? What does this view of the data suggest would happen to the estimate for this partial slope if these cases were excluded?
f) What do you think about filling in these cases with 1 so that we can take the log? Should something else be done with them?

The next two exercises use multiple regression with 3 explanatory variables. The analysis is much the same, in general, only now you have to be on the watch for more sources of collinearity.

45. Promotion (Introduced in Chapter 19) These data describe promotional spending by a pharmaceutical company for a cholesterol-lowering drug. The data cover 39 consecutive weeks and isolate the area around Boston. The variables in this collection are shares. Marketing research often describes the level of promotion in terms of voice. In place of the level of spending, voice is the share of advertising devoted to a specific product. The column Market Share is the ratio of sales of this product divided by total sales for such drugs in the Boston area. The column Detail Voice is the ratio of detailing for this drug to the amount of detailing for all cholesterol-lowering drugs in Boston. Detailing counts the number of promotional visits made by representatives of a pharmaceutical company to doctors’ offices. Similarly, Sample Voice is the share of samples in this market that are from this manufacturer.
a) Do any of these variables have linear patterns over time? Use timeplots of each one to see. (A scatterplot matrix becomes particularly useful.) Do any weeks stand out as unusual?
b) Fit the multiple regression of Market Share on three explanatory variables: Detail Voice, Sample Voice, and Week (which is a simple time trend, numbering the weeks of the study from 1 to 39). Does the multiple regression, taken as a whole, explain statistically significant variation in the response?
c) Does collinearity affect the estimated effects of these explanatory variables in the estimated equation? In particular, do the partial effects create a different sense of importance from what is suggested by marginal effects?
d) Which explanatory variable has the largest VIF?
e) What’s your substantive interpretation of the fitted equation? Take into account collinearity and statistical significance.
f) Should both of the explanatory variables that are not statistically significant be removed from the model at the same time? Explain why doing this would not be such a good idea. (Hint: are they collinear?)
46. Apple (Introduced in Chapter 19) These data track monthly performance of stock in Apple Computer since its inception in 1980. The data include 300 monthly returns on Apple Computer, as well as returns on the entire stock market, the S&P 500 index, stock in IBM, and Treasury Bills (short term, 30-day loans to Uncle Sam). (The column Whole Market Return is the return on a value-weighted portfolio that purchases stock in the 3 major US markets in proportion to the size of the company rather than one of each stock.) Formulate the regression with excess returns on Apple as the response and excess returns on the whole market, the S&P 500, and IBM as explanatory variables. (Excess returns are the same as excess percentage changes, only without being multiplied by 100. Just subtract the return on Treasury Bills from each.)
a) Do any of these excess returns have linear patterns over time? Use timeplots of each one to see. (A scatterplot matrix becomes particularly useful.) Do any months stand out as unusual?
b) Fit the indicated multiple regression. Does the estimated multiple regression explain statistically significant variation in the excess returns on Apple?
c) Does collinearity affect the estimated effects of these explanatory variables in the estimated equation? In particular, do the partial effects create a different sense of importance from what is suggested by marginal effects?
d) Which explanatory variable has the largest VIF?
e) How would you suggest improving this model, or would you just leave it as is?
f) Interpret substantively the fit of your model (which might be the one the question starts with).
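The conversion described in the parenthetical note can be sketched in a few lines. This is only an illustration, assuming NumPy and a handful of made-up monthly returns (not values from the Apple data); the function names are hypothetical.

```python
import numpy as np

def excess_returns(returns, tbill_returns):
    """Excess return: the return on the asset minus the Treasury Bill return."""
    return np.asarray(returns, dtype=float) - np.asarray(tbill_returns, dtype=float)

def to_percentage_change(r):
    """Convert a return (a proportion) to a percentage change by multiplying by 100."""
    return 100.0 * np.asarray(r, dtype=float)

# Made-up monthly returns for illustration only (NOT from the Apple data).
apple = np.array([0.050, -0.020, 0.110])
tbill = np.array([0.004, 0.004, 0.005])

apple_excess = excess_returns(apple, tbill)
print(apple_excess)                        # excess returns
print(to_percentage_change(apple_excess))  # excess percentage changes
```

The same subtraction would be applied to the whole market, S&P 500, and IBM columns before fitting the regression.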
4M Budget Allocation
Collinearity among the predictors is common in many applications, particularly those that track the growth of a new business over time. Because the growth of a business affects many attributes of the business (such as assets, sales, number of employees and so forth; see Question 41), the simultaneous changes that take place make it hard to separate important factors from coincidences. The problem is worse when the business has steadily grown, or fallen.
For this exercise, you're the manager who allocates advertising dollars. You have a fixed total budget for advertising, and you have to decide how to spend it. We've simplified things so that you have two choices: printed ads or television. Managers today face many more choices, such as advertising via the Internet, but let's focus on two so we can learn more about the effects of collinearity.
Your company in this exercise has been doing well. The past 2 years have been a time of growth, as you can see from the timeplot of weekly sales during this period. The data are in thousands of dollars, so you can see from the plot that weekly sales have grown from about $2.5 million up to around $4.5 million.
[Timeplot: Sales ($M) versus Week]
Other things have grown as well, namely your spending for advertising. This timeplot shows the two of them (TV in green, and printed ads in red), over the same 104 weeks.
[Timeplot: Spending ($M) versus Week]
With everything getting larger over time, all three variables are highly correlated.
Motivation
(a) How are you going to decide how to allocate your budget between these two types of promotion?
Method
(b) Explain how you can use multiple regression to help decide how to allocate the advertising budget between printed ads and television ads.
(c) Why would it not be enough to work with several, more easily understood simple regression models, such as sales on spending for television ads or sales on spending on printed ads?
(d) Look at the scatterplot matrix of Sales, the two explanatory variables (TV Adv and Print Adv) and a time trend (Week). Do the relationships between the variables seem straight enough for fitting a multiple regression?
(e) Do you anticipate that collinearity will affect the estimates and standard errors in the multiple regression? Use the correlation matrix of these variables to help construct your answer.
Mechanics
Fit the multiple regression of Sales on TV Adv and Print Adv.
(f) Does the model satisfy the assumptions for the use of the MRM?
(g) Assuming that the model satisfies the conditions for the MRM, (i) Does the model as a whole explain statistically significant variation in Sales? (ii) Does each individual explanatory variable improve the fit of the model, beyond the variation explained by the other alone?
(h) Do the results from the multiple regression suggest a method for allocating your budget? Assume that your budget for the next week is $360,000.
(i) Does the fit of this model promise an accurate prediction of sales in the next week, accurate enough for you to think that you have nailed the right allocation?
Message
(j) Everyone at the budget meeting knows the information in the plots shown in the introduction to this exercise: sales and both types of advertising are up. Make a recommendation with enough justification to satisfy their concerns.
(k) Identify any important concerns or limitations that you feel should be understood to appreciate your recommendation.
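The two questions in part (g) correspond to the overall F-statistic and to the t-statistic of each slope. The following is a minimal sketch of how those quantities might be computed, assuming NumPy and simulated stand-in series that all grow over 104 weeks; it does not use the actual advertising data, and ols_summary is a hypothetical helper.

```python
import numpy as np

def ols_summary(y, X):
    """Least-squares fit of y on X with an intercept; returns R^2, F and t-statistics."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sse = resid @ resid
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - sse / sst
    df = n - k - 1                         # residual degrees of freedom
    f_stat = (r2 / k) / ((1.0 - r2) / df)  # tests H0: all slopes are zero
    se = np.sqrt(sse / df * np.diag(np.linalg.inv(A.T @ A)))
    return {"coef": coef, "se": se, "t": coef / se, "r2": r2, "F": f_stat, "df": (k, df)}

# Simulated stand-in data (NOT the exercise data): everything trends upward.
rng = np.random.default_rng(1)
week = np.arange(1, 105, dtype=float)
tv = 100 + 1.0 * week + rng.normal(0, 5, 104)
print_adv = 50 + 0.8 * week + rng.normal(0, 5, 104)
sales = 2000 + 6 * tv + 4 * print_adv + rng.normal(0, 150, 104)

fit = ols_summary(sales, np.column_stack([tv, print_adv]))
print("R2:", round(fit["r2"], 3), "F:", round(fit["F"], 1), "df:", fit["df"])
print("t-statistics (intercept, TV Adv, Print Adv):", fit["t"].round(2))
```

With collinear predictors like these, it is worth comparing the size of F to the individual t-statistics; a large F alongside modest t-statistics is the pattern that collinearity tends to produce.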