
BASSTAT S14 & N04

Linear Regression and Correlation


Exercises

1. A comparison of the undergraduate grade point averages of 12 corporate employees with their scores on a managerial
trainee examination produced the following results:

Employee           1    2    3    4    5    6    7    8    9   10   11   12
GPA, x           2.2  2.4  3.1  2.5  3.5  3.6  2.5  2.0  2.2  2.6  2.7  3.3
Exam Score, y     76   89   83   79   91   95   82   69   66   75   80   88

Use α = 0.05 whenever applicable.


1.1 Draw the scatter plot for the given data.
1.2 Set up the equation of the least squares (regression) line for the data. (ŷ = 44.375 + 13.512x)
1.3 Interpret the slope of the regression line in (1.2).
1.4 From (1.2), estimate the mean exam score for employees with a GPA of 3.0. (84.9)
1.5 Give and interpret Pearson’s correlation coefficient r between GPA and exam score. (r = 0.821)
1.6 Give and interpret the sample coefficient of determination, r². (67.5%)
1.7 Do the data provide sufficient evidence to indicate that there is a linear relationship between GPA and exam
scores? Justify your answer. (t = 4.553; Reject H0)
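One way to check the answers to 1.2 through 1.7 is a short Python script. The sketch below is an illustration, not part of the exercise set: `slr_summary` is a hypothetical helper that applies the computational formulas Sxx = Σx² − (Σx)²/n, Syy = Σy² − (Σy)²/n and Sxy = Σxy − (Σx)(Σy)/n, then b1 = Sxy/Sxx, b0 = ȳ − b1x̄, r = Sxy/√(Sxx·Syy) and t = r√(n − 2)/√(1 − r²).

```python
import math


def slr_summary(x, y):
    """Hypothetical helper: least squares fit and correlation summary from the computational formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    syy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    b1 = sxy / sxx                                    # slope
    b0 = sum_y / n - b1 * sum_x / n                   # intercept: y-bar minus b1 times x-bar
    r = sxy / math.sqrt(sxx * syy)                    # Pearson's r
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)   # statistic for H0: rho = 0, df = n - 2
    return {"b0": b0, "b1": b1, "r": r, "r2": r * r, "t": t}


gpa = [2.2, 2.4, 3.1, 2.5, 3.5, 3.6, 2.5, 2.0, 2.2, 2.6, 2.7, 3.3]
score = [76, 89, 83, 79, 91, 95, 82, 69, 66, 75, 80, 88]

s = slr_summary(gpa, score)
print(f"y-hat = {s['b0']:.3f} + {s['b1']:.3f}x")
print(f"estimated mean score at GPA 3.0: {s['b0'] + s['b1'] * 3.0:.1f}")
print(f"r = {s['r']:.3f}, r^2 = {s['r2']:.1%}, t = {s['t']:.3f}")
```

The printed values should agree, after rounding, with the bracketed answers above.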

2. The regional transit authority for a major metropolitan area wants to determine whether there is any relationship
between the age of a bus (X) and the annual maintenance cost (Y). A sample of 10 buses resulted in the following
data:
Bus No.                      1    2    3    4    5    6    7    8    9   10
X: Age (years)               1    2    2    2    2    3    4    4    5    5
Y: Maintenance Cost ($)    350  370  480  520  590  550  750  800  790  950

2.1 Compute and interpret Pearson’s r. (0.934)


2.2 Compute and interpret the coefficient of determination, r². (87.3%)
2.3 Test for the significance of r to determine if there is a significant linear relationship between the age (X) and
maintenance cost (Y) of a bus. Use α = 0.01. (t = 7.399; Reject H0)

3. A marketing professor is interested in the relationship between hours spent studying (X) and total points earned
(Y) in a course. Data collected on a sample of 10 students who took the course last term follow.

Student No.                  1    2    3    4    5    6    7    8    9   10
X: Hours spent studying     45   30   90   60  105   65   90   80   55   75
Y: Total points earned      40   35   75   65   90   50   90   80   45   65

3.1 Set up the equation of the least squares line (simple linear regression model, SLRM) for this data set.
(ŷ = 5.847 + 0.830x)
3.2 Interpret the slope (b1) of the fitted SLRM in (3.1).
3.3 Give a point estimate of the expected total points earned (Ŷ) when 85 hours are spent studying. (76.4)

4. The following table shows the number of sales contacts (X) made by a sample of n = 10 salespersons during a
week and the number of sales (Y) made.

Salesperson                   1    2    3    4    5    6    7    8    9   10
X: No. of sales contacts     71   64  100  105   75   59   82   68  111   90
Y: No. of sales              25   16   37   40   18   10   22   14   42   19

4.1 Compute and interpret Pearson’s r. (0.920)


4.2 Compute and interpret the sample coefficient of determination, r². (84.7%)
4.3 Test for the significance of r at α = 0.05 using H0: ρ = 0 versus Ha: ρ ≠ 0. (t = 6.644; Reject H0)
4.4 Set up the equation of the regression line for this data set. (ŷ = −23.392 + 0.578x)
4.5 Interpret the slope (b1) of the fitted regression line in (4.4).
4.6 Estimate the expected number of sales when a salesperson makes 90 sales contacts. (28.6)

5. A store manager wishes to find out whether there is a relationship between the age (X) of her employees and the number of
sick days (Y) they take each year. The data for a sample of n = 6 employees are shown below:

Employee      1    2    3    4    5    6
Age (X)      18   26   39   48   53   58
Days (Y)     16   12    9    5    6    2

5.1 Set up the equation of the fitted regression line for this data set. (ŷ = 21.100 − 0.317x)
5.2 Is X (age) a significant explanatory variable (predictor) for the response variable Y (days)? Justify.
(t = −9.623; Reject H0)
5.3 Interpret the slope of the regression line in (5.1).
5.4 Give the expected number of sick days for employees aged 50. (5.3 days)
5.5 Give and briefly explain the sample coefficient of determination, r². (95.9%)
5.6 Give and briefly explain Pearson’s correlation coefficient, r. (−0.979)
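5.2 can also be answered with a t test on the slope, t = b1/SE(b1), where SE(b1) = s/√Sxx and s = √(SSE/(n − 2)). The sketch below assumes α = 0.05, the level used in Exercise 1, and hard-codes the two-tailed table value t(0.025, 4) = 2.776.

```python
import math

age = [18, 26, 39, 48, 53, 58]
days = [16, 12, 9, 5, 6, 2]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(days) / n
sxx = sum((x - x_bar) ** 2 for x in age)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, days))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate, s = sqrt(SSE / (n - 2)), and of the slope, s / sqrt(Sxx)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(age, days))
s = math.sqrt(sse / (n - 2))
se_b1 = s / math.sqrt(sxx)
t = b1 / se_b1                      # statistic for H0: beta1 = 0, df = n - 2 = 4

T_CRIT = 2.776  # two-tailed t(0.025, 4); alpha = 0.05 is an assumption here
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, t = {t:.3f}, reject H0: {abs(t) > T_CRIT}")
print(f"expected sick days at age 50: {b0 + b1 * 50:.1f}")
```

In simple linear regression this slope statistic is algebraically the same as the statistic for H0: ρ = 0, t = r√(n − 2)/√(1 − r²), so either route should give t ≈ −9.62.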

6. A warehouse manager is interested in the possible improvement in labor efficiency if air-conditioning is installed in the
warehouse. The following table shows the times taken to unload a fully laden truck at various temperatures.

Truck   Temperature, X   Unloading Time, Y
        (degrees F)      (minutes)
  1          52                64
  2          68                53
  3          64                58
  4          88                59
  5          80                49
  6          75                54
  7          59                38
  8          63                48
  9          85                68
 10          74                63
 11          71                58
 12          66                47

6.1 Fit a linear regression model with time as the dependent variable and temperature as the explanatory
(independent/predictor) variable. Indicate the scope of regression. (ŷ = 36.194 + 0.266x)
6.2 Is X (temperature) a significant predictor for the response variable Y (unloading time)? Justify using an appropriate
significance test. (t = 1.116; do not reject H0)
6.3 Does your analysis indicate that there is evidence that the trucks take longer to unload when the temperature is higher?
(No)
6.4 Can a case be made that the installation of air-conditioning will improve worker efficiency? (No)
6.5 Interpret the slope of the regression equation in (6.1).
6.6 Give the expected unloading time when the temperature is 80°F. (57.5 minutes)
6.7 Give and interpret the sample coefficient of determination, r². (11.1%)
6.8 Give and interpret Pearson’s correlation coefficient, r. (0.333)
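Exercise 6 is a case where the test does not reject H0, and 6.1 also asks for the scope of regression. The sketch below takes the scope to be the observed range of X (52°F to 88°F) and uses a hypothetical `predict` helper that refuses to extrapolate beyond it; α = 0.05 and the table value t(0.025, 10) = 2.228 are assumptions hard-coded into the script.

```python
import math

temp = [52, 68, 64, 88, 80, 75, 59, 63, 85, 74, 71, 66]       # X, degrees F
minutes = [64, 53, 58, 59, 49, 54, 38, 48, 68, 63, 58, 47]    # Y, unloading time

n = len(temp)
x_bar, y_bar = sum(temp) / n, sum(minutes) / n
sxx = sum((x - x_bar) ** 2 for x in temp)
syy = sum((y - y_bar) ** 2 for y in minutes)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temp, minutes))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
r = sxy / math.sqrt(sxx * syy)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

T_CRIT = 2.228                  # two-tailed t(0.025, 10); alpha = 0.05 assumed
scope = (min(temp), max(temp))  # observed range of X


def predict(x):
    """Hypothetical helper: return y-hat, refusing to extrapolate outside the observed range of X."""
    lo, hi = scope
    if not lo <= x <= hi:
        raise ValueError(f"x = {x} is outside the scope of regression {scope}")
    return b0 + b1 * x


print(f"y-hat = {b0:.3f} + {b1:.3f}x, scope of X: {scope}")
print(f"t = {t:.3f}, reject H0: {abs(t) > T_CRIT}")
print(f"expected unloading time at 80 F: {predict(80):.1f} minutes")
```

Since |t| ≈ 1.116 is below 2.228, the script reports that H0 is not rejected, which is consistent with the answers to 6.3 and 6.4.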

7. The following data show the media expenditures, X (in millions of dollars), and the case sales, Y (in millions), for n = 7
major brands of soft drinks (Superbrands ’98, October 20, 1997).

Brand                   Media Expenditures, X   Case Sales, Y
                        (in million dollars)    (in millions)
1 - Coca-Cola Classic             131.3            1929.2
2 - Pepsi-Cola                     92.4            1384.6
3 - Diet Coke                      60.4             811.4
4 - Sprite                         55.7             541.5
5 - Dr. Pepper                     40.2             536.9
6 - Mountain Dew                   29.0             535.6
7 - 7-Up                           11.6             219.5
7.1 Set up the equation of the fitted regression line for this data set. (ŷ = −15.420 + 14.424x)
7.2 Is X (media expenditures) a significant predictor for the response variable Y (case sales)? Justify.
(t = 10.508; Reject H0)
7.3 Interpret the slope of the regression equation in (7.1).
7.4 Give the expected case sales for a soft drink brand with media expenditures of $100 million. (1,426.96 million cases)
7.5 Give and interpret the sample coefficient of determination, r². (95.7%)
7.6 Give and interpret Pearson’s correlation coefficient, r. (0.978)
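r² in 7.5 can be read as the share of the total variation in Y that the fitted line explains, r² = SSR/SST = 1 − SSE/SST. The sketch below computes it that way from the fitted values rather than by squaring r, and reports the 7.4 estimate in millions of cases, the unit of Y in the table.

```python
spend = [131.3, 92.4, 60.4, 55.7, 40.2, 29.0, 11.6]            # X, millions of dollars
cases = [1929.2, 1384.6, 811.4, 541.5, 536.9, 535.6, 219.5]    # Y, millions of cases

n = len(spend)
x_bar, y_bar = sum(spend) / n, sum(cases) / n
sxx = sum((x - x_bar) ** 2 for x in spend)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(spend, cases))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in spend]
sst = sum((y - y_bar) ** 2 for y in cases)               # total variation in Y
sse = sum((y - f) ** 2 for y, f in zip(cases, fitted))   # residual (unexplained) variation
r2 = 1 - sse / sst                                       # share of variation explained by X

print(f"y-hat = {b0:.3f} + {b1:.3f}x")
print(f"r^2 = {r2:.1%}")
print(f"expected case sales at $100 million: {b0 + b1 * 100:.2f} million cases")
```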

8. For a company to maintain a competitive edge in the marketplace, spending on research and development (R & D) is essential.
To determine the optimum level for R & D spending and its effects on a company’s value, a simple linear regression analysis
was performed. Data collected for the largest R & D spenders were used to fit the straight-line model (SLRM)
y = β₀ + β₁x + ε,
where:
x = R & D expenditures/sales (R/S) ratio
y = price/earnings (P/E) ratio.

The sample data for n = 20 of the companies used in the study are provided in the following table:

Company   R/S Ratio, x   P/E Ratio, y      Company   R/S Ratio, x   P/E Ratio, y
   1          0.003          5.6             11          0.058          8.4
   2          0.004          7.2             12          0.058         11.1
   3          0.009          8.1             13          0.067         11.1
   4          0.021          9.9             14          0.080         13.2
   5          0.023          6.0             15          0.080         13.4
   6          0.030          8.2             16          0.083         11.5
   7          0.035          6.3             17          0.091          9.8
   8          0.037         10.0             18          0.092         16.1
   9          0.044          8.5             19          0.064          7.0
  10          0.051         13.2             20          0.028          5.9

8.1 Set up the SLRM for this data set and indicate the scope of regression. (ŷ = 5.977 + 74.068x)
8.2 Estimate the expected P/E ratio of all companies with an R/S ratio of 0.070. (11.2)
8.3 Interpret the slope of the regression equation in (8.1).
8.4 Test the significance of the linear relationship between R/S ratio and P/E ratio. (t = 4.482; Reject H0)
8.5 Give and interpret Pearson’s r and the coefficient of determination, r². (r = 0.726; r² = 52.7%)
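The R/S ratios in Exercise 8 all lie between 0.003 and 0.092, so a one-unit increase in x is far outside the data. The sketch below refits the line, runs the significance test for 8.4 (α = 0.05 assumed, with the table value t(0.025, 18) = 2.101 hard-coded) and, as one possible way to make the slope in 8.3 easier to read, rescales it to a 0.01 increase in the R/S ratio.

```python
import math

rs = [0.003, 0.004, 0.009, 0.021, 0.023, 0.030, 0.035, 0.037, 0.044, 0.051,
      0.058, 0.058, 0.067, 0.080, 0.080, 0.083, 0.091, 0.092, 0.064, 0.028]   # X
pe = [5.6, 7.2, 8.1, 9.9, 6.0, 8.2, 6.3, 10.0, 8.5, 13.2,
      8.4, 11.1, 11.1, 13.2, 13.4, 11.5, 9.8, 16.1, 7.0, 5.9]                 # Y

n = len(rs)
x_bar, y_bar = sum(rs) / n, sum(pe) / n
sxx = sum((x - x_bar) ** 2 for x in rs)
syy = sum((y - y_bar) ** 2 for y in pe)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(rs, pe))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
r = sxy / math.sqrt(sxx * syy)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

T_CRIT = 2.101  # two-tailed t(0.025, 18); alpha = 0.05 assumed
print(f"y-hat = {b0:.3f} + {b1:.3f}x, scope of x: {min(rs)} to {max(rs)}")
print(f"r = {r:.3f}, r^2 = {r * r:.1%}, t = {t:.3f}, reject H0: {abs(t) > T_CRIT}")
# Rescale the slope to a change in x that actually occurs within the data.
print(f"change in mean P/E per 0.01 increase in R/S: {b1 * 0.01:.2f}")
print(f"expected P/E at R/S = 0.070: {b0 + b1 * 0.070:.1f}")
```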

9. The marketing manager of a large supermarket chain would like to determine the effects of shelf space on the sales of pet
food. A random sample of n = 12 equal-sized stores is selected with the following results:

Store   Shelf Space, X   Weekly Sales, Y      Store   Shelf Space, X   Weekly Sales, Y
        (feet)           (dollars)                    (feet)           (dollars)
   1           5               160               7          15               230
   2           5               220               8          15               270
   3           5               140               9          15               280
   4          10               190              10          20               260
   5          10               240              11          20               290
   6          10               260              12          20               310

9.1 Set up the SLRM for this data set and indicate the scope of regression. (ŷ = 145.0 + 7.4x)
9.2 Estimate the expected weekly sales of all stores with 12 feet of shelf space. ($233.80)
9.3 Interpret the slope of the regression equation in (9.1).
9.4 Test the significance of shelf space as a predictor for the mean weekly sales. (t = 4.652; Reject H0)
9.5 Give and interpret Pearson’s r and the coefficient of determination, r². (r = 0.827; r² = 68.4%)
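In Exercise 9 every shelf-space level has exactly three stores. With equal replication like this, regressing the four level means on X gives the same least squares slope and intercept as regressing all twelve stores, which makes 9.1 quick to check by hand. The sketch below demonstrates this; `ols` is a hypothetical helper, not a library function.

```python
from collections import defaultdict

shelf = [5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20]                 # X, feet
sales = [160, 220, 140, 190, 240, 260, 230, 270, 280, 260, 290, 310]  # Y, dollars


def ols(x, y):
    """Hypothetical helper: ordinary least squares intercept and slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Fit to all 12 stores.
b0_all, b1_all = ols(shelf, sales)

# Fit to the mean weekly sales at each shelf-space level (3 stores per level).
groups = defaultdict(list)
for x, y in zip(shelf, sales):
    groups[x].append(y)
levels = sorted(groups)
means = [sum(groups[x]) / len(groups[x]) for x in levels]
b0_means, b1_means = ols(levels, means)

print(f"all stores:  y-hat = {b0_all:.1f} + {b1_all:.1f}x")
print(f"group means: y-hat = {b0_means:.1f} + {b1_means:.1f}x")
print(f"expected weekly sales at 12 feet: ${b0_all + b1_all * 12:.2f}")
```

The shortcut applies only to b0 and b1: r, r² and the t statistic in 9.4 and 9.5 must still be computed from all twelve observations, because averaging removes the variation within each shelf-space level.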

