
APPLIED STATISTICS (SQQS2013)


TUTORIAL 5: CORRELATION AND LINEAR REGRESSION

1. A study is done to investigate whether Statistics scores have an effect on students' CPA
scores. The data below are the Statistics final examination scores of 10 randomly selected
students and their corresponding CPA scores.

Statistics scores   87    69    75    56    63    90    71    74    80    78
CPA                 3.41  3.15  3.28  2.46  2.89  3.73  3.11  3.23  3.50  3.34

a) Identify the dependent and independent variables.
b) Calculate the Pearson correlation coefficient. Interpret the coefficient obtained.
c) Can we conclude that there is a relationship between the Statistics and CPA scores at the 2%
significance level?
d) Fit a least squares regression line.
e) Based on your answer in (d), interpret the coefficients obtained.
f) Is there enough evidence to conclude that the Statistics scores have a significant positive
effect on CPA scores at the 2.5% significance level?
g) Predict the CPA score of a student who gets 65 in Statistics.
h) Interpret the coefficient of determination.

Solution:
a) Dependent variable: CPA
Independent variable: Statistics scores

b) $r = \sqrt{0.9101} = 0.9540$
The correlation coefficient suggests a strong positive linear relationship between the
Statistics and CPA scores.
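
As a check on part (b), the Pearson coefficient can be computed directly from the ten data pairs. The sketch below uses only the Python standard library; the helper name `pearson_r` is an illustrative choice and does not come from the tutorial.

```python
import math

# Data from Question 1
stats_scores = [87, 69, 75, 56, 63, 90, 71, 74, 80, 78]
cpa = [3.41, 3.15, 3.28, 2.46, 2.89, 3.73, 3.11, 3.23, 3.50, 3.34]

def pearson_r(x, y):
    """Pearson correlation from the corrected sums S_xy, S_xx and S_yy."""
    n = len(x)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    s_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

r = pearson_r(stats_scores, cpa)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

The squared value should agree with the coefficient of determination quoted in part (h).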

c) $H_0: \rho = 0$, $H_1: \rho \neq 0$
Reject $H_0$.
There is a relationship between the Statistics and CPA scores at the 2% significance level.
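
The working for part (c) is not shown, so the following is only a sketch of one common approach: the t statistic for a correlation coefficient, $t = r\sqrt{n-2}/\sqrt{1-r^2}$, compared with a two-tailed critical value. The critical value 2.896 (8 degrees of freedom, 0.01 in each tail) is an assumption taken from a standard t table, and $r^2 = 0.9101$ is the figure reported in part (h).

```python
import math

n = 10
r_squared = 0.9101            # coefficient of determination from part (h)
r = math.sqrt(r_squared)      # positive root, since the relationship is positive

# Test statistic for H0: rho = 0 against H1: rho != 0
t_test = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

# Assumed table value: t distribution, 8 df, 0.01 in each tail (2% two-tailed)
t_critical = 2.896

reject_h0 = abs(t_test) > t_critical
print(f"t = {t_test:.3f}, reject H0: {reject_h0}")
```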

d) $\hat{y} = 0.8110 + 0.0323x$

e) A student who scores zero in Statistics is expected to have a CPA of 0.8110.
For every one-mark increase in the Statistics score, the CPA is expected to increase by 0.0323.
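
The intercept and slope interpreted in (e) can be reproduced from the raw data with the usual least squares formulas $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$. This is a minimal sketch; the function name `least_squares` is hypothetical.

```python
# Data from Question 1
stats_scores = [87, 69, 75, 56, 63, 90, 71, 74, 80, 78]
cpa = [3.41, 3.15, 3.28, 2.46, 2.89, 3.73, 3.11, 3.23, 3.50, 3.34]

def least_squares(x, y):
    """Return (b0, b1) for the simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = least_squares(stats_scores, cpa)
print(f"y_hat = {b0:.4f} + {b1:.4f} x")
```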

f) $H_0: \beta_1 = 0$, $H_1: \beta_1 > 0$
Reject $H_0$.
The Statistics scores have a significant positive effect on CPA scores at the 2.5%
significance level.
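
The numbers behind part (f) did not survive in the text, so this sketch follows the same steps the tutorial uses in Question 2(d): estimate $S_e^2$, form the standard error of the slope, and compare the t statistic with a one-tailed critical value. The value 2.306 (8 degrees of freedom, upper tail 0.025) is an assumption taken from a standard t table.

```python
import math

# Data from Question 1
x = [87, 69, 75, 56, 63, 90, 71, 74, 80, 78]
y = [3.41, 3.15, 3.28, 2.46, 2.89, 3.73, 3.11, 3.23, 3.50, 3.34]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((a - x_bar) ** 2 for a in x)
s_yy = sum((b - y_bar) ** 2 for b in y)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

b1 = s_xy / s_xx                       # slope estimate
s_e2 = (s_yy - b1 * s_xy) / (n - 2)    # residual variance estimate
se_b1 = math.sqrt(s_e2 / s_xx)         # standard error of the slope
t_test = (b1 - 0) / se_b1              # H0: beta_1 = 0

t_critical = 2.306                     # assumed table value t(0.025, 8)
reject_h0 = t_test > t_critical        # one-tailed test, H1: beta_1 > 0
print(f"t = {t_test:.3f}, reject H0: {reject_h0}")
```

In simple linear regression this statistic equals the t statistic for the correlation coefficient, so both tests lead to the same decision.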

g) $\hat{y} = 0.8110 + 0.0323(65) = 2.9105$

h) $r^2 = 0.9101$
91.01% of the variation in CPA scores can be explained by the variation in the
Statistics scores.
The remaining 8.99% is unexplained and is attributed to error.
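
Parts (g) and (h) can be checked together. The sketch below plugs a score of 65 into the line with the coefficients reported in (e), then computes the coefficient of determination as $1 - SSE/SST$ from the raw data. The function name `predict_cpa` is illustrative only.

```python
# Coefficients as reported in part (e) of the solution
b0, b1 = 0.8110, 0.0323

# Data from Question 1
stats_scores = [87, 69, 75, 56, 63, 90, 71, 74, 80, 78]
cpa = [3.41, 3.15, 3.28, 2.46, 2.89, 3.73, 3.11, 3.23, 3.50, 3.34]

def predict_cpa(score):
    """Fitted CPA for a given Statistics score."""
    return b0 + b1 * score

predicted_65 = predict_cpa(65)

# Coefficient of determination: 1 - SSE / SST
y_bar = sum(cpa) / len(cpa)
sse = sum((y - predict_cpa(x)) ** 2 for x, y in zip(stats_scores, cpa))
sst = sum((y - y_bar) ** 2 for y in cpa)
r_squared = 1 - sse / sst

print(f"predicted CPA at 65: {predicted_65:.4f}")
print(f"R^2 = {r_squared:.4f}")
```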

2. An architect wants to determine the relationship between the height (in feet) of a building
(y) and the number of stories in the building (x). The following results are based on a sample
of ten buildings.

[Scatter plot: height of building y in feet (axis from 400 to 900) against number of stories x (axis from 30 to 70)]

Hint:
$\sum x = 444$, $\sum y = 5968$, $\sum xy = 275237$
$S_{xx} = 870.4$, $S_{yy} = 123921.6$, $b_0 = 73.5391$

a) Does the scatter plot suggest an approximate linear relationship? Explain.
b) Determine the strength of the relationship between the height of a building and the
number of stories in the building. Interpret the value.
c) Fit a least squares line.
d) Can we conclude that the number of stories in a building has a significant positive
effect on its height at the 5% significance level?

Solution:
a) Yes. The data points lie close to a straight line.

b) $S_{xy} = 275237 - \frac{(444)(5968)}{10} = 10257.8$
$r = \frac{10257.8}{\sqrt{(870.4)(123921.6)}} = 0.9877$
The correlation coefficient suggests a strong positive linear relationship between
the height of a building and the number of stories in the building.
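
The calculation in (b) needs only the summary values given in the hint. A minimal sketch, with variable names chosen here for illustration:

```python
import math

n = 10
sum_x, sum_y, sum_xy = 444, 5968, 275237   # sums from the hint
s_xx, s_yy = 870.4, 123921.6               # corrected sums of squares from the hint

# Corrected sum of cross products
s_xy = sum_xy - sum_x * sum_y / n

# Pearson correlation coefficient
r = s_xy / math.sqrt(s_xx * s_yy)
print(f"S_xy = {s_xy:.1f}, r = {r:.4f}")
```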

c) $b_1 = \frac{10257.8}{870.4} = 11.7852$
$\hat{y} = 73.5391 + 11.7852x$
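
The fitted line in (c) can be rebuilt from the summary values, and the intercept given in the hint can be confirmed through $b_0 = \bar{y} - b_1\bar{x}$. In this sketch the function `predict_height` and the 50-story input are hypothetical illustrations, not part of the tutorial.

```python
n = 10
sum_x, sum_y = 444, 5968          # sums from the hint
s_xx, s_xy = 870.4, 10257.8       # S_xx from the hint, S_xy from part (b)

b1 = s_xy / s_xx                  # slope
b0 = sum_y / n - b1 * sum_x / n   # intercept: y_bar - b1 * x_bar

def predict_height(stories):
    """Fitted height in feet for a building with the given number of stories."""
    return b0 + b1 * stories

print(f"y_hat = {b0:.4f} + {b1:.4f} x")
print(f"fitted height for 50 stories: {predict_height(50):.1f} feet")
```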

d) $H_0: \beta_1 = 0$, $H_1: \beta_1 > 0$
$S_e^2 = \frac{123921.6 - 11.7852(10257.8)}{8} = 378.9219$
$t_{test} = \frac{11.7852 - 0}{\sqrt{378.9219 / 870.4}} = 17.8616$
$t_{0.05, 8} = 1.8595$
Since $t_{test} > t_{0.05, 8}$, reject $H_0$.
There is enough evidence to conclude that the number of stories in a building has a significant
positive effect on its height at the 5% significance level.
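
The arithmetic in (d) can be replayed step by step. This sketch rounds the slope to four decimals before computing $S_e^2$, which is an assumption about how the tutorial's figures were obtained; the critical value 1.8595 is the one quoted in the solution.

```python
import math

n = 10
s_xx, s_yy, s_xy = 870.4, 123921.6, 10257.8

b1 = round(s_xy / s_xx, 4)                  # 11.7852, rounded as in the working
s_e2 = (s_yy - b1 * s_xy) / (n - 2)         # residual variance estimate
t_test = (b1 - 0) / math.sqrt(s_e2 / s_xx)  # H0: beta_1 = 0

t_critical = 1.8595                         # t(0.05, 8) as quoted in the solution
reject_h0 = t_test > t_critical             # one-tailed test, H1: beta_1 > 0
print(f"S_e^2 = {s_e2:.4f}, t = {t_test:.4f}, reject H0: {reject_h0}")
```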














3. Suppose that the sales manager of a large automotive parts distributor wants to estimate
the total annual sales of a region. Several factors appear to be related to sales, including
the number of retail outlets ($X_1$), number of automobiles registered ($X_2$), personal incomes
($X_3$), average age of automobiles ($X_4$) and number of supervisors ($X_5$). The following
output shows the results of the analysis obtained by the sales manager. Based on the output,
answer the following questions.

ANOVA(b)
Model          Sum of Squares   df   Mean Square   F         Sig.
1 Regression   1594.237         5    318.847       148.003   .000(a)
  Residual     8.617            4    2.154
  Total        1602.855         9
a Predictors: (Constant), x5, x3, x2, x4, x1
b Dependent Variable: sales


Coefficients(a)
               Unstandardized Coefficients   Standardized Coefficients
Model          B          Std. Error         Beta                        t        Sig.
1 (Constant)   -20.157    5.041                                          -3.998   .016
  x1           .000       .003               -.020                       -.148    .889
  x2           1.696      .514               .311                        3.299    .030
  x3           .425       .043               .922                        9.775    .001
  x4           2.316      .932               .144                        2.483    .068
  x5           -.145      .203               -.042                       -.714    .515
a Dependent Variable: sales

Correlations
                             sales      x1         x2         x3         x4      x5
sales  Pearson Correlation   1          .899(**)   .604       .962(**)   -.369   .243
       Sig. (2-tailed)                  .000       .064       .000       .294    .500
x1     Pearson Correlation   .899(**)   1          .775(**)   .820(**)   -.504   .144
       Sig. (2-tailed)       .000                  .008       .004       .137    .691
x2     Pearson Correlation   .604       .775(**)   1          .400       -.314   .364
       Sig. (2-tailed)       .064       .008                  .252       .377    .301
x3     Pearson Correlation   .962(**)   .820(**)   .400       1          -.439   .115
       Sig. (2-tailed)       .000       .004       .252                  .204    .751
x4     Pearson Correlation   -.369      -.504      -.314      -.439      1       .471
       Sig. (2-tailed)       .294       .137       .377       .204               .169
x5     Pearson Correlation   .243       .144       .364       .115       .471    1
       Sig. (2-tailed)       .500       .691       .301       .751       .169
** Correlation is significant at the 0.01 level (2-tailed).


a) Write down the estimated equation of the regression line.
b) Is there sufficient evidence to indicate that there is a positive relationship between
sales and $X_2$ at the 2.5% level of significance?
c) At the 5% significance level, test the overall validity of the model.
d) Which explanatory variables have no significant effect on Y at the 5% significance level?
e) Which variable(s) have a negative relationship with $X_1$?
f) Which two variables have the strongest relationship?
g) Describe the strength and direction of the relationship between $X_5$ and the dependent variable.
h) State the value of the coefficient of determination and interpret it.

Solution:
a) $\hat{y} = -20.157 + 0.000x_1 + 1.696x_2 + 0.425x_3 + 2.316x_4 - 0.145x_5$

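b) $H_0: \rho = 0$, $H_1: \rho > 0$
From the Correlations table, $r = 0.604$ with Sig. (2-tailed) $= .064$, so the one-tailed
p-value is $.032 > .025$.
Fail to reject $H_0$.
There is insufficient evidence of a positive relationship between sales and $X_2$ at the 2.5%
significance level.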
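
Part (b) asks for a one-sided test at 2.5%, while the output reports two-tailed significance values. One reading that agrees with the "fail to reject" conclusion is to halve the two-tailed Sig. for the correlation between sales and x2. The sketch below assumes that reading; it is not confirmed by the tutorial's own working, which is not shown.

```python
alpha = 0.025               # one-tailed significance level from the question

# Values read from the Correlations table (sales versus x2)
r_sales_x2 = 0.604
sig_two_tailed = 0.064

# Halving is valid only when the sample correlation has the hypothesised sign
if r_sales_x2 > 0:
    p_one_tailed = sig_two_tailed / 2
else:
    p_one_tailed = 1 - sig_two_tailed / 2

reject_h0 = p_one_tailed < alpha
print(f"one-tailed p = {p_one_tailed:.3f}, reject H0: {reject_h0}")
```

Note that the Coefficients table gives a different Sig. for x2 (.030), because it tests the partial effect of x2 with the other predictors in the model.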

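c) $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$, $H_1$: at least one $\beta_i \neq 0$
$F = 148.003$, Sig. $= .000 < .05$
Reject $H_0$.
The model is valid at the 5% significance level.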
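
The F statistic used in (c) comes straight from the ANOVA table as the ratio of the two mean squares. In this sketch the critical value 6.26 for $F_{0.05;5,4}$ is an assumption taken from a standard F table; the small difference from the printed 148.003 is due to rounding in the table entries.

```python
# Entries from the ANOVA table
ss_regression, df_regression = 1594.237, 5
ss_residual, df_residual = 8.617, 4

ms_regression = ss_regression / df_regression   # mean square regression
ms_residual = ss_residual / df_residual         # mean square error
f_test = ms_regression / ms_residual

f_critical = 6.26                               # assumed table value F(0.05; 5, 4)
reject_h0 = f_test > f_critical
print(f"F = {f_test:.3f}, reject H0: {reject_h0}")
```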

d) $X_1$, $X_4$ and $X_5$

e) $X_4$

f) $X_3$ and sales

g) There is a weak positive relationship between $X_5$ and the dependent variable.
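
Answers (d) and (e) amount to scanning columns of the output. A small sketch, with the Sig. values copied from the Coefficients table and the correlations with x1 copied from the Correlations table; the dictionary names are illustrative.

```python
alpha = 0.05

# Sig. column of the Coefficients table
sig = {"x1": 0.889, "x2": 0.030, "x3": 0.001, "x4": 0.068, "x5": 0.515}
not_significant = [name for name, p in sig.items() if p > alpha]

# Pearson correlations with x1 from the Correlations table
corr_with_x1 = {"sales": 0.899, "x2": 0.775, "x3": 0.820, "x4": -0.504, "x5": 0.144}
negative_with_x1 = [name for name, r in corr_with_x1.items() if r < 0]

print("no significant effect at 5%:", not_significant)
print("negative relationship with x1:", negative_with_x1)
```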

h) $R^2 = \frac{1594.237}{1602.855} = 0.9946$
99.46% of the variation in total annual sales can be explained by the variation in
number of retail outlets ($X_1$), number of automobiles registered ($X_2$), personal incomes
($X_3$), average age of automobiles ($X_4$) and number of supervisors ($X_5$).
The remaining 0.54% is unexplained and is attributed to error.










4. The electric power consumed ($y$) each month by a chemical plant is thought to be related to
the average ambient temperature ($x_1$), the number of days in the month ($x_2$), the average
product purity ($x_3$) and the tons of product produced ($x_4$). The past year's historical data
are available and are recorded. The output below displays the results of the analysis.

Correlations
                           y          x1         x2         x3         x4
y    Pearson Correlation   1          .744(**)   .802(**)   .890(**)   .823(**)
     Sig. (2-tailed)       .          .001       .000       .000       .000
     N                     15         15         15         15         15
x1   Pearson Correlation   .744(**)   1          .849(**)   .914(**)   .934(**)
     Sig. (2-tailed)       .001       .          .000       .000       .000
     N                     15         15         15         15         15
x2   Pearson Correlation   .802(**)   .849(**)   1          .769(**)   .976(**)
     Sig. (2-tailed)       .000       .000       .          .001       .000
     N                     15         15         15         15         15
x3   Pearson Correlation   .890(**)   .914(**)   .769(**)   1          .868(**)
     Sig. (2-tailed)       .000       .000       .001       .          .000
     N                     15         15         15         15         15
x4   Pearson Correlation   .823(**)   .934(**)   .976(**)   .868(**)   1
     Sig. (2-tailed)       .000       .000       .000       .000       .
     N                     15         15         15         15         15
** Correlation is significant at the 0.01 level (2-tailed).

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .973(a)   .946       .925                3.23125
a Predictors: (Constant), x4, x3, x1, x2

ANOVA(b)
Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   1838.698         4    459.675       44.026   .000(a)
  Residual     104.410          10   10.441
  Total        1943.108         14
a Predictors: (Constant), x4, x3, x1, x2
b Dependent Variable: y

Coefficients(a)
               Unstandardized Coefficients   Standardized Coefficients
Model          B          Std. Error         Beta                        t        Sig.
1 (Constant)   3.716      2.274                                          1.634    .133
  x1           -1.400     .727               -.643                       -1.924   .083
  x2           1.335      .564               1.497                       2.367    .040
  x3           5.896      .856               1.453                       6.891    .000
  x4           -.755      .545               -1.299                      -1.385   .196
a Dependent Variable: y

a) State the sample size for the above study.
b) Which variables are the independent variables?
c) Which variable is the dependent variable?
d) What is the intercept value?
e) Which independent variable has the strongest relationship with the power consumption?
State the value.
f) Interpret the relationship between the average ambient temperature and power
consumption. State whether the relationship is significant at $\alpha = 0.05$.
g) Write down the regression model obtained.
h) Interpret the values of the intercept and the temperature coefficient in the equation.
i) List the X variables that cause Y to decrease when they increase.
j) List the independent variables that have a significant effect on power consumption at
$\alpha = 0.05$.
k) Based on the output, test $H_0: \beta_4 = 0$ at the 2% level of significance. Should the
variable tons of product produced be included in the model?
l) Predict the power consumption for a month in which $x_1 = 25$°F, $x_2 = 24$ days,
$x_3 = 15$% and $x_4 = 98$ tons.
m) How well does the model fit the data?

Solution:
a) n = 15

b) The average ambient temperature ($x_1$)
The number of days in the month ($x_2$)
The average product purity ($x_3$)
The tons of product produced ($x_4$)

c) The electric power consumed (y)

d) 3.716

e) The average product purity ($x_3$), $r = 0.890$

f) $r = 0.744$, so there is a strong positive relationship between the average ambient
temperature and power consumption.
$H_0: \rho = 0$, $H_1: \rho \neq 0$
Sig. $= .001 < .05$
Reject $H_0$.
The relationship is significant at the 5% significance level.

g) $\hat{y} = 3.716 - 1.400x_1 + 1.335x_2 + 5.896x_3 - 0.755x_4$

h) When all the independent variables are zero, the expected power consumption is 3.716.
Assuming the other variables are held constant, for every 1°F increase in temperature, the
power consumption will decrease by 1.4.

i) $x_1$ and $x_4$

j) $x_2$ and $x_3$

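k) $H_0: \beta_4 = 0$, $H_1: \beta_4 \neq 0$
Sig. $= .196 > .02$
Fail to reject $H_0$.
The tons of product produced has no significant effect on power consumption at the 2% level
of significance.
Therefore, tons of product produced should not be included in the model.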
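
The t column of the Coefficients table is each estimate divided by its standard error, so the decision in (k) can also be reached by comparing the ratio for x4 with a critical value. In this sketch the value 2.764 (10 degrees of freedom, 0.01 in each tail) is an assumption taken from a standard t table, and small differences from the printed t values come from rounding in the table.

```python
# (B, Std. Error) pairs from the Coefficients table
coefficients = {
    "x1": (-1.400, 0.727),
    "x2": (1.335, 0.564),
    "x3": (5.896, 0.856),
    "x4": (-0.755, 0.545),
}

# t ratio for H0: beta_i = 0 is the estimate over its standard error
t_ratios = {name: b / se for name, (b, se) in coefficients.items()}

t_critical = 2.764   # assumed table value t(0.01, 10), two-tailed test at 2%
keep_x4 = abs(t_ratios["x4"]) > t_critical

print(f"t for x4 = {t_ratios['x4']:.3f}, keep x4 in the model: {keep_x4}")
```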

l) $\hat{y} = 3.716 - 1.400(25) + 1.335(24) + 5.896(15) - 0.755(98) = 15.206$
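
The prediction in (l) is the intercept plus the sum of each coefficient times its input. A minimal sketch using the B column of the Coefficients table; the function name `predict_power` is hypothetical.

```python
# Estimates from the B column of the Coefficients table
intercept = 3.716
slopes = [-1.400, 1.335, 5.896, -0.755]   # x1, x2, x3, x4

def predict_power(inputs):
    """Fitted power consumption for [temperature, days, purity, tons]."""
    return intercept + sum(b * v for b, v in zip(slopes, inputs))

# Month described in part (l): 25 deg F, 24 days, 15% purity, 98 tons
predicted = predict_power([25, 24, 15, 98])
print(f"predicted power consumption = {predicted:.3f}")
```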

m) $R^2 = 0.946$
94.6% of the variation in electric power consumed can be explained by the variation in
average ambient temperature, number of days in the month, average product purity
and tons of product produced.
The remaining 5.4% is unexplained and is attributed to error.
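
The Model Summary figures can be recovered from the ANOVA table, which is a useful consistency check on the output. The sketch below uses $R^2 = SSR/SST$, the usual adjusted form $1 - (1 - R^2)(n - 1)/(n - k - 1)$, and the square root of the residual mean square for the standard error of the estimate.

```python
import math

# Entries from the ANOVA table
ss_regression, ss_residual, ss_total = 1838.698, 104.410, 1943.108
n, k = 15, 4                      # sample size and number of predictors

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
std_error = math.sqrt(ss_residual / (n - k - 1))

print(f"R^2 = {r_squared:.3f}")
print(f"adjusted R^2 = {adj_r_squared:.3f}")
print(f"std. error of the estimate = {std_error:.5f}")
```

The adjusted value penalises the number of predictors, which matters here because parts (j) and (k) suggest that not every predictor contributes significantly.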
