Problem 1:

The dataset “Exam revision.sav” contains a sample of 40 students, with data on exam scores,
hours spent on revising, anxiety levels and A-level entry point scores for each student.
Construct a multiple linear regression model to explain the effect of the three independent
variables, namely hours spent on revising, anxiety levels and A-level entry point scores, on
the exam scores.

Problem 2:

The dataset “Net sales.sav” contains a sample of 27 observations, with data on Annual net
sales, No. of square feet, Inventory, Amount spent on advertising, Size of sales district and
No. of competing stores in district. Construct a multiple linear regression model to show the
effect of the five independent variables, namely No. of square feet, Inventory, Amount spent
on advertising, Size of sales district and No. of competing stores in district, on Annual net
sales.
Problem 1
To fit a regression model for the given data using SPSS:

• Identify the dependent and the independent variables from the given data set.
  Independent variables: hours spent on revising, anxiety, A-level entry points
  Dependent variable: exam scores
  Multiple linear regression model:
  Exam scores = β0 + β1 (hours spent on revising) + β2 (anxiety) + β3 (A-level entry points) + ε
• Draw the scatter plot for the given data.
• Perform the regression analysis.
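The analysis in this report is carried out in SPSS. Purely as an illustration of what fitting the model involves, the following Python sketch estimates the coefficients by ordinary least squares from the normal equations. The data are synthetic placeholders generated inside the script (they are not the values in “Exam revision.sav”), and the helper names solve_linear_system and fit_ols are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] from the normal equations."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic placeholder data: 40 rows of (hours, anxiety, A-level points).
# The "true" coefficients are illustrative only and no noise is added.
random.seed(0)
true_beta = [5.0, 0.5, 0.1, 2.0]
rows = [(random.uniform(0, 40), random.uniform(30, 100), random.uniform(10, 25))
        for _ in range(40)]
y = [true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], r)) for r in rows]

beta_hat = fit_ols(rows, y)
print([round(b, 3) for b in beta_hat])
```

Because the synthetic responses contain no error term, the estimates should reproduce the illustrative coefficients; with real data the estimates would differ from any assumed values.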


Scatter Plot of dependent and independent variables


The scatter plot shows:

• A positive linear correlation between hours spent revising (independent variable) and
  exam score (dependent variable).
• A positive linear correlation between A-level entry points (independent variable) and
  exam score (dependent variable).
• A weak negative correlation between anxiety (independent variable) and exam score
  (dependent variable).

Correlation coefficients between dependent and independent variables


                                             exam score    hours spent   anxiety       A-level entry
                                                           revising                    points
exam score             Pearson Correlation   1             .832**        -.112         .902**
                       Sig. (2-tailed)                     .000          .493          .000
                       N                     40            40            40            40
hours spent revising   Pearson Correlation   .832**        1             -.333*        .778**
                       Sig. (2-tailed)       .000                        .036          .000
                       N                     40            40            40            40
anxiety                Pearson Correlation   -.112         -.333*        1             -.230
                       Sig. (2-tailed)       .493          .036                        .153
                       N                     40            40            40            40
A-level entry points   Pearson Correlation   .902**        .778**        -.230         1
                       Sig. (2-tailed)       .000          .000          .153
                       N                     40            40            40            40

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

The above table gives the correlation coefficients as:
• 0.832 between exam score and hours spent revising, which indicates a strong positive
  correlation.
• 0.902 between exam score and A-level entry points, which indicates a strong positive
  correlation.
• -0.112 between exam score and anxiety, which indicates a weak negative correlation
  that is not statistically significant (p = 0.493).
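The significance flags in the correlation table can be cross-checked by hand. A minimal Python sketch, assuming the usual test statistic t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom; the critical value 2.024 is an assumed table value for a two-sided 5% test with 38 degrees of freedom, and the function name is hypothetical.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing H0: no linear correlation (n - 2 df)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


N_OBS = 40
T_CRIT = 2.024  # assumed two-sided 5% critical value of t with 38 df (from tables)

# Correlations with exam score, as reported in the table above.
correlations = {
    "hours spent revising": 0.832,
    "anxiety": -0.112,
    "A-level entry points": 0.902,
}

t_values = {name: correlation_t_statistic(r, N_OBS) for name, r in correlations.items()}
for name, t in t_values.items():
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{name}: t = {t:.3f} ({verdict} at the 5% level)")
```

Under these assumptions the sketch flags hours spent revising and A-level entry points as significant and anxiety as not significant, in line with the Sig. (2-tailed) values in the table.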

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .938a   .881       .871                4.09510

a. Predictors: (Constant), A-level entry points, anxiety, hours spent revising

Interpretation: R² = 0.881, which shows that the regression model explains about 88% of the
variation in exam scores.
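The Adjusted R Square in the Model Summary can be reproduced from R², the sample size and the number of predictors. A short Python sketch, assuming the standard adjustment 1 − (1 − R²)(n − 1)/(n − k − 1); the function name is hypothetical.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared, penalising R-squared for the number of predictors."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# R Square = .881 with 40 observations and 3 predictors, as reported above.
adj_r2 = adjusted_r_squared(0.881, n_obs=40, n_predictors=3)
print(round(adj_r2, 3))  # prints 0.871
```

The result agrees with the value .871 reported by SPSS.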
ANOVA table

Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression   4459.062         3    1486.354      88.633   .000b
   Residual     603.713          36   16.770
   Total        5062.775         39

a. Dependent Variable: exam score
b. Predictors: (Constant), A-level entry points, anxiety, hours spent revising

Since the p-value of the F test (reported as .000, i.e. p < 0.001) is less than 0.05, the overall
effect of the three independent variables on the dependent variable is significant.
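The F statistic, R² and the standard error of the estimate all follow from the sums of squares and degrees of freedom in the ANOVA table. The Python sketch below recomputes them from the reported figures; small differences in the last decimal place are to be expected because the table values are rounded.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table above.
ss_regression, df_regression = 4459.062, 3
ss_residual, df_residual = 603.713, 36

ms_regression = ss_regression / df_regression  # mean square, regression
ms_residual = ss_residual / df_residual        # mean square, residual

f_statistic = ms_regression / ms_residual
r_squared = ss_regression / (ss_regression + ss_residual)
std_error_estimate = math.sqrt(ms_residual)

print(f"F = {f_statistic:.3f}")
print(f"R squared = {r_squared:.3f}")
print(f"Std. error of the estimate = {std_error_estimate:.4f}")
```

These values match the reported F = 88.633, R Square = .881 and Std. Error of the Estimate = 4.09510 up to rounding.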

Estimated regression coefficients


Model                   Unstandardized Coefficients   Standardized Coefficients   t        Sig.
                        B          Std. Error         Beta
(Constant)              -15.270    5.599                                          -2.727   .010
hours spent revising    .489       .118               .394                        4.159    .000
anxiety                 .101       .037               .166                        2.709    .010
A-level entry points    2.234      .324               .634                        6.903    .000

a. Dependent Variable: exam score

From the above table, the estimated regression coefficients are β0 = -15.270, β1 = 0.489,
β2 = 0.101 and β3 = 2.234.
• The p-value of β0 (0.010) is less than 0.05, so β0 is significant.
• The p-value of β1 (reported as .000, i.e. p < 0.001) is less than 0.05, so β1 is significant.
• The p-value of β2 (0.010) is less than 0.05, so β2 is significant.
• The p-value of β3 (reported as .000, i.e. p < 0.001) is less than 0.05, so β3 is significant.
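Each t value in the coefficients table is the estimate divided by its standard error. The following Python sketch recomputes the ratios from the rounded B and Std. Error values, so the results are expected to agree with the SPSS t column only approximately.

```python
# (B, Std. Error, reported t) for each term, taken from the coefficients table.
coefficients = {
    "(Constant)": (-15.270, 5.599, -2.727),
    "hours spent revising": (0.489, 0.118, 4.159),
    "anxiety": (0.101, 0.037, 2.709),
    "A-level entry points": (2.234, 0.324, 6.903),
}

t_ratios = {name: b / se for name, (b, se, _) in coefficients.items()}
for name, (b, se, reported_t) in coefficients.items():
    print(f"{name}: B / SE = {t_ratios[name]:.3f}, reported t = {reported_t}")
```

The small gaps between the recomputed and reported values come from SPSS working with unrounded estimates.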


• The fitted regression model is given as:

Exam scores = -15.270 + 0.489 (hours spent revising) + 0.101 (anxiety)
+ 2.234 (A-level entry points)
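To show how the fitted model is used, the Python sketch below evaluates it for one student. The input values are hypothetical and are not taken from the data set; the function name is also an assumption made for illustration.

```python
def predict_exam_score(hours_revising, anxiety, a_level_points):
    """Predicted exam score from the fitted coefficients reported above."""
    return -15.270 + 0.489 * hours_revising + 0.101 * anxiety + 2.234 * a_level_points


# Hypothetical student: 20 hours of revision, anxiety level 70, 20 A-level entry points.
example_score = predict_exam_score(hours_revising=20, anxiety=70, a_level_points=20)
print(round(example_score, 2))
```

A prediction of this kind is only meaningful for inputs within the range of values observed in the sample.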

• The regression coefficients β0, β1, β2 and β3 are all significant:

β0 = -15.270, p < 0.05
β1 = 0.489, p < 0.05
β2 = 0.101, p < 0.05
β3 = 2.234, p < 0.05

• The overall effect of the three independent variables, namely hours spent revising,
  anxiety and A-level entry points, on exam scores is significant at the 5% level of
  significance.

• The model explains about 88% of the variation in exam scores.
Problem 2
To fit a regression model for the given data using SPSS:

• Identify the dependent and the independent variables from the given data set.
  Independent variables: number sq. ft./1000, inventory/$1000, amount spent on
  advertising/$1000, size of sales district/1000 families, number of competing stores in
  district
  Dependent variable: annual net sales/$1000
• Multiple linear regression model:
  Annual net sales = β0 + β1 (number sq. ft.) + β2 (inventory) + β3 (amount spent on
  advertising) + β4 (size of sales district) + β5 (number of competing stores in district) + ε
• Draw the scatter plot for the data.
• Perform the regression analysis.

Scatter plot for the dependent and independent variables

Interpretation: The scatter plot shows that

• There is a positive correlation between the dependent variable (annual net sales) and
  the independent variable number sq. ft.
• There is a positive correlation between the dependent variable (annual net sales) and
  the independent variable inventory.
• There is a positive correlation between the dependent variable (annual net sales) and
  the independent variable amount spent on advertising.
• There is a positive correlation between the dependent variable (annual net sales) and
  the independent variable size of sales district.
• There is a negative correlation between the dependent variable (annual net sales) and
  the independent variable number of competing stores in district.

Correlation coefficients between the dependent and independent variables

                                         annual net     number sq.     inventory/     amount spent   size of sales  number of
                                         sales/$1000    ft./1000       $1000          on advertizing district/1000  competing
                                                                                      /$1000         families       stores in
                                                                                                                    district
annual net          Pearson Correlation  1              .873**         .945**         .920**         .955**         -.912**
sales/$1000         Sig. (2-tailed)                     .000           .000           .000           .000           .000
                    N                    27             27             27             27             27             27
number sq.          Pearson Correlation  .873**         1              .808**         .726**         .820**         -.761**
ft./1000            Sig. (2-tailed)      .000                          .000           .000           .000           .000
                    N                    27             27             27             27             27             27
inventory/$1000     Pearson Correlation  .945**         .808**         1              .902**         .859**         -.807**
                    Sig. (2-tailed)      .000           .000                          .000           .000           .000
                    N                    27             27             27             27             27             27
amount spent on     Pearson Correlation  .920**         .726**         .902**         1              .807**         -.856**
advertizing/$1000   Sig. (2-tailed)      .000           .000           .000                          .000           .000
                    N                    27             27             27             27             27             27
size of sales       Pearson Correlation  .955**         .820**         .859**         .807**         1              -.880**
district/1000       Sig. (2-tailed)      .000           .000           .000           .000                          .000
families            N                    27             27             27             27             27             27
number of           Pearson Correlation  -.912**        -.761**        -.807**        -.856**        -.880**        1
competing stores    Sig. (2-tailed)      .000           .000           .000           .000           .000
in district         N                    27             27             27             27             27             27

**. Correlation is significant at the 0.01 level (2-tailed).

Interpretation: The above table shows that
• There is a strong positive correlation (0.873) between the dependent variable (annual
  net sales) and the independent variable number sq. ft.
• There is a strong positive correlation (0.945) between the dependent variable (annual
  net sales) and the independent variable inventory.
• There is a strong positive correlation (0.920) between the dependent variable (annual
  net sales) and the independent variable amount spent on advertising.
• There is a strong positive correlation (0.955) between the dependent variable (annual
  net sales) and the independent variable size of sales district.
• There is a strong negative correlation (-0.912) between the dependent variable (annual
  net sales) and the independent variable number of competing stores in district.

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .997a   .994       .992                17.212

a. Predictors: (Constant), number of competing stores in district, number sq. ft./1000,
inventory/$1000, size of sales district/1000 families, amount spent on advertizing/$1000

R² = 0.994, which shows that the regression model explains 99.4% of the variation in annual
net sales.

ANOVA table

Model           Sum of Squares   df   Mean Square   F         Sig.
1  Regression   953145.069       5    190629.014    643.438   .000b
   Residual     6221.597         21   296.267
   Total        959366.667       26

a. Dependent Variable: annual net sales/$1000
b. Predictors: (Constant), number of competing stores in district, number sq. ft./1000,
inventory/$1000, size of sales district/1000 families, amount spent on advertizing/$1000

Since the p-value of the F test (reported as .000, i.e. p < 0.001) is less than 0.05, the overall
effect of the independent variables on the dependent variable is significant.

Estimated regression coefficients

Model                                       Unstandardized Coefficients   Standardized Coefficients   t        Sig.
                                            B          Std. Error         Beta
1  (Constant)                               -48.507    31.269                                         -1.551   .136
   number sq. ft./1000                      13.851     3.078              .150                        4.500    .000
   inventory/$1000                          .214       .052               .213                        4.099    .001
   amount spent on advertizing/$1000        12.145     2.512              .237                        4.835    .000
   size of sales district/1000 families     13.992     1.730              .377                        8.088    .000
   number of competing stores in district   -3.581     1.772              -.091                       -2.021   .056

a. Dependent Variable: annual net sales/$1000

From the above table:
• β0 = -48.507. Since the p-value of β0 (0.136) is > 0.05, β0 is not significant.
• β1 = 13.851. Since the p-value of β1 (reported as .000) is < 0.05, β1 is significant.
• β2 = 0.214. Since the p-value of β2 (0.001) is < 0.05, β2 is significant.
• β3 = 12.145. Since the p-value of β3 (reported as .000) is < 0.05, β3 is significant.
• β4 = 13.992. Since the p-value of β4 (reported as .000) is < 0.05, β4 is significant.
• β5 = -3.581. Since the p-value of β5 (0.056) is slightly above 0.05, β5 is not significant
  at the 5% level, although it is significant at the 10% level.
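An alternative way to read the coefficients table is through approximate 95% confidence intervals, B ± t × Std. Error. A coefficient whose interval contains zero has a p-value above 0.05. The Python sketch below applies this to the reported estimates; the critical value 2.080 is an assumed table value for a two-sided 5% test with 21 residual degrees of freedom, and the function name is hypothetical.

```python
T_CRIT = 2.080  # assumed two-sided 5% critical value of t with 21 df (from tables)

# (B, Std. Error) for each term, taken from the coefficients table above.
estimates = {
    "(Constant)": (-48.507, 31.269),
    "number sq. ft./1000": (13.851, 3.078),
    "inventory/$1000": (0.214, 0.052),
    "amount spent on advertizing/$1000": (12.145, 2.512),
    "size of sales district/1000 families": (13.992, 1.730),
    "number of competing stores in district": (-3.581, 1.772),
}


def confidence_interval(b, se, t_crit=T_CRIT):
    """Approximate 95% confidence interval for a regression coefficient."""
    half_width = t_crit * se
    return b - half_width, b + half_width


intervals = {name: confidence_interval(b, se) for name, (b, se) in estimates.items()}
contains_zero = {name: lo <= 0 <= hi for name, (lo, hi) in intervals.items()}

for name, (lo, hi) in intervals.items():
    note = "contains zero" if contains_zero[name] else "excludes zero"
    print(f"{name}: ({lo:.3f}, {hi:.3f}) {note}")
```

With these inputs only the intervals for the constant and for the number of competing stores contain zero, which matches their Sig. values of .136 and .056.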

• The fitted regression model is given as:
Annual net sales = -48.507 + 13.851 (number sq. ft.) + 0.214 (inventory) + 12.145 (amount
spent on advertising) + 13.992 (size of sales district) - 3.581 (number of competing
stores in district)
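Each coefficient in the fitted model is the change in predicted annual net sales (in $1000) for a one-unit increase in that variable with the others held fixed. The Python sketch below illustrates this by comparing two stores that differ only in the number of competing stores. The store characteristics are hypothetical values chosen for illustration, and the function and variable names are assumptions.

```python
# Fitted coefficients as reported in the coefficients table.
COEFFICIENTS = {
    "constant": -48.507,
    "sq_ft": 13.851,          # number sq. ft./1000
    "inventory": 0.214,       # inventory/$1000
    "advertising": 12.145,    # amount spent on advertising/$1000
    "district_size": 13.992,  # size of sales district/1000 families
    "competitors": -3.581,    # number of competing stores in district
}


def predict_net_sales(sq_ft, inventory, advertising, district_size, competitors):
    """Predicted annual net sales in $1000 from the fitted model."""
    c = COEFFICIENTS
    return (c["constant"] + c["sq_ft"] * sq_ft + c["inventory"] * inventory
            + c["advertising"] * advertising + c["district_size"] * district_size
            + c["competitors"] * competitors)


# Hypothetical store (illustrative values, not taken from "Net sales.sav").
base_store = dict(sq_ft=3.0, inventory=300.0, advertising=8.0,
                  district_size=10.0, competitors=8)
one_more_competitor = dict(base_store, competitors=base_store["competitors"] + 1)

change = predict_net_sales(**one_more_competitor) - predict_net_sales(**base_store)
print(f"Change in predicted sales for one extra competitor: {change:.3f} ($1000)")
```

The difference equals the coefficient on the number of competing stores, whatever values are chosen for the other variables.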

• The regression coefficients β1, β2, β3 and β4 are significant at the 5% level; β0 and β5
  are not.

β0 = -48.507 and the p-value of β0 (0.136) is > 0.05
β1 = 13.851 and the p-value of β1 is < 0.05
β2 = 0.214 and the p-value of β2 is < 0.05
β3 = 12.145 and the p-value of β3 is < 0.05
β4 = 13.992 and the p-value of β4 is < 0.05
β5 = -3.581 and the p-value of β5 (0.056) is slightly above 0.05

• The overall effect of the independent variables on the dependent variable is significant
  at the 5% level of significance.
• The model explains about 99.4% of the variation in annual net sales.

