
Q1. Perform the following tasks in SPSS.

a) Use catalog.sav sample data file to fit a multiple linear model to predict "Sales of
Men's Clothing" on the basis of variables "Number of Catalogs Mailed", "Number
of Pages in Catalog", "Number of Phone Lines Open for Ordering", "Amount
Spent on Print Advertising" and "Number of Customer Service Representatives".
Use forward selection, backward elimination and enter methods in this respect.
Interpret your result in each case.
Variables Entered/Removed (a)

Model  Variables Entered  Variables Removed  Method
1      mail               .                  Forward (Criterion: Probability-of-F-to-enter <= .050)
2      phone              .                  Forward (Criterion: Probability-of-F-to-enter <= .050)
3      print              .                  Forward (Criterion: Probability-of-F-to-enter <= .050)
4      page               .                  Forward (Criterion: Probability-of-F-to-enter <= .050)

a. Dependent Variable: men

Model Summary (e)

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate
1      .803a  .645      .642               3785.49685
2      .877b  .770      .766               3061.36064
3      .885c  .784      .778               2980.12178
4      .891d  .794      .787               2919.90929

a. Predictors: (Constant), mail
b. Predictors: (Constant), mail, phone
c. Predictors: (Constant), mail, phone, print
d. Predictors: (Constant), mail, phone, print, page
e. Dependent Variable: men

Name: M. Hashim Javed Roll: BT-588221


ANOVA (a)

Model              Sum of Squares  df   Mean Square     F        Sig.
1      Regression  3069712621.002  1    3069712621.002  214.216  .000b
       Residual    1690938397.841  118  14329986.422
       Total       4760651018.843  119
2      Regression  3664135333.036  2    1832067666.518  195.485  .000c
       Residual    1096515685.807  117  9371928.939
       Total       4760651018.843  119
3      Regression  3730440424.702  3    1243480141.567  140.014  .000d
       Residual    1030210594.141  116  8881125.812
       Total       4760651018.843  119
4      Regression  3780175938.309  4    945043984.577   110.844  .000e
       Residual    980475080.535   115  8525870.266
       Total       4760651018.843  119

a. Dependent Variable: men
b. Predictors: (Constant), mail
c. Predictors: (Constant), mail, phone
d. Predictors: (Constant), mail, phone, print
e. Predictors: (Constant), mail, phone, print, page

Coefficients (a)

                   Unstandardized Coefficients  Standardized Coefficients
Model              B           Std. Error       Beta                       t       Sig.
1      (Constant)  -14064.614  2099.365                                    -6.699  .000
       mail        2.991       .204             .803                       14.636  .000
2      (Constant)  -15361.047  1705.559                                    -9.006  .000
       mail        1.971       .209             .529                       9.424   .000
       phone       334.103     41.952           .447                       7.964   .000
3      (Constant)  -20665.869  2554.586                                    -8.090  .000
       mail        1.862       .207             .500                       8.977   .000
       phone       339.159     40.880           .454                       8.296   .000
       print       .218        .080             .121                       2.732   .007
4      (Constant)  -23898.558  2838.361                                    -8.420  .000
       mail        1.847       .203             .496                       9.083   .000
       phone       327.802     40.329           .439                       8.128   .000
       print       .208        .078             .115                       2.656   .009
       page        50.508      20.912           .104                       2.415   .017

a. Dependent Variable: men
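The summary statistics of the final forward-selection model can be cross-checked from the ANOVA table, and the fitted equation can be written out from the Coefficients table. The following is a minimal Python sketch of that arithmetic, not part of the SPSS output; the function name `predict_men` and the predictor values passed to it are hypothetical and serve only to illustrate how the equation is used.

```python
# Coefficients of forward-selection model 4, copied from the Coefficients table.
MODEL4 = {
    "const": -23898.558,
    "mail": 1.847,
    "phone": 327.802,
    "print": 0.208,
    "page": 50.508,
}


def predict_men(mail, phone, print_adv, page):
    """Predicted sales of men's clothing under model 4 (hypothetical helper)."""
    return (MODEL4["const"]
            + MODEL4["mail"] * mail
            + MODEL4["phone"] * phone
            + MODEL4["print"] * print_adv
            + MODEL4["page"] * page)


# Sums of squares and degrees of freedom for model 4, from the ANOVA table.
ss_regression, df_regression = 3780175938.309, 4
ss_residual, df_residual = 980475080.535, 115
ss_total = ss_regression + ss_residual

# R^2 is the share of total variation explained by the regression.
r_squared = ss_regression / ss_total
# Adjusted R^2 penalises for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (df_regression + df_residual) / df_residual
# F is the ratio of the regression mean square to the residual mean square.
f_statistic = (ss_regression / df_regression) / (ss_residual / df_residual)

print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}, F = {f_statistic:.3f}")

# Hypothetical predictor values, chosen only to show how the equation is applied.
print(round(predict_men(mail=10000, phone=30, print_adv=25000, page=80), 2))
```

The recomputed values agree with the Model Summary and ANOVA rows for model 4 (R Square .794, Adjusted R Square .787, F 110.844).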



Excluded Variables (a)

Model  Variable  Beta In  t       Sig.  Partial Correlation  Collinearity Statistics (Tolerance)
1      page      .149b    2.773   .006  .248                 .980
       phone     .447b    7.964   .000  .593                 .625
       print     .104b    1.877   .063  .171                 .957
       service   .153b    1.997   .048  .182                 .501
2      page      .110c    2.496   .014  .226                 .968
       print     .121c    2.732   .007  .246                 .955
       service   -.064c   -.933   .353  -.086                .416
3      page      .104d    2.415   .017  .220                 .965
       service   -.079d   -1.183  .239  -.110                .413
4      service   -.072e   -1.096  .275  -.102                .412

a. Dependent Variable: men
b. Predictors in the Model: (Constant), mail
c. Predictors in the Model: (Constant), mail, phone
d. Predictors in the Model: (Constant), mail, phone, print
e. Predictors in the Model: (Constant), mail, phone, print, page

Residuals Statistics (a)

                      Minimum      Maximum     Mean        Std. Deviation  N
Predicted Value       2636.2012    34103.5547  16242.8134  5636.14978      120
Residual              -8822.03613  9087.41895  .00000      2870.41572      120
Std. Predicted Value  -2.414       3.169       .000        1.000           120
Std. Residual         -3.021       3.112       .000        .983            120

a. Dependent Variable: men

b) Use bankloan.sav sample data file to fit a binary logistic regression model to predict
default on the basis of variables age, ed, income, debtinc, creddebt and othdebt.
Interpret your result.

Dependent Variable Encoding

Original Value Internal Value

No 0
Yes 1

Block 0: Beginning Block

Classification Table (a,b)

                                 Predicted
                                 default       Percentage
Observed                         No    Yes     Correct
Step 0  default  No              517   0       100.0
                 Yes             183   0       .0
        Overall Percentage                     73.9

a. Constant is included in the model.
b. The cut value is .500

Variables in the Equation

                  B       S.E.  Wald     df  Sig.  Exp(B)
Step 0  Constant  -1.039  .086  145.782  1   .000  .354

Variables not in the Equation

                             Score    df  Sig.
Step 0  Variables  age       13.265   1   .000
                   ed        9.205    1   .002
                   income    3.526    1   .060
                   debtinc   106.238  1   .000
                   creddebt  41.928   1   .000
                   othdebt   14.863   1   .000
        Overall Statistics   148.310  6   .000



Block 1: Method = Enter

Omnibus Tests of Model Coefficients

               Chi-square  df  Sig.
Step 1  Step   153.662     6   .000
        Block  153.662     6   .000
        Model  153.662     6   .000

Model Summary

Step  -2 Log likelihood  Cox & Snell R Square  Nagelkerke R Square
1     650.702a           .197                  .289

a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

Classification Table (a)

                                 Predicted
                                 default       Percentage
Observed                         No    Yes     Correct
Step 1  default  No              483   34      93.4
                 Yes             122   61      33.3
        Overall Percentage                     77.7

a. The cut value is .500

Variables in the Equation

                   B       S.E.  Wald    df  Sig.  Exp(B)
Step 1a  age       -.047   .014  10.632  1   .001  .954
         ed        .392    .105  13.802  1   .000  1.480
         income    -.013   .007  3.077   1   .079  .987
         debtinc   .111    .027  16.986  1   .000  1.117
         creddebt  .341    .088  15.186  1   .000  1.407
         othdebt   -.069   .062  1.221   1   .269  .933
         Constant  -1.198  .563  4.523   1   .033  .302

a. Variable(s) entered on step 1: age, ed, income, debtinc, creddebt, othdebt.
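To read the Step 1 table, note that each Exp(B) value is the exponential of the corresponding B, that is, the multiplicative change in the odds of default for a one-unit increase in that predictor with the others held fixed. The Python sketch below recomputes those odds ratios and the classification rates from the figures above. The helper `default_probability` and the applicant values passed to it are hypothetical and only show how the fitted logit would be evaluated.

```python
import math

# B coefficients from the "Variables in the Equation" table (Step 1).
COEF = {
    "age": -0.047,
    "ed": 0.392,
    "income": -0.013,
    "debtinc": 0.111,
    "creddebt": 0.341,
    "othdebt": -0.069,
}
CONSTANT = -1.198

# Exp(B): odds ratio for a one-unit increase in each predictor.
odds_ratios = {name: math.exp(b) for name, b in COEF.items()}
for name, ratio in odds_ratios.items():
    print(f"{name:9s} odds ratio = {ratio:.3f}")


def default_probability(**x):
    """Predicted probability of default for given predictor values (hypothetical helper)."""
    logit = CONSTANT + sum(COEF[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-logit))


# Counts from the Step 1 classification table (cut value .500).
true_neg, false_pos, false_neg, true_pos = 483, 34, 122, 61
accuracy = (true_neg + true_pos) / (true_neg + false_pos + false_neg + true_pos)
sensitivity = true_pos / (false_neg + true_pos)   # defaulters correctly flagged
specificity = true_neg / (true_neg + false_pos)   # non-defaulters correctly flagged
print(f"accuracy = {accuracy:.3f}, sensitivity = {sensitivity:.3f}, "
      f"specificity = {specificity:.3f}")

# Hypothetical applicant; the values are invented for illustration only.
p_example = default_probability(age=30, ed=2, income=40, debtinc=15, creddebt=2, othdebt=3)
print(f"predicted probability of default = {p_example:.3f}")
```

Read this way, ed, debtinc and creddebt raise the odds of default (odds ratios above 1) and age lowers them, while income and othdebt are not significant at the 0.05 level. The model identifies only about one third of actual defaulters at the .500 cut value, although overall accuracy rises from 73.9% to 77.7%.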

c) Find frequency distribution of "Preferred breakfast" for those senior citizens who
are also living an active life (Use sample data set cereal.sav)
Ans:

The first step is to select only those cases for which the lifestyle is recorded as active (active = 1).
bfast * agecat Crosstabulation (Count)

               agecat
bfast          Under 31  31-45  46-60  Over 60  Total
Breakfast Bar  58        60     23     12       153
Oatmeal        2         12     31     57       102
Cereal         51        45     38     17       151
Total          111       117    92     86       406

The frequency distribution for senior citizens who are living an active life is the "Over 60" column:
Breakfast Bar 12, Oatmeal 57 and Cereal 17, out of 86 cases. Active senior citizens therefore prefer
Oatmeal most often (57 of 86, about 66%).



d) Carry out a test of independence of attributes "Preferred breakfast" and "marital
status". (Use cereal.sav)

Case Processing Summary

                 Valid         Missing       Total
                 N    Percent  N  Percent    N    Percent
bfast * marital  880  100.0%   0  0.0%       880  100.0%

bfast * marital Crosstabulation (Count)

               marital
bfast          Unmarried  Married  Total
Breakfast Bar  108        123      231
Oatmeal        95         215      310
Cereal         100        239      339
Total          303        577      880

Chi-Square Tests

                              Value    df  Asymp. Sig. (2-sided)
Pearson Chi-Square            21.157a  2   .000
Likelihood Ratio              20.623   2   .000
Linear-by-Linear Association  16.226   1   .000
N of Valid Cases              880

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 79.54.

Conclusion:
The p-value (reported as .000, i.e. p < 0.001) is less than the chosen significance level (α = 0.05),
so we reject the null hypothesis of independence and conclude that Preferred Breakfast and Marital
Status are associated.
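The Pearson chi-square statistic can be reproduced by hand from the crosstabulation: each expected count is (row total × column total) / N, and the statistic sums (observed − expected)² / expected over all cells. The following Python sketch does this for the bfast * marital counts. The function name `pearson_chi_square` is hypothetical, and the 5% critical value for 2 degrees of freedom (5.991) is taken from standard chi-square tables rather than from the SPSS output.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees of freedom, minimum expected count) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
            min_expected = min(min_expected, expected)

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, min_expected


# Observed counts: rows are Breakfast Bar, Oatmeal, Cereal; columns are Unmarried, Married.
bfast_by_marital = [
    [108, 123],
    [95, 215],
    [100, 239],
]

chi2, df, min_expected = pearson_chi_square(bfast_by_marital)
CRITICAL_05_DF2 = 5.991  # tabulated chi-square critical value, alpha = 0.05, df = 2

print(f"chi-square = {chi2:.3f}, df = {df}, minimum expected count = {min_expected:.2f}")
print("reject independence" if chi2 > CRITICAL_05_DF2 else "do not reject independence")
```

The same function applies unchanged to any other two-way table of counts, such as bfast * agecat.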



e) Carry out a test of independence of attributes "Preferred breakfast" and "Age
category". (Use cereal.sav)

Case Processing Summary

                Valid         Missing       Total
                N    Percent  N  Percent    N    Percent
bfast * agecat  880  100.0%   0  0.0%       880  100.0%

bfast * agecat Crosstabulation (Count)

               agecat
bfast          Under 31  31-45  46-60  Over 60  Total
Breakfast Bar  84        90     39     18       231
Oatmeal        4         24     97     185      310
Cereal         93        92     95     59       339
Total          181       206    231    262      880

Chi-Square Tests

                              Value     df  Asymp. Sig. (2-sided)
Pearson Chi-Square            309.336a  6   .000
Likelihood Ratio              350.688   6   .000
Linear-by-Linear Association  4.986     1   .026
N of Valid Cases              880

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 47.51.

Conclusion:
The p-value (reported as .000, i.e. p < 0.001) is less than the chosen significance level (α = 0.05),
so we reject the null hypothesis of independence and conclude that Preferred Breakfast and Age
Category are associated.



f) Use grocery_coupons.sav sample data file to test that the mean of amount spent is
equal to 105. Find 90% confidence interval for the mean of amount spent. Also test
that the mean of amount spent by both male and female customers is equal. What
would you say about the equality of means for amount spent on stores of different
sizes?
Ans:
Part 1: Use grocery_coupons.sav sample data file to test that the mean of amount spent is
equal to 105. Find 90% confidence interval for the mean of amount spent.

T-Test
One-Sample Statistics

          N     Mean     Std. Deviation  Std. Error Mean
amtspent  1404  99.9338  48.54435        1.29555

One-Sample Test (Test Value = 105)

                                                           90% Confidence Interval of the Difference
          t       df    Sig. (2-tailed)  Mean Difference   Lower    Upper
amtspent  -3.910  1403  .000             -5.06621          -7.1986  -2.9338

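Conclusion:
The p-value (reported as .000, i.e. p < 0.001) is less than the chosen significance level (α = 0.1),
so we conclude that the mean of Amount Spent is significantly different from 105. The interval
reported by SPSS, [-7.1986, -2.9338], is the 90% confidence interval for the difference between the
mean and the test value; adding 105 to both limits gives the 90% confidence interval for the mean of
Amount Spent, [97.8014, 102.0662].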
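SPSS reports the one-sample interval for the difference from the test value, so the interval for the mean itself follows by adding the test value (105) to each limit. The short Python check below recomputes the standard error and t statistic from the One-Sample Statistics row and performs that shift. It is a sketch of the arithmetic only, and all variable names are illustrative.

```python
import math

# Figures copied from the One-Sample Statistics and One-Sample Test tables.
n = 1404
sample_mean = 99.9338
sample_sd = 48.54435
test_value = 105.0
diff_lower, diff_upper = -7.1986, -2.9338  # 90% CI of the difference, as reported

# Standard error of the mean and the one-sample t statistic.
std_error = sample_sd / math.sqrt(n)
t_statistic = (sample_mean - test_value) / std_error

# Shift the interval for (mean - test value) back to an interval for the mean.
mean_lower = test_value + diff_lower
mean_upper = test_value + diff_upper

print(f"SE = {std_error:.5f}, t = {t_statistic:.3f}")
print(f"90% CI for the mean: ({mean_lower:.4f}, {mean_upper:.4f})")
```

Because the interval for the mean lies entirely below 105, it leads to the same decision as the t-test at the 10% level.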

Part 2: Also test that the mean of amount spent by both male and female customers is
equal.
T-TEST GROUPS=gender(0 1)
/MISSING=ANALYSIS
/VARIABLES=amtspent
/CRITERIA=CI(.95).
T-Test
Group Statistics

          gender  N    Mean      Std. Deviation  Std. Error Mean
amtspent  Male    740  107.5761  49.09908        1.80492
          Female  664  91.4168   46.49620        1.80440



Independent Samples Test (amtspent)

Levene's Test for Equality of Variances: F = .458, Sig. = .499

t-test for Equality of Means
                                                                                                    95% Confidence Interval of the Difference
                             t      df        Sig. (2-tailed)  Mean Difference  Std. Error Difference  Lower     Upper
Equal variances assumed      6.313  1402      .000             16.15930         2.55971                11.13803  21.18058
Equal variances not assumed  6.332  1397.923  .000             16.15930         2.55218                11.15280  21.16581

Conclusion:
We use the top row (Equal variances assumed) because the p-value of Levene's test (0.499) is above 0.05.
The p-value of the t-test (reported as .000, i.e. p < 0.001) is less than the chosen significance level
(α = 0.05), so we conclude that the mean of Amount Spent differs significantly between male and female
customers. The 95% confidence interval for the difference in means (male minus female) is
[11.138, 21.181].

Part 3: What would you say about the equality of means for amount spent on stores of
different sizes?

ONEWAY amtspent BY size
/MISSING ANALYSIS.
Oneway
ANOVA: amtspent

                Sum of Squares  df    Mean Square  F      Sig.
Between Groups  20053.727       2     10026.864    4.275  .014
Within Groups   3286191.636     1401  2345.604
Total           3306245.363     1403

Conclusion:
The p-value (0.014) is less than the chosen significance level (α = 0.05), so we conclude that the
mean of Amount Spent is not equal across stores of different sizes; at least one store size has a
mean that differs significantly from the others.



Question 3 is a group assignment (Group size not more than 5 students).
a) Create a Questionnaire having 10-15 questions to collect data from 30
respondents. Topic may be of your choice. Submit the soft copy of
questionnaire through class representative not later than 20th October 2019.
b) Enter your data in SPSS. Naming of variables and data types should be
appropriate.
c) Carry out data analysis of this data set.
d) Submit report of the statistical analysis of this data in MS Word.



Q4. Perform the following tasks in Minitab.
a) In order to ascertain the age distribution of operatives in a certain industry, random
samples of 1720 males and 1230 females are drawn. The sample means and standard
deviations were 33.93 years and 14.20 years for the males and 27.44 years and 10.79
years for the females. Calculate the 95 percent confidence interval for
i. The mean age of all the male operatives.
i – Ans:

Variable  N     Mean    StDev   SE Mean  95% CI
C1        1720  34.173  14.047  0.339    (33.509, 34.837)

ii. The difference between their mean ages.

ii – Ans:
Estimate for difference: 6.204
95% CI for difference: (5.305, 7.104)
T-Test of difference = 0 (vs not =): T-Value = 13.52 P-Value = 0.000 DF = 2930
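The Minitab output above shows a sample mean of 34.173 and a standard deviation of 14.047, not the 33.93 and 14.20 stated in the question, which suggests it was produced from generated data rather than from the given summary statistics. As a cross-check, both intervals can be computed directly from the stated figures. The Python sketch below assumes a large-sample normal approximation (reasonable with samples of 1720 and 1230) and unpooled variances; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Summary statistics exactly as stated in the question.
n_male, mean_male, sd_male = 1720, 33.93, 14.20
n_female, mean_female, sd_female = 1230, 27.44, 10.79

# Two-sided 95% critical value from the standard normal distribution (about 1.96).
z = NormalDist().inv_cdf(0.975)

# (i) Confidence interval for the mean age of male operatives.
se_male = sd_male / math.sqrt(n_male)
ci_male = (mean_male - z * se_male, mean_male + z * se_male)

# (ii) Confidence interval for the difference in mean ages (male minus female),
# using the unpooled standard error of the difference.
difference = mean_male - mean_female
se_difference = math.sqrt(sd_male ** 2 / n_male + sd_female ** 2 / n_female)
ci_difference = (difference - z * se_difference, difference + z * se_difference)

print(f"95% CI for male mean age: ({ci_male[0]:.2f}, {ci_male[1]:.2f})")
print(f"difference = {difference:.2f}, "
      f"95% CI: ({ci_difference[0]:.2f}, {ci_difference[1]:.2f})")
```

Under these assumptions the intervals are roughly (33.26, 34.60) for the male mean and (5.59, 7.39) for the difference, close to but not the same as the Minitab figures above.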

b) A psychology class performed an experiment to compare whether a recall score in which
instructions to form images of 25 words were given is better than an initial recall score
for which no imagery instructions were given. Twelve students participated in the
experiment with the following results:

With Imagery     20 24 20 18 22 19 20 19 17 21 17 20
Without Imagery   5  9  5  9  6 11  8 11  7  9  8 16

Does it appear that the average recall score is higher when imagery is used? Also
construct a 95% confidence interval for the difference between the means of the two
conditions and interpret the results.
Ans:
Paired T-Test and CI: With Imagery, Without Imagery

Paired T for With Imagery - Without Imagery

                 N   Mean    StDev  SE Mean
With Imagery     12  19.750  2.006  0.579
Without Imagery  12  8.667   3.055  0.882
Difference       12  11.08   3.70   1.07

95% CI for mean difference: (8.73, 13.44)
T-Test of mean difference = 0 (vs not = 0): T-Value = 10.37 P-Value = 0.000



Conclusion:
The mean recall score is 19.75 with imagery and 8.667 without imagery, so the score is clearly
higher when imagery is used. The paired t-test confirms this: the p-value (reported as 0.000, i.e.
p < 0.001) for the mean difference is less than the chosen significance level (α = 0.05), so the
mean difference between Recall Score with Imagery and Recall Score without Imagery is significant.
The 95% confidence interval for the mean difference of Recall Score is [8.73, 13.44]; because the
whole interval lies above zero, the data support a higher average recall score with imagery.
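The paired t-test works on the twelve within-student differences: the statistic is the mean difference divided by its standard error, and the confidence interval is the mean difference plus or minus a t critical value times that standard error. The Python sketch below reproduces the Minitab figures from the raw scores. The critical value 2.201 (two-sided 95%, 11 degrees of freedom) is taken from standard t tables and is an assumption of this sketch, not part of the Minitab output.

```python
import math
from statistics import mean, stdev

with_imagery = [20, 24, 20, 18, 22, 19, 20, 19, 17, 21, 17, 20]
without_imagery = [5, 9, 5, 9, 6, 11, 8, 11, 7, 9, 8, 16]

# Paired design: analyse the per-student differences, not the two samples separately.
differences = [w - wo for w, wo in zip(with_imagery, without_imagery)]
n = len(differences)

mean_diff = mean(differences)
sd_diff = stdev(differences)          # sample standard deviation (n - 1 divisor)
se_diff = sd_diff / math.sqrt(n)
t_statistic = mean_diff / se_diff     # tests H0: mean difference = 0

T_CRIT_95_DF11 = 2.201                # tabulated two-sided 95% value for 11 df
ci_lower = mean_diff - T_CRIT_95_DF11 * se_diff
ci_upper = mean_diff + T_CRIT_95_DF11 * se_diff

print(f"mean difference = {mean_diff:.2f}, SD = {sd_diff:.2f}, SE = {se_diff:.2f}")
print(f"t = {t_statistic:.2f} on {n - 1} df")
print(f"95% CI for the mean difference: ({ci_lower:.2f}, {ci_upper:.2f})")
```

Since the question asks whether recall is higher with imagery, a one-sided alternative would also be appropriate; with a positive t statistic of this size the conclusion is the same either way.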

c) Generate 4 samples of sizes 5, 6, 7 and 7 from normal populations with means 45, 40, 47
and 38 respectively. While the standard deviations of these distributions are 4, 6, 7 and 8
respectively. Test the equality of means.
Ans:
One-way ANOVA: Population versus Factor

Source DF SS MS F P
Factor 3 595.3 198.4 4.48 0.014
Error 21 929.3 44.3
Total 24 1524.6

S = 6.652 R-Sq = 39.05% R-Sq(adj) = 30.34%

Individual 95% CIs for mean, based on pooled StDev (interval plot not reproduced)

Level  N  Mean    StDev
1      5  47.891  3.939
2      6  36.004  6.483
3      7  47.847  9.382
4      7  41.373  4.636

Pooled StDev = 6.652

Conclusion:
The F-statistic is 4.48 and the p-value (0.014) is less than the chosen significance level
(α = 0.05), so we conclude that not all of the means are equal; at least one of the means is
significantly different.



Q5.
a) Explain the procedure for testing of equality of several means in Minitab and SPSS.
b) Use Minitab/SPSS to test equality of means for the following experiment of wheat yield
for different varieties. Varieties are shown by A, B, C, D and E.

A (8)    B (5.3)  C (4.1)  D (5)    E (16)
D (6.8)  A (4.9)  B (4.1)  C (3.2)  E (18)
B (6.3)  E (16)   C (4.7)  D (4.0)  A (5.0)
C (5.7)  D (3.3)  E (25)   A (4.0)  B (4.2)
E (18)   C (4.7)  A (4.2)  D (6.6)  B (6.2)
Ans:
One-way ANOVA: resp versus fac

Source DF SS MS F P
fac 4 740.14 185.03 44.59 0.000
Error 20 83.00 4.15
Total 24 823.13

S = 2.037 R-Sq = 89.92% R-Sq(adj) = 87.90%

Individual 95% CIs for mean, based on pooled StDev (interval plot not reproduced)

Level  N  Mean    StDev
1      5  5.220   1.613
2      5  5.220   1.052
3      5  4.480   0.918
4      5  5.140   1.549
5      5  18.600  3.715

Pooled StDev = 2.037

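Conclusion:
The F-statistic is 44.59 and the p-value (reported as 0.000, i.e. p < 0.001) is less than the chosen
significance level (α = 0.05), so we conclude that not all of the means are equal; at least one of
the means is significantly different. Level 5 (variety E), with a mean of 18.6, stands well above
the other four varieties, whose means lie between 4.48 and 5.22.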
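The one-way ANOVA table can be rebuilt from the yields in the layout by grouping them by variety: the between-groups sum of squares measures how far the variety means lie from the grand mean, the within-groups sum of squares measures the spread inside each variety, and F is the ratio of their mean squares. The following Python sketch shows that calculation. The function name `one_way_anova` is hypothetical, and the layout is treated as a completely randomized design, as in the Minitab analysis above.

```python
def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F) for a dict of samples."""
    all_values = [value for sample in groups.values() for value in sample]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0
    ss_within = 0.0
    for sample in groups.values():
        group_mean = sum(sample) / len(sample)
        # Variation of the group mean around the grand mean, weighted by group size.
        ss_between += len(sample) * (group_mean - grand_mean) ** 2
        # Variation of the observations around their own group mean.
        ss_within += sum((value - group_mean) ** 2 for value in sample)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_statistic


# Wheat yields read from the layout, grouped by variety.
yields = {
    "A": [8, 4.9, 5.0, 4.0, 4.2],
    "B": [5.3, 4.1, 6.3, 4.2, 6.2],
    "C": [4.1, 3.2, 4.7, 5.7, 4.7],
    "D": [5, 6.8, 4.0, 3.3, 6.6],
    "E": [16, 18, 16, 25, 18],
}

ss_b, ss_w, df_b, df_w, f_value = one_way_anova(yields)
print(f"SS between = {ss_b:.2f} (df {df_b}), SS within = {ss_w:.2f} (df {df_w})")
print(f"F = {f_value:.2f}")
```

A significant F only says that some mean differs; a follow-up multiple-comparison procedure would be needed to state formally which varieties differ.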



Q6.
a) Consider the experiment in which two fair dice are tossed and the absolute difference of
dots is recorded. Simulate this experiment 600 times using Minitab. Find the frequency
distribution of the absolute differences and find mean and variance of this distribution.
b) Compare the statistical packages SPSS and Minitab with respect to statistical data
analysis in social sciences and physical sciences.
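For part (a), whichever package runs the simulation, the exact distribution of the absolute difference gives a benchmark for the simulated frequencies. The Python sketch below, offered as an illustration rather than as the Minitab procedure the question asks for, enumerates the 36 equally likely outcomes to get the exact mean and variance, then simulates 600 tosses. The seed value is arbitrary and only makes the run repeatable.

```python
import random
from collections import Counter
from fractions import Fraction

# Exact distribution: each of the 36 ordered outcomes of two fair dice has probability 1/36.
exact = Counter()
for first in range(1, 7):
    for second in range(1, 7):
        exact[abs(first - second)] += Fraction(1, 36)

exact_mean = sum(d * p for d, p in exact.items())
exact_var = sum(d * d * p for d, p in exact.items()) - exact_mean ** 2

# Simulation of 600 tosses of two dice (arbitrary seed for repeatability).
rng = random.Random(2019)
tosses = [abs(rng.randint(1, 6) - rng.randint(1, 6)) for _ in range(600)]
frequencies = Counter(tosses)

sim_mean = sum(tosses) / len(tosses)
sim_var = sum((d - sim_mean) ** 2 for d in tosses) / (len(tosses) - 1)

print("diff  observed  expected")
for d in range(6):
    print(f"{d:4d}  {frequencies[d]:8d}  {float(600 * exact[d]):8.1f}")
print(f"exact mean = {float(exact_mean):.3f}, exact variance = {float(exact_var):.3f}")
print(f"simulated mean = {sim_mean:.3f}, simulated variance = {sim_var:.3f}")
```

The exact mean is 35/18 (about 1.944) and the exact variance is 665/324 (about 2.052), so a simulation of 600 tosses should give values near these.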



Q7.
a) Perform regression analysis to predict trade on the basis of other two variables on sample
dataset Employ.MTW. Also use matrix approach to do the same task. Furthermore,
calculate predicted values.
b) Discuss the normality tests available in Minitab.



Q8. Perform the following tasks in R.
a) A psychology class performed an experiment to compare whether a recall score in which
instructions to form images of 25 words were given is better than an initial recall score
for which no imagery instructions were given. Twelve students participated in the
experiment with the following results:

With Imagery     20 24 20 18 22 19 20 19 17 21 17 20
Without Imagery   5  9  5  9  6 11  8 11  7  9  8 16

Does it appear that the average recall score is higher when imagery is used? Also
construct a 95% confidence interval for the difference between the means of the two conditions
and interpret the results.
b) Consider the experiment in which two fair dice are tossed and the absolute difference of
dots is recorded. Simulate this experiment 600 times. Find the frequency distribution of
the absolute differences and find mean and variance of this distribution.

