Data Analysis and Statistical Packages 1
a) Use catalog.sav sample data file to fit a multiple linear model to predict "Sales of
Men's Clothing" on the basis of variables "Number of Catalogs Mailed", "Number
of Pages in Catalog", "Number of Phone Lines Open for Ordering", "Amount
Spent on Print Advertising" and "Number of Customer Service Representatives".
Use forward selection, backward elimination and enter methods in this respect.
Interpret your result in each case.
Variables Entered/Removed
Model Summary
Coefficients
Tolerance
Residuals Statistics
No 0
Yes 1
Classification Table (Step 0)
Observed default    Predicted No    Predicted Yes    Percentage Correct
No                  517             0                100.0
Yes                 183             0                0.0
Score df Sig.
ed 9.205 1 .002
Chi-square df Sig.
Model Summary
Classification Table (Step 1)
Observed default    Predicted No    Predicted Yes    Percentage Correct
No                  483             34               93.4
Yes                 122             61               33.3
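The "Percentage Correct" figures in the two classification tables can be recomputed from the raw counts. The following is a minimal sketch in Python, not part of the original SPSS output; the function name percent_correct and the 0/1 table layout are illustrative assumptions.

```python
# Counts taken from the Step 0 and Step 1 classification tables above.
# Layout: table[observed][predicted], with 0 = No default, 1 = Yes default.
step0 = [[517, 0], [183, 0]]
step1 = [[483, 34], [122, 61]]


def percent_correct(table):
    """Return (pct_no, pct_yes, pct_overall) for a 2x2 classification table."""
    pct_no = 100 * table[0][0] / sum(table[0])
    pct_yes = 100 * table[1][1] / sum(table[1])
    total = sum(table[0]) + sum(table[1])
    pct_overall = 100 * (table[0][0] + table[1][1]) / total
    return pct_no, pct_yes, pct_overall


for name, table in (("Step 0", step0), ("Step 1", step1)):
    pct_no, pct_yes, pct_overall = percent_correct(table)
    print(f"{name}: No {pct_no:.1f}%, Yes {pct_yes:.1f}%, overall {pct_overall:.1f}%")
```

The overall percentage is derived here from the four cell counts, since the overall row did not survive in the tables above. The comparison suggests that the Step 1 model gains a little overall accuracy by correctly classifying some defaulters, at the cost of misclassifying 34 non-defaulters.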
The first step is to select only those cases for which the lifestyle is recorded as active (active=1).
bfast * agecat Crosstabulation
Count
agecat Total
Cereal 51 45 38 17 151
The frequency distribution of those senior citizens who are living an active life is highlighted in yellow.
It is seen that senior citizens who are living an active life prefer Oatmeal most of the time.
Cases
marital Total
Unmarried Married
Chi-Square Tests
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 79.54.
Conclusion:
Since the p-value (reported as 0.000, i.e. p < 0.001) is less than the chosen significance level
(α = 0.05), we reject the null hypothesis and conclude that there is an association between
Preferred Breakfast and Marital Status.
Cases
agecat Total
bfast
Oatmeal 4 24 97 185 310
Cereal 93 92 95 59 339
Total 181 206 231 262 880
Chi-Square Tests
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 47.51.
Conclusion:
Since the p-value (reported as 0.000, i.e. p < 0.001) is less than the chosen significance level
(α = 0.05), we reject the null hypothesis and conclude that there is an association between
Preferred Breakfast and Age Category.
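The chi-square statistic behind this conclusion can be sketched by hand from the bfast by agecat counts. The Python below is an illustrative check, not the original SPSS procedure. It rests on one loud assumption: only the Oatmeal, Cereal and Total rows survive in the crosstab above, so the one missing breakfast row is recovered by subtracting the two known rows from the column totals. The name "other" for that row is hypothetical.

```python
# Rows that survive in the bfast * agecat crosstab above (four age categories).
oatmeal = [4, 24, 97, 185]
cereal = [93, 92, 95, 59]
col_totals = [181, 206, 231, 262]

# Assumption: the one row missing from the extract is recovered by subtraction.
other = [t - o - c for t, o, c in zip(col_totals, oatmeal, cereal)]

observed = [other, oatmeal, cereal]
row_totals = [sum(row) for row in observed]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)
min_expected = min(min(row) for row in expected)

# 12.592 is the tabulated chi-square critical value for df = 6 at alpha = 0.05.
CRITICAL_VALUE = 12.592

print(f"chi-square = {chi_square:.2f}, df = {df}")
print(f"minimum expected count = {min_expected:.2f}")
print("reject H0" if chi_square > CRITICAL_VALUE else "fail to reject H0")
```

The minimum expected count computed this way agrees with the 47.51 reported in the table footnote, which supports the reconstruction of the missing row.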
T-Test
One-Sample Statistics
One-Sample Test
Lower Upper
Conclusion:
Since the p-value (reported as 0.000, i.e. p < 0.001) is less than the chosen significance level
(α = 0.1), it is concluded that the mean of Amount Spent is significantly different from 105. The 90%
confidence interval for the difference between the mean of Amount Spent and the test value 105 is
[-7.1986, -2.9338], which corresponds to [97.8014, 102.0662] for the mean itself.
Part 2: Also test whether the mean amount spent by male and female customers is
equal.
T-TEST GROUPS=gender(0 1)
/MISSING=ANALYSIS
/VARIABLES=amtspent
/CRITERIA=CI(.95).
T-Test
Group Statistics
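Independent Samples Test (amtspent)
Levene's Test for Equality of Variances: F = .458, Sig. = .499
                             t      df        Sig. (2-tailed)  Mean Difference  Std. Error Difference  95% CI Lower  95% CI Upper
Equal variances assumed      6.313  1402      .000             16.15930         2.55971                11.13803      21.18058
Equal variances not assumed  6.332  1397.923  .000             16.15930         2.55218                11.15280      21.16581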
Conclusion:
We use the top row (Equal variances assumed) because the p-value of Levene's test (0.499) is above 0.05.
Since the p-value of the t-test (reported as 0.000, i.e. p < 0.001) is less than the chosen significance
level (α = 0.05), it is concluded that the mean of Amount Spent differs significantly between male and
female customers. The 95% confidence interval for the difference in mean Amount Spent is [11.138, 21.181].
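The figures in the "Equal variances assumed" row of the Independent Samples Test output can be cross-checked against each other. The sketch below is plain Python arithmetic on the reported values, not a re-run of the test (the raw data are not available here); all variable names are illustrative.

```python
# Values copied from the "Equal variances assumed" row above.
mean_diff = 16.15930
std_err = 2.55971
ci_lower, ci_upper = 11.13803, 21.18058

# The t statistic is the mean difference divided by its standard error.
t_stat = mean_diff / std_err

# A symmetric interval is centred on the mean difference.
midpoint = (ci_lower + ci_upper) / 2
half_width = (ci_upper - ci_lower) / 2

# Critical t value implied by the reported interval (about 1.96 for large df).
implied_t_crit = half_width / std_err

print(f"t = {t_stat:.3f}")
print(f"CI midpoint = {midpoint:.4f}")
print(f"implied critical t = {implied_t_crit:.3f}")
```

The interval is centred on the mean difference of 16.159, so its upper limit has to lie above that value, and the whole interval lies above zero, which is consistent with rejecting equality of the two means.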
Part 3: What would you say about the equality of means of amount spent for stores of
different sizes?
Conclusion:
Since the p-value (0.014) is less than the chosen significance level (α = 0.05), it is concluded
that the mean of Amount Spent is not equal across stores of different sizes; at least one store size
has a significantly different mean.
c) Generate 4 samples of sizes 5, 6, 7 and 7 from normal populations with means 45, 40, 47
and 38 and standard deviations 4, 6, 7 and 8 respectively. Test the equality of
means.
Ans:
One-way ANOVA: Population versus Factor
Source DF SS MS F P
Factor 3 595.3 198.4 4.48 0.014
Error 21 929.3 44.3
Total 24 1524.6
Conclusion:
The F-statistic is 4.48 and the p-value (0.014) is less than the chosen significance level
(α = 0.05), so it is concluded that not all of the means are equal; at least one of the means is
significantly different.
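The sampling and the one-way ANOVA in part c) can also be sketched outside the statistical package. The Python below is a minimal illustration using only the standard library. The seed is arbitrary, so the generated samples (and therefore the F value) will not match the table above. The critical value 3.07 is the tabulated F value for (3, 21) degrees of freedom at α = 0.05.

```python
import random
from statistics import mean

random.seed(1)  # arbitrary seed, only so that reruns give the same samples

# Sample sizes, population means and standard deviations from the question.
sizes = [5, 6, 7, 7]
mus = [45, 40, 47, 38]
sigmas = [4, 6, 7, 8]

samples = [
    [random.gauss(mu, sigma) for _ in range(n)]
    for n, mu, sigma in zip(sizes, mus, sigmas)
]

grand_mean = mean(x for s in samples for x in s)

# Between-group and within-group sums of squares.
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)

df_between = len(samples) - 1
df_within = sum(sizes) - len(samples)

f_stat = (ss_between / df_between) / (ss_within / df_within)

# Tabulated F critical value for (3, 21) df at alpha = 0.05.
F_CRITICAL = 3.07

print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
print("reject H0" if f_stat > F_CRITICAL else "fail to reject H0")
```

Because the samples are this small, different seeds can lead to different conclusions, so the decision printed by one run should not be read as a property of the populations.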
Source DF SS MS F P
fac 4 740.14 185.03 44.59 0.000
Error 20 83.00 4.15
Total 24 823.13
Conclusion:
The F-statistic is 44.59 and the p-value (reported as 0.000, i.e. p < 0.001) is less than the chosen
significance level (α = 0.05), so it is concluded that not all of the means are equal; at least one of
the means is significantly different.
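The MS and F columns of both ANOVA tables above follow directly from the SS and DF columns, which gives a quick way to check the output for transcription errors. A small Python sketch of that arithmetic (the function name anova_row is illustrative):

```python
def anova_row(ss_factor, df_factor, ss_error, df_error):
    """Return (MS factor, MS error, F) from sums of squares and df."""
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return ms_factor, ms_error, ms_factor / ms_error


# SS and DF values copied from the two ANOVA tables above.
first = anova_row(595.3, 3, 929.3, 21)
second = anova_row(740.14, 4, 83.00, 20)

for label, (ms_factor, ms_error, f_stat) in (("first", first), ("second", second)):
    print(f"{label} table: MS factor = {ms_factor:.2f}, "
          f"MS error = {ms_error:.2f}, F = {f_stat:.2f}")
```

Both recomputed F values round to the figures reported in the tables (4.48 and 44.59).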
With Imagery: 20 24 20 18 22 19 20 19 17 21 17 20
Without Imagery: 5 9 5 9 6 11 8 11 7 9 8 16
Does it appear that the average recall score is higher when imagery is used? Also
construct a 95% confidence interval for the difference between the means of the two groups
and interpret the results.
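One possible way to approach this question is a pooled two-sample t-test on the recall scores listed above. The Python sketch below makes two assumptions that the question text here does not confirm: the two groups are independent (not paired), and they share a common variance. The critical value 2.074 is the tabulated two-sided 5% t value for 22 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

with_imagery = [20, 24, 20, 18, 22, 19, 20, 19, 17, 21, 17, 20]
without_imagery = [5, 9, 5, 9, 6, 11, 8, 11, 7, 9, 8, 16]

n1, n2 = len(with_imagery), len(without_imagery)
diff = mean(with_imagery) - mean(without_imagery)

# Pooled variance (assumes independent groups with equal variances).
pooled_var = (
    (n1 - 1) * variance(with_imagery) + (n2 - 1) * variance(without_imagery)
) / (n1 + n2 - 2)
std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = diff / std_err

# Tabulated two-sided 5% critical t value for n1 + n2 - 2 = 22 df.
T_CRITICAL = 2.074
lower = diff - T_CRITICAL * std_err
upper = diff + T_CRITICAL * std_err

print(f"mean difference = {diff:.3f}, t = {t_stat:.2f}")
print(f"95% CI for the difference: [{lower:.3f}, {upper:.3f}]")
```

If the resulting interval lies entirely above zero, that would support the claim that average recall is higher with imagery. If the scores were in fact paired, a paired t-test on the twelve differences would be the appropriate analysis instead.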
b) Consider the experiment in which two fair dice are tossed and the absolute difference of
dots is recorded. Simulate this experiment 600 times. Find the frequency distribution of
the absolute differences and find mean and variance of this distribution.
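The dice experiment in part b) can be simulated in a few lines. The Python sketch below uses an arbitrary seed, so the exact frequencies depend on that choice. The variance is computed as the population variance of the 600 simulated values, which is an assumption about what the question intends.

```python
import random
from collections import Counter
from statistics import mean, pvariance

random.seed(1)  # arbitrary seed, only so that reruns give the same tosses

N_TOSSES = 600

# Each trial tosses two fair dice and records the absolute difference of dots.
diffs = [abs(random.randint(1, 6) - random.randint(1, 6)) for _ in range(N_TOSSES)]

freq = Counter(diffs)
print("difference  frequency  relative frequency")
for value in range(6):
    print(f"{value:>10}  {freq[value]:>9}  {freq[value] / N_TOSSES:>18.3f}")

sim_mean = mean(diffs)
sim_var = pvariance(diffs)

# Theoretical values for comparison: mean 35/18 (about 1.944),
# variance 665/324 (about 2.052).
print(f"mean = {sim_mean:.3f}, variance = {sim_var:.3f}")
```

With 600 tosses the simulated mean and variance should land near the theoretical values noted in the comment, though they will not match them exactly.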