
Fatima Freeman

April 28, 2014


IT 223
IT-223 - Assignment #4

Create a Word document in which you paste all graphs and provide your discussion. For credit, your discussions of
the questions asked below MUST demonstrate that you have thought about and can therefore describe the
underlying concepts.
Learning Objectives:
Demonstrate that you have been practicing and experimenting with SPSS.
Demonstrate in your answers that you understand the underlying concepts!
Questions:

1. (15) Use the dataset beer.sav for this problem. This dataset provides the alcohol
percentage, calorie content and carbohydrate content for a series of beer
brands. You suspect that the amount of carbohydrate in a beer should be a
pretty strong predictor of the calorie content. Using SPSS, create a scatterplot
of carbohydrate vs. calories. Do a regression analysis on this data. On your
scatterplot, include the regression line. Give the correlation coefficient.
Describe this scatterplot using the terminology we emphasized in class. Also
describe R² in terms of how it applies to this dataset. Provide the regression
model that predicts calories from carbohydrates. Think of how in our last
BeerBAC model, figuring out a way to include the size of the person improved our
model significantly. Try to come up with at least 1 data point or change you
could make to improve this beer dataset so that you end up with a more
accurate regression model.
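
As a rough illustration of what the regression analysis computes, the sketch below fits a least-squares line and reports r and R². It is written in Python rather than SPSS, the function name fit_line is hypothetical, and the five data points are made-up toy values, not values from beer.sav.

```python
from math import sqrt


def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Made-up toy values for illustration only; NOT taken from beer.sav.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]

slope, intercept, r = fit_line(toy_x, toy_y)
print(f"y = {intercept:.2f} + {slope:.2f} * x, r = {r:.3f}, R^2 = {r * r:.3f}")
```

With the real dataset, xs would hold the carbohydrate values and ys the calorie values, so the fitted line would play the role of the model that predicts calories from carbohydrates.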

R = 0.80
R² = 0.65
The linear relationship is strong.

ANOVA
Model        Sum of Squares   df   Mean Square   F         Sig.
Regression   981.316          1    981.316       152.697   .000
Residual     539.833          84   6.427
Total        1521.149         85
a. Dependent Variable:
b. Predictors: (Constant),
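
The R² and F values reported above can be cross-checked from the sums of squares in the ANOVA output. The short Python sketch below does that arithmetic. The variable names are illustrative, and the regression degrees of freedom are taken as total df minus residual df, which is an assumption based on the values shown.

```python
from math import sqrt

# Values copied from the ANOVA output above.
ss_regression = 981.316
ss_residual = 539.833
ss_total = 1521.149
df_residual = 84
df_total = 85

# Assumption: regression df = total df - residual df (one predictor).
df_regression = df_total - df_residual

# R^2 is the share of total variation explained by the regression.
r_squared = ss_regression / ss_total
# The square root gives only the magnitude of R; its sign follows the slope.
r_magnitude = sqrt(r_squared)

# F is the ratio of the two mean squares.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

print(f"R^2 = {r_squared:.3f}, |R| = {r_magnitude:.3f}, F = {f_stat:.3f}")
```

Rounded to two decimals, these agree with the R = 0.80 and R² = 0.65 given in the answer. The last digit of F can differ slightly from the SPSS value because the table entries are themselves rounded.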

The size of a person can improve the model because it tells us how much the person
weighs and how many calories they are taking in per beer. This will show us whether
the person is gaining weight because of their beer intake.

2. (25) Use the dataset brain_vs_body_weight.sav. This dataset is a list of brain
weights (in kilograms) and body weights for a series of animals. You suspect
(correctly!) that brain weight should be a pretty good predictor of body weight.
To see if this is so, do a regression analysis. When you look at the numbers
alone, you should see a fairly high correlation coefficient. However, when you
graph the data, you will see that there are some datapoints that may be either
outliers, influential, or both. Discuss which it is in your document. Next,
explore with SPSS and figure out how to sort by body weight (it's not difficult
to do). Then delete the four highest values from your dataset (select the four
rows, and clear them) and do another regression analysis. Graph this dataset as
well. Looking at your graph and at the regression analysis table provided by
SPSS, discuss why taking out only 4 values significantly changed R/R².

a.
b. There is a huge gap between two of the datapoints and the rest of the datapoints,
and those two datapoints also have a huge gap between each other.


14.0 0.0050 0.14
40.0 0.01 0.25
20.0 0.023 0.3
55.0 0.048 0.33
39.0 0.023 0.4
15.0 0.06 1.0
38.0 0.12 1.0
53.0 0.075 1.2
52.0 0.28 1.9
48.0 0.55 2.4
61.0 0.104 2.5
59.0 0.9 2.6
54.0 0.122 3.0
23.0 0.785 3.5
34.0 3.5 3.9
11.0 0.101 4.0
26.0 0.2 5.0
8.0 1.04 5.5
12.0 0.92 5.7
18.0 1.7 6.3
10.0 0.425 6.4
13.0 1.0 6.6
3.0 1.35 8.1
16.0 3.5 10.8
60.0 1.62 11.4
43.0 2.5 12.1
17.0 2.0 12.3
31.0 0.75 12.3
41.0 1.4 12.5
2.0 0.48 15.5
37.0 4.05 17.0
27.0 1.41 17.5
50.0 3.6 21.0
57.0 3.0 25.0
25.0 3.3 25.6
51.0 4.288 39.2
1.0 3.385 44.5
62.0 4.235 50.4
36.0 35.0 56.0
9.0 4.19 58.0
49.0 60.0 81.0
7.0 14.83 98.2
6.0 27.66 115.0
24.0 10.0 115.0
5.0 36.33 119.5
45.0 100.0 157.0
58.0 160.0 169.0
44.0 55.5 175.0
35.0 6.8 179.0
47.0 10.55 179.5
56.0 192.0 180.0
30.0 85.0 325.0
29.0 207.0 406.0
21.0 187.1 419.0
4.0 465.0 423.0
46.0 52.16 440.0
42.0 250.0 490.0
22.0 521.0 655.0
28.0 529.0 680.0
32.0 62.0 1320.0
19.0 2547.0 4603.0
33.0 6654.0 5712.0

14.0 0.0050 0.14
40.0 0.01 0.25
20.0 0.023 0.3
55.0 0.048 0.33
39.0 0.023 0.4
15.0 0.06 1.0
38.0 0.12 1.0
53.0 0.075 1.2
52.0 0.28 1.9
48.0 0.55 2.4
61.0 0.104 2.5
59.0 0.9 2.6
54.0 0.122 3.0
23.0 0.785 3.5
34.0 3.5 3.9
11.0 0.101 4.0
26.0 0.2 5.0
8.0 1.04 5.5
12.0 0.92 5.7
18.0 1.7 6.3
10.0 0.425 6.4
13.0 1.0 6.6
3.0 1.35 8.1
16.0 3.5 10.8
60.0 1.62 11.4
43.0 2.5 12.1
17.0 2.0 12.3
31.0 0.75 12.3
41.0 1.4 12.5
2.0 0.48 15.5
37.0 4.05 17.0
27.0 1.41 17.5
50.0 3.6 21.0
57.0 3.0 25.0
25.0 3.3 25.6
51.0 4.288 39.2
1.0 3.385 44.5
62.0 4.235 50.4
36.0 35.0 56.0
9.0 4.19 58.0
49.0 60.0 81.0
7.0 14.83 98.2
6.0 27.66 115.0
24.0 10.0 115.0
5.0 36.33 119.5
45.0 100.0 157.0
58.0 160.0 169.0
44.0 55.5 175.0
35.0 6.8 179.0
47.0 10.55 179.5
56.0 192.0 180.0
30.0 85.0 325.0
29.0 207.0 406.0
21.0 187.1 419.0
4.0 465.0 423.0
46.0 52.16 440.0
42.0 250.0 490.0
22.0 521.0 655.0

c.

d.
e. The biggest body weights were removed, and R/R² changed because the remaining numbers
are smaller. Before, the 4 largest numbers acted as outliers: as the values in the data
increased, the gap opened wider. Now that the 4 largest numbers are gone, the gap is
much smaller.
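
The effect described in (e) can be checked outside SPSS with a short Python sketch that computes the correlation before and after dropping the four largest rows. The pairs below are the second and third columns of the listing in this document, with rows that were split across lines rejoined, which is an assumption about the original layout. The listing does not label which column is brain weight and which is body weight, so the code uses neutral names, and pearson_r is a hypothetical helper.

```python
from math import sqrt

# (second column, third column) pairs transcribed from the listing above.
rows = [
    (0.005, 0.14), (0.01, 0.25), (0.023, 0.3), (0.048, 0.33), (0.023, 0.4),
    (0.06, 1.0), (0.12, 1.0), (0.075, 1.2), (0.28, 1.9), (0.55, 2.4),
    (0.104, 2.5), (0.9, 2.6), (0.122, 3.0), (0.785, 3.5), (3.5, 3.9),
    (0.101, 4.0), (0.2, 5.0), (1.04, 5.5), (0.92, 5.7), (1.7, 6.3),
    (0.425, 6.4), (1.0, 6.6), (1.35, 8.1), (3.5, 10.8), (1.62, 11.4),
    (2.5, 12.1), (2.0, 12.3), (0.75, 12.3), (1.4, 12.5), (0.48, 15.5),
    (4.05, 17.0), (1.41, 17.5), (3.6, 21.0), (3.0, 25.0), (3.3, 25.6),
    (4.288, 39.2), (3.385, 44.5), (4.235, 50.4), (35.0, 56.0), (4.19, 58.0),
    (60.0, 81.0), (14.83, 98.2), (27.66, 115.0), (10.0, 115.0), (36.33, 119.5),
    (100.0, 157.0), (160.0, 169.0), (55.5, 175.0), (6.8, 179.0), (10.55, 179.5),
    (192.0, 180.0), (85.0, 325.0), (207.0, 406.0), (187.1, 419.0), (465.0, 423.0),
    (52.16, 440.0), (250.0, 490.0), (521.0, 655.0), (529.0, 680.0), (62.0, 1320.0),
    (2547.0, 4603.0), (6654.0, 5712.0),
]


def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sqrt(sxx * syy)


# Sort by the third column (as in the listing) and drop the four largest rows.
rows_sorted = sorted(rows, key=lambda pair: pair[1])
trimmed = rows_sorted[:-4]

r_full = pearson_r(rows)
r_trimmed = pearson_r(trimmed)
print(f"all {len(rows)} rows:   r = {r_full:.3f}, R^2 = {r_full ** 2:.3f}")
print(f"after dropping 4: r = {r_trimmed:.3f}, R^2 = {r_trimmed ** 2:.3f}")
```

Comparing the two printed lines shows how much of the original correlation depended on the few extreme rows, which is the sense in which a datapoint is influential.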