Female Body Weight vs Female Body Mass Index

Term Project
Summer Math 1040/ Intro to Statistics
7-11-2014

Barbara Freeman
Ashley Lingwall
Cami Einerson
Treasa Porter








1. Statistical Question

For this term project we are determining whether there is a correlation between female body
weight and female body mass index (BMI). The data, collected from Mario Triola's
Elementary Statistics, 12th ed., published by Pearson, will show whether a linear correlation exists
between female weight (x) and female BMI (y). The data are in Table 1 (page 2). We recognize that a
linear correlation between the two data sets will not establish causation. Our statistical question
is as follows: Is there a correlation between female body weight (x) and female BMI (y)?


2. Hypothesis
Using natural observation and experience, our group developed a unanimous hypothesis.
We reason that as female body weight increases, female BMI will increase accordingly, and
that a higher BMI will correspond to a higher body weight. In the following pages you will find the raw
data sets, five-number summaries, shape, outliers, mean, and standard deviation of the female
body weight data (x) and the female BMI data (y). Also included are charts, graphs, the linear correlation
coefficient r, and the equation for the best fit line. Analysis of all of the collected data will either support or
contradict the following hypothesis: As female body weight increases, female BMI will increase
accordingly.



TABLE 1. FEMALE WEIGHTS (X) AND OBSERVED BMI (Y)
Weight (kg) BMI
59.3 22.13
74.5 27.2
77.7 29.21
97.9 35.4
71.7 26.79
60.9 20.85
60.5 25.68
43.8 18.71
47.9 19.43
64.8 24.91
75.6 26.6
81 33.11
72.8 28.65
67.3 24.96
58 20.31
54.1 20.85
59.6 26.21
48.9 17.62
75.3 26.36
60 23.7
67.3 24.22
77.3 33.68
49.7 20.37
58.4 26.41
82.2 29.02
79.5 31.21
80.3 29.18
56.4 25.89
64.3 23.5
62.3 24.8
74.6 27.2
92.6 31.3
65 22.44


3-4. Calculated Data – Data Summaries
After the data was entered, a scatter plot was created to visualize the
data points. After analyzing the scatter plot, we determined that there was sufficient reason to
continue with the analysis. Data set X had a mean of 74.83 kg and a standard deviation of
20.69 kg. Data set Y had a mean of 28.44 and a standard deviation of 7.39. These summaries are shown in
Tables 2 and 3 on the following page.
Based on the histograms of the quartiles, we found the shape of both the X and Y data to be skewed to
the right. See Table 4.
When testing for outliers based on our one-variable summary, 7 data points were found to be outliers
and were removed from further analysis. Table 5 shows the outlier calculations.
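
The outlier screen follows the 1.5 * IQR rule whose steps are listed in Table 5. As a minimal sketch of that calculation in Python, using the weight quartiles reported in Table 4 as inputs (the function names iqr_fences and flag_outliers are our own illustration, not part of the spreadsheet used for the project):

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, q1, q3):
    """Return the values that fall outside the 1.5 * IQR fences."""
    lower, upper = iqr_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


# Weight quartiles as reported in Table 4 (kg).
weight_lower, weight_upper = iqr_fences(58.85, 76.45)
print(round(weight_lower, 2), round(weight_upper, 2))

# The minimum, median, and maximum weights from the summary tables.
print(flag_outliers([43.8, 72.25, 126.6], 58.85, 76.45))
```

With these inputs the fences come out at 32.45 and 102.85, matching the weight values in Table 5, and the maximum weight of 126.6 kg is flagged.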

The original best fit line was Y = 2.86 + 0.342*X. After removing the outliers, the best fit line was
Y = 0.3108*X + 4.7727, with r = 0.8948 and R² = 0.8006.
The raw data histograms are also included in Table 6.






TABLES 2 & 3

WEIGHT SUMMARY (X)
Mean 74.8275
Standard Error 3.2711011
Median 72.25
Mode 67.3
Standard Deviation 20.68825986
Sample Variance 428.0040962
Kurtosis 0.13870212
Skewness 0.886657164
Range 82.8
Minimum 43.8
Maximum 126.6
Sum 2993.1
Count 40


BMI SUMMARY (Y)

Mean 28.44075
Standard Error 1.169106079
Median 26.505
Mode 27.2
Standard Deviation 7.394076072
Sample Variance 54.67236096
Kurtosis 0.196496055
Skewness 0.900222236
Range 29.62
Minimum 17.62
Maximum 47.24
Sum 1137.63
Count 40
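
The Standard Error rows in Tables 2 and 3 follow from the standard deviation and the count through the usual relation SE = s / sqrt(n). A short Python check, assuming only the values printed in the two tables (the function name standard_error is our own):

```python
import math


def standard_error(sample_sd, n):
    """Standard error of the mean: s / sqrt(n)."""
    return sample_sd / math.sqrt(n)


# Standard deviation and count copied from Tables 2 and 3.
weight_se = standard_error(20.68825986, 40)
bmi_se = standard_error(7.394076072, 40)
print(round(weight_se, 4), round(bmi_se, 4))
```

Both results agree with the tabulated standard errors to four decimal places (3.2711 and 1.1691).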


TABLE 4. HISTOGRAM QUARTILES OF THE WEIGHTS (X) & BMI (Y)


WEIGHT (X) VALUES BMI (Y) VALUES
MIN: 43.8 MIN: 17.62
Q1: 58.85 Q1: 22.285
Q2: 72.25 Q2: 26.505
Q3: 76.45 Q3: 28.835
MAX: 126.6 MAX: 47.24



[Figure: histogram "Weight Quartiles (X)"; horizontal axis: Bin, vertical axis: Frequency]
[Figure: histogram "BMI Quartiles (Y)"; horizontal axis: Bin, vertical axis: Frequency]

TABLE 5. OUTLIERS FOR WEIGHT (X) AND BMI (Y) VALUES

EQUATIONS FOR THE OUTLIERS FOR WEIGHT (X)
Q3-Q1=IQR 17.6
1.5*IQR 26.4
Q1-(1.5*IQR) 32.45
Q3+(1.5*IQR) 102.85















EQUATIONS FOR THE OUTLIERS FOR BMI (Y)
Q3-Q1=IQR 6.55
1.5*IQR 9.825
Q1-(1.5*IQR) 12.46
Q3+(1.5*IQR) 38.66
TABLE 6. RAW DATA HISTOGRAMS OF WEIGHTS (X) & BMI (Y)






[Figure: histogram "Weight Values (X)"; horizontal axis: Weights, vertical axis: Frequency]
[Figure: histogram "BMI (Y)"; horizontal axis: BMI, vertical axis: Frequency]

5. Calculate Linear Correlation (X vs Y)
When calculating the linear correlation coefficient r, we found that a significant positive correlation
existed. Therefore we were able to proceed with the linear regression analysis (y = mx + b). The
best fit line equation we calculated is y = 0.3108x + 4.7727. See Table 7 below: Linear
Regression Line Plot.
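
For readers who want to see how r and the best fit line are obtained, here is a minimal least-squares sketch in Python. It is an illustration under our own assumptions rather than the spreadsheet procedure used for the project: the function name linear_fit is hypothetical, and only the first five rows of Table 1 are used as sample input, so the printed coefficients will differ from the equation above.

```python
import math


def linear_fit(xs, ys):
    """Least-squares slope, intercept, and Pearson r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# First five (weight, BMI) rows of Table 1, used only as sample input;
# the report's equation was fitted to the full data set.
weights = [59.3, 74.5, 77.7, 97.9, 71.7]
bmis = [22.13, 27.2, 29.21, 35.4, 26.79]
slope, intercept, r = linear_fit(weights, bmis)
print(f"y = {slope:.4f}x + {intercept:.4f}, r = {r:.4f}")
```

Passing all of the rows of Table 1 in place of the five sample rows would fit the line to the full data set in the same way.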

TABLE 7. LINEAR REGRESSION COEFFICIENT PLOT





[Figure: scatter plot of the data (Series1) with linear trendline; y = 0.3108x + 4.7727, R² = 0.8006]

6. Observed Y – Predicted Y
We found predicted Y by evaluating the best fit line at each observed weight; each residual is the
observed Y minus the predicted Y. See the raw data for predicted Y in Table 8 on the following page.
The plot of the residuals against X does not show a discernible pattern. Table 9, on the following page, shows
a random scatter, and the spread does not increase or decrease. Therefore, a linear model is
appropriate for the data.
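
The residual calculation can be sketched in a few lines of Python. This assumes the equation printed in the Table 8 header (Y = 2.86 + 0.342*x) and takes the first three rows of Table 1 as input; the function names predicted_y and residual are our own.

```python
def predicted_y(weight_kg, slope=0.342, intercept=2.86):
    """Predicted BMI from the equation in the Table 8 header."""
    return intercept + slope * weight_kg


def residual(observed_y, weight_kg):
    """Residual = observed Y - predicted Y."""
    return observed_y - predicted_y(weight_kg)


# First three (weight, observed BMI) rows of Table 1.
rows = [(59.3, 22.13), (74.5, 27.2), (77.7, 29.21)]
for x, y in rows:
    print(x, round(predicted_y(x), 4), round(residual(y, x), 4))
```

The printed values reproduce the first three rows of Table 8, for example a predicted Y of 23.1406 and a residual of -1.0106 for a weight of 59.3 kg.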

7. Linear Model Validation
All three criteria for a linear model were met in our data. The best fit line equation we calculated is
y = 0.3108x + 4.7727, with r = 0.8948. This supports our hypothesis: when weight
increases, BMI increases. There is a clear positive linear correlation between weight and BMI.
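
The two goodness-of-fit figures quoted in this report are linked: in simple linear regression, R² is the square of r, and r carries the sign of the slope. A brief Python check, assuming the R² value printed on the Table 7 trendline:

```python
import math

r_squared = 0.8006  # R² printed on the Table 7 trendline
slope = 0.3108      # slope of the best fit line

# r takes the sign of the slope; its magnitude is sqrt(R²).
r = math.copysign(math.sqrt(r_squared), slope)
print(round(r, 4))
```

This gives 0.8948, the value of r reported for the best fit line.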








TABLE 8. DATA SETS OF THE PREDICTED Y AND THE RESIDUAL Y VALUES
Predicted Y (Y=2.86+0.342*x)   Residual (Observed Y - Predicted Y)   X (Weight)   Residual (Observed Y - Predicted Y)
23.1406 -1.0106 59.3 -1.0106
28.339 -1.139 74.5 -1.139
29.4334 -0.2234 77.7 -0.2234
36.3418 -0.9418 97.9 -0.9418
27.3814 -0.5914 71.7 -0.5914
23.6878 -2.8378 60.9 -2.8378
23.551 2.129 60.5 2.129
17.8396 0.8704 43.8 0.8704
19.2418 0.1882 47.9 0.1882
25.0216 -0.1116 64.8 -0.1116
28.7152 -2.1152 75.6 -2.1152
30.562 2.548 81 2.548
27.7576 0.8924 72.8 0.8924
25.8766 -0.9166 67.3 -0.9166
22.696 -2.386 58 -2.386
21.3622 -0.5122 54.1 -0.5122
23.2432 2.9668 59.6 2.9668
19.5838 -1.9638 48.9 -1.9638
28.6126 -2.2526 75.3 -2.2526
23.38 0.32 60 0.32
25.8766 -1.6566 67.3 -1.6566
29.2966 4.3834 77.3 4.3834
19.8574 0.5126 49.7 0.5126
22.8328 3.5772 58.4 3.5772
30.9724 -1.9524 82.2 -1.9524
30.049 1.161 79.5 1.161
30.3226 -1.1426 80.3 -1.1426
22.1488 3.7412 56.4 3.7412
24.8506 -1.3506 64.3 -1.3506
24.1666 0.6334 62.3 0.6334
28.3732 -1.1732 74.6 -1.1732
34.5292 -3.2292 92.6 -3.2292
25.09 -2.65 65 -2.65


TABLE 9. SCATTER PLOT OF THE RESIDUAL Y DATA










[Figure: scatter plot of the residuals (Observed Y - Predicted Y)]

8. Make A Prediction
Female BMI can be predicted from weight (x, in kg) using the best fit line y = 0.3108x + 4.7727.
The following are some predictions:
0.3108(95) + 4.7727 = 34.2987
0.3108(60) + 4.7727 = 23.4207
0.3108(75) + 4.7727 = 28.0827
0.3108(50) + 4.7727 = 20.3127
0.3108(30) + 4.7727 = 14.0967
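
These predictions can be reproduced with a short Python sketch. The range check is our own addition and is an assumption about how to read the results: it flags any weight outside the smallest and largest weights in Table 1 (43.8 kg and 97.9 kg), since a prediction there is an extrapolation beyond the data. The names predict_bmi and is_extrapolation are hypothetical.

```python
SLOPE = 0.3108
INTERCEPT = 4.7727
# Smallest and largest weights in Table 1 (kg).
MIN_WEIGHT, MAX_WEIGHT = 43.8, 97.9


def predict_bmi(weight_kg):
    """Predicted BMI from the best fit line y = 0.3108x + 4.7727."""
    return SLOPE * weight_kg + INTERCEPT


def is_extrapolation(weight_kg):
    """True when the weight lies outside the range of the Table 1 data."""
    return not (MIN_WEIGHT <= weight_kg <= MAX_WEIGHT)


for w in (95, 60, 75, 50, 30):
    note = " (extrapolation)" if is_extrapolation(w) else ""
    print(f"{w} kg -> {predict_bmi(w):.4f}{note}")
```

Of the five weights, only 30 kg falls outside the observed range, so that prediction should be treated with more caution than the others.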

9. Afterthought
In conclusion, our analysis supports our initial hypothesis. There is a strong positive linear correlation
between female weight and female BMI. The collected data and statistical
analysis show that as a female individual's weight increases, her BMI tends to increase too. The data
provided sufficient statistical evidence to support this outcome. Regarding our sampling technique, we
obtained the data through convenience sampling because of limited time and availability. Some important
questions remain: does this sampling technique give us accurate information, or
would another sampling technique be more useful? Could the findings be more impactful to
the audience with more time? Having said that, we believe the data are accurately represented,
and the conclusion makes sense because it agrees with established health-related studies
of female health biology.