Stats Chapter 3 Project Sadie and Kalani

Average Age Per State
Mean: 38.5
Median: 38.4
Mode: 37.7
Range: 13.8
5% trimmed mean: 38.5
10% trimmed mean: 38.4
Variance: 5.6
Standard Deviation: 2.4
Coefficient of Variation: 6.14%
75% Chebyshev: 33.7 to 43.3
88.9% Chebyshev: 31.3 to 45.7
93.8% Chebyshev: 28.9 to 48.1
Min: 30.9
Q1: 37.18
Median: 38.4
Q3: 39.48
Max: 44.7
IQR: 2.3
Outliers: Utah, New Hampshire, and Maine
For our chapter 3 project, Sadie and I had a difficult time coming up with a topic that had
50 data values and was something that we were interested in. We did not want to send out a
survey because there aren't even 50 students in our class and it would take a while for us to get
the responses. So, we thought about doing something involving the states since there are 50 of
them. We did some research and stumbled upon the average age of each state. Since we had 50
data values, we thought it was perfect.
Box and whisker plot:
We first calculated the mean, median, mode, and range. The mean is the average of
the data set, so we added up the entire list of ages and divided by the number of values. The total
added up to 1922.8, and we divided that number by 50 because there are 50 states in the data set.
The result was 38.5, meaning the typical state has an average age of around 38.5 years. Because
this gives every state equal weight regardless of population, it is not exactly the average age of
everyone living in the United States. Next, we ordered the list of ages from lowest to highest and
found that the middle, or median, number was 38.4. With the data set already in order, we used
Google Sheets to find the mode, the value that occurs most often. The mode was 37.7, which
appeared in four different states: Arizona, Indiana, Washington, and Wyoming. To find our
range, we took the difference between our highest value of 44.7 and our lowest value of 30.9.
Our range ended up being 13.8, meaning all 50 state averages fall within about 14 years of each
other.
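The steps in this paragraph can be sketched in Python. This is only an illustration and not the Google Sheets work we actually did: the function name `center_and_range` is made up for this example, and the short `sample` list is a hypothetical stand-in because the full list of 50 ages is not reproduced here. The report figures (1922.8, 50, 44.7, 30.9) come from the paragraph above.

```python
from statistics import mean, median, multimode

def center_and_range(values):
    """Return (mean, median, list of modes, range) for a list of numbers."""
    data = sorted(values)
    return mean(data), median(data), multimode(data), data[-1] - data[0]

# Figures quoted in the report: a total of 1922.8 over 50 states,
# a highest value of 44.7 and a lowest value of 30.9.
report_mean = round(1922.8 / 50, 1)
report_range = round(44.7 - 30.9, 1)

# Hypothetical mini data set, NOT the real 50 state values.
sample = [30.9, 37.7, 37.7, 38.4, 44.7]
sample_mean, sample_median, sample_modes, sample_range = center_and_range(sample)

print(report_mean, report_range)
print(sample_median, sample_modes)
```

`multimode` returns a list because a data set can have more than one value tied for most frequent.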
After we calculated the mean, median, mode, and range, we decided to calculate the 5%
trimmed mean and the 10% trimmed mean. Starting with the 5% trimmed mean, we multiplied 50
by 0.05 to get the number of values that we needed to take off of each end. We got 2.5, which we
then rounded up to 3. After taking three values off of each end, we used Google Sheets to find
the average of the remaining 44 values; the 5% trimmed mean is 38.5. To calculate the 10%
trimmed mean, we multiplied 50 by 0.1, which told us to remove 5 values from each end. We took
those values off and had Google Sheets calculate the mean again, which gave us 38.4, the average
of the remaining 40 values. Both of our trimmed means were very close to each other and to the
original mean of 38.5. What we concluded from this closeness is that our data is so close together
that the values we took off of each end did not have much of an impact on the newly calculated
mean.
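The trimming procedure can be written as a short Python sketch. The names `trim_count` and `trimmed_mean` are made up for this illustration, and the rounding rule (half rounds up, so 2.5 becomes 3) is an assumption chosen to match what we did by hand. Some textbooks round differently, so results can vary slightly.

```python
from statistics import mean

def trim_count(n, percent):
    """How many values to drop from EACH end, rounding half up.

    `percent` is a whole number such as 5 or 10.
    """
    return (n * percent + 50) // 100

def trimmed_mean(values, percent):
    """Mean of the data after dropping `percent` percent from each end."""
    data = sorted(values)
    k = trim_count(len(data), percent)
    if 2 * k >= len(data):
        raise ValueError("trimming would remove every value")
    return mean(data[k:len(data) - k])

# With 50 values, 5% drops 3 from each end and 10% drops 5 from each end.
print(trim_count(50, 5), trim_count(50, 10))

# Stand-in data (the whole numbers 1 to 50), NOT the real state ages.
print(trimmed_mean(list(range(1, 51)), 5))
```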
The next calculations that we performed were variance, standard deviation, and
coefficient of variation. Variance and standard deviation are both measures of spread but are
slightly different: the standard deviation is the square root of the variance, so it is in the same
units as the data (years). It gives the viewer of a graph or data set increments to measure how far
a value sits from the mean, which helps in judging whether a value is unusual. To find variance,
we actually used Google Sheets, but the equation used was
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$. The final variance
was 5.6. Google Sheets was also used to obtain the standard deviation or σ (sigma). Our standard
deviation was calculated at 2.4. The closer a standard deviation is to 0, the less spread a data set
has. So, because ours was 2.4, it tells us that our data values were close to the mean and there is
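not a large spread. To calculate the coefficient of variation, we used the equation
$CV = \frac{\sigma}{\mu} \times 100$, which divides the standard deviation by the mean and
expresses the result as a percent. Our coefficient of variation was 6.14, meaning the standard
deviation is only about 6% of the mean.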
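As a rough illustration of these three measures, here is a Python sketch. The function name `spread_summary` is made up, and the list `[1, 2, 3, 4, 5]` is a hypothetical example and not our state data. It uses the sample formulas (dividing by n - 1), which matches the variance equation given above.

```python
from math import sqrt
from statistics import mean, stdev, variance

def spread_summary(values):
    """Return (sample variance, sample standard deviation, CV in percent)."""
    s2 = variance(values)            # divides by n - 1
    s = stdev(values)                # square root of the sample variance
    cv = s / mean(values) * 100      # standard deviation as a percent of the mean
    return s2, s, cv

# Hypothetical example data, NOT the real state ages.
s2, s, cv = spread_summary([1, 2, 3, 4, 5])
print(s2, round(s, 2), round(cv, 1))

# Consistency check on the report's own figures: the square root of the
# variance (5.6) rounds to the reported standard deviation (2.4).
report_sd = round(sqrt(5.6), 1)
print(report_sd)
```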
After the variance and standard deviation, we found the 75, 88.9, and 93.8 percent
Chebyshev intervals. Chebyshev's theorem says that at least $1 - \frac{1}{k^2}$ of the data lies
within k standard deviations of the mean, so k = 2, 3, and 4 give 75%, 88.9%, and 93.8%. For the
75% Chebyshev interval you take the mean minus and plus two times the standard deviation. Our
equations were 38.5 - 2(2.4) and 38.5 + 2(2.4), and the results were 33.7 to 43.3. This range of
numbers tells the viewer that at least 75% of the states had an average age between 33.7 and 43.3.
Next, we calculated the 88.9% Chebyshev interval by doing 38.5 - 3(2.4) and 38.5 + 3(2.4). The
result was a range from 31.3 to 45.7, meaning at least 88.9% of the states had an average age
between 31.3 and 45.7. The last Chebyshev interval we calculated was the 93.8%. This used the
same equations as the last two, but with the standard deviation multiplied by four. We found the
93.8% Chebyshev interval to be between 28.9 and 48.1, meaning at least 93.8% of our data is
between ages 28.9 and 48.1. The bigger the percent, the wider the range of ages, because each
interval reaches one more standard deviation away from the mean. Since our lowest value is 30.9
and our highest is 44.7, all 50 states actually fall inside the 93.8% interval.
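The three intervals follow the same pattern, which can be sketched in Python. The function name `chebyshev_interval` is made up for this illustration; the mean of 38.5 and the standard deviation of 2.4 are the rounded values from our report.

```python
def chebyshev_interval(mean, sd, k):
    """Return (minimum proportion, low end, high end) for k standard deviations.

    Chebyshev's theorem guarantees AT LEAST 1 - 1/k^2 of the data lies
    within k standard deviations of the mean, for any k greater than 1.
    """
    if k <= 1:
        raise ValueError("k must be greater than 1")
    proportion = 1 - 1 / k ** 2
    return proportion, mean - k * sd, mean + k * sd

# Rounded mean and standard deviation from the report.
for k in (2, 3, 4):
    proportion, low, high = chebyshev_interval(38.5, 2.4, k)
    print(f"k = {k}: at least {proportion:.1%} between {low:.1f} and {high:.1f}")
```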
The last calculations that we had to make were our minimum, quartile 1, median, quartile
3, and our maximum. Our min and max were easy to find because they are just our lowest number
and our highest number, which were 30.9 and 44.7. It was also easy to obtain the median (38.4) as
we had already done that calculation previously. To find quartile 1, we took the numbers from the
min to the median and calculated the median of that set of numbers. For this we got 37.18. To
find the upper quartile, we did the same calculation with the numbers from the median to the max
and obtained 39.48. These numbers are just the middle of each half of the data set. Our box and
whisker plot tells us that the minimum is pretty spread out from Q1 and that the numbers in the
data set from Q1 to Q3 are all close together. I can conclude this because Q1 and Q3 are only
about 1.2 and 1.1 away from the median, while the minimum is about 6.3 below Q1. One other
thing that I can conclude is that the maximum is also spread out from Q3 (about 5.2 above it) but
not as far as the min is from Q1.
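The "median of each half" idea for the quartiles can be sketched in Python. The function name `five_number_summary` is made up, and the rule for an odd number of values (leaving the median out of both halves) is one common convention that we are assuming here. Spreadsheet quartile functions interpolate between values instead, so they can give slightly different answers than this method.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # With an odd count, the middle value is left out of both halves.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return data[0], median(lower), median(data), median(upper), data[-1]

# Hypothetical example data, NOT the real state ages.
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))
```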
After finding quartiles one and three, we can find the interquartile range by
subtracting quartile one from quartile three. This equation was 39.48 - 37.18, giving us an IQR of
2.3. The IQR represents the spread of the middle 50 percent of the data and leaves out the 25
percent on each side of the quartiles. By using the IQR we give ourselves a smaller and more
useful range of data because it is not affected by outliers. Lastly, to identify which numbers in
the data set are outliers, you multiply the interquartile range by 1.5: 2.3 x 1.5 = 3.45. Adding this
to Q3 gives 39.48 + 3.45 = 42.93. This number is the upper fence: any value above it is an outlier.
So we went back to our Google Sheets and found that New Hampshire and Maine had average
ages above 42.93, meaning they are outliers in our data set. But there are not only high outliers;
there can be low outliers too. To find the lower fence you subtract 3.45 from Q1: 37.18 - 3.45 =
33.73. Any value below this number is an outlier. Our minimum of 30.9 is below 33.73, so our
data set also has a low outlier, Utah, which is why Utah appears in our list of outliers.
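The outlier check can be sketched in Python using the usual 1.5 x IQR rule, where the lower fence is Q1 - 1.5 x IQR and the upper fence is Q3 + 1.5 x IQR. The function names `iqr_fences` and `find_outliers` are made up for this illustration. The quartiles are the ones from our report, and the three test values are our minimum, median, and maximum.

```python
def iqr_fences(q1, q3, factor=1.5):
    """Return (lower fence, upper fence) for the 1.5 x IQR outlier rule."""
    step = factor * (q3 - q1)
    return q1 - step, q3 + step

def find_outliers(values, q1, q3):
    """Return the values that fall below the lower fence or above the upper fence."""
    low, high = iqr_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Quartiles from the report.
low_fence, high_fence = iqr_fences(37.18, 39.48)
print(round(low_fence, 2), round(high_fence, 2))

# Minimum, median, and maximum from the report.
flagged = find_outliers([30.9, 38.4, 44.7], 37.18, 39.48)
print(flagged)
```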
