Math Final Project
MATH 1040
5/14/2020
Link to data
https://docs.google.com/spreadsheets/d/1A0mfYxA_V-OutOE_T3ASpst0P_Nt9hqNZ_OXbwkwaS8/edit#gid=0
In this project, the 2020 class of Math 1040 bought bags of skittles at random and counted
the number of skittles of each color in every bag. We can treat the bags in our sample as
independent as long as there are at least 620 bags of skittles (31/.05) to choose from, so that
our sample of 31 is no more than 5% of the population. Because our sample size is greater than
30 (n > 30), we can also assume that the sampling distribution of the sample mean is
approximately normal. We will be going through a statistical analysis of our skittles data,
including a confidence interval for the proportion of yellow skittles as well as a confidence
interval for how many skittles there are in a bag. Of our two confidence intervals, one is built
from a single bag (my own data) and one is built from the whole class's data, because these are
two very different processes that can both be used to draw conclusions from our data.
Confidence interval estimates:
The purpose of a confidence interval is to find a range of values that we are confident, to a
stated percentage, contains the true value. For example, we can say that we are 90% confident
that the average man consumes between 1500 and 2000 calories every day (hypothetical). As the
level of confidence increases, so does the width of our interval; for example, we are 99%
confident that the average man consumes between 100 and 4000 calories every day, while it is a true statement
We are trying to estimate the true proportion of yellow skittles in a bag. My best guess is
Phat = .145. However, due to sampling variability, this point estimate is unlikely to be exactly
correct, so we will be creating a z interval that we are 99% confident contains the true
proportion. I will only be using my data to make this interval so we must proceed with caution to
.145 +/- 2.575 * sqrt((.145 * .855)/62): Lower bound .0299, Upper bound .2604
Thus I am 99% confident that the interval from .0299 to .2604 captures the true proportion of
yellow skittles in a bag.
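The arithmetic for the yellow-skittle interval can be checked with a short script. This is only a sketch: the function name is our own, and the count of 9 yellow skittles out of 62 is an assumption inferred from Phat = .145 and n = 62, since the raw count is not written out above.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_interval(successes, n, confidence):
    """Return (lower, upper) for a one-proportion z interval."""
    p_hat = successes / n
    # Two-sided critical value, about 2.576 for 99% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Assumption: 9 yellow skittles out of 62, inferred from Phat = .145.
lower, upper = proportion_z_interval(9, 62, 0.99)
print(f"Lower bound {lower:.4f}, Upper bound {upper:.4f}")
```

Using the unrounded Phat and critical value should land within rounding of the bounds reported above.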
Confidence interval for how many skittles in a bag:
We are trying to estimate the average number of skittles in a bag. My best guess is the mean of
the data, 60.55. However, due to sampling variability, this point estimate is unlikely to be
exactly correct, so we will be creating a t interval that we are 95% confident contains the true
mean. I will be using the class data for this estimate so we can safely assume a level of accuracy due to the
X +/- T (s/sqrt(n)) = 60.55 +/- 2.042 (3.812/sqrt(31)): Lower 59.1506, Upper 61.9462
(T = 2.042 is the 95% critical value for n - 1 = 30 degrees of freedom.)
Thus we are 95% confident that the true mean number of skittles in a bag is between 59.1506 and
61.9462.
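The t interval for the mean can be reproduced from the summary statistics alone. The sketch below assumes the table value T = 2.042 for 95% confidence with 30 degrees of freedom, since the Python standard library has no t distribution. The function and constant names are our own.

```python
from math import sqrt


def t_interval(mean, sd, n, t_star):
    """Return (lower, upper) for a one-sample t interval from summary statistics."""
    standard_error = sd / sqrt(n)
    margin = t_star * standard_error
    return mean - margin, mean + margin


# Assumption: table value for 95% confidence with n - 1 = 30 degrees of freedom.
T_STAR_95_DF30 = 2.042

lower, upper = t_interval(60.55, 3.812, 31, T_STAR_95_DF30)
print(f"Lower {lower:.4f}, Upper {upper:.4f}")
```

Because the mean and standard deviation are entered here already rounded, the result may differ from the reported bounds in the third decimal place.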
Hypothesis testing
A hypothesis test is used to see if a change in data is significant enough to conclude that
The hypothesis that we will be testing is whether 20% of the skittles in a bag are red. I will
only be using my data to run this test, so we must be cautious about assuming that the result
holds for all bags of skittles. Using my Phat of 13/62 = .21 we will be testing if the true proportion is
Since the P value is greater than our level of significance (a=.05) we do not reject the null
hypothesis because there is not sufficient evidence to support the alternative hypothesis.
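The red-skittle test can be sketched as a one-proportion z test. The two-sided alternative (true proportion not equal to .20) is an assumption on our part, since the alternative hypothesis is not written out above. The function name is also our own.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Return (z, two-sided P value) for H0: p = p0."""
    p_hat = successes / n
    # The standard error uses the null proportion p0, not Phat.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # Assumption: two-sided alternative, p != p0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = one_proportion_z_test(13, 62, 0.20)
reject_null = p_value < 0.05
print(f"z = {z:.4f}, P value = {p_value:.4f}, reject null: {reject_null}")
```

Under these assumptions the P value comes out well above .05, which agrees with the decision not to reject the null hypothesis.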
We will be using the class data to test whether the mean number of skittles in a bag is 55.
With the conditions stated in the introduction we can safely assume our conclusions will be accurate.
T value: 8.1063565
P value: <.000001
Since the P value is far smaller than our level of significance (a=.01) we can safely
reject the null hypothesis because there is evidence to support that the mean of skittles in a bag
is not 55.
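The T value for the class data can be checked from the summary statistics used in the confidence interval section (mean 60.55, standard deviation 3.812, n = 31). Instead of computing a P value, this sketch compares the statistic to a critical value. It assumes a two-sided test and the table value 2.750 for a = .01 with 30 degrees of freedom. The names are our own.

```python
from math import sqrt


def one_sample_t(mean, sd, n, mu0):
    """Return the t statistic for H0: mu = mu0 from summary statistics."""
    return (mean - mu0) / (sd / sqrt(n))


# Assumption: two-sided table value for a = .01 with n - 1 = 30 degrees of freedom.
T_CRIT_01_DF30 = 2.750

t_value = one_sample_t(60.55, 3.812, 31, 55)
reject_null = abs(t_value) > T_CRIT_01_DF30
print(f"T value = {t_value:.4f}, reject null: {reject_null}")
```

A statistic this far beyond the critical value corresponds to a very small P value, which matches the decision to reject the null hypothesis.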