
Riley Meyers

MATH 1040

5/14/2020

Skittles Statistical Analysis

Link to data

https://docs.google.com/spreadsheets/d/1A0mfYxA_V-OutOE_T3ASpst0P_Nt9hqNZ_OXbwkwaS8/edit#gid=0

In this project, the 2020 class of Math 1040 bought bags of Skittles at random and counted
the number of each color of Skittle in the bag. We can treat the bags in our sample as independent
as long as there are more than 620 bags of Skittles (31/.05) to choose from, so that our sample of
31 is less than 5% of the population. Because n > 30, we can also assume that the sampling
distribution of the sample mean is approximately normal. We will go through a statistical analysis
of our Skittles data, including a confidence interval for the proportion of yellow Skittles and a
confidence interval for the mean number of Skittles in a bag. One interval is built from a single
bag (my own data) and the other from the whole class's data, because these are two very different
processes that can both be used to draw conclusions from our data.

Confidence interval estimates:

The purpose of a confidence interval is to give a range of values that we are confident, at a
stated level, contains the true value of a parameter. For example, we might say that we are 90%
confident that the average man consumes between 1500 and 2000 calories every day (hypothetical).
As the confidence level increases, so does the width of the interval. For example, we might be 99%
confident that the average man consumes between 100 and 4000 calories every day; while that is a
true statement, it does not provide much information about our data.
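
The widening described above can be seen in the critical value z* itself. The following is a minimal sketch using only Python's standard library; the function name `z_critical` is hypothetical and chosen for illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z* for a two-sided interval at the given confidence level."""
    # Each tail holds half of the leftover probability (1 - confidence).
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# A higher confidence level gives a larger z*, and so a wider interval.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_critical(level):.3f}")
```

Since the margin of error is z* times the standard error, a larger z* widens the interval even when the data stay the same.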

Confidence interval for the proportion of yellow Skittles in a bag:

We are trying to estimate the true proportion of yellow Skittles in a bag. My best guess is
p̂ = .145. However, due to sampling variability, this estimate is unlikely to be exactly correct,
so we will create a z-interval that we are 99% confident contains the true proportion. I will only
be using my own bag to make this interval, so we must be cautious about assuming that it holds for
all bags of Skittles.

p̂ from my data: 9 yellow Skittles in a bag of 62 Skittles, so 9/62 = .145

p̂ +/- z* sqrt(p̂(1 - p̂)/n) = .145 +/- 2.575 * sqrt((.145*.855)/62): Lower bound .0299, Upper bound .2604

Thus I am 99% confident that the interval from .0299 to .2604 captures the true proportion of
yellow Skittles in a bag.
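
As a check on the arithmetic, the interval can be recomputed with a short script. This is a minimal sketch using only the standard library; the function name `proportion_z_interval` is hypothetical, and the inputs (9 yellow out of 62, 99% confidence) are the values stated above.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_interval(successes, n, confidence):
    """Return (lower, upper) for a one-proportion z-interval."""
    p_hat = successes / n
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = proportion_z_interval(9, 62, 0.99)
print(f"99% interval: {lower:.4f} to {upper:.4f}")
```

The script uses the unrounded p̂ and z*, so its bounds may differ from a hand calculation in the last decimal place.
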
Confidence interval for the number of Skittles in a bag:

We are trying to estimate the mean number of Skittles in a bag. My best guess is the mean of the
data, 60.55. However, due to sampling variability, this estimate is unlikely to be exactly correct,
so we will create a t-interval that we are 95% confident contains the true mean. I will be using
the class data for this interval, so we can reasonably trust the result given the conditions stated
in the introduction above.

Mean of the data (x̄): 60.55

Standard deviation (s): 3.812

x̄ +/- t* (s/sqrt(n)) = 60.55 +/- 2.042 (3.812/sqrt(31)), with df = 30: Lower 59.1506, Upper 61.9462

Thus we are 95% confident that the true mean number of Skittles in a bag is between 59.1506 and
61.9462.
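
The t-interval can be recomputed from the summary statistics. This is a minimal sketch under stated assumptions: the function name `t_interval` is hypothetical, and the critical value 2.042 is the standard table value for 95% confidence with 30 degrees of freedom, typed in by hand because Python's standard library has no t-distribution.

```python
from math import sqrt

def t_interval(mean, s, n, t_star):
    """Return (lower, upper) for a one-sample t-interval."""
    margin = t_star * s / sqrt(n)
    return mean - margin, mean + margin

# Assumption: table value of t* for 95% confidence, df = n - 1 = 30.
T_STAR_95_DF30 = 2.042

lower, upper = t_interval(60.55, 3.812, 31, T_STAR_95_DF30)
print(f"95% interval: {lower:.2f} to {upper:.2f}")
```

Because the mean and standard deviation used here are rounded, the result should agree with the calculator bounds only to about two decimal places.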

Hypothesis testing

A hypothesis test is used to decide whether the sample data provide enough evidence to
reject a claim (the null hypothesis) about a population parameter.

Hypothesis testing for the proportion of red Skittles in a bag:

The hypothesis that we will be testing is whether 20% of the Skittles in a bag are red. I will
only be using my own data for this test, so we must be cautious about assuming that the result
holds for all bags of Skittles. Using my p̂ of 13/62 = .21, we will test whether the true
proportion is higher than .2.

H null: p = .2    H alt: p > .2


Using 1-PropZTest with the right-tailed alternative (p > .2), our p-value is .4245

Since the p-value is greater than our level of significance (α = .05), we do not reject the
null hypothesis, because there is not significant evidence to support the alternative hypothesis.
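
The test statistic behind this result can be sketched in a few lines. This is a minimal illustration using only the standard library; the function name `one_prop_z_test` is hypothetical. It reports both tail probabilities, since the p-value depends on the direction of the alternative: for H alt: p > .2, the right-tail probability is the one that applies.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(successes, n, p0):
    """Return (z, right_tail_p, left_tail_p) for a one-proportion z-test."""
    p_hat = successes / n
    # The standard error uses the null value p0, not p_hat.
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    left_tail = NormalDist().cdf(z)
    return z, 1 - left_tail, left_tail

z, p_right, p_left = one_prop_z_test(13, 62, 0.2)
print(f"z = {z:.4f}")
print(f"right-tail p-value (H alt: p > .2) = {p_right:.4f}")
print(f"left-tail p-value  (H alt: p < .2) = {p_left:.4f}")
```

Both tail probabilities are well above .05, so the decision not to reject the null hypothesis is the same either way.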

Hypothesis testing for the number of Skittles in a bag:

We will be using the class data to test whether the mean number of Skittles in a bag is 55.
Given the conditions stated in the introduction, we can reasonably trust our conclusions.

H null: Mean = 55    H alt: Mean ≠ 55

T value: 8.1063565

P value: < .000001

Since the p-value is far smaller than our level of significance (α = .01), we reject the null
hypothesis: there is strong evidence that the mean number of Skittles in a bag does not equal 55.
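
The t statistic can be recomputed from the summary statistics reported earlier (mean 60.55, s = 3.812, n = 31). This is a minimal sketch: the function name `one_sample_t_statistic` is hypothetical, and the critical value 2.750 is the standard table value for a two-tailed test at the .01 level with 30 degrees of freedom, entered by hand as an assumption instead of computing an exact p-value.

```python
from math import sqrt

def one_sample_t_statistic(mean, s, n, mu0):
    """Return the t statistic for testing H null: population mean = mu0."""
    return (mean - mu0) / (s / sqrt(n))

# Assumption: table value of t for a two-tailed test, alpha = .01, df = 30.
T_CRIT_01_DF30 = 2.750

t = one_sample_t_statistic(60.55, 3.812, 31, 55)
reject_null = abs(t) > T_CRIT_01_DF30
print(f"t = {t:.3f}, reject H null at alpha = .01: {reject_null}")
```

A t statistic this far beyond the critical value corresponds to a very small p-value, which is consistent with rejecting the null hypothesis.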
