
Cindy Vega

The data presented here analyze the Skittles candy counts collected by the class. My sample is the class's 27 bags of Skittles, which I will compare to my own single bag of Skittles.

Data on the Colors in the Total Sample:

Comparison of My Bag vs. Whole

Number of skittles in my bag: 56
Number of skittles total: 1601
Number of bags total: 27

[Figure: My bag]
The data appear relatively uniform, with each color making up about 1/5 of the total sample. Red appears slightly more often than the other colors, which makes the distribution somewhat less uniform. Looking at the data, I wonder whether red is the least expensive color to produce, which might explain why there is more of it. Red is the most common color in both the total sample and my own bag, while yellow is the least common in both.
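A minimal Python sketch of the uniformity comparison above: it divides each color's count by the total and reports how far each share is from the 1/5 expected if all five colors were equally likely. The function name `color_shares` is hypothetical, and the full per-color counts are not listed in this report, so they would have to be filled in from the class data.

```python
def color_shares(counts: dict[str, int]) -> dict[str, tuple[float, float]]:
    """Return {color: (share of total, share minus the uniform share)}."""
    total = sum(counts.values())
    k = len(counts)  # number of colors; 5 for Skittles
    expected_share = 1 / k  # share each color would have if uniform
    return {
        color: (n / total, n / total - expected_share)
        for color, n in counts.items()
    }
```

For yellow alone, the proportion p = .183 used in the conditions below corresponds to about 293 of 1601 candies, roughly 0.017 below the uniform share of 0.2.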
Summary for total number of Skittles per bag
Number of Bags: 27
Mean: 59.296 (rounded to one decimal place: 59.3)
Standard Deviation: 2.714 (rounded to two decimal places: 2.71)
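The summary above can be reproduced from the 27 per-bag counts with Python's standard `statistics` module. This is a sketch: the per-bag counts are not listed in this report, so `bag_counts` is a placeholder argument and `summarize_bags` is a hypothetical name. `statistics.stdev` divides by n - 1, which gives the sample standard deviation.

```python
import statistics

def summarize_bags(bag_counts: list[int]) -> dict:
    """Sample size, mean (1 decimal), and sample SD (2 decimals) of per-bag counts."""
    return {
        "n": len(bag_counts),
        # Mean rounded to one decimal place, as in the summary above
        "mean": round(statistics.mean(bag_counts), 1),
        # stdev() uses the n - 1 denominator (sample standard deviation)
        "sd": round(statistics.stdev(bag_counts), 2),
    }

# The mean can also be checked from the totals alone
print(round(1601 / 27, 3))  # 59.296
```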
Five number summary

[Figure: box plot of the number of candies per bag, with the 1st Quartile and 3rd Quartile labeled]

The shape of the data is a bell curve with a skew to the right. This was not what I expected: since I assumed all Skittles bags would hold a similar number of candies, I was hoping for a symmetric, non-skewed bell curve. My bag held 56 Skittles, which fell within the outlier boundaries but was still well below the center, about 1.2 standard deviations below the mean ((56 - 59.3)/2.71 ≈ -1.2). Even so, it wasn't an outlier, so I can say that my bag fits with the total sample data.
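The comparison of my bag with the rest of the sample can be sketched in two ways: a z-score, which uses the mean and standard deviation reported above, and the common 1.5 × IQR outlier fences built from the five-number summary. This is a sketch with hypothetical function names; the quartile method (`statistics.quantiles` with `method="inclusive"`) is an assumption, and textbooks and calculators compute quartiles slightly differently, so the fences may not match the box plot exactly.

```python
import statistics

def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

def five_number_summary(data: list[float]) -> tuple:
    """Return (min, Q1, median, Q3, max); the quartile method is an assumption."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

def is_outlier(x: float, data: list[float]) -> bool:
    """Flag x if it falls outside the 1.5 * IQR fences."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    return x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr

# My bag (56 candies) against the reported mean 59.296 and SD 2.714
print(round(z_score(56, 59.296, 2.714), 2))  # -1.21
```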
Categorical data are made of counts that fall into buckets. For this data, the categorical data are the number of Skittles of each color: the colors are the buckets and the number of Skittles of each color are the counts. Quantitative data are measured, and in this case there isn't really any continuous quantitative data. There is no such thing as 1.5 Skittles, unless you bite one. We only counted whole Skittles.
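The "buckets and counts" idea can be sketched with Python's `collections.Counter`, which tallies how many items fall into each category. The list of colors below is a made-up illustration, not data from the class sample.

```python
from collections import Counter

# Hypothetical colors of a handful of Skittles (illustration only)
colors = ["red", "yellow", "red", "green", "purple", "orange", "red"]

tally = Counter(colors)      # each color is a bucket; the values are counts
print(tally.most_common(1))  # [('red', 3)]
print(sum(tally.values()))   # 7 (total candies counted)
```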
Pie charts and Pareto charts are well suited to categorical data: they have separate bins to place the counts in and show the proportions and frequencies of each bin, such as the number of red candies in the pie chart. For quantitative data we would rely on box plots, histograms, and dot plots, because a measurement such as 1.645 ounces can be placed on all three.
Categorical data can give me sums and proportions, since the values fall into categories, while quantitative data can give me information such as the five-number summary, since those values are numeric and can be ordered. It would not make sense to switch them, since categorical data doesn't rely on quantity the way quantitative data does, and vice versa.

Purpose and meaning of a Confidence Interval
Confidence intervals are calculated to find a range of values that serves as an estimate of the true value of a population parameter. By giving this range, a CI shows which values of the parameter are plausible given the sample. For example, a 95% confidence interval gives a range that we can be 95% confident contains the true population parameter (standard deviation, mean, proportion, etc.). In this project, we calculated CIs for the true mean number of candies per bag, the standard deviation of the number of candies per bag, and the true proportion of yellow candies.

Confidence Intervals

Purpose and Meaning of a Hypothesis test

Hypothesis tests are conducted to decide whether to reject or fail to reject a null hypothesis, the original claim. A test begins by stating a null hypothesis (the original claim) and an alternative hypothesis. The null hypothesis is usually based on an established value, for example the national average weight for men, while the alternative hypothesis is just that: an alternative to it. The test determines whether to reject or fail to reject the null hypothesis.
Given α, you convert it into a z-score (or a t-score when testing a mean with unknown σ), which is the critical value, and then find the test statistic under the null hypothesis. If the test statistic falls beyond the boundaries set by the critical value, you reject the null hypothesis; otherwise, you fail to reject it.
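The decision rule described above can be sketched in Python for a two-tailed z-test. `statistics.NormalDist().inv_cdf` turns α into the critical value; the function names and the two-tailed setup are assumptions for illustration, since a left- or right-tailed test puts all of α in one tail.

```python
from statistics import NormalDist

def critical_value(alpha: float) -> float:
    """Two-tailed z critical value: alpha/2 lies in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def reject_null(test_statistic: float, alpha: float) -> bool:
    """Reject H0 when the test statistic lies beyond +/- the critical value."""
    return abs(test_statistic) > critical_value(alpha)

print(round(critical_value(0.05), 2))  # 1.96
print(reject_null(2.5, 0.05))          # True
print(reject_null(1.2, 0.05))          # False
```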

Hypothesis Tests

Conditions, Errors and Possible Improvement
99% confidence interval estimate for the true proportion of yellow candies
*It is a simple random sample
*It is a binomial distribution: a fixed number of trials, independent trials, two possible outcomes, and the same probability of success on every trial.
*np ≥ 5 and nq ≥ 5 = Met
n = 1601, p = .183
(n)(p) = (1601)(.183) ≈ 293 ≥ 5
(n)(q) = (1601)(.817) ≈ 1308 ≥ 5
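The conditions check and the 99% interval for the proportion of yellow candies can be sketched as follows. It uses the large-sample interval p̂ ± z·√(p̂q̂/n), with z taken from `statistics.NormalDist`; I am assuming this is the textbook formula behind the interval reported in the results, and `proportion_ci` is a hypothetical name.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat: float, n: int, confidence: float) -> tuple[float, float]:
    """Large-sample interval p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    # Success/failure condition: np >= 5 and nq >= 5
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("np >= 5 and nq >= 5 condition not met")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(0.183, 1601, 0.99)
print(round(low, 5), round(high, 5))  # 0.15811 0.20789
```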
95% confidence interval estimate for the true mean number of candies per bag
*It is a simple random sample
*It is normally distributed
While the histogram from before didn't seem completely normally distributed, it still had a bell curve, and we will assume the departures from normality are due to a few outliers.
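The 95% interval for the mean can be sketched as x̄ ± t·s/√n. Python's standard library has no t distribution, so in this sketch the critical value is passed in, for example read from a t table with n - 1 = 26 degrees of freedom (2.056 for a two-tailed 95% interval). The function name is hypothetical, and this is not necessarily the exact procedure behind the interval in the results.

```python
import math

def mean_ci(x_bar: float, s: float, n: int, t_crit: float) -> tuple[float, float]:
    """Interval x_bar +/- t_crit * s / sqrt(n); t_crit comes from a t table (df = n - 1)."""
    margin = t_crit * s / math.sqrt(n)  # margin of error
    return x_bar - margin, x_bar + margin
```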
98% confidence interval estimate for the standard deviation of the number of candies per bag
*Simple random sample = Met
*Normally distributed = Met
Again, while the histogram isn't completely normally distributed, it still had a bell curve and only a slight skew.
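The 98% interval for the standard deviation uses the chi-square distribution: √((n - 1)s²/χ²_R) < σ < √((n - 1)s²/χ²_L). As with the mean, the standard library lacks chi-square quantiles, so this sketch takes the two critical values as arguments (read from a chi-square table with 26 degrees of freedom for this sample); the function name is hypothetical.

```python
import math

def sd_ci(s: float, n: int, chi2_left: float, chi2_right: float) -> tuple[float, float]:
    """Chi-square interval for sigma; chi2_right is the larger critical value."""
    scaled = (n - 1) * s ** 2
    # Dividing by the larger critical value gives the lower bound
    return math.sqrt(scaled / chi2_right), math.sqrt(scaled / chi2_left)
```

This interval depends on the population being normal, which is why the normality condition above matters for it.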
Hypothesis Test: 20% of all candies are red
*np ≥ 5: (1601)(.20) = 320.2 ≥ 5. Condition met.
*nq ≥ 5: (1601)(.80) = 1280.8 ≥ 5. Condition met.
*Binomial distribution: the trials are independent, there are two outcomes (red or not red), and the probability of getting red is the same for every candy. The number of Skittles (trials) is also fixed.
*It is a simple random sample.
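The test statistic for the claim that 20% of candies are red can be sketched as z = (p̂ - p₀)/√(p₀q₀/n). The red count is not given in this section, so `successes` is a placeholder argument and `proportion_z` is a hypothetical name; the printed line only repeats the condition checks listed above.

```python
import math

def proportion_z(successes: int, n: int, p0: float) -> float:
    """One-proportion z statistic, using the claimed p0 in the standard error."""
    q0 = 1 - p0
    # Conditions: n*p0 >= 5 and n*q0 >= 5
    if n * p0 < 5 or n * q0 < 5:
        raise ValueError("np >= 5 and nq >= 5 condition not met")
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * q0 / n)

print(round(1601 * 0.20, 1), round(1601 * 0.80, 1))  # 320.2 1280.8
```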
Hypothesis Test: Mean number of candies is 55

*The sample is a simple random sample.

*The population is normally distributed.
While the histogram isn't completely normally distributed, it still had a bell curve and only a slight skew.
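The test statistic for the claim that the mean is 55 candies can be sketched as t = (x̄ - μ₀)/(s/√n), using the summary values reported earlier (x̄ = 59.296, s = 2.714, n = 27). The function name is hypothetical, and the decision still requires a critical value from a t table for 26 degrees of freedom.

```python
import math

def one_sample_t(x_bar: float, mu0: float, s: float, n: int) -> float:
    """t statistic for H0: mu = mu0, with df = n - 1."""
    return (x_bar - mu0) / (s / math.sqrt(n))

print(round(one_sample_t(59.296, 55, 2.714, 27), 1))  # 8.2
```

A value near 8.2 lies far beyond the two-tailed 5% critical value of about 2.056 for 26 degrees of freedom.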

The normal distribution condition: the population needs to be either normally distributed, or the sample size needs to be greater than 30. This is why the condition for both the true mean number and the standard deviation of the number of candies may be slightly off. In the tests I assumed normality for both, even though a slight skew can be seen in the boxplot. For the mean, this wouldn't have been much of a problem if we had added four more samples, meeting the condition that the sample size be greater than 30; the interval for the standard deviation, however, relies on normality regardless of sample size. I would also expect the data to follow a normal bell curve. After all, if the company aims to keep each bag near a specific weight in ounces, you would expect the number of candies per bag to stay near a specific number.

Discuss Results
I do not believe I encountered any trouble finding the confidence intervals for the project. The book was straightforward about how to proceed with the tests. The sample mean was 59.3, within the 95% confidence interval of (59, 61). The sample standard deviation was 2.71, within the 98% confidence interval of (2.045, 3.957). The same holds for the proportion of yellow candies: the sample proportion was .183, and the 99% confidence interval was (.15811, .20789).
However, the hypothesis test of whether 20% of all Skittles are red resulted in rejecting the null hypothesis. The hypothesis test of whether the mean number of candies is 55 also resulted in rejecting the null hypothesis.