Introduction

This report analyzes data on Skittles candy collected by the class. The sample consists of 27 bags of Skittles, and I compare the pooled class data to my own single bag.

Number of skittles in my bag: 56

Number of skittles total: 1601

Number of bags total: 27

Color     My bag    Total Sample
Red       .268      .225
Orange    .196      .197
Yellow    .143      .183
Green     .179      .195
Purple    .214      .200

The data seems relatively uniform, with each color making up about 1/5 of the total sample. Red is slightly more common than the other colors, which makes the distribution less than perfectly uniform. As I look at the data, I wonder whether red is the least expensive color to produce, which might explain why it is higher. Red is the most common color in both the total sample and my own bag, and yellow is the least common in both.

Summary for total number of Skittles per bag

Number of Bags: 27

Mean: 59.3

Standard Deviation: 2.71

Five number summary

Minimum         54
1st Quartile    58
Median          59
3rd Quartile    61
Maximum         66

The distribution of candies per bag is roughly bell-shaped with a skew to the right. This was not what I expected: I was hoping for a symmetric bell curve, since I expected all Skittles bags to hold a similar number of candies. My bag held 56 candies, which falls between the minimum and the maximum but below the first quartile, so it is still far from the center. Even so, it wasn't an outlier, so I can say that my bag fits with the total sample data.
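The outlier claim can be checked with a short sketch. It assumes the common 1.5 x IQR fence rule, which the report does not name, and uses only the five-number summary values given above; the function names are illustrative.

```python
# Five-number summary values reported above.
q1, q3 = 58, 61
my_bag = 56


def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences for the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3, k=1.5):
    """True if value lies outside the fences (assumed rule, not from the report)."""
    lower, upper = iqr_fences(q1, q3, k)
    return value < lower or value > upper


lower, upper = iqr_fences(q1, q3)
print(f"fences: ({lower}, {upper})")
print("my bag (56) is an outlier:", is_outlier(my_bag, q1, q3))
```

Under this rule the fences are 53.5 and 65.5, so a bag of 56 is not an outlier. The largest bag (66) lands just above the upper fence, which is consistent with the right skew.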

Reflection

Categorical data is made up of counts that go into buckets. For this data, the categorical data is the number of Skittles of each color: the colors are the buckets and the number of Skittles in each is the count. Quantitative data is measured or counted as a number. In this case the only quantitative data is the number of candies per bag, and it is discrete rather than continuous: there is no such thing as 1.5 Skittles, unless you bite one. We only counted whole Skittles.

Pie and Pareto charts are well suited to categorical data: each category is a bin that holds a count, so the chart shows the proportion or frequency of each bin, such as the share of red candies in the pie chart. For quantitative data we would rely on box plots, histograms, and dot plots, because a numeric value, for example 1.645 ounces, can be placed on all three.

Categorical data supports calculations such as sums and proportions, since the values fall into categories, while quantitative data supports summaries such as the five-number summary, since the values are numbers that can be ordered. It would not make sense to switch them, since categorical data does not depend on numeric magnitude the way quantitative data does, and vice versa.

PART 2

Purpose and meaning of a Confidence Interval

A confidence interval gives a range of values used to estimate the true value of a population parameter. Giving a range rather than a single number shows how much uncertainty surrounds the estimate. For example, a 95% confidence interval is a range that you can be 95% confident contains the true population parameter, whether that is a standard deviation, mean, proportion, etc. In this project, we calculated confidence intervals for the true mean number of candies per bag, the standard deviation of the number of candies per bag, and the true proportion of yellow candies.

Confidence Intervals

Hypothesis tests are conducted in order to reject or fail to reject a null hypothesis, or the original claim. A test is set up by stating a null hypothesis (the original claim) and an alternative hypothesis. The null hypothesis is usually an established value, for example the national average weight for men, while the alternative hypothesis is the competing claim that the true value differs from it. Sample data is then used to decide whether to reject or fail to reject the null hypothesis.

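Given a significance level alpha, you convert it into a critical value (a z-score, for a test that uses the normal distribution) and then compute the test statistic from the sample under the null hypothesis. If the test statistic falls beyond the boundaries set by the critical value, you reject the null hypothesis; otherwise you fail to reject it.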
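The decision rule can be sketched in a few lines for a test that uses the normal distribution. The significance level of 0.05 and the sample test statistics below are placeholder values for illustration, not values from the project, and the function names are hypothetical.

```python
from statistics import NormalDist


def z_critical(alpha, two_tailed=True):
    """Convert a significance level into a positive z critical value."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


def reject_null(test_statistic, alpha, two_tailed=True):
    """Reject when the statistic falls beyond the critical value.

    The one-tailed branch handles a right-tailed test only.
    """
    critical = z_critical(alpha, two_tailed)
    if two_tailed:
        return abs(test_statistic) > critical
    return test_statistic > critical


alpha = 0.05  # assumed for illustration
print(f"two-tailed critical value: {z_critical(alpha):.3f}")
print("reject with z = 2.5:", reject_null(2.5, alpha))
print("reject with z = 1.0:", reject_null(1.0, alpha))
```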

Hypothesis Tests

Reflection

Conditions, Errors and Possible Improvement

99% confidence interval estimate for the true proportion of yellow

candies.

*It is a simple random sample

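*It is a Binomial Distribution: a fixed number of trials, independent trials, two possible outcomes (yellow or not yellow), and the same probability of success on every trial.

*np greater than or equal to 5 and nq greater than or equal to 5 = Met

n = 1601 p = .183 q = .817

(n)(p) = (1601)(.183) = about 293, which is >= 5

(n)(q) = (1601)(.817) = about 1308, which is >= 5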
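As a check on the yellow-candy interval, here is a minimal sketch of the usual normal-approximation interval for a proportion, using n = 1601 and p = .183 from above. The function name is illustrative, and the z critical value comes from Python's standard library rather than a printed table, so the last digit may differ slightly from a hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


n, p_hat = 1601, 0.183

# Condition check: np >= 5 and nq >= 5.
conditions_met = n * p_hat >= 5 and n * (1 - p_hat) >= 5

low, high = proportion_ci(p_hat, n, 0.99)
print("conditions met:", conditions_met)
print(f"99% CI for the proportion of yellow: ({low:.5f}, {high:.5f})")
```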

95% confidence interval estimate for the true mean number of candies per

bag.

*It is a simple random sample

* It is normally distributed

While the histogram from before did not look perfectly normal, it was still roughly bell-shaped, and we will assume that the departure from normality is due to a few outliers.
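A sketch of the t-interval for the mean, built from the summary statistics in this report (27 bags, 1601 candies in total, standard deviation 2.71). The critical value 2.056 for 26 degrees of freedom is taken from a standard t table and is an assumption here, since the report does not list it; the function name is illustrative.

```python
from math import sqrt


def mean_ci(mean, s, n, t_crit):
    """t-based confidence interval for a population mean."""
    margin = t_crit * s / sqrt(n)
    return mean - margin, mean + margin


n = 27
mean = 1601 / n          # total candies / number of bags
s = 2.71                 # sample standard deviation
T_CRIT_95_DF26 = 2.056   # assumed: two-tailed 95% t value, df = 26, from a t table

low, high = mean_ci(mean, s, n, T_CRIT_95_DF26)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

With these inputs the sketch gives an interval of roughly (58.2, 60.4), centered on the sample mean, which is worth comparing against the interval reported in the results.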

98% confidence interval estimate for the standard deviation of the number

of candies per bag.

*Simple random sample = Met

*Normally Distributed = Met

Again, while the histogram is not perfectly normal, it still had a bell curve with only a slight skew.
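The interval for the standard deviation can be sketched with the chi-square method, using s = 2.71 and n = 27 from this report. The two critical values (12.198 and 45.642 for 26 degrees of freedom at 98% confidence) are taken from a standard chi-square table and are assumptions here, since the report does not list them.

```python
from math import sqrt


def std_dev_ci(s, n, chi2_left, chi2_right):
    """Chi-square confidence interval for a population standard deviation."""
    df = n - 1
    low = sqrt(df * s**2 / chi2_right)   # larger critical value gives the lower bound
    high = sqrt(df * s**2 / chi2_left)   # smaller critical value gives the upper bound
    return low, high


s, n = 2.71, 27
CHI2_LEFT = 12.198    # assumed table value: df = 26, area 0.99 to the right
CHI2_RIGHT = 45.642   # assumed table value: df = 26, area 0.01 to the right

low, high = std_dev_ci(s, n, CHI2_LEFT, CHI2_RIGHT)
print(f"98% CI for the standard deviation: ({low:.3f}, {high:.3f})")
```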

Hypothesis Test: 20% of all candies are Red

*np > 5: (1601)(.20) = 320.2 > 5. Condition met.

*nq > 5: (1601)(.80) = 1280.8 > 5. Condition met.

*Binomial Distribution = Our data is independent, there are two outcomes (red or not red), and the probability of each outcome is the same on every trial: there is one fixed probability of getting red vs. not getting red. The number of Skittles is also fixed.

*It is a simple random sample.
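A sketch of the one-proportion z-test for the claim that 20% of candies are red, using the sample proportion of red (.225) and n = 1601 from this report. The significance level of 0.05 and the two-tailed alternative are assumptions, since the report does not state them; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(p_hat, p0, n):
    """Return (z, two-tailed p-value) for H0: p = p0."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


n = 1601
p_hat = 0.225   # sample proportion of red candies
p0 = 0.20       # claimed proportion under the null hypothesis
alpha = 0.05    # assumed significance level

z, p_value = proportion_z_test(p_hat, p0, n)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject the null hypothesis:", p_value < alpha)
```

At the assumed 0.05 level the test statistic of about 2.50 falls beyond the critical value, which agrees with the rejection reported in the results. The conclusion depends on the significance level chosen.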

Hypothesis Test: Mean number of candies is 55

*The sample population is normally distributed.

While the histogram is not perfectly normal, it still had a bell curve with only a slight skew.
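A sketch of the one-sample t-test for the claim that the mean number of candies per bag is 55, using the summary statistics from this report. The significance level of 0.05, the two-tailed alternative, and the table critical value 2.056 (df = 26) are assumptions, since the report does not state them.

```python
from math import sqrt


def one_sample_t(mean, mu0, s, n):
    """Test statistic for H0: population mean = mu0."""
    return (mean - mu0) / (s / sqrt(n))


n = 27
mean = 1601 / n          # total candies / number of bags
s = 2.71                 # sample standard deviation
mu0 = 55                 # claimed mean under the null hypothesis
T_CRIT_05_DF26 = 2.056   # assumed: two-tailed alpha = 0.05, df = 26, from a t table

t_stat = one_sample_t(mean, mu0, s, n)
print(f"t = {t_stat:.2f}")
print("reject the null hypothesis:", abs(t_stat) > T_CRIT_05_DF26)
```

The test statistic is far beyond the critical value, so the rejection does not hinge on the exact significance level assumed here.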

Drawbacks

normally distributed or have a sample size greater than 30. This is why the conditions for both the true mean and the standard deviation of the number of candies may be slightly off. In both cases I assumed that the sample was normally distributed, although a slight skew can be seen in the boxplot. It would not have been much of a problem if we had added at least four more bags, meeting the condition that the sample size is greater than 30. Another thing I want to note is that a normal bell curve would be expected. After all, if the company wants to keep each bag at a specific weight in ounces, you would expect the number of candies to cluster around a specific number.

Discuss Results

I did not encounter any trouble finding the confidence intervals for the project. The book was straightforward about how to proceed with the tests. The mean from our sample was 59.3, within the 95% confidence interval of (59, 61). The standard deviation from the sample was 2.71, within the 98% confidence interval of (2.045, 3.957). The same is true of the proportion of yellow candies: the sample proportion was .183, and the 99% confidence interval is (.15811, .20789).

Even so, the hypothesis test of whether 20% of all Skittles are red resulted in rejecting the null hypothesis. The hypothesis test of whether the mean number of candies is 55 also resulted in rejecting the null hypothesis.

