
The Skittles Project

Introduction
The purpose of this project is to use a common item, Skittles, as the basis for a
statistical analysis. Below, you will find graphics and tables built from data compiled
from our class population, work done by my group and me, and information about my
own sample. I have compared my prior predictions with the data and followed up with
the results that support my findings.

Within my own 2.17-ounce bag of Skittles, I had: 16 Red, 12 Orange, 5 Yellow,
14 Green, and 12 Purple candies.

Organizing and Displaying Qualitative Data: COLOR

1. Predictions: What proportion of the Skittles do you expect to see of each color? Complete
the table below, and also briefly explain why you made those predictions.

                                      Red     Orange  Yellow  Green   Purple
Predicted proportion for each color   0.6     0.5     0.3     0.025   0.025

When I think of the original Skittles bag and advertising, I picture a red bag. When I
picture a bowl full of Skittles, I also picture red as the dominant color, followed
closely by orange and yellow, with green and purple the least frequent.

2. Data: Create a table that displays the counts and the proportions by color and also the
totals from your own bag of candies, together with the data for the entire class sample.

                                                 Red     Orange  Yellow  Green   Purple  Total
Counts for my bag                                16      12      5       14      12      59
Counts for the entire class sample               829     759     819     688     766     3861
Actual proportions for my bag                    0.27    0.20    0.08    0.24    0.20    1
Actual proportions for the entire class sample   0.215   0.197   0.212   0.178   0.198   1
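
The proportions in the table are each color's count divided by the total count. As a minimal sketch of that arithmetic, using the counts reported above (the function name `proportions` is only illustrative):

```python
# Counts copied from the table above, in the table's color order.
colors = ["Red", "Orange", "Yellow", "Green", "Purple"]
my_bag = [16, 12, 5, 14, 12]
class_sample = [829, 759, 819, 688, 766]


def proportions(counts):
    """Return each count divided by the total of all counts."""
    total = sum(counts)
    return [count / total for count in counts]


for label, counts in (("My bag", my_bag), ("Class", class_sample)):
    props = proportions(counts)
    row = ", ".join(f"{color} {p:.3f}" for color, p in zip(colors, props))
    print(f"{label} (total {sum(counts)}): {row}")
```

Rounding the results to two or three decimal places gives the proportions shown in the table.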

Graphics for Qualitative Data

Group Discussion Consensus

The majority of the group believes that this is indeed a random sample, and that the
population we are sampling from is all of the bags of Skittles for sale in the Salt Lake
Valley. We consider it random because we did not all go to the same store or buy our
bags at the same time, and there was no way for us to have known the amounts of each
color within the bags before purchasing, which reinforces the randomness.

Observations

As I mentioned earlier, simply based on their advertising, I associate the
colors red and yellow with classic Skittles, so I expected to see those two
colors the most. However, for both my own bag and the class, the reality was
different. In my own bag, green was the second most frequent color, which
surprised me. I had opened my bag expecting to pour out perhaps 8 green
candies, if that. In the class data, the frequency of purple was the most
surprising to me. I noticed that in small samples, like our individual bags,
the color counts varied quite a bit from bag to bag. I even noticed that some
people had bags with around 60 or more candies. Across the whole class sample,
though, the proportions evened out: every color fell between roughly 18% and
21%, and red, the color I expected to see most, was the most common by a small
margin.

PART 3: Organizing and Displaying Quantitative Data: Total Candies per Bag

Summary statistics:

Mean number of candies per bag 59.4

Standard deviation of the number of candies per bag 8.1

5-number summary for the number of candies per bag 25, 58, 59, 60, 110

Histogram:

Boxplot:

Concerning the shape of the distribution: the histogram and the boxplot are both
skewed right, I believe, since the tail is longer on the right side. I expected
more of a bell-shaped graph for the histogram, and I knew there would be some
outliers. What I didn’t expect was how frequently packages held exactly 58 candies.
(Who was the lucky person who claimed to have gotten 110 candies in their bag, by
the way? Or were they, by chance, using a much bigger bag?)

The outliers would be the bag with only 25 candies, compared with the typical
count somewhere in the 50s, along with the point clear out at 110. Having an
outlier like 110 made the right whisker much, much longer than it might have
been. The other extreme, 25 total candies, led me to some interesting guesses
while creating the graphics. For example, I suspect the first outlier ate half
their bag before beginning the assignment, and the second outlier combined
the totals of two bags. I think it is safe to assume that when you buy an
original-size bag of Skittles at the store, you will get somewhere between 50
and 60 candies in all.
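
One common way to check which values count as outliers is the 1.5 × IQR rule. The report does not say which rule was used, so this is only a sketch under that assumption, applied to the five-number summary reported above (the helper `is_outlier` is illustrative):

```python
# Five-number summary reported above: min, Q1, median, Q3, max.
minimum, q1, median, q3, maximum = 25, 58, 59, 60, 110

iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # values below this are flagged
upper_fence = q3 + 1.5 * iqr     # values above this are flagged


def is_outlier(value):
    """Flag a value that falls outside the 1.5 * IQR fences."""
    return value < lower_fence or value > upper_fence


print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
for value in (minimum, maximum):
    status = "outlier" if is_outlier(value) else "not an outlier"
    print(f"{value}: {status}")
```

Because the middle half of the bags spans only 58 to 60, the fences sit at 55 and 63, so both 25 and 110 are flagged under this rule.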

Group Discussion Consensus

The difference between qualitative data and quantitative data is that qualitative data
describe the attributes of something. Each item is sorted into a category, such as a
color, based on a quality it has rather than on a measurement, so the values are labels
or adjectives rather than numbers. Pie charts work well here, because they show at a
glance what share of the whole each category makes up. Bar charts and dot plots of the
category counts also work well, because even a layperson can see at a glance how often
a lottery ticket winner comes from X state, for example.

Quantitative data are numerical counts or measurements, so we can do arithmetic with
them: add them up, average them, and express the results through numerical summaries.
Histograms and boxplots are some of the best tools we have for organizing quantitative
data, which can be incredibly specific (down to fractions and decimal points). Back to
the lottery example, we could make the study extremely precise by computing the rate of
winners for each state, down to the decimal point, and have the public wonder if there
are certain "luckier" states than others.

Next, let’s talk about confidence intervals in our candy proportions.

Proportion Yellow Candies:

T-interval: [5 (lower), 23 (upper), 12.4 (mean), 8.1 (population standard deviation),
819 (n), .99 (C-level)] = (11.669, 13.131)

Thus, we have 99% confidence that the mean number of yellow candies per bag is between
11.669 and 13.131.
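
The heading for this part asks about the proportion of yellow candies. A minimal sketch of a one-proportion interval follows, assuming the class counts from the data table (819 yellow out of 3861 candies) and the normal approximation. This is the standard textbook formula, not necessarily the calculator procedure used above, and the function name `proportion_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence):
    """Normal-approximation interval for a population proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Assumed inputs: yellow count and total count from the class sample.
low, high = proportion_interval(819, 3861, 0.99)
print(f"99% interval for the proportion of yellow: ({low:.3f}, {high:.3f})")
```

An interval built this way is a statement about the proportion of yellow candies in the population of Skittles, not about the count in any single bag.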

Mean Number of Candies:

T-interval: [xbar: 58.5, Sx: 8.1, n: 3861, C-level: .95] = (58.244, 58.756)

Thus, we have 95% confidence that the mean number of candies per bag is between
58.244 and 58.756.

(Note: Row 7 with the total count of 110 in their bag really threw this mean off from
where it “ought” to be.)
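
The arithmetic behind an interval for a mean can be sketched as follows, using the inputs reported above. This sketch substitutes a normal critical value for the t value, which is an assumption; the two are nearly identical when n is this large. The function name `mean_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def mean_interval(xbar, s, n, confidence):
    """Interval for a mean, using a normal critical value in place of t."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * s / sqrt(n)    # critical value times the standard error
    return xbar - margin, xbar + margin


# Inputs as reported above: xbar = 58.5, Sx = 8.1, n = 3861, C-level = .95.
low, high = mean_interval(58.5, 8.1, 3861, 0.95)
print(f"95% interval for the mean: ({low:.3f}, {high:.3f})")
```

The result agrees with the reported interval to within about 0.001. In this formula n is conventionally the number of bags sampled, so the width of the interval depends heavily on which n is supplied.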

The purpose and meaning of a confidence interval:


A confidence interval is a range of plausible values for a population parameter (such
as a mean or a proportion), built around the statistic from a sample of observations.
The confidence level, usually 95% or 99%, tells us how often intervals constructed this
way would capture the true parameter if we repeated the sampling many times.

What factors affect the width of a confidence interval? Why?


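Sample size plays a big part in width. The bigger the sample size, the narrower the
C.I., because the standard error shrinks in proportion to the square root of n. Another
way to change the width is to change the confidence level: a 95% confidence level gives
a narrower interval than a 99% level. More variability in the data (a larger standard
deviation) widens the interval. The width changes because the margin of error is the
critical value times the standard error, so anything that raises either one widens the
interval.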
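
A small sketch of how these factors change the width, assuming the standard deviation of 8.1 reported earlier and a normal critical value. The sample sizes below are hypothetical and chosen only for illustration, as is the function name `interval_width`.

```python
from math import sqrt
from statistics import NormalDist


def interval_width(s, n, confidence):
    """Full width of a normal-approximation interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * s / sqrt(n)


s = 8.1  # standard deviation of candies per bag, as reported above
for n in (25, 100, 400):          # hypothetical sample sizes
    for confidence in (0.95, 0.99):
        width = interval_width(s, n, confidence)
        print(f"n = {n:3d}, C-level = {confidence}: width = {width:.2f}")
```

Quadrupling n halves the width, while raising the confidence level from 95% to 99% widens the interval at every sample size.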

CONCLUSION

This project truly took me by surprise, first by the simplicity with which it started,
then by how quickly it grew into something quite unique and thorough. With each
new part of the project, I felt as if we were studying a cube, then tilting it ever so
slightly to see something as simple as Skittles in a new way. I realized shortly after
starting that I had begun to count the other foods I was eating throughout the
semester. (16 oz. of almond milk, 38 red grapes, 30 of which were seedless, and so on.)
This project helped open my mind to how many factors go into seemingly very
ordinary things. I found it fascinating that I could apply these same tests to other
parts of my life, such as my average gas mileage, my mean time connected to the
internet, and the proportion of that time spent just looking at a screen. Stats has a
way of putting everything in accessible, tangible terms. It made me much more observant
of numbers wherever I saw them, and I began to construct ideas about what those
numbers meant and how I could use them to my advantage. All in all, this
turned out to be an extremely enlightening project.
