
# John Hutchins

Report | Reflection

Introduction to our work:

In this project, our group used a convenience sample of 2.17-ounce bags of Skittles, one provided by each student in this class, compiled into one data set. We answered the question of why we would expect each color to make up 20% of the candies, and we made both a Pareto chart and a pie chart of the relative frequency of each color of Skittles. Next, our team calculated the mean, standard deviation, and 5-number summary (minimum, Q1, median, Q3, maximum) of the number of candies per bag. Then our team constructed both a 99% and a 90% confidence interval and interpreted what these intervals mean. You will also find “My Take” at the end of each section of our project, which briefly covers a related topic for that portion of the project.

Group Project #1:

To start off this project, we found the relative frequency of each color across all of the bags. We also thought it would be a good idea to graph these relative frequencies.

1. What proportion (or percentage) of the Skittles do you expect to see of each color?

Why?

If you were to pick one Skittle from a bag of Original Skittles, you would expect the same probability of pulling out a red, orange, yellow, green, or purple Skittle. Here we are assuming that the colors are distributed completely at random and in equal amounts, so each of the 5 colors should make up about 20% of the candies (an individual bag only hits exactly 20% of every color if its count is divisible by 5, which is not very probable). Statistically, then, picking just one Skittle gives a 20% probability of pulling any one of the 5 colors. We have the results from opening and counting 31 bags of Skittles; let’s see the results below.

2. Now open the data set and compute the proportions of Red, Orange, Yellow, Green,

and Purple candies in the class data set. Note that the sample size is the total number

of candies collected by the class.

Pareto chart to match our data:

Pie chart to match our data:

3. Does the class data represent a random sample? What would the population be?

Collaborate to discuss sampling and our data in a paragraph or two. Look carefully at

the definition of random sample when you work on your group response. This will

likely take some discussion!

The 31 bags of Original Skittles (2.17 ounces each) opened by the students in this class make up our sample. It is a convenience sample rather than a true random sample, because the bags were simply the ones the students happened to buy. The population is all 2.17-ounce bags of Original Skittles and the candies in them. As you can see from the graphs (though the pie chart colors may be deceiving), yellow was the most common color at a proportion of 21%. Purple was the least common at 18.64% and came in under our expected 20%, as did orange at 19.41%. Red and green were slightly above the expected proportion at 20.56% and 20.4%. Though we didn’t get exactly 20% of each color, we would expect a bigger sample to bring the proportions closer to our expected values, provided the colors really are produced in equal amounts.
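The relative frequencies quoted above are each color's count divided by the total number of candies. The short Python sketch below illustrates that calculation. Only the yellow count (384) and the total (1829) are stated in this report; the other four counts are assumptions, backed out from the reported percentages, so treat them as illustrative rather than as the class's recorded tallies.

```python
# Counts per color. Only yellow (384) and the total (1829) are reported;
# the other counts are ASSUMED, reconstructed from the reported percentages.
counts = {
    "red": 376,
    "orange": 355,
    "yellow": 384,
    "green": 373,
    "purple": 341,
}

total = sum(counts.values())  # sample size = all candies in the class data

# Relative frequency of each color = count / total
proportions = {color: n / total for color, n in counts.items()}

# List colors from most to least common, the order a Pareto chart uses
for color, p in sorted(proportions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{color:<7} {p:6.2%}  (difference from 20%: {p - 0.20:+.2%})")
```

Sorting by proportion mirrors the Pareto chart, which orders the bars from the most to the least frequent category.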

My take:

Creating a table that displays the proportions by color and the total count from your own

bag of candies together with the proportions by color and total count for the entire class sample:

There are 5 colors of Skittles and they are put into bags at random, so you would think that each color would make up about 20% if we surveyed all bags of Skittles. The class graphs reflect this better than my bag did. This illustrates that a single sample, or a few small ones, may or may not represent the population as a whole. It does appear that I got shortchanged on orange candies (my bag: 8.2%), while the class proportion (19.41%) was close to the expected 20%. If you were to look at just my bag, you would think that Skittles bags contain under 10% orange, though this is not the case, as the class data came out to about 20% orange. However, I did have an above-average share of green, which made up for the shortage of orange. Overall, the class counts matched what I expected to see, since pooling many bags averages out the variation, even though some of my proportions were far from the class proportions. My orange count was low and my green count was high, so my own bag does not match the class data.

Group Project #2:

Now that we have some visual aids and the relative frequencies, we are going to find the mean number of candies per bag, the standard deviation, and a 5-number summary to better describe our data. We also thought it would be a good idea to add more visual aids in the form of a box plot and a histogram.

1. Total candies in each bag (calculation via StatCrunch):

a. Mean number of candies per bag: 59
b. Standard Deviation: 2.4
c. 5-Number Summary:

i. Minimum: 53
ii. Q1: 57
iii. Median: 59
iv. Q3: 61
v. Maximum: 63
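As a quick check on the 5-number summary above, the sketch below computes the interquartile range and tests whether the minimum or maximum would count as an outlier. It assumes the common 1.5 × IQR fence rule, which this report does not state explicitly, so read it as one reasonable way to check, not as the method StatCrunch used.

```python
# Five-number summary as reported for candies per bag
minimum, q1, median, q3, maximum = 53, 57, 59, 61, 63

iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # values below this would be flagged (ASSUMED rule)
upper_fence = q3 + 1.5 * iqr     # values above this would be flagged (ASSUMED rule)

# If even the extremes sit inside the fences, no bag is an outlier
has_outliers = minimum < lower_fence or maximum > upper_fence

print(f"IQR = {iqr}")
print(f"fences = ({lower_fence}, {upper_fence})")
print(f"outliers present: {has_outliers}")
```

Under that rule the fences fall at 51 and 67, so both the minimum (53) and the maximum (63) lie inside them.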

2. Histogram:

3. Box Plot:

My Take:

Looking at the histogram of the number of candies in each bag, you notice it is roughly bell-shaped (there is a gap at 54, but overall the shape still holds). This indicates that the data is approximately symmetric, which is supported by the fact that the mean is equal to the median. As each bag contains 2.17 ounces of Skittles, you would expect most bags to have roughly the same number of whole Skittles, and the graph reflects this with its bell shape. My bag contained 61 Skittles, which was above both the mean (59) and the median (59) of the 31 bags sampled. This tells us that my bag was above average but still consistent with the class data: 61 is within one standard deviation (2.4) of the mean, and any data set has values above and below its average.

Along with discussing how the number of Skittles in my bag compares to the class average and the shapes of the graphs, I am also going to discuss the differences between categorical and quantitative data and their graphs. Categorical data, as its name implies, is data that is broken into categories, and no meaningful arithmetic can be done on the values themselves. Gender, for instance, would be a categorical variable whose values could be “Male”, “Female”, or “Other”, and it wouldn’t make much sense to add “Male” to “Male”. Quantitative data, on the other hand, is data about numerical variables, such as the number of males or the number of Skittles. Adding up the Skittles in a bag makes sense, and if we collect multiple bags we can compute statistics like a mean from those counts, which we cannot do with categories. Pie charts and bar graphs are very good for categorical data, since you want to display the percentage or count in each category; it wouldn’t make sense to use a box plot or scatter plot, as those graphs represent numerical data. With quantitative data, you do want to use graphs such as a histogram, box plot, or scatter plot, as they summarize the data and show it visually. Pie charts are strongly discouraged for quantitative data, as it can be hard to see whether numbers are close in value to each other. In summary, categorical data describes categories such as gender or eye color, is typically graphed with pie charts or bar graphs, and does not support arithmetic. Quantitative data is numerical data such as height that is graphed using histograms, box plots, scatter plots, and other graphs; arithmetic can be done with this data, making it very useful to statisticians.

Group Project #3:

Now that we have a mean, standard deviation, and some other useful statistics, we want to find out how reliable these statistics truly are as estimates for the population. For this, we are going to construct 2 different confidence intervals and find the margin of error for each.

1. Our 99% confidence interval estimate for the population proportion of yellow candies:
(0.1854, 0.2345).

a. Our yellow Skittles had a sample proportion of 0.21 (where x = 384 and n = 1829).
b. We checked the requirements for using a z interval rather than a t interval:

i. Simple random sample or randomized experiment: NO, the data failed to
meet the requirements for a simple random sample or randomized experiment
because selection was based upon convenience.
ii. np(1 - p) ≥ 10: YES, with n = 1829 and p = 0.21, np(1 - p) = 303.43 ≥ 10.
iii. n < 0.05N: YES, because the total number of Skittles in the population is
greater than 36,580 (1829/0.05 = 36,580).

c. The margin of error for this interval is E = 0.0246.

2. Interpret with a complete sentence the confidence interval estimate for the population

proportion of yellow candies.

We are 99% confident that the population proportion of yellow Skittles lies between 0.1854 and 0.2345, with a margin of error of 0.0246.
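The interval for the proportion of yellow candies can be reproduced with a few lines of Python. This is a minimal sketch of the usual one-proportion z interval, p̂ ± z·sqrt(p̂(1 − p̂)/n), using only the x and n given above; the variable names are illustrative, and StatCrunch may round intermediate values differently.

```python
from math import sqrt
from statistics import NormalDist

x, n = 384, 1829          # yellow candies and total candies (from the report)
confidence = 0.99

p_hat = x / n
# Two-sided critical value z_{alpha/2} from the standard normal distribution
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Requirement checks listed in the report
large_enough = n * p_hat * (1 - p_hat) >= 10
min_population = n / 0.05   # the population must exceed this for n < 0.05N

standard_error = sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.4f}, z = {z:.3f}")
print(f"np(1-p) = {n * p_hat * (1 - p_hat):.2f}, at least 10: {large_enough}")
print(f"population must exceed {min_population:.0f} candies")
print(f"E = {margin:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

Run this way, the bounds round to the reported (0.1854, 0.2345) and the margin of error comes out near 0.0245; the small gap from the reported 0.0246 is most likely a rounding difference.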

3. Our 90% confidence interval estimate for the population mean number of candies per

bag: (58.279, 59.721).

a. Sample mean: 59.
b. We checked the requirements for using a t interval for this portion:

i. Simple random sample or randomized experiment: NO, the data failed to
meet the requirements for a simple random sample or randomized experiment
because selection was based upon convenience.
ii. Sample size small relative to the size of the population (n < 0.05N): YES, the
sample of 31 bags is small compared to the total number of bags of
Skittles in existence.
iii. n ≥ 30 OR the data come from a population that is at least approximately
normal with no outliers (verified using a normal probability plot and box
plot): YES, because n = 31 ≥ 30.

c. We then calculated a margin of error of 0.721.

4. Interpret with a complete sentence the confidence interval estimate for the population
mean number of candies per bag.

We are 90% confident that the population mean number of Skittles per bag is between 58.279 and 59.721, with a margin of error of 0.721 Skittles.
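The t interval for the mean follows the same pattern, x̄ ± t·s/sqrt(n). The sketch below uses the rounded summary statistics reported earlier (mean 59, standard deviation 2.4, n = 31). The critical value 1.697 is an assumption taken from a standard t-table for 30 degrees of freedom, since Python's standard library has no t distribution.

```python
from math import sqrt

n = 31        # bags in the class sample
mean = 59     # sample mean, as reported
s = 2.4       # sample standard deviation, as reported (rounded)

# ASSUMPTION: critical value t_{0.05} for n - 1 = 30 degrees of freedom,
# read from a standard t-table for a 90% two-sided interval.
t_crit = 1.697

standard_error = s / sqrt(n)
margin = t_crit * standard_error
lower, upper = mean - margin, mean + margin

print(f"E = {margin:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

With the rounded s = 2.4, the margin of error comes out near 0.73 rather than the reported 0.721. The reported figure presumably uses the unrounded standard deviation from StatCrunch, so the two intervals agree to about one hundredth of a Skittle.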

My Take:

In statistics we take random samples and run computations on them, such as computing the mean of a data set. We run into a problem when we try to use these sample results to say something about the population, because a different sample would give a slightly different result. This is where confidence intervals come into play. A confidence interval attaches a margin of error to a sample statistic, such as a mean, and gives a range of values in which we would expect the population value to fall. For example, a 95% confidence interval has a lower bound and an upper bound, each one margin of error away from the sample statistic. Overall, a confidence interval is a range of values that you can be confident, at the stated level (such as 95%), contains the true value of the population parameter.

Summary:

After seeing all of the things we have done with just a convenience sample of Skittles, it’s easy to see how important statistics is. We used many of the basic but essential calculations in statistics to describe our data and put it to further use. As you’ve seen above, our group calculated the mean, standard deviation, confidence intervals, and margins of error, and graphed our results. Even with just a bag of Skittles, statistics can be applied.

Reflection on this class:

At the beginning of the semester, we each sought out a 2.17-ounce bag of Skittles. Our goal was to count how many Skittles of each color were in the bag. We then submitted our results to Professor Maw to be compiled and given back to us for further instructions. Over the semester, we calculated frequencies, relative frequencies, the mean, the standard deviation, and a 5-number summary, constructed confidence intervals, and graphed some of these results.

Along with all of the statistical calculations listed above, we learned about z and t intervals and when we should use one over the other. We also discussed what a margin of error is and why it is important. All of the things we learned in statistics have a real-world use. In computer science, we often use statistics to monitor how efficient an algorithm is. More often than not, we use the standard deviation when measuring batches of processors to determine how many are expected to be nonfunctional, as well as a given batch’s clock speed. Statistics constantly proves its usefulness in everyday life.

Through this course I was reminded how important it is to know and understand statistics. Everywhere you go there are data banks filled with data from millions of people just waiting to be gone through with statistical analysis. If there is anything I have learned from my computer science classes, it is that data matters, and the more efficient you are at analyzing data with statistics, the more successful the company you work for or own will grow to be. Throughout the semester, statistics showed me all of the things you can do with data and how applicable it is in the real world, and it was truly an eye-opening experience.
