Reflection (Part 6)


The mean number of skittles, the standard deviation, the 95% confidence interval and
all the other statistical values found in a sample of skittles meant nothing to me before this
class. All I wanted was to eat the skittles and enjoy the taste of the rainbow. This skittle project
taught me how to apply statistical principles to the real world.

Recently I took a tour of the Coca-Cola plant. As I watched the number of cans and
bottles being sorted, filled, labeled, packaged and shipped off in a short amount of time, the
principles of statistics ran across my mind. First, I thought about how many cans of soda are
produced each day. The tour guide told me the plant shipped 1.8 million cans of soda each day.
As I considered how he knew that number, I immediately thought about the mean. The plant
likely doesn't produce exactly 1.8 million cans of soda every day; rather, 1.8 million is the mean
number of cans over many days. This was a simple way in which I noticed the statistics I learned
in class applied to the real world.

This term project taught me a lot about my own problem-solving skills. It was a great
way to practice problem solving because it was a real, tangible study that we did as a class. It
wasn't something I just read out of a book, but rather something I played a role in creating.
Also, this project really forced me to internalize the statistics and understand what I was doing
and what the results meant. It wasn't just plugging numbers in and getting numbers out; it was
real data that I had to interpret. This is a skill that I will continue to practice in
my life and I know it is important to master.

Although I was hesitant about this term project at the beginning of the semester, I am
grateful it was assigned and it is something I will remember. Statistics is all around us and can
tell us a lot about the world in which we live.

Parts 1 & 2 – Data Collection and Compilation

Data:

Color     Personal Data   Class Data
Red       13              219
Orange    19              205
Yellow    7               220
Green     11              213
Purple    12              213
Total     62              1070

The class data shows us that, although the counts differ, the number of skittles of each color
is relatively consistent. At first glance, we can see the counts are similar, with two being the
same and two others being one apart. The mean count per color is 214 with a standard deviation
of 6. All of the counts are within 2 standard deviations of the mean, thus showing that our data
has no outliers. My personal data is irregular compared to the class data. I had almost three
times as many orange skittles as yellow. The mean for my personal data was 12.4 with a
standard deviation of 4.34. Although all the counts were within 2 standard deviations and there
were no outliers, 4.34 is a big standard deviation for such a small set of data.
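
The figures quoted above can be reproduced from the data table. The following is a minimal sketch, assuming the sample standard deviation (dividing by n - 1) and treating any count more than 2 standard deviations from the mean as an outlier; the function name `summarize` is only illustrative.

```python
from statistics import mean, stdev

# Counts per color from the data table (red, orange, yellow, green, purple).
personal = [13, 19, 7, 11, 12]
class_counts = [219, 205, 220, 213, 213]

def summarize(counts):
    """Return (mean, sample standard deviation, largest distance from the mean in SDs)."""
    m = mean(counts)
    s = stdev(counts)  # sample standard deviation: divides by n - 1
    max_z = max(abs(x - m) / s for x in counts)
    return m, s, max_z

for label, counts in [("personal", personal), ("class", class_counts)]:
    m, s, max_z = summarize(counts)
    # A largest distance below 2 means no count falls outside 2 standard deviations.
    print(f"{label}: mean={m:.2f}, sd={s:.2f}, largest distance={max_z:.2f} SDs")
```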

The graphs are not what I expected to see. After doing my counts and seeing a big
difference between certain colors, I was expecting to see the same results in the class data.
However, the class data shows that the colors of skittles in a bag should have similar counts.
The difference in my personal data was the result of using a small sample size.
Using a small data set results in a greater chance of irregular counts, such as those in my
personal data. I now know that after I eat many bags of skittles, I will consume a similar amount
of each color!

Part 3 – Summary Statistics

Our class sample consisted of eighteen 2.17 oz. bags of skittles. The distribution had an
irregular shape and followed no systematic pattern. The maximum number of candies, 62, was also
the most frequent count, occurring in five bags. However, the range of the distribution is
relatively small (six). Both the distribution and the range are what we would expect to see. We
could take another sample of eighteen bags and get another scattered distribution with a small
range. This would be consistent throughout different trials.

The mean from the class data was 59.4 and my number of candies was 62. These
numbers are close enough that my data is consistent with the data from the class. The
differences found in different bags of candy could be attributed to broken candies, which we
did not count in our data. I had no broken candies in my bag, thus I had a higher number
compared to the mean.


Categorical data and quantitative data are important to understand if we want to
interpret and draw understanding from our results. Categorical data are things that cannot be
placed in an order or whose order does not matter. For example, colors and the numbers on
the back of basketball players' jerseys are examples of categorical data. We could put these
things in order; however, the order would not matter. Quantitative data is data that can be
ordered and represents counts or measurements. Examples of quantitative data are the number of
skittles in a bag or the heights of basketball players. These can be placed in order and their
differences mean something.

The types of graphs most commonly used for categorical data are bar graphs, Pareto
charts, and pie charts. For example, in a bar graph the x-axis represents the category and the
y-axis represents the frequency or relative frequency. We could create a bar graph with the
x-axis having the color of skittles and the y-axis the frequency of those colors in a certain bag.
This is an easy graph to read if we want to compare how many of the different colors of skittles
are in each bag. The types of graphs most commonly used for quantitative data are histograms,
stemplots, and scatterplots. These graphs tell us counts of certain values, help us
estimate the middle, and show us if our data has a certain trend or distribution.

The types of calculations that make sense for categorical data are frequency and relative
frequency. Often when sorting through categorical data, we are interested in how many items
fall into a certain category. Calculations such as the mean and median could be computed for
categorical data that happens to be numerical, but the results would not mean anything.
For quantitative data that has order and measure, calculations such as range, median, mean,
mode, and 5-number summaries make sense. These calculations help us
understand the distribution of our data and the role each count plays in our sample.
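
As an illustration of the frequency and relative frequency calculations described above, here is a short sketch that uses the class color counts from Parts 1 & 2. The names `color_counts` and `relative_frequencies` are chosen only for this example.

```python
# Class color counts from the data table in Parts 1 & 2.
color_counts = {"Red": 219, "Orange": 205, "Yellow": 220, "Green": 213, "Purple": 213}

def relative_frequencies(counts):
    """Map each category to its share of the total count."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

rel_freq = relative_frequencies(color_counts)
for color, share in rel_freq.items():
    # Columns: category, frequency, relative frequency as a percentage.
    print(f"{color:<8}{color_counts[color]:>5}{share:>8.1%}")
```

The colors themselves have no meaningful order, so the shares can be listed in any order; only the size of each share carries information.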


Part 4 – Confidence Intervals
A confidence interval is a range of values, computed from a sample, that we expect to contain
the true value of what we are seeking (such as a mean, standard deviation or proportion). For
example, a 99% confidence interval is a pair of numbers between which we are 99% confident the
true value of the population parameter lies. The purpose of the confidence interval is to
give us a sense of how good our estimate is. Rather than just using a single value, we construct
a confidence interval at a chosen confidence level (e.g., 90%, 95%, 99%) to see how precise our
point estimate is. Confidence intervals are important in describing how certain or uncertain our
sample estimate is in relation to the population parameter.
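
To make this concrete, the sketch below builds confidence intervals for the proportion of red skittles from the class counts in Parts 1 & 2 (219 red out of 1070). It assumes the normal-approximation interval for a proportion; the choice of parameter, the method, and the function name `proportion_ci` are illustrative and are not taken from the scanned work.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n                                # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value
    margin = z * sqrt(p_hat * (1 - p_hat) / n)           # margin of error
    return p_hat - margin, p_hat + margin

# Red skittles in the class data: 219 out of 1070.
for level in (0.90, 0.95, 0.99):
    low, high = proportion_ci(219, 1070, level)
    print(f"{level:.0%} interval: ({low:.4f}, {high:.4f})")
```

Raising the confidence level widens the interval: being more confident that the interval contains the true proportion costs some precision.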

Part 5 – Hypothesis Testing

1. A hypothesis is a statement made about a characteristic of a particular population.
When a hypothesis test is performed, the claim or statement made about that
population is tested at a chosen significance level. Each hypothesis test
has a null and an alternate hypothesis. The null hypothesis is a statement that some
value of the population is equal to a given claim, and the alternate hypothesis is a
statement that the value differs from that claim. The conclusion made is either to reject
or fail to reject the null hypothesis. Thus, the meaning of the hypothesis test is found in
the conclusion: it is testing a certain claim made about a population. Hypothesis tests are
used to test the population mean, the population standard deviation or variance, or the
population proportion.
2. Work shown on scanned paper
3. Work shown on scanned paper
4. For the test of the population proportion (the claim that 20% of skittles are red), all
requirements are met. Specifically, it is a simple random sample, the conditions for a
binomial distribution are met, and np and nq are both greater than 5. For the test of the
population mean, the requirement that the population be normally distributed or that n > 30
is not met. The data seems to follow a normal distribution until we get to bags with 62
skittles, which is the largest number of skittles and also the most common count. Thus, we
conclude that the requirements for the population mean are not met.
5. For the population proportion, our p-value was substantially bigger than the significance
level and thus we failed to reject the null hypothesis. This means that we fail to reject the
claim that 20% of skittles are red. This was pretty intuitive after we calculated p-hat,
which was 20.46%. With that proportion being so close to the claim, we could see that the
claim would not be rejected. For the
population mean, our t-value was a lot larger than the critical value, thus we rejected
the null hypothesis. This means we rejected the claim that the mean number of skittles
in a bag was 55. Again, this was pretty intuitive as we looked at the sample mean of
59.4. Also, none of the bags had fewer than 56 skittles, so every bag in our sample was above
the claimed mean. Thus, the evidence strongly supports our conclusion, and it is unlikely
that we made an error in rejecting the claim that the mean number of skittles in the bag was 55.
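
The proportion test described in items 4 and 5 can be sketched as follows, using the class counts from Parts 1 & 2. This is only an illustration, not a reproduction of the scanned work: it assumes a two-tailed z-test and a significance level of 0.05, neither of which is stated above.

```python
from math import sqrt
from statistics import NormalDist

n, red = 1070, 219   # class totals from the data table
p0 = 0.20            # claimed proportion of red skittles
alpha = 0.05         # assumed significance level (not stated in the report)

# Requirement check from item 4: np and nq must both be greater than 5.
requirements_met = n * p0 > 5 and n * (1 - p0) > 5

# Test statistic for a one-proportion z-test.
p_hat = red / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-tailed p-value (assumed alternative: the proportion is not 20%).
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_null = p_value < alpha
print(f"requirements met: {requirements_met}")
print(f"p-hat={p_hat:.4f}, z={z:.2f}, p-value={p_value:.2f}")
print("reject the null hypothesis" if reject_null else "fail to reject the null hypothesis")
```

The mean test from item 5 is not sketched here because the individual bag totals are not listed in this report.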