Stats Project Part 5
For our statistics course, we completed a Skittles project both as a class and individually.
Each student bought a package of Skittles and counted how many candies of each color were in
it. We compiled the class data and applied what we learned in statistics to each part of
this project.
Part 1
Each student purchased a 2.17-ounce bag of original Skittles and recorded how many candies
of each color were in the bag. My data is recorded below.
Part 2
For the second part of the Skittles project, we each had to answer the questions below by
finding the proportion of each color and creating a pie chart and a Pareto chart.
Question 1: What proportion (or percentage) of the Skittles do you expect to see of
each color? Why?
I expect the proportions of the colors to be close to one another, but I expect the
proportions of orange, yellow, and green Skittles to be slightly higher than those of red
and purple. I expect this because a typical package of Skittles seems to contain fewer red
and purple candies than green, yellow, and orange ones.
Proportion   Red     Orange   Yellow   Green   Purple
Expected     0.188   0.222    0.202    0.199   0.189
Observed     0.199   0.188    0.208    0.213   0.192
Question 2: In StatCrunch, create a pie chart and a Pareto chart for the total number of
candies of each color in our class data set. Submit copies of your graphs in this report.
Pie Chart
Pareto Chart
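The graphs themselves were made in StatCrunch. As a rough sketch of what the Pareto chart orders, the short Python snippet below sorts the observed proportions from the table in Part 2 from largest to smallest and adds a running total. The function name pareto_order is only an illustrative choice, not something from StatCrunch.

```python
# Observed class proportions, copied from the table in Part 2.
observed = {
    "Red": 0.199,
    "Orange": 0.188,
    "Yellow": 0.208,
    "Green": 0.213,
    "Purple": 0.192,
}


def pareto_order(proportions):
    """Return (category, proportion, cumulative proportion) rows, largest first."""
    rows = []
    cumulative = 0.0
    ordered = sorted(proportions.items(), key=lambda item: item[1], reverse=True)
    for color, share in ordered:
        cumulative += share
        rows.append((color, share, round(cumulative, 3)))
    return rows


for color, share, cumulative in pareto_order(observed):
    print(f"{color:<7} {share:.3f} {cumulative:.3f}")
```

A Pareto chart is a bar chart drawn in this descending order, which makes the small differences between colors easier to see than in a pie chart.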
Question 3: Does the class data represent a random sample? What would the
population be? Collaborate to discuss sampling and our data in a paragraph or two.
Think carefully about the definition of random sample when you work on your response.
The class data can be treated as a random sample of Skittles bags. Each of us bought a bag
on our own, and none of us knew how many candies of each color would be inside, so no bag
was chosen because of its contents. Strictly speaking, a random sample requires that every
bag be equally likely to be included, so we are assuming that the bags available to us were
typical of all bags. The population is all of the Skittles produced by the Skittles
company, and the data we collected as a class allows us to make inferences about that
population.
Question 4: Create a table that displays the proportions by color and the total count
from your own bag of candies together with the proportions by color and total count for
the entire class sample.
Question 5: Write a well thought out paragraph discussing your observations of this
data. Respond to the following prompts:
● Do the graphs reflect what you expected to see? Are there any surprises?
● Are there any observations that appear to be outliers? If so, what impact
might they have on graphics and summary statistics?
● Does the distribution of colors in the total class data match with your own
data from your single bag of candies or are they different?
The graphs partially reflected what I expected to see. I was surprised that the proportion
of orange candies was lower than the proportions of red and purple candies, because I
expected the proportion of orange candies to be higher. No observations appear to be
outliers because, in our class data set, the total numbers of candies of each color were
very close to one another. If there were outliers, they would pull the mean toward them
and inflate the standard deviation, but they would have little effect on the median and
quartiles. The distribution of colors in the total class data differs from my own data
because the number of candies of each color varies from bag to bag. For example, the
proportion of yellow candies in my bag is about 5.5% larger than that of the class data and
the proportion of purple candies in the class data is about 3.4% larger than that of my own
data.
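To illustrate the point about outliers, here is a minimal Python sketch comparing summary statistics with and without one unusually small value. The bag totals are hypothetical numbers made up for the example; they are not the class data.

```python
from statistics import mean, median, stdev

# Hypothetical bag totals, invented for illustration (not the class data).
typical_bags = [56, 57, 58, 59, 60, 61, 62]
with_outlier = typical_bags + [26]


def summarize(values):
    """Return the mean, median, and sample standard deviation of a list."""
    return {
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
    }


before = summarize(typical_bags)
after = summarize(with_outlier)

for name in ("mean", "median", "stdev"):
    print(f"{name:<6} before: {before[name]:.2f}  after: {after[name]:.2f}")
```

In this made-up example the single low value moves the mean by about four candies and more than quadruples the standard deviation, while the median moves by only half a candy.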
Part 3
For part three of the Skittles project, we answered the following questions by finding the
summary statistics of the class data and creating a frequency histogram and boxplot.
1. Using the total number of candies in each bag in our class sample, compute the
following measures for the variable “Total candies in each bag”:
2. Create a frequency histogram for the variable “Total candies in each bag”.
3. Create a box plot for the variable “Total candies in each bag”.
4. Write a well written and thoughtful paragraph discussing your findings about the
variable “Total candies in each bag”. Address the following in your writing: What is the
shape of the distribution? Do the graphs reflect what you expected to see? Does the
overall data collected by the whole class agree with your own data from a single bag of
candies?
The distribution of “Total candies in each bag” was skewed left. Most of the bags contained
approximately fifty to sixty candies, with a few outliers containing only twenty-six or
twenty-seven candies. The graphs mostly reflected what I expected to see, because most of
the totals were close to my own bag's fifty-seven candies, but I was not expecting to see
outliers. With the exception of the outliers, the overall data collected by the whole class
agrees with my own data because our findings were very similar.
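One common way to check whether low totals like these count as outliers is the 1.5 x IQR rule that box plots use. The Python sketch below is only an illustration: the bag totals are hypothetical (apart from including the two low values mentioned above), and quartile conventions differ between tools, so StatCrunch may report slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical bag totals, invented for illustration (not the class data).
bag_totals = [26, 27, 54, 55, 56, 57, 57, 58, 59, 60, 61, 62]


def iqr_fences(values):
    """Return the lower and upper 1.5 x IQR fences for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4)  # default "exclusive" quartile method
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values):
    """Return the values that fall outside the 1.5 x IQR fences."""
    low_fence, high_fence = iqr_fences(values)
    return [value for value in values if value < low_fence or value > high_fence]


print("Fences:", iqr_fences(bag_totals))
print("Outliers:", find_outliers(bag_totals))
```

With these made-up totals, only the bags with 26 and 27 candies fall below the lower fence, which is how a box plot would mark them as separate points.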
5. In a half page, explain the difference between categorical and quantitative data.
Address the following in your writing: What types of graphs make sense and what types
of graphs do not make sense for categorical data? For quantitative data? Explain why.
What types of calculations make sense and what types of calculations do not make
sense for categorical data? For quantitative data? Explain why.
Categorical data places each observation into a group or category, such as the color of a
candy. Quantitative data records a numerical measurement or count, such as the total
number of candies in a bag. In the histogram above, the “Number of Candies per Bag” values
were grouped into bins such as “20-29.999” or “30-39.999”, but the variable is still
quantitative because the bins are ranges of numbers rather than categories. A five-number
summary is an example of a calculation for quantitative data because it locates the data
using the minimum, quartiles, median, and maximum. For categorical data, bar graphs,
Pareto charts, and pie charts make the most sense because they provide a visual
comparison of the sizes of the categories. Histograms, stem-and-leaf plots, and box plots
suit quantitative data because they show the shape, center, and spread of a distribution.
They do not make sense for categorical data because categories such as colors have no
numerical scale. Calculations such as the mean, median, standard deviation, and quartiles
make sense for quantitative data because the values are numbers that can be ordered and
averaged. For categorical data, the calculations that make sense are counts, proportions,
and the mode, because categories can be tallied but not averaged.
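As a small illustration of which calculations suit each type of data, the Python sketch below tallies a categorical variable (color) and averages a quantitative one (candies per bag). Both lists are hypothetical and were made up for the example.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical mini-sample, invented for illustration (not the class data).
colors = ["Red", "Green", "Green", "Yellow", "Purple", "Green", "Orange", "Red"]
bag_totals = [55, 57, 58, 60, 61]


def color_proportions(values):
    """Categorical data: count each category and convert counts to proportions."""
    counts = Counter(values)
    total = sum(counts.values())
    return {color: count / total for color, count in counts.items()}


proportions = color_proportions(colors)
most_common_color = Counter(colors).most_common(1)[0][0]

# Quantitative data: averaging and ordering the values is meaningful.
average_total = mean(bag_totals)
middle_total = median(bag_totals)

print(proportions)
print("Mode:", most_common_color)
print("Mean:", average_total, "Median:", middle_total)
```

There is no sensible "mean color", which is why only counts, proportions, and the mode are computed for the color list.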
Part 4
For the final part of the Skittles project, we used the class sample to construct
confidence intervals for the population proportion of yellow candies and the population
mean number of candies per bag.
1. Construct a 99% confidence interval estimate for the population proportion of
yellow candies.
(0.1776, 0.2389)
With 99% confidence, the proportion of yellow candies is between 0.1776 and 0.2389.
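The interval above came from StatCrunch. As a rough sketch of the kind of calculation involved, the Python function below uses the standard large-sample formula: sample proportion plus or minus z* times sqrt(p(1 - p) / n). That StatCrunch used exactly this formula is an assumption, and the counts in the example (208 yellow out of 1000 candies) are hypothetical, since the class totals are not listed in this report.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.99):
    """Large-sample (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    # Critical value z* that leaves (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical counts for illustration only (not the class totals).
low, high = proportion_ci(successes=208, n=1000, confidence=0.99)
print(f"99% CI: ({low:.4f}, {high:.4f})")
```

A higher confidence level gives a larger z* and a wider interval, and a larger sample gives a narrower one.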
2. Construct a 95% confidence interval estimate for the population mean number of
candies per bag.
With 95% confidence, the mean number of candies per bag is between 50.905 and 59.761.
3. Discuss and interpret (with complete sentences) the results of each of your interval
estimates.
The confidence interval estimate for the population proportion of yellow candies gives a
range of plausible values for the proportion of all Skittles that are yellow. We are 99%
confident that the true proportion of yellow candies is between about 18% and 24%.
The confidence interval estimate for the population mean number of candies per bag gives a
range of plausible values for the true mean number of candies in a bag. We are 95%
confident that the true mean is between about 50.9 and 59.8 candies per bag.
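Both intervals are symmetric, so the point estimate and margin of error behind each one can be recovered from its endpoints. The Python sketch below does this for the two intervals reported above; the helper name center_and_margin is just an illustrative choice.

```python
def center_and_margin(low, high):
    """Return the midpoint (point estimate) and half-width (margin of error)."""
    return (low + high) / 2, (high - low) / 2


# Endpoints taken from the two intervals reported in Part 4.
yellow_center, yellow_margin = center_and_margin(0.1776, 0.2389)
mean_center, mean_margin = center_and_margin(50.905, 59.761)

print(f"Yellow proportion: {yellow_center:.4f} +/- {yellow_margin:.4f}")
print(f"Mean candies per bag: {mean_center:.3f} +/- {mean_margin:.3f}")
```

The midpoint of the first interval is about 0.208, which agrees with the observed yellow proportion in the Part 2 table, and the midpoint of the second is about 55.3 candies per bag.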
4. In a well written and thoughtful paragraph, explain in general the purpose and
meaning of a confidence interval.
Summary
From this project, I learned how to apply different parts of statistics to one topic:
Skittles. Each part of this project allowed me to use the same class information to find
different statistics. This taught me that statistics can be used across a wide variety of
topics and can provide diverse information about them.