Stats Project Part 5

Skittles Project
For our statistics course, we completed a Skittles project both as a class and individually.
Each student bought a package of Skittles and counted how many candies of each color were
in it. We compiled the class data and applied what we learned in statistics to each part of
this project.
Part 1
Each student purchased a 2.17-ounce bag of Original Skittles and recorded how many
candies of each color were in the bag. My data are recorded below.
● Number of red candies: 10
● Number of orange candies: 10
● Number of yellow candies: 15
● Number of green candies: 13
● Number of purple candies: 9
Part 2
For the second part of the Skittles project, we each answered the questions below by
finding the proportion of each candy color and creating a pie chart and a Pareto chart.

Question 1: What proportion (or percentage) of the Skittles do you expect to see of
each color? Why?

I expect the proportions of the colors to be close to one another, but I expect the
proportions of orange, yellow, and green Skittles to be slightly higher than those of red
and purple. In a typical package of Skittles, there usually seem to be fewer red and purple
candies than green, yellow, and orange ones.

                     Red     Orange  Yellow  Green   Purple
Expected Proportion  0.188   0.222   0.202   0.199   0.189
Observed Proportion  0.199   0.188   0.208   0.213   0.192

Question 2: In StatCrunch, create a pie chart and a Pareto chart for the total number of
candies of each color in our class data set. Submit copies of your graphs in this report.

Pie Chart

Pareto Chart
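
The ordering that a Pareto chart displays can be sketched from the observed class proportions reported under Question 1. This is only a minimal Python illustration and not the StatCrunch output; the variable names are my own.

```python
# Observed class proportions by color, copied from the Question 1 table.
observed = {"Red": 0.199, "Orange": 0.188, "Yellow": 0.208,
            "Green": 0.213, "Purple": 0.192}

# A Pareto chart orders the categories from most to least frequent.
pareto_order = sorted(observed, key=observed.get, reverse=True)

# Print each color with its proportion and the running cumulative proportion.
cumulative = 0.0
for color in pareto_order:
    cumulative += observed[color]
    print(f"{color:<7} {observed[color]:.3f}  cumulative {cumulative:.3f}")
```

Sorting by proportion puts green first and orange last, which is the same ranking discussed under Question 5.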

Question 3: Does the class data represent a random sample? What would the
population be? Collaborate to discuss sampling and our data in a paragraph or two.
Think carefully about the definition of random sample when you work on your response.

The class data can be treated as a random sample because each bag of Skittles was equally
likely to be included in our data. When we bought our bags, we did not know how many
candies of each color would be inside, so no one could choose a bag based on its contents.
Each student collected data individually and contributed it to the class sample. The
population is all of the Skittles produced by the Skittles company, because the data we
collected as a class allow us to make inferences about that population.

Question 4: Create a table that displays the proportions by color and the total count
from your own bag of candies together with the proportions by color and total count for
the entire class sample.

        Proportion  Proportion  Proportion  Proportion  Proportion  Total
        Red         Orange      Yellow      Green       Purple      Count
My Bag  0.175       0.175       0.263       0.228       0.158       1
Class   0.199       0.188       0.208       0.213       0.192       1
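
The "My Bag" proportions can be checked against the counts recorded in Part 1. The short Python sketch below is only an illustration of that arithmetic; the variable names are my own.

```python
# Color counts from my own bag (Part 1).
my_bag = {"Red": 10, "Orange": 10, "Yellow": 15, "Green": 13, "Purple": 9}

total = sum(my_bag.values())  # total number of candies in the bag

# Proportion of each color, rounded to three decimals as in the table.
proportions = {color: round(n / total, 3) for color, n in my_bag.items()}

print(total)
print(proportions)
```

The counts add up to 57 candies, and each proportion is that color's count divided by 57.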

Question 5: Write a well thought out paragraph discussing your observations of this
data. Respond to the following prompts:
● Do the graphs reflect what you expected to see? Are there any surprises?
● Are there any observations that appear to be outliers? If so, what impact
might they have on graphics and summary statistics?
● Does the distribution of colors in the total class data match with your own
data from your single bag of candies or are they different?
The graphs partially reflected what I expected to see. I was surprised that the proportion
of orange candies was lower than the proportions of red and purple candies, because I
expected the proportion of orange candies to be higher. No observations appear to be
outliers, because in our class data set the totals for each color were very close to one
another. If there were outliers, they would affect the mean and standard deviation, but
they would have little effect on the median and quartiles. The distribution of colors in the
total class data differs from my own data because the number of candies of each color
varies from bag to bag. For example, the proportion of yellow candies in my bag is about
5.5 percentage points higher than in the class data, and the proportion of purple candies
in the class data is about 3.4 percentage points higher than in my bag.
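
The two gaps quoted for yellow and purple are differences between proportions in the Question 4 table. A minimal Python sketch of that subtraction, with names chosen only for illustration:

```python
# Proportions from the Question 4 table.
my_bag = {"Yellow": 0.263, "Purple": 0.158}
class_data = {"Yellow": 0.208, "Purple": 0.192}

# Difference in percentage points (my bag minus class), rounded to one decimal.
gap = {c: round((my_bag[c] - class_data[c]) * 100, 1) for c in my_bag}
print(gap)
```

A positive value means my bag had a larger share of that color than the class sample, and a negative value means a smaller share.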

Part 3
For part three of the skittles project, we answered the following questions by finding the
summary statistics of the class data and creating a frequency histogram and boxplot.

1. Using the total number of candies in each bag in our class sample, compute the
following measures for the variable “Total candies in each bag”:
(a) mean number of candies per bag
Mean = 55.3333 candies per bag
(b) standard deviation of the number of candies per bag
Standard Deviation = 9.728
(c) 5-number summary for the number of candies per bag
Min = 26 ; Q1 = 57 ; Med = 59 ; Q3 = 60 ; Max = 61
2. Create a frequency histogram for the variable “Total candies in each bag”.

3. Create a box plot for the variable “Total candies in each bag”.
4. Write a well written and thoughtful paragraph discussing your findings about the
variable “Total candies in each bag”. Address the following in your writing: What is the
shape of the distribution? Do the graphs reflect what you expected to see? Does the
overall data collected by the whole class agree with your own data from a single bag of
candies?
The distribution of the “Total candies in each bag” variable was skewed left. Most of the
bags contained approximately fifty to sixty candies, with a few outliers containing
twenty-six and twenty-seven candies. For the most part, the graphs reflect what I expected
to see, because most of the data were close to my own bag, which had fifty-seven candies,
but I was not expecting to see outliers. With the exception of the outliers, the overall
data collected by the whole class agree with my own data because our findings were very
similar.
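
One common way to check the outliers mentioned above is the 1.5 × IQR rule that box plots typically use. The sketch below assumes that rule, since this report does not say which rule StatCrunch applied, and it uses only the five-number summary from question 1(c).

```python
# Five-number summary reported in Part 3, question 1(c).
minimum, q1, median, q3, maximum = 26, 57, 59, 60, 61

iqr = q3 - q1                  # interquartile range
lower_fence = q1 - 1.5 * iqr   # values below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr   # values above this are flagged as outliers

print(iqr, lower_fence, upper_fence)
print("low outliers present:", minimum < lower_fence)
print("high outliers present:", maximum > upper_fence)
```

Under this assumption the lower fence is 52.5, so bags with twenty-six or twenty-seven candies fall well below it, while the maximum of 61 stays inside the upper fence.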
5. In a half page, explain the difference between categorical and quantitative data.
Address the following in your writing: What types of graphs make sense and what types
of graphs do not make sense for categorical data? For quantitative data? Explain why.
What types of calculations make sense and what types of calculations do not make
sense for categorical data? For quantitative data? Explain why.
Categorical data place observations into groups or categories, such as the color of each
candy. Quantitative data measure a number or an amount, such as the number of candies
in a bag. In the histogram above, the “Number of Candies per Bag” values were grouped
into bins such as “20-29.999” or “30-39.999”, but the variable is still quantitative because
each bin is a range of numbers rather than a named category. A five-number summary is
an example of a calculation for quantitative data because it shows where the data lie on
a number line. For categorical data, bar graphs, Pareto charts, and pie charts make the
most sense because they give a visual comparison of the size of each category. Histograms,
box plots, and stem-and-leaf plots make sense for quantitative data because they show the
shape and spread of a distribution; they do not make sense for categorical data because
categories such as colors have no numerical scale. Calculations such as the mean, median,
standard deviation, and quartiles make sense for quantitative data because they require
numerical values. For categorical data, the calculations that make sense are counts and
proportions for each category, because averaging labels such as “red” or “green” has no
meaning.

Part 4
For the final part of the Skittles project, we used the sample proportion of yellow
candies and the sample mean number of candies per bag to construct confidence intervals
for the corresponding population values.
1. Construct a 99% confidence interval estimate for the population proportion of
yellow candies.
Proportion: yellow candies/total candies = 242/1,162
(0.1776, 0.2389)
With 99% confidence, the proportion of yellow candies is between 0.1776 and 0.2389.
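
The interval above can be reproduced with the usual normal-approximation formula for a proportion. This is a sketch under the assumption that this is the method StatCrunch used, since the report does not name it; the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

yellow, n = 242, 1162   # yellow candies and total candies in the class sample
confidence = 0.99

p_hat = yellow / n                                        # sample proportion
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value, about 2.576
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)           # margin of error

low, high = p_hat - margin, p_hat + margin
print(f"({low:.4f}, {high:.4f})")
```

Under that assumption, the endpoints agree with the reported interval to four decimal places.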
2. Construct a 95% confidence interval estimate for the population mean number of
candies per bag.
Mean candies per bag: 55.3333
With 95% confidence, the mean number of candies per bag is between 50.905 and 59.761.
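
The interval above is consistent with a one-sample t-interval. The sketch below rests on two assumptions that are not stated in this report: that the class sample had 21 bags (inferred from 1,162 total candies divided by the mean of 55.3333) and that the critical value is the t-table value 2.086 for 20 degrees of freedom.

```python
from math import sqrt

mean = 55.3333   # sample mean from Part 3
sd = 9.728       # sample standard deviation from Part 3
n = 21           # ASSUMPTION: number of bags, inferred from 1,162 / 55.3333
t_star = 2.086   # ASSUMPTION: t-table value for 95% confidence, 20 degrees of freedom

margin = t_star * sd / sqrt(n)   # margin of error
low, high = mean - margin, mean + margin
print(f"({low:.3f}, {high:.3f})")
```

With these assumed inputs, the endpoints match the reported interval to within rounding.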
3. Discuss and interpret (with complete sentences) the results of each of your interval
estimates.
The confidence interval for the population proportion of yellow candies gives a range of
plausible values for the share of all Skittles that are yellow. The data showed that, with
99% confidence, the proportion of yellow candies among all candies is between about 18%
and 24%.
The confidence interval for the population mean number of candies per bag gives a range
of plausible values for the average number of candies in a bag. The data showed that, with
95% confidence, the mean is between about 50.9 and 59.8 candies per bag.
4. In a well written and thoughtful paragraph, explain in general the purpose and
meaning of a confidence interval.
The purpose of a confidence interval is to produce an interval of numbers that gives a
range of likely values for an unknown parameter. It lets you estimate a population value
that you cannot measure directly by using data from a sample. For example, if someone
needed to know what proportion of people would vote in an upcoming election, they could
estimate it with a confidence interval based on the proportion of people in a sample who
said they would vote.
Summary
From this project, I learned how to apply different parts of statistics to one topic:
Skittles. Each part of this project allowed me to use the same class data to find
different statistics. This taught me that statistics can be applied to a wide variety of
topics and can provide diverse information about them.
