
Skittles Project 

For our statistics course, we completed a Skittles project both as a class and individually.
Each student bought a package of Skittles and counted how many candies of each color were in
it. We compiled our class data and applied what we learned in statistics to each part of
this project.

Part 1 

Each student purchased a 2.17-ounce bag of Original Skittles and recorded how many candies
of each color were in the bag. My data are recorded below.

● Number of red candies: 10
● Number of orange candies: 10
● Number of yellow candies: 15
● Number of green candies: 13
● Number of purple candies: 9

Part 2 
For the second part of the Skittles project, we each answered the questions below by
finding the proportions of each color and creating a pie chart and a Pareto chart.
Question 1: What proportion (or percentage) of the Skittles do you expect to see of
each color? Why?
I expect the proportions of the colors to be close to one another, but I expect the
proportions of orange, yellow, and green Skittles to be slightly higher than those of red
and purple. In a typical package of Skittles, there seem to be fewer red and purple
candies than green, yellow, and orange ones.
            Red Count  Orange Count  Yellow Count  Green Count  Purple Count
Proportion  0.188      0.222         0.202         0.199        0.189
Proportion  0.199      0.188         0.208         0.213        0.192
Question 2: In StatCrunch, create a pie chart and a Pareto chart for the total number of 
candies of each color in our class data set. Submit copies of your graphs in this report. 
Pie Chart 

Pareto Chart 
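
A Pareto chart orders the categories from most to least frequent. As a minimal sketch of that ordering, the Python snippet below sorts the class proportions reported in this project and adds a running total. The function name `pareto_order` is hypothetical, and the snippet does not reproduce the StatCrunch graph itself.

```python
# Class proportions by color, copied from the class data in this report.
class_proportions = {
    "red": 0.199,
    "orange": 0.188,
    "yellow": 0.208,
    "green": 0.213,
    "purple": 0.192,
}


def pareto_order(proportions):
    """Return (color, proportion, cumulative proportion) rows, largest first."""
    rows = []
    cumulative = 0.0
    ordered = sorted(proportions.items(), key=lambda item: item[1], reverse=True)
    for color, share in ordered:
        cumulative += share
        rows.append((color, share, round(cumulative, 3)))
    return rows


for color, share, cumulative in pareto_order(class_proportions):
    print(f"{color:<7} {share:.3f}  cumulative {cumulative:.3f}")
```

Sorted this way, green comes first and orange last, which is the order the bars of a Pareto chart would follow.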

Question 3: Does the class data represent a random sample? What would the
population be? Collaborate to discuss sampling and our data in a paragraph or two.
Think carefully about the definition of random sample when you work on your response.
We treat the class data as a random sample because each bag of Skittles was, as far as we
could tell, equally likely to be included in our data. When we bought our bags, we did not
know how many candies of each color would be inside, so no one could choose a bag based on
its contents. Strictly speaking, the bags were not drawn by a formal random process, so
this is an assumption rather than a guarantee. The population is all of the Skittles
produced by the Skittles company, because the data we collected as a class are meant to
let us make inferences about that population.
Question 4: Create a table that displays the proportions by color and the total count 
from your own bag of candies together with the proportions by color and total count for 
the entire class sample. 

              Proportion  Proportion  Proportion  Proportion  Proportion  Total
              Red         Orange      Yellow      Green       Purple      Count
My Bag        0.175       0.175       0.263       0.228       0.158       1
Class Counts  0.199       0.188       0.208       0.213       0.192       1
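
The "My Bag" row can be reproduced from the counts recorded in Part 1. The Python sketch below divides each color count by the bag total. The variable names are illustrative only.

```python
# Color counts from my own bag (Part 1 of this report).
my_bag = {"red": 10, "orange": 10, "yellow": 15, "green": 13, "purple": 9}

# Total number of candies in the bag.
total = sum(my_bag.values())

# Proportion of each color = count for that color / total candies in the bag.
my_proportions = {color: count / total for color, count in my_bag.items()}

print(f"Total candies: {total}")
for color, share in my_proportions.items():
    print(f"{color:<7} {share:.3f}")
```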

Question 5: Write a well-thought-out paragraph discussing your observations of this
data. Respond to the following prompts:

● Do the graphs reflect what you expected to see? Are there any surprises? 
● Are there any observations that appear to be outliers? If so, what impact 
might they have on graphics and summary statistics? 
● Does the distribution of colors in the total class data match with your own 
data from your single bag of candies or are they different? 

The graphs partially reflected what I expected to see. I was surprised that the
proportion of orange candies was lower than the proportions of red and purple candies,
because I expected orange to be higher. No observations appear to be outliers in the color
data, because the class totals for each color were very close to one another. If there
were outliers, they would affect the mean and standard deviation but would have little
effect on the median and quartiles, which are resistant to extreme values. The
distribution of colors in the total class data differs from my own data because the number
of candies of each color varies from bag to bag. For example, the proportion of yellow
candies in my bag is about 5.5 percentage points higher than in the class data, and the
proportion of purple candies in the class data is about 3.4 percentage points higher than
in my own bag.
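
The comparison between my bag and the class data is a difference of proportions, so it is most naturally stated in percentage points. A short Python sketch of that arithmetic, using the counts from Part 1 and the class proportions from the table above (variable names are illustrative):

```python
# Color counts from my own bag and class proportions, both from this report.
my_bag = {"red": 10, "orange": 10, "yellow": 15, "green": 13, "purple": 9}
class_proportions = {
    "red": 0.199,
    "orange": 0.188,
    "yellow": 0.208,
    "green": 0.213,
    "purple": 0.192,
}

total = sum(my_bag.values())

# Difference (my bag minus class) in percentage points for each color.
differences = {
    color: 100 * (my_bag[color] / total - class_proportions[color])
    for color in my_bag
}

for color, diff in differences.items():
    print(f"{color:<7} {diff:+.1f} percentage points")
```

A positive value means the color is more common in my bag than in the class data, and a negative value means it is less common.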
Part 3 
For part three of the Skittles project, we answered the following questions by finding the
summary statistics of the class data and creating a frequency histogram and box plot.

1. Using the total number of candies in each bag in our class sample, compute the 
following measures for the variable “Total candies in each bag”: 

(a) mean number of candies per bag 

Mean = 55.3333 candies per bag

(b) standard deviation of the number of candies per bag 

Standard Deviation = 9.728  

(c) 5-number summary for the number of candies per bag 

Min = 26 ; Q1 = 57 ; Med = 59 ; Q3 = 60 ; Max = 61 

2. Create a frequency histogram for the variable “Total candies in each bag”. 


3. Create a box plot for the variable “Total candies in each bag”. 

4. Write a well-written and thoughtful paragraph discussing your findings about the
variable “Total candies in each bag”. Address the following in your writing: What is the
shape of the distribution? Do the graphs reflect what you expected to see? Does the
overall data collected by the whole class agree with your own data from a single bag of
candies?

The distribution of “Total candies in each bag” was skewed left. Most bags contained
approximately fifty to sixty candies, while a few outliers contained only twenty-six and
twenty-seven candies. These low values pull the mean (55.3) below the median (59). The
graphs mostly reflect what I expected to see, because most of the data were close to my
own bag of fifty-seven candies, but I was not expecting to see outliers. With the
exception of the outliers, the overall class data agree with my own data.
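
One way to check the outliers is the 1.5 × IQR rule applied to the five-number summary above. This rule is a common convention and an assumption here; the report does not state how the outliers were identified. A minimal Python sketch:

```python
# Quartiles from the five-number summary reported in Part 3.
q1, q3 = 57, 60

# Interquartile range and the conventional 1.5 * IQR fences (assumed rule).
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def is_outlier(candies_in_bag):
    """Flag a bag total that falls outside the 1.5 * IQR fences."""
    return candies_in_bag < lower_fence or candies_in_bag > upper_fence


for value in (26, 27, 57, 61):
    print(value, "outlier" if is_outlier(value) else "not an outlier")
```

Under this rule, the bags with 26 and 27 candies fall below the lower fence, while my bag of 57 and the maximum of 61 do not.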

5. In a half page, explain the difference between categorical and quantitative data.
Address the following in your writing: What types of graphs make sense and what types 
of graphs do not make sense for categorical data? For quantitative data? Explain why. 
What types of calculations make sense and what types of calculations do not make 
sense for categorical data? For quantitative data? Explain why.  

Categorical data place each individual into a group or category, while quantitative data
record a numerical count or measurement for each individual. In this project, the color of
each candy is a categorical variable, and the “Total candies in each bag” is a
quantitative variable. Grouping the totals into histogram classes such as “20-29.999” or
“30-39.999” does not make the variable categorical, because the underlying values are
still numbers. For categorical data, bar graphs, Pareto charts, and pie charts make the
most sense because they compare the sizes of the categories. Histograms, stem-and-leaf
plots, and box plots make sense for quantitative data because they show the shape, center,
and spread of a distribution; they do not make sense for categorical data because
categories such as colors have no numerical scale. Calculations such as the mean, standard
deviation, and five-number summary make sense only for quantitative data, because they
require numerical values. For categorical data, the calculations that make sense are
counts and proportions for each category, because the purpose is to compare the groups.
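
The distinction can be illustrated with the data from this project. In the Python sketch below, color is treated as a categorical variable (summarized with counts and the most common category), while the bag totals are quantitative (summarized with arithmetic on the reported five-number summary). The variable names are illustrative only.

```python
from collections import Counter
from statistics import mode

# Categorical: one color label per candy, built from my bag's counts (Part 1).
my_bag = {"red": 10, "orange": 10, "yellow": 15, "green": 13, "purple": 9}
colors = [color for color, count in my_bag.items() for _ in range(count)]

# Counts and the most common category are meaningful for categorical data.
color_counts = Counter(colors)
most_common_color = mode(colors)

# Quantitative: arithmetic on bag totals is meaningful (values from Part 3).
minimum, q1, median, q3, maximum = 26, 57, 59, 60, 61
data_range = maximum - minimum
iqr = q3 - q1

print("Counts by color:", dict(color_counts))
print("Most common color:", most_common_color)
print("Range of bag totals:", data_range, "| IQR:", iqr)
```

There is no meaningful way to average the color labels, which is why a mean or a box plot applies to the bag totals but not to the colors.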
Part 4 
For the final part of the Skittles project, we used the class sample to construct
confidence intervals for the population proportion of yellow candies and for the
population mean number of candies per bag.

1. Construct a 99% confidence interval estimate for the population proportion of 
yellow candies. 

Proportion: yellow candies/total candies = 242/1,162 

(0.1776, 0.2389) 

With 99% confidence, the proportion of yellow candies is between 0.1776 and 0.2389. 
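
The interval above is consistent with the usual normal-approximation formula, p̂ ± z*·sqrt(p̂(1 − p̂)/n). The Python sketch below applies that formula to the reported counts. Treating this as the method behind the reported interval is an assumption, and the function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 242 yellow candies out of 1,162 total candies in the class sample.
lower, upper = proportion_ci(242, 1162, 0.99)
print(f"99% CI for the proportion of yellow candies: ({lower:.4f}, {upper:.4f})")
```

The result agrees with the reported interval to within rounding.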

2. Construct a 95% confidence interval estimate for the population mean number of 
candies per bag. 

Mean candies per bag: 55.3333 

With 95% confidence, the mean number of candies per bag is between 50.905 and 59.761. 
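
This interval is consistent with a one-sample t interval, x̄ ± t*·s/sqrt(n). The Python sketch below rests on two assumptions that are not stated in the report: the number of bags, n = 21, is inferred from the total of 1,162 candies divided by the mean of 55.3333, and the critical value t* = 2.086 is the standard table value for 20 degrees of freedom.

```python
from math import sqrt

# Summary statistics reported in Part 3 and the class total from Part 4.
sample_mean = 55.3333
sample_sd = 9.728
total_candies = 1162

# Assumption: number of bags inferred from total / mean (not stated in the report).
n = round(total_candies / sample_mean)

# Assumption: t critical value for 95% confidence with n - 1 = 20 degrees
# of freedom, taken from a standard t table.
t_star = 2.086

margin = t_star * sample_sd / sqrt(n)
lower = sample_mean - margin
upper = sample_mean + margin

print(f"n = {n} bags, margin of error = {margin:.3f}")
print(f"95% CI for the mean candies per bag: ({lower:.3f}, {upper:.3f})")
```

Under these assumptions the endpoints land very close to the reported 50.905 and 59.761; small differences in the last decimal place come from rounding in the inputs.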

3. Discuss and interpret (with complete sentences) the results of each of your interval
estimates.

The confidence interval for the population proportion of yellow candies gives a range of
plausible values for the share of all Skittles that are yellow. The data showed that, with
99% confidence, the proportion of yellow candies among all candies is between about 18% and
24%.

The confidence interval for the population mean number of candies per bag gives a range of
plausible values for the average number of candies in all bags. The data showed that, with
95% confidence, the mean is between about 50.9 and 59.8 candies per bag.

4. In a well-written and thoughtful paragraph, explain in general the purpose and
meaning of a confidence interval.

The purpose of a confidence interval is to give a range of likely values for an unknown
population parameter, based on sample data. Because a sample rarely matches the population
exactly, the interval reports an estimate together with a margin of error, and the
confidence level describes how often intervals built this way would capture the true
parameter. For example, if someone needed to know what proportion of people will vote in
an upcoming election, they could survey a sample and use a confidence interval based on how
many people in the sample said they would vote.


From this project, I learned how to incorporate different parts of statistics into one topic:
Skittles. Each part of this project allowed me to use the same class information to find
different statistics. This taught me that statistics can be applied to a wide variety of
topics and can provide diverse information about them.
