
Group 3 - Thitirat Pongprajuc, Shainalyn Howell, Ashlie Hashimoto and Veronica Hollestelle

Professor Aurora Jensen


MATH 1040 Skittles Project
December 9, 2015

Math 1040 Skittles Term Project


Introduction
This Skittles project is based on a sample of college students in Introduction to Statistics (Math 1040-007) at Salt Lake Community College. Every student purchased a 2.17-ounce bag of Skittles, at different locations and at different times. This is not a simple random sample, because the bags were not chosen by a random process that gave every bag an equal chance of being selected. The population would be all 2.17-ounce bags of Skittles, and the sample consists of the bags bought by every student in the class.
In this project, we compiled data on the colors and the number of Skittles in each 2.17-ounce bag that each student bought. There are 25 respondents, or students in this class, and the cumulative number of Skittles came to 1496. The colors are categorical data, and each color corresponds to quantitative data: the number of Skittles of that color. The entire data set consisted of Yellow (276), Purple (294), Red (282), Green (305), and Orange (339). We needed to create a pie chart, a Pareto chart, a histogram, and a boxplot. We calculated the mean, standard deviation, five-number summaries, and confidence intervals. Here are the data and conclusions we compiled from all of the available data.
Our guess was that the proportions of the colors within the whole sample would be very different, considering that there were five different colors of Skittles. When we examined the data, however, we found that the proportions averaged out to be mostly even. Each bag contains a different combination of colors, which makes the proportions differ from bag to bag.

Organizing and Displaying Categorical Data: Colors


Group Data

Color     Frequency   Proportion
Yellow    47          0.193
Purple    47          0.193
Red       48          0.198
Green     55          0.226
Orange    46          0.189
Total     243         1

[Figure: Group Data chart]

Overall Class Data

Color     Frequency   Proportion
Yellow    276         0.184
Purple    294         0.197
Red       282         0.189
Green     305         0.204
Orange    339         0.227
Total     1496        1

[Figure: Overall Class Data chart]
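Each proportion in these tables is a color's frequency divided by the total count for that data set. A minimal Python sketch, using only the class counts from the Overall Class Data table, reproduces the class proportions to three decimal places (the function name `proportions` is our own):

```python
# Overall class color counts, copied from the Overall Class Data table.
class_counts = {"Yellow": 276, "Purple": 294, "Red": 282, "Green": 305, "Orange": 339}


def proportions(counts):
    """Return each color's share of the total number of candies."""
    total = sum(counts.values())
    return {color: freq / total for color, freq in counts.items()}


for color, share in proportions(class_counts).items():
    # Round to three decimals to match the Proportion column.
    print(f"{color:<7} {share:.3f}")
```

The same function can be applied to the group table or to any single bag by passing that table's counts.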

Individual Data : Thitirat Pongprajuc

Color     Frequency   Proportion
Yellow    14          0.226
Purple    11          0.177
Red       10          0.161
Green     13          0.210
Orange    14          0.226
Total     62          1

[Figure: Individual Data chart]

Observation of Individual Data, Group Data and Class Data : Thitirat Pongprajuc
What I saw was that my data was somewhat similar to the whole class sample. The candy color I had the most of was orange, which was also the highest for the overall class sample. In my data, orange and yellow had the same count, which was not the case for the class data. Also, red was my lowest count rather than yellow, which was one of the class's lowest counts. In contrast, the color with the highest count in my group data was green, followed by red, and the color with the lowest count was orange. The data reflected what I expected to see: some of my data would be similar to the whole class data, but not exactly the same. However, I did not expect the group data to be so different from both my data and the class data.

Individual Data : Shainalyn Howell

Color     Frequency   Proportion
Yellow    6           0.098
Purple    10          0.164
Red       18          0.295
Green     16          0.262
Orange    11          0.180
Total     61          1

[Figure: Individual Data chart]

Observation of Individual Data, Group Data and Class Data : Shainalyn Howell
The data gathered from the class is represented above by a pie chart and a Pareto chart. As these graphs show, orange had the most and yellow had the least. This is slightly different from my individual charts: my bag had the most red and orange was in the middle, but yellow was also the least common color in my bag (individual data represented in the graphs below). Having never put much thought into how many candies of each color are in a bag of Skittles before this project, I would have assumed that the colors would be distributed evenly throughout the bag.

Individual Data : Ashlie Hashimoto

Color     Frequency   Proportion
Yellow    13          0.213
Purple    13          0.213
Red       10          0.164
Green     12          0.197
Orange    13          0.213
Total     61          1

[Figure: Individual Data chart]

Observation of Individual Data, Group Data and Class Data : Ashlie Hashimoto
My data didn't seem to be very similar to the class or my group data. Compared with the group, I had fewer red and green Skittles and more yellow, purple, and orange. Compared with the class, I had more yellow and purple, fewer red and green, and about the same amount of orange Skittles. The data didn't reflect what I thought I would see; I thought my data would have more in common with the group and class data than it did.

Individual Data : Veronica Hollestelle

Color     Frequency   Proportion
Yellow    14          0.237
Purple    13          0.220
Red       10          0.169
Green     14          0.237
Orange    8           0.136
Total     59          1

[Figure: Individual Data chart]

Observation of Individual Data, Group Data and Class Data : Veronica Hollestelle
While the class data seemed to have nearly equal amounts of each color, the group data did not. This suggests that the more samples you use, the more consistent and equal the color counts will be. There was even more difference between my data and the class data. Orange was the highest color for the class (0.227), while for the group (0.189) and for me (0.136) it was the lowest. The lowest color for the class was yellow (0.184), and the group's yellow proportion was 0.193, while mine was the highest (0.237). The highest group proportion was green at 0.226; my green was 0.237, and the class also showed a high proportion of green at 0.204. I had the highest count for yellow, and everyone else had more orange candies, while I had the fewest orange candies. My bag had one of the lowest numbers of candies per bag, and its colors varied a lot from almost everyone else's.

Organizing and Displaying Quantitative Data: the Number of Candies per Bag
Group Data

Name                  Yellow  Purple  Red  Green  Orange  Total
Hashimoto Ashlie      13      13      10   12     13      61
Hollestelle Veronica  14      13      10   14     8       59
Howell Shainalyn      6       10      18   16     11      61
Pongprajuc Thitirat   14      11      10   13     14      62
Total                 47      47      48   55     46      243

Summary statistics:
Column  n  Mean  Std. dev.  Min  Q1  Median  Q3    Max
Total   4  60.8  1.26       59   60  61      61.5  62

IQR = Q3 - Q1
= 61.5 - 60
= 1.5
Lower fence = Q1 - 1.5(IQR)
= 60 - 1.5(1.5)
= 60 - 2.25
= 57.75
Upper fence = Q3 + 1.5(IQR)
= 61.5 + 1.5(1.5)
= 61.5 + 2.25
= 63.75
There is no outlier for the group data, because every bag total lies between 57.75 and 63.75.
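The quartiles and fences above can be checked with a short Python sketch. It assumes the percentile locator rule found in many introductory textbooks (L = (k/100)n; if L is a whole number, average the Lth and (L+1)th sorted values, otherwise round L up). This rule reproduces the StatCrunch quartiles shown in this paper, but other software may use a different quartile rule and give slightly different values. The function names `percentile` and `fences` are our own.

```python
import math
import statistics


def percentile(data, k):
    """k-th percentile (0 < k < 100) using the textbook locator rule L = (k/100) * n."""
    xs = sorted(data)
    n = len(xs)
    if (k * n) % 100 == 0:
        # L is a whole number: average the L-th and (L+1)-th values (1-based).
        loc = k * n // 100
        return (xs[loc - 1] + xs[loc]) / 2
    # Otherwise round L up and take that value (1-based).
    return xs[math.ceil(k * n / 100) - 1]


def fences(data):
    """Return (Q1, Q3, lower fence, upper fence) using the 1.5 * IQR rule."""
    q1, q3 = percentile(data, 25), percentile(data, 75)
    iqr = q3 - q1
    return q1, q3, q1 - 1.5 * iqr, q3 + 1.5 * iqr


group_totals = [61, 59, 61, 62]  # Hashimoto, Hollestelle, Howell, Pongprajuc
q1, q3, low, high = fences(group_totals)
outliers = [x for x in group_totals if x < low or x > high]
print("Q1, median, Q3:", q1, statistics.median(group_totals), q3)
print("fences:", low, high, "outliers:", outliers)
```

Applied to the 25 class bag totals, the same `percentile` function gives Q1 = 59 and Q3 = 61, matching the class summary statistics.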

Overall Class Data

Name                    Yellow  Purple  Red  Green  Orange  Total
Alaguretnam Nitharshan  6       13      13   14     16      62
Becker Jenna            10      15      8    13     13      59
Bekavac Morena          11      11      15   6      17      60
Dunn Devin              10      11      10   13     9       53
Ebert Diedre            20      12      8    9      6       55
Hashimoto Ashlie        13      13      10   12     13      61
Hills Seung             13      9       12   13     12      59
Hollestelle Veronica    14      13      10   14     8       59
Howell Shainalyn        6       10      18   16     11      61
Jackson Amanda          9       10      15   11     17      62
Jameson Samantha        12      13      13   11     10      59
Juback Haley            13      12      6    12     15      58
Karaiskos Kalliopi      16      12      12   12     16      68
Pongprajuc Thitirat     14      11      10   13     14      62
Rojas Kasy              10      10      10   14     16      60
Schofield Victoria      4       16      13   11     17      61
Schott Kristina         11      7       12   12     18      60
Seike Nai               17      12      6    11     14      60
Shimizu Shelsea         4       13      10   7      22      56
Smith Richard           13      18      8    13     10      62
Sorto Jennifer          8       11      14   10     17      60
Sorto Nicole            13      9       8    15     11      56
Taylor Kelcee           6       11      15   17     11      60
Terrell Elizabeth       11      10      18   13     10      62
Wright Heather          12      12      8    13     16      61
Total                   276     294     282  305    339     1496

Summary statistics:
Column  n   Mean  Std. dev.  Min  Q1  Median  Q3  Max
Total   25  59.8  2.90       53   59  60      61  68

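IQR = Q3 - Q1
= 61 - 59
= 2
Lower fence = Q1 - 1.5(IQR)
= 59 - 1.5(2)
= 59 - 3
= 56
Upper fence = Q3 + 1.5(IQR)
= 61 + 1.5(2)
= 61 + 3
= 64
The outliers in the class data are 53 and 55, which fall below the lower fence of 56, and 68, which falls above the upper fence of 64.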
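The class summary statistics and the outlier check can be reproduced with Python's standard `statistics` module. This sketch takes Q1 = 59 and Q3 = 61 from the StatCrunch summary rather than recomputing them, and flags any bag total outside the fences.

```python
import statistics

# Total candies per bag for all 25 students (Overall Class Data table).
class_totals = [62, 59, 60, 53, 55, 61, 59, 59, 61, 62, 59, 58, 68,
                62, 60, 61, 60, 60, 56, 62, 60, 56, 60, 62, 61]

mean = statistics.mean(class_totals)          # sample mean
sd = statistics.stdev(class_totals)           # sample standard deviation (n - 1)
variance = statistics.variance(class_totals)  # sample variance, reused in the CI for sigma

# Quartiles as reported in the StatCrunch summary statistics.
q1, q3 = 59, 61
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sorted(x for x in class_totals if x < lower_fence or x > upper_fence)

print(f"mean = {mean:.2f}, s = {sd:.2f}, s^2 = {variance:.2f}")
print(f"fences = ({lower_fence}, {upper_fence}), outliers = {outliers}")
```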

Observation of Group Data and Class Data


For the variable of the total number of candies in each bag, the overall class data appeared roughly normally distributed, as shown by the boxplot. The graphs reflected what I expected: data sets of different sizes would show some differences. The total number of candies for the whole class was 1496 across 25 bags, so the mean was 59.8 and the standard deviation was 2.90. The minimum was 53, Q1 was 59, the median was 60, Q3 was 61, and the maximum was 68. The boxplot of our group data was different from that of the whole class because the group distribution was skewed left. The total number of candies for the group was 243 across 4 bags; the group mean was 60.8, and the standard deviation was 1.26. The minimum was 59, Q1 was 60, the median was 61, Q3 was 61.5, and the maximum was 62. Moreover, there were 3 outliers in the class data, but there was no outlier in our group data.

Reflection : Thitirat Pongprajuc


Quantitative and categorical data are very different from each other. Quantitative data portrays a value or amount, and it is always a number. The best graphs for quantitative data are boxplots, histograms, and stem-and-leaf plots. These are used for quantitative data because they plot the data on a numerical scale rather than by category. The types of graphs that are not ideal for quantitative data are pie charts and bar graphs, because they are organized by categories rather than by numerical values. Categorical data is used when we discuss a category such as gender; it is based on a characteristic rather than on a numerical value. The best graphs for categorical data are pie charts and bar graphs. These graphs are ideal for categories because they display the frequency of each category. A bad way of portraying categorical data would be to use a stem-and-leaf plot, a boxplot, or a histogram, because these graphs use only numerical values and not categories.
Reflection : Shainalyn Howell
The categorical data in this project would be represented by the colors of the candies, because categorical data consists of names or labels rather than measurements. The quantitative data would be represented by the values or measurements, which are the totals or amounts of each color of candy. The five-number summary wouldn't work if I tried to explain the data using just the colors. You need both the categorical and the quantitative information for most of the data to be interpreted correctly. The pie charts are a good representation of categorical and quantitative data working together, but the histogram and five-number summary represent only the quantitative data.
Reflection : Ashlie Hashimoto
Quantitative and categorical data are different from one another. Quantitative data serves a more numerical purpose and is always a number, for example, how old you are, how many Skittles are in a bag, or how many eggs are in a carton. Categorical, or qualitative, data is best used for things that can be categorized and are non-numeric, for example, types of cars, the color of the sky, or how soft a cat might be. When graphing quantitative data, it works best to use a boxplot, histogram, or stem-and-leaf plot. I would avoid using a pie chart or bar graph for quantitative data because they don't work well with numerical data, but they are great for qualitative data.
Reflection : Veronica Hollestelle
Pie charts and bar graphs are not as good for quantitative data. They are better for categorical data, such as the colors of Skittles, because they give a visual of the categories and how they are proportioned. Quantitative data is an amount or a count of something, and the best graphs for it would be something like a stem-and-leaf plot, a histogram, or a boxplot, which break things down into a numerical count. In this case, I believe the pie chart was good at representing both types of data, quantitative and categorical. In this Skittles project, the colors and the count of each color use both quantitative and categorical data.

Confidence Interval Estimate


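The purpose of a confidence interval is to estimate the true value of a population parameter, such as a population proportion, by using a sample statistic, such as a sample proportion. We use a confidence interval rather than a single value because an interval gives a range of plausible values together with a level of confidence, which makes the estimate more informative than a single point estimate.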
Construct a 95% confidence interval estimate for the true proportion of purple candies.
95% confidence interval results:
p : Proportion of successes
Method: Standard-Wald
Proportion  Count  Total  Sample Prop.  Std. Err.    L. Limit    U. Limit
p           294    1496   0.19652406    0.010273739  0.17638791  0.21666022

Based on the StatCrunch result of our data, we are 95% confident that the interval from 0.1764 to 0.2167 actually does contain the true value of the population proportion of purple candies. This means that if we were to select many different samples of size 1496 and construct the corresponding confidence intervals, in the long run 95% of them would actually contain the value of the population proportion of purple candies.

Construct a 99% confidence interval estimate for the true mean number of candies per bag.
99% confidence interval results:
μ : Mean of variable
Variable                   Sample Mean  Std. Err.   DF  L. Limit   U. Limit
N of Candies for each bag  59.84        0.57930993  24  58.219705  61.460295

Based on the calculations from our data, we are 99% confident that the interval from 58.22 to 61.46 actually does contain the true value of μ. This means that if we selected many different samples of the same size and constructed the corresponding confidence intervals, in the long run 99% of them would actually contain the value of μ.
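The 99% interval for μ is x̄ ± t(α/2)·s/√n with n - 1 = 24 degrees of freedom. Python's standard library has no t distribution, so this sketch hard-codes the critical value t = 2.797 as read from a standard t table (24 degrees of freedom, area 0.005 in each tail); with that rounded table value the limits agree with StatCrunch to about four decimal places.

```python
import math
import statistics

# Total candies per bag for all 25 students (Overall Class Data table).
class_totals = [62, 59, 60, 53, 55, 61, 59, 59, 61, 62, 59, 58, 68,
                62, 60, 61, 60, 60, 56, 62, 60, 56, 60, 62, 61]

T_CRIT = 2.797  # t table value: df = 24, area 0.005 in each tail (99% confidence)

n = len(class_totals)
x_bar = statistics.mean(class_totals)
std_err = statistics.stdev(class_totals) / math.sqrt(n)  # s / sqrt(n)
margin = T_CRIT * std_err
low, high = x_bar - margin, x_bar + margin
print(f"{low:.2f} < mu < {high:.2f}")
```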
Construct a 98% confidence interval estimate for the standard deviation of the number of candies per bag.
98% confidence interval results:
σ² : Variance of variable
Variable                   Sample Var.  DF  L. Limit   U. Limit
N of Candies for each bag  8.39         24  4.6849894  18.547651

Find σ:
√4.6849894 < σ < √18.547651
2.1645 < σ < 4.3067
Based on this result, we have 98% confidence that the limits of 2.1645 and 4.3067 contain the true value of σ.
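The interval for σ comes from √((n - 1)s²/χ²_R) < σ < √((n - 1)s²/χ²_L). As with the t example, this Python sketch hard-codes chi-square critical values for 24 degrees of freedom read from a standard chi-square table (χ²_R = 42.980 with area 0.01 to the right, χ²_L = 10.856 with area 0.99 to the right); the rounding in those table values changes the limits only in about the fourth decimal place.

```python
import math

n = 25
sample_var = 8.39     # s^2 from the StatCrunch output
CHI2_RIGHT = 42.980   # chi-square table: df = 24, area 0.01 to the right
CHI2_LEFT = 10.856    # chi-square table: df = 24, area 0.99 to the right

var_low = (n - 1) * sample_var / CHI2_RIGHT
var_high = (n - 1) * sample_var / CHI2_LEFT
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)
print(f"{var_low:.4f} < sigma^2 < {var_high:.4f}")
print(f"{sd_low:.4f} < sigma < {sd_high:.4f}")
```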

Hypothesis Tests
A hypothesis test is a procedure for testing a claim about the value of a population proportion, a population mean, or a population standard deviation. The purpose of a hypothesis test is to reach a conclusion about whether or not the claim is supported by the sample data.
Use a 0.01 significance level to test the claim that 20% of all Skittles candies are green.

n = 1496
p = 0.2
p̂ = 0.2039
x = 305
q = 0.8
α = 0.01

Step 1: The original claim is that 20% of all Skittles candies are green: p = 0.20.
Step 2: The opposite of the original claim is p ≠ 0.20.
Step 3: The null hypothesis is p = 0.20 and the alternative hypothesis is p ≠ 0.20.
H0: p = 0.20
H1: p ≠ 0.20
Step 4: The significance level is α = 0.01.
Step 5: Because the claim is about a population proportion p, the relevant sample statistic is p̂. Since np = 1496(0.2) = 299.2 and nq = 1496(0.8) = 1196.8 are both at least 5, the normal distribution can be used.
Step 6: The test statistic is z = 0.37, and the p-value is 0.7077.
Hypothesis test results:
p : Proportion of successes
H0 : p = 0.2
HA : p ≠ 0.2
Proportion  Count  Total  Sample Prop.  Std. Err.    Z-Stat      P-value
p           305    1496   0.20387701    0.010341754  0.37488858  0.7077

Step 7: Because the p-value is greater than the significance level α = 0.01, we fail to reject the null hypothesis.
p-value > α
0.7077 > 0.01, so we fail to reject H0.
Step 8: Because we failed to reject the null hypothesis, there is not sufficient evidence to warrant rejection of the claim that 20% of all Skittles candies are green.
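Steps 5 through 7 can be reproduced with a short Python sketch of the one-proportion z test, using `statistics.NormalDist` for the two-tailed p-value. The function name is our own; as in the StatCrunch output, the standard error uses the claimed proportion p = 0.2 rather than p̂.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-tailed z test of H0: p = p0; returns (z, p_value)."""
    p_hat = successes / n
    std_err = math.sqrt(p0 * (1 - p0) / n)  # uses the claimed p, not p_hat
    z = (p_hat - p0) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


alpha = 0.01
z, p_value = one_proportion_z_test(305, 1496, 0.20)  # green candies
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.4f}: {decision}")
```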

Use a 0.05 significance level to test the claim that the mean number of candies in a bag of Skittles is 56.
n = 25
x̄ = 59.8
μ = 56
α = 0.05
α/2 = 0.025

Step 1: The original claim is that the mean number of candies in a bag of Skittles is 56: μ = 56.
Step 2: The alternative to the original claim is μ ≠ 56.
Step 3: The null hypothesis is μ = 56 and the alternative hypothesis is μ ≠ 56.
H0: μ = 56
H1: μ ≠ 56
Step 4: The significance level is α = 0.05.
Step 5: Because the claim is about a population mean μ and σ is not known, the relevant sample statistic is the sample mean x̄, and the Student t distribution is used.
Step 6: The test statistic is t = 6.6286.
Hypothesis test results:
μ : Mean of variable
H0 : μ = 56
HA : μ ≠ 56
Variable                   Sample Mean  Std. Err.   DF  T-Stat     P-value
N of Candies for each bag  59.84        0.57930993  24  6.6285761  <0.0001

Step 7: Because the p-value is less than the significance level α = 0.05, we reject the null hypothesis.
p-value < α
(< 0.0001) < 0.05, so we reject H0.
Step 8: There is sufficient evidence to warrant rejection of the claim that the mean number of candies in a bag of Skittles is 56.
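Because Python's standard library has no t distribution, this sketch of the t test uses the critical-value method instead of a p-value: it compares the test statistic with t = 2.064, the two-tailed critical value for α = 0.05 and 24 degrees of freedom read from a standard t table. It reaches the same decision as the p-value method above.

```python
import math
import statistics

# Total candies per bag for all 25 students (Overall Class Data table).
class_totals = [62, 59, 60, 53, 55, 61, 59, 59, 61, 62, 59, 58, 68,
                62, 60, 61, 60, 60, 56, 62, 60, 56, 60, 62, 61]
MU_0 = 56        # claimed population mean
T_CRIT = 2.064   # t table: df = 24, alpha = 0.05, two tails

n = len(class_totals)
std_err = statistics.stdev(class_totals) / math.sqrt(n)
t_stat = (statistics.mean(class_totals) - MU_0) / std_err
reject = abs(t_stat) > T_CRIT  # critical-value method
print(f"t = {t_stat:.4f}; reject H0: {reject}")
```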

There are three conditions for a confidence interval for estimating a population proportion p:
1) The sample is a simple random sample.
2) The conditions for the binomial distribution are satisfied: there is a fixed number of trials, the trials are independent, there are two categories of outcomes, and the probability remains constant for each trial.
3) There are at least 5 successes and at least 5 failures.
We treat our sample as a simple random sample, although the bags were not truly randomly selected. Each candy is either purple or not purple, and our sample of 1496 candies has 294 purple candies (successes) and 1202 other candies (failures), far more than 5 of each. We can therefore calculate a confidence interval for estimating a population proportion p.
Conditions for a confidence interval for estimating a population mean with σ not known:
1) The sample is a simple random sample.
2) Either or both of these conditions are satisfied: the population is normally distributed or n > 30.
We treat our sample as a simple random sample. Our sample size is 25, which is smaller than 30, so condition 2 relies on the population of bag totals being approximately normally distributed, which the roughly normal class boxplot supports. We can therefore still calculate a confidence interval for estimating a population mean with σ not known.

Conditions for a confidence interval for estimating a population standard deviation or variance:
1) The sample is a simple random sample.
2) The population must have normally distributed values.
We treat our sample as a simple random sample. We do not know for certain whether the population is normally distributed, but we assume that it is. With both conditions treated as met, we can calculate a confidence interval for estimating a population standard deviation or variance.

Mistakes could have been made while gathering this data. One type of error is recording incorrect data, which could happen if a person counted incorrectly or wrote down the wrong quantity for a color. The sampling method could be improved by increasing the sample size. We could also improve it by acquiring bags from different parts of the country and/or world, rather than only the local area.
We have drawn the conclusion that the true mean number of candies in each bag of Skittles is close to the sample mean we found from our data. We have also drawn the conclusion that the colors of Skittles are somewhat evenly proportioned in each bag.

Reflection
Some of the things that I have learned as a result of this paper are that statistics is not as easy as algebra, but I can use it in real-life situations. I am not sure that I would ever have to know how many candies are in a Skittles bag, but it made learning statistics fun and different. I was surprised to learn that not all Skittles bags have the same number of candies. Some bags have more of one candy color than others, which makes me wonder what the probability is that a bag of Skittles would have an outlying number of candies.
The math skill that I will apply from this project to other classes is interpreting word problems. In statistics, understanding the problems and choosing the right formula are critical, because if I do not read the problems carefully, I could easily solve them using the wrong methods, and the answer would not even be close to the right answer. If I apply these interpreting skills in my future classes, I will be able to understand the information and make decisions correctly.
This statistics class refreshed my problem-solving skills after I had taken ten years off from school. I was confident at the beginning of the semester, but facing complex problems made me less confident. However, I regained a lot of confidence after practicing solving problems. With a variety of different formulas and scenarios throughout the book, my thought process was challenged, and I had to think through problems. I was able to see the changes and development in my problem-solving skills as I completed different parts of the project. Each section had specific challenges that required us to use our resources and judgment. This project changed the way I think about real-world math applications. I learned that I could use statistics in many real-world ways, not just with Skittles. When you think about it, you actually use statistics a lot more than you realize, even if only simple statistics. I had never thought that the chance of having a baby of a certain gender could be calculated statistically. In order to design an airplane, engineers have to consider seat sizes relative to the average size of passengers' bodies. I noticed that statistics is all around us, even though people often do not realize that it is part of the real world.