
Kyliegh Billings, Melissa Billings

Math 1040-012
Term project-Skittles data

For this project, everyone in the class was asked to buy a 2.17 oz bag of
Skittles and count the number of candies of each color in the bag. The class
data was compiled, and we used it to complete the different parts of this
statistics assignment.
For the first part of the project, we were asked to determine the
proportion of each color within the overall sample gathered by the class. To
do this, we created a pie chart and a Pareto chart representing the number
of candies of each color. We compared the class data to our own personal data
and noted any similarities or differences.
For the next portion of the project, we used the Skittles data to
calculate the mean, standard deviation, and 5-number summary. We then
used this data to make a frequency histogram and a box plot.
The last part of the project involved confidence intervals and
hypothesis tests. We found three different confidence intervals, one each for
the population proportion, mean, and standard deviation, and wrote an
analysis of what each confidence interval meant.

Class data

Color     Total number   Proportion
Red       295            0.207
Orange    291            0.204
Yellow    282            0.198
Green     294            0.206
Purple    265            0.186
TOTAL     1427           1
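The proportions in the class table can be checked with a short calculation. This is a minimal sketch in Python, which was not part of the original assignment; the variable names are our own, and the counts are the class totals reported above.

```python
# Class totals by color, copied from the class data.
counts = {"red": 295, "orange": 291, "yellow": 282, "green": 294, "purple": 265}

# Total number of candies in the class sample.
total = sum(counts.values())

# Proportion of each color, rounded to three decimals as in the report.
proportions = {color: round(n / total, 3) for color, n in counts.items()}

for color, p in proportions.items():
    print(f"{color:<7} {p:.3f}")
print(f"total   {total}")
```

Because each proportion is rounded separately, the rounded values may not add up to exactly 1.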

Data for my bag of Skittles

Color     Number   Proportion     Percentage
Green     19       .3015873016    30.16%
Red       11       .1746031746    17.46%
Orange    11       .1746031746    17.46%
Yellow    12       .1904761905    19.05%
Purple    10       .1587301587    15.87%

These graphs do represent what I expected to see. I thought that each color
of candy would be equally represented in each bag, and the class sample data
seems to suggest that is the case.
With my sample data, all colors were approximately equally represented, with
the exception of green candies. In my bag there were noticeably more
green candies: they made up 19 of the 63 candies, a proportion of .3016, or
30.16%.
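The comparison between my bag and the class sample can be made concrete by subtracting the class proportion from my bag's proportion for each color. The following is only an illustrative sketch in Python, with variable names of our own choosing, using the counts reported in this project.

```python
# Counts from the class sample and from my own bag.
class_counts = {"red": 295, "orange": 291, "yellow": 282, "green": 294, "purple": 265}
my_counts = {"green": 19, "red": 11, "orange": 11, "yellow": 12, "purple": 10}

class_total = sum(class_counts.values())
my_total = sum(my_counts.values())

# Difference in proportion (my bag minus class sample) for each color.
diffs = {
    color: my_counts[color] / my_total - class_counts[color] / class_total
    for color in my_counts
}

# The color whose proportion differs most from the class sample.
biggest = max(diffs, key=lambda color: abs(diffs[color]))

for color, d in diffs.items():
    print(f"{color:<7} {d:+.3f}")
print("largest difference:", biggest)
```

A positive difference means the color is more common in my bag than in the class sample.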

Using the total number of candies in each bag in the class sample, we were
asked to calculate the mean, standard deviation, and 5-number summary.
Those results are as follows:
Mean: 59.5
Sample standard deviation: 1.98
Five number summary: Min=55, Q1=58, Median=60, Q3=61, Max=63

(Histogram drawn by hand on the printout. We could not figure out
how to do it in Excel.)

The difference between categorical and quantitative data is that
quantitative data consists of numbers, so you can do math with it. Categorical
data consists of labels or categories rather than numbers, so you cannot do
arithmetic with it. It includes such things as colors, gender, prenatal care,
etc.

You would use histograms, boxplots, stem and leaf plots, and
scatterplots for quantitative data. For categorical data you should use a
pie chart or a Pareto chart.
When it comes to calculations, mean and median only make sense for
quantitative data. The mean is the average of all the values in a
sample, so it only makes sense when applied to quantitative
data. The median represents the middle value of the ordered data and, once
again, only makes sense when the data can be put in order. The best
measure of central tendency to apply to categorical data is the mode. When
looking at the colors of candy in a Skittles bag, you cannot find the average
color or the median color, but you can establish which color occurs most
often.
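To illustrate the mode for categorical data, the most common color in the class sample can be found by picking the category with the highest count. This is a small sketch in Python using the class totals from this project; the variable names are our own.

```python
from collections import Counter

# Color counts from the class sample.
class_counts = Counter(
    {"red": 295, "orange": 291, "yellow": 282, "green": 294, "purple": 265}
)

# The mode of categorical data is the category with the highest count.
mode_color, mode_count = class_counts.most_common(1)[0]

print(mode_color, mode_count)
```

No arithmetic is done on the color names themselves; only the counts are compared.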

99% Confidence Interval estimate for the population proportion of
yellow candies

X= 282
n= 1427
Z-value for 99% CI = 2.575
p= 282/1427= 0.198
99% Confidence Interval Estimate: (0.171, 0.225)
Confidence intervals for a population proportion are used to estimate, with
the specified degree of confidence, the proportion of a characteristic found
within a population. In relation to the Skittles, we are 99% confident that
the proportion of yellow candies among all Skittles falls between 0.171 and
0.225.
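The interval for the proportion of yellow candies can be reproduced with the usual formula, sample proportion plus or minus z times sqrt(p(1 - p)/n). The sketch below is in Python and is only an illustration. It assumes, as the reported numbers suggest, that the sample proportion was rounded to 0.198 before the margin of error was computed.

```python
import math

x = 282    # yellow candies in the class sample
n = 1427   # total candies in the class sample
z = 2.575  # z-value for a 99% confidence interval, as used in the report

# Assumption: the sample proportion is rounded to three decimals first.
p_hat = round(x / n, 3)

# Margin of error for a proportion.
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)

lower = p_hat - margin
upper = p_hat + margin

print(f"({lower:.3f}, {upper:.3f})")
```

Using the unrounded proportion 282/1427 instead shifts the endpoints slightly in the third decimal place.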
95% Confidence Interval estimate for the population mean number
of skittles per bag
n= 24
Sx = 1.978
Sample mean= 59.458
t-value for 95% CI (23 degrees of freedom) = 2.069
Standard error = 1.978/√24 = 0.404
Margin of error: E = 2.069(0.404) = .835
59.458 + .835 = 60.293
59.458 - .835 = 58.623
95% Confidence Interval Estimate: (58.623, 60.293)
Confidence interval estimates of the population mean use sample data to
construct an interval that, with the specified degree of confidence, should
contain the population mean. In this case, we are 95% confident that the
mean number of Skittles per bag, across all bags, is between 58.623 and
60.293.
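The arithmetic behind this interval can be sketched in Python as follows. This is only an illustration. The critical value 2.069 is the one used in the calculation above (a t-value with n - 1 = 23 degrees of freedom) and is entered by hand rather than computed.

```python
import math

n = 24          # number of bags in the class sample
s = 1.978       # sample standard deviation
x_bar = 59.458  # sample mean
t_crit = 2.069  # t critical value for 95% confidence, 23 degrees of freedom

# Standard error of the mean and margin of error.
standard_error = s / math.sqrt(n)
margin = t_crit * standard_error

lower = x_bar - margin
upper = x_bar + margin

print(f"E = {margin:.3f}")
print(f"({lower:.3f}, {upper:.3f})")
```

The t distribution is used here instead of z because the population standard deviation is unknown and is estimated from the sample.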

Hypothesis tests allow us to test a given claim by stating a null hypothesis
and an alternative hypothesis about a population parameter. Using the given
significance level, we determine whether the sample data provide sufficient
evidence to reject the null hypothesis in favor of the alternative.
Use a 0.05 significance level to test the claim that 20% of all Skittles
candies are red.
Claim: p=.20
Null (H0): p=.20
Alternative (H1): p≠.20

Test Statistic:

$$z_0 = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{.2067274 - .20}{\sqrt{\dfrac{.20(.80)}{1427}}} = .6353298386 \approx .64$$

P-value = 2(1 - .7389) = .5222
Critical values = ±1.96
Fail to reject H0. The P-value is greater than the significance level of .05. There is
not sufficient evidence to warrant rejection of the claim that p=.20.
This hypothesis test tells us that, at the 0.05 significance level, the sample data are
consistent with the claim that 20% of all Skittles are red. Failing to reject H0 does not
prove that the claim is true.
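The test of the claim p = .20 can be sketched in Python as shown below. This is only an illustration with names of our own. It assumes a two-tailed test, so the P-value is twice the area beyond |z|. For reference, .7389 is the area to the left of z = .64 in a standard normal table, so the two-tailed P-value from the table is 2(1 - .7389) = .5222. The code uses the unrounded z, so its P-value differs slightly.

```python
import math

def normal_cdf(z):
    """Area to the left of z under the standard normal curve."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

x = 295       # red candies in the class sample
n = 1427      # total candies in the class sample
p0 = 0.20     # claimed proportion of red candies
alpha = 0.05  # significance level

p_hat = x / n

# Test statistic for a claim about a proportion.
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Two-tailed P-value: twice the area beyond |z|.
p_value = 2 * (1 - normal_cdf(abs(z)))

reject_h0 = p_value < alpha

print(f"z = {z:.2f}, P-value = {p_value:.3f}, reject H0: {reject_h0}")
```

Since the P-value is well above 0.05, the decision is to fail to reject H0, which agrees with comparing z = .64 against the critical values ±1.96.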

Use a 0.01 significance level to test the claim that the mean number of candies in
a bag of Skittles is 55.
Claim: μ=55
Null (H0): μ=55
Alternative (H1): μ≠55

Test Statistic:

$$t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{59.458333 - 55}{1.977683464/\sqrt{24}} = \frac{4.458333}{.4036929466} = 11.04$$

P-value < .0001
Critical values = ±2.807 (t distribution, 23 degrees of freedom)
Reject H0. There is sufficient evidence to warrant rejection of the claim that the mean
number of candies in each bag equals 55.
This hypothesis test tells us that, at the 0.01 significance level, the claim that
there is a mean of 55 Skittles per bag is not supported by the sample data.
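The test statistic for the claim about the mean can be checked with a few lines of Python. This is a minimal sketch, not part of the original assignment. The critical value 2.807 is an assumption taken from a standard t table (two-tailed, 0.01 significance level, 23 degrees of freedom) and is entered by hand, since the standard library has no t distribution.

```python
import math

n = 24           # number of bags in the class sample
x_bar = 59.458333  # sample mean number of candies per bag
s = 1.977683464  # sample standard deviation
mu0 = 55         # claimed population mean

# Assumed critical value: t table, two-tailed, alpha = 0.01, df = 23.
t_crit = 2.807

# Test statistic for a claim about a mean with sigma unknown.
t = (x_bar - mu0) / (s / math.sqrt(n))

# Reject H0 when the statistic falls beyond either critical value.
reject_h0 = abs(t) > t_crit

print(f"t = {t:.2f}, reject H0: {reject_h0}")
```

The statistic evaluates to about 11.04, far beyond the critical value, so H0 is rejected.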
