
John Obiala

Skittles Project

Part 2: Sampling

The StatCrunch row numbers associated with the three bags of skittles that our group selected
were 22, 31 and 8. To select our bags, we used the random number generator on a TI-83
calculator. We used the first two digits after the decimal point to determine which bag we would
choose. When we got a number that was higher than the number of rows, we disregarded it and kept
generating numbers until we had three that corresponded to rows in the class data. The
sampling method used to obtain our group's sample was cluster sampling, because each bag of skittles
was randomly chosen and every skittle from the chosen bags was included in the sample. The sample
totals for our three bags were: 43 red, 38 orange, 33 yellow, 34 green and 30 purple, for 178 candies in total.
The possible errors include non-sampling errors, such as someone miscounting the
number of skittles in their bag. We could minimize the chance of this type of error by
having each person count their skittles twice. Also, generally speaking, a larger sample is more
representative of a population. If we were to increase the sample size from 3 bags to 5, it would likely be a
better representation of the entire population of skittles.
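The selection procedure described above can be sketched in Python. This is only an illustration, not the TI-83 program we actually used: the function name `pick_rows` is made up for the example, the row count of 31 is taken from the n = 31 reported for the class data, and discarding 00 and repeated rows is an assumption about how such draws would be handled.

```python
import random

def pick_rows(n_rows, k, rng):
    """Pick k distinct row numbers in 1..n_rows from two-digit random draws."""
    chosen = []
    while len(chosen) < k:
        # Mimic reading the first two digits after the decimal point.
        two_digits = int(rng.random() * 100)  # a value from 0 to 99
        # Assumption: discard values with no matching row, and repeats.
        if 1 <= two_digits <= n_rows and two_digits not in chosen:
            chosen.append(two_digits)
    return chosen

# The seed only makes the illustration repeatable.
rows = pick_rows(n_rows=31, k=3, rng=random.Random(2024))
print(rows)
```

Because whole bags are drawn and every candy in a chosen bag is counted, each draw selects a cluster rather than an individual candy.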

Part 3: Organizing and displaying data


Candy color is categorical data because it consists of names or labels only, not numbers
representing counts or measurements. The proportions from my bag are lopsided
and further away from the class values, as some colors account for a much higher percentage of skittles
than others. The proportions for my bag are as follows: .1579 Red, .1754 Orange, .2632 Yellow, .1404
Green and .2632 Purple. When we compare our group sample of 3 bags, we can see that the numbers
are slightly more evenly distributed. By the same token, the sample from the three bags has
proportions of colors that are closer to those of the total population of skittles. Proportions from the 3 bag
sample are .2416 Red, .2135 Orange, .1854 Yellow, .1910 Green and .1685 Purple. This sample is still
slightly lopsided, but less so than my single bag. When looking at the total population of skittles, we can
see that the proportions are more evenly distributed with smaller variations from color to color. The
population proportions are .1843 Red, .1934 Orange, .2073 Yellow, .2051 Green and .2100 Purple.
Considering the data, we can conclude that as the sample size increases, the sample tends to become a
better representation of the overall population. The 3 bag sample proportions are closer to those of the
population than the proportions from my single bag. If we had a sample of 5 bags of skittles, it would
likely be an even better representation of the population.
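The proportions quoted above can be reproduced from the color counts with a short Python sketch. The counts come from the table in this section; the helper names `proportions` and `total_gap` are made up for the example, and summing the absolute differences is just one simple way to measure how far a sample is from the class values.

```python
COLORS = ["Red", "Orange", "Yellow", "Green", "Purple"]

# Color counts in the order Red, Orange, Yellow, Green, Purple.
counts = {
    "my bag": [9, 10, 15, 8, 15],
    "three-bag sample": [43, 38, 33, 34, 30],
    "entire class": [344, 361, 387, 383, 392],
}

def proportions(color_counts):
    """Return each color's share of the total, rounded to 4 places."""
    total = sum(color_counts)
    return [round(c / total, 4) for c in color_counts]

def total_gap(a, b):
    """Sum of absolute differences between two lists of proportions."""
    return sum(abs(x - y) for x, y in zip(a, b))

props = {name: proportions(c) for name, c in counts.items()}
for name, p in props.items():
    print(name, dict(zip(COLORS, p)))

gap_my_bag = total_gap(props["my bag"], props["entire class"])
gap_three_bags = total_gap(props["three-bag sample"], props["entire class"])
print(gap_my_bag > gap_three_bags)  # True if the 3 bag sample is closer
```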

Number of candies of each color

                                             Red  Orange  Yellow  Green  Purple  Total
My bag of skittles                             9      10      15      8      15     57
Sample totals (three bags from our group)     43      38      33     34      30    178
ALL bags collected by the entire class       344     361     387    383     392   1867

Individual bags in our group sample (some color counts are missing from this copy, so the
surviving counts are listed in their original order without color labels):
Bag 1: Row #22: 14, 16, 11, 12; total 59
Bag 2: Row #31: 17, 16, 14; total 59
Bag 3: Row #8: 12, 23, 15; total 60

B. The number of candies per bag is quantitative data, because it consists of numbers
that represent counts or measurements. Both the boxplot and the frequency distribution show that the
distribution is skewed left. However, when I set the boxplot to use fences for outlier values,
the shape of the boxplot changed from skewed left to more symmetrical.
The graphs do reflect what I expected to see. Each 2.17 oz bag of skittles should have a similar
number of total candies. My bag of skittles had 57 candies, which falls below the first quartile
(Q1 = 59) of the five-number summary. After looking at our class data sheet, I saw that all but two
values (49 and 53) fell in close proximity to each other.

n = 31
Mean (X Bar): 60.2
Standard deviation (Sx): 3.13
5 Number summary
Min = 49
Q1 = 59
Med = 61
Q3 = 62
Max = 64
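The effect of the outlier fences on the boxplot can be checked by hand. The sketch below assumes the usual 1.5 × IQR rule for fences (the rule StatCrunch applied is not stated in this report) and uses the quartiles from the five-number summary above.

```python
q1, q3 = 59, 62          # quartiles from the five-number summary
iqr = q3 - q1            # interquartile range

# Assumed rule: fences sit 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The two low values noted on the class data sheet, plus my own bag.
candidates = [49, 53, 57]
outliers = [v for v in candidates if v < lower_fence or v > upper_fence]

print(lower_fence, upper_fence)  # 54.5 66.5
print(outliers)                  # [49, 53]
```

Under this rule, 49 and 53 are drawn as separate outlier points, which would explain why the rest of the boxplot looks more symmetrical once fences are turned on. My bag of 57 candies is low but not an outlier.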

Part 4

Part A.

A confidence interval is a range (or an interval) of values used to estimate the true value of a
population parameter.
True proportion of yellow candies using the class data as my sample:
0.1831 < p < 0.2315 (See the scanned attachment for the work on this problem.)
I am 99% confident that the true proportion of yellow candies is between 18.31% and 23.15%.
The single bag of skittles that I purchased had a proportion of yellow candies that was .2632.
This proportion falls outside of the confidence interval that I calculated. Therefore, it would be
considered to be an unusual value.
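As a check on this interval, here is a minimal Python sketch using the normal-approximation formula for a proportion, p-hat plus or minus z times the square root of p-hat times q-hat over n. The scanned work is not reproduced here, so the function name `proportion_ci` and the exact steps are assumptions; small differences in the last digit can come from rounding.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 387 yellow candies out of 1867 in the class data.
low, high = proportion_ci(387, 1867, 0.99)
print(round(low, 4), round(high, 4))

# Proportion of yellow in my single bag: 15 out of 57.
my_bag = 15 / 57
print(low <= my_bag <= high)  # False: my bag lies above the interval
```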
True mean number of candies per bag with a 95% confidence interval
59.08 < μ < 61.38
I am 95% confident that the true mean number of skittles per bag is between 59.08 and 61.38
candies. Based on my interval for the true mean number of candies per bag, the single bag of
candy I purchased was unusual. My bag contained 57 candies and did not fall within the
confidence interval which I calculated. However, since my bag count was actually an individual
value, not a mean, it makes sense that many such individual values would not be contained in an
interval for the mean.
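The interval for the mean can be approximated from the summary statistics with the t formula, x-bar plus or minus t times s over the square root of n. In this sketch the critical value 2.042 (95% confidence, 30 degrees of freedom) is taken from a standard t table, which is an assumption about the value used in the attached work. Because the sketch starts from the rounded values 60.2 and 3.13, its endpoints differ from the reported interval by a few hundredths.

```python
from math import sqrt

n, mean, sd = 31, 60.2, 3.13   # rounded summary statistics from Part 3
t_crit = 2.042                 # assumed table value: 95%, df = n - 1 = 30

margin = t_crit * sd / sqrt(n)
low, high = mean - margin, mean + margin
print(round(low, 2), round(high, 2))

# My bag held 57 candies, which is below the lower limit.
print(57 < low)  # True
```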
Standard deviation of the number of candies per bag with a 98% confidence interval estimate
2.40 < σ < 4.43
I am 98% confident that the standard deviation of the number of candies per bag is between
2.40 and 4.43.
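The interval for the standard deviation can be checked with the chi-square formula, where each limit is the square root of (n - 1) times s squared divided by a chi-square critical value. The critical values 50.892 and 14.953 (98% confidence, 30 degrees of freedom) are taken from a standard chi-square table and are an assumption about what the attached work used.

```python
from math import sqrt

n, sd = 31, 3.13
df = n - 1

# Assumed table values for 98% confidence with df = 30.
chi2_right = 50.892   # area 0.01 to the right
chi2_left = 14.953    # area 0.99 to the right

low = sqrt(df * sd**2 / chi2_right)
high = sqrt(df * sd**2 / chi2_left)
print(round(low, 2), round(high, 2))  # 2.4 4.43
```

The larger critical value gives the lower limit, because it sits in the denominator.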
Conditions for doing each of the 3 interval estimates:
Population Proportion p
1. It is a simple random sample; there is a fixed number of trials, the trials are independent,
there are two categories of outcomes and the probabilities remain constant for each trial.
The number of trials was 1867.
2. There are at least 5 successes and 5 failures.
Not all of these requirements were met. The sample was not a simple random sample. The
method for choosing the skittles was closer to a cluster sample, and the bags of skittles were
purchased as a matter of convenience. Because we looked at less than
5% of the population, the trials can be treated as independent. There were at least 5
successes and 5 failures: 387 yellow skittles, which were considered to be
successes, and 1480 failures (skittles which were not yellow).
Population mean
1. The sample is a simple random sample.
2. Either or both of these conditions are satisfied: the population is normally distributed or
n > 30.
Not all of these requirements were met, because the sample was not a simple random sample.
The population is not normally distributed, but n = 31, so the second requirement was met.
Population standard deviation
1. It is a simple random sample.
2. The population must have normally distributed values, even if it is a large sample.
These requirements were not met. The sample was not a simple random sample. The
population is not normally distributed. For a normal distribution, the frequencies start low,
increase to one or two high frequencies, and then decrease again symmetrically. In
this case, our distribution is skewed left.
Part B. Hypothesis tests

In statistics, a hypothesis is a claim or a statement about a property of a population.


A hypothesis test is a procedure for testing a claim about a property of a population.

Use a 0.05 significance level to test the claim that 20% of all skittles candies are red. (See
attached work)

Z = -1.70

P-value = 0.08893

Since the P-value is greater than the significance level of 0.05, there is insufficient evidence to
warrant rejection of the claim that 20% of all skittles are red.
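The test statistic and P-value can be reproduced with a short Python sketch of a two-tailed one-proportion z test. It uses the 344 red candies out of 1867 from the class data; the function name `one_proportion_z_test` is made up for the example, and the attached work may lay out the steps differently.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-tailed z test of the claim that the true proportion is p0."""
    p_hat = successes / n
    # The standard error uses the claimed proportion p0, not p_hat.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

z, p_value = one_proportion_z_test(344, 1867, 0.20)
print(round(z, 2), round(p_value, 4))

alpha = 0.05
print("reject" if p_value < alpha else "fail to reject")  # fail to reject
```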

Use a 0.01 significance level to test the claim that the mean number of candies in a bag of
skittles is 55.
Critical values: t = ±2.750 (see attached work)
There is sufficient evidence to reject the claim that the mean number of skittles in a bag is 55.
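The decision above can be illustrated by comparing a t test statistic with the critical value. The test statistic is not stated in this section, so the sketch computes one from the rounded summary statistics (n = 31, mean 60.2, standard deviation 3.13); treat the result as approximate rather than as the figure in the attached work.

```python
from math import sqrt

n, mean, sd = 31, 60.2, 3.13   # rounded summary statistics from Part 3
claimed_mean = 55
t_crit = 2.750                 # two-tailed, alpha = 0.01, df = 30

# t = (x-bar - mu0) / (s / sqrt(n))
t_stat = (mean - claimed_mean) / (sd / sqrt(n))
print(round(t_stat, 2))

# Reject the claim when the statistic falls beyond either critical value.
print(abs(t_stat) > t_crit)  # True
```

The statistic lies far beyond the critical value, which is consistent with rejecting the claim that the mean is 55.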

Conditions for hypothesis tests

Testing a claim about a population proportion

1. The sample observation is a simple random sample


2. The conditions for a binomial distribution are satisfied. (There is a fixed number of
independent trials having constant probabilities, and each trial has two outcome categories
of success and failure.)
3. The conditions np is greater than or equal to 5 and nq is greater than or equal to 5 are both
satisfied, so the binomial distribution of sample proportions can be approximated by a
normal distribution with mu equals np and sigma equals the square root of npq.
Not all these conditions were met. It is not a simple random sample. The conditions for a
binomial distribution are satisfied, as there is a fixed number of independent trials and each
trial has two outcome categories of success and failure. Both np (.20 X 1867 = 373.4) and
nq (.80 X 1867 = 1493.6) are greater than or equal to 5.
Testing claims about a population mean
1. The sample is a simple random sample.
2. Either or both of these conditions are satisfied: the population is normally distributed or n > 30.
The first requirement was not met, as this was not a simple random sample. The second
requirement was met because n was greater than 30.

Part 5: Reflection

In our Math 1040 class with Tiffany Hilton, the students took part in a skittles project. This
consisted of each student purchasing a 2.17 oz. bag of skittles. Then we counted how many skittles of
each color there were in our bags. In class, we were divided into small groups and required to pick three
bags of skittles from the class sample. The bags had to be chosen randomly. We used a random number
generator on a TI-83 calculator and picked rows that corresponded to the first two decimals that were
generated on the calculator. Then, using StatCrunch, we made pie charts, Pareto charts and box plots as
well as a histogram to display the data. We made graphs for our individual bags of skittles, the sample of
three bags chosen by our group and the sample of the entire class. The graphs displayed the proportions
of the different colors of skittles and the total number of skittles per bag. Next we constructed
confidence intervals for the proportion of yellow skittles, the true mean number of skittles and for the
standard deviation. The goal of this project was to help us gain a better understanding of the concepts
presented in this class. There were 4 parts to this project and each of them corresponded to chapters
that we were currently learning in class. Another goal of this assignment was to familiarize us with how
to use spreadsheet software. Using a program called StatCrunch, we learned how to construct different
types of graphs to display data throughout the project.
As a result of this project, I have learned how to do a lot of things that I previously could not
have done. I know about the different types of sampling methods that are used and the characteristics
of each one. For example, this project used a cluster sampling method. We picked bags of skittles using
a random number generator and then sampled all of the skittles associated with those particular bags.
Prior to taking this class, I did not know how to use spreadsheet software to display data. After
completing this project, I would be confident about my ability to properly construct several different
types of graphs to display data in a way that is meaningful and understandable. Also, I now know how to
construct confidence intervals for the true proportion, mean and standard deviation. I know the
formulas needed to perform each calculation and how to consult z, t and chi-square tables to
find the critical values for a confidence interval and to decide whether or not to reject a null
hypothesis. I also know how to use the calculator functions on my TI-83 to construct the intervals and
find the associated values.
Although I am not sure if I will need to take another math class to obtain my degree, some of
the information learned during this course will no doubt be useful in my future classes. More than
anything, knowing how to use spreadsheet software to display data and construct graphs will be
advantageous for displaying information in a way that is easily interpretable for class projects. Displaying
data in this fashion can also make a project or a paper more persuasive and help the audience see things
from your perspective. Since I am pursuing a degree in nursing, being familiar with statistics and
knowing how to apply these concepts within the field will be vitally important to delivering high quality
care and practicing medicine in a way that is evidence based and known to be effective. By the time I
have obtained a master's or doctoral degree in nursing and established an advanced practice, statistics
will take on an even greater role within the field. Because I will be prescribing medications to patients, I
will want to know what drugs are proven to work best for treating specific conditions. It is likely that I
will be using statistical studies (or consulting with practitioners who have) as a resource to figure out
what medications and other treatments will work most efficiently to treat the ailments of my patients.