
MATH 1040 Skittles Project

December 9, 2015

Introduction

This Skittles project uses a sample collected by college students in Introduction to Statistics, Math 1040-007, at Salt Lake Community College. Each student purchased one 2.17-ounce bag of Skittles, at different locations and at different times. This is not a simple random sample, because every student in the class took part rather than being chosen at random. The population would be all of the students who purchase a 2.17-ounce bag of Skittles.

In this project, we compiled the colors and counts of the Skittles in each 2.17-ounce bag that each student bought. There are 25 respondents, or students, in this class. The total number of Skittles came out to 1496. Color is a categorical variable, and the number of Skittles of each color is a quantitative count. The full data set consisted of Yellow (276), Purple (294), Red (282), Green (305) and Orange (339).

We needed to create a pie chart, Pareto chart, histogram, and boxplot. We calculated the

mean, standard deviation, five number summaries, and confidence intervals. Here are the data

and conclusions we compiled from all of the available data.

Our guess was that the proportion of each color within the whole sample would be very different, considering that there were five different colors of Skittles. When we examined the data, we found that the proportions averaged out to be mostly even. Each bag contains a different combination of counts for each color, which causes the proportions to differ from bag to bag.
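As a rough check on the claim that the class proportions are mostly even, the short sketch below recomputes each color's proportion from the class counts given in the introduction. The variable names are illustrative only, and rounding to three decimals is an assumption made to match the tables in this report.

```python
# Class color counts as reported in the introduction.
counts = {"Yellow": 276, "Purple": 294, "Red": 282, "Green": 305, "Orange": 339}

# Total number of candies in the class sample.
total = sum(counts.values())

# Proportion of each color, rounded to three decimals (assumed table precision).
proportions = {color: round(n / total, 3) for color, n in counts.items()}

for color, p in proportions.items():
    print(f"{color:<7} {counts[color]:>5} {p:.3f}")
print(f"{'Total':<7} {total:>5}")
```

With five colors, perfectly even proportions would each be 0.200, so the computed values can be compared directly against that benchmark.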

Group Data

Color     Frequency   Proportion
Yellow           47        0.193
Purple           47        0.193
Red              48        0.198
Green            55        0.226
Orange           46        0.189
Total           243        1

Class Data

Color     Frequency   Proportion
Yellow          276        0.184
Purple          294        0.197
Red             282        0.189
Green           305        0.204
Orange          339        0.227
Total          1496        1

Individual Data

Color     Frequency   Proportion
Yellow           14        0.226
Purple           11        0.177
Red              10        0.161
Green            13        0.210
Orange           14        0.226
Total            62        1

Observation of Individual Data, Group Data and Class Data : Thitirat Pongprajuc

What I saw was that my data was somewhat similar to the whole class sample. The color I had the most of was orange, which was also the highest for the overall class sample. In my data, orange and yellow had the same count, which was not the case for the class data. Also, red was my lowest count, not yellow, which was the lowest count for the class. However, the most common color in my group data was green, followed by red, and the least common color was orange. The data reflected what I expected to see: some of my data would be similar to the whole class data, but not exactly the same. I did not expect the group data to be so different from both my data and the class data.

Individual Data

Color     Frequency   Proportion
Yellow            6        0.098
Purple           10        0.164
Red              18        0.295
Green            16        0.262
Orange           11        0.180
Total            61        1

Observation of Individual Data, Group Data and Class Data : Shainalyn Howell

The data gathered from the class is represented above by a pie chart and a Pareto chart. As you can see from these graphs, orange had the most and yellow had the least. This is slightly different from my individual charts, because my bag had more red and orange was in the middle, but yellow was also the least common color in my bag (individual data represented in the graphs below). Having never put much thought into how many candies of each color are in my Skittles bag before this project, I would have just assumed that the colors were distributed evenly throughout the bag.

Individual Data

Color     Frequency   Proportion
Yellow           13        0.213
Purple           13        0.213
Red              10        0.164
Green            12        0.197
Orange           13        0.213
Total            61        1

Observation of Individual Data, Group Data and Class Data : Ashlie Hashimoto

My data didn't seem to be that similar to the class or my group data. Comparing my data to the group, I had fewer red and green Skittles and more yellow, purple, and orange. Comparing my data to the class, I had more yellow and purple and fewer red and green, but about the same proportion of orange Skittles. The data didn't seem to reflect what I thought I would see. I thought that I would have more in common with the other data from the group and class than I did.

Individual Data

Color     Frequency   Proportion
Yellow           14        0.237
Purple           13        0.220
Red              10        0.169
Green            14        0.237
Orange            8        0.136
Total            59        1

Observation of Individual Data, Group Data and Class Data : Veronica Hollestelle

While the group data seemed to have roughly equal amounts of each color, the individual bags did not. This suggests that the more samples you use, the more consistent and even the counts become. There was even more difference between my data and the class data. Orange was the highest color for the class (.227), but it was the lowest for both the group (.189) and my bag (.136). Yellow was the lowest color for the class (.184) and was .193 for the group, while in my bag it was tied for the highest (.237). The highest group proportion was green at .226; my green was .237, and the class also showed a high proportion of green at .204. I had my highest count for yellow, and while the class had more orange candies than any other color, orange was my lowest count. My bag had one of the lowest counts of candies per bag, and its colors varied a lot from almost everyone else's.

Organizing and Displaying Quantitative Data: the Number of Candies per Bag

Group Data

Name                   Yellow  Purple  Red  Green  Orange  Total
Hashimoto Ashlie           13      13   10     12      13     61
Hollestelle Veronica       14      13   10     14       8     59
Howell Shainalyn            6      10   18     16      11     61
Pongprajuc Thitirat        14      11   10     13      14     62
Total                      47      47   48     55      46    243

Summary statistics:

Column   n   Mean   Std. dev.   Min   Q1   Median   Q3     Max
Total    4   60.8   1.26        59    60   61       61.5   62

IQR = Q3 - Q1
= 61.5 - 60
= 1.5

Lower fence = Q1 - 1.5(IQR)
= 60 - 1.5(1.5)
= 60 - 2.25
= 57.75

Upper fence = Q3 + 1.5(IQR)
= 61.5 + 1.5(1.5)
= 61.5 + 2.25
= 63.75

All four group totals (59, 61, 61, 62) fall between the fences, so there is no outlier for the group data.

Class Data

Name                     Yellow  Purple  Red  Green  Orange  Total
Alaguretnam Nitharshan        6      13   13     14      16     62
Becker Jenna                 10      15    8     13      13     59
Bekavac Morena               11      11   15      6      17     60
Dunn Devin                   10      11   10     13       9     53
Ebert Diedre                 20      12    8      9       6     55
Hashimoto Ashlie             13      13   10     12      13     61
Hills Seung                  13       9   12     13      12     59
Hollestelle Veronica         14      13   10     14       8     59
Howell Shainalyn              6      10   18     16      11     61
Jackson Amanda                9      10   15     11      17     62
Jameson Samantha             12      13   13     11      10     59
Juback Haley                 13      12    6     12      15     58
Karaiskos Kalliopi           16      12   12     12      16     68
Pongprajuc Thitirat          14      11   10     13      14     62
Rojas Kasy                   10      10   10     14      16     60
Schofield Victoria            4      16   13     11      17     61
Schott Kristina              11       7   12     12      18     60
Seike Nai                    17      12    6     11      14     60
Shimizu Shelsea               4      13   10      7      22     56
Smith Richard                13      18    8     13      10     62
Sorto Jennifer                8      11   14     10      17     60
Sorto Nicole                 13       9    8     15      11     56
Taylor Kelcee                 6      11   15     17      11     60
Terrell Elizabeth            11      10   18     13      10     62
Wright Heather               12      12    8     13      16     61
Total                       276     294  282    305     339   1496

Summary statistics:

Column   n    Mean   Std. dev.   Min   Q1   Median   Q3   Max
Total    25   59.8   2.90        53    59   60       61   68

IQR = Q3 - Q1
= 61 - 59
= 2

Lower fence = Q1 - 1.5(IQR)
= 59 - 1.5(2)
= 59 - 3
= 56

Upper fence = Q3 + 1.5(IQR)
= 61 + 1.5(2)
= 61 + 3
= 64

The class data's outliers are 53, 55 and 68.
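The fence calculation for the class data can be reproduced with a short sketch. The quartile rule used here (the median of each half, with the overall median included in both halves when n is odd) is an assumption chosen because it matches the Q1 and Q3 reported in the summary statistics; other software may use a different convention and give slightly different quartiles. The function name is illustrative.

```python
from statistics import median

# Candies per bag for the 25 class bags, in the order listed in the class table.
totals = [62, 59, 60, 53, 55, 61, 59, 59, 61, 62, 59, 58, 68,
          62, 60, 61, 60, 60, 56, 62, 60, 56, 60, 62, 61]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumption: when n is odd, the overall median belongs to both halves.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # size of each half; the halves share the median if n is odd
    lower, upper = data[:half], data[n - half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


minimum, q1, med, q3, maximum = five_number_summary(totals)

# Fences: any value beyond 1.5 * IQR from the quartiles counts as an outlier.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in sorted(totals) if x < lower_fence or x > upper_fence]

print("Five-number summary:", minimum, q1, med, q3, maximum)
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```

Note that a bag with exactly 56 candies sits on the lower fence and is therefore not flagged.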

For the total number of candies in each bag, the overall class data appeared approximately normally distributed in the boxplot. The graphs reflected what I expected: data sets of different sizes show some differences. The class total was 1496 candies in 25 bags, giving a mean of 59.8 and a standard deviation of 2.90. The minimum was 53, Q1 was 59, the median was 60, Q3 was 61, and the maximum was 68. The boxplot of our group data was different from that of the whole class because its distribution was skewed left. The group total was 243 candies in 4 bags. The group mean was 60.8, and the standard deviation was 1.26. The minimum was 59, Q1 was 60, the median was 61, Q3 was 61.5, and the maximum was 62. Moreover, there were 3 outliers in the class data, but there was no outlier in our group data.

Quantitative and categorical data are very different from each other. Quantitative data portrays a value or amount. It is always a number. The best graphs for quantitative data are the boxplot, histogram, and stem-and-leaf plot. These are used for quantitative data because they plot numbers, not categories. Pie charts and bar graphs are not ideal for quantitative data because they show the share of each category rather than the spread of numeric values. Categorical data is used when we discuss a category like gender. It is based on a characteristic, not on a number value. The best graphs for categorical data are pie charts and bar graphs. These graphs are ideal for categories because they plot the frequency of each category. A bad way of portraying categorical data would be to use a stem-and-leaf plot, boxplot, or histogram, because these graphs require numerical values and cannot display categories.

Reflection : Shainalyn Howell

The categorical data in this project is represented by the colors of the candies, because categorical data consists of names and labels rather than numerical measurements. The quantitative data is represented by the values or measurements, which are the totals or amounts of each color of candy. The five-number summary wouldn't work if I tried to explain the data using just the colors. You need both the categorical and quantitative information together for most of the data to be interpreted correctly. The pie charts are a good representation of categorical data and quantitative data working together, but the histogram and five-number summary are more representative of the quantitative data only.

Reflection : Ashlie Hashimoto

Quantitative and categorical data are different from one another. Quantitative data serves a numerical purpose and is always a number, for example how old you are, how many Skittles are in a bag, or how many eggs are in a carton. Categorical, or qualitative, data is best used for things that can be categorized and are not numeric, for example types of cars, the color of the sky, or how soft a cat might be. When graphing quantitative data, it works best to use a boxplot, histogram, or stem-and-leaf plot. I would avoid using a pie chart or bar graph because they don't work well with numerical data, but they are great for qualitative data.

Reflection : Veronica Hollestelle

Pie charts and bar graphs are not as good for quantitative data. They are better for categorical data such as the colors of Skittles, because they give you a visual of the categories and how they are proportioned. Quantitative data is an amount, a count of something, and the best graphs for it would be something like a stem-and-leaf plot, histogram, or boxplot, which break things down by numerical count. In this case I believe the pie chart was good at representing both types of data, quantitative and categorical. In this Skittles project, the color and the count of each color use both quantitative and categorical data.

The purpose of a confidence interval is to estimate the true value of a population parameter, such as a population proportion, by using a sample statistic. We use a confidence interval rather than a single value because an interval shows a range of plausible values and how precise the estimate is.

Construct a 95% confidence interval estimate for the true proportion of purple candies.

95% confidence interval results:
p : Proportion of successes
Method: Standard-Wald

Proportion   Count   Total   Sample Prop.   Std. Err.     L. Limit     U. Limit
p            294     1496    0.19652406     0.010273739   0.17638791   0.21666022

Based on the StatCrunch result for our data, we are 95% confident that the interval from 0.1764 to 0.2167 actually does contain the true value of the population proportion of purple candies. This means that if we were to select many different samples of size 1496 and construct the corresponding confidence intervals, about 95% of them would actually contain the value of the population proportion of purple candies.
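For readers who want to see where the limits come from, here is a minimal sketch of a standard Wald interval for a proportion, using only the Python standard library. It assumes the same count and total shown in the StatCrunch output (294 purple out of 1496); the function name is illustrative and is not part of StatCrunch.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, confidence):
    """Return (lower, upper) limits of a Wald confidence interval for a proportion."""
    p_hat = successes / n                       # sample proportion
    std_err = sqrt(p_hat * (1 - p_hat) / n)     # estimated standard error
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * std_err                        # margin of error
    return p_hat - margin, p_hat + margin


# Purple candies in the class data: 294 out of 1496.
lower, upper = wald_interval(294, 1496, 0.95)
print(f"95% CI for the proportion of purple candies: ({lower:.4f}, {upper:.4f})")
```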

Construct a 99% confidence interval estimate for the true mean number of candies per bag.

99% confidence interval results:
μ : Mean of variable

Variable                    Sample Mean   Std. Err.    DF   L. Limit    U. Limit
N of Candies for each bag   59.84         0.57930993   24   58.219705   61.460295

Based on the calculations from our data, we are 99% confident that the interval from 58.22 to 61.46 actually does contain the true value of μ. This means that if we selected many different samples of the same size and constructed the corresponding confidence intervals, in the long run about 99% of them would actually contain the value of μ.
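The interval for the mean can be sketched by hand as follows. The sample mean (59.84), sample variance (8.39), and sample size (25) come from the StatCrunch output in this report. The critical value 2.797 is an assumption taken from a standard t table (24 degrees of freedom, 0.005 in each tail), so the result agrees with StatCrunch only to a few decimal places.

```python
from math import sqrt

n = 25                 # number of bags
sample_mean = 59.84    # mean candies per bag
sample_var = 8.39      # sample variance of candies per bag

# Assumption: critical value read from a t table, df = 24, 99% confidence.
t_crit = 2.797

std_err = sqrt(sample_var) / sqrt(n)   # standard error of the mean
margin = t_crit * std_err              # margin of error
lower, upper = sample_mean - margin, sample_mean + margin

print(f"Std. Err. = {std_err:.8f}")
print(f"99% CI for the mean: ({lower:.4f}, {upper:.4f})")
```

A t distribution is used rather than a normal distribution because the population standard deviation is not known and is estimated from the sample.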

Construct a 98% confidence interval estimate for the standard deviation of the number of candies per bag.

98% confidence interval results:
σ² : Variance of variable

Variable                    Sample Var.   DF   L. Limit    U. Limit
N of Candies for each bag   8.39          24   4.6849894   18.547651

Find σ by taking the square root of each variance limit:

√4.6849894 < σ < √18.547651
2.1645 < σ < 4.3067

Based on this result, we have 98% confidence that the limits of 2.1645 and 4.3067 contain the true value of σ.
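As a hedged illustration of how the variance limits arise, the sketch below applies the chi-square interval formula, (n - 1)s² / χ², to the sample variance 8.39 with 24 degrees of freedom. The two critical values (42.980 and 10.856) are assumptions read from a standard chi-square table for 98% confidence, so the output matches the StatCrunch variance limits only approximately.

```python
from math import sqrt

n = 25               # number of bags
sample_var = 8.39    # sample variance of candies per bag
df = n - 1           # degrees of freedom

# Assumption: chi-square table values for df = 24 with 0.01 in each tail.
chi2_right = 42.980  # area 0.01 to the right
chi2_left = 10.856   # area 0.99 to the right

# Confidence limits for the variance.
var_lower = df * sample_var / chi2_right
var_upper = df * sample_var / chi2_left

# Confidence limits for the standard deviation are the square roots.
sd_lower, sd_upper = sqrt(var_lower), sqrt(var_upper)

print(f"98% CI for the variance: ({var_lower:.4f}, {var_upper:.4f})")
print(f"98% CI for the standard deviation: ({sd_lower:.4f}, {sd_upper:.4f})")
```

The larger critical value produces the lower limit because it appears in the denominator.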

Hypothesis Tests

A hypothesis test is a procedure for testing a claim about the value of a population proportion, a population mean, or a population standard deviation. The purpose of a hypothesis test is to reach a conclusion about whether the sample data are consistent with the claim.

Use a 0.01 significance level to test the claim that 20% of all Skittles candies are green.

n = 1496
x = 305
p = 0.2
q = 0.8
p̂ = 0.2039
α = 0.01

Step 1: The original claim is that 20% of all Skittles candies are green: p = 0.20.

Step 2: The opposite of the original claim is p ≠ 0.20.

Step 3: The null hypothesis is p = 0.20 and the alternative hypothesis is p ≠ 0.20.

H0: p = 0.20
H1: p ≠ 0.20

Step 4: The significance level is α = 0.01.

Step 5: Because the claim is about a population proportion p, the relevant sample statistic is the sample proportion p̂, whose sampling distribution is approximately normal.

Step 6: The test statistic is z = 0.37, and the p-value is 0.7077, which is greater than 0.01.

Hypothesis test results:
p : Proportion of successes
H0 : p = 0.2
HA : p ≠ 0.2

Proportion   Count   Total   Sample Prop.   Std. Err.     Z-Stat       P-value
p            305     1496    0.20387701     0.010341754   0.37488858   0.7077

Step 7: Because the p-value is greater than the significance level of α = 0.01, we fail to reject the null hypothesis.

p-value > α, or 0.7077 > 0.01, so we fail to reject H0.

Step 8: Because we failed to reject the null hypothesis, there is not sufficient evidence to warrant rejection of the claim that 20% of all Skittles candies are green.
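The test of the green proportion can be sketched as a two-tailed one-proportion z test with the Python standard library. The inputs (305 green out of 1496, claimed proportion 0.20, significance level 0.01) come from this report; the function name is illustrative. Note that the standard error uses the claimed proportion, as is usual for a hypothesis test, rather than the sample proportion.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Return (z, two-tailed p-value) for H0: p = p0 against HA: p != p0."""
    p_hat = successes / n                   # sample proportion
    std_err = sqrt(p0 * (1 - p0) / n)       # standard error under H0
    z = (p_hat - p0) / std_err              # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


alpha = 0.01
z, p_value = one_proportion_z_test(305, 1496, 0.20)
reject_null = p_value < alpha

print(f"z = {z:.4f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```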

Use a 0.05 significance level to test the claim that the mean number of candies in a bag of Skittles is 56.

n = 25
x̄ = 59.8
μ = 56
α = 0.05
α/2 = 0.025

Step 1: The original claim is that the mean number of candies in a bag of Skittles is 56: μ = 56.

Step 2: The alternative to the original claim is that μ does not equal 56.

Step 3: The null hypothesis is μ = 56 and the alternative hypothesis is μ ≠ 56.

H0: μ = 56
H1: μ ≠ 56

Step 4: The significance level is α = 0.05.

Step 5: Because the claim is about a population mean μ and σ is not known, the relevant sample statistic is the sample mean x̄, and we use a Student t distribution.

Step 6: The test statistic t = 6.6286 is calculated.

Hypothesis test results:
μ : Mean of variable
H0 : μ = 56
HA : μ ≠ 56

Variable                    Sample Mean   Std. Err.    DF   T-Stat      P-value
N of Candies for each bag   59.84         0.57930993   24   6.6285761   <0.0001

Step 7: Because the p-value is less than the significance level of α = 0.05, we reject the null hypothesis.

p-value < α, or (< 0.0001) < 0.05, so we reject H0.

Step 8: There is sufficient evidence to warrant rejection of the claim that the mean number of candies in a bag of Skittles is 56.
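A minimal sketch of the t test for the mean follows, using the summary values reported by StatCrunch (n = 25, sample mean 59.84, sample variance 8.39). Instead of computing an exact p-value, it compares the test statistic with a critical value; the value 2.064 is an assumption read from a standard t table (24 degrees of freedom, 0.025 in each tail).

```python
from math import sqrt

n = 25                 # number of bags
sample_mean = 59.84    # mean candies per bag
sample_var = 8.39      # sample variance of candies per bag
mu0 = 56               # claimed population mean

# Assumption: two-tailed critical value from a t table, df = 24, alpha = 0.05.
t_crit = 2.064

std_err = sqrt(sample_var / n)            # standard error of the mean
t_stat = (sample_mean - mu0) / std_err    # test statistic
reject_null = abs(t_stat) > t_crit        # critical-value decision rule

print(f"t = {t_stat:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

The critical-value method and the p-value method always lead to the same decision at a given significance level.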

There are three conditions for a confidence interval for estimating a population proportion p:

1) The sample is a simple random sample.

2) The conditions for the binomial distribution are satisfied: a fixed number of trials, independent trials, two categories of outcomes, and a constant probability of success.

3) There are at least 5 successes and at least 5 failures.

We treated our sample as a simple random sample. Each candy is either purple or not purple, and with 294 purple candies and 1202 other candies out of 1496, there are far more than 5 successes and 5 failures. We can therefore calculate a confidence interval for estimating a population proportion p.

Conditions for Confidence Interval for Estimating a Population Mean with σ not known

1) The sample is a simple random sample.

2) Either or both of these conditions are satisfied: the population is normally distributed or n > 30.

We treated our sample as randomly selected. Our sample size is 25, which is smaller than 30, so the second condition is met only if the population is normally distributed. Assuming that it is, we can still calculate a confidence interval for estimating a population mean with σ not known.

Conditions for Confidence Interval for Estimating a Population Standard Deviation or Variance

1) The sample is a simple random sample.

2) The population must have normally distributed values.

We treated our sample as randomly selected. We do not know for certain whether the population is normally distributed, so we assume that it is. Under that assumption, both conditions are met, and we can calculate a confidence interval for estimating a population standard deviation or variance.

Mistakes could be made gathering this data. One type of error could be recording

incorrect data. This could happen if the person counted incorrectly or wrote the wrong quantity

down for that color. The sampling method could be improved by increasing the sample

size. We could also improve the sampling method by acquiring bags from different parts of the

country and/or world, rather than the local area.

We have drawn the conclusion that the true mean number of candies in each bag of Skittles is close to the sample mean we found by gathering our data. We have also drawn the conclusion that the colors of Skittles are somewhat evenly proportioned across bags.

Reflection

Some of the things that I have learned from this paper are that statistics is not as easy as algebra, but I can use it in real-life situations. I am not sure that I would ever have to know how many candies are in a Skittles bag, but it made learning statistics fun and different. I was surprised to learn that not all Skittles bags have equal numbers of candies. Some bags may have more of some colors than others, which makes me wonder what the probability is that a Skittles bag anywhere in the world would have an outlying number of candies.

The math skill from this project that I will apply to other classes is interpreting word problems. In statistics, understanding the problem and choosing the right formula are critical: if I do not read a problem carefully, I could easily solve it with the wrong method, and the answer would not even be close to the right one. If I apply these interpreting skills in my future classes, I will get the right information and make decisions correctly. This Statistics class refreshed my problem solving skills after ten years away from school. I was confident at the beginning of the semester, but facing complex problems made me less confident. However, I regained a lot of confidence after practicing solving problems. With a variety of formulas and scenarios throughout the book, my thought process was challenged, and I had to think problems through. I was able to see the changes and development in my problem solving skills as I completed different parts of the project. Each section had specific challenges that required us to use our resources and judgement. This project changed the way I think about real-world math applications. I learned that I could use statistics in many real-world ways, not just with Skittles. When you think about it, you actually use it a lot more than you know, even if only simple statistics. I had never thought that the chance of having a baby of a certain gender could be calculated statistically. In order to build an airplane, the engineer has to consider the size of the seats in relation to the average size of a passenger's body. I noticed that statistics is all around us, but people do not realize how much it is used in the real world.
