I.

Introduction

The project uses statistical techniques to describe a sample of Skittles candies. A total of 29 students

each purchased one single serving of Skittles. Each student tallied the number of candies per bag and

the number of candies per color per bag and submitted these counts to the instructor. The instructor

supplied all students with this data set for analysis.

II.

Candies by Color

The above charts represent the total sample of candies as reported by all students. Initial viewing of the

pie chart seems to indicate even distribution among the five candy colors. However, the Pareto chart

indicates some small differences. Notably, there seem to be significantly fewer yellow candies

than any other color.

I don't know if this is a result of the random nature of our sample, or if this is intentional on the part of

the candy manufacturer. Possible rationales for including fewer candies of one color could be the

respective costs of the candy dyes or a desire for a certain aesthetic effect of the mixed candies.

Sample                Red     Green   Purple  Orange  Yellow  TOTAL
Lonnie (count)        12      13      8       13      14      60
Lonnie (proportion)   0.200   0.217   0.133   0.217   0.233   1.000
Total (count)         366     355     346     343     314     1724
Total (proportion)    0.212   0.206   0.201   0.199   0.182   1.000
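As a quick check, the proportions in the table can be recomputed from the counts, which also shows which color differs most between one bag and the pooled sample. This is an illustrative sketch only; the variable and function names are mine and are not part of the original project.

```python
# Counts copied from the color table (one bag vs. the pooled class sample).
COLORS = ["Red", "Green", "Purple", "Orange", "Yellow"]
my_bag = dict(zip(COLORS, [12, 13, 8, 13, 14]))
class_total = dict(zip(COLORS, [366, 355, 346, 343, 314]))


def proportions(counts):
    """Return each color's share of the total count."""
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


my_props = proportions(my_bag)
class_props = proportions(class_total)

# Difference in proportion (my bag minus class sample) for each color.
gaps = {c: my_props[c] - class_props[c] for c in COLORS}
largest = max(gaps, key=lambda c: abs(gaps[c]))

for c in COLORS:
    print(f"{c:<7} mine={my_props[c]:.3f}  class={class_props[c]:.3f}  gap={gaps[c]:+.3f}")
print("Largest gap:", largest)
```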

My own bag of Skittles had more yellow candies than any other color. Purple was least represented, and

this results in the largest proportional disparity between my bag and the total sample. The overall

sample contained a .201 proportion of purple Skittles. My bag had just a .133 proportion. The difference

is .07.

Lonnie Horlacher

MATH 1040, Ping Yu

III.

The mean number of candies per bag in this sample of 29 bags is 59.4. The standard deviation is 2.38.

[Box plot of candies per bag; labeled values: 53, 58, 60, 61, 64 (minimum, Q1, median, Q3, maximum)]

The number of candies per bag appears to be slightly skewed to the left. However, the median is 60, the

mode is 60, and the mean is 59.4. The similarity among these measurements indicates a relatively

normal distribution. This is additionally supported by the fact that the minimum value in the

distribution, 53, can be considered an outlier as it is more than 1.5x the Inter-Quartile Range below Q1. If

this outlier were eliminated, the mean would move even closer to the median and mode.
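
The outlier rule mentioned above can be checked with a few lines of arithmetic. This sketch assumes Q1 = 58 and Q3 = 61, which are read off the box plot labels and are not stated in the text, so treat them as assumptions.

```python
# Assumed quartiles read from the box plot labels (not stated in the prose).
q1, q3 = 58, 61
minimum = 53  # smallest bag count reported in the text

iqr = q3 - q1                  # interquartile range
lower_fence = q1 - 1.5 * iqr   # values below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr   # values above this are flagged as outliers

is_outlier = minimum < lower_fence
print(f"IQR={iqr}, lower fence={lower_fence}, upper fence={upper_fence}")
print(f"{minimum} is an outlier: {is_outlier}")
```

Under these assumed quartiles the minimum falls just below the lower fence, which is consistent with calling it an outlier.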

Given the candy manufacturer's interest in providing a consistent experience with each candy purchase,

it is unsurprising that the number of candies per bag varies only narrowly and generates a normal

distribution.

My bag of candies contained exactly 60 candies, equal to the median and mode of the sample.

IV.

Reflection

Categorical data, like the colors of Skittles, is data that can be sorted into categories, but for which no

difference can be calculated between categories. Red less blue doesn't mean anything. On the other

hand, differences between the classes in quantitative data, like the number of candies per bag of

Skittles, can be calculated. 60 less 59 yields a meaningful result.

Categorical data is well-represented by pie charts, bar charts or Pareto charts. Quantitative data can be

represented by histograms, and the distribution of quantitative data is well illustrated by box plots. In

addition, quantitative data allows for some additional calculations beyond what is available for

categorical data, including mean, median and standard deviation.

V.

Confidence intervals allow statisticians to provide both a best point estimate for a population parameter

and a measurement of how accurate that estimate might be. Holding alpha constant, a wide confidence

interval indicates that the best point estimate may be significantly different than the true population

parameter due to high variability in the data or a small sample size.

Confidence intervals also allow the statistician to provide reasonably reliable estimates of what a

parameter is not. The upper and lower boundaries allow a decision maker to consider the impact of a

population parameter that is "at most a" or "at least b." If both boundaries are acceptable, decisions

can be made. If one or both is potentially problematic, the study can be revisited, hopefully with an

increased sample size.

99% Confidence Interval for the True Proportion of Yellow Candies

.1582 < p < .206 (see work attached)

95% Confidence Interval for the True Mean of Candies per Bag

58.54 < μ < 60.36 (see work attached)

98% Confidence Interval for the Standard Deviation of Candies per Bag

1.81 < σ < 3.42 (see work attached)
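
For readers without the attached work, here is a minimal sketch of how the three intervals can be reproduced from the summary statistics reported in this paper. The t and chi-square critical values (28 degrees of freedom) are hard-coded from standard tables and are my assumption about which values were used. Because the mean and standard deviation are rounded in the text, the sketch reports the margin of error for the mean rather than its endpoints.

```python
from math import sqrt
from statistics import NormalDist

n_candies, yellow = 1724, 314      # pooled sample counts from the color table
n_bags, mean, sd = 29, 59.4, 2.38  # rounded summary statistics from the text

# 99% CI for the proportion of yellow candies (normal approximation).
p_hat = yellow / n_candies
z = NormalDist().inv_cdf(1 - 0.01 / 2)
e_p = z * sqrt(p_hat * (1 - p_hat) / n_candies)
ci_p = (p_hat - e_p, p_hat + e_p)

# 95% CI for the mean: margin of error is t * s / sqrt(n), with df = 28.
T_975 = 2.048  # table value, assumed
e_mean = T_975 * sd / sqrt(n_bags)

# 98% CI for the standard deviation, using chi-square with df = 28.
CHI2_RIGHT, CHI2_LEFT = 48.278, 13.565  # table values, assumed
df = n_bags - 1
ci_sd = (sqrt(df * sd**2 / CHI2_RIGHT), sqrt(df * sd**2 / CHI2_LEFT))

print(f"yellow proportion: {ci_p[0]:.4f} < p < {ci_p[1]:.4f}")
print(f"mean margin of error: {e_mean:.2f}")
print(f"standard deviation: {ci_sd[0]:.2f} < sigma < {ci_sd[1]:.2f}")
```

The half-width of the mean interval reported above, (60.36 - 58.54) / 2 = 0.91, agrees with the margin of error computed here.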

The results are mostly unsurprising. I had hoped to see the upper limit on the confidence

interval for the proportion of yellow candies to be less than .20. This would have indicated some decision-making on the part of Skittles to favor some colors above others. However, while .20 is just barely

included, it is included, and there is a chance that the true population proportion is actually 20%.

The CIs for the mean and standard deviation are not very enlightening. It is worth noting that the CI

for the number of candies per bag is very narrow. This is the parameter that Skittles is likely most

interested in controlling. A change in the number of candies per bag could hurt profits on

one hand or hurt the consumer experience on the other hand.

VI.

Hypothesis Tests

A hypothesis test uses data obtained from a sample to evaluate whether a claimed value of a population

parameter is plausible. While it is very hard to measure an entire population, a hypothesis test can

tell a statistician whether a hypothesis is reasonable and supported by the results of a sample or

whether the hypothesis is unlikely.

Hypothesis Test for Proportion of Red Candies. Claim: p = .20

Test statistic |1.25| < critical value |1.96|; fail to reject. There is not sufficient evidence to warrant

rejection of the claim that 20% of Skittles candies are red.

Hypothesis Test for Mean number of Candies per Bag

[P-value < 0.01] < [α = .01]; Reject; There is sufficient evidence to warrant rejection of the claim that

the mean number of candies in a bag of Skittles is 55.
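
The two test statistics can be reproduced from the reported figures. This is a sketch under stated assumptions: both tests are treated as two-tailed, the red proportion uses the rounded sample value .212, the significance level for the proportion test is taken to be .05 (implied by the 1.96 critical value), and the t critical value for alpha = .01 with 28 degrees of freedom (2.763) is hard-coded from a standard table.

```python
from math import sqrt
from statistics import NormalDist


def z_for_proportion(p_hat, p0, n):
    """One-sample z statistic for a claimed proportion p0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def t_for_mean(mean, mu0, sd, n):
    """One-sample t statistic for a claimed mean mu0."""
    return (mean - mu0) / (sd / sqrt(n))


# Red candies: claim p = .20, sample proportion .212 out of 1724 candies.
z = z_for_proportion(0.212, 0.20, 1724)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-tailed, alpha = .05 (assumed)
reject_red = abs(z) > z_crit

# Candies per bag: claim mu = 55, sample mean 59.4, s = 2.38, n = 29.
t = t_for_mean(59.4, 55, 2.38, 29)
T_CRIT = 2.763  # two-tailed, alpha = .01, df = 28 (table value, assumed)
reject_mean = abs(t) > T_CRIT

print(f"z = {z:.2f}, critical = {z_crit:.2f}, reject: {reject_red}")
print(f"t = {t:.2f}, critical = {T_CRIT}, reject: {reject_mean}")
```

Using the unrounded sample proportion 366/1724 instead of .212 gives a slightly larger z statistic, but it stays below 1.96, so the decision is the same.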

Since 21.2% of the candies in the sample were red, I am not surprised that our test fails to reject the

claim that 20% of the population candies are red. Given the narrow confidence interval

calculated for the mean number of candies per bag in the last section, I would have been very

surprised to find that 55 was a plausible population mean.

VII.

Reflection

The primary concerns when attempting to estimate any population parameter from a sample are the

size of the sample and the collection methodology of the sample.

Due to the simplicity of collecting the data, the collection methodology is unlikely to have affected the

outcome of the test, but it could have been improved. Most glaringly, our data is the result of 29

different people counting their candies at different times in different places with no training and no

opportunity to double-check. It is a simple thing to count candies from a bag. However, this collection

method was needlessly complicated (when solely considering the goal of estimating population

parameters). A single person or team working together in a single sitting would have eliminated several

opportunities for error.

Given the ease of collecting this data (buy some Skittles, count them), it would be hard to justify using

such a small sample size when determining population parameters. Why not buy 100 bags? At a minimum,

we should have used 30 bags in order to estimate the mean.

VIII.

Final Reflection

There is something empowering about using classroom learning to solve a problem for which

there is no answer printed in the back of the book. The dullness of homework, tests, etc., is largely due

to the fact that there is nothing new to discover. The best answers are already printed and triple

checked and authorized. All any student can do is try to not fail. But, using bookish formulas to discover

anything about something 3-D and tangible and unique is entirely different regardless of the fact other

students and my professor are using the same data to draw the same conclusions. I am part of the first

group ever to measure these 29 bags of Skittles and then use them to estimate population parameters. I

found myself actually curious: How many candies are in a bag of Skittles? Are there exactly as many red

Skittles as yellow? And now, while I don't know anything, I am in the 99th percentile of people who

know about Skittles.

In my previous job as an advertising media buyer, I looked at data a lot. One of the main

functions of my position was to evaluate options based on information obtained from surveys. Have no

fear; Nielsen did all of the hard work before sending the survey over. All I had to do was double check

that my sample size never got too small as I sorted and pivoted the data. However, campaign

performance data often came directly to me. E.g., this website served x impressions which resulted in y

clicks compared to another website which served a impressions and resulted in b clicks. And there were

6 different ads running in different sizes and positions. No one in my department had the background to

statistically demonstrate that one website or ad size or call-to-action actually produced better results.

The most any of us could do was look at the data and say, "Well, that one is higher, so I guess." Now I

can use a hypothesis test to answer the question, "Did any of these variables make a difference?"

Granted, there is a lot more to learn on those seemingly simple questions, but it is nice to have a place

to start.
