Original Title: term project report

Uploaded by api-238585685

November 25, 2014

Term Project

Stat The Rainbow

Introduction

I started this project by purchasing a 2.17-ounce bag of Original Skittles. I counted and recorded the number of candies of each color: red (13), orange (8), yellow (20), green (9), and purple (11). The total number of candies in my bag of Skittles was 61. This information was submitted to my instructor. All students in my class were given the same assignment. Our instructor took the results of 38 students and reported the results. Out of 2435 candies (in 38 bags), 500 were red, 446 were orange, 474 were yellow, 503 were green, and 512 were purple. Using this data, I developed Pie and Pareto Charts showing the number of candies by color (as shown below and on the following page).

Organizing and Displaying Categorical Data: Colors

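[Pie chart: Number of Skittles, By Color, In 38 Bags. Recovered labels: Orange, 446, 18.32%; Purple, 512, 21.03%; Red, 500, 20.53%]

[Pareto chart: Number of Skittles, By Color, In 38 Bags. Bars in descending order: Purple, Green, Red, Yellow, Orange; vertical axis from 0 to 500]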

According to these charts, the difference in the number of each color of candy does not appear particularly significant.

The following table demonstrates the data from my own sample bag in comparison to the data collected from the class as a whole:

Comparison of Individual Data to Class Data

Color     My Sample   Class Sample   My Proportion   Class Proportion
Red       13          500            .213            .205
Orange    8           446            .131            .183
Yellow    20          474            .328            .195
Green     9           503            .148            .207
Purple    11          512            .180            .210
Total     61          2435           1               1

I was surprised that my own findings did not agree with those of the class. Because my own bag had far more yellow candies than any other color (20, compared with 13 for red, the next most common), I assumed that most bags would contain a greater number of yellow candies. Yet, according to the class data, yellow candies were outnumbered by every other color except orange. Doing this project has helped me to better understand the importance of using a large data sample in order to draw more accurate conclusions about an entire population.
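The proportions in the comparison table can be reproduced from the raw counts. The following is a minimal sketch in Python; the counts come from this report, while the names `my_bag`, `class_bags`, and `proportions` are only illustrative.

```python
# Counts by color, taken from the comparison table in this report.
my_bag = {"Red": 13, "Orange": 8, "Yellow": 20, "Green": 9, "Purple": 11}
class_bags = {"Red": 500, "Orange": 446, "Yellow": 474, "Green": 503, "Purple": 512}


def proportions(counts):
    """Return each color's share of the total number of candies."""
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


my_props = proportions(my_bag)
class_props = proportions(class_bags)

# Print one row per color: my proportion, then the class proportion.
for color in my_bag:
    print(f"{color:<7} {my_props[color]:.3f} {class_props[color]:.3f}")
```

Yellow is the color where the two samples differ most: roughly a third of my bag, but under a fifth of the class total.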

Organizing and Displaying Quantitative Data: the Number of Candies per Bag

Another set of information that the class data supplied was the number of candies in each bag. There were 61 candies in my bag. As stated earlier, the total number of candies in all 38 bags was 2,435. The mean number of candies in each bag was 64.1. The standard deviation of the number of candies per bag was 13.2; the 5-number summary was: 45, 59, 61, 62, 114. Since my bag had 61 candies, it was exactly the same as the median number in our class, yet it was lower than the mean. Below, you will see a histogram and a box plot that I developed with this data.

These charts (above, on previous page) indicate a right-skewed distribution of data, with a somewhat bell-shaped center. I didn't expect to see such a gap between the third quartile and the right whisker. When this data is drawn up in a modified box plot (as shown below), a number of outliers are revealed. I believe this suggests the possibility that a few students (I'm guessing three) gathered their data from Skittles packages that were larger than the designated 2.17-ounce size. If that was the case, then their data literally skewed the results: with the outliers set aside, the box plot below reflects a slightly left-skewed, rather than an extremely right-skewed, distribution. It would also have been an example of a non-random sampling error, since the data wasn't collected from similar samples (sample bags of the same package size).
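A modified box plot flags a value as an outlier when it lies more than 1.5 times the interquartile range beyond a quartile. The following sketch applies that common rule to the 5-number summary reported above; the individual bag counts are not available here, so only the fences and the two extremes can be checked, and the name `is_outlier` is illustrative.

```python
# Five-number summary reported above: minimum, Q1, median, Q3, maximum.
minimum, q1, median, q3, maximum = 45, 59, 61, 62, 114

# Interquartile range and the 1.5 * IQR fences used by a modified box plot.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def is_outlier(count):
    """Return True when a bag's candy count falls outside the fences."""
    return count < lower_fence or count > upper_fence


print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print(f"maximum {maximum} is an outlier: {is_outlier(maximum)}")
print(f"minimum {minimum} is an outlier: {is_outlier(minimum)}")
```

Because the middle half of the bags spans only three candies, the fences are narrow, so both the largest bag (114) and the smallest bag (45) fall outside them.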

It is important to differentiate quantitative data from categorical data. With categorical data, it wouldn't make sense to compute an average: the mean of the numbers on the jerseys of a football team, for example, would be meaningless. Jersey numbers are used to identify specific players, not to count or measure them. So it stands to reason that different types of data require different charts to reflect them. When comparing the number of different colors of candies within a sample, I used a Pie Chart and a Pareto Chart, because these charts work best to display categorical data, such as color. On the other hand, when demonstrating quantitative data, histograms and box plots are more appropriate. Histograms work well for quantitative data because their classes have boundaries that run from a low limit to a high one, so each bar covers a range of values. The color of a Skittles candy doesn't fall within a range; either it is one color, or it is another. Since Pareto Charts have gaps between bars and Histograms do not, it wouldn't make sense to use a Histogram to display categorical information. Although it may sound a bit confusing and complicated at first glance, common sense guides statisticians to recognize the appropriate chart for each type of data.

A confidence interval gives you a low number and a high number between which a population value (such as a proportion, a mean, or a standard deviation) is expected to fall. For example, a significance level of .05 corresponds to a 95% confidence level: about 95% of the intervals built this way from random samples would contain the true value. Below, you will find some confidence intervals based on our previous candy data. The work for this information is on the following page.

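Specific Value                          Confidence Level   Confidence Interval
Proportion of yellow candies            99%                17.4% to 21.5%
Mean number of candies per bag          95%                59 to 69
Standard deviation of candies per bag   98%                about 13 (bounds not given)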

Based on these confidence interval estimates, I can make the following statements:

I have 99% confidence that the true proportion of yellow candies among all Skittles is between 17.4% and 21.5%.

I have 95% confidence that the mean number of candies per bag of Skittles is between 59 and 69.

I have 98% confidence that the standard deviation of the number of candies per bag lies in an interval around the sample value of about 13 candies.
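The interval estimates can be checked against the summary statistics reported earlier (474 yellow out of 2435 candies; 38 bags with mean 64.1 and standard deviation 13.2). The following is a rough sketch rather than the original worked solution. It rests on two assumptions: the t critical value 2.026 (37 degrees of freedom, 95%) is copied from a standard table, and the chi-square critical values are approximated with the Wilson-Hilferty formula instead of being read from a table. All function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics reported in this project.
n_candies, yellow = 2435, 474
n_bags, mean, sd = 38, 64.1, 13.2


def proportion_ci(successes, n, confidence):
    """Normal-approximation interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def mean_ci(sample_mean, sample_sd, n, t_crit):
    """t interval for a population mean, given a table critical value."""
    margin = t_crit * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to a chi-square quantile (assumption:
    a table or a statistics library would give a slightly different value)."""
    z = NormalDist().inv_cdf(p)
    a = 2 / (9 * df)
    return df * (1 - a + z * sqrt(a)) ** 3


def sd_ci(sample_sd, n, confidence):
    """Chi-square interval for a population standard deviation."""
    df = n - 1
    alpha = 1 - confidence
    lower = sqrt(df * sample_sd**2 / chi2_quantile(1 - alpha / 2, df))
    upper = sqrt(df * sample_sd**2 / chi2_quantile(alpha / 2, df))
    return lower, upper


T_CRIT_95_DF37 = 2.026  # assumed table value: df = 37, two-sided 95%

p_low, p_high = proportion_ci(yellow, n_candies, 0.99)
m_low, m_high = mean_ci(mean, sd, n_bags, T_CRIT_95_DF37)
s_low, s_high = sd_ci(sd, n_bags, 0.98)

print(f"99% CI, proportion yellow: {p_low:.3f} to {p_high:.3f}")
print(f"95% CI, mean per bag:      {m_low:.1f} to {m_high:.1f}")
print(f"98% CI, standard deviation: {s_low:.1f} to {s_high:.1f}")
```

Under these assumptions the proportion interval agrees with the 17.4% to 21.5% stated above, and the mean interval falls inside the stated 59 to 69. The standard deviation interval is this sketch's own approximation, not a figure from the original work.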

Hypothesis Tests

When a claim is made about a characteristic of an entire population, a hypothesis test can be run on a simple random sample to find how likely the sample results would be if the claim were true. With the data from such a test, a determination can be made, at a specified significance level, whether there is sufficient evidence to reject the original claim.

For instance, for the claim that 20% of all Skittles candies are red, I can run a hypothesis test at a 0.05 significance level. Since the z-score for this test (0.65) is less than the critical value (1.96), there isn't sufficient evidence to reject the claim that 20% of all Skittles candies are red.
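The z-score for this test can be recomputed from the class counts (500 red out of 2435 candies). The following is a minimal sketch of a two-tailed one-proportion z-test; the variable names are illustrative, and the unrounded statistic may differ slightly in the second decimal place from the value quoted above.

```python
from math import sqrt
from statistics import NormalDist

# Class data: red candies out of all candies counted.
red, n = 500, 2435
p0 = 0.20     # claimed proportion of red candies
alpha = 0.05  # significance level

# Test statistic: distance of the sample proportion from the claim,
# measured in standard errors computed under the claim.
p_hat = red / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-tailed critical value and p-value from the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject = abs(z) > z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.2f}")
print("reject the claim" if reject else "fail to reject the claim")
```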

Another example would be to test the accuracy of the claim that the mean number of candies in a bag of Skittles is 55, using a 0.01 significance level. Since the t-stat for this test (4.250) is greater than the critical value (2.715), there is sufficient evidence to reject the claim that the mean number of candies in a bag of Skittles is 55.
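The t-statistic can be checked from the summary statistics reported earlier (38 bags, mean 64.1, standard deviation 13.2). The following is a minimal sketch that assumes a hypothesized mean of 55, the value that reproduces the reported statistic of 4.250 under those summary statistics. The critical value 2.715 is taken from the text rather than computed.

```python
from math import sqrt

# Summary statistics for the number of candies per bag, as reported earlier.
n, sample_mean, sample_sd = 38, 64.1, 13.2

mu0 = 55        # assumed hypothesized mean (reproduces the reported t-stat)
t_crit = 2.715  # two-tailed critical value, df = 37, alpha = 0.01 (from the text)

# One-sample t statistic: distance of the sample mean from the claim,
# measured in estimated standard errors.
standard_error = sample_sd / sqrt(n)
t_stat = (sample_mean - mu0) / standard_error

reject = abs(t_stat) > t_crit
print(f"t = {t_stat:.3f}, critical value = {t_crit}")
print("reject the claim" if reject else "fail to reject the claim")
```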

The work for both of these hypothesis tests can be found on the following page.

The purpose of using confidence interval estimates is to be able to make inferences about the whole population based on the data from a sample. I can't possibly count the candies in every Skittles package to find out the true proportion of yellow candies. But with a sample size of 38 bags, I can get relatively close. I would be able to get even closer to the true proportion if I used a larger sample size, like 50 or even 100 bags.

The purpose of hypothesis testing is to check the accuracy of a claim concerning an entire population, by testing data obtained from a sample. The two claims on the previous page were good examples of this. Still, there is the possibility of error. Earlier, I stated my suspicion that three students gathered data from bags that were larger than 2.17 ounces. If that was the case, our confidence intervals and hypothesis tests could be a bit off. Three out of 38 may not seem like a lot, but it is 7.9%, which exceeds the 5% rule. So, to reach more accurate conclusions, I would need data from a sample where all of the information was gathered from bags that were the same size.
