© All Rights Reserved


term project report


November 25, 2014

Term Project

Stat The Rainbow

Introduction

I started this project by purchasing a 2.17-ounce bag of Original Skittles. I counted and recorded the number of candies of each color: red (13), orange (8), yellow (20), green (9), and purple (11). The total number of candies in my bag of Skittles was 61. This information was submitted to my instructor. All students in my class were given the same assignment. Our instructor compiled the results of 38 students and reported them. Out of 2,435 candies (in 38 bags), 500 were red, 446 were orange, 474 were yellow, 503 were green, and 512 were purple. Using this data, I developed pie and Pareto charts showing the number of candies by color (as shown below and on the following page).

Organizing and Displaying Categorical Data: Colors

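[Figures: pie chart and Pareto chart, "Number of Skittles, By Color, In 38 Bags." Pie chart labels that survive: Orange, 446, 18.32%; Purple, 512, 21.03%; Red, 500, 20.53%. Pareto chart bars, left to right: Purple, Green, Red, Yellow, Orange; vertical axis from 0 to 500.]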

According to these charts, the colors are close to evenly distributed: each accounts for roughly 18% to 21% of the candies, so no color stands out as much more or less common than the others.

The following table compares the data from my own sample bag with the data collected from the class as a whole:

Comparison of Individual Data to Class Data

Color     My Sample   Class Sample   My Proportion   Class Proportion
Red       13          500            .213            .205
Orange    8           446            .131            .183
Yellow    20          474            .328            .195
Green     9           503            .148            .207
Purple    11          512            .180            .210
Total     61          2435           1.000           1.000
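The proportions in the table can be checked with a few lines of Python. This is a minimal sketch using only the counts reported above; the names `my_counts`, `class_counts`, and `proportions` are illustrative and not part of the original report.

```python
# Color counts taken from the report: one bag versus the 38-bag class total.
my_counts = {"Red": 13, "Orange": 8, "Yellow": 20, "Green": 9, "Purple": 11}
class_counts = {"Red": 500, "Orange": 446, "Yellow": 474, "Green": 503, "Purple": 512}


def proportions(counts):
    """Return each category's share of the total, rounded to three decimals."""
    total = sum(counts.values())
    return {color: round(n / total, 3) for color, n in counts.items()}


my_props = proportions(my_counts)
class_props = proportions(class_counts)

# Print one row per color: my proportion, then the class proportion.
for color in my_counts:
    print(f"{color:<7} {my_props[color]:.3f}  {class_props[color]:.3f}")
```

Each proportion is simply the color count divided by the sample total (61 candies for the single bag, 2,435 for the class).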

I was surprised that my own findings did not agree with those of the class. Because my own bag had about one and a half times as many yellow candies as its next most common color (20 yellow versus 13 red), I assumed that most bags would contain more yellow candies than any other color. Yet, according to the class data, yellow candies were outnumbered by every other color except orange. Doing this project has helped me better understand the importance of using a large sample in order to draw more accurate conclusions about an entire population.

Organizing and Displaying Quantitative Data: the Number of Candies per Bag

Another set of information that the class data supplied was the number of candies in each bag. There were 61 candies in my bag. As stated earlier, the total number of candies in all 38 bags was 2,435. The mean number of candies per bag was 64.1, and the standard deviation was 13.2. The 5-number summary was 45, 59, 61, 62, 114. Since my bag had 61 candies, it matched the class median exactly, yet it fell below the class mean. Below, you will see a histogram and a box plot that I developed with this data.

These charts (above, on previous page) indicate a right-skewed distribution that is otherwise somewhat bell-shaped. I didn't expect to see such a gap between the third quartile and the right whisker. When this data is drawn up in a modified box plot (as shown below), a number of outliers are revealed. I believe this suggests the possibility that a few (I'm guessing 3) students gathered their data from Skittles packages that were larger than the designated 2.17-ounce size. If that was the case, then their data literally skewed the results: with the outliers set apart, the box plot below reflects a slightly left-skewed, rather than extremely right-skewed, distribution. It would also have been an example of a non-random sampling error, since the data wasn't collected from similar samples (sample bags of the same package size).
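A modified box plot usually flags outliers with the 1.5 x IQR rule. The sketch below applies that rule to the reported 5-number summary; it assumes this is the rule behind the modified box plot, which the report does not state, and the variable names are illustrative.

```python
# Five-number summary reported for the class data (candies per bag).
minimum, q1, median, q3, maximum = 45, 59, 61, 62, 114

iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # values below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr     # values above this are flagged as outliers

print("IQR:", iqr)
print("Fences:", lower_fence, upper_fence)
print("Minimum is an outlier:", minimum < lower_fence)
print("Maximum is an outlier:", maximum > upper_fence)
```

Under this rule, any bag with fewer than 54.5 or more than 66.5 candies would be plotted as an outlier, so both the smallest bag (45) and the largest bag (114) fall outside the fences.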

It is important to differentiate quantitative data from categorical data. With categorical information, it wouldn't make sense to compute an average; the mean of the numbers on the jerseys of a football team, for example, would be meaningless. Jersey numbers are used to identify specific players, not to count or measure them. So it stands to reason that different types of data require different charts to display them. When comparing the number of different colors of candies within a sample, I used a Pie Chart and a Pareto Chart, because these charts work best to display categorical data, such as color. On the other hand, when displaying quantitative data, histograms and box plots are more appropriate. Histograms work well for quantitative data because their classes have boundaries that run from a low limit to a high one and can cover a full range of values. The color of a Skittles candy doesn't fall within a range; either it is one color, or it is another. Since the bars of a Pareto Chart are separated by gaps, and the bars of a Histogram are not, it wouldn't make sense to use a Histogram to display categorical information. Although it may sound a bit confusing and complicated at first glance, common sense guides statisticians to recognize the appropriate chart for each type of data.

A confidence interval gives you a low number and a high number between which a population value (such as a proportion, mean, or standard deviation) is expected to fall. For example, a significance level of .05 corresponds to a 95% confidence level: if many samples were taken, about 95% of the intervals built this way would contain the true value. Below, you will find some confidence intervals based on our previous candy data. The work for this information is on the following page.

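Specific Value                           Confidence Level   Confidence Interval
Proportion of yellow candies             99%                17.4% to 21.5%
Mean number of candies per bag           95%                59 to 69
Standard deviation of candies per bag    98%                about 13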

Based on these confidence interval estimates, I can make the following statements:

I have 99% confidence that the true proportion of yellow candies among all Skittles is between 17.4% and 21.5%.

I have 95% confidence that the true mean number of candies per bag of Skittles is between 59 and 69.

I have 98% confidence that the true standard deviation of the number of candies per bag of Skittles is about 13 candies.
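The first two intervals can be approximated from the class summary statistics. The sketch below is not the author's worked solution: it assumes a normal-approximation interval for the proportion and a t interval for the mean, and the critical value 2.026 (t table, 37 degrees of freedom) is an assumption supplied here, not a figure from the report.

```python
from math import sqrt
from statistics import NormalDist

# 99% confidence interval for the proportion of yellow candies.
yellow, n_candies = 474, 2435
p_hat = yellow / n_candies
z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)            # two-tailed, alpha = 0.01
margin_p = z_crit * sqrt(p_hat * (1 - p_hat) / n_candies)
ci_yellow = (p_hat - margin_p, p_hat + margin_p)

# 95% confidence interval for the mean number of candies per bag.
mean, sd, n_bags = 64.1, 13.2, 38
t_crit = 2.026    # assumed t-table value for 37 degrees of freedom
margin_m = t_crit * sd / sqrt(n_bags)
ci_mean = (mean - margin_m, mean + margin_m)

print(f"Yellow proportion: {ci_yellow[0]:.3f} to {ci_yellow[1]:.3f}")
print(f"Mean per bag:      {ci_mean[0]:.1f} to {ci_mean[1]:.1f}")
```

The proportion interval rounds to 17.4% and 21.5%, matching the statement above. The mean interval comes out slightly narrower than the whole-number range of 59 to 69 and lies inside it.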

Hypothesis Tests

When a claim is made about a characteristic of an entire population, a hypothesis test can be run on a simple random sample to measure how likely the sample results would be if the claim were true. With the data from such a test, a determination can be made, at a specified significance level, whether there is sufficient evidence to support or reject the original claim.

For instance, for the claim that 20% of all Skittles candies are red, I can run a hypothesis test at a 0.05 significance level. Since the z-score for this test (0.65) is less than the critical value (1.96), there isn't sufficient evidence to reject the claim that 20% of all Skittles candies are red.
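The test for the red proportion can be sketched as a two-tailed one-proportion z test on the class counts. This is an illustration under that assumption, not the author's worked solution, and the variable names are hypothetical. With the unrounded sample proportion the statistic comes out near 0.66; the reported 0.65 is consistent with rounding the sample proportion before computing.

```python
from math import sqrt
from statistics import NormalDist

red, n = 500, 2435     # red candies and total candies in the class sample
p0 = 0.20              # claimed proportion of red candies
alpha = 0.05           # significance level

p_hat = red / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)        # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # two-tailed critical value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject = abs(z) > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.2f}")
print("Reject the claim:", reject)
```

Because the statistic is well inside the critical values of plus or minus 1.96, the test fails to reject the 20% claim.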

Another example would be to test the accuracy of the claim that the mean number of candies in a bag of Skittles is 55, using a 0.01 significance level. Since the t-stat for this test (4.250) is greater than the critical value (2.715), there is sufficient evidence to reject the claim that the mean number of candies in a bag of Skittles is 55.
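The t statistic can be checked from the class summary statistics (mean 64.1, standard deviation 13.2, 38 bags). The sketch below uses a claimed mean of 55, the value that reproduces the reported statistic of 4.250, and takes the critical value 2.715 directly from the report rather than computing it; the variable names are illustrative.

```python
from math import sqrt

mean, sd, n = 64.1, 13.2, 38   # class summary statistics for candies per bag
mu0 = 55                       # claimed mean number of candies per bag
t_crit = 2.715                 # two-tailed critical value quoted in the report

standard_error = sd / sqrt(n)
t_stat = (mean - mu0) / standard_error   # one-sample t statistic
reject = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}")
print("Reject the claim:", reject)
```

The statistic rounds to 4.250, which exceeds the critical value, so the claim is rejected at the 0.01 level.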

The work for both of these hypothesis tests can be found on the following page.

The purpose of using confidence interval estimates is to be able to make inferences about the whole population based on the data from a sample. I can't possibly count how many candies are in every Skittles package to find out the true proportion of yellow candies. But with a sample size of 38 bags, I can get relatively close. I would be able to get even closer to the true proportion if I used a larger sample size, like 50 or even 100 bags.
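The effect of a larger sample can be illustrated by recomputing the 99% margin of error for the yellow proportion at different numbers of bags. This is a rough sketch under two assumptions that are not in the report: every bag holds the class mean of about 64.1 candies, and the yellow proportion stays at the class estimate. The function name `margin_of_error` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

P_HAT = 474 / 2435                    # class estimate of the yellow proportion
CANDIES_PER_BAG = 64.1                # class mean, used as an assumed bag size
Z_99 = NormalDist().inv_cdf(0.995)    # critical value for 99% confidence


def margin_of_error(bags):
    """Approximate 99% margin of error for the yellow proportion."""
    n = round(bags * CANDIES_PER_BAG)   # assumed total candies in the sample
    return Z_99 * sqrt(P_HAT * (1 - P_HAT) / n)


for bags in (38, 50, 100):
    print(f"{bags:>3} bags: +/- {margin_of_error(bags):.4f}")
```

Under these assumptions the margin of error shrinks from roughly 2.1 percentage points at 38 bags to about 1.3 at 100 bags, since it is proportional to one over the square root of the sample size.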

The purpose of hypothesis testing is to check the accuracy of a claim concerning an entire population, by testing data obtained from a sample. The two claims on the previous page were good examples of this. Still, there is the possibility of error. Earlier, I stated my suspicion that three students gathered data from bags that were larger than 2.17 ounces. If that was the case, our confidence intervals and hypothesis tests could be a bit off. Three out of 38 may not seem like a lot, but it is 7.9%, which exceeds the 5% rule. So, to reach more accurate conclusions, I would need data from a sample in which all of the information was gathered from bags of the same size.
