
Skittles Project

Part 2: Sampling

The StatCrunch row numbers associated with the three bags of Skittles that our group selected were 22, 31 and 8. To select our bags, we used the random number generator on a TI-83 calculator and used the first two digits after the decimal point to determine which bag we would choose. When we got a number that was higher than the number of rows, we disregarded it and kept choosing until we had three numbers that corresponded to rows in the class data. The sampling method used to obtain our group's sample was cluster sampling, because each bag of Skittles was randomly chosen and every Skittle from the chosen bags was included in the sample. The sample totals for our three bags were 43 red, 38 orange, 33 yellow, 34 green and 30 purple, for 178 candies in total.

The errors most likely to occur are non-sampling errors, such as someone miscounting the number of Skittles in their bag. We could minimize the chance of this type of error by having each person count their Skittles twice. Also, generally speaking, a larger sample is more representative of a population. If we were to increase the sample size from 3 bags to 5, it would be a better representation of the entire population of Skittles.
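The bag-selection procedure above can be sketched in Python as a rejection loop. This is a minimal illustration, not the TI-83's actual generator: it assumes Python's `random` module in place of the calculator's `rand`, assumes the class data has 31 rows (n = 31), and, as an added assumption, also discards 00 and any row that was already picked. The seed is arbitrary, so the rows it picks will not be 22, 31 and 8.

```python
import random


def pick_rows(n_rows, k=3, seed=None):
    """Pick k distinct row numbers the way our group did on the TI-83.

    Each draw is a uniform number in [0, 1); its first two digits after the
    decimal point give a candidate row. Candidates of 00, candidates larger
    than n_rows, and repeats are discarded (skipping 00 and repeats is an
    assumption added for this sketch).
    """
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < k:
        u = rng.random()             # stands in for the calculator's rand
        candidate = int(u * 100)     # first two digits after the decimal point
        if 1 <= candidate <= n_rows and candidate not in chosen:
            chosen.append(candidate)
    return chosen


rows = pick_rows(n_rows=31, seed=1040)  # seed value is arbitrary
print("Selected rows:", rows)
```

Rejecting out-of-range values, rather than wrapping them around with a remainder, keeps every row equally likely to be chosen.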

Candy color is categorical data because it consists of names or labels only, not numbers representing counts or measurements. The proportions from my bag are lopsided and further from the class values, as some colors account for a much higher percentage of the candies than others. The proportions for my bag are .1579 red, .1754 orange, .2632 yellow, .1404 green and .2632 purple. When we look at our group sample of 3 bags, the numbers are slightly more evenly distributed. By the same token, the three-bag sample has color proportions that are closer to those of the total population of Skittles: .2416 red, .2135 orange, .1854 yellow, .1910 green and .1685 purple. This sample is still slightly lopsided, but less so than my single bag. The proportions for the total population of Skittles are more evenly distributed, with smaller variations from color to color: .1843 red, .1934 orange, .2073 yellow, .2051 green and .2100 purple.

Considering the data, we can conclude that as the sample size increases, the sample becomes a better representation of the overall population. The 3-bag sample proportions are closer to those of the population than the proportions from my single bag. If we had a sample of 5 bags of Skittles, it would likely be an even better representation of the population.
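The proportions quoted above can be recomputed from the color counts in the data table. The sketch below also adds one assumed way to make "closer to the population" concrete: the average absolute difference between a sample's color proportions and the class proportions. That measure is not part of the original project; it is only one reasonable choice.

```python
# Color counts from the data table (red, orange, yellow, green, purple)
COLORS = ["red", "orange", "yellow", "green", "purple"]
my_bag = [9, 10, 15, 8, 15]
three_bags = [43, 38, 33, 34, 30]
class_all = [344, 361, 387, 383, 392]


def proportions(counts):
    """Share of each color in a list of color counts."""
    total = sum(counts)
    return [c / total for c in counts]


def mean_abs_gap(sample, population):
    """Assumed closeness measure: average |sample share - population share|."""
    pairs = zip(proportions(sample), proportions(population))
    return sum(abs(s - p) for s, p in pairs) / len(sample)


for label, counts in [("My bag", my_bag), ("Three bags", three_bags), ("Class", class_all)]:
    shares = ", ".join(f"{p:.4f} {c}" for c, p in zip(COLORS, proportions(counts)))
    print(f"{label} (n = {sum(counts)}): {shares}")

print(f"Gap from class, my bag:     {mean_abs_gap(my_bag, class_all):.4f}")
print(f"Gap from class, three bags: {mean_abs_gap(three_bags, class_all):.4f}")
```

With these counts the three-bag sample's average gap (about 0.031) is smaller than my single bag's (about 0.044), which agrees with the conclusion above.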

Sample                                  Red  Orange  Yellow  Green  Purple  Total
My bag of Skittles                        9      10      15      8      15     57
Sample of three bags from our group      43      38      33     34      30    178
All bags collected by the entire class  344     361     387    383     392   1867

Bag totals within our group's sample: Bag 1 (row #22) had 59 candies, Bag 2 (row #31) had 59 and Bag 3 (row #8) had 60.

B. The number of candies per bag is quantitative data, because it consists of numbers that represent counts or measurements. Both the boxplot and the frequency distribution show that the distribution is skewed left. However, when I set the boxplot to use fences to identify outliers, its shape changed from skewed left to more symmetrical.

The graphs reflect what I expected to see: each 2.17 oz bag of Skittles should contain a similar number of candies. My bag of Skittles had 57 candies, which falls below the first quartile (Q1 = 59) of the five-number summary. Looking at our class data sheet, I saw that all but two values (49 and 53) fell close to each other.

n = 31

Mean (x̄): 60.2

Standard deviation (Sx): 3.13

Five-number summary: Min = 49, Q1 = 59, Median = 61, Q3 = 62, Max = 64
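The effect of turning on fences can be checked from the five-number summary. This sketch assumes the common 1.5 × IQR fence rule; with Q1 = 59 and Q3 = 62, the fences fall at 54.5 and 66.5, so the two low counts mentioned above, 49 and 53, are flagged as outliers while my bag's 57 is not.

```python
# Five-number summary for the class data (n = 31)
q1, q3 = 59, 62
iqr = q3 - q1                   # interquartile range
lower_fence = q1 - 1.5 * iqr    # assumed 1.5 x IQR rule
upper_fence = q3 + 1.5 * iqr


def is_outlier(count):
    """True if a bag count lies outside the fences."""
    return count < lower_fence or count > upper_fence


print(f"Fences: {lower_fence} and {upper_fence}")
for count in (49, 53, 57, 64):   # the two low values, my bag, and the maximum
    print(count, "outlier" if is_outlier(count) else "inside the fences")
```

Because 49 and 53 are then drawn as separate points, the lower whisker stops at the smallest count inside the fences, which is why the box plot looks more symmetrical once fences are used.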

Part 4

Part A.

A confidence interval is a range (or an interval) of values used to estimate the true value of a

population parameter.

True proportion of yellow candies, using the class data as my sample (99% confidence interval):

0.1831 < p < 0.2315 (see the scanned attachment for the work on this problem)

I am 99% confident that the true proportion of yellow candies is between 18.31% and 23.15%.

The single bag of Skittles that I purchased had a proportion of yellow candies of .2632. This proportion falls outside the confidence interval that I calculated, so it would be considered an unusual value.
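The 99% interval for the proportion of yellow candies can be reproduced with the usual normal-approximation (Wald) formula, p̂ ± z·√(p̂q̂/n). This is a sketch that assumes this is the method used in the attached work; it takes the critical value from Python's `statistics.NormalDist` instead of a z table, so the endpoints can differ from the hand calculation in the last decimal place because of rounding.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation (Wald) interval p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 2.576 for 99%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(successes=387, n=1867, confidence=0.99)
print(f"{low:.4f} < p < {high:.4f}")
```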

True mean number of candies per bag, 95% confidence interval:

59.08 < μ < 61.38

I am 95% confident that the true mean number of Skittles per bag is between 59.08 and 61.38 candies. Based on this interval, the single bag of candy I purchased was unusual: my bag contained 57 candies, which does not fall within the confidence interval I calculated. However, since my bag count is an individual value, not a mean, it makes sense that many such individual values would not be contained in an interval for the mean.
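The 95% interval for the mean follows x̄ ± t·s/√n. In this sketch the mean is taken as the class total divided by the number of bags (1867 / 31, about 60.23), and the critical value t = 2.042 (df = 30) is typed in from a t table, since Python's standard library has no t distribution. Rounding can shift the last digit relative to the reported interval.

```python
from math import sqrt


def mean_ci(x_bar, s, n, t_crit):
    """Confidence interval x_bar +/- t_crit * s / sqrt(n) for a population mean."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


x_bar = 1867 / 31   # assumed: class total / number of bags, about 60.23
low, high = mean_ci(x_bar, s=3.13, n=31, t_crit=2.042)  # t table, df = 30, 95%
print(f"{low:.2f} < mu < {high:.2f}")
```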

Standard deviation of the number of candies per bag, 98% confidence interval:

2.40 < σ < 4.43

I am 98% confident that the standard deviation of the number of candies per bag is between 2.40 and 4.43 candies.
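The 98% interval for σ uses the chi-square distribution with n - 1 = 30 degrees of freedom: √((n - 1)s²/χ²_R) < σ < √((n - 1)s²/χ²_L). The critical values below (χ²_R = 50.892 and χ²_L = 14.953) are the standard table values for df = 30 with 0.01 in each tail, entered by hand; the sketch only applies the formula and is not a replacement for the attached work.

```python
from math import sqrt


def sd_ci(s, n, chi2_right, chi2_left):
    """Chi-square confidence interval for a population standard deviation."""
    ss = (n - 1) * s ** 2
    return sqrt(ss / chi2_right), sqrt(ss / chi2_left)


# Table values for df = 30 with 0.01 in each tail (98% confidence)
low, high = sd_ci(s=3.13, n=31, chi2_right=50.892, chi2_left=14.953)
print(f"{low:.2f} < sigma < {high:.2f}")   # 2.40 < sigma < 4.43
```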

Conditions for each of the 3 interval estimates:

Population proportion p

1. The sample is a simple random sample, and the conditions for a binomial distribution are satisfied: there is a fixed number of trials, the trials are independent, there are two categories of outcome, and the probabilities remain constant for each trial. The number of trials was 1867.

2. There are at least 5 successes and at least 5 failures.

Not all of these requirements were met. The sample was not a simple random sample: the method for choosing the Skittles was closer to a cluster sample, and the bags of Skittles were purchased as a matter of convenience. Because we looked at less than 5% of the population, the trials can be treated as independent. There were at least 5 successes and 5 failures: the 387 yellow Skittles counted as successes, and the 1480 Skittles that were not yellow counted as failures.

Population mean

1. The sample is a simple random sample.

2. Either or both of these conditions are satisfied: the population is normally distributed, or n > 30.

Not all of these requirements were met, because the sample was not a simple random sample. The population does not appear to be normally distributed, but n = 31, so the second requirement was met.

Population standard deviation

1. The sample is a simple random sample.

2. The population must have normally distributed values, even if the sample is large.

These requirements were not met. The sample was not a simple random sample, and the population is not normally distributed. For a normal distribution, the frequencies start low, increase to one or two high frequencies, and then decrease to low frequencies again. In this case, our distribution is skewed left.

Part B. Hypothesis tests

A hypothesis test is a procedure for testing a claim about a property of a population.

Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red (see attached work).

z = -1.70

P-value = 0.08893

Since the P-value is greater than the significance level of 0.05, there is insufficient evidence to warrant rejection of the claim that 20% of all Skittles are red.
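The test for the proportion of red candies can be sketched as a one-proportion z test, z = (p̂ - p₀)/√(p₀q₀/n), using the 344 red candies out of 1867. The sketch assumes the alternative hypothesis is p ≠ 0.20 (a two-tailed test, which matches the size of the reported P-value) and checks the np ≥ 5 and nq ≥ 5 requirement before computing anything; the P-value comes from Python's `statistics.NormalDist` rather than a z table.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-tailed z test of H0: p = p0 (H1: p != p0) via the normal approximation."""
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("normal approximation needs np >= 5 and nq >= 5")
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * NormalDist().cdf(-abs(z))   # area in both tails
    return z, p_value


alpha = 0.05
z, p_value = one_proportion_z_test(successes=344, n=1867, p0=0.20)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```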

Use a 0.01 significance level to test the claim that the mean number of candies in a bag of Skittles is 55.

Critical values: t = ±2.750 (see attached work)

There is sufficient evidence to warrant rejection of the claim that the mean number of Skittles in a bag is 55.
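The mean test can be sketched with the test statistic t = (x̄ - μ₀)/(s/√n). This sketch assumes x̄ = 1867 / 31 (the class total divided by the number of bags) and s = 3.13, and takes the two-tailed critical value 2.750 (α = 0.01, df = 30) from the text above. With these inputs the test statistic lands far beyond ±2.750, which is consistent with rejecting the claim that the mean is 55.

```python
from math import sqrt


def one_sample_t(x_bar, mu0, s, n):
    """t test statistic for H0: mu = mu0 when sigma is unknown."""
    return (x_bar - mu0) / (s / sqrt(n))


x_bar = 1867 / 31      # assumed: class total / number of bags, about 60.23
t_stat = one_sample_t(x_bar, mu0=55, s=3.13, n=31)
t_crit = 2.750         # two-tailed critical value, alpha = 0.01, df = 30
print(f"t = {t_stat:.2f}; critical values are -{t_crit} and +{t_crit}")
print("Reject H0" if abs(t_stat) > t_crit else "Fail to reject H0")
```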

Testing claims about a population proportion

1. The sample observations are a simple random sample.

2. The conditions for a binomial distribution are satisfied: there is a fixed number of independent trials having constant probabilities, and each trial has two outcome categories, success and failure.

3. The conditions np ≥ 5 and nq ≥ 5 are both satisfied, so the binomial distribution of sample proportions can be approximated by a normal distribution with μ = np and σ = √(npq).

Not all of these conditions were met. It is not a simple random sample. The conditions for a binomial distribution are satisfied, as there is a fixed number of independent trials and each trial has two outcome categories, success and failure. The conditions np ≥ 5 and nq ≥ 5 are both satisfied, since np = 0.20 × 1867 = 373.4 and nq = 0.80 × 1867 = 1493.6.

Testing claims about a population mean

1. The sample is a simple random sample.

2. Either or both of these conditions are satisfied: the population is normally distributed, or n > 30.

The first requirement was not met, as this was not a simple random sample. The second requirement was met because n = 31 is greater than 30.

Part 5: Reflection

In our Math 1040 class with Tiffany Hilton, the students took part in a Skittles project. Each student purchased a 2.17 oz. bag of Skittles, and then we counted how many Skittles of each color were in our bags. In class, we were divided into small groups, and each group was required to pick three bags of Skittles from the class sample. The bags had to be chosen randomly, so we used the random number generator on a TI-83 calculator and picked the rows that corresponded to the first two digits after the decimal point. Then, using StatCrunch, we made pie charts, Pareto charts, box plots and a histogram to display the data. We made graphs for our individual bags of Skittles, the sample of three bags chosen by our group, and the sample of the entire class. The graphs displayed the proportions of the different colors of Skittles and the total number of Skittles per bag. Next, we constructed confidence intervals for the proportion of yellow Skittles, the true mean number of Skittles per bag, and the standard deviation of the number of Skittles per bag. The goal of this project was to help us gain a better understanding of the concepts presented in this class. There were 4 parts to this project, and each of them corresponded to the chapters we were covering in class at the time. Another goal of this assignment was to familiarize us with spreadsheet software: using a program called StatCrunch, we learned how to construct different types of graphs to display data throughout the project.

As a result of this project, I have learned how to do a lot of things that I previously could not have done. I know about the different types of sampling methods and the characteristics of each one. For example, this project used a cluster sampling method: we picked bags of Skittles using a random number generator and then sampled all of the Skittles in those particular bags. Prior to taking this class, I did not know how to use spreadsheet software to display data. After completing this project, I am confident in my ability to construct several different types of graphs that display data in a way that is meaningful and understandable. I also now know how to construct confidence intervals for the true proportion, mean and standard deviation. I know the formulas needed to perform each calculation and how to consult z, t and chi-square tables to find the critical values for a confidence interval and to decide whether or not to reject a null hypothesis. I also know how to use the functions on my TI-83 calculator to construct the intervals and find the associated values.

Although I am not sure whether I will need to take another math class to obtain my degree, some of the information learned during this course will no doubt be useful in my future classes. More than anything, knowing how to use spreadsheet software to display data and construct graphs will be advantageous for presenting information in an easily interpretable way in class projects. Displaying data in this fashion can also make a project or a paper more persuasive and help the audience see things from your perspective. Since I am pursuing a degree in nursing, being familiar with statistics and knowing how to apply these concepts within the field will be vitally important to delivering high-quality care and practicing medicine in a way that is evidence-based and known to be effective. By the time I have obtained a master's or doctoral degree in nursing and established an advanced practice, statistics will play an even greater role in my work. Because I will be prescribing medications to patients, I will want to know which drugs are proven to work best for treating specific conditions. It is likely that I will be using statistical studies (or consulting with practitioners who have) as a resource to figure out which medications and other treatments will work most effectively for my patients.
