Uploaded by api-272673268

© All Rights Reserved

Jason Sylvester

Math 1040

Variables are classified into four levels of measurement: nominal, ordinal, interval, and ratio. Each level is characterized by the type of data (qualitative or quantitative), whether there is a natural order (more means more), whether differences (adding/subtracting) make sense, and whether ratios (fractions) make sense.

Candy color is a qualitative variable measured at the nominal level (name only): the colors have no natural order, differences (adding/subtracting colors) don't make sense, and ratios don't make sense.

The number of candies per bag is a quantitative variable measured at the ratio level: natural order (more means more), differences (adding/subtracting), and ratios (fractions) all make sense.

Summary statistics: number of candies per bag

Statistic    Value
n            60
Mean         59.18
Std. dev.    3.11
Median       59
Range        14
Min          50
Max          64
Q1           58
Q3           61.5
IQR          3.5
Mode         59

Min = 50, Q1 = 58, Median = 59, Q3 = 61.5, Max = 64

Fences

Lower Fence: 58-1.5(3.5) = 52.75

Upper Fence: 61.5+1.5(3.5) = 66.75

Outliers

There were two outliers: 50 and 52. The bag of Skittles I purchased had 57 total candies. It was not an outlier because it is less than the upper fence (66.75) and greater than the lower fence (52.75).
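The fence arithmetic above can be checked with a short sketch. It uses only the Q1 and Q3 values reported in the summary statistics; the helper name `is_outlier` is made up for illustration.

```python
# Quartiles taken from the summary statistics reported above.
q1, q3 = 58.0, 61.5
iqr = q3 - q1                      # interquartile range, 3.5

# Fences sit 1.5 IQRs beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def is_outlier(count):
    """Return True when a bag's candy count falls outside the fences."""
    return count < lower_fence or count > upper_fence


print(lower_fence, upper_fence)                  # 52.75 66.75
print([is_outlier(c) for c in (50, 52, 57)])     # [True, True, False]
```

The two flagged counts (50 and 52) fall below the lower fence, while the purchased bag of 57 falls inside both fences.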

Shape of the distributions

It is not appropriate to discuss the shape of the qualitative distribution because the categories are not numeric and have no natural order. However, it is appropriate to discuss the shape of the quantitative distribution because there is a natural order (more means more), differences (adding/subtracting) make sense, and ratios (fractions) make sense. Also, outliers (extreme values) affect the shape of a quantitative frequency distribution.

The question asked was: "Can height be used to predict the number of candies that will be in a bag of Skittles you purchase?"

What do I think the results will be? I do not expect a person's height to predict the number of candies per bag. A correlation between the two variables would not make sense: the height of a person does not influence the number of candies per bag, since bags are purchased at random.

Height is the explanatory variable, and the number of candies per bag is the response variable. The number of candies per bag is the response because it is the variable of interest, the one that would be explained or predicted by another variable (height). Height is the explanatory variable because it would be used to explain the value of the response.

Scatter Diagram

There is no significant linear relationship between the two variables. Since n = 60, the critical value is .361. Because the absolute value of the correlation coefficient (0.17042887) is less than the critical value, the relationship is not significant. The correlation coefficient is also not close to -1 or 1, which would indicate a strong negative or strong positive relationship. The dots are so spread out that no linear pattern is visible. The general direction of the diagram is not easily seen, but the slope of the regression line tells us that it is positive.

This is exactly what I expected, since the height of a person is not a good way to predict the number of candies in a bag of Skittles purchased from the store. Because there is no significant linear relationship, the mean of the response variable should be used for prediction instead.

Using the data, the regression equation would be ŷ (y hat) = 50.713668 + 0.1287705x. If a person is 63.5 inches tall, we plug 63.5 in for x to approximate the number of candies per bag. According to the regression line, the approximate number of candies per bag for a person 63.5 inches tall would be 58.891, or 59 candies (rounded). However, it is not appropriate to use the regression equation here, since there is no significant linear relation.

Using the regression output, we can then determine the coefficient of determination, which measures the proportion of variation in the response variable explained by the regression line (the percent of variation in y explained by x). To find it, we square the correlation coefficient (0.17042887) and get .0290459997, or 2.9%. So 2.9% of the variation in the number of candies per bag can be explained by the height of the person who purchased the bag. Because this is so low, height is not a good explanatory variable for predicting the number of candies per bag.
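As a quick check of the arithmetic, the sketch below plugs the reported intercept, slope, and correlation coefficient into the prediction and coefficient-of-determination formulas. The function name `predict_candies` is made up for illustration; the numbers are the ones reported above.

```python
# Regression output reported in the text.
intercept = 50.713668
slope = 0.1287705
r = 0.17042887          # correlation coefficient


def predict_candies(height_in):
    """Point prediction from the fitted line: y-hat = intercept + slope * x."""
    return intercept + slope * height_in


y_hat = predict_candies(63.5)
r_squared = r ** 2       # proportion of variation in y explained by x

print(round(y_hat, 3))            # 58.891
print(round(100 * r_squared, 1))  # 2.9 (percent)
```

The prediction is shown only to verify the computation; as noted above, the line should not actually be used for prediction when the relationship is not significant.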

If we were to assume there is a significant relationship between height and number of candies per bag, would it be appropriate to predict the number of candies in a bag purchased by retired Houston Rockets player Yao Ming, who is 90 inches tall? No, because Yao Ming's height of 90 inches is outside the range of heights in the sample, so the prediction would be an extrapolation.

Doing a similar analysis on a smaller data set, I used a systematic approach: starting with the second row of data and taking every 10th row after that gave a smaller sample of size 6. Plugging the data into my calculator, I got a correlation coefficient of .1457 and a regression equation of ŷ (y hat) = 52.9615 + .0769x. The critical value for this data set is .811. Since the absolute value of the correlation coefficient (.1457) is less than the critical value, there is no significant linear relationship between x and y for the smaller data set.
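The systematic selection described above can be sketched as follows. The class data themselves are not reproduced here, so the sketch works on row numbers only and assumes the rows are numbered 1 through 60.

```python
# Row numbers of the class data set (assumed to be numbered 1..60).
rows = list(range(1, 61))

# Start with the second row, then take every 10th row after it.
sample_rows = rows[1::10]

print(sample_rows)        # [2, 12, 22, 32, 42, 52]
print(len(sample_rows))   # 6
```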

The bag of skittles I purchased contained the following:

Red - 16

Orange - 11

Yellow - 13

Green - 9

Purple - 8

Total Number of Candies - 57

My height in inches is 71

What is the probability that both Skittles are purple if you select them with replacement?

Give your answer correct to four decimal places.

There are a total of 8 purple Skittles, and the total number of Skittles in the bag is 57. The probability of selecting two Skittles (with replacement) that are both purple is (8/57)^2 = .0197.

What is the probability that both Skittles are purple if you select them without replacement?

Give your answer correct to four decimal places.

The probability of selecting two purple Skittles (without replacement) is (8/57)(7/56) = .0175.

What is the probability that at least one Skittle is purple if you select them with replacement?

"At least one" means 1 - P(none). The probability that neither Skittle is purple (with replacement) is (1 - 8/57)^2. So the probability of selecting at least 1 purple Skittle (with replacement) is 1 - (1 - 8/57)^2 = .2610.
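The three answers above can be reproduced with exact fractions. This is a minimal sketch using the counts from my bag (8 purple out of 57); the variable names are made up for illustration.

```python
from fractions import Fraction

purple, total = 8, 57
p = Fraction(purple, total)        # chance of purple on a single draw

# Two draws with replacement: the second draw has the same probability.
both_with = p * p

# Two draws without replacement: one fewer purple and one fewer candy.
both_without = Fraction(purple, total) * Fraction(purple - 1, total - 1)

# At least one purple (with replacement) = 1 - P(no purple on either draw).
at_least_one_with = 1 - (1 - p) ** 2

for label, value in [("both, with replacement", both_with),
                     ("both, without replacement", both_without),
                     ("at least one, with replacement", at_least_one_with)]:
    print(f"{label}: {float(value):.4f}")
```

Running it prints .0197, .0175, and .2610 to four decimal places, matching the values above.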

Suppose all of the Skittles in the class data set are combined into one large bowl and you are

going to randomly select one Skittle.

What is the probability that you select a green Skittle?

There are 710 green skittles in the bowl, and there is a total of 3,551 skittles in the bowl. The

probability of selecting a green skittle is 710/3,551 = .1999

What is the probability that you select a Skittle that is NOT green?

To find the probability of selecting a Skittle that is not green we use the complement rule, which is 1 - P(selecting a green Skittle). So the probability that the Skittle is not green is 1 - .1999 = .8001.

What is the probability that you select a Skittle that is red OR yellow?

To find the probability that you select a skittle that is red or yellow we use the addition rule for

disjoint events which is P(selecting a red) + P(selecting a yellow). So the probability of selecting

a red or yellow is (716/3551) + (726/3551) = .4061

What is the probability that you select a Skittle that is orange GIVEN that it is a secondary

color (secondary colors are green, orange and purple)?

To find the probability that a Skittle is orange given that it is a secondary color, we first add together the number of Skittles that are secondary colors (green + orange + purple). The total number of Skittles that are a secondary color is 710 + 698 + 701 = 2,109. Next, we divide the number of orange Skittles by the total number of Skittles that are secondary colors. We get 698/2,109 = .3310.
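The bowl probabilities above (complement rule, addition rule for disjoint events, and conditional probability) can be sketched from the class color counts quoted in the answers. The dictionary layout and variable names are made up for illustration.

```python
from fractions import Fraction

# Class totals by color, as quoted in the answers above.
counts = {"red": 716, "orange": 698, "yellow": 726, "green": 710, "purple": 701}
total = sum(counts.values())       # 3551 candies in the bowl

p_green = Fraction(counts["green"], total)
p_not_green = 1 - p_green          # complement rule

# Addition rule for disjoint events: a candy cannot be both red and yellow.
p_red_or_yellow = Fraction(counts["red"] + counts["yellow"], total)

# Conditional probability: restrict the sample space to secondary colors.
secondary = ("green", "orange", "purple")
n_secondary = sum(counts[c] for c in secondary)
p_orange_given_secondary = Fraction(counts["orange"], n_secondary)

for label, value in [("green", p_green),
                     ("not green", p_not_green),
                     ("red or yellow", p_red_or_yellow),
                     ("orange | secondary", p_orange_given_secondary)]:
    print(f"P({label}) = {float(value):.4f}")
```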

Problem 3: Suppose all of the Skittles in the class data set are combined into one large bowl

and you are going to randomly select ten Skittles with replacement and count how many are

yellow.

Show that this meets the requirements of the binomial probability distribution and identify n

and p.

The criteria for a binomial probability experiment are as follows:

1. Is the experiment performed a fixed number of times? Yes, 10.
2. Are the trials independent (the outcome of one trial does not affect the outcome of the others)? Yes, because we are selecting Skittles with replacement. Selecting one Skittle does not affect the outcome of selecting another.
3. For each trial, are there 2 disjoint outcomes, success or failure? Yes, either the Skittle is yellow or it's not.
4. Is the probability of success the same for each trial? Yes, you are equally likely to select a yellow Skittle in each trial.

The number of independent trials (n) of the experiment is 10.

The probability of success (drawing a yellow Skittle) for each trial is p = 726/3,551 = .2044.

What is the probability that exactly 4 of the 10 Skittles are yellow?

We plug n, p, and x into the calculator's binomial pdf function, with n = 10, p = .2044, and x = 4. The probability that exactly 4 of the 10 Skittles are yellow is .093.

For samples of size 10, what is the expected value and standard deviation for the number of

yellow skittles that will be included?

The expected value is np, so (10)(.2044) = 2.044. The standard deviation is the square root of np(1 - p), so sqrt((10)(.2044)(.7956)) = 1.28.
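The binomial results above can be checked without a calculator. The sketch below writes out the binomial probability formula directly; `binom_pmf` is a made-up helper name, not a calculator or library function.

```python
from math import comb, sqrt

n, p = 10, 0.2044      # number of trials and P(yellow) from the class data


def binom_pmf(k, n, p):
    """P(X = k) for a binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


prob_exactly_4 = binom_pmf(4, n, p)
expected_value = n * p
std_dev = sqrt(n * p * (1 - p))

print(round(prob_exactly_4, 3))   # 0.093
print(round(expected_value, 3))   # 2.044
print(round(std_dev, 2))          # 1.28
```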

Problem 4: For this problem, treat a 2.17 ounce bag of Skittles as an individual. Suppose the

values for our class data are the parameter values for all 2.17 ounce bags of Skittles. In other

words, assume μ = mean number of candies per bag in our class data set and σ = standard

deviation of number of candies per bag in our class data set (you computed these values in

Part 2).

Describe the sampling distribution for the mean number of candies per bag for samples of 32

bags. Include center, spread and shape. Note: The shape of the SAMPLING DISTRIBUTION is

different from the shape of the population, which you determined in Part 2 of the project.

Earlier, in Part 2, we found that the mean number of candies per bag is 59.18 and that the standard deviation is 3.11. If n = 32, the sampling distribution of the mean has a center (mean) of 59.18 and a spread (standard error of the mean) of 3.11/sqrt(32) = .5498. Because the sample size is greater than 30, the shape is approximately normal.

What is the probability that the mean number of candies per bag for a sample of 32 bags is

greater than 58.5?

To find P(x bar > 58.5), we first calculate the z-score: (58.5 - 59.18)/(.5498) = -1.24. We then find the probability using the table or calculator. We get a probability of 1 - .1075 = .8925. So there's an 89.25% chance that a sample of 32 bags has a mean number of candies per bag that is greater than 58.5.
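A minimal sketch of the same calculation, using Python's standard-library `statistics.NormalDist` in place of the z table. Because the code does not round z to -1.24 before looking up the area, it gives roughly .892 rather than the table-based .8925.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 59.18, 3.11, 32     # class-data parameters and sample size

# Standard error of the mean: sigma / sqrt(n).
standard_error = sigma / sqrt(n)

# Sampling distribution of x-bar, approximately normal because n > 30.
sampling_dist = NormalDist(mu, standard_error)

z = (58.5 - mu) / standard_error
p_greater = 1 - sampling_dist.cdf(58.5)

print(round(standard_error, 4))   # 0.5498
print(round(z, 2))                # -1.24
print(round(p_greater, 3))        # about 0.892
```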

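In most cases it's not feasible to collect data from an entire population. Instead, a simple random sample of size n is taken from a population whose parameter is unknown, and the sample is used to build a confidence interval. A confidence interval is an interval of numbers, based on a point estimate, that gives a range of likely values for the unknown parameter. The confidence level is the percentage of such intervals that would contain the parameter if the sampling were repeated many times.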

Identify the requirements for computing confidence intervals. List the requirements

separately for a confidence interval for a population proportion and for a population

mean. (5 points)

In order to construct a confidence interval for the population proportion you need the following:

1. An approximately normal sampling distribution of p hat: np(1 - p) ≥ 10.
2. Independent trials: n ≤ 0.05N (the sample size is no more than 5% of the population).

In order to construct a confidence interval for the population mean you need the following:

1. Sample data come from a simple random sample or randomized experiment.
2. The sample size is small relative to the population: n ≤ 0.05N.
3. The data come from a population that is normally distributed, OR the sample size is large (n ≥ 30).

Using values for the class data that you computed in Part 2 of the project, construct a 99%

confidence interval estimate for the true proportion of yellow candies using the class data

as your sample. Remember that for this computation, n is the number of CANDIES for the

entire class data. Include all your work, showing the formula used and appropriate values

inserted (neatly written and scanned or typed). (10 points)

Since our parameter of interest involves qualitative data (candy color), we will compute the confidence interval for the proportion. We will use n = 3,551, x = 726, and a confidence level of .99. Using technology (the 1-PropZInt command), we get a confidence interval of (.18702, .22188) and a p hat of .2044.

Give an appropriate interpretation of your interval. (5 points)

We are 99% confident that the true proportion of yellow Skittles among all Skittles is between .18702 and .22188.
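The interval above came from the calculator. As a cross-check, here is a minimal sketch of the standard one-proportion z-interval, p hat ± z*·sqrt(p hat(1 - p hat)/n), using only the Python standard library; the variable names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

x, n = 726, 3551           # yellow candies and total candies in the class data
confidence = 0.99

p_hat = x / n
# Critical value z* leaving (1 - confidence)/2 in each tail.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - margin, p_hat + margin
print(round(p_hat, 4), round(lower, 5), round(upper, 5))
```

The result agrees with the calculator's (.18702, .22188) to about four decimal places.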

Based on your interval for the true proportion of yellow candies, was the proportion of

yellow candies in the single bag of candy you purchased a likely value for the true

population proportion? Explain how you know using actual values from your data and

computations. (5 points)

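The proportion of yellow candies in my bag is extremely close to the upper bound of the confidence interval, but just above it. The proportion of yellow candies in my bag was 13/57, which equals about .2281, and the upper bound of the confidence interval is .22188. Because .2281 falls slightly outside the interval, it is not a likely value for the true population proportion, although it is very close.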

Using values you computed in Part 2 of the project, construct a 95% confidence interval

estimate for the true mean number of candies per bag using the class data as your

sample, but for this computation, n is the number of BAGS. Include all your work, showing

the formula used and appropriate values inserted (neatly written and scanned or

typed). (10 points)

Since our parameter of interest involves quantitative data (number of candies per bag), we will compute the t-interval. We will use n = 60, mean = 59.18, standard deviation = 3.11, and a confidence level of .95. Using technology (the t-interval command), we get a confidence interval of (58.377, 59.983).

Give an appropriate interpretation of your interval. (5 points)

We are 95% confident that the true mean number of candies per bag is between 58.377 and 59.983.
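The t-interval can be sketched by hand as x bar ± t*·s/sqrt(n). Python's standard library has no t-distribution, so the critical value below is an assumption: t* = 2.001 is the approximate t-table value for 95% confidence with 59 degrees of freedom, not a number taken from the calculator output.

```python
from math import sqrt

n, x_bar, s = 60, 59.18, 3.11      # bags, sample mean, sample std. dev.

# ASSUMPTION: approximate t-table critical value for 95% confidence, df = 59.
t_star = 2.001

margin = t_star * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(round(lower, 3), round(upper, 3))
```

With that critical value the bounds agree with the reported (58.377, 59.983) to within rounding.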

Based on your interval for the true mean number of candies per bag, was the total

number of candies in the single bag you purchased a likely value for the population

mean? Explain how you know using actual values from your data and computations. (5

points)

The number of candies in my bag is slightly below the lower bound of the confidence interval. My bag had 57 candies, which is less than the lower bound (58.377), so 57 is not a likely value for the population mean, although it is close.

Purpose and meaning of a hypothesis test: a hypothesis test is a procedure, based on sample results and probability, for testing hypotheses about a population. It weighs two competing statements (the null hypothesis and the alternative hypothesis) and decides whether the sample provides enough evidence to reject the null hypothesis in favor of the alternative.

1. Null hypothesis: H0: p = .20, 20% of all Skittles are red. Alternative hypothesis: H1: p ≠ .20, the proportion of red Skittles is not 20%.
2. Conditions for performing the hypothesis test:
   a. Simple random sample or randomized experiment? The class data are a convenience sample, but the test will work in this case.
   b. n ≤ .05N? 3,551 ≤ .05(all Skittles)? Yes.
   c. np(1 - p) ≥ 10? (3,551)(.20)(1 - .20) = 568.16 ≥ 10? Yes.
3. The test statistic: 1-PropZTest with p0 = .20, x = 716, and n = 3,551 gives z0 = .2433.
4. The P-value: .8076.
5. Compare the P-value to alpha (reject H0 if P-value < alpha): .8076 (P-value) > .05 (alpha), so we do not reject the null hypothesis. There is insufficient evidence to conclude that the proportion of red Skittles is not 20%.
6. The Type I and Type II errors are as follows:
   a. Type I error: reject H0 when H0 is true. That is, reject the claim that 20% of all Skittles are red when 20% of all Skittles are actually red.
   b. Type II error: do not reject H0 when H0 is false and H1 is true. That is, do not reject the claim that 20% of all Skittles are red when the proportion of red Skittles is actually different from 20%.
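The test statistic and P-value for the red-proportion test can be reproduced with a short sketch of the one-proportion z-test, z0 = (p hat - p0)/sqrt(p0(1 - p0)/n), using only the standard library. The variable names are made up; the two-tailed P-value from the code (about .808) differs from the calculator's .8076 only in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

x, n = 716, 3551       # red candies and total candies in the class data
p0 = 0.20              # proportion claimed by the null hypothesis
alpha = 0.05

p_hat = x / n
# The standard error uses p0 because the test assumes H0 is true.
z0 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-tailed P-value, since H1 is "not equal to".
p_value = 2 * (1 - NormalDist().cdf(abs(z0)))
reject_h0 = p_value < alpha

print(round(z0, 4), round(p_value, 3), reject_h0)
```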

1. Null hypothesis: H0: μ = 58, the mean number of Skittles per bag is 58. Alternative hypothesis: H1: μ > 58, the mean number of Skittles per bag is greater than 58.
2. Conditions for performing the hypothesis test:
   a. Simple random sample or randomized experiment? The class data are a convenience sample, but the test will work in this case.
   b. No outliers and a normal population, or n ≥ 30? Yes, n = 60.
   c. Independent, n ≤ .05N? 60 ≤ .05(all bags of Skittles)? Yes.
3. The test statistic: T-Test with μ0 = 58, x bar = 59.18, s = 3.11, n = 60, right-tailed, gives t0 = 2.9390.
4. The P-value: .0023.
5. Compare the P-value to alpha (reject H0 if P-value < alpha): .0023 (P-value) < .05 (alpha), so we reject the null hypothesis. There is sufficient evidence to conclude that the true mean number of Skittles per bag is greater than 58.
6. Interpret the P-value: if the mean number of Skittles per bag is really 58, then the probability of getting a sample mean of 59.18 or greater is .0023.
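The t statistic for the mean test can be checked by hand as t0 = (x bar - μ0)/(s/sqrt(n)). Because the standard library has no t-distribution, the sketch below does not recompute the P-value. Instead it swaps in the classical (critical-value) approach, which is not the method used above, and the critical value 1.671 is an assumption: the approximate t-table value for a right-tailed test at alpha = .05 with about 60 degrees of freedom.

```python
from math import sqrt

n, x_bar, s = 60, 59.18, 3.11      # bags, sample mean, sample std. dev.
mu0 = 58                           # mean claimed by the null hypothesis

t0 = (x_bar - mu0) / (s / sqrt(n))

# ASSUMPTION: approximate right-tailed t-table critical value, alpha = .05.
t_critical = 1.671
reject_h0 = t0 > t_critical

print(round(t0, 3), reject_h0)     # 2.939 True
```

Both approaches lead to the same decision here: reject H0.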

This semester we have been learning the concepts of statistics. As we've learned each concept we've applied it to our Skittles project. At the beginning of the semester all of the students in the class were asked to purchase a bag of original Skittles. All the students counted the number of orange, green, yellow, red, and purple Skittles in their bag. We then combined all the class's data and used it throughout the semester for parts 2-6 of the project. Each part of the project helped us solidify the statistical concepts we had just learned.

Throughout this semester I have learned a ton about collecting data, the importance of data, and what we can learn/infer from the data that we collect. I learned that data can easily be misrepresented, and that collecting bad/wrong data can lead to incorrect assumptions and decisions. I learned that the sampling methods used to collect data are extremely important, and that sampling must be done in a way that will not skew or misrepresent your explanatory or response variables. I also learned that graphs can be misleading, and that it's important to pay close attention to what you're looking at. I've been able to identify graphs on the web and in social media that looked great but were misleading.

I've learned a lot about inference, and what we can conclude and predict based on the information we collect. For instance, during the project we were asked, "Can height be used to predict the number of candies that will be in a bag of Skittles?" Obviously we couldn't, as this doesn't make sense, but it's been interesting to think about the real-world applications and what can be concluded from data that's collected. We can actually use math to see if there is any correlation between two variables. I've learned a lot throughout this semester and it's been interesting to learn statistics. I can now apply what I've learned to my degree, and it's exciting!
