
# Statistics: Assignment 1

Arend Slomp

Groningen, September 22, 2011


## Contents

- 1 Separate exercises
  - 1.1 Exercise 1: Difference between median and mean
  - 1.2 Exercise 2.8
  - 1.3 Exercise 2.43
  - 1.4 Exercise 7.9
  - 1.5 Exercise 7.28
- 2 Study: Does smoking by pregnant mothers have an effect on the birth weight of their babies?
  - 2.1 Introduction
  - 2.2 Exploratory analysis
  - 2.3 Formal analysis
  - 2.4 Conclusion
  - 2.5 Discussion

## 1 Separate exercises

### 1.1 Exercise 1: Difference between median and mean

The difference between the median and the mean is that the mean is the sum of all values divided by the number of values. The median is determined by sorting all data and selecting the middle value. If the dataset contains an even number of values, the average of the two middle values is taken. The median is a more reliable estimator than the mean, in essence because it is less sensitive to outliers.

### 1.2 Exercise 2.8

    > library(UsingR)
    > data(npdb)
    > attach(npdb)
    > max(table(state))
    [1] 1566
    > which(table(state) == max(table(state)))
    CA
     6

California had the most awards.

### 1.3 Exercise 2.43

    > data(brightness)
    > png("brightness.png")
    > hist(brightness)
    > dev.off()
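As a numerical companion to a histogram, and as an illustration of the point made in Exercise 1, the mean and median can be compared directly. The sketch below uses made-up values (not the `brightness` data), and `center_summary` is a hypothetical helper introduced only for this illustration.

```r
# Illustrative values only; these are not taken from the assignment data.
values <- c(2, 3, 4, 5, 6)
with_outlier <- c(values, 100)   # same data plus one extreme value

# Hypothetical helper: report both measures of centre side by side.
center_summary <- function(x) c(mean = mean(x), median = median(x))

center_summary(values)        # symmetric data: mean and median coincide
center_summary(with_outlier)  # the outlier moves the mean far more than the median
```

With the UsingR package loaded, the same helper could be applied as `center_summary(brightness)`; a mean close to the median is what one would expect for a symmetric histogram.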

[Figure: histogram of `brightness`, saved as brightness.png]

As you can see in this histogram, the brightness of the stars is symmetric. Furthermore, the data are unimodal.

### 1.4 Exercise 7.9

    > x = 4; n = 5
    > prop.test(4, 5, conf.level=0.9)

        1-sample proportions test with continuity correction

    data:  4 out of 5, null probability 0.5
    X-squared = 0.8, df = 1, p-value = 0.3711
    alternative hypothesis: true p is not equal to 0.5
    90 percent confidence interval:
     0.3493025 0.9861052
    sample estimates:
      p
    0.8

    Warning message:
    In prop.test(4, 5, conf.level = 0.9) :
      Chi-squared approximation may be incorrect

    > x = 80; n = 100
    > prop.test(80, 100, conf.level=0.9)

        1-sample proportions test with continuity correction

    data:  80 out of 100, null probability 0.5
    X-squared = 34.81, df = 1, p-value = 3.635e-09
    alternative hypothesis: true p is not equal to 0.5
    90 percent confidence interval:
     0.7212471 0.8617706
    sample estimates:
      p
    0.8

    > x = 800; n = 1000
    > prop.test(800, 1000, conf.level=0.9)

        1-sample proportions test with continuity correction

    data:  800 out of 1000, null probability 0.5
    X-squared = 358.801, df = 1, p-value < 2.2e-16
    alternative hypothesis: true p is not equal to 0.5
    90 percent confidence interval:
     0.7778789 0.8204633
    sample estimates:
      p
    0.8

The 90% confidence intervals are:

- for 4 out of 5: 0.3493025 to 0.9861052
- for 80 out of 100: 0.7212471 to 0.8617706
- for 800 out of 1000: 0.7778789 to 0.8204633

### 1.5 Exercise 7.28

Initialising the environment:

    > library(UsingR)
    > data(babies)
    > attach(babies)
    > t.test(dage - age, conf.level=0.95)

        One Sample t-test

    data:  dage - age
    t = 17.3922, df = 1235, p-value < 2.2e-16
    alternative hypothesis: true mean is not equal to 0
    95 percent confidence interval:
     2.986035 3.745356
    sample estimates:
    mean of x
     3.365696

When we look at the confidence interval, it doesn't contain 0.
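The one-sample test on the differences `dage - age` is the same computation as a paired t-test on the two age columns. The sketch below illustrates this with made-up ages; the vectors `father_age` and `mother_age` are hypothetical and are not the `babies` data.

```r
# Made-up paired ages for illustration only (not the UsingR babies data).
father_age <- c(30, 35, 28, 40, 33, 29)
mother_age <- c(27, 31, 27, 35, 30, 28)

# One-sample t-test on the pairwise differences, as in the exercise.
one_sample <- t.test(father_age - mother_age, conf.level = 0.95)

# Paired t-test on the two columns: same statistic and interval.
paired <- t.test(father_age, mother_age, paired = TRUE, conf.level = 0.95)

one_sample$conf.int
paired$conf.int

# A 95% interval that excludes 0 goes together with a p-value below 0.05.
one_sample$p.value
```

Reading the interval this way is why "the interval does not contain 0" can be taken as evidence that the mean age difference is not zero at the 5% level.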

## 2 Study: Does smoking by pregnant mothers have an effect on the birth weight of their babies?

### 2.1 Introduction

I am conducting research on the effect of smoking by pregnant mothers on the birth weight of their babies. I used data from Baystate Medical Center in Springfield, Massachusetts, collected during 1986. The data contained 189 mothers, recording whether the mother smoked and the birth weight of the baby. The data further contained extra information: the mother's age, the mother's weight in pounds at the last menstrual period, the race, the number of previous premature labours, the history of hypertension, the presence of uterine irritability, and the number of physician visits during the first trimester. I don't think this extra information is relevant, as it has nothing to do with my research question, so I will not use it in my research.

Looking at the data we see that 115 mothers don't smoke and 74 do smoke. This gives us two sets: the birth weights of babies whose mothers smoke and the birth weights of babies whose mothers don't smoke. Since we have two sets of weights, we can use a t-test. We need to formulate a hypothesis for the test we want to perform. Writing p1 and p2 for the mean birth weights in the two groups, I choose:

H0: p1 = p2 ⇒ there is no difference between smoking and not smoking

Ha: p1 ≠ p2 ⇒ there is a difference between smoking and not smoking
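The two groups described above can be set up as follows. This is a minimal sketch that assumes the data are the `birthwt` data set from the MASS package, which matches the description (189 births at Baystate Medical Center in 1986); the report itself does not say how the data were loaded.

```r
# Assumption: the data set is MASS::birthwt (not stated in the report).
library(MASS)
data(birthwt)

# smoke is coded 0 = mother did not smoke, 1 = mother smoked during pregnancy.
group_sizes <- table(birthwt$smoke)
group_sizes

# Birth weights in grams for the two groups compared by the t-test.
m.nosmoke <- birthwt$bwt[birthwt$smoke == 0]
m.smoke   <- birthwt$bwt[birthwt$smoke == 1]
c(nosmoke = length(m.nosmoke), smoke = length(m.smoke))
```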

### 2.2 Exploratory analysis

Figure 2.1: 1 = boxplot of birth weight of babies from mothers who smoke; 2 = boxplot of birth weight of babies from mothers who don't smoke.

First, when we talk about smoking mothers, we mean the weight of the babies from the mothers who smoke. Likewise, when we talk about non-smoking mothers, we mean the weight of the babies from the mothers who don't smoke.

If we take a look at the weight of the babies from the mothers who did smoke, we see that the median is 2775.5 grams and the mean is 2772 grams. When we take a look at the weight of the babies from the mothers who didn't smoke, we see that the median is 3100 grams and the mean is 3055.7 grams. In both groups the mean and the median are close together, so the distributions are roughly symmetric, which is consistent with the data being approximately normally distributed.
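The summary figures and the boxplot can be reproduced roughly as follows. This sketch again assumes the data are `birthwt` from the MASS package; the factor `smoking` and its labels are introduced here for illustration and ordered to match the figure caption.

```r
# Assumption: the data set is MASS::birthwt (not stated in the report).
library(MASS)
data(birthwt)

# Order the levels as in Figure 2.1: first smokers, then non-smokers.
smoking <- factor(birthwt$smoke, levels = c(1, 0),
                  labels = c("smoke", "no smoke"))

# Mean and median birth weight (grams) per group.
group_means   <- tapply(birthwt$bwt, smoking, mean)
group_medians <- tapply(birthwt$bwt, smoking, median)
rbind(mean = group_means, median = group_medians)

# Side-by-side boxplots of birth weight for the two groups.
boxplot(birthwt$bwt ~ smoking, xlab = "Mother", ylab = "Birth weight (g)")
```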

### 2.3 Formal analysis

We choose a confidence level of 0.95.

    > m.smoke = bwt[smoke==1]
    > m.nosmoke = bwt[smoke==0]
    > t.test(m.nosmoke, m.smoke, var.equal=TRUE)

        Two Sample t-test

    data:  m.nosmoke and m.smoke
    t = 2.6529, df = 187, p-value = 0.008667
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
      72.75612 494.79735
    sample estimates:
    mean of x mean of y
     3055.696  2771.919

When we look at this test we see that the p-value equals 0.0087, which is below the 0.05 threshold that goes with a 95% confidence level. This means that the test is significant, and we have to reject the null hypothesis.

### 2.4 Conclusion

Looking at the data, we see that the difference is statistically significant, and we reject the null hypothesis.

### 2.5 Discussion

The fact that there is a relation between the weight of the babies and whether or not the mothers smoke doesn't mean that smoking is really the cause of the difference in weight. Furthermore, we have no information about the living environment of the two groups. If the smokers come from ghettos and the non-smokers from the wealthy part of society, then that could be a different reason why we see this correlation.
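The test in section 2.3 assumes equal variances in both groups (`var.equal=TRUE`). As a rough sensitivity check, which is not part of the original analysis, the same comparison can be run with Welch's t-test, the default in R when that assumption is dropped. The sketch assumes the data are `birthwt` from the MASS package.

```r
# Assumption: the data set is MASS::birthwt (not stated in the report).
library(MASS)
data(birthwt)

# Pooled-variance two-sample t-test, as used in section 2.3.
pooled <- t.test(bwt ~ smoke, data = birthwt, var.equal = TRUE)

# Welch's t-test: same comparison without assuming equal variances.
welch <- t.test(bwt ~ smoke, data = birthwt)

# Compare the p-values of the two variants.
c(pooled = pooled$p.value, welch = welch$p.value)
```

If both p-values fall below 0.05, the rejection of the null hypothesis does not hinge on the equal-variance assumption. Neither test addresses the confounding concern raised in the discussion.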