
Statistics: Assignment 1

Arend Slomp

Groningen, September 22, 2011


Contents

1 Individual exercises
  1.1 Exercise 1: Difference between median and mean
  1.2 Exercise 2.8
  1.3 Exercise 2.43
  1.4 Exercise 7.5
  1.5 Exercise 7.28

2 Research: Does smoking by pregnant mothers have an effect on the birth weight of their babies?
  2.1 Introduction
  2.2 Exploratory analysis
  2.3 Formal analysis
  2.4 Conclusion
  2.5 Discussion

1 Individual exercises

1.1 Exercise 1: Difference between median and mean

The mean is the sum of all values divided by the number of values. The median is determined by sorting the data and selecting the middle value; if the dataset contains an even number of values, the average of the two middle values is taken. The median is a more robust estimator than the mean, in essence because it is less sensitive to outliers.

1.2 Exercise 2.8

> library(UsingR)
> data(npdb)
> attach(npdb)
> max(table(state))
[1] 1566
> which(table(state) == max(table(state)))
CA
 6

California had the most awards.

1.3 Exercise 2.43

> data(brightness)
> png("brightness.png")
> hist(brightness)
> dev.off()

As you can see in this histogram, the brightness of the stars is roughly symmetric. Furthermore, the data are unimodal.

1.4 Exercise 7.5

> x = 4; n = 5
> prop.test(4, 5, conf.level=0.9)

    1-sample proportions test with continuity correction

data:  4 out of 5, null probability 0.5
X-squared = 0.8, df = 1, p-value = 0.3711
alternative hypothesis: true p is not equal to 0.5
90 percent confidence interval:
 0.3493025 0.9861052
sample estimates:
  p
0.8

Warning message:
In prop.test(4, 5, conf.level = 0.9) :
  Chi-squared approximation may be incorrect

> x = 80; n = 100
> prop.test(80, 100, conf.level=0.9)

    1-sample proportions test with continuity correction

data:  80 out of 100, null probability 0.5
X-squared = 34.81, df = 1, p-value = 3.635e-09
alternative hypothesis: true p is not equal to 0.5
90 percent confidence interval:
 0.7212471 0.8617706
sample estimates:
  p
0.8

> x = 800; n = 1000
> prop.test(800, 1000, conf.level=0.9)

    1-sample proportions test with continuity correction

data:  800 out of 1000, null probability 0.5
X-squared = 358.801, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5
90 percent confidence interval:
 0.7778789 0.8204633
sample estimates:
  p
0.8

The 90% confidence intervals are:
for 4 out of 5: 0.3493025-0.9861052
for 80 out of 100: 0.7212471-0.8617706
for 800 out of 1000: 0.7778789-0.8204633

1.5 Exercise 7.28

Initialising the environment:

> library(UsingR)
> data(babies)
> attach(babies)
> t.test(dage-age, conf.level=0.95)

    One Sample t-test

data:  dage - age
t = 17.3922, df = 1235, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 2.986035 3.745356
sample estimates:
mean of x
 3.365696

The 95% confidence interval (2.99 to 3.75) does not contain 0, so the mean of dage - age differs significantly from 0 at the 5% level.
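The interval reported by the one-sample t-test can be reproduced from the printed summary values alone. The sketch below uses base R and assumes only the reported mean, t statistic and degrees of freedom; the variable names are illustrative and do not come from the original assignment.

```r
# Values copied from the printed t.test output (rounded as reported).
mean_diff <- 3.365696   # sample mean of dage - age
t_stat    <- 17.3922    # reported t statistic
df        <- 1235       # reported degrees of freedom

# Under H0 (true mean = 0), t = mean / standard error, so:
se <- mean_diff / t_stat

# Two-sided 95% interval: mean +/- t quantile * standard error.
t_crit <- qt(0.975, df)
ci <- mean_diff + c(-1, 1) * t_crit * se
print(round(ci, 3))

# The interval excludes 0 exactly when |t| exceeds the critical value.
excludes_zero <- ci[1] > 0 || ci[2] < 0
print(excludes_zero)
```

Because the inputs are rounded, the recomputed bounds agree with the printed interval only up to a small rounding error.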

2 Research: Does smoking by pregnant mothers have an effect on the birth weight of their babies?

2.1 Introduction

I am conducting research on the effect of smoking by pregnant mothers on the birth weight of their babies. I used data collected at Baystate Medical Center in Springfield, Massachusetts, during 1986. The data cover 189 mothers and record the mother's age, whether she smoked, and the birth weight of her baby. The data set further contains information about the mother's weight in pounds at the last menstrual period, her race, the number of previous premature labours, the history of hypertension, the presence of uterine irritability, and the number of physician visits during the first trimester. I don't think this extra information is relevant to my research question, so I will not use it in my research.

When we look at the data, we have two groups: we know which mothers smoke and which don't. We see that 115 mothers don't smoke and 74 do. Since we have two sets of baby weights, one for mothers who smoke and one for mothers who don't, we can use a two-sample t-test. We need to formulate a hypothesis for the test we want to perform. Here p1 and p2 denote the mean birth weights in the two groups. As null hypothesis and alternative I choose:

H0 : p1 = p2 ⇒ there is no difference between smoking and not smoking
Ha : p1 ≠ p2 ⇒ there is a difference between smoking and not smoking
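The group sizes quoted in the introduction can be checked directly. This is a minimal sketch that assumes the study data correspond to the `birthwt` data set shipped in the MASS package (the original does not name the data set it loaded); `group_sizes` and `n_total` are illustrative names.

```r
# Assumption: the Baystate Medical Center data are MASS::birthwt.
library(MASS)

# smoke is coded 0 = mother did not smoke, 1 = mother smoked.
group_sizes <- table(birthwt$smoke)
print(group_sizes)

# Total number of mothers in the data set.
n_total <- nrow(birthwt)
print(n_total)
```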

2.2 Exploratory analysis

Figure 2.1: 1 = boxplot of birth weight of babies from mothers who smoke; 2 = boxplot of birth weight of babies from mothers who don't smoke

When we talk about smoking mothers below, we mean the weight of the babies of mothers who smoke. Likewise, when we talk about non-smoking mothers, we mean the weight of the babies of mothers who don't smoke.

For the babies of mothers who didn't smoke, the median weight is 3100 grams and the mean is 3055.7 grams. For the babies of mothers who did smoke, the median is 2775.5 grams and the mean is 2772 grams. In both groups the mean and the median lie close together, which suggests that the distributions are roughly symmetric and that the data are probably approximately normally distributed.
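The group medians and means above can be recomputed in a few lines. The sketch below again assumes the data are `MASS::birthwt`, with `bwt` the birth weight in grams and `smoke` coded 0/1; the helper names are illustrative.

```r
# Assumption: the study data are MASS::birthwt.
library(MASS)

# Median and mean birth weight (grams) per smoking status (0 = no, 1 = yes).
group_median <- tapply(birthwt$bwt, birthwt$smoke, median)
group_mean   <- tapply(birthwt$bwt, birthwt$smoke, mean)
print(rbind(median = group_median, mean = group_mean))

# A small gap between mean and median, relative to the spread,
# is consistent with a roughly symmetric distribution.
gap <- group_mean - group_median
print(gap)
```

A small mean-median gap is only an informal indication of symmetry; it does not by itself establish normality.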

2.3 Formal analysis

We choose a confidence level of 0.95.

> m.smoke = bwt[smoke==1]
> m.nosmoke = bwt[smoke==0]
> t.test(m.nosmoke, m.smoke, var.equal=TRUE)

    Two Sample t-test

data:  m.nosmoke and m.smoke
t = 2.6529, df = 187, p-value = 0.008667
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  72.75612 494.79735
sample estimates:
mean of x mean of y
 3055.696  2771.919

The p-value of this test is approximately 0.0087, which is below 0.05. This means that the test is significant at the 5% level, and we reject the null hypothesis.

2.4 Conclusion

Looking at the data, we see a statistically significant difference in birth weight between the two groups, and we reject the null hypothesis.

2.5 Discussion

The fact that there is a relation between the birth weight of the babies and whether the mothers smoke doesn't mean that smoking is really the cause of the weight difference. Furthermore, we have no information about the living environment of the two groups. If the smokers come from ghettos and the non-smokers from the wealthy part of society, then that could be a different reason why we see this correlation.
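As a cross-check on the formal analysis in section 2.3, the pooled two-sample t statistic can be computed by hand and compared with the output of `t.test`. This is a sketch under the assumption that the data are `MASS::birthwt`; the names `m_nosmoke`, `m_smoke`, `t_manual` and `p_manual` are illustrative.

```r
# Assumption: the study data are MASS::birthwt.
library(MASS)

m_nosmoke <- birthwt$bwt[birthwt$smoke == 0]
m_smoke   <- birthwt$bwt[birthwt$smoke == 1]
n1 <- length(m_nosmoke)
n2 <- length(m_smoke)

# Pooled variance: weighted average of the two sample variances.
df <- n1 + n2 - 2
pooled_var <- ((n1 - 1) * var(m_nosmoke) + (n2 - 1) * var(m_smoke)) / df

# Standard error of the difference in means under equal variances.
se <- sqrt(pooled_var * (1 / n1 + 1 / n2))

# Test statistic and two-sided p-value.
t_manual <- (mean(m_nosmoke) - mean(m_smoke)) / se
p_manual <- 2 * pt(-abs(t_manual), df)
print(c(t = t_manual, df = df, p = p_manual))

# Compare with the built-in equal-variance t-test.
t_builtin <- t.test(m_nosmoke, m_smoke, var.equal = TRUE)
print(all.equal(unname(t_builtin$statistic), t_manual))
```

The equal-variance form is used here because the original call sets `var.equal=TRUE`.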