
Problem 1

We begin by declaring and initialising the data vectors required for the problem.
x = c(26.1, 26.6, 27.4, 27.5, 27.8, 28.1, 28.4, 29.5, 29.8, 30.4, 30.4, 31.2,
31.5, 32.9, 33.6, 34.1, 35.9)
y = c(27.4, 28.1, 22.9, 31.3, 16.3, 50.1, 20.0, 24.6, 23.3, 19.3, 24.4, 24.4,
29.5, 27.6, 21.7, 25.4, 39.4)

a)
i. The null hypothesis is $H_0 : m(X - Y) = 0$, where $m(X - Y)$ denotes the median of
the difference of X and Y. The alternative hypothesis is $H_1 : m(X - Y) \neq 0$.
len = sum(x - y > 0) # number of pairs in which x exceeds its paired y value (12 here)
binom.test(12,17) # 12 positive differences out of 17 pairs

##
## Exact binomial test
##
## data: 12 and 17
## number of successes = 12, number of trials = 17, p-value = 0.1435
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.4404173 0.8968645
## sample estimates:
## probability of success
## 0.7058824
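
The p-value above can be reproduced by hand, which makes explicit what the sign test uses. The following is a minimal sketch, assuming that doubling the smaller binomial tail is an acceptable stand-in for the two-sided rule used by binom.test (the two agree when the null probability is 0.5). The names d, n_pos, n and p_sign are illustrative and do not appear in the original solution.

```r
# Data as declared at the start of Problem 1
x <- c(26.1, 26.6, 27.4, 27.5, 27.8, 28.1, 28.4, 29.5, 29.8, 30.4, 30.4, 31.2,
       31.5, 32.9, 33.6, 34.1, 35.9)
y <- c(27.4, 28.1, 22.9, 31.3, 16.3, 50.1, 20.0, 24.6, 23.3, 19.3, 24.4, 24.4,
       29.5, 27.6, 21.7, 25.4, 39.4)

d     <- x - y          # paired differences
n_pos <- sum(d > 0)     # number of positive signs
n     <- sum(d != 0)    # zero differences would be dropped; there are none here

# Under H0 the number of positive signs is Binomial(n, 0.5).
upper_tail <- pbinom(n_pos - 1, n, 0.5, lower.tail = FALSE)  # P(S >= n_pos)
lower_tail <- pbinom(n_pos, n, 0.5)                          # P(S <= n_pos)
p_sign <- min(1, 2 * min(upper_tail, lower_tail))            # two-sided p-value

print(c(n_pos = n_pos, n = n, p_sign = p_sign))
```

Only the count of positive differences enters the calculation, so the sizes of the differences play no role.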

ii. The hypotheses are the same as those used in Part i: $H_0 : m(X - Y) = 0$ against
$H_1 : m(X - Y) \neq 0$, where $m(X - Y)$ denotes the median of the difference of X and
Y. The Wilcoxon signed rank test additionally assumes that the distribution of X − Y is
symmetric about its median.
wilcox.test(x,y,paired=TRUE)

##
## Wilcoxon signed rank test
##
## data: x and y
## V = 124, p-value = 0.02322
## alternative hypothesis: true location shift is not equal to 0
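
The statistic V reported above is the sum of the ranks of the absolute differences that belong to positive differences. The sketch below recomputes it directly, assuming no zero differences and no tied absolute differences (both hold for this data set). The exact p-value is then taken from the signed rank distribution via psignrank. The names r, V and p_exact are illustrative.

```r
# Data as declared at the start of Problem 1
x <- c(26.1, 26.6, 27.4, 27.5, 27.8, 28.1, 28.4, 29.5, 29.8, 30.4, 30.4, 31.2,
       31.5, 32.9, 33.6, 34.1, 35.9)
y <- c(27.4, 28.1, 22.9, 31.3, 16.3, 50.1, 20.0, 24.6, 23.3, 19.3, 24.4, 24.4,
       29.5, 27.6, 21.7, 25.4, 39.4)

d <- x - y
d <- d[d != 0]            # zero differences are discarded (none here)
n <- length(d)

r <- rank(abs(d))         # ranks of the absolute differences
V <- sum(r[d > 0])        # sum of ranks attached to positive differences

# Exact two-sided p-value: double the tail on the side where V falls
tail_prob <- if (V > n * (n + 1) / 4) {
  psignrank(V - 1, n, lower.tail = FALSE)   # P(V' >= V)
} else {
  psignrank(V, n)                           # P(V' <= V)
}
p_exact <- min(1, 2 * tail_prob)

print(c(V = V, p_exact = p_exact))
```

Unlike the sign test, a large negative difference contributes a large rank to the complementary sum, so magnitudes matter, but only through their ranks.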

iii. The null hypothesis is $H_0 : \mu_X = \mu_Y$, where $\mu_X$ and $\mu_Y$ denote the means of
X and Y respectively. The alternative hypothesis is $H_1 : \mu_X \neq \mu_Y$.
t.test(x,y,paired=TRUE)

##
## Paired t-test
##
## data: x and y
## t = 1.6402, df = 16, p-value = 0.1205
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.954790 7.484202
## sample estimates:
## mean of the differences
## 3.264706
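
The paired t-test is a one-sample t-test on the differences, so the output above can be rebuilt from the mean and standard deviation of x − y. This is a sketch of that calculation, with the illustrative names se, t_stat, p_t and ci.

```r
# Data as declared at the start of Problem 1
x <- c(26.1, 26.6, 27.4, 27.5, 27.8, 28.1, 28.4, 29.5, 29.8, 30.4, 30.4, 31.2,
       31.5, 32.9, 33.6, 34.1, 35.9)
y <- c(27.4, 28.1, 22.9, 31.3, 16.3, 50.1, 20.0, 24.6, 23.3, 19.3, 24.4, 24.4,
       29.5, 27.6, 21.7, 25.4, 39.4)

d  <- x - y
n  <- length(d)
se <- sd(d) / sqrt(n)                  # standard error of the mean difference

t_stat <- mean(d) / se                 # test statistic with n - 1 degrees of freedom
p_t    <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)
ci     <- mean(d) + c(-1, 1) * qt(0.975, df = n - 1) * se   # 95% confidence interval

print(c(t = t_stat, p = p_t, lower = ci[1], upper = ci[2]))
```

Because sd(d) sits in the denominator, a single extreme difference can shrink the statistic considerably.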

b) Of the three tests, the sign test (p = 0.1435) and the t-test (p = 0.1205) fail to
reject their null hypotheses at the 5% level, while the Wilcoxon test (p = 0.02322)
rejects its null hypothesis. The sign test most likely misses the effect because it has
lower power than the Wilcoxon test: it uses only the signs of the differences between x
and y, whereas the Wilcoxon test also uses the ranks of their magnitudes. The probable
reason that the t-test fails to reject where the Wilcoxon test rejects becomes clear
when boxplots of the two samples are drawn. The samples are skewed and contain outliers,
most noticeably in y (the pair x = 28.1, y = 50.1 gives a difference of −22.0). An
extreme value of this kind pulls the mean difference towards zero and inflates the
standard error used by the t-test, leading to a non-significant result, but it has far
less influence on the rank-based, more robust Wilcoxon test. Given this, and given that
we have no specific information about the distributions of X and Y, it seems sensible to
rely on the distribution-free Wilcoxon test and conclude that there is evidence that the
median of X − Y is not zero and thus that X and Y differ in location.

boxplot(x,y,horizontal=TRUE, names = c("x","y"))
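
The outlier argument in part b) can also be checked numerically. The sketch below flags points by the usual boxplot rule (via boxplot.stats) and locates the most extreme paired difference. The final lines rerun the t-test without that pair purely as a sensitivity check, not as a recommended analysis. The names out_y, out_d, i_min and p_drop are illustrative.

```r
# Data as declared at the start of Problem 1
x <- c(26.1, 26.6, 27.4, 27.5, 27.8, 28.1, 28.4, 29.5, 29.8, 30.4, 30.4, 31.2,
       31.5, 32.9, 33.6, 34.1, 35.9)
y <- c(27.4, 28.1, 22.9, 31.3, 16.3, 50.1, 20.0, 24.6, 23.3, 19.3, 24.4, 24.4,
       29.5, 27.6, 21.7, 25.4, 39.4)
d <- x - y

out_y <- boxplot.stats(y)$out    # y values beyond the boxplot whiskers
out_d <- boxplot.stats(d)$out    # differences beyond the boxplot whiskers

i_min <- which.min(d)            # position of the most negative difference
print(c(x = x[i_min], y = y[i_min], diff = d[i_min]))
print(c(mean = mean(d), median = median(d)))

# Sensitivity check only: paired t-test with the extreme pair left out
p_full <- t.test(d)$p.value
p_drop <- t.test(d[-i_min])$p.value
print(c(p_full = p_full, p_drop = p_drop))
```

Comparing p_full with p_drop shows how strongly a single pair drives the t-test result.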


c) We conduct 1000 simulation runs and denote by count_binom, count_wilcox and count_t
the number of times the null hypothesis is rejected at the 5% level by the sign,
Wilcoxon and t-tests respectively. The power estimates (i.e. the relative frequency of
rejection over the 1000 runs) are given in the R output. Note that here the t-test has
the highest estimated power, as expected given the normality of the underlying
distributions.
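count_binom = 0
count_wilcox = 0
count_t = 0
for(i in 1:1000){
  # note: this overwrites the data vectors x and y from parts a) and b)
  x = rnorm(17,30,3)      # X ~ N(30, 3^2)
  y = x+rnorm(17,3,5)     # Y - X ~ N(3, 5^2)
  len = sum(x - y > 0)    # number of positive differences
  # each comparison is TRUE (counted as 1) when the test rejects at the 5% level
  count_binom = count_binom + (binom.test(len,17)$p.value <= 0.05)
  count_wilcox = count_wilcox + (wilcox.test(x,y,paired=TRUE)$p.value <= 0.05)
  count_t = count_t + (t.test(x,y,paired=TRUE)$p.value <= 0.05)
}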
count_binom/1000 #sign test power estimate

## [1] 0.482

count_wilcox/1000 #Wilcoxon test power estimate

## [1] 0.611

count_t/1000 #t-test power estimate

## [1] 0.645
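
As a rough check on these estimates, the sketch below computes a Monte Carlo standard error for each one (treating the 1000 runs as independent Bernoulli trials) and compares the t-test estimate with the analytic power from power.t.test, where sd is the standard deviation of the paired differences. The vector p_hat simply copies the three estimates printed above. The names mc_se and analytic are illustrative.

```r
# Power estimates copied from the simulation output
p_hat <- c(sign = 0.482, wilcoxon = 0.611, t = 0.645)
n_sim <- 1000

# Standard error of a proportion estimated from n_sim independent runs
mc_se <- sqrt(p_hat * (1 - p_hat) / n_sim)
print(round(mc_se, 4))

# Analytic power of the two-sided paired t-test for differences ~ N(3, 5^2), n = 17
analytic <- power.t.test(n = 17, delta = 3, sd = 5, sig.level = 0.05,
                         type = "paired", alternative = "two.sided")
print(analytic$power)
```

Differences between estimates that are within a couple of standard errors of each other, such as those for the Wilcoxon and t-tests, should be read with that simulation noise in mind.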

Note: although the question statement does not give enough information to determine the
joint distribution of X and Y, we can generate realisations of Y as X + (Y − X), so that
the difference between the two random variables is distributed as required. Because all
three tests depend on the data only through this difference, nothing is lost by
generating values of Y in this fashion. The main reason for doing so is to allow the
built-in test functions in R to be called on x and y, as opposed to undertaking a more
tedious series of non-automated computations based on the sampled values of Y − X.
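
The claim that the tests depend only on the difference can be checked on a single simulated data set. The sketch below, which uses an arbitrary seed chosen only for reproducibility, confirms that the paired calls give the same p-values as one-sample calls on d = x − y, and then shows how the differences could be drawn directly. The names d, same_t, same_w and d_direct are illustrative.

```r
set.seed(1)                      # arbitrary seed, not part of the original solution

x <- rnorm(17, 30, 3)
y <- x + rnorm(17, 3, 5)         # Y generated as X + (Y - X)
d <- x - y                       # the only quantity the three tests use

# Paired tests on (x, y) versus one-sample tests on d
same_t <- all.equal(t.test(x, y, paired = TRUE)$p.value, t.test(d)$p.value)
same_w <- all.equal(wilcox.test(x, y, paired = TRUE)$p.value, wilcox.test(d)$p.value)
print(c(t_test = isTRUE(same_t), wilcoxon = isTRUE(same_w)))

# Equivalent direct simulation of the differences: X - Y = -(Y - X) ~ N(-3, 5^2)
d_direct <- -rnorm(17, 3, 5)
p_direct <- c(sign     = binom.test(sum(d_direct > 0), 17)$p.value,
              wilcoxon = wilcox.test(d_direct)$p.value,
              t        = t.test(d_direct)$p.value)
print(p_direct)
```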
