
MATH 1281

STATISTICAL INFERENCE
UNIT 4
WRITTEN ASSIGNMENT:
In this assignment we consider data that examines the effect of two soporific drugs, drugs that induce sleep. These
two drugs were tested on a group of 10 patients. For each of the patients the increase in hours of sleep was measured
both for drug 1 and for drug 2. The source of the data is the R dataset "sleep", which quotes the paper by Cushny,
A.R. and Peebles, A.R. (1905). (The action of optical isomers. II. Hyoscines. J. Physiol. 32, 501–510.) The files
"sleep_paired.csv" and "sleep_unpaired.csv" contain the same data in two different formats. The first format will be
used in the first part of the assignment and the second format will be used in the second part. You may download the
files sleep_paired.csv and sleep_unpaired.csv using the links.
A Paired Design
A paired design corresponds to the situation where two different treatments are given to the same subject and the
goal is to assess the difference between the response to the two treatments. This design was used in the original
experiment. The file "sleep_paired.csv" reflects this design. It contains two variables and 10 observations. The
variables are:
• drug1 = The increase in hours of sleep as a result of using the first drug. (numeric)
• drug2 = The increase in hours of sleep as a result of using the second drug. (numeric)
Save this data set on your computer and read it into R. Compute the difference between the increase in the first drug
and the increase in the second drug (Hint: If you saved the data in an object by the name "paired" then you may use
the code "d <- paired$drug1 - paired$drug2" in order to produce the difference.) In Tasks 1-5 you are asked to
examine the distribution of this difference, apply the t-test and compute a confidence interval for the mean. For that
you may produce a box plot of the difference, compute the mean and standard deviation, obtain the percentiles of the
t-distribution, and apply the function "t.test".
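The preparatory steps described above can be sketched as follows. This is only an illustration: it assumes that "sleep_paired.csv" holds the same values as R's built-in "sleep" dataset, with group 1 as drug1 and group 2 as drug2, so the data frame is rebuilt from that dataset instead of being read from the file. The names "paired" and "d" follow the hint above; "d_mean" and "d_sd" are illustrative names.

```r
# Assumption: the built-in `sleep` data matches sleep_paired.csv,
# with group 1 = drug1 and group 2 = drug2.
paired <- data.frame(
  drug1 = sleep$extra[sleep$group == 1],
  drug2 = sleep$extra[sleep$group == 2]
)

# Difference between the two responses for each patient
d <- paired$drug1 - paired$drug2

# Numerical summaries of the difference
d_mean <- mean(d)
d_sd <- sd(d)
print(summary(d))

# Box plot of the difference (drawn only in an interactive session);
# points beyond the whiskers are the candidate outliers
if (interactive()) {
  boxplot(d, main = "drug1 - drug2", ylab = "Difference in extra hours of sleep")
}
```

With the real file, the first statement would be replaced by a call to read.csv on the saved copy.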
An Unpaired Design
An unpaired design corresponds to the situation where each treatment is given to a different subject. The goal is to
assess the difference between the response of the group that obtained the first treatment and the response of the
group that obtained the second treatment. This design is discussed in the next unit. Part of that discussion involves
the construction of a statistical test that is based on the t-distribution to examine the expected difference of response
between the two treatments. The goal in this part of the assignment is to develop a different type of test, called The
Permutation Test, to carry out the same task. As an exercise we will use the same data, but this time treat it as if
each of the drugs were given to a different group of 10 patients. The file "sleep_unpaired.csv" reflects this
(inaccurate) assumption. It contains two variables and 20 observations. The variables are:
• group = The type of drug that was used. (factor)
• extra = The increase in hours of sleep as a result of using the drug. (numeric)
Save this data set on your computer, read it into R and apply the permutation test in order to test the null hypothesis
that the expected increase in hours of sleep in the first group is equal to the expected increase in the second group.
The test is carried out by the computation of a test statistic and the computation of a p-value, which corresponds to
the probability of obtaining by random chance outcomes that are more extreme than the computed statistic.
The test statistic is the absolute value of the difference between the average of the variable extra for the first 10
observations and the average for the last 10 observations. The sampling distribution in the permutation test
corresponds to a random assignment of responses to treatment. Therefore, in order to simulate the sampling
distribution of the statistic, the responses are randomly reordered and the same statistic is computed for the reordered
data.
Specifically, say the data is saved in an object by the name "unpaired" and the observed response is given in an
object by the name x (using the code "x <- unpaired$extra"). Then the statistic is computed with the expression
"abs(mean(x[1:10])-mean(x[11:20]))". On the other hand, the sampling distribution of the statistic is obtained by a
random permutation of the values of x and the application of the same formula to the permuted values. Repeating
this procedure for a large number of random permutations produces an approximation of the sampling distribution of
the test statistic under the null hypothesis. (Hint: An object "X" that contains a random permutation of the values of
x may be obtained using the expression "X <- sample(x)".)
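The procedure described above might be sketched as follows. This is an illustration under stated assumptions, not the required solution: it takes the responses from R's built-in "sleep" dataset on the assumption that "sleep_unpaired.csv" holds the same 20 values in the same order, and the helper name "perm_stat", the seed, and the number of permutations are arbitrary choices.

```r
# Assumption: the built-in `sleep` data matches sleep_unpaired.csv
# (first 10 rows = drug 1, last 10 rows = drug 2).
x <- sleep$extra

# Test statistic: absolute difference between the two group averages
perm_stat <- function(v) abs(mean(v[1:10]) - mean(v[11:20]))
st <- perm_stat(x)

# Approximate the sampling distribution under the null hypothesis
# by recomputing the statistic on randomly reordered responses
set.seed(1)  # hypothetical seed, used only to make the run reproducible
stat <- replicate(10^4, perm_stat(sample(x)))

# Proportion of permutations at least as extreme as the observed value
p_value <- mean(stat >= st)
print(p_value)
```

Because the p-value is estimated by simulation, it changes slightly from run to run and with the number of permutations.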
A Paired Design:
1. The number of outlier observations in the difference between the response to drug 1 and
the response to drug 2 is: _____.
Explain each step in the computation of the number of outlier observations:
The answer is 2.

> paired <- read.csv("sleep_paired.csv")
> d <- paired$drug1 - paired$drug2
> summary(d)
> sum(d < (-1.70 - 1.5 * 0.65)) + sum(d > (-1.05 + 1.5 * 0.65))

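Here d holds, for each patient, the difference between the response to drug 1 and the response to drug 2. The summary gives a first quartile of -1.70 and a third quartile of -1.05, so the interquartile range is 0.65. Observations below -1.70 - 1.5 * 0.65 or above -1.05 + 1.5 * 0.65 are counted as outliers.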
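The same count can be obtained without typing the quartiles by hand. The sketch below assumes that the built-in "sleep" dataset holds the same values as "sleep_paired.csv"; the names "q1", "q3", "lower", "upper", and "n_out" are illustrative.

```r
# Assumption: the built-in `sleep` data matches sleep_paired.csv
d <- sleep$extra[sleep$group == 1] - sleep$extra[sleep$group == 2]

# Quartiles and interquartile range of the difference
q1 <- unname(quantile(d, 0.25))
q3 <- unname(quantile(d, 0.75))
iqr <- q3 - q1

# Outlier fences: 1.5 interquartile ranges beyond the quartiles
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

# Number of observations outside the fences
n_out <- sum(d < lower) + sum(d > upper)
print(n_out)
```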

2. The percentile of the t-distribution that should be used in order to compute an 80%-
confidence interval for the expectation of the difference between the responses to the two
drugs is (write the numerical value): _____.
Attach the R code for conducting the computation:
The answer is: 1.383

> qt(0.90, 10 - 1)

3. The 80%-confidence interval for the expectation of the difference between the responses
to the two drugs is:
Lower end = _____, Upper end = _____.
Explain each step in the computation of the confidence interval:
The answer is: Lower end = -2.117941, and the Upper end = -1.042059

> mean(d) + qt(c(0.10,0.9),9)*sd(d)/sqrt(10)
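The one-line computation can be broken into its steps: estimate the mean, estimate the standard error, take the 0.90 percentile of the t-distribution on 9 degrees of freedom, and go that many standard errors to either side of the mean. The sketch below assumes that the built-in "sleep" dataset holds the same values as "sleep_paired.csv"; the names "n", "se", "tq", and "ci" are illustrative.

```r
# Assumption: the built-in `sleep` data matches sleep_paired.csv
d <- sleep$extra[sleep$group == 1] - sleep$extra[sleep$group == 2]

n <- length(d)             # number of patients
se <- sd(d) / sqrt(n)      # estimated standard error of the mean difference
tq <- qt(0.90, df = n - 1) # percentile that leaves 10% in each tail

# 80% confidence interval: mean plus or minus tq standard errors
ci <- mean(d) + c(-1, 1) * tq * se
print(ci)
```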

4. The p-value for testing the null hypothesis that the expected difference is equal to 0
versus the two-sided alternative is equal to: _____.
Attach the R code for computing the p-value:
The answer is: 0.002833

> t.test(d)

        One Sample t-test

data:  d
t = -4.0621, df = 9, p-value = 0.002833
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -2.4598858 -0.7001142
sample estimates:
mean of x
    -1.58
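The p-value reported by "t.test" can also be reproduced by hand, which shows where it comes from: the t statistic is the sample mean divided by its estimated standard error, and the two-sided p-value is twice the tail probability of the t-distribution on 9 degrees of freedom. The sketch assumes that the built-in "sleep" dataset holds the same values as "sleep_paired.csv"; the names "t_stat" and "p_value" are illustrative.

```r
# Assumption: the built-in `sleep` data matches sleep_paired.csv
d <- sleep$extra[sleep$group == 1] - sleep$extra[sleep$group == 2]
n <- length(d)

# t statistic for the null hypothesis that the expected difference is 0
t_stat <- mean(d) / (sd(d) / sqrt(n))

# Two-sided p-value: probability of a statistic at least this extreme
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

print(c(t = t_stat, p = p_value))
```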
5. Do you reject the null hypothesis with a significance level of 5%? __Yes __No.
Explain your choice:
The answer is: YES

The p-value is equal to 0.002833, as calculated in the previous question, and this value is less
than 0.05. Therefore the null hypothesis should be rejected.

An Unpaired Design:
6. The test statistic for the permutation test is the absolute value of the difference between
the average of the first 10 observations and the average of the last 10 observations. The
value of the test statistic is:_____.
Attach the R code for computing the test statistic:
The answer is: 1.58

> st <- abs(mean(unpaired$extra[1:10]) - mean(unpaired$extra[11:20]))


> st
[1] 1.58

7. Run a simulation to compute the p-value of the permutation test. The p-value is:_____.
Attach the R code for computing the p-value:
The answer is: 0.0772

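> x <- unpaired$extra
> st <- abs(mean(x[1:10]) - mean(x[11:20]))
> stat <- rep(0, 10^5)
> for (i in 1:10^5) {
+   X <- sample(x)                                  # random permutation of the responses
+   stat[i] <- abs(mean(X[1:10]) - mean(X[11:20]))  # same statistic on the permuted data
+ }
> mean(stat >= st)                                  # proportion at least as extreme as the observed statistic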

8. Do you reject the null hypothesis with a significance level of 5%? __Yes __No.
Explain your choice:
The answer is: NO

The p-value from the simulation is 0.0772, as shown in the previous answer, which is greater than 0.05.
The null hypothesis therefore cannot be rejected at the 5% significance level.
Reference:

• Yakir, B. (2011). Introduction to Statistical Thinking (With R, Without Calculus). Jerusalem, IL: The Hebrew University of Jerusalem, Department of Statistics.
