
R Programming Seminar for Statistics

Exercise Sheet - Control Structures and Functions

Control Structures

1. For-loop.

a) Generate the following matrix:


##      [,1] [,2] [,3] [,4] [,5]
## [1,]    3    1    1    1    1
## [2,]    2    3    0    0    0
## [3,]    2    0    3    0    0
## [4,]    2    0    0    3    0
## [5,]    2    0    0    0    3

Hint: You can use the command diag().


b) Construct the previous matrix from a zero matrix by using a for-loop over the entries.
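
One possible way to approach both parts is sketched below; it is not the only solution. The object names m1 and m2 are made up for this illustration. The first variant starts from diag() and overwrites the first row and column, the second fills a zero matrix entry by entry.

```r
# Variant 1 (part a): diagonal matrix, then overwrite first row and column
m1 <- diag(3, nrow = 5)      # 5 x 5 matrix, 3 on the diagonal, 0 elsewhere
m1[1, 2:5] <- 1              # first row, except the diagonal entry
m1[2:5, 1] <- 2              # first column, except the diagonal entry

# Variant 2 (part b): zero matrix, filled entry by entry in a double loop
m2 <- matrix(0, nrow = 5, ncol = 5)
for (i in 1:5) {
  for (j in 1:5) {
    if (i == j) {
      m2[i, j] <- 3          # diagonal
    } else if (i == 1) {
      m2[i, j] <- 1          # rest of the first row
    } else if (j == 1) {
      m2[i, j] <- 2          # rest of the first column
    }
  }
}

identical(m1, m2)            # both variants should give the same matrix
```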

2. Ifelse.

The following data was collected in an experiment with seeds of oats. The seeds were split
into three groups. Group 1 and 2 were separately treated by mordanting with the same
agent. Group 3 remained untreated (control). Seven seeds were taken from each group and
planted in 21 pots. At the end of the experiment, the yield per pot was measured (in grams).
Load the data from
url <- "http://stat.ethz.ch/Teaching/Datasets/WBL/hafer.dat"
d.hafer <- read.table(url, header = TRUE)

a) Using the variable YIELD, create a new categorical variable class consisting of 4 groups.
Hint: Take quartile breaks as cutting points.
b) Construct a new categorical variable whose levels have the following properties:
1: YIELD < 350 and GROUP = 1, 3
2: YIELD > 350 and GROUP = 1, 3
3: YIELD < 350 and GROUP = 2
4: YIELD > 350 and GROUP = 2
c) Make contingency tables of your new variables and check the number of observations
for the different levels.
Hint: Use the command table().
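
The steps above can be sketched on a small made-up data frame. The data frame d.toy and the column name class2 are hypothetical and only stand in for d.hafer, which is assumed to have the columns GROUP and YIELD. The sheet does not say where a yield of exactly 350 belongs; this sketch assigns it to the "> 350" levels, which is an assumption.

```r
# Hypothetical toy data standing in for d.hafer (column names assumed)
d.toy <- data.frame(
  GROUP = rep(1:3, each = 4),
  YIELD = c(300, 340, 360, 400, 310, 345, 355, 410, 290, 330, 370, 420)
)

# a) four classes with the quartiles as break points;
#    include.lowest = TRUE keeps the minimum inside the first class
breaks <- quantile(d.toy$YIELD, probs = seq(0, 1, by = 0.25))
d.toy$class <- cut(d.toy$YIELD, breaks = breaks, include.lowest = TRUE)

# b) nested ifelse(): outer test on GROUP, inner tests on YIELD
d.toy$class2 <- factor(ifelse(d.toy$GROUP == 2,
                              ifelse(d.toy$YIELD < 350, 3, 4),
                              ifelse(d.toy$YIELD < 350, 1, 2)))

# c) contingency tables of the new variables
table(d.toy$class)
table(d.toy$class2, d.toy$GROUP)
```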

3. Loops.

In an experiment, a scientist analyzed the influence of curing temperature (variable temp)
on the breaking strength (variable breaking) of three alloys (variable alloy). He had four
ovens in which he simultaneously tested three alloys. He repeated his experiment on three
different days (variable day). Get the data by using the following commands in R:

url <- "http://stat.ethz.ch/Teaching/Datasets/WBL/legierung.dat"
d.alloy <- read.table(url, header = TRUE)

a) Which curing temperatures did the scientist use?


b) Construct a new factor variable breaking.class with the levels "low" and "high" which
indicates whether the breaking strength is greater than the mean breaking strength. How
many breaks occur below the mean breaking strength?
Hint: Use the function ifelse().
c) If the cumulative breaking strength exceeds 250, the scientist needs to recalibrate the
measurement system. How many measurements of breaking strength were done
before the first calibration? You can assume that the order in the data represents the
order in which the scientist measured the breaking strength.
Hint: Count once by using a while-loop and once using a for-loop.
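
A rough sketch of these three parts on made-up numbers follows. The data frame d.toy, its values and the helper names m, limit, total, n.while and n.for are all hypothetical. For part c) the sketch assumes that "before the first calibration" means the measurements whose cumulative sum is still at or below 250; if the measurement that crosses the limit should be counted as well, add one.

```r
# Hypothetical toy data standing in for d.alloy (column names as in the text)
d.toy <- data.frame(
  temp     = rep(c(675, 700, 725), times = 4),
  breaking = c(60, 80, 70, 90, 50, 100, 65, 85, 75, 95, 55, 75)
)

# a) distinct curing temperatures
unique(d.toy$temp)

# b) "high" if above the mean breaking strength, otherwise "low"
m <- mean(d.toy$breaking)
d.toy$breaking.class <- factor(ifelse(d.toy$breaking > m, "high", "low"),
                               levels = c("low", "high"))
table(d.toy$breaking.class)

# c) count measurements while the cumulative sum stays at or below the limit
limit <- 250

# while-loop: stop before the measurement that would exceed the limit
n.while <- 0
total <- 0
while (n.while < nrow(d.toy) &&
       total + d.toy$breaking[n.while + 1] <= limit) {
  n.while <- n.while + 1
  total <- total + d.toy$breaking[n.while]
}

# for-loop: same count, leaving the loop with break
n.for <- 0
total <- 0
for (b in d.toy$breaking) {
  if (total + b > limit) break
  total <- total + b
  n.for <- n.for + 1
}

c(n.while, n.for)
```

A loop-free cross-check under the same assumption is sum(cumsum(d.toy$breaking) <= limit).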

Functions

4. Writing a Function.

The goal is to write a function my.confint() that calculates the mean and approximate
confidence intervals for a vector of data. The lower and upper limits of the 95% confidence
interval of the estimated mean X̄ can be approximated as:
$$\bar{X} \pm 2 \cdot \frac{S}{\sqrt{n}} = \left[\, \bar{X} - 2 \cdot \frac{S}{\sqrt{n}},\; \bar{X} + 2 \cdot \frac{S}{\sqrt{n}} \,\right],$$

where S is the standard deviation and n is the number of observations.

Note: This approximation only applies if the distribution of X is not too skewed and n is not much
smaller than 10.

The following points shall support you in this exercise:

a) Think about the number and names of all necessary arguments for your function and
define your function name.
b) Make sure your function also works when it gets a vector containing missing values.
c) Write the body of your function which calculates the mean and the standard deviation.
d) Collect the result in a named output vector. The vector should return 3 values: the
mean, the lower end of the confidence interval, and the upper end of the confidence
interval.
e) Now try your function with a test vector, e.g.
c(2, 5, 2, 7, 3, 4, 7, 2, 9, 10, 1),
which yields mean 4.73, lower limit 2.86 and upper limit 6.60. Is this estimated mean
significantly different from zero?
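
A minimal sketch of such a function is given below. The argument names x and na.rm and the output names mean, lower and upper are assumptions chosen for this illustration; the sheet leaves them to the reader. Missing values are dropped before n is computed, so that n counts only the observed values.

```r
my.confint <- function(x, na.rm = TRUE) {
  # drop missing values first so that n counts only observed values
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  s <- sd(x)
  half <- 2 * s / sqrt(n)          # half-width of the approximate 95% interval
  c(mean = m, lower = m - half, upper = m + half)
}

x.test <- c(2, 5, 2, 7, 3, 4, 7, 2, 9, 10, 1)
round(my.confint(x.test), 2)       # compare with 4.73, 2.86 and 6.60 from e)
my.confint(c(x.test, NA))          # same result, the NA is ignored
```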

5. if - else

Look at the following small function tst4() which computes a qualifying string for a single
number x.

tst4 <- function(x) {
  if (x < -2) "pretty negative"
  else if (x < 1) "close to zero"
  else if (x < 3) "in [1, 3)"
  else "large"
}

a) Run the function for x = 0, x = Inf and x = 2.


b) Write a new function Test4() giving the same output as tst4(), but define the function
using the command ifelse() instead of if ... else. In this way, the function will
not only work for single numbers but also for whole vectors x. Test your function
Test4() on the input vector c(-Inf, -3:5).
c) Write a new function Test5() which extends the function tst4() in such a way that
the input argument can be a vector and not only a single number. Use the original
function tst4() and the command sapply() to write the new function. Test your
function Test5() on the input vector c(-Inf, -3:5).
Note: The functions Test4() and Test5() basically do the same thing.
d) We want to know which one of the functions Test4() and Test5() is performing the
calculations faster when the input vector is defined as
x <- rnorm(10000)
Use the function system.time() to measure the time of calculation. Take a look at the
help file ?system.time and at the output of the command example(system.time) if
you are not familiar with the function system.time().
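
The parts of this exercise could be sketched as follows. This is one possible shape for Test4() and Test5(), not a reference solution; the name x.big is made up here. The timings depend on the machine, so no output is shown.

```r
# tst4() repeated from the exercise so that the sketch is self-contained
tst4 <- function(x) {
  if (x < -2) "pretty negative"
  else if (x < 1) "close to zero"
  else if (x < 3) "in [1, 3)"
  else "large"
}

# b) vectorised version: nested ifelse() calls mirror the if ... else chain
Test4 <- function(x) {
  ifelse(x < -2, "pretty negative",
         ifelse(x < 1, "close to zero",
                ifelse(x < 3, "in [1, 3)", "large")))
}

# c) element-wise version: call the scalar function once per element
Test5 <- function(x) sapply(x, tst4)

x <- c(-Inf, -3:5)
Test4(x)
identical(Test4(x), Test5(x))      # both versions should agree

# d) compare the run times on a longer vector
x.big <- rnorm(10000)
system.time(Test4(x.big))
system.time(Test5(x.big))
```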

6. Simulation of Confidence intervals - apply() ...

a) Simulate 1000 samples each of 20 observations, normally distributed with expectation
µ = 5 and standard deviation σ = 2. Save the result in a matrix mN. Each column of
this matrix should correspond to a different sample.
b) We want to compute the 95% confidence interval for µ, based on the t-test, for each of
our samples. To achieve this, we proceed step by step:
1. The function t.test() returns a list-like result. Use str() to take a look at the
different components. Extract the list component conf.int from the list.
Hint: Look at the output of t.test(X) (and the output of str(t.test(X))) for a simple
sample such as X <- 1:20. For the interval of this one sample, you should get a numeric
vector of length 2 (the two interval end points).
2. Define a function f.conf1() which computes the confidence interval for one sample,
i.e. one input argument of the function will be a numeric vector x of length 20.
3. Now apply this function to each column of the matrix mN, using apply(). Save the
result in t.conf. Of what type / class is this object?
c) Find out how often the true parameter µ (see part “a)”) is contained in the confidence
interval. Which result would you expect from theory?
Hint: Use the comparison operators ==, <=, >=, <, > and != and the logical operators &, |
and !. Make use of the fact that the logical values TRUE and FALSE can be used in arithmetic
functions in R equivalently to the numbers 1 and 0 ("logical" objects are coerced to
"numeric" whenever appropriate, notably in arithmetic and mathematical functions).
d) Write a function conf.i() (using the previous function from b)), with the following
additional arguments:
• n.samp, the number of samples
• alpha, the confidence level
• it may contain additional arguments such as the size of one sample, n.
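It should return the proportion (in [0, 1]) of cases in which the true µ is contained in
the confidence interval. E.g. (the output shown here appears to be the count of such
cases out of 500, i.e. a proportion of 0.942),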
conf.i(500, 0.95)
## [1] 471
e) Represent the first 50 confidence intervals graphically. Proceed step-by-step:
• First prepare the plot range and label the axes using the function plot().
• Add the lines parallel to the x-axis corresponding to the confidence intervals using
the function segments().
• Add a vertical line at the true mu = 5 using the function abline().
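
The whole exercise could be sketched roughly as below. The seed, the conf.level argument of f.conf1(), the extra arguments n, mu and sigma of conf.i(), the helper names covered and k, and the axis labels are assumptions made for this illustration. Following the sheet, alpha is passed on as the confidence level, and conf.i() returns a proportion in [0, 1].

```r
set.seed(1)   # arbitrary seed, only to make this sketch reproducible

# a) 1000 samples of size 20, one sample per column
mN <- matrix(rnorm(20 * 1000, mean = 5, sd = 2), nrow = 20, ncol = 1000)

# b) confidence interval of one sample as a plain numeric vector of length 2
f.conf1 <- function(x, conf.level = 0.95) {
  as.numeric(t.test(x, conf.level = conf.level)$conf.int)
}
t.conf <- apply(mN, 2, f.conf1)    # one column per sample
class(t.conf)
dim(t.conf)

# c) proportion of intervals that contain the true mean 5;
#    TRUE/FALSE are treated as 1/0 by mean()
covered <- t.conf[1, ] <= 5 & 5 <= t.conf[2, ]
mean(covered)

# d) the same steps wrapped in a function
conf.i <- function(n.samp, alpha, n = 20, mu = 5, sigma = 2) {
  m <- matrix(rnorm(n * n.samp, mean = mu, sd = sigma), nrow = n)
  ci <- apply(m, 2, f.conf1, conf.level = alpha)
  mean(ci[1, ] <= mu & mu <= ci[2, ])
}
conf.i(500, 0.95)

# e) first 50 intervals as horizontal segments, true mean as vertical line
k <- 50
plot(range(t.conf[, 1:k]), c(1, k), type = "n",
     xlab = "confidence interval", ylab = "sample")
segments(x0 = t.conf[1, 1:k], y0 = 1:k, x1 = t.conf[2, 1:k], y1 = 1:k)
abline(v = 5)
```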
