R-Bloggers
This post originally appeared on my WordPress blog on May 23, 2010. I present it here in its
original form.
The R Function of the Day series describes in plain language how certain R functions
work, using simple examples that you can apply to gain insight into your own data.
Today, I will discuss the sample function.
Random Permutations
In its simplest form, the sample function can be used to return a random permutation of a vector. To
illustrate this, let’s create a vector of the integers from 1 to 10 and assign it to a variable x.
x <- 1:10
sample(x)
[1] 3 2 1 10 7 9 4 8 6 5
Note that if you give sample a vector of length 1 (e.g., just the number 10), it does exactly the
same thing as above: it creates a random permutation of the integers from 1 to 10.
sample(10)
[1] 10 7 4 8 2 6 1 9 5 3
Warning!
This can be a source of confusion if you're not careful. Consider the following example from
the sample help file.
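A sketch of that example, adapted from the Examples section of ?sample. The names a and b are illustrative additions so that the lengths can be inspected afterwards.

```r
x <- 1:10

# x[x > 8] is c(9, 10), so this is a random permutation of two values.
a <- sample(x[x > 8])

# x[x > 9] is the single number 10, so sample() treats the call as
# sample(10) and returns a permutation of 1:10.
b <- sample(x[x > 9])

print(length(a))
print(length(b))
```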
Notice how the first output is of length 2, since only two numbers in our vector are greater than
eight. But because only one number (that is, 10) is greater than nine, sample thinks we want a
sample of the numbers from 1 to 10, and therefore returns a vector of length 10.
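One way to sidestep this behavior is to sample the indices of the vector rather than the vector itself. The sketch below follows the resample helper suggested in the Examples section of ?sample; it is not part of base R, so it has to be defined before use.

```r
# Helper following the suggestion in ?sample: draw positions with
# sample.int(), then index, so a length-1 input is never expanded to 1:n.
resample <- function(x, ...) x[sample.int(length(x), ...)]

x <- 1:10
r1 <- resample(x[x > 8])   # permutation of c(9, 10)
r2 <- resample(x[x > 9])   # just 10
r3 <- resample(x[x > 10])  # empty vector

print(r1)
print(r2)
print(r3)
```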
Taking samples in R
Often, it is useful not simply to take a random permutation of a vector, but rather to sample
independent draws from it. For instance, we can simulate a Bernoulli trial, such as the flip
of a fair coin. First, using our previous vector, note that we can tell sample how many
elements to draw, using the size argument.
sample(x, size = 5)
[1] 2 10 5 1 6
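Now suppose we want 100 flips of a coin. Here is a minimal sketch of a first attempt, assuming a two-element character vector named coin. The name and labels are assumptions, chosen to match the Heads/Tails table shown further on. The call is wrapped in tryCatch so that the error message is captured rather than halting the script.

```r
# Assumed setup: a fair coin as a two-element vector (name is illustrative).
coin <- c("Heads", "Tails")

# With the default replace = FALSE, asking for more elements than the
# vector holds is an error; capture its message instead of stopping.
msg <- tryCatch(
  sample(coin, size = 100),
  error = function(e) conditionMessage(e)
)
print(msg)
```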
Oops, we can't take a sample of size 100 from a vector of size 2, unless we set the replace argument
to TRUE.
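A sketch of the corrected call, again assuming an illustrative vector named coin. The seed is arbitrary and only makes the run repeatable, so the counts will generally differ from the table below.

```r
set.seed(1)  # arbitrary seed, only so reruns give the same counts

# Assumed setup: a fair coin as a two-element vector (name is illustrative).
coin <- c("Heads", "Tails")

# replace = TRUE lets each of the 100 draws choose from both outcomes.
flips <- sample(coin, size = 100, replace = TRUE)

# Tabulate how many times each outcome occurred.
counts <- table(flips)
print(counts)
```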
Heads Tails
53 47
The sample function can be used to perform a simple bootstrap. Let's use it to estimate the 95%
confidence interval for the mean of a population. First, generate a random sample from a normal
distribution.
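A minimal sketch of this step. The variable name rn comes from the t.test call at the end of the post, but the sample size, mean, standard deviation, and seed used here are assumptions. They are chosen only to be roughly consistent with the interval reported later, which is centered near 10.

```r
set.seed(2010)  # arbitrary seed for repeatability (an assumption)

# Assumed parameters: 1000 draws from a normal distribution with
# mean 10 and standard deviation 1.
rn <- rnorm(1000, mean = 10, sd = 1)

print(mean(rn))
```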
Then, call sample repeatedly via the replicate function to get our bootstrap resamples. The
defining feature of this technique is that replace = TRUE. We then take the mean of each
resample, gather the means, and finally compute the relevant quantiles.
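The steps just described can be sketched as follows. The number of resamples (1000), the parameters used to generate rn, and the names boot_means and ci are assumptions for illustration. The sample rn is generated inside the snippet so that it runs on its own.

```r
set.seed(2010)  # arbitrary seed for repeatability (an assumption)

# Assumed data: 1000 draws from a normal distribution with mean 10, sd 1.
rn <- rnorm(1000, mean = 10)

# Each replicate resamples rn with replacement (same length as rn)
# and records the mean of that resample.
boot_means <- replicate(1000, mean(sample(rn, replace = TRUE)))

# Percentile bootstrap interval: the 2.5% and 97.5% quantiles of the means.
ci <- quantile(boot_means, probs = c(0.025, 0.975))
print(ci)
```

This is the percentile form of the bootstrap interval. Other variants exist, but they are beyond what the text describes.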
2.5% 97.5%
9.936387 10.062525
For comparison, the standard t-based 95% confidence interval for the mean can be obtained with:
t.test(rn)$conf.int