SGTA3 Sol
8. Discuss whether an SRS would be appropriate for the following situations. What
other design might be used (if you know)?
a. For an email survey of students, you have a sampling frame that contains a list of
email addresses for all students.
Maybe.
It depends on which variable is of particular interest in the survey. We need to at least
consider the following question: are we dealing with a rather homogeneous population in
terms of the variable of interest?
c. You want to estimate the percentage of topics in a medical website that have
errors.
First of all, we may not have a list (or lists) to use as the frame.
d. A county election official wants to assess the accuracy of the machine that counts
the ballots by taking a random sample of the paper ballots and comparing the
estimated vote tallies for candidates from the sample to the machine counts.
Seems ok here.
11. Mayr et al. (1994) took an SRS of 240 children aged 2 to 6 years who visited their
pediatric outpatient clinic. They found the following frequency distribution for free
(unassisted) walking among the children:
Age (months)         9  10  11  12  13  14  15  16  17  18  19  20
Number of children  13  35  44  69  36  24   7   3   2   5   1   1
x <- c(9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)   # age (months)
freq <- c(13, 35, 44, 69, 36, 24, 7, 3, 2, 5, 1, 1)      # number of children
data <- rep(x, freq)                                      # one entry per child
hist(data, main = "Distribution of age at walking", xlab = "Age (months)",
     ylab = "Number of children")
The histogram appears skewed to the right. With a mildly skewed distribution, a sample
of size 240 should be large enough for the sample mean to be approximately normally
distributed, by the Central Limit Theorem (CLT).
b. Find the mean, standard error, and a 95% CI for the average age for onset of free
walking.
mean(data)
## [1] 12.07917
var(data)
## [1] 3.705003
(Since we don’t know the population size N, we ignore the fpc, i.e., we assume N is very
large, at the risk of slightly overestimating the standard error.)
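The standard error is $\sqrt{s^2/n} = \sqrt{3.705/240} = 0.124$.
A 95% confidence interval (using the z critical value, as the sample size is very large
here) is
\[ 12.079 \pm 1.96 \times 0.124 = (11.84,\ 12.32). \]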
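The steps in part b can be reproduced from the frequency table alone. The following is a minimal sketch, not necessarily the code used for the original output; the variable names (`walk`, `ybar`, `s2`, `se`) are chosen here for illustration, and the fpc is ignored as discussed above.

```r
# Rebuild the 240 observations from the frequency table in the question.
age  <- 9:20
freq <- c(13, 35, 44, 69, 36, 24, 7, 3, 2, 5, 1, 1)
walk <- rep(age, freq)

n    <- length(walk)        # number of children sampled
ybar <- mean(walk)          # sample mean
s2   <- var(walk)           # sample variance (divisor n - 1)
se   <- sqrt(s2 / n)        # standard error, fpc ignored
z    <- qnorm(0.975)        # z critical value, about 1.96
ci   <- ybar + c(-1, 1) * z * se

round(c(mean = ybar, var = s2, se = se, lower = ci[1], upper = ci[2]), 3)
```

Note that `var()` returns the sample variance, so the value 3.705 is $s^2$, and the standard deviation is its square root.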
c. Suppose the researchers want to do another study in a different region and want a
95% CI for the mean age of onset of walking to have margin of error 0.5. Using the
estimated standard deviation for these data, what sample size would they need to
take?
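\[ n = \frac{1.96^2 \cdot 3.705}{0.5^2} \approx 57 \]
as an approximation (here 3.705 is the sample variance $s^2$ from part b); more accurately,
\[ n = N\left[1 + N\,\frac{0.5^2}{1.96^2 \cdot 3.705}\right]^{-1} = \ ? \]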
Without knowing N, you can only obtain the approximate sample size, which is 57.
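To see how the fpc-adjusted formula behaves once N is known, here is a small sketch. The helper `srs_n` and the value N = 1000 are made up for illustration only; the exercise does not give N.

```r
z  <- 1.96     # critical value used in the solution
s2 <- 3.705    # sample variance from part b
e  <- 0.5      # desired margin of error

n0 <- z^2 * s2 / e^2     # sample size ignoring the fpc
ceiling(n0)              # round up to a whole number of children

# With a known population size N, the adjusted size is n0 / (1 + n0 / N),
# which is algebraically the same as N * (1 + N * e^2 / (z^2 * s2))^(-1).
srs_n <- function(N, n0) ceiling(n0 / (1 + n0 / N))
srs_n(N = 1000, n0 = n0)   # N = 1000 is a hypothetical value
srs_n(N = 1e9,  n0 = n0)   # very large N recovers the unadjusted size
```

The adjustment only matters when N is not much larger than the unadjusted sample size.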
15. The data set agsrs.csv on the unit iLearn was collected from an SRS of size 300 (n) from a
population of 3078 (N) farms, and includes a number of variables. For each of the
following variables, plot the data and estimate the population mean for that
variable, along with its standard error. Give a 95% CI for your estimate.
# Reading data set
LGA <- read.csv("agsrs.csv", fileEncoding = "UTF-8-BOM")
head(LGA) # first 6 observations
mean(LGA$acres87)
## [1] 301953.7
var(LGA$acres87)
## [1] 118907450529
mean(LGA$farms92)
## [1] 599.06
var(LGA$farms92)
## [1] 161795.4
mean(LGA$largef92)
## [1] 56.59333
var(LGA$largef92)
## [1] 5292.73
mean(LGA$smallf92)
## [1] 46.82333
var(LGA$smallf92)
## [1] 4398.199
All four variables have quite skewed distributions, as shown above, so a large sample is
needed for the normal approximation. With n = 300 here, the sample should be large enough.
$\bar{y} = 301{,}953.7$, $s^2 = 118{,}907{,}450{,}529$;
\[ \text{95\% CI}: \; 301953.7 \pm 1.96\sqrt{\frac{s^2}{300}\left(1 - \frac{300}{3078}\right)} = (264883,\ 339025). \]
Check answer:
$\bar{y} = 56.593$, $s^2 = 5292.73$; 95% CI: (48.8, 64.4).
Check answer:
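One way to check intervals like these without the data file is to plug the reported summary statistics into the fpc-corrected formula. This is a sketch only; `srs_ci` is a helper name invented here, and it uses z = 1.96 as in the solution.

```r
# 95% CI for a population mean from an SRS, with finite population correction.
srs_ci <- function(ybar, s2, n, N, z = 1.96) {
  se <- sqrt((s2 / n) * (1 - n / N))   # standard error with fpc
  c(estimate = ybar, se = se, lower = ybar - z * se, upper = ybar + z * se)
}

acres87  <- srs_ci(ybar = 301953.7, s2 = 118907450529, n = 300, N = 3078)
largef92 <- srs_ci(ybar = 56.59333, s2 = 5292.73,      n = 300, N = 3078)
round(acres87)
round(largef92, 1)
```

The same call with the mean and variance of farms92 or smallf92 from the output above gives the intervals for those variables.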
a) Calculate the population mean and the population standard deviation, using
the formulas on Slides 14 & 15 in Week 2 lecture.
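With population mean $\mu = (2 + 5 + 4 + 6 + 8)/5 = 5$,
\[ \sigma^2 = \frac{1}{5-1}\left\{\left[2^2 + 5^2 + 4^2 + 6^2 + 8^2\right] - 5 \times (5)^2\right\} = 5 \]
Thus, $\sigma = \sqrt{5} = 2.236$.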
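The arithmetic in part a can be checked in a few lines. This sketch takes the population values from the formula above and follows the same N - 1 divisor, which is also what R's `var()` uses.

```r
pop <- c(2, 5, 4, 6, 8)                    # population values from part a
N   <- length(pop)
mu  <- mean(pop)                           # population mean
s2  <- (sum(pop^2) - N * mu^2) / (N - 1)   # same formula as in the solution

c(mean = mu, variance = s2, sd = sqrt(s2))
all.equal(s2, var(pop))                    # var() also divides by N - 1
```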
b) List all possible simple random samples of size 4 that could be selected from the
population above without replacement, and for each possible sample provide
its probability (chance) of being selected and calculate its average/mean.
There are five distinct samples if the order of the values is ignored.
Each is equally likely, with a probability of 1 in 5, i.e., 0.2.
Sample   Values          Probability   Mean
3        {2, 4, 6, 8}    0.2           5
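The full list of samples can be generated with `combn()`. This is an illustrative sketch; the row order it produces need not match the sample numbering used in the original table.

```r
pop     <- c(2, 5, 4, 6, 8)
samples <- combn(pop, 4)      # each column is one unordered sample of size 4

tab <- data.frame(
  sample = apply(samples, 2, function(s) paste(sort(s), collapse = ", ")),
  prob   = 1 / ncol(samples),  # all samples are equally likely under SRS
  mean   = colMeans(samples)
)
tab
```

Each sample simply leaves out one of the five population values, which is why there are exactly five of them.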
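\[ E(\bar{Y}) = \sum_{i=1}^{5} \bar{y}_i \cdot \Pr(\bar{Y} = \bar{y}_i) = 5 \]
\[ E(\bar{Y}^2) = \sum_{i=1}^{5} \bar{y}_i^2 \cdot \Pr(\bar{Y} = \bar{y}_i) = 25.25 \]
\[ \mathrm{Var}(\bar{Y}) = E(\bar{Y}^2) - [E(\bar{Y})]^2 = 25.25 - 5^2 = 0.25 \]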
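These expectations can be computed directly from the five possible sample means. A minimal sketch, with variable names chosen here for illustration:

```r
pop   <- c(2, 5, 4, 6, 8)
ybars <- colMeans(combn(pop, 4))                 # the five possible sample means
p     <- rep(1 / length(ybars), length(ybars))   # each has probability 0.2

e_y  <- sum(ybars * p)       # E(Ybar)
e_y2 <- sum(ybars^2 * p)     # E(Ybar^2)
v_y  <- e_y2 - e_y^2         # Var(Ybar)
c(E = e_y, E2 = e_y2, Var = v_y)
```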
d) Calculate the variance of the sample mean (not using the information in part
b) using the formula derived in Week 2 (see Slide 39), and compare it with the
variance obtained in part c.
\[ \mathrm{Var}(\bar{y}) = \frac{\sigma^2}{n}(1 - f) = \frac{5}{4}\left(1 - \frac{4}{5}\right) = 0.25 \]
This is identical to the variance of the sample mean found in c), as expected.
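The agreement is not special to n = 4. As a rough check, the sketch below compares the formula with a full enumeration for several sample sizes; `by_formula` and `by_listing` are names invented for this illustration.

```r
pop <- c(2, 5, 4, 6, 8)
N   <- length(pop)

# Variance of the sample mean from the formula (sigma^2 / n) * (1 - n / N).
by_formula <- function(n) (var(pop) / n) * (1 - n / N)

# Variance of the sample mean from listing every possible sample of size n.
by_listing <- function(n) {
  ybars <- colMeans(combn(pop, n))
  mean(ybars^2) - mean(ybars)^2    # samples are equally likely
}

sapply(2:4, function(n) c(n = n, formula = by_formula(n), listing = by_listing(n)))
```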
e) Do you think that the variance of the mean of a sample of size 4 drawn
with replacement will be greater than that from sampling
without replacement? Do not carry out any calculation, but give your answer
and the reasons for it.
Certainly. When sampling with replacement you could get more extreme samples,
giving very small or very large means and thus greater variability and a larger
variance, e.g., sample (2, 2, 2, 2) with a mean of 2, or sample (8, 8, 8, 8)
with a mean of 8.
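Although the question asks for reasoning only, the claim can be confirmed afterwards by enumeration. The sketch below lists all ordered samples of size 4 drawn with replacement and compares the variance of their means with the without-replacement value from part d; it is an optional check, not part of the expected answer.

```r
pop <- c(2, 5, 4, 6, 8)

# All 5^4 = 625 ordered samples of size 4 drawn with replacement.
draws <- expand.grid(pop, pop, pop, pop)
ybars <- rowMeans(draws)

var_wr  <- mean(ybars^2) - mean(ybars)^2    # each ordered sample has prob 1/625
var_wor <- (var(pop) / 4) * (1 - 4 / 5)     # without replacement, from part d

c(with_replacement = var_wr, without_replacement = var_wor)
range(ybars)    # the extreme means come from (2, 2, 2, 2) and (8, 8, 8, 8)
```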