
STK4610 - Statistical Methods for Social Sciences: Survey Sampling Spring 2020, UiO

Exercise Sheet for Session 1


1. Amazon books (www.amazon.com) summarizes reader reviews of the books it sells. Persons who want
to review a book can submit a review online; Amazon then reports the average rating from all reader
reviews on its website. (Lohr, 2019, p.19)

2. Potential jurors in some jurisdictions are chosen from a list of county residents who are registered voters
or licensed drivers over age 18. In the fourth quarter of 1994, 100 300 jury summons were mailed to
Maricopa County, Arizona, residents. Approximately 23 000 of those were returned from the post
office as undeliverable. Approximately 7 000 persons were unqualified for service because they were
not citizens, were under 18, were convicted felons, or other reason that disqualified them from serving
on a jury. An additional 22 000 were excused from jury service because of illness, financial hardship,
military service, or other acceptable reason. The final sample consists of persons who appear for jury
duty; some unexcused jurors fail to appear. (Lohr, 2019, p.20)

3. A survey is conducted to find the average weight of cows in a region. A list of all farms is available for
the region, and 50 farms are selected at random. Then the weight of each cow at the 50 selected farms
is recorded. (Lohr, 2019, p.20)

4. The American Statistical Association sent the following e-mail with subject line “Joint Statistical
Meetings 2005 Participants Survey” to a sample of persons who attended the 2005 Joint Statistical
Meetings: “Thank you for attending the 2005 Joint Statistical Meetings (JSM) in Minneapolis,
Minnesota. We need your help to complete an online survey about the JSM. Because the quality of the
JSM is very important, a survey is being conducted to find out how we might improve future meetings.
We would like to get your opinion about various aspects of the 2005 meeting and your preferences for
2006 and beyond.
You are part of a small sample of conference registrants who have been selected randomly to
participate in the survey. We hope you will take the time to complete this short questionnaire online at
www.amstat.org/meetings/jsm/2005/survey. In order to tabulate and analyze the data, please submit
your response by mid-September 2005.” (Lohr, 2019, p.21)

5. Farkas and Johnson (1997) report on a survey of professors of education taken in summer of 1997 and
conclude that there is a large disparity between the views of education professors and those of the
general public. A sample of 5 324 education professors was drawn from a population of about
34 000 education professors in colleges and universities across the country. A letter was mailed to each
professor in the sample in May 1997, inviting him or her to participate and to provide a number where
he or she could be reached during the summer for a telephone interview. During the summer, a total
of 778 interviews were completed by telephone. An additional 122 interviews were obtained by calling
professors in the sample at work in August and September. To attempt to minimize question order
effects, the survey was pretested and some questions were asked in random order.
Respondents were asked which in a series of qualities were “absolutely essential” to be imparted to
prospective teachers: 84% of the respondents selected having teachers who are “life-long learners and
constantly updating their skills”; 41%, having teachers “trained in pragmatic issues of running a
classroom such as managing time and preparing lesson plans”; 19%, for teachers to “stress correct
spelling, grammar, and punctuation”; and 12%, for teachers to “expect students to be neat, on time,
and polite” (p. 30). (Lohr, 2019, p.21)

6. Kripke et al. (2002) claim that persons who sleep 8 or more hours per night have a higher mortality
risk than persons who sleep 6 or 7 hours. They analyzed data from the 1982 Cancer Prevention Study
II of the American Cancer Society, a national survey taken by about 1.1 million people. The survival
or date of death was determined for about 98% of the sample six years later. Most of the respondents
were friends and relatives of American Cancer Society volunteers; the purpose of the original survey
was to explore factors associated with the development of cancer, but the survey also contained a few
questions about sleep and insomnia. (Lohr, 2019, p.21)

7. Consider the population in Example 2.2: Let U be a population of size N = 8 with the index set
U = {1, 2, 3, 4, 5, 6, 7, 8}. The values of y_i are

i     1  2  3  4  5  6  7  8
y_i   1  2  4  4  7  7  7  8

For this population, consider the following sampling scheme:

S            P(S)
{1,3,5,6}    1/8
{2,3,7,8}    1/4
{1,4,6,8}    1/8
{2,4,6,8}    3/8
{4,5,7,8}    1/8

a. Find the probability of selection π_i for each unit i.


b. What is the sampling distribution of t̂ = 8ȳ? (Lohr, 2019, p.62)
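
One way to check hand calculations for Exercise 7 is a short script. The sketch below is in Python and uses exact fractions; it is not the course's own code, and the names `design`, `inclusion_probabilities` and `distribution_of_total` are illustrative assumptions. It sums P(S) over the samples that contain unit i to get π_i, and collects the values of t̂ = 8ȳ with their probabilities.

```python
from fractions import Fraction

# Population values y_i from the exercise, keyed by unit index i.
y = {1: 1, 2: 2, 3: 4, 4: 4, 5: 7, 6: 7, 7: 7, 8: 8}

# Sampling design from the exercise: each possible sample S with P(S).
design = {
    (1, 3, 5, 6): Fraction(1, 8),
    (2, 3, 7, 8): Fraction(1, 4),
    (1, 4, 6, 8): Fraction(1, 8),
    (2, 4, 6, 8): Fraction(3, 8),
    (4, 5, 7, 8): Fraction(1, 8),
}


def inclusion_probabilities(design, units):
    """pi_i = sum of P(S) over all samples S that contain unit i."""
    return {
        i: sum((p for s, p in design.items() if i in s), Fraction(0))
        for i in units
    }


def distribution_of_total(design, y):
    """Sampling distribution of t_hat = N * ybar as {value: probability}."""
    N = len(y)
    dist = {}
    for s, p in design.items():
        t_hat = Fraction(N * sum(y[i] for i in s), len(s))
        dist[t_hat] = dist.get(t_hat, Fraction(0)) + p
    return dist


pi = inclusion_probabilities(design, y)
t_dist = distribution_of_total(design, y)

for i, p in pi.items():
    print(f"pi_{i} = {p}")
for t, p in sorted(t_dist.items()):
    print(f"P(t_hat = {t}) = {p}")
```

A useful sanity check is that the π_i sum to the fixed sample size, here 4.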

8. (R code available) For the population given in Exercise 7, find the sampling distribution of ȳ for
a. an SRS of size 3 (without replacement)
b. an SRSWR of size 3 (with replacement).
For each, draw the histogram of the sampling distribution of ȳ. Which sampling distribution has the
smaller variance, and why? (Lohr, 2019, p.62)
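
The sheet refers to R code for this exercise, which is not reproduced here. As an alternative, the following Python sketch enumerates all samples directly. It assumes every unordered sample of 3 distinct units is equally likely under SRS and every ordered triple is equally likely under SRSWR. The helper names and the text histogram are illustrative choices, not part of the original exercise.

```python
from fractions import Fraction
from itertools import combinations, product

y = [1, 2, 4, 4, 7, 7, 7, 8]  # population values from Exercise 7
n = 3


def sampling_distribution(samples):
    """Distribution of ybar when every listed sample is equally likely."""
    samples = list(samples)
    weight = Fraction(1, len(samples))
    dist = {}
    for s in samples:
        ybar = Fraction(sum(s), len(s))
        dist[ybar] = dist.get(ybar, Fraction(0)) + weight
    return dist


def mean_and_variance(dist):
    """Mean and variance of a discrete distribution {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    var = sum((v - mean) ** 2 * p for v, p in dist.items())
    return mean, var


def text_histogram(dist, total):
    """One line per ybar value; bar length = number of samples giving it."""
    return [
        f"{float(v):5.2f} {'#' * int(p * total)}"
        for v, p in sorted(dist.items())
    ]


# combinations() works by position, so tied y-values count as distinct units.
srs = sampling_distribution(combinations(y, n))      # 56 samples
srswr = sampling_distribution(product(y, repeat=n))  # 512 ordered samples

mean_srs, var_srs = mean_and_variance(srs)
mean_wr, var_wr = mean_and_variance(srswr)

print("\n".join(text_histogram(srs, 56)))
print("SRS:   mean", mean_srs, "variance", var_srs)
print("SRSWR: mean", mean_wr, "variance", var_wr)
```

The two variances can be compared with the textbook formulas (1 − n/N)S²/n for SRS and σ²/n for SRSWR.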

9. A letter in the December 1995 issue of Dell Champion Variety Puzzles stated: “I’ve noticed over the
last several issues there have been no winners from the South in your contests. You always say that
winners are picked at random, so does this mean you’re getting fewer entries from the South?” In
response, the editors took a random sample of 1 000 entries from the last few contests, and found that
175 of those came from the South.
a. Find a 95% CI for the percentage of entries that come from the South.
b. According to Statistical Abstract of the United States, 30.9% of the U.S. population live in states
that the editors considered to be in the South. Is there evidence from your CI that the percentage
of entries from the South differs from the percentage of persons living in the South? (Lohr,
2019, p.63)
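
For part (a), a minimal Python sketch of a normal-approximation interval is given below. It rests on assumptions not stated in the exercise: the finite population correction is ignored because the total number of entries is unknown, z = 1.96 is used for 95% confidence, and the standard error uses the n − 1 denominator. The function name `proportion_ci` is hypothetical.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Normal-approximation CI for a proportion, ignoring the fpc (assumption)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / (n - 1))
    return p_hat - z * se, p_hat + z * se


lo, hi = proportion_ci(175, 1000)
print(f"95% CI for the proportion from the South: [{lo:.4f}, {hi:.4f}]")

# Part (b): compare the interval with the population share of 30.9%.
print("0.309 inside the CI?", lo <= 0.309 <= hi)
```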

10. Which of the following SRS designs will give the most precision for estimating a population mean?
Assume that each population has the same value of the population variance S².
i. An SRS of size 400 from a population of size 4 000
ii. An SRS of size 30 from a population of size 300
iii. An SRS of size 3 000 from a population of size 300 000 000 (Lohr, 2019, p.63)
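
Since S² is the same for all three populations, the designs can be compared through the factor (1 − n/N)/n that multiplies S² in V(ȳ) for an SRS without replacement. The following Python sketch computes that factor; the dictionary layout and function name are illustrative assumptions.

```python
# (n, N) for each design listed in the exercise.
designs = {"i": (400, 4_000), "ii": (30, 300), "iii": (3_000, 300_000_000)}


def variance_factor(n, N):
    """V(ybar) / S^2 = (1 - n/N) / n for an SRS without replacement."""
    return (1 - n / N) / n


factors = {label: variance_factor(n, N) for label, (n, N) in designs.items()}
most_precise = min(factors, key=factors.get)

for label, f in factors.items():
    print(f"design {label}: V(ybar) = {f:.6f} * S^2")
print("smallest variance:", most_precise)
```

Designs i and ii have the same sampling fraction of 10%, which makes it easy to see how much of the precision comes from n itself rather than from n/N.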

11. The percentage of patients overdue for a vaccination is often of interest for a medical clinic. Some clinics
examine every record to determine that percentage; in a large practice, though, taking a census of the
records can be time-consuming. Cullen (1994) took a sample of the 580 children served by an Auckland
family practice to estimate the proportion of interest.
a. What sample size in an SRS (without replacement) would be necessary to estimate the proportion
with 95% confidence and margin of error 0.10?
b. Cullen actually took an SRS with replacement of size 120, of whom 27 were not overdue for
vaccination. Give a 95% CI for the proportion of children not overdue for vaccination. (Lohr,
2019, p.64)

12. The Special Census of Maricopa County, Arizona, gave 1995 populations for the following cities:

City         Population
Buckeye           4 857
Gilbert          59 338
Gila Bend         1 724
Phoenix       1 149 417
Tempe           153 821

Suppose that you are interested in estimating the percentage of persons who have been immunized
against polio in each city and can take an SRS of persons. What should your sample size be in each
of the 5 cities if you want the estimate from each city to have margin of error of 4 percentage points?
For which cities does the finite population correction make a difference? (Lohr, 2019, p.65)
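
A possible numerical approach is sketched below in Python. It assumes a 95% confidence level (z = 1.96) and the conservative planning value p = 0.5, neither of which is stated in the exercise. It first computes the sample size n0 = z²p(1 − p)/e² that ignores the fpc and then adjusts it with n = n0/(1 + n0/N). The names used are illustrative.

```python
import math

# 1995 populations from the table in the exercise.
populations = {
    "Buckeye": 4_857,
    "Gilbert": 59_338,
    "Gila Bend": 1_724,
    "Phoenix": 1_149_417,
    "Tempe": 153_821,
}

Z = 1.96   # assumed: 95% confidence
P = 0.5    # assumed: conservative planning value for the proportion
E = 0.04   # margin of error of 4 percentage points

# Sample size ignoring the finite population correction.
n0 = Z**2 * P * (1 - P) / E**2


def sample_size_with_fpc(N, n0=n0):
    """Adjust n0 for a finite population of size N, rounding up."""
    return math.ceil(n0 / (1 + n0 / N))


sizes = {city: sample_size_with_fpc(N) for city, N in populations.items()}

print(f"without fpc: n0 = {math.ceil(n0)}")
for city, n in sizes.items():
    print(f"{city:10s} n = {n}")
```

Comparing each city's n with n0 shows where the fpc matters: the smaller the city, the larger the reduction.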

13. (R code available) Suppose we are interested in estimating the proportion p of a population that has
a certain disease. As in Section 2.3, let y_i = 1 if person i has the disease, and y_i = 0 if person i does
not have the disease. Then p̂ = ȳ.

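a. Show, using

$$\mathrm{CV}(\bar{y}) = \frac{\sqrt{V(\bar{y})}}{E(\bar{y})} = \sqrt{1 - \frac{n}{N}} \, \frac{S}{\sqrt{n}\, \bar{y}_U},$$

that

$$\mathrm{CV}(\hat{p}) = \sqrt{\frac{N - n}{N - 1} \cdot \frac{1 - p}{n p}}.$$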

If the population is large and the sampling fraction is small, so that (N − n)/(N − 1) ≈ 1, write

$$n = \frac{z_{\alpha/2}^2 S^2}{(r \bar{y}_U)^2 + \dfrac{z_{\alpha/2}^2 S^2}{N}} \tag{1}$$

where r is the relative precision, in terms of the CV for a sample of size 1.


b. Suppose that the fpc, 1 − n/N, is approximately 1. Consider populations with p taking the successive values
0.001, 0.005, 0.01, 0.05, 0.10, 0.30, 0.50, 0.70, 0.90, 0.95, 0.99, 0.995, 0.999.
For each value of p, find the sample size needed to estimate the population proportion (a) with
fixed margin of error 0.03, using

$$n = \frac{z_{\alpha/2}^2 S^2}{e^2 + \dfrac{z_{\alpha/2}^2 S^2}{N}},$$

and (b) with relative error 0.03p using (1). What happens to the sample sizes for small values of
p? (Lohr, 2019, pp.66-67)
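
The sheet's R code for part (b) is not reproduced here. As a rough alternative, the Python sketch below tabulates both sample sizes under stated assumptions: z = 1.96, S² ≈ p(1 − p), and N large enough that the z²S²/N term in the denominator can be dropped. With ȳ_U = p, formula (1) then reduces to n = z²(1 − p)/(r²p). The function names are illustrative.

```python
import math

Z = 1.96  # assumed: 95% confidence
p_values = [0.001, 0.005, 0.01, 0.05, 0.10, 0.30, 0.50,
            0.70, 0.90, 0.95, 0.99, 0.995, 0.999]


def n_fixed(p, e=0.03, z=Z):
    """Sample size for a fixed margin of error e, with S^2 ~ p(1 - p)."""
    return z**2 * p * (1 - p) / e**2


def n_relative(p, r=0.03, z=Z):
    """Sample size for relative precision r, i.e. margin of error r * p."""
    return z**2 * (1 - p) / (r**2 * p)


table = [(p, math.ceil(n_fixed(p)), math.ceil(n_relative(p))) for p in p_values]

print(f"{'p':>6} {'fixed e':>10} {'relative r':>12}")
for p, nf, nr in table:
    print(f"{p:>6} {nf:>10} {nr:>12}")
```

The fixed-margin column is symmetric in p and 1 − p, while the relative-precision column is not, which is the contrast the question about small p is aimed at.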
