sample size estimation

© All Rights Reserved

5 views

sample size estimation

© All Rights Reserved

- Tn 1117 Poll
- FY 2014 Fair Market Rent Documentation System — Calculation for Santa Clara County, California
- Acceptance Sampling
- Sampling Methods
- All Methods of Data Collection Can Supply Quantitative Data
- SMU MBA Semester 3 Summer 2012 drive Solved Assignments
- Sampling.pptx
- EarthFix Survey Web Version
- 20405095-Marketing-Research-Proposal-nasha-arabian-styled-cafe.doc
- COMM 215 Tutorial Problems for Final Exam
- ri-1
- (1970) krejcie
- Safari
- Sampling Frame
- Research Methodology
- Research Methods 101
- East Africa Resilience
- wofkt3researchexperimentpaper
- 5842572 (1)
- description: tags: NTR11031

You are on page 1of 6

Center for Infectious Disease Preparedness

University of California at Berkeley

15 August 2007

1 Introduction

It is important to understand how to analyze complex survey samples using

statistical packages like R. Due to limited time and resources, we are often not

able to take measurements on everyone in a population under study. As a result,

we may take a sample of the population using the various techniques we have

for sampling. After we collect the data, the fundamental questions to ask are

What results would we obtain if we had data on the entire population?. Since

we did not take measurements on everyone, we will need to use our study sample

as a representation of our total population.

The purpose of this article is to illustrate how to calculate a desired sample

size estimate for a simple proportion. The statistical program R will be used to

illustrate the calculations.

2 The Estimation of Sample Size Assuming Sim-

ple Random Sampling

In setting up a survey, one needs to think about taking a random sample of some

nite size and collecting data on each sampling unit. A fundamental question

in thinking about a sample is how many sampling units do I include in my

sample?. In order to perform a sampling size calculation, an investigator needs

to identify two thing:

1. the margin of error we will allow; for example, we would like to estimate

the proportion of people who die after a disaster with a margin of error of

plus or minus 10% (degree of accuracy).

2. the level of statistical certainty or condence interval (alpha = probability

of Type I error);

Thus, we designate that there is some margin of error d in the estimated

proportion p of units in relation to the true proportion P that one could ex-

pect. There is some (hopefully small) risk alpha that we are willing to bet that

the actual error is larger than d. In doing our sample size calculation, we are

assuming:

1

Pr(|p P| d) = (1)

If simple random sampling is assumed, we can estimate p from the data and

calculate the variance of p, V(p), :

V (p) = E(p P)

2

=

S

2

n

(

N n

N

) =

PQ

n

(

N n

N 1

) (2)

and the standard error of p,

p

:

p

=

V (p) =

N n

N 1

PQ

n

(3)

If we assume that the distribution of this proportion is normally distributed,

we can then express the degree of precision, d, with the sample size, n, as follows

(Cochran 1977):

d = Z

N n

N 1

PQ

n

(4)

where Z is the abscissa of the normal curve that cuts an area of at the

two tails (Cochran 1977). By solving for n, we get:

n =

Z

2

PQ

d

2

1 +

1

N

(

Z

2

PQ

d

2

1)

(5)

For practical use, we can substitute an advance estimate of p for P. If N is

very large, the equation reduces to:

n

0

=

Z

2

pq

d

2

. (6)

Thus, we can use this formula in our calculation of a desired sample size

(assuming simple random sampling).

3 Sample Size Calculations using R

In R, we will rst create a function that takes several inputs and gives us the

answer we want: a sample size, n. In order for our function to work, we need

to give it certain values often referred to as parameters. For a sample size

calculation, these parameters are:

abscissa of the normal curve (called Z);

estimated proportion that one is trying to estimate in the population

(called p);

degree of accuracy desired, assumed to be half the condence interval

(called d);

the probability of Type I error (called alpha).

2

> srssamp <- function(alpha, pp, dd) {

+ zz <- qnorm(1 - alpha/2)

+ qq <- 1 - pp

+ nn <- (zz^2 * pp * qq)/dd^2

+ nn

+ }

Now we can run our new function, giving the function the parameters it

needs to calculate sample size.

> srssamp(alpha = 0.05, pp = 0.5, dd = 0.1)

[1] 96.03647

Note that for a proportion equal to 50 percent + or - 10 percent, the es-

timated sample size is the widely quoted n=96 individuals, assuming simple

random sample (Brogan 1994, p. 303). The nice thing about our function is

that we can now vary the input parameters to see how the inputs change our

estimate of the desired sample size. For a margin of error equal to 5 percent or

15 percent (versus 10 percent), we can easily calculate this with our function by

changing our input value for dd.

> srssamp(alpha = 0.05, pp = 0.5, dd = 0.05)

[1] 384.1459

> srssamp(alpha = 0.05, pp = 0.5, dd = 0.15)

[1] 42.68288

We can also look at the estimated sample size and how it varies with varying

proportions that we are trying to estimate in the population.

> estprop <- seq(0.01:1, by = 0.01)

> srssamp(alpha = 0.05, pp = estprop, dd = 0.1)

[1] 3.803044 7.529259 11.178645 14.751202 18.246929 21.665828 25.007897 28.273137

[9] 31.461548 34.573129 37.607882 40.565805 43.446899 46.251164 48.978600 51.629207

[17] 54.202984 56.699932 59.120051 61.463341 63.729802 65.919433 68.032236 70.068209

[25] 72.027353 73.909668 75.715153 77.443810 79.095637 80.670635 82.168804 83.590144

[33] 84.934655 86.202336 87.393188 88.507211 89.544405 90.504770 91.388305 92.195012

[41] 92.924889 93.577937 94.154156 94.653545 95.076106 95.421837 95.690739 95.882812

[49] 95.998056 96.036471 95.998056 95.882812 95.690739 95.421837 95.076106 94.653545

[57] 94.154156 93.577937 92.924889 92.195012 91.388305 90.504770 89.544405 88.507211

[65] 87.393188 86.202336 84.934655 83.590144 82.168804 80.670635 79.095637 77.443810

[73] 75.715153 73.909668 72.027353 70.068209 68.032236 65.919433 63.729802 61.463341

[81] 59.120051 56.699932 54.202984 51.629207 48.978600 46.251164 43.446899 40.565805

[89] 37.607882 34.573129 31.461548 28.273137 25.007897 21.665828 18.246929 14.751202

[97] 11.178645 7.529259 3.803044 0.000000

We can modify our program if we want to include an estimated sample size

for a cluster sample of a given design eect.

3

> de.samp <- function(alpha, pp, dd, deseff) {

+ zz <- qnorm(1 - alpha/2)

+ qq <- 1 - pp

+ nn <- deseff * ((zz^2 * pp * qq)/dd^2)

+ nn

+ }

> de.samp(alpha = 0.05, pp = 0.5, dd = 0.1, deseff = 2)

[1] 192.0729

We could still use this new function if we wanted to calculate a sample size

assuming simple random sampling by assuming that the design eect equals 1.0.

> de.samp(alpha = 0.05, pp = 0.5, dd = 0.1, deseff = 1)

[1] 96.03647

Lets modify our function to include a plot of the estimated sample size by

the estimate proportion we are trying to measure in the population.

> de.samp.plot <- function(alpha, pp, dd, deseff, sampplot = c("TRUE", "FALSE")) {

+ sampplot <- match.arg(sampplot, several.ok = FALSE)

+ zz <- qnorm(1 - alpha/2)

+ qq <- 1 - pp

+ nn <- deseff * ((zz^2 * pp * qq)/dd^2)

+ if (sampplot == "TRUE") {

+ nn

+ ylim.max <- max(nn) + 10

+ plot(pp, nn, type = "b", xlab = "Estimated Proportion", ylab = "Estimated Sample Size",

+ xlim = range(0, 1), ylim = range(0, ylim.max))

+ }

+ else {

+ nn

+ }

+ }

And now, lets run our new function.

4

0.0 0.2 0.4 0.6 0.8 1.0

0

2

0

4

0

6

0

8

0

1

0

0

Estimated Proportion

E

s

t

i

m

a

t

e

d

S

a

m

p

l

e

S

i

z

e

If we wanted to run the function without producing the graph, we could

modify our command.

> de.samp.plot(alpha = 0.05, pp = estprop, dd = 0.1, deseff = 1, sampplot = "FALSE")

[1] 18.24693 34.57313 48.97860 61.46334 72.02735 80.67064 87.39319 92.19501 95.07611 96.03647

[11] 95.07611 92.19501 87.39319 80.67064 72.02735 61.46334 48.97860 34.57313 18.24693

If we want to calculate the desired sample size for a rapid needs assessment

assuming that our design eect is equal to 2.0, we can re-run our function and

produce a graph.

5

0.0 0.2 0.4 0.6 0.8 1.0

0

5

0

1

0

0

1

5

0

2

0

0

Estimated Proportion

E

s

t

i

m

a

t

e

d

S

a

m

p

l

e

S

i

z

e

4 References

1. Brogan D, Flagg EW, Deming M, Waldman R. Increasing the accuracy

of the Expanded Programme on Immunizations Cluster Survey Design.

Ann Epidemiol 1994;4:302-311.

2. Cochran WG. Sampling Techniques. John Wiley & Sons:New York, 1977.

pp.74-76.

6

- Tn 1117 PollUploaded byDan Lehr
- FY 2014 Fair Market Rent Documentation System — Calculation for Santa Clara County, CaliforniaUploaded byhiph0pstah
- Acceptance SamplingUploaded byMg Garcia
- Sampling MethodsUploaded byMadhusudan Partani
- All Methods of Data Collection Can Supply Quantitative DataUploaded bypeters_477
- Sampling.pptxUploaded byAmeer Akram
- EarthFix Survey Web VersionUploaded byearthfixteam
- 20405095-Marketing-Research-Proposal-nasha-arabian-styled-cafe.docUploaded byAminura
- SMU MBA Semester 3 Summer 2012 drive Solved AssignmentsUploaded byRajdeep Kumar
- COMM 215 Tutorial Problems for Final ExamUploaded byRosalie-Ann Massé
- ri-1Uploaded byMadhu Bala
- (1970) krejcieUploaded byLuis Pineda
- SafariUploaded byRavinder Chopra
- Sampling FrameUploaded byHaider Khan
- Research MethodologyUploaded byMurali Thiru
- Research Methods 101Uploaded byedwin
- East Africa ResilienceUploaded byedwin
- wofkt3researchexperimentpaperUploaded byapi-302363741
- 5842572 (1)Uploaded byKenesa
- description: tags: NTR11031Uploaded byanon-707380
- 08d - Sep 6 - Test 1 Review - Solutions (1)Uploaded byJiadong Yang
- br-140221062942-phpapp02Uploaded bythensuresh
- Town Planning ReportUploaded byArshdeep Kaur
- Kshitij RoughUploaded byNeeraj solanki
- presentation on synopsisUploaded byatulunik
- 2020 elections surveymonkey report 1Uploaded byapi-464875919
- Pistachio Ppt Research ProposalUploaded byRyan Bermundo
- 5Uploaded byuser 02
- WI 4134 Inspection CertificateUploaded byselcuk
- BaoCaoNganhThucPhamNamPhi2007Uploaded byTiếp Híp

- Compound InterestUploaded bysakthiprime
- e Shopping and consumer behaviourUploaded bysakthiprime
- Sq RetbankUploaded bysakthiprime
- Customer LoyaltyUploaded bysakthiprime
- Aattukaal PayaUploaded bysakthiprime
- Micro Finance^^Mridul AggarwalUploaded bysakthiprime
- e 0114049Uploaded bysakthiprime
- E0114049.pdfUploaded bysakthiprime
- PURCHASE BEHAVIOUR QUESTIONNAIREUploaded bysakthiprime
- Discriminant-analysis_explained.epubUploaded bysakthiprime
- Structure in CUploaded bysakthiprime
- Overview Internet ShoppingUploaded bysakthiprime
- AFL_CODE.docUploaded bysakthiprime
- AppendicesUploaded bysakthiprime
- 189.docUploaded bysakthiprime
- Cystic FibrosisUploaded bysakthiprime
- Estimating FirmvalueUploaded bysakthiprime
- Debt and Capital StructureUploaded bysakthiprime
- Compound Interest HandoutUploaded bysakthiprime
- Bank Service ChargeUploaded bySumat Jain
- di-gameUploaded byarifakazi
- Create a New Browser User ProfileUploaded bysakthiprime

- X-Plane Helicopter ManualUploaded byErnani Kern
- Airbus FAST Special Edition Oct2015Uploaded byrenjithaero
- Wind Turbine Blade DesignUploaded byluffydmon
- Emailing Motorola GM339 User GuideUploaded byBulls Tadili
- intervention program peer review rubricUploaded byapi-278394251
- Philips - Flood LightingUploaded byrumahsketch
- 1 - Chiller Running Condition.pdfUploaded bymahmoud4871
- Symphony Ltd launches mobile commercial air cooler range - 'MobiCool' [Company Update]Uploaded byShyam Sunder
- ms275xUploaded byalbin_grau
- Valve Lash Clearances CorrectUploaded byMohd Ridzu Hus
- Density&Viscosity CalculatorUploaded byRanjit Rjt
- 8902-11174-1-PB.pdfUploaded byAndika Fahrurozi
- Label free biosensorUploaded byMamidala Jagadesh Kumar
- Blanchard e Quah 1989 the Dynamic Effects of Aggregate Demand and Supply DisturbancesUploaded byArley Rodrigues Bezerra
- The Theory-Based View: Economic Actors as TheoristsUploaded byTeppo Felin
- About Surrogate KeysUploaded bymichaela1012
- induction.pdfUploaded byJobian Gutierrez
- BS 7910Uploaded byrastogi_rohit
- Tim Rifat - Defeating Radio Based Mind Control SystemsUploaded byGary Beaver
- ARMY TM 9-2320-266-24P Mantainance Manual Dodge M880 1-¼ TON 4X4 JUL91.pdfUploaded byRockWagon
- PARALLEL DRIVING OF SYNCHRONOUS GENERATORUploaded byMutevu Steve
- 22748Uploaded bysuranga
- M02_EXP_SRB_PROGLB_9005_U02.pdfUploaded byRubens Viegas
- G Drive Engines Weichai Oman MuscatUploaded bymustafa
- 2012-01-07-SEARI-WeldsUploaded byDonna Boncan
- RESol AL Installation ManualUploaded byDan Jung
- Week 3 Relations & FunctionsUploaded bydip8200
- Sap Apo Dp NutsellUploaded bySundaran Nair
- Clarity in Technical ReportingUploaded byKim Lingo
- JNTUUploaded bybrijkishor2017