© All Rights Reserved

6 views

© All Rights Reserved

- 2.1 Data Analysis
- Determining the Sample Size (Continuous Data)
- Statistics Exam Question 1
- ITK 226 2 Statistics
- tmpA3B3.tmp
- Antimicrob. Agents Chemother. 2009 Chinh 828 31
- Chapter 12 Work Sampling1
- Chap4-1 Poin Est and Hypothesis Testing
- Project report on House Hold Items
- Carrying Passengers as a Risk Factor for Crashes Fatal to 16- And 17-Year-Old Drivers
- Chap10Slides.pdf
- Chapter 10 - Estimating With Confidence
- Sampling and Data Collection
- asfasf
- Stratified.pdf
- Anova
- Normal Distribution Tolerance Sample Size Calculator [EDocFind[2][1].Com]
- Chapter08 New
- Guide_To_Great_Survey_Design.pdf
- Confidence Intervals for Cpk

You are on page 1of 3

How big should my sample size be? Statisticians are often asked this

question. When studies involve data in the form of counts or proportions, the

best answer is probably, As big as you can afford. The reason for this is

that there is surprisingly little information in such data, even from quite big

studies.

For example, in a 1991 poll of 1, 000 adult Americans, 45% said that they

would be prepared to ask a child under 18 to give up a kidney for a transplant into a relative. The corresponding 95% confidence interval for the true

percentage in the population who would do this is given by [42%, 48%], an

interval of width 6%. This is fairly imprecise, and yet it is based upon a

sample of 1, 000 which is an unbelievably large sample size for researchers in

most areas outside of survey sampling. And even with this study of 1, 000

people, if you wanted to look at subgroups, e.g. by age and sex, the sample

sizes of the individual groups of interest soon become small, and the resulting

confidence intervals are much wider than the 6% above.

1994 World Heath Organization figures rated Hepatitis B as the fourth

biggest killer among the worlds infectious diseases.1 Suppose that we wanted

to take a sample of the citizens of a particular city to determine the percentage

of people who have, or are carrying, Hepatitis B. Suppose that the level of

precision we require is such that a 95% confidence interval is no wider than

5% (i.e. 0.05).

Assuming n is large enough, a confidence

p interval for the true proportion p

is given by pb z se(b

p ), where se(b

p ) = pb (1 pb )/n. The interval has

Width = 2 z

p

pb (1 pb )/n.

which is twice the margin of error . The above expression involves pb which is

unknown until the study is finished. We will postpone considering what to do

about pb for the time being. Now, suppose that we want the margin of error

to be no more than m. We therefore want

z

p

pb (1 pb )/n m.

z2

1 1-2

pb (1 pb )

m2 .

n

million deaths per year. Follows Acute Respiratory infections (4.3 million), Diarrheal

diseases (3.2 million), and Tuberculosis (3 million). AIDS (550,000) was 10th.

2

Solving this equation for n gives

z 2

pb (1 pb )

m

The boxed equation still depends upon the eventual pb, which is unknown when

one is planning the survey. However, we see that the bigger pb (1 pb ) is, the

larger n has to be. Now, Fig. 1 graphs pb (1 pb ) versus pb, and we see that

pb (1 pb ) takes its biggest value when pb = 12 . Thus, the worst case occurs

when we use pb = 12 in the boxed equation as it leads to the biggest value of

n. Use of this value of n guarantees that the interval will be no wider than w

no matter what pb turns out to be.

^ (1 ^

p

p)

0.25

pb

pb

0

1

.1

.9

.2

.8

.3

.7

.4

.6

.5

.5

pb (1 pb )

.09

.16

.21

.24

.25

Figure 1 :

0.5

^

p

pb (1 pb ) versus pb.

so use z = 1.96. We require an interval no wider than w = 0.05 (since

5% = 5/100 = 0.05), or m = 0.025. Thus, we will have sufficient precision if

we take

2

1.96

1 1

n

= 1536.64.

0.025

2 2

In the interests of using nice round numbers, we would probably end up taking

a sample of around 1500, or 2000 depending upon the budget.

From Fig. 1, we see that pb (1 pb ) can be quite a bit smaller than 12 12 if

pb is close to zero or one. Suppose that we knew that the prevalence of

Hepatitis B in our city was no more than 10%. The sample proportion should

end up roughly in the same sort of range. If we substitute pb = 0.1 into the

equation instead of pb = 12 we get n 553.19. Thus, if we knew p 0.1, we

would know we could be confident of getting the required level of precision

from a much smaller (and therefore cheaper) sample.

Rule for dealing with p

b in the sample size formula: Use p

b = 12 ,

unless it is known that p belongs to an interval a p b that does not

1

2

for pb.

3

To illustrate, if nothing is known about the true value of p, or it is known

that 0.2 p 0.6, or 0.4 p 0.9 substitute pb = 12 . If it is known that

0.02 p 0.10 substitute pb = 0.1, and if 0.7 p 0.9, substitute pb = 0.7.

Example Suppose we require a 95% confidence interval for p to be no wider

than 0.02 (i.e. 2%). Using the formula above with z = 1.96, if nothing is known

about the size of p, we should take

1.96

0.01

1 1

= 9604.

2 2

1.96

0.01

2

0.05 0.95 = 1824.76.

very precisely nearly 10, 000 observations in the first calculation. These

calculations apply only to getting the required precision for estimates of single

proportions. Even larger numbers are required to get this sort of precision

for differences. However, survey organizations seldom sample more than one

or two thousand people. Part of the reason for this is cost. Another reason

is that, in practice, it is very hard to control the level of non-sampling errors

in very large surveys (see Section 1.1.2). There is little point in reducing the

sampling errors below the level of the non-sampling errors (or biases).

Exercises

Estimate the size of sample required to ensure that:

1. A 90% confidence interval for an unknown p is no wider than 0.04.

2. A 99% confidence interval for p expressed as a percentage is no wider than

4% if it is believed that p < 0.2.

3. A 95% confidence interval for an unknown p is no wider than 0.01.

4. A 90% confidence interval for p expressed a a percentage is no wider than

5% if it is believed that p > 0.9.

- 2.1 Data AnalysisUploaded byLei Yin
- Determining the Sample Size (Continuous Data)Uploaded bySurbhi Jain
- Statistics Exam Question 1Uploaded byCOCO
- ITK 226 2 StatisticsUploaded bydgooo
- tmpA3B3.tmpUploaded byFrontiers
- Antimicrob. Agents Chemother. 2009 Chinh 828 31Uploaded byRendra Dananjaya
- Chapter 12 Work Sampling1Uploaded byAnand Kumar Govindraghavan
- Chap4-1 Poin Est and Hypothesis TestingUploaded byCesars Bibiano
- Project report on House Hold ItemsUploaded bySumit Kumar
- Carrying Passengers as a Risk Factor for Crashes Fatal to 16- And 17-Year-Old DriversUploaded byLyn Jones
- Chap10Slides.pdfUploaded bygirithik14
- Chapter 10 - Estimating With ConfidenceUploaded bycamillesyp
- Sampling and Data CollectionUploaded byAbhishek Purohit
- asfasfUploaded byBagus Firman Diatmika
- Stratified.pdfUploaded byNagara Akuma
- AnovaUploaded byanon_789315340
- Normal Distribution Tolerance Sample Size Calculator [EDocFind[2][1].Com]Uploaded byPriyaprasad Panda
- Chapter08 NewUploaded byTrí
- Guide_To_Great_Survey_Design.pdfUploaded byMELHEM_J8008
- Confidence Intervals for CpkUploaded byscjofyWFawlroa2r06YFVabfbaj
- confidence interval 1Uploaded byapi-356215201
- Parental Visiting ADCUploaded byumberto de vonderweid
- CHAPTER-7-PRODMAN.pptxUploaded byangelica labii
- Solution Ch.13Uploaded byBrooke Leverton
- Eka_Aryani_22010112110093_Lap_KTI_BAB_7Uploaded byAldis Al Fadjrin
- presentasi onielUploaded byoniel
- Anis (2008).pdfUploaded byelyesyoussef
- CSR.docxUploaded byyatin patil
- samplemid2-sol.docUploaded byTomas P.
- MINITAB 17 BASICS REFERENCE GUIDE.pdfUploaded bybarkah

- Fernandes Et Al. 2011 GRLUploaded bytaucci123
- NNAS-Juan.docxUploaded byMevelle Laranjo Asuncion
- With Solutions Positive curvilinear assocationUploaded byhelper 2014
- Statistics for Dummies - Cheat SheetUploaded byalastairwong90
- Lecture 1 FIN 100 PPT revised lecture (1).pptxUploaded byPhillip Gordon Mules
- Relationship Between Frequency Speed of Kick Test Performance Optimal Load and AnthropometricUploaded byLucas Mendes
- Quants NotesUploaded byTanvir Ahmed Syed
- Sampling in ResearchUploaded byNishant Jilova
- Hypothesis TestingUploaded byRay Sang
- JMP-Incomplete Block DesignUploaded bymrt5000
- EAL-G23Uploaded byOmar Castellón
- STUDY OF ALIENATION AND VALUES OF STUDENT TEACHERSUploaded byAnonymous CwJeBCAXp
- Andrade Et Al 2016 AJOUploaded byGarfel
- Burnside 2011Uploaded byFernandaMundim
- Stat_testUploaded byAnilkumarsanctum
- ap stat 1-7 notesUploaded byCooper Barth
- Dynamic Response of Pedestrian Bridges for Random Crowd-Loading Brand_Sanjayan_SudburyUploaded byYoshua Yang
- 16MM382Uploaded byAnonymous hSVpr2AraH
- jmvUploaded byArnold Tafur Mendoza
- (Ethics & Research )Questions_ RasmahUploaded byhashem
- measurement & instrumentation NOV 10Uploaded byNurul Wahieda Muhamad Bustaman
- 31 McMahon - A New Method keratoconus.pdfUploaded bySetiawan Arif Wibowo
- The Persistence of Bureaucracy: A Meta-Analysis of Weber’s Model of Bureaucratic ControlUploaded byozpqu
- Titelbaum, Draft of Fundamental of Bayesian Epistemology (Aug 5 16)Uploaded byMichelle Alanis
- Rizzi-Ortuzar-JTEP-06Uploaded byphalanx
- Statistics Made Easy PresentationUploaded byHariomMaheshwari
- Fitting DistributionsUploaded byParikshit Yadav
- Neural Signal Processign Tutorial OneUploaded byAnkan Biswas
- SI Language Eligibility Guidelines ManualUploaded byampicca
- AgainstAllOdds_StudentGuide_Unit30Uploaded byHacı Osman