
(Note: the central limit theorem is difficult to establish in this setting because of the dependence between draws.)

1.1 Introduction

Often we are interested in some characteristics of a finite population, e.g. the average income of last year's graduates from HKUST, or the unemployment rate of last quarter in HK. Since the population is usually very large, we would like to say something (i.e. make inference) about the population by collecting and analyzing only a part of that population. The principles and methods of collecting and analyzing data from a finite population form a branch of statistics known as Sample Survey Methods; the theory involved is called Sampling Theory. Sample surveys are widely used in many areas such as agriculture, education, industry, social affairs, and medicine.

1.2 Some technical terms

(Note: e.g., for the population of HKUST graduates, each individual student is an element.)

• An element is an object on which a measurement is taken.
• A population is a collection of elements about which we require information.
• A population characteristic is the aspect of the population we wish to measure, e.g. the average income of last year's graduates from HKUST, or the total wheat yield of all farmers in a certain country.

(Note: sampling units can sometimes be households or other groups.)

• Sampling units are nonoverlapping collections of elements from the population. Sampling units may be the individual members of the population, or they may be a coarser subdivision of the population, e.g. a household, which may contain more than one individual member.

(Note: to build a frame, order the elements in a list of names, e.g., alphabetically or by student ID.)

• A frame is a list of sampling units, e.g., a telephone directory.
• A sample is a collection of sampling units drawn from a frame or frames.

(Note: e.g., use a sample of 100 people to estimate the salaries of 2000 graduates.)

1.3 Why sampling?

If a sample is equal to the population, we have a census, which contains all the information one wants. However, a census is rarely conducted, for several reasons:


• cost (money is limited),
• time (time is limited),
• destructiveness (testing a product can be destructive, e.g. light bulbs),
• accessibility (non-response can be a serious issue).

In those cases, sampling is the only alternative.

1.4 How to select the sample: the design of the sample survey

The procedure for selecting the sample is called the sample survey design. The general aim of a sample survey is to draw samples that are "representative" of the whole population. Broadly speaking, we can classify sampling schemes into two categories: probability sampling and other (non-probability) sampling schemes.

1. Probability sampling

(Note: with a probability structure we know the random mechanism, so we can put error bounds on our estimates and be confident the truth lies within those bounds.)

This is a sampling scheme whereby the possible samples are enumerated and each has a non-zero probability of being selected. With probability built into the design, we can make statements such as "our estimate is unbiased and we are 95% confident that it is within 2 percentage points of the true proportion". (For non-probability schemes, by contrast, there is no error bound: we do not know the random structure or the performance of the estimators.) In this course, we shall concentrate only on probability sampling.

2. Some other sampling schemes

a) 'Volunteer sampling': e.g. TV telephone polls, medical volunteers for research. (Note: such polls can be inaccurate because people who are not watching the programme are excluded; medical volunteers may be far from a real control group, and psychological effects may distort results.)

b) 'Subjective sampling': we choose samples that we consider to be typical or "representative" of the population. (Note: this is a biased form of sampling; we do not know the effect of the subjective selection.)

c) 'Quota sampling': one keeps sampling until a certain quota is filled.

All these sampling procedures provide some information about the population, but it is hard to deduce the nature of the population from the studies, as the samples are very subjective and often very biased. Furthermore, it is hard to measure the precision of these estimates. (Note: statistics rests on many assumptions; the question is not whether a scheme is "wrong" but how accurate it is. In some situations one may still prefer such biased sampling, but a well-structured sample is needed to address these unknowns.)

1.5 How to design a questionnaire and plan a survey

This can be the most important and perhaps most difficult part of the survey sampling problem. We shall come back to this point in more detail later.


1.6 Some useful websites

Many government statistical organizations and other collectors of survey data now have Web sites where they provide information on the survey design. Here are a few examples. Note that these sites are subject to change, but you should be able to find the organization through a search.

Organization — Address
• Federal Interagency Council of Statistical Policy — www.fedstats.gov
• U.S. Bureau of the Census — www.census.gov
• Statistics Canada — www.statcan.ca
• Statistics Norway — www.ssb.no
• Statistics Sweden — www.scb.se
• UK Office for National Statistics — www.statistics.gov.uk
• Australian Bureau of Statistics — www.abs.gov.au
• Statistics New Zealand — www.stats.govt.nz
• Statistics Netherlands — www.cbs.nl
• Gallup Organization — www.gallup.com
• Nielsen Media Research — www.nielsenmedia.com
• National Opinion Research Center — www.norc.uchicago.edu
• Inter-University Consortium for Political and Social Research — www.icpsr.umich.edu

Chapter 2. Simple random sampling

Simple random sampling is the simplest sampling procedure, and is the building block for other more complicated sampling schemes to be introduced in later chapters.

2.1 How to draw a simple random sample

Definition: If a sample of size n is drawn from a population of size N in such a way that every possible sample of size n has the same probability of being selected, the sampling procedure is called simple random sampling (s.r.s. for short). The resulting sample is called a simple random sample.

Remark: In other statistics courses, by convention, we use upper-case letters like X, Y etc. to denote random variables and lower-case letters like x, y etc. to represent fixed values. However, in survey sampling, we use lower-case letters like y_1, y_2 etc. to denote random variables.

Suppose that the population of size N has values {u_1, u_2, ..., u_N}. There are C(N, n) possible samples of size n. If we assign probability 1/C(N, n) to each of the C(N, n) different samples, then each sample thus obtained is a simple random sample. Denote such a s.r.s. as (y_1, y_2, ..., y_n). We have the following result.

Theorem 2.1.1 For simple random sampling,

P(y_1 = u_{i_1}, y_2 = u_{i_2}, ..., y_n = u_{i_n}) = (N−n)!/N!,

where i_1, i_2, ..., i_n are mutually different.

Proof. By the definition of s.r.s., the probability of obtaining the sample {u_{i_1}, u_{i_2}, ..., u_{i_n}} (where the order is not important) is 1/C(N, n). There are n! ways of ordering {u_{i_1}, u_{i_2}, ..., u_{i_n}}, each equally likely. Therefore,

P(y_1 = u_{i_1}, y_2 = u_{i_2}, ..., y_n = u_{i_n}) = (1/n!) × 1/C(N, n) = (N−n)! n!/(N! n!) = (N−n)!/N!.

Recall that the total number of all possible samples is C(N, n), which could be very large if N and n are large. Therefore, getting a simple random sample by first listing all possible samples and then drawing one at random would not be practical. An easier way to get a simple random sample is simply to draw n values at random without replacement from the N population values. That is, we first draw one value at random from the N population values, then draw another value at random from the remaining N − 1 population values, and so on, until we get a sample of n (different) values.

Theorem 2.1.2 A sample obtained by drawing n values successively without replacement from the N population values is a simple random sample.

Proof. Suppose that our sample obtained by drawing n values without replacement from the N population values is {a_1, a_2, ..., a_n}, where the order is not important. Let {a_{i_1}, a_{i_2}, ..., a_{i_n}} be any permutation of {a_1, a_2, ..., a_n}. Since the sample is drawn without replacement, we have

P(y_1 = a_{i_1}, ..., y_n = a_{i_n}) = (1/N) × (1/(N−1)) × · · · × (1/(N−n+1)) = (N−n)!/N!.

Therefore, the probability of obtaining the sample {a_1, ..., a_n} (where the order is not important) is

Σ_{all (i_1, ..., i_n)} P(y_1 = a_{i_1}, ..., y_n = a_{i_n}) = n! × (N−n)!/N! = 1/C(N, n).

The theorem is thus proved by the definition of simple random sampling.
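A minimal sketch of Theorem 2.1.2, assuming a tiny hypothetical population of N = 4 labeled values and n = 2: we enumerate every ordered draw-without-replacement sequence, attach its exact probability (N−n)!/N!, and check that each unordered sample accumulates probability 1/C(N, n).

```python
from itertools import permutations
from math import comb, factorial

population = ["a", "b", "c", "d"]   # u_1, ..., u_N
N, n = len(population), 2

ordered_prob = factorial(N - n) / factorial(N)   # (N-n)!/N! for each ordered draw
sample_probs = {}
for seq in permutations(population, n):          # all ordered without-replacement draws
    key = frozenset(seq)                          # the order is not important
    sample_probs[key] = sample_probs.get(key, 0.0) + ordered_prob

assert len(sample_probs) == comb(N, n)            # C(4, 2) = 6 possible samples
for prob in sample_probs.values():
    assert abs(prob - 1 / comb(N, n)) < 1e-12     # each sample has probability 1/6
```

Each of the 12 ordered sequences has probability 2!/4! = 1/12, so each of the 6 unordered pairs gets 1/6, exactly as the definition of simple random sampling requires.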

Two special cases will be used later: n = 1 and n = 2.

Theorem 2.1.3 For any i, j = 1, ..., n with i ≠ j, and s, t = 1, ..., N with s ≠ t,

(i) P(y_i = u_s) = 1/N;
(ii) P(y_i = u_s, y_j = u_t) = 1/(N(N−1)).

Proof. (i)

P(y_k = u_s) = Σ_{all (i_1, ..., i_n) with i_k = s} P(y_1 = u_{i_1}, ..., y_n = u_{i_n})
= ((N−n)!/N!) × (n−1)! × C(N−1, n−1) = ((N−n)!/N!) × (N−1)!/(N−n)! = 1/N.

(ii) Similarly,

P(y_k = u_s, y_j = u_t) = Σ_{all (i_1, ..., i_n) with i_k = s, i_j = t} P(y_1 = u_{i_1}, ..., y_n = u_{i_n})
= ((N−n)!/N!) × (n−2)! × C(N−2, n−2) = ((N−n)!/N!) × (N−2)!/(N−n)! = 1/(N(N−1)).

Example 1. A population contains {a, b, c, d}. We wish to draw a s.r.s. of size 2. List all possible samples and find the probability of drawing {b, d}.

Solution. Possible samples of size 2 are {a, b}, {a, c}, {a, d}, {b, c}, {b, d}, {c, d}. The probability of drawing {b, d} is 1/6.

2.2 Estimation of population mean and total

2.2.1 Estimation of population mean

For a population of size N: {u_1, u_2, ..., u_N}, we are interested in

• the population mean

µ = (u_1 + u_2 + · · · + u_N)/N = (1/N) Σ_{i=1}^N u_i,

• the population variance

σ² = (1/N) Σ_{i=1}^N (u_i − µ)².

Given a s.r.s. of size n: {y_1, y_2, ..., y_n}, an obvious estimator for µ is the sample mean:

µ̂ = ȳ = (1/n) Σ_{i=1}^n y_i.

Theorem 2.2.1 (i) E(y_i) = µ and Var(y_i) = σ². (ii) Cov(y_i, y_j) = −σ²/(N−1) for i ≠ j.

Proof. (i) By Theorem 2.1.3,

E(y_i) = Σ_{k=1}^N u_k P(y_i = u_k) = Σ_{k=1}^N u_k (1/N) = µ,
Var(y_i) = Σ_{k=1}^N (u_k − µ)² P(y_i = u_k) = Σ_{k=1}^N (u_k − µ)² (1/N) = σ².

(ii) By definition, Cov(y_i, y_j) = E(y_i y_j) − E(y_i)E(y_j) = E(y_i y_j) − µ². Now

E(y_i y_j) = Σ_{all s≠t} u_s u_t P(y_i = u_s, y_j = u_t) = (1/(N(N−1))) Σ_{all s≠t} u_s u_t
= (1/(N(N−1))) [ Σ_{all s,t} u_s u_t − Σ_{s=t} u_s u_t ]
= (1/(N(N−1))) [ (Σ_{s=1}^N u_s)(Σ_{t=1}^N u_t) − Σ_{s=1}^N u_s² ]
= (1/(N(N−1))) [ (Nµ)² − ( Σ_{s=1}^N (u_s − µ)² + Nµ² ) ]
= (1/(N(N−1))) [ N²µ² − Nσ² − Nµ² ]
= −σ²/(N−1) + µ².

Thus, Cov(y_i, y_j) = −σ²/(N−1).

Theorem 2.2.2 E(ȳ) = µ, and Var(ȳ) = (σ²/n)(N−n)/(N−1).

Proof. Note ȳ = (1/n)(y_1 + ... + y_n). So E(ȳ) = (1/n)(E y_1 + ... + E y_n) = (1/n)(nµ) = µ. Now

Var(ȳ) = (1/n²) Cov( Σ_{i=1}^n y_i, Σ_{j=1}^n y_j )
= (1/n²) ( Σ_{i≠j} Cov(y_i, y_j) + Σ_{i=j} Var(y_i) )
= (1/n²) ( n(n−1)(−σ²/(N−1)) + nσ² )
= (σ²/n) ( (n−1)(−1/(N−1)) + 1 )
= (σ²/n)(N−n)/(N−1).

Remark: From Theorem 2.2.2, ȳ is an unbiased estimator for µ. Also, as n gets large (but n ≤ N), Var(ȳ) tends to 0. Thus ȳ becomes more accurate for µ as n gets larger. In particular, when n = N, we have a census and Var(ȳ) = 0.

Remark: In previous statistics courses, the sample (y_1, y_2, ..., y_n) are usually independent and identically distributed (i.i.d.), namely they are drawn from the population with replacement. As a result,

E_iid(ȳ) = µ,   Var_iid(ȳ) = σ²/n.

Notice that Var_iid(ȳ) is different from Var(ȳ) in Theorem 2.2.2. In fact, for n > 1,

Var(ȳ) = (σ²/n)(N−n)/(N−1) < σ²/n = Var_iid(ȳ).

Thus, for the same sample size n, sampling without replacement produces a less variable estimator of µ. Why?
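Theorem 2.2.2 and the comparison with i.i.d. sampling can be verified by exact enumeration over all C(N, n) equally likely samples; a sketch with a hypothetical population {1, 3, 5, 9} and n = 2:

```python
from itertools import combinations
from math import comb

pop = [1, 3, 5, 9]
N, n = len(pop), 2
mu = sum(pop) / N
sigma2 = sum((u - mu) ** 2 for u in pop) / N

means = [sum(s) / n for s in combinations(pop, n)]       # ybar for each sample
E_ybar = sum(means) / comb(N, n)
Var_ybar = sum((m - E_ybar) ** 2 for m in means) / comb(N, n)

assert abs(E_ybar - mu) < 1e-12                           # unbiasedness
assert abs(Var_ybar - (sigma2 / n) * (N - n) / (N - 1)) < 1e-12
assert Var_ybar < sigma2 / n                              # beats i.i.d. (with-replacement) sampling
```

Here µ = 4.5 and σ² = 8.75; averaging over the 6 possible samples reproduces Var(ȳ) = (σ²/2)(2/3) ≈ 2.917, strictly smaller than the i.i.d. value σ²/2 = 4.375.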

2.2.2 Estimation of σ² and Var(ȳ)

The population variance σ² is usually unknown. Now define

s² = (1/(n−1)) Σ_{i=1}^n (y_i − ȳ)² = (1/(n−1)) [ Σ_{i=1}^n y_i² − n ȳ² ].

Theorem 2.2.3 E(s²) = Nσ²/(N−1), i.e. s² is biased for σ².

Proof.

E(s²) = (1/(n−1)) [ Σ_{i=1}^n E y_i² − n E(ȳ²) ]
= (1/(n−1)) ( Σ_{i=1}^n [Var(y_i) + (E y_i)²] − n [Var(ȳ) + (E ȳ)²] )
= (1/(n−1)) ( n(σ² + µ²) − n [ (σ²/n)(N−n)/(N−1) + µ² ] )
= (nσ²/(n−1)) [ 1 − (1/n)(N−n)/(N−1) ]
= (nσ²/(n−1)) (nN − n − (N−n))/(n(N−1))
= Nσ²/(N−1).

The bias in s² can be easily corrected.

Theorem 2.2.4 E( (N−1)s²/N ) = σ², i.e. (N−1)s²/N is an unbiased estimator of σ².

We shall define f = n/N to be the sample proportion (sampling fraction), and 1 − f = 1 − n/N to be the finite population correction (abbrev. fpc). Then we have the following theorem, an easy consequence of the last theorem.

Theorem 2.2.5 An unbiased estimator for Var(ȳ) is

V̂ar(ȳ) = (s²/n)(1 − f).

Proof. E V̂ar(ȳ) = (1 − f) E(s²)/n = (1 − n/N) Nσ²/((N−1)n) = (σ²/n)(N−n)/(N−1) = Var(ȳ).

Confidence intervals for µ

It can be shown that the sample average ȳ under simple random sampling is approximately normally distributed provided n is large (≥ 30, say) and f = n/N is not too close to 0 or 1.

Central limit theorem: If n, N → ∞ such that n/N → λ ∈ (0, 1), then

(ȳ − µ)/√Var(ȳ) ~ N(0, 1) approximately.

If Var(ȳ) is replaced by its estimator V̂ar(ȳ), we still have (ȳ − µ)/√V̂ar(ȳ) ~ N(0, 1) approximately. Thus, an approximate (1 − α) confidence interval for µ follows from

1 − α ≈ P( |ȳ − µ|/√V̂ar(ȳ) ≤ z_{α/2} ) = P( ȳ − z_{α/2}√V̂ar(ȳ) ≤ µ ≤ ȳ + z_{α/2}√V̂ar(ȳ) ),

i.e. the interval is

ȳ ± z_{α/2}√V̂ar(ȳ) = ȳ ± z_{α/2} (s/√n) √(1 − f).

B := z_{α/2}√V̂ar(ȳ) is called the bound on the error of estimation.

Example. A s.r.s. of size n = 200 is taken from a population of size N = 1000, resulting in ȳ = 94 and s² = 400. Find a 95% C.I. for µ.

Solution. 94 ± 1.96 × (20/√200) × √(1 − 1/5) = 94 ± 2.479.

Example. A simple random sample of n = 100 water meters within a community is monitored to estimate the average daily water consumption per household over a specified dry spell. The sample mean and variance are found to be ȳ = 12.5 and s² = 1252. If we assume that there are N = 10,000 households within the community, estimate µ, the true average daily consumption, and find a 95% confidence interval for µ.

Solution. µ̂ = ȳ = 12.5, and

V̂ar(ȳ) = (s²/n)(1 − n/N) = (1252/100)(1 − 100/10000) = 12.3948,  √V̂ar(ȳ) = 3.5206.

A 95% C.I. for µ is 12.5 ± 1.96 × 3.5206 = (5.6, 19.4).
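A direct computation of the water-meter example above (all values taken from the notes):

```python
from math import sqrt

N, n = 10_000, 100
ybar, s2 = 12.5, 1252.0
z = 1.96                                 # 95% confidence

var_hat = (s2 / n) * (1 - n / N)         # (s^2/n)(1 - f)
half_width = z * sqrt(var_hat)
ci = (ybar - half_width, ybar + half_width)
print(round(var_hat, 4), tuple(round(c, 1) for c in ci))   # 12.3948 (5.6, 19.4)
```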

2.3 Selecting the sample size for estimating population means

We have seen that Var(ȳ) = (σ²/n)(N−n)/(N−1). So the bigger the sample size n is (but n ≤ N), the more accurate our estimate ȳ is. It is of interest to find the minimum n such that our estimate is within an error bound B with a certain probability 1 − α, i.e.

P(|ȳ − µ| < B) ≈ 1 − α.

By the central limit theorem,

1 − α ≈ P( |ȳ − µ|/√Var(ȳ) < B/√Var(ȳ) ),

so we need

B/√Var(ȳ) = B/√( (σ²/n)(N−n)/(N−1) ) ≈ z_{α/2}  ⟺  (σ²/n)(N−n)/(N−1) = B²/z²_{α/2} = D.

Thus

N/n = 1 + (N−1)D/σ² = ((N−1)D + σ²)/σ²  ⟺  n ≈ Nσ²/((N−1)D + σ²), where D = B²/z²_{α/2}.

Remarks:
(1) If α = 5%, then z_{α/2} = 1.96 ≈ 2, so D ≈ B²/4. This coincides with the formula in the textbook (page 93).
(2) The above formula requires knowledge of the population variance σ², which is typically unknown in practice. However, we can approximate σ² by the following methods: (i) from pilot studies; (ii) from previous surveys; (iii) from other studies.

Example. Suppose that a total of 1500 students are to graduate next year. From previous studies, we know that the standard deviation of the starting salary is approximately $400. Determine the sample size n needed to ensure that the sample average starting salary is within $40 of the population average with probability at least 0.9.

Solution. n = 1500 × 400² / (1499 × 40²/1.645² + 400²) = 229.37 ≈ 230.
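A small helper (the function name is our own) for the formula n ≈ Nσ²/((N−1)D + σ²) with D = B²/z²_{α/2}, checked against the starting-salary example above (N = 1500, σ = 400, B = 40, z = 1.645 for 90% confidence):

```python
from math import ceil

def sample_size_mean(N, sigma2, B, z):
    """Approximate minimum n so that |ybar - mu| <= B with the given z-level."""
    D = B ** 2 / z ** 2
    return N * sigma2 / ((N - 1) * D + sigma2)

n_exact = sample_size_mean(N=1500, sigma2=400 ** 2, B=40, z=1.645)
assert 229 < n_exact < 230      # just above 229, matching the notes
print(ceil(n_exact))            # round up to the next whole sampling unit
```

Rounding up gives n = 230, the answer in the notes.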

Example 4.5 (textbook, 5th edition). The average amount of money µ for a hospital's accounts receivable must be estimated. Although no prior data are available to estimate the population variance σ², it is known that most accounts lie within a $100 range. There are N = 1000 open accounts. Find the sample size needed to estimate µ with a bound on the error of estimation B = $3 with probability 0.95.

Solution. N = 1000, B = 3. We need an estimate of σ². The solution depends on how one interprets "most accounts": whether it means 70%, 90%, 95% or 99% of all accounts. For the normal distribution N(0, σ²), we have

P(|N(0, σ²)| ≤ 1.96σ) = P(|N(0, 1)| ≤ 1.96) = 95%,
P(|N(0, σ²)| ≤ 3σ) = P(|N(0, 1)| ≤ 3) = 99.87%.

So 95% of accounts lie within a 4σ range and 99.87% of accounts lie within a 6σ range. If "most" means 95%, we take 2 × (2σ) = 100, so σ = 25; then n = 210.76 ≈ 211. If "most" means 99.87%, we take 2 × (3σ) = 100, so σ = 50/3; then n ≈ 107.

2.3.1 A quick summary on estimation of population mean

The population mean is defined to be µ = (u_1 + u_2 + · · · + u_N)/N. Suppose a simple random sample is {y_1, ..., y_n}.

1) Estimators of µ and σ² are

µ̂ = ȳ = (1/n) Σ_{i=1}^n y_i,   s² = (1/(n−1)) Σ_{i=1}^n (y_i − ȳ)².

2) Properties of ȳ:

E ȳ = µ,   Var(ȳ) = (σ²/n)(N−n)/(N−1).

3) An unbiased estimator of Var(ȳ) is

V̂ar(ȳ) = (s²/n)(1 − f), where f = n/N.

4) An approximate (1 − α) C.I. for µ is

ȳ ± z_{α/2}√V̂ar(ȳ) = ȳ ± z_{α/2} (s/√n) √(1 − f).

5) The minimum sample size n needed to have an error bound B with probability 1 − α is

n ≈ Nσ²/((N−1)D + σ²), where D = B²/z²_{α/2}.

2.3.2 Estimation of population total

The population total is defined to be τ = u_1 + u_2 + · · · + u_N = Nµ. Suppose a simple random sample is {y_1, ..., y_n}.

1. An estimator of τ is τ̂ = N ȳ.

2. The mean and variance of τ̂ are

E τ̂ = τ,   Var(τ̂) = N² (σ²/n)(N−n)/(N−1).

3. An estimator of Var(τ̂) is

V̂ar(τ̂) = V̂ar(N ȳ) = N² (1 − f) s²/n.

4. An approximate (1 − α) confidence interval for τ is

τ̂ ± z_{α/2}√V̂ar(τ̂) = τ̂ ± z_{α/2} N (s/√n) √(1 − f) = N ȳ ± z_{α/2} N (s/√n) √(1 − f).

B := z_{α/2}√V̂ar(τ̂) = N z_{α/2}√V̂ar(ȳ) is called the bound on the error of estimation.

"Proof". Central limit theorem (CLT): if n, N → ∞ such that n/N → λ ∈ (0, 1), then

(τ̂ − τ)/√Var(τ̂) →_d N(0, 1).

If Var(τ̂) is replaced by its estimator V̂ar(τ̂), we still have (τ̂ − τ)/√V̂ar(τ̂) →_d N(0, 1). Therefore,

1 − α ≈ P( |τ̂ − τ|/√V̂ar(τ̂) ≤ z_{α/2} ) = P( τ̂ − z_{α/2}√V̂ar(τ̂) ≤ τ ≤ τ̂ + z_{α/2}√V̂ar(τ̂) ).

5. The minimum sample size n needed to have an error bound B with probability 1 − α is

n ≈ Nσ²/((N−1)D + σ²), where D = B²/(N² z²_{α/2}).

Example 4.6 (page 95 of the textbook). An investigator is interested in estimating the total weight gain in 0 to 4 weeks for N = 1000 chicks fed on a new ration. Obviously, to weigh each bird would be time-consuming and tedious. Many similar studies on chick nutrition have been run in the past. Using data from these studies, the investigator found that σ², the population variance, was approximately 36.00 (grams)². Determine the number of chicks to be sampled in this study in order to estimate τ within a bound on the error of estimation equal to 1000 grams with probability 95%.

Solution. D = B²/(z²_{α/2} N²) = 1000²/(1.96² × 1000²) = 0.26, and

n = Nσ²/((N−1)D + σ²) = 1000 × 36/(999 × 0.26 + 36) = 121.72 ≈ 122.
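The chick-ration example above can be recomputed directly; the only change from the mean case is D = B²/(z²N²) because the bound is on the total (values from the notes: N = 1000, σ² = 36, B = 1000 grams, 95% confidence).

```python
N, sigma2, B, z = 1000, 36.0, 1000.0, 1.96

D = B ** 2 / (z ** 2 * N ** 2)           # error bound on the total, so divide by N^2
n = N * sigma2 / ((N - 1) * D + sigma2)

assert abs(D - 0.26) < 0.01
assert 121 < n < 122                     # round up: sample 122 chicks
```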

2.4 Estimation of population proportion

We are interested in the proportion p of the population with a specified characteristic. Let

y_i = 1 if the ith element has the characteristic, and y_i = 0 if not.

It is easy to see that E(y_i) = E(y_i²) = p (Why?). Therefore, we have

µ = E(y_i) = p,   σ² = var(y_i) = p − p² = pq, where q = 1 − p.

The total number of elements in the sample of size n possessing the specified characteristic is Σ_{i=1}^n y_i.

1. An estimator of p:

p̂ = ȳ = Σ_{i=1}^n y_i / n.

2. From Theorems 2.2.1 and 2.2.2, E(p̂) = p, and the variance of p̂ is

Var(p̂) = (σ²/n)(N−n)/(N−1) = (pq/n)(N−n)/(N−1).   (4.2.1)

3. An estimator of σ² = pq:

s² = (1/(n−1)) Σ_{i=1}^n (y_i − ȳ)² = (1/(n−1)) [ Σ_{i=1}^n y_i² − n ȳ² ] = (1/(n−1)) (n p̂ − n p̂²) = (n/(n−1)) p̂ q̂, where q̂ = 1 − p̂.

From equation (4.2.1) and Theorem 2.2.3, E(s²) = Nσ²/(N−1) = Npq/(N−1). Therefore, an estimator of the variance of p̂ is

V̂ar(p̂) = (s²/n)(1 − f) = (p̂ q̂/(n−1))(1 − f).

4. An approximate (1 − α) confidence interval for p is

p̂ ± z_{α/2}√V̂ar(p̂) = p̂ ± z_{α/2} √(p̂ q̂/(n−1)) √(1 − f).

5. The minimum sample size n required to estimate p such that our estimate p̂ is within an error bound B with probability 1 − α is

n ≈ Npq/((N−1)D + pq), where D = B²/z²_{α/2}.

Note that the right-hand side is an increasing function of σ² = pq.

a) p is often unknown, so we can replace it by some estimate p̂ (from a previous study, a pilot study, etc.).
b) If we do not have an estimate p̂, we can replace it by p = 1/2, which gives the largest possible value pq = 1/4.

Example. A simple random sample of n = 40 college students was interviewed to determine the proportion of students in favor of converting from the semester to the quarter system; 25 students answered affirmatively. (Assume N = 2000.)
(1) Estimate p, the proportion of students on campus in favor of the change.
(2) Find a 95% confidence interval for p.

Solution. (1) p̂ = ȳ = 25/40 = 0.625.
(2) V̂ar(ȳ) = (p̂ q̂/(n−1))(1 − n/N) = (0.625 × 0.375/39)(1 − 40/2000) = 5.889 × 10⁻³, so √V̂ar(ȳ) = 0.07674. A 95% C.I. for p is 0.625 ± 1.96 × 0.0767 = (0.4746, 0.7754).

Example. A small town has N = 800 people. Let p be the proportion of people with blood type A.
(1) A s.r.s. of size n = 200 is taken and 7% of the sample has blood type A. Find a 95% confidence interval for p.
(2) What sample size n must be drawn in order to estimate p to within 0.04 of p with probability 0.95?
(3) If we know that no more than 10% of the population have blood type A, find n again in (2). Comment on the difference between (2) and (3).

Solution. (1) p̂ = 0.07, and V̂ar(p̂) = (0.07 × 0.93/199)(1 − 200/800); the 95% C.I. is 0.07 ± 1.96√V̂ar(p̂) ≈ (0.039, 0.101).
(2) B = 0.04, α = 0.05. Take p = 1/2 in the formula; we get n = 345.
(3) p ≤ 0.10, so σ² = pq ≤ 0.09. Simple calculation yields n = 171. The prior knowledge that p is small reduces pq, and hence a much smaller sample suffices.
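A direct computation of the student-poll example above (values from the notes: N = 2000, n = 40, 25 affirmative answers, 95% confidence):

```python
from math import sqrt

N, n = 2000, 40
p_hat = 25 / 40
q_hat = 1 - p_hat

var_hat = (p_hat * q_hat / (n - 1)) * (1 - n / N)   # (p̂q̂/(n-1))(1 - f)
half = 1.96 * sqrt(var_hat)
ci = (p_hat - half, p_hat + half)
print(round(var_hat, 6), tuple(round(c, 4) for c in ci))   # 0.005889 (0.4746, 0.7754)
```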

2.5 Comparing estimates

For simplicity, suppose x_1, ..., x_m is a random sample (i.i.d.) from a population with mean µ_x and y_1, ..., y_n is a random sample from a population with mean µ_y. We are interested in the difference of means µ_y − µ_x, whose unbiased estimator is ȳ − x̄, since E(ȳ − x̄) = µ_y − µ_x. Further,

Var(ȳ − x̄) = Var(ȳ) + Var(x̄) − 2 Cov(ȳ, x̄).

Remark: If the two samples {x_1, ..., x_m} and {y_1, ..., y_n} are independent, then Cov(ȳ, x̄) = 0. However, a more interesting case is when the two samples are dependent, which will be illustrated in the following example.

A dependent example. Suppose an opinion poll asks n people the question "Do you favor abortion?" The opinions given are YES, NO, NO OPINION. Let the proportions of people in the population who answer 'YES', 'NO', 'No opinion' be p_1, p_2 and p_3, respectively, and let p̂_1, p̂_2 and p̂_3 be the three respective sample proportions in the sample of size n. We are interested in comparing p_1 and p_2 by looking at p̂_1 − p̂_2. Clearly, p̂_1 and p̂_2 are dependent proportions, since if one is high, the other is likely to be low.

Then X = n p̂_1, Y = n p̂_2 and Z = n p̂_3 follow a multinomial distribution with parameters (n, p_1, p_2, p_3). That is,

P(X = x, Y = y, Z = z) = n!/(x! y! z!) p_1^x p_2^y p_3^z,   x, y, z ≥ 0, x + y + z = n.

Please note that Σ_{x,y,z ≥ 0, x+y+z=n} n!/(x! y! z!) p_1^x p_2^y p_3^z = 1.

Question: What is the distribution of X? (Hint: classify the people into "Yes" and "Not Yes".)
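The dependence between the multinomial counts, Cov(X, Y) = −n p_1 p_2, can be checked exactly for a tiny case by summing over the whole pmf (hypothetical parameters n = 5, p = (0.3, 0.5, 0.2) chosen for illustration):

```python
from math import factorial

n, p1, p2, p3 = 5, 0.3, 0.5, 0.2

def pmf(x, y):
    """Multinomial P(X = x, Y = y, Z = n - x - y)."""
    z = n - x - y
    return (factorial(n) / (factorial(x) * factorial(y) * factorial(z))
            * p1 ** x * p2 ** y * p3 ** z)

EX = sum(x * pmf(x, y) for x in range(n + 1) for y in range(n + 1 - x))
EY = sum(y * pmf(x, y) for x in range(n + 1) for y in range(n + 1 - x))
EXY = sum(x * y * pmf(x, y) for x in range(n + 1) for y in range(n + 1 - x))
cov = EXY - EX * EY

assert abs(EX - n * p1) < 1e-12            # X ~ Bin(n, p1), answering the hint
assert abs(cov - (-n * p1 * p2)) < 1e-12   # negative: the two counts compete
```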

Theorem 2.5.1 E(X) = np_1, E(Y) = np_2, E(Z) = np_3; Var(X) = np_1q_1, Var(Y) = np_2q_2; and Cov(X, Y) = −np_1p_2.

Proof. X = number of people saying "YES" ~ Bin(n, p_1), so EX = np_1 and Var(X) = np_1q_1; similarly for Y and Z. Now Cov(X, Y) = E(XY) − (EX)(EY) = E(XY) − n²p_1p_2. But

E(XY) = Σ_{x,y} xy P(X = x, Y = y, Z = n − x − y)
= Σ_{x≥1, y≥1, x+y≤n} xy n!/(x! y! (n−x−y)!) p_1^x p_2^y p_3^{n−x−y}
= n(n−1) p_1 p_2 Σ_{x≥1, y≥1, x+y≤n} (n−2)!/((x−1)! (y−1)! ((n−2)−(x−1)−(y−1))!) p_1^{x−1} p_2^{y−1} p_3^{(n−2)−(x−1)−(y−1)}
= n(n−1) p_1 p_2 Σ_{x_1≥0, y_1≥0, x_1+y_1≤n−2} (n−2)!/(x_1! y_1! ((n−2)−x_1−y_1)!) p_1^{x_1} p_2^{y_1} p_3^{(n−2)−x_1−y_1}
= n(n−1) p_1 p_2 = n² p_1 p_2 − n p_1 p_2.

Therefore, Cov(X, Y) = E(XY) − n²p_1p_2 = −np_1p_2.

Theorem 2.5.2 E(p̂_1) = p_1, E(p̂_2) = p_2; Var(p̂_1) = p_1q_1/n, Var(p̂_2) = p_2q_2/n; and Cov(p̂_1, p̂_2) = −p_1p_2/n.

Proof. Note that p̂_1 = X/n and p̂_2 = Y/n, and apply the last theorem.

From the last theorem, we have

Var(p̂_1 − p̂_2) = Var(p̂_1) + Var(p̂_2) − 2 Cov(p̂_1, p̂_2) = p_1q_1/n + p_2q_2/n + 2p_1p_2/n.

One estimator of Var(p̂_1 − p̂_2) is

V̂ar(p̂_1 − p̂_2) = p̂_1q̂_1/n + p̂_2q̂_2/n + 2p̂_1p̂_2/n.

Therefore, an approximate (1 − α) confidence interval for p_1 − p_2 is

(p̂_1 − p̂_2) ± z_{α/2} √( p̂_1q̂_1/n + p̂_2q̂_2/n + 2p̂_1p̂_2/n ).

Example. The major league baseball season in the US came to an abrupt end in the middle of 1994. In a poll of 600 adult Americans, 29% blamed the players for the strike, 34% blamed the owners, and the rest held various other opinions. Does the evidence suggest that the true proportions who blame the players and the owners are really different?

Solution. Let p_1, p_2 be the proportions of Americans who blamed the players and the owners, respectively. These are multinomial proportions. An appropriate estimate of the difference is 0.29 − 0.34 = −0.05, and

V̂ar(p̂_1 − p̂_2) = p̂_1q̂_1/n + p̂_2q̂_2/n + 2p̂_1p̂_2/n = (0.29 × 0.71)/600 + (0.34 × 0.66)/600 + 2(0.29 × 0.34)/600 = 1.0458 × 10⁻³.

So an approximate 95% C.I. for p_1 − p_2 is

−0.05 ± 1.96 × √(1.0458 × 10⁻³) = −0.05 ± 1.96 × 0.03234 = (−0.11339, 0.01339).

Since the interval contains 0, the evidence does not suggest that the two proportions are really different.

Example. (From the textbook.) Should smoking be banned from the workplace? A Time/Yankelovich poll of 800 adult Americans carried out on April 6-7, 1994 gave the following results:

                 Banned   Special areas   No restrictions
Nonsmokers        44%         52%               3%
Smokers            8%         80%              11%

Using a sample of 600 nonsmokers and 200 smokers, estimate and construct a 95% C.I. for (1) the true difference between the proportions of nonsmokers and smokers choosing "Banned"; (2) the true difference among nonsmokers between the proportions choosing "Banned" and "Special areas".

Solution.
A. The proportions of nonsmokers and of smokers choosing "Banned" come from two independent samples, so the estimates are independent of each other; a high value of one does not force a low value of the other. An appropriate estimate of this difference is 0.44 − 0.08 = 0.36, with approximate 95% C.I.

0.36 ± 2 √( (0.44 × 0.56)/600 + (0.08 × 0.92)/200 ) = 0.36 ± 0.06.

B. The proportion of nonsmokers choosing "Special areas" is dependent on the proportion choosing "Banned": these are multinomial proportions, and if one is high, the other is likely to be low. An appropriate estimate of this difference is 0.44 − 0.52 = −0.08, with approximate 95% C.I.

−0.08 ± 2 √( (0.44 × 0.56)/600 + (0.52 × 0.48)/600 + 2(0.44 × 0.52)/600 ) = −0.08 ± 0.08.
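A direct computation of the baseball-strike example above, using the estimator for dependent multinomial proportions (n = 600, p̂_1 = 0.29, p̂_2 = 0.34, 95% confidence):

```python
from math import sqrt

n, p1, p2 = 600, 0.29, 0.34
diff = p1 - p2

# Vhat(p1_hat - p2_hat) = p1q1/n + p2q2/n + 2*p1*p2/n
var_hat = (p1 * (1 - p1) + p2 * (1 - p2) + 2 * p1 * p2) / n
half = 1.96 * sqrt(var_hat)
ci = (diff - half, diff + half)
print(round(var_hat, 7), ci)    # interval straddles 0, so no clear difference
```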

2.6 Randomization theory results for simple random sampling

Define Z_i = I{u_i is in the sample}, for i = 1, ..., N. Then

ȳ = Σ_{i=1}^n y_i/n = Σ_{i=1}^N Z_i u_i/n.

The Z_i's are the only random variables here. For simple random sampling, {Z_1, ..., Z_N} are identically distributed (but not independent) Bernoulli random variables, with

π_i = P(Z_i = 1) = P(select unit i in sample) = (number of samples including unit i)/(number of possible samples) = C(N−1, n−1)/C(N, n) = n/N.

Hence E Z_i = E Z_i² = n/N, and

Var(Z_i) = E Z_i² − (E Z_i)² = n/N − (n/N)² = (n/N)(1 − n/N).

For i ≠ j,

E(Z_i Z_j) = P(Z_i = 1, Z_j = 1) = P(Z_j = 1 | Z_i = 1) P(Z_i = 1) = ((n−1)/(N−1))(n/N),

so

Cov(Z_i, Z_j) = E(Z_i Z_j) − E(Z_i)E(Z_j) = ((n−1)/(N−1))(n/N) − (n/N)² = −(1/(N−1))(n/N)(1 − n/N).

As a consequence, we have

E ȳ = Σ_{i=1}^N (u_i/n) E Z_i = Σ_{i=1}^N (u_i/n)(n/N) = (1/N) Σ_{i=1}^N u_i = µ,

and

Var(ȳ) = (1/n²) Cov( Σ_{i=1}^N Z_i u_i, Σ_{i=1}^N Z_i u_i )
= (1/n²) Σ_{i=1}^N Σ_{j=1}^N u_i u_j Cov(Z_i, Z_j)
= (1/n²) [ Σ_{i=1}^N u_i² Var(Z_i) + Σ_{i≠j} u_i u_j Cov(Z_i, Z_j) ]
= (σ²/n)(N−n)/(N−1).
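The inclusion-indicator facts π_i = n/N and Cov(Z_i, Z_j) = −(1/(N−1))(n/N)(1 − n/N) can be verified by enumerating all C(N, n) equally likely samples (a hypothetical N = 5, n = 2 for illustration):

```python
from itertools import combinations
from math import comb

N, n = 5, 2
samples = list(combinations(range(N), n))       # all C(5, 2) = 10 samples, equally likely

pi_0 = sum(0 in s for s in samples) / comb(N, n)            # P(Z_0 = 1)
E_Z0Z1 = sum((0 in s and 1 in s) for s in samples) / comb(N, n)
cov_01 = E_Z0Z1 - (n / N) ** 2                               # Cov(Z_0, Z_1)

assert abs(pi_0 - n / N) < 1e-12
assert abs(cov_01 - (-(1 / (N - 1)) * (n / N) * (1 - n / N))) < 1e-12
```

Here 4 of the 10 samples contain unit 0 (so π_0 = 0.4 = n/N), only 1 contains both units 0 and 1, and the covariance works out to −0.06, matching the formula.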

2.7 Exercises

1. List all possible simple random samples of size n = 2 that can be selected from the population {0, 1, 2, 3, 4}. Calculate σ² and V(ȳ).

2. A simple random sample of n = 100 water meters within a community is monitored to estimate the average daily water consumption per household over a specified dry spell, resulting in ȳ = 12.5 and s² = 1252. Assume N = 10,000 households. Estimate the true average daily consumption µ, and find a 95% confidence interval.

3. A simple random sample of n = 40 college students was interviewed to determine the proportion of students in favor of converting from the semester to the quarter system; 25 students answered affirmatively. (Assume N = 2000.) Estimate p, the proportion of students on campus in favor of the change, and find a 95% confidence interval for p.

4. The major league baseball season in the US came to an abrupt end in the middle of 1994. In a poll of 600 adult Americans, 29% blamed the players for the strike, 34% blamed the owners, and the rest held various other opinions. Does the evidence suggest that the true proportions who blame the players and the owners are really different?

5. Show that, for estimating the population total, the minimum sample size n required so that our estimate is within an error bound B with probability 1 − α is given by

n ≈ Nσ²/((N−1)D + σ²), where D = B²/(N² z²_{α/2}).

6. In estimating the proportion p of the population with a specified characteristic, we used p̂ = Σ y_i/n to estimate it, where y_i = 1 if the ith element has the characteristic and 0 if not. We have seen in the lecture that E(y_i) = E(y_i²) = p, and σ² = var(y_i) = p − p². Suppose that we use σ̃² = p̂ − p̂² as an estimator of σ².
(1) Is it an unbiased estimator of σ²? If not, what is its bias?
(2) Find an unbiased estimator of σ², based on your calculation in (1). Compare this estimate with the one given in the lecture.

7. We are interested in the proportion p of people who support building a childcare centre.
(a) Suppose that a town has population size N = 2000. Find the sample size n required to estimate p with an error bound B = 0.05 with probability 95%.
(b) If we know that at least 80% of the people will support building the childcare centre, find n again in (a).

8. In a decision theory approach, two functions are specified: L(n) (loss or cost of a bad estimate) and C(n) (cost of taking the sample). Suppose

L(n) = k Var(ȳ) = k (σ²/n)(1 − n/N),   C(n) = c_0 + c_1 n,

for some k, c_0 and c_1. Find an optimal n minimizing the total cost L(n) + C(n).

Sampling course notes


