Uploaded by Sharon Huang Zihang

Sampling course notes


1.1

Introduction

Often we are interested in some characteristics of a finite population, e.g. the average income of last year's graduates from HKUST, or the unemployment rate of last quarter in HK. Since the population is usually very large, we would like to say something (i.e. make inference) about the population by collecting and analyzing only a part of that population. The principles and methods of collecting and analyzing data from a finite population form a branch of statistics known as the Sample Survey Method. The theory involved is called Sampling Theory. Sample surveys are widely used in many areas such as agriculture, education, industry, social affairs, and medicine.

1.2

An element is an object on which a measurement is taken. For example, if the population is the UST graduates, each student is an element.

A population is a collection of elements about which we require information.

Population characteristic: this is the aspect of the population we wish to measure, e.g. the average income of last year's graduates from HKUST, or the total wheat yield of all farmers in a certain country.

Sampling units may be the individual members of the population, or they may be a coarser subdivision of the population, e.g. a household (or another group), which may contain more than one individual member.

A frame is a list of sampling units, e.g. a telephone directory, or an ordered list of names or student IDs.

A sample is a collection of sampling units drawn from a frame or frames, e.g. a sample of 100 units drawn from a frame of 2000.

1.3

Why sampling?

If a sample is equal to the population, we have a census, which contains all the information one wants. However, a census is rarely conducted, for several reasons:

(i) cost (money is limited);
(ii) time (time is limited);
(iii) destructiveness (testing a product can be destructive, e.g. light bulbs);
(iv) accessibility (non-response can be a serious issue).

In those cases, sampling is the only alternative.

1.4

The procedure for selecting the sample is called the sample survey design. The general aim of a sample survey is to draw samples which are representative of the whole population. Broadly speaking, we can classify sampling schemes into two categories: probability sampling and some other sampling schemes.

1. Probability sampling

This is a sampling scheme whereby the possible samples are enumerated and each has a non-zero probability of being selected. With probability built into the design, we know the random structure and can therefore put an error bound on our estimates, making statements such as "our estimate is unbiased and we are 95% confident that it is within 2 percentage points of the true proportion". In this course, we shall only concentrate on probability sampling.

2. Some other sampling schemes

a) volunteer sampling: e.g. TV telephone polls, medical volunteers for research.
b) subjective sampling: we choose samples that we consider to be typical or representative of the population.

All these sampling procedures provide some information about the population, but it is hard to deduce the nature of the population from the studies, as the samples are very subjective and often very biased. Furthermore, because the underlying random structure is unknown, no error bound is available and it is hard to measure the precision of these estimates.

1.5

This can be the most important and perhaps most difficult part of the survey sampling problem. We shall come back to this point in more detail later.

1.6

Many government statistical organizations and other collectors of survey data now have Web sites where they provide information on the survey design. Here are a few examples. Note that these sites are subject to change, but you should be able to find the organization through a search.

Organization                                                     Address
Federal Interagency Council of Statistical Policy                www.fedstats.gov
U.S. Bureau of the Census                                        www.census.gov
Statistics Canada                                                www.statcan.ca
Statistics Norway                                                www.ssb.no
Statistics Sweden                                                www.scb.se
UK Office for National Statistics                                www.ons.gov.uk
Australian Bureau of Statistics                                  www.statistics.gov.au
Statistics New Zealand                                           www.stats.govt.nz
Statistics Netherlands                                           www.cbs.nl
Gallup Organization                                              www.gallup.com
Nielsen Media Research                                           www.nielsenmedia.com
National Opinion Research Center                                 www.norc.uchicago.edu
Inter-University Consortium for Political and Social Research    www.icpsr.umich.edu

Simple random sampling is the simplest sampling procedure, and is the building block for other more complicated sampling schemes to be introduced in later chapters.

Definition: If a sample of size $n$ is drawn from a population of size $N$ in such a way that every possible sample of size $n$ has the same probability of being selected, the sampling procedure is called simple random sampling (s.r.s. for short). The resulting sample is called a simple random sample.

2.1

Consider a population of size $N$: $\{u_1, u_2, \dots, u_N\}$.

There are $\binom{N}{n}$ possible samples of size $n$. If we assign probability $1/\binom{N}{n}$ to each of the different samples, then each sample thus obtained is a simple random sample. Denote such a s.r.s. as $(y_1, y_2, \dots, y_n)$.

Remark: In other statistics courses, we use upper-case letters like $X$, $Y$ etc. to denote random variables and lower-case letters like $x$, $y$ etc. to represent fixed values. However, in survey sampling, by convention, we use lower-case letters like $y_1, y_2$ etc. to denote random variables.

We have the following result.

Theorem 2.1.1 For simple random sampling, we have

$$P(y_1 = u_{i_1}, y_2 = u_{i_2}, \dots, y_n = u_{i_n}) = \frac{(N-n)!}{N!},$$

where $i_1, i_2, \dots, i_n$ are mutually different.

Proof. By the definition of s.r.s., the probability of obtaining the sample $\{u_{i_1}, u_{i_2}, \dots, u_{i_n}\}$ (where the order is not important) is $1/\binom{N}{n}$. There are $n!$ ways of ordering $\{u_{i_1}, u_{i_2}, \dots, u_{i_n}\}$. Therefore,

$$P(y_1 = u_{i_1}, y_2 = u_{i_2}, \dots, y_n = u_{i_n}) = \frac{1}{\binom{N}{n}\, n!} = \frac{(N-n)!\, n!}{N!\, n!} = \frac{(N-n)!}{N!}.$$

Recall that the total number of all possible samples is $\binom{N}{n}$, which could be very large if $N$ and $n$ are large. Therefore, getting a simple random sample by first listing all possible samples and then drawing one at random would not be practical. An easier way to get a simple random sample is simply to draw $n$ values at random without replacement from the $N$ population values. That is, we first draw one value at random from the $N$ population values, then draw another value at random from the remaining $N-1$ population values, and so on, until we get a sample of $n$ (different) values.

Theorem 2.1.2 A sample obtained by drawing $n$ values successively without replacement from the $N$ population values is a simple random sample.

Proof. Suppose that our sample obtained by drawing $n$ values without replacement from the $N$ population values is $\{a_1, a_2, \dots, a_n\}$, where the order is not important. Let $(a_{i_1}, a_{i_2}, \dots, a_{i_n})$ be any permutation of $(a_1, a_2, \dots, a_n)$. Since the sample is drawn without replacement, we have

$$P(y_1 = a_{i_1}, \dots, y_n = a_{i_n}) = \frac{1}{N} \cdot \frac{1}{N-1} \cdots \frac{1}{N-n+1} = \frac{(N-n)!}{N!}.$$

Hence, the probability of obtaining the sample $\{a_1, \dots, a_n\}$ (where the order is not important) is

$$\sum_{\text{all } (i_1, \dots, i_n)} P(y_1 = a_{i_1}, \dots, y_n = a_{i_n}) = \sum_{\text{all } (i_1, \dots, i_n)} \frac{(N-n)!}{N!} = n!\,\frac{(N-n)!}{N!} = \frac{1}{\binom{N}{n}}.$$

The theorem is thus proved by the definition of simple random sampling.
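The sequential drawing scheme of Theorem 2.1.2 can be sketched in a few lines of Python. This is only an illustration: the function name `draw_srs` and its `rng` parameter are hypothetical and not part of the notes.

```python
import random

def draw_srs(population, n, rng=random):
    """Draw n values one at a time, without replacement.

    At each step one of the remaining values is picked uniformly at random,
    exactly as in the scheme of Theorem 2.1.2.
    """
    remaining = list(population)
    sample = []
    for _ in range(n):
        k = rng.randrange(len(remaining))  # uniform index among remaining values
        sample.append(remaining.pop(k))    # remove it so it cannot be drawn again
    return sample

# Example: a sample of size 4 from a population of 10 labelled units.
example_sample = draw_srs(range(10), 4, random.Random(0))
print(example_sample)
```

The standard library function `random.sample(population, n)` performs the same kind of draw without replacement; the explicit loop is shown only to mirror the proof.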

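Two special cases, corresponding to $n = 1$ and $n = 2$, will be used later.

Theorem 2.1.3 For any $i, j = 1, \dots, n$ and $s, t = 1, \dots, N$,

(i) $P(y_i = u_s) = \dfrac{1}{N}$.

(ii) $P(y_i = u_s, y_j = u_t) = \dfrac{1}{N(N-1)}$, for $i \neq j$, $s \neq t$.

Proof. (i)

$$P(y_k = u_j) = \sum_{\text{all } (i_1, \dots, i_n),\ i_k = j} \frac{(N-n)!}{N!} = \binom{N-1}{n-1}(n-1)!\,\frac{(N-n)!}{N!} = \frac{(N-1)!}{(N-n)!} \cdot \frac{(N-n)!}{N!} = \frac{1}{N}.$$

(ii)

$$P(y_k = u_s, y_j = u_t) = \sum_{\text{all } (i_1, \dots, i_n),\ i_k = s,\ i_j = t} \frac{(N-n)!}{N!} = \binom{N-2}{n-2}(n-2)!\,\frac{(N-n)!}{N!} = \frac{(N-2)!}{(N-n)!} \cdot \frac{(N-n)!}{N!} = \frac{1}{N(N-1)}.$$

Example 1. A population contains $\{a, b, c, d\}$. We wish to draw a s.r.s. of size 2. List all possible samples and find the probability of drawing $\{b, d\}$.

Solution. Possible samples of size 2 are $\{a, b\}$, $\{a, c\}$, $\{a, d\}$, $\{b, c\}$, $\{b, d\}$, $\{c, d\}$. There are $\binom{4}{2} = 6$ of them, each equally likely, so the probability of drawing $\{b, d\}$ is $1/6$.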
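As a quick check of Example 1, the following sketch lists every unordered sample of size 2 from $\{a, b, c, d\}$ and assigns each the probability $1/\binom{N}{n}$. The helper name `all_srs_samples` is an assumption made for illustration.

```python
from itertools import combinations
from fractions import Fraction

def all_srs_samples(population, n):
    """Return every unordered sample of size n; under s.r.s. all are equally likely."""
    return list(combinations(population, n))

population = ["a", "b", "c", "d"]
samples = all_srs_samples(population, 2)
prob_each = Fraction(1, len(samples))  # 1 / C(N, n)

print(samples)
print("P({b, d}) =", prob_each)
```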

2.2

2.2.1

Estimation of population mean

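For a population of size $N$: $\{u_1, u_2, \dots, u_N\}$, we are interested in the population mean

$$\mu = \frac{u_1 + u_2 + \cdots + u_N}{N} = \frac{1}{N}\sum_{i=1}^{N} u_i,$$

and the population variance

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (u_i - \mu)^2.$$

Given a s.r.s. of size $n$: $\{y_1, y_2, \dots, y_n\}$, an obvious estimator for $\mu$ is the sample mean:

$$\hat\mu = \bar y = \frac{1}{n}\sum_{i=1}^{n} y_i.$$

Theorem 2.2.1

(i) $E(y_i) = \mu$, $\mathrm{Var}(y_i) = \sigma^2$.

(ii) $\mathrm{Cov}(y_i, y_j) = -\dfrac{\sigma^2}{N-1}$, for $i \neq j$.

Proof. (i) By Theorem 2.1.3,

$$E(y_i) = \sum_{k=1}^{N} u_k P(y_i = u_k) = \sum_{k=1}^{N} u_k \frac{1}{N} = \mu,$$

$$\mathrm{Var}(y_i) = \sum_{k=1}^{N} (u_k - \mu)^2 P(y_i = u_k) = \sum_{k=1}^{N} (u_k - \mu)^2 \frac{1}{N} = \sigma^2.$$

(ii) By definition, $\mathrm{Cov}(y_i, y_j) = E(y_i y_j) - E(y_i)E(y_j) = E(y_i y_j) - \mu^2$. Now,

$$E(y_i y_j) = \sum_{\text{all } s \neq t} u_s u_t P(y_i = u_s, y_j = u_t) = \frac{1}{N(N-1)} \sum_{\text{all } s \neq t} u_s u_t$$

$$= \frac{1}{N(N-1)} \left[ \sum_{\text{all } s, t} u_s u_t - \sum_{s = t} u_s u_t \right] = \frac{1}{N(N-1)} \left[ \left(\sum_{s=1}^{N} u_s\right)\left(\sum_{t=1}^{N} u_t\right) - \sum_{s=1}^{N} u_s^2 \right]$$

$$= \frac{1}{N(N-1)} \left[ (N\mu)^2 - \left( \sum_{s=1}^{N} (u_s - \mu)^2 + N\mu^2 \right) \right] = \frac{1}{N(N-1)} \left[ N^2\mu^2 - N\sigma^2 - N\mu^2 \right] = -\frac{\sigma^2}{N-1} + \mu^2.$$

Hence $\mathrm{Cov}(y_i, y_j) = -\dfrac{\sigma^2}{N-1}$.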

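Theorem 2.2.2

$$E(\bar y) = \mu, \qquad \mathrm{Var}(\bar y) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}.$$

Proof. Note that $\bar y = \frac{1}{n}(y_1 + \cdots + y_n)$. So $E(\bar y) = \frac{1}{n}(Ey_1 + \cdots + Ey_n) = \frac{1}{n}(n\mu) = \mu$. Now

$$\mathrm{Var}(\bar y) = \frac{1}{n^2}\,\mathrm{Cov}\left(\sum_{i=1}^{n} y_i, \sum_{j=1}^{n} y_j\right) = \frac{1}{n^2} \sum_{i=1}^{n}\sum_{j=1}^{n} \mathrm{Cov}(y_i, y_j)$$

$$= \frac{1}{n^2} \left( \sum_{i \neq j} \mathrm{Cov}(y_i, y_j) + \sum_{i = j} \mathrm{Cov}(y_i, y_j) \right) = \frac{1}{n^2} \left( \sum_{i \neq j} \left(-\frac{\sigma^2}{N-1}\right) + \sum_{i=1}^{n} \mathrm{Var}(y_i) \right)$$

$$= \frac{1}{n^2} \left( -n(n-1)\frac{\sigma^2}{N-1} + n\sigma^2 \right) = \frac{\sigma^2}{n}\left(1 - \frac{n-1}{N-1}\right) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}.$$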

Remark: From Theorem 2.2.2, $\bar y$ is an unbiased estimator for $\mu$. Also, as $n$ gets large (but $n \le N$), $\mathrm{Var}(\bar y)$ tends to 0. Thus $\bar y$ becomes more accurate for $\mu$ as $n$ gets larger. In particular, when $n = N$, we have a census and $\mathrm{Var}(\bar y) = 0$.

Remark: In previous statistics courses, the sample $(y_1, y_2, \dots, y_n)$ is usually independent and identically distributed (i.i.d.), namely the values are drawn from the population with replacement. As a result,

$$E_{iid}(\bar y) = \mu, \qquad \mathrm{Var}_{iid}(\bar y) = \frac{\sigma^2}{n}.$$

Notice that $\mathrm{Var}_{iid}(\bar y)$ is different from $\mathrm{Var}(\bar y)$ in Theorem 2.2.2. In fact, for $n > 1$,

$$\mathrm{Var}(\bar y) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} < \frac{\sigma^2}{n} = \mathrm{Var}_{iid}(\bar y).$$

Thus, for the same sample size $n$, sampling without replacement produces a less variable estimator of $\mu$. Why?
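For a small population the claims $E(\bar y) = \mu$ and $\mathrm{Var}(\bar y) = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$ can be checked exactly by enumerating all equally likely samples. The sketch below does this for the illustrative population $\{0, 1, 2, 3, 4\}$ with $n = 2$; the function names are hypothetical.

```python
from itertools import combinations
from fractions import Fraction

def srs_mean_moments(population, n):
    """Exact E(ybar) and Var(ybar) over all equally likely s.r.s. samples."""
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    e = sum(means) / len(means)
    v = sum((m - e) ** 2 for m in means) / len(means)
    return e, v

def theoretical_var(population, n):
    """sigma^2 / n * (N - n) / (N - 1), with sigma^2 using divisor N."""
    N = len(population)
    mu = Fraction(sum(population), N)
    sigma2 = sum((u - mu) ** 2 for u in population) / N
    return sigma2 / n * Fraction(N - n, N - 1)

pop = [0, 1, 2, 3, 4]
e_ybar, var_ybar = srs_mean_moments(pop, 2)
print(e_ybar, var_ybar, theoretical_var(pop, 2))
```

Here $\sigma^2 = 2$, so sampling with replacement would give $\sigma^2/n = 1$, which is larger than the without-replacement variance computed above.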

2.2.2

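Define the sample variance

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar y)^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar y^2\right).$$

Theorem 2.2.3

$$E(s^2) = \frac{N}{N-1}\sigma^2.$$

Proof.

$$E(s^2) = \frac{1}{n-1}\left(\sum_{i=1}^{n} E y_i^2 - nE\bar y^2\right)$$

$$= \frac{1}{n-1}\left[ n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} + \mu^2\right) \right]$$

$$= \frac{\sigma^2 n}{n-1}\left(1 - \frac{1}{n} \cdot \frac{N-n}{N-1}\right)$$

$$= \frac{\sigma^2 n}{n-1} \cdot \frac{nN - n - (N-n)}{n(N-1)}$$

$$= \frac{N}{N-1}\sigma^2.$$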

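The bias in $s^2$ can be easily corrected. The next theorem is an easy consequence of the last theorem.

Theorem 2.2.4 $\hat\sigma^2 := \dfrac{N-1}{N}s^2$ is an unbiased estimator for $\sigma^2$.

We shall define

$f = \dfrac{n}{N}$ to be the sample proportion,

$1 - f = 1 - \dfrac{n}{N}$ to be the finite population correction (abbreviated fpc).

Then we have the following theorem.

Theorem 2.2.5 An unbiased estimator for $\mathrm{Var}(\bar y)$ is

$$\widehat{\mathrm{Var}}(\bar y) = \frac{s^2}{n}(1 - f).$$

Proof.

$$E\,\widehat{\mathrm{Var}}(\bar y) = \frac{E s^2}{n}(1 - f) = \frac{N\sigma^2}{n(N-1)}\left(1 - \frac{n}{N}\right) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} = \mathrm{Var}(\bar y).$$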
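The bias of $s^2$, namely $E(s^2) = \frac{N}{N-1}\sigma^2$, and the unbiasedness of $\frac{s^2}{n}(1-f)$ for $\mathrm{Var}(\bar y)$ can also be confirmed by exact enumeration. The sketch below uses the illustrative population $\{0, 1, 2, 3, 4\}$ (so $\sigma^2 = 2$) with $n = 2$; the helper names are assumptions.

```python
from itertools import combinations
from fractions import Fraction

def sample_variance(sample):
    """s^2 with divisor n - 1."""
    n = len(sample)
    ybar = Fraction(sum(sample), n)
    return sum((y - ybar) ** 2 for y in sample) / (n - 1)

def expected_over_srs(population, n, statistic):
    """Exact expectation of a statistic over all equally likely s.r.s. samples."""
    values = [statistic(s) for s in combinations(population, n)]
    return sum(values) / len(values)

pop, N, n = [0, 1, 2, 3, 4], 5, 2
f = Fraction(n, N)

E_s2 = expected_over_srs(pop, n, sample_variance)
E_var_hat = expected_over_srs(pop, n, lambda s: sample_variance(s) * (1 - f) / n)
print(E_s2, E_var_hat)
```

With $\sigma^2 = 2$, the theory predicts $E(s^2) = \frac{5}{4}\cdot 2 = \frac{5}{2}$ and $E\,\widehat{\mathrm{Var}}(\bar y) = \frac{2}{2}\cdot\frac{3}{4} = \frac{3}{4}$.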

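It can be shown that the sample average $\bar y$ under simple random sampling is approximately normally distributed provided $n$ is large ($\ge 30$, say) and $f = n/N$ is not too close to 0 or 1.

Central limit theorem: If $n, N \to \infty$ such that $n/N \to \lambda \in (0, 1)$, then

$$\frac{\bar y - \mu}{\sqrt{\mathrm{Var}(\bar y)}} \sim N(0, 1) \quad \text{approximately.}$$

If $\mathrm{Var}(\bar y)$ is replaced by its estimator $\widehat{\mathrm{Var}}(\bar y)$, we still have

$$\frac{\bar y - \mu}{\sqrt{\widehat{\mathrm{Var}}(\bar y)}} \sim N(0, 1) \quad \text{approximately, as } n/N \to \lambda > 0.$$

Thus, an approximate $(1 - \alpha)$ confidence interval for $\mu$ is

$$\bar y \pm z_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\bar y)} = \bar y \pm z_{\alpha/2}\frac{s}{\sqrt n}\sqrt{1 - f}.$$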

Example. A s.r.s. of size $n = 200$ is taken from a population of size $N = 1000$, resulting in $\bar y = 94$ and $s^2 = 400$. Find a 95% C.I. for $\mu$.

Solution.

$$94 \pm 1.96 \cdot \frac{20}{\sqrt{200}}\sqrt{1 - 1/5} = 94 \pm 2.479.$$

Example. A simple random sample of $n = 100$ water meters within a community is monitored to estimate the average daily water consumption per household over a specified dry spell. The sample mean and variance are found to be $\bar y = 12.5$ and $s^2 = 1252$. If we assume that there are $N = 10{,}000$ households within the community, estimate $\mu$, the true average daily consumption, and find a 95% confidence interval for $\mu$.

Solution. $\hat\mu = \bar y = 12.5$, and

$$\mathrm{Var}(\bar y) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} \approx \frac{s^2}{n}(1 - n/N) = \frac{1252}{100}(1 - 100/10000) = 12.3948, \qquad \sqrt{\widehat{\mathrm{Var}}(\bar y)} = 3.5206.$$

Hence an approximate 95% confidence interval for $\mu$ is $12.5 \pm 1.96 \times 3.5206 = 12.5 \pm 6.90$.
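The interval $\bar y \pm z_{\alpha/2}\frac{s}{\sqrt n}\sqrt{1-f}$ is easy to compute directly. The following is a minimal sketch (the function name `srs_mean_ci` is hypothetical) applied to the two examples above.

```python
import math

def srs_mean_ci(ybar, s2, n, N, z=1.96):
    """Approximate CI for the population mean under s.r.s. with fpc (1 - n/N)."""
    var_hat = s2 / n * (1 - n / N)   # estimated Var(ybar)
    half = z * math.sqrt(var_hat)    # half-width of the interval
    return ybar - half, ybar + half

# n = 200, N = 1000, ybar = 94, s^2 = 400
print(srs_mean_ci(94, 400, 200, 1000))
# Water meters: n = 100, N = 10000, ybar = 12.5, s^2 = 1252
print(srs_mean_ci(12.5, 1252, 100, 10000))
```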

2.3

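For the population mean, we have seen that $\mathrm{Var}(\bar y) = \dfrac{\sigma^2}{n} \cdot \dfrac{N-n}{N-1}$. So the bigger the sample size $n$ is (but $n \le N$), the more accurate our estimate $\bar y$ is. It is of interest to find the minimum $n$ such that our estimate is within an error bound $B$ with a certain probability $1 - \alpha$, say,

$$P(|\bar y - \mu| < B) \approx 1 - \alpha,$$

i.e.,

$$P\left( \frac{|\bar y - \mu|}{\sqrt{\mathrm{Var}(\bar y)}} < \frac{B}{\sqrt{\mathrm{Var}(\bar y)}} \right) \approx 1 - \alpha.$$

By the central limit theorem, this holds approximately when

$$\frac{B}{\sqrt{\mathrm{Var}(\bar y)}} = \frac{B}{\sqrt{\dfrac{\sigma^2}{n} \cdot \dfrac{N-n}{N-1}}} = z_{\alpha/2}$$

$$\iff \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} = \frac{B^2}{z_{\alpha/2}^2} = D$$

$$\iff \frac{N}{n} - 1 = \frac{(N-1)D}{\sigma^2}.$$

Thus,

$$n \approx \frac{N\sigma^2}{(N-1)D + \sigma^2}, \qquad D = \frac{B^2}{z_{\alpha/2}^2}.$$

Remarks: (1): if $\alpha = 5\%$, then $z_{\alpha/2} = 1.96 \approx 2$, so $D \approx \dfrac{B^2}{4}$, which is the formula in the textbook (page 93).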

(2): the above formula requires the knowledge of the population variance $\sigma^2$, which is typically unknown in practice. However, we can approximate $\sigma^2$ by the following methods: (i) from pilot studies; (ii) from previous surveys; (iii) from other studies.

Example. Suppose that a total of 1500 students are to graduate next year. Determine the sample size $n$ needed to ensure that the sample average in starting salary is within \$40 of the population average with probability at least 0.9. From previous studies, we know that the standard deviation of the starting salary is approximately \$400.

Solution. Here $N = 1500$, $B = 40$, $\sigma \approx 400$ and $z_{0.05} = 1.645$, so

$$n = \frac{1500 \times 400^2}{1499 \times 40^2 / 1.645^2 + 400^2} = 229.37 \approx 230.$$

Example. Example 4.5 (p.94, 5th edition). The average amount of money $\mu$ for a hospital's accounts receivable must be estimated. Although no prior data are available to estimate the population variance $\sigma^2$, it is known that most accounts lie within a \$100 range. There are 1000 open accounts. Find the sample size needed to estimate $\mu$ with a bound on the error of estimation of \$3 with probability 0.95.

Solution. The solution depends on how one interprets "most accounts", whether it means 70%, 90%, 95% or 99% of all accounts. We need an estimate of $\sigma^2$. For the normal distribution $N(0, \sigma^2)$, we have

$$P(|N(0, \sigma^2)| \le 1.96\sigma) = P(|N(0, 1)| \le 1.96) = 95\%,$$

$$P(|N(0, \sigma^2)| \le 3\sigma) = P(|N(0, 1)| \le 3) = 99.73\%.$$

So 95% of accounts lie within a $4\sigma$ range (approximately) and 99.73% of accounts lie within a $6\sigma$ range. Here $B = 3$, $N = 1000$.

If "most" means 95%, we take $2(2\sigma) = 100$, so $\sigma = 25$. Then $n = 210.76 \approx 211$.

If "most" means 99.73%, we take $2(3\sigma) = 100$, so $\sigma = 50/3$. Then $n \approx 107$.
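The sample-size formula $n \approx \dfrac{N\sigma^2}{(N-1)D + \sigma^2}$ with $D = B^2/z_{\alpha/2}^2$ can be evaluated as follows. This is a sketch under the assumption that the result is rounded up to the next integer, as in the worked examples; the function name is hypothetical.

```python
import math

def sample_size_mean(N, sigma2, B, z):
    """Sample size for estimating a mean within bound B.

    Returns the raw value from the formula and the value rounded up.
    """
    D = B ** 2 / z ** 2
    n = N * sigma2 / ((N - 1) * D + sigma2)
    return n, math.ceil(n)

# Starting salaries: N = 1500, sigma = 400, B = 40, probability 0.9 (z = 1.645)
print(sample_size_mean(1500, 400 ** 2, 40, 1.645))
# Hospital accounts, "most" = 95%: sigma = 25, B = 3, z = 1.96
print(sample_size_mean(1000, 25 ** 2, 3, 1.96))
# Hospital accounts, "most" = within 3 sigma: sigma = 50/3
print(sample_size_mean(1000, (50 / 3) ** 2, 3, 1.96))
```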

2.3.1

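$$\mu = \frac{1}{N}(u_1 + u_2 + \cdots + u_N).$$

Suppose a simple random sample is $\{y_1, \dots, y_n\}$.

1) Estimators of $\mu$ and $\sigma^2$ are

$$\hat\mu = \bar y = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar y)^2.$$

2) Properties of $\bar y$:

$$E\bar y = \mu, \qquad \mathrm{Var}(\bar y) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}.$$

3) An estimator of $\mathrm{Var}(\bar y)$ is

$$\widehat{\mathrm{Var}}(\bar y) = \frac{s^2}{n}(1 - f),$$

where $f = n/N$.

4) An approximate $(1 - \alpha)$ confidence interval for $\mu$ is

$$\bar y \pm z_{\alpha/2}\frac{s}{\sqrt n}\sqrt{1 - f}.$$

5) Minimum sample size $n$ needed to have an error bound $B$ with probability $1 - \alpha$:

$$n \approx \frac{N\sigma^2}{(N-1)D + \sigma^2}, \qquad \text{where } D = \frac{B^2}{z_{\alpha/2}^2}.$$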

2.3.2

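$$\tau = (u_1 + u_2 + \cdots + u_N) = N\mu.$$

Suppose a simple random sample is $\{y_1, \dots, y_n\}$.

1. An estimator of $\tau$ is

$$\hat\tau = N\bar y.$$

2. The mean and variance of $\hat\tau$ are

$$E\hat\tau = \tau, \qquad \mathrm{Var}(\hat\tau) = N^2\,\frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}.$$

3. An estimator of $\mathrm{Var}(\hat\tau)$ is

$$\widehat{\mathrm{Var}}(\hat\tau) = \widehat{\mathrm{Var}}(N\bar y) = N^2\,\frac{s^2}{n}(1 - f).$$

4. An approximate $(1 - \alpha)$ confidence interval for $\tau$ is

$$\hat\tau \pm z_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\hat\tau)} = N\bar y \pm z_{\alpha/2}\,N\frac{s}{\sqrt n}\sqrt{1 - f}.$$

Proof. Central limit theorem (CLT): if $n, N \to \infty$ such that $n/N \to \lambda \in (0, 1)$, then

$$\frac{\hat\tau - \tau}{\sqrt{\mathrm{Var}(\hat\tau)}} \to_d N(0, 1).$$

If $\mathrm{Var}(\hat\tau)$ is replaced by its estimator $\widehat{\mathrm{Var}}(\hat\tau)$, we still have

$$\frac{\hat\tau - \tau}{\sqrt{\widehat{\mathrm{Var}}(\hat\tau)}} \to_d N(0, 1), \quad \text{as } n/N \to \lambda > 0.$$

The confidence interval follows.

5. Minimum sample size $n$ needed to have an error bound $B$ with probability $1 - \alpha$:

$$n \approx \frac{N\sigma^2}{(N-1)D + \sigma^2}, \qquad \text{where } D = \frac{B^2}{N^2 z_{\alpha/2}^2}.$$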

Example 4.6. (Page 95 of the textbook). An investigator is interested in estimating the total weight gain in 0 to 4 weeks for $N = 1000$ chicks fed on a new ration. Obviously, to weigh each bird would be time-consuming and tedious. Therefore, determine the number of chicks to be sampled in this study in order to estimate $\tau$ within a bound on the error of estimation equal to 1000 grams with probability 95%. Many similar studies on chick nutrition have been run in the past. Using data from these studies, the investigator found that $\sigma^2$, the population variance, was approximately 36.00 (grams)$^2$. Determine the required sample size.

Solution.

$$D = \frac{B^2}{(1.96N)^2} = \frac{1000^2}{1.96^2 \times 1000^2} = 0.26,$$

$$n = \frac{N\sigma^2}{(N-1)D + \sigma^2} = \frac{1000 \times 36}{999 \times 0.26 + 36} = 121.72 \approx 122.$$
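For a population total the only change in the sample-size formula is $D = B^2/(N^2 z_{\alpha/2}^2)$. A minimal sketch follows (the function name is hypothetical). It keeps $D$ unrounded, so the raw value differs slightly from a hand calculation that rounds $D$ to 0.26, but the rounded-up sample size is the same.

```python
import math

def sample_size_total(N, sigma2, B, z):
    """Sample size for estimating a population total within bound B."""
    D = B ** 2 / (N ** 2 * z ** 2)
    n = N * sigma2 / ((N - 1) * D + sigma2)
    return n, math.ceil(n)

# Chicks: N = 1000, sigma^2 = 36, B = 1000 grams, probability 0.95
n_raw, n_needed = sample_size_total(1000, 36, 1000, 1.96)
print(n_raw, n_needed)
```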

2.4

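We are interested in the proportion $p$ of the population with a specified characteristic. Let

$$y_i = \begin{cases} 1 & \text{if the } i\text{th element has the characteristic} \\ 0 & \text{if not.} \end{cases}$$

It is easy to see that $E(y_i) = E(y_i^2) = p$ (Why?). Therefore, we have

$$\mu = E(y_i) = p, \qquad \sigma^2 = \mathrm{Var}(y_i) = E(y_i^2) - (Ey_i)^2 = p - p^2 = pq,$$

where $q = 1 - p$.

So the total number of elements in the sample of size $n$ possessing the specified characteristic is $\sum_{i=1}^{n} y_i$. Therefore,

1. An estimator of $p$:

$$\bar y = \frac{\sum_{i=1}^{n} y_i}{n} = \hat p, \quad \text{say.}$$

An estimator of $\sigma^2 = pq$:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar y)^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar y^2\right) = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i - n\hat p^2\right) = \frac{1}{n-1}\left(n\hat p - n\hat p^2\right) = \frac{n}{n-1}\hat p\hat q,$$

where $\hat q = 1 - \hat p$. Moreover,

$$E(s^2) = \frac{N}{N-1}\sigma^2 = \frac{N}{N-1}pq. \tag{4.1}$$

2. The mean and variance of $\hat p$:

$$E\hat p = p, \qquad \mathrm{Var}(\hat p) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} = \frac{pq}{n} \cdot \frac{N-n}{N-1}.$$

3. An estimator of $\mathrm{Var}(\hat p)$:

$$\widehat{\mathrm{Var}}(\hat p) = \frac{s^2}{n}(1 - f) = \frac{\hat p\hat q}{n-1}(1 - f).$$

4. An approximate $(1 - \alpha)$ confidence interval for $p$:

$$\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p\hat q}{n-1}(1 - f)}.$$

5. The minimum sample size $n$ required to estimate $p$ such that our estimate $\hat p$ is within an error bound $B$ with probability $1 - \alpha$ is

$$n \approx \frac{Npq}{(N-1)D + pq}, \qquad \text{where } D = \frac{B^2}{z_{\alpha/2}^2}.$$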

a) $p$ is often unknown, so we can replace it by some estimate (from a previous study, pilot study, etc.).
b) If we don't have an estimate of $p$, we can replace it by $p = 1/2$, thus $pq = 1/4$.

Example. A small town has $N = 800$ people. Let $p$ = the proportion of people with blood type A.
(1) What sample size $n$ must be drawn in order to estimate $p$ to within 0.04 with probability 0.95?
(2) Suppose we know that no more than 10% of the population have blood type A. Find $n$ again as in (1). Comment on the difference between (1) and (2).
(3) A s.r.s. of size $n = 200$ is taken and 7% of the sample has blood type A. Find a 90% confidence interval for $p$.

Solution. $N = 800$, $\alpha = 0.05$, $B = 0.04$, so $D = 0.04^2/1.96^2 \approx 0.000416$.
(1) Take $p = 1/2$ in the formula: $n \approx \dfrac{800 \times 0.25}{799D + 0.25} = 343.18$, so we take $n = 344$.
(2) $p \le 0.10$, so $\sigma^2 = pq \le 0.09$, and $n \approx \dfrac{800 \times 0.09}{799D + 0.09} = 170.30$, so we take $n = 171$.
(3) $0.07 \pm 1.645\sqrt{\dfrac{0.07 \times 0.93}{199}(1 - 200/800)} = 0.07 \pm 0.026$, i.e. $(0.044, 0.096)$.

Example. A simple random sample of $n = 40$ college students was interviewed to determine the proportion of students in favor of converting from the semester to the quarter system. 25 students answered affirmatively. Estimate $p$, the proportion of students on campus in favor of the change. (Assume $N = 2000$.) Find a 95% confidence interval for $p$.

Solution. $\hat p = \bar y = 25/40 = 0.625$.

$$\widehat{\mathrm{Var}}(\hat p) = \frac{\hat p\hat q}{n-1}(1 - n/N) = \frac{0.625 \times 0.375}{39}(1 - 40/2000) = 5.889 \times 10^{-3}, \qquad \sqrt{\widehat{\mathrm{Var}}(\hat p)} = 0.07674.$$

Hence an approximate 95% confidence interval for $p$ is $0.625 \pm 1.96 \times 0.07674 = 0.625 \pm 0.150$, i.e. $(0.475, 0.775)$.
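The two proportion formulas, the interval $\hat p \pm z_{\alpha/2}\sqrt{\hat p\hat q(1-f)/(n-1)}$ and the sample size $n \approx Npq/((N-1)D + pq)$ with $D = B^2/z_{\alpha/2}^2$, can be evaluated with a short sketch. The function names are hypothetical, and rounding the sample size up to the next integer is an assumption.

```python
import math

def proportion_ci(p_hat, n, N, z):
    """Approximate CI for a proportion under s.r.s. with fpc (1 - n/N)."""
    var_hat = p_hat * (1 - p_hat) / (n - 1) * (1 - n / N)
    half = z * math.sqrt(var_hat)
    return p_hat - half, p_hat + half

def sample_size_proportion(N, p, B, z):
    """Sample size for estimating a proportion within bound B (rounded up)."""
    D = B ** 2 / z ** 2
    pq = p * (1 - p)
    return math.ceil(N * pq / ((N - 1) * D + pq))

# Blood type A, N = 800, B = 0.04, z = 1.96
print(sample_size_proportion(800, 0.5, 0.04, 1.96))   # worst case p = 1/2
print(sample_size_proportion(800, 0.1, 0.04, 1.96))   # p at most 0.10
print(proportion_ci(0.07, 200, 800, 1.645))            # 90% interval
# College students, n = 40, N = 2000
print(proportion_ci(0.625, 40, 2000, 1.96))            # 95% interval
```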

2.5

Comparing estimates

For simplicity, suppose $x_1, \dots, x_m$ is a random sample (i.i.d.) from a population with mean $\mu_x$ and $y_1, \dots, y_n$ is a random sample from a population with mean $\mu_y$. We are interested in the difference of means $\mu_y - \mu_x$, whose unbiased estimator is $\bar y - \bar x$, as $E(\bar y - \bar x) = \mu_y - \mu_x$. Further,

$$\mathrm{Var}(\bar y - \bar x) = \mathrm{Var}(\bar y) + \mathrm{Var}(\bar x) - 2\,\mathrm{Cov}(\bar y, \bar x).$$

Remark: If the two samples $\{x_1, \dots, x_m\}$ and $\{y_1, \dots, y_n\}$ are independent, then $\mathrm{Cov}(\bar y, \bar x) = 0$. However, a more interesting case is when the two samples are dependent, which will be illustrated in the following example.

A dependent example

Suppose an opinion poll asks $n$ people the question "Do you favor abortion?" The opinions given are YES, NO, NO OPINION.

Let the proportions of people who answer YES, NO, NO OPINION be $p_1$, $p_2$ and $p_3$, respectively. In particular, we are interested in comparing $p_1$ and $p_2$ by looking at $p_1 - p_2$. Clearly, the sample proportions for YES and NO are dependent, since if one is high, the other is likely to be low. Let $\hat p_1$, $\hat p_2$ and $\hat p_3$ be the three respective sample proportions amongst the sample of size $n$. Then $X = n\hat p_1$, $Y = n\hat p_2$ and $Z = n\hat p_3$ follow a multinomial distribution with parameters $(n, p_1, p_2, p_3)$. That is,

$$P(X = x, Y = y, Z = z) = \binom{n}{x, y, z} p_1^x p_2^y p_3^z = \frac{n!}{x!\,y!\,z!}\,p_1^x p_2^y p_3^z.$$

Please note that

$$\sum_{x \ge 0,\ y \ge 0,\ x + y + z = n} \frac{n!}{x!\,y!\,z!}\,p_1^x p_2^y p_3^z = 1.$$

Question: What is the distribution of $X$? (Hint: Classify the people into "Yes" and "Not Yes".)

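Theorem 2.5.1

$$E(X) = np_1, \quad E(Y) = np_2, \quad E(Z) = np_3,$$

$$\mathrm{Var}(X) = np_1q_1, \quad \mathrm{Var}(Y) = np_2q_2, \quad \mathrm{Cov}(X, Y) = -np_1p_2.$$

Proof. $X$ = number of people saying YES $\sim \mathrm{Bin}(n, p_1)$. So $EX = np_1$, $\mathrm{Var}(X) = np_1q_1$. Now

$$\mathrm{Cov}(X, Y) = E(XY) - (EX)(EY) = E(XY) - n^2p_1p_2.$$

But

$$E(XY) = \sum_{x, y \ge 0,\ x + y \le n} xy\,\frac{n!}{x!\,y!\,(n-x-y)!}\,p_1^x p_2^y p_3^{n-x-y}$$

$$= \sum_{x, y \ge 1,\ x + y \le n} \frac{n!}{(x-1)!\,(y-1)!\,(n-x-y)!}\,p_1^x p_2^y p_3^{n-x-y}$$

$$= n(n-1)p_1p_2 \sum_{x-1,\, y-1 \ge 0,\ (x-1) + (y-1) \le n-2} \frac{(n-2)!}{(x-1)!\,(y-1)!\,((n-2)-(x-1)-(y-1))!}\,p_1^{x-1} p_2^{y-1} p_3^{(n-2)-(x-1)-(y-1)}$$

$$= n(n-1)p_1p_2 \sum_{x_1, y_1 \ge 0,\ x_1 + y_1 \le n-2} \frac{(n-2)!}{x_1!\,y_1!\,((n-2)-x_1-y_1)!}\,p_1^{x_1} p_2^{y_1} p_3^{(n-2)-x_1-y_1}$$

$$= n(n-1)p_1p_2 = n^2p_1p_2 - np_1p_2.$$

Therefore, $\mathrm{Cov}(X, Y) = E(XY) - n^2p_1p_2 = -np_1p_2$.

Theorem 2.5.2

$$E(\hat p_1) = p_1, \quad \mathrm{Var}(\hat p_1) = p_1q_1/n, \quad E(\hat p_2) = p_2, \quad \mathrm{Var}(\hat p_2) = p_2q_2/n, \quad \mathrm{Cov}(\hat p_1, \hat p_2) = -p_1p_2/n.$$

Proof. Note that $\hat p_1 = X/n$ and $\hat p_2 = Y/n$. Apply the last theorem.

From the last theorem, we have

$$\mathrm{Var}(\hat p_1 - \hat p_2) = \mathrm{Var}(\hat p_1) + \mathrm{Var}(\hat p_2) - 2\,\mathrm{Cov}(\hat p_1, \hat p_2) = \frac{p_1q_1}{n} + \frac{p_2q_2}{n} + \frac{2p_1p_2}{n}.$$

One estimator of $\mathrm{Var}(\hat p_1 - \hat p_2)$ is

$$\widehat{\mathrm{Var}}(\hat p_1 - \hat p_2) = \frac{\hat p_1\hat q_1}{n} + \frac{\hat p_2\hat q_2}{n} + \frac{2\hat p_1\hat p_2}{n}.$$

Therefore, an approximate $(1 - \alpha)$ confidence interval for $p_1 - p_2$ is

$$(\hat p_1 - \hat p_2) \pm z_{\alpha/2}\sqrt{\frac{\hat p_1\hat q_1}{n} + \frac{\hat p_2\hat q_2}{n} + \frac{2\hat p_1\hat p_2}{n}}.$$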

Example. (From the textbook.) Should smoking be banned from the workplace? A Time/Yankelovich poll of 800 adult Americans carried out on April 6-7, 1994 gave the following results.

              Banned   Special areas   No restrictions
Nonsmokers    44%      52%             3%
Smokers       8%       80%             11%

Using a sample of 600 nonsmokers and 200 smokers, estimate and construct a 95% C.I. for (1) the true difference between the proportions choosing "Banned" between nonsmokers and smokers; (2) the true difference between the proportions among nonsmokers choosing between "Banned" and "Special areas".

Solution. A. The proportions choosing "Banned" are independent of each other; a high value of one does not force a low value of the other. An appropriate estimate of this difference is

$$0.44 - 0.08 \pm 2\sqrt{\frac{0.44 \times 0.56}{600} + \frac{0.08 \times 0.92}{200}} = 0.36 \pm 0.06.$$

B. The proportion of nonsmokers choosing "Special areas" is dependent on the proportion choosing "Banned"; if the latter is large, the former must be small. These are multinomial proportions. Thus, an appropriate estimate of this difference is

$$0.52 - 0.44 \pm 2\sqrt{\frac{0.44 \times 0.56}{600} + \frac{0.52 \times 0.48}{600} + 2 \cdot \frac{0.44 \times 0.52}{600}} = 0.08 \pm 0.08.$$

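Example. The major league baseball season in the US came to an abrupt end in the middle of 1994. In a poll of 600 adult Americans, 29% blamed the players for the strike, 34% blamed the owners, and the rest held various other opinions. Does evidence suggest that the true proportions who blame players and owners, respectively, are really different?

Solution. Let $p_1$, $p_2$ be the proportions of Americans who blamed the players and the owners, respectively.

$$\widehat{\mathrm{Var}}(\hat p_1 - \hat p_2) = \frac{\hat p_1\hat q_1}{n} + \frac{\hat p_2\hat q_2}{n} + \frac{2\hat p_1\hat p_2}{n} = \frac{0.29 \times 0.71}{600} + \frac{0.34 \times 0.66}{600} + \frac{2 \times 0.29 \times 0.34}{600} = 1.0458 \times 10^{-3}.$$

So an approximate 95% C.I. for $p_1 - p_2$ is

$$0.29 - 0.34 \pm z_{0.025}\sqrt{\widehat{\mathrm{Var}}(\hat p_1 - \hat p_2)} = -0.05 \pm 1.96 \times 0.03234 = (-0.11339, 0.01339).$$

Since this interval contains 0, the data do not provide evidence at the 5% level that the two proportions differ.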
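For two proportions taken from the same multinomial sample, the interval uses $\widehat{\mathrm{Var}}(\hat p_1 - \hat p_2) = (\hat p_1\hat q_1 + \hat p_2\hat q_2 + 2\hat p_1\hat p_2)/n$. A minimal sketch (hypothetical function name) applied to the baseball poll:

```python
import math

def multinomial_diff_ci(p1_hat, p2_hat, n, z=1.96):
    """Approximate CI for p1 - p2 when both proportions come from one multinomial sample."""
    var_hat = (p1_hat * (1 - p1_hat) + p2_hat * (1 - p2_hat) + 2 * p1_hat * p2_hat) / n
    half = z * math.sqrt(var_hat)
    diff = p1_hat - p2_hat
    return diff - half, diff + half

# Baseball strike poll: n = 600, 29% blame players, 34% blame owners
lower, upper = multinomial_diff_ci(0.29, 0.34, 600)
print(lower, upper)
print("interval contains 0:", lower < 0 < upper)
```

The positive covariance term $2\hat p_1\hat p_2/n$ is what distinguishes this from the interval for two independent samples.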

2.6

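Write the sample mean as

$$\bar y = \frac{1}{n}\sum_{i=1}^{n} y_i = \sum_{i=1}^{N} \frac{u_i}{n} Z_i,$$

where $Z_i = 1$ if unit $i$ is selected in the sample and $Z_i = 0$ otherwise.

The $Z_i$'s are the only r.v.s here. For simple random sampling, $\{Z_1, \dots, Z_N\}$ are identically distributed (but not independent) Bernoulli r.v.s with

$$\pi_i = P(Z_i = 1) = P(\text{select unit } i \text{ in sample}) = \frac{\text{number of samples including unit } i}{\text{number of possible samples}} = \frac{\binom{N-1}{n-1}}{\binom{N}{n}} = \frac{n}{N}.$$

As a consequence, we have, for $i, j = 1, \dots, N$, $i \neq j$,

$$EZ_i = EZ_i^2 = \frac{n}{N},$$

$$\mathrm{Var}(Z_i) = \frac{n}{N} - \left(\frac{n}{N}\right)^2 = \frac{n}{N}\left(1 - \frac{n}{N}\right),$$

$$E(Z_iZ_j) = P(Z_i = 1, Z_j = 1) = P(Z_j = 1 \mid Z_i = 1)P(Z_i = 1) = \frac{n-1}{N-1} \cdot \frac{n}{N},$$

$$\mathrm{Cov}(Z_i, Z_j) = E(Z_iZ_j) - E(Z_i)E(Z_j) = \frac{n-1}{N-1} \cdot \frac{n}{N} - \left(\frac{n}{N}\right)^2 = -\frac{1}{N-1} \cdot \frac{n}{N}\left(1 - \frac{n}{N}\right).$$

Therefore,

$$E\bar y = \sum_{i=1}^{N} \frac{u_i}{n}EZ_i = \sum_{i=1}^{N} \frac{u_i}{n} \cdot \frac{n}{N} = \frac{1}{N}\sum_{i=1}^{N} u_i = \mu.$$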
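The moments of the inclusion indicators can be checked by brute force for small $N$ and $n$: count the samples containing unit 1, and those containing both units 1 and 2. The sketch below is illustrative only; units are labelled $0, \dots, N-1$ and the function name is hypothetical.

```python
from itertools import combinations
from fractions import Fraction

def inclusion_moments(N, n):
    """Exact P(Z_1 = 1), P(Z_1 = 1, Z_2 = 1) and Cov(Z_1, Z_2) under s.r.s."""
    samples = list(combinations(range(N), n))
    total = len(samples)
    pi_1 = Fraction(sum(1 for s in samples if 0 in s), total)
    pi_12 = Fraction(sum(1 for s in samples if 0 in s and 1 in s), total)
    cov = pi_12 - pi_1 ** 2
    return pi_1, pi_12, cov

N, n = 5, 2
pi_1, pi_12, cov = inclusion_moments(N, n)
print(pi_1, pi_12, cov)
```

For $N = 5$, $n = 2$ the formulas give $n/N = 2/5$, $\frac{n-1}{N-1}\cdot\frac{n}{N} = 1/10$ and $-\frac{1}{N-1}\cdot\frac{n}{N}\left(1 - \frac{n}{N}\right) = -3/50$.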

2.7

Exercises

1. List all possible simple random samples of size $n = 2$ that can be selected from the population $\{0, 1, 2, 3, 4\}$. Calculate $\sigma^2$ and $V(\bar y)$.

2. A simple random sample of $n = 100$ water meters within a community is monitored to estimate the average daily water consumption per household over a specified dry spell, resulting in $\bar y = 12.5$ and $s^2 = 1252$. Assume $N = 10{,}000$ households, estimate the true average daily consumption $\mu$, and find a 95% confidence interval.

3. A simple random sample of $n = 40$ college students was interviewed to determine the proportion of students in favor of converting from the semester to the quarter system. 25 students answered affirmatively. Estimate $p$, the proportion of students on campus in favor of the change. (Assume $N = 2000$.) Find a 95% confidence interval for $p$.

4. The major league baseball season in the US came to an abrupt end in the middle of 1994. In a poll of 600 adult Americans, 29% blamed the players for the strike, 34% blamed the owners, and the rest held various other opinions. Does evidence suggest that the true proportions who blame players and owners, respectively, are really different?

5. (a) Suppose that a town has population size $N = 2000$. We are interested in the proportion $p$ of people who support building a childcare centre. Find the sample size $n$ required to estimate $p$ with an error bound $B = 0.05$ with probability 95%. (b) If we know that at least 80% of the people will support building a childcare centre, find $n$ again as in (a).

6. Show that, for estimating the population total, the minimum sample size $n$ required so that our estimate is within an error bound $B$ with probability $1 - \alpha$ is given by

$$n \approx \frac{N\sigma^2}{(N-1)D + \sigma^2}, \qquad \text{where } D = \frac{B^2}{N^2 z_{\alpha/2}^2}.$$

7. In estimating the proportion $p$ of the population with a specified characteristic, we used $\hat p = \sum y_i/n$ to estimate it, where

$$y_i = \begin{cases} 1 & \text{if the } i\text{th element has the characteristic} \\ 0 & \text{if not.} \end{cases}$$

We have seen in the lecture that $E(y_i) = E(y_i^2) = p$, and $\sigma^2 = \mathrm{var}(y_i) = p - p^2$.

(1) Suppose that we use $\hat\sigma^2 = \hat p - \hat p^2$ as an estimator of $\sigma^2$. Is it an unbiased estimator of $\sigma^2$? If not, what is its bias?
(2) Find an unbiased estimator of $\sigma^2$, based on your calculation in (1). Compare this estimate with the one given in the lecture.

8. In a decision theory approach, two functions are specified: $L(n)$ (loss or cost of a bad estimate) and $C(n)$ (cost of taking the sample). Suppose

$$L(n) = k\,\mathrm{Var}(\bar y) = k\left(1 - \frac{n}{N}\right)\frac{\sigma^2}{n}, \qquad C(n) = c_0 + c_1n,$$

for some $c_0$, $c_1$ and $k$. Find an optimal $n$ minimizing the total cost $L(n) + C(n)$.
