Sampling and Sampling Distributions

• Population and Samples
• Parameters and Statistics
• Types of Sampling: Simple Random, Stratified, Systematic and Cluster sampling
• Sampling distributions
• Standard errors
• Sampling from normal and non-normal populations
• Central Limit Theorem
• Finite Population Multiplier

Population
Any well-defined set (group) of objects about which a statistical enquiry is being made is called a population or universe. The total number of objects (individuals or members) in a population is known as the size of the population, which may be finite or infinite. A population can consist of things as well as people. For example:
• All members of the cultural society of your city.
• All students of mathematics at Ithaca College.
• All Americans who saw 'TITANIC' last year.
• Heights of all students of your school.
• Weights of all citizens of the city of Lucknow above 20 years of age.
• Mileages of Dunlop automobile tyres.

Sample
A finite set of objects drawn from the population for a stated purpose is called a sample. Even in everyday life we base many decisions on samples, often without being aware of it. Having met Jackson yesterday for the first time, for an hour or two, I conclude that "Jackson is crazy", which may be wrong. We take a handful from a gunny bag of rice, judge its quality, and then purchase the whole bag. To taste milk, we take a glassful from the can and taste it. Taking a sample is easy when the population is uniform (homogeneous). When the population is heterogeneous (not uniform), selecting a sample is harder.

Inference Process
[Figure: a sample is drawn from the population; a sample statistic ($\bar{X}$) is computed from the sample; estimates and tests based on that statistic support inference about the population]

Population and Samples

Why Sampling

Definition of sampling
Procedure by which some members of a given population are selected as representatives of the entire population

Problems in Sampling?

• What problems do you know about?
• What issues are you aware of?
• What questions do you have?

Why do we use samples?
Get information from large populations
– At minimal cost
– At maximum speed
– At increased accuracy
– Using enhanced tools

Sampling
[Figure: precision and cost]

What we need to know
• Concepts
– Representativeness
– Sampling methods
– Choice of the right design

Key Sampling Concepts

Sampling and representativeness
[Figure: Target Population, Sampling Population and Sample]
Target Population → Sampling Population → Sample

Sampling and representativeness
Example: study on the prevalence of gynecological infection in women in Bangalore
• Target population: female population of Bangalore
• Sampling population: female population of 4 city wards

PARAMETERS AND STATISTICS
• A parameter is a numerical quantity that describes some characteristic of a population. Parameters usually have to be estimated because their values are generally unknown, especially when the population is so large that it is impossible or impractical to measure every member. Parameters are normally represented by Greek letters. The most common parameters are the population mean and variance, denoted $\mu$ and $\sigma^2$.
• A statistic is a quantitative value calculated from the observations in a sample. Statistics are usually represented by lowercase English letters with other symbols. The sample mean and variance, two of the most common statistics, are denoted by $\bar{x}$ and $s^2$, respectively.

Population and Sample
[Figure: Population (use parameters to summarize features) and Sample (use statistics to summarize features), linked by inference on the population from the sample]

Sampling Methods
• Simple Random Sampling
• Stratified Random Sampling
• Cluster Sampling
• Systematic Sampling
• Convenience Sampling
• Judgment Sampling

Stratified Random Sampling
• The population is first divided into groups of elements called strata.
• Each element in the population belongs to one and only one stratum.
• Best results are obtained when the elements within each stratum are as much alike as possible (i.e. homogeneous group).
• A simple random sample is taken from each stratum.
• Formulas are available for combining the stratum sample results into one population parameter estimate.

Stratified Random Sampling
• Advantage: If strata are homogeneous, this method is as “precise” as simple random sampling but with a smaller total sample size.
• Example: The basis for forming the strata might be department, location, age, industry type, etc.

Cluster Sampling
• The population is first divided into separate groups of elements called clusters.
• Ideally, each cluster is a representative small-scale version of the population (i.e. heterogeneous group).
• A simple random sample of the clusters is then taken.
• All elements within each sampled (chosen) cluster form the sample.

Cluster Sampling
• Advantage: The close proximity of elements can be cost effective (i.e. many sample observations can be obtained in a short time).
• Disadvantage: This method generally requires a larger total sample size than simple or stratified random sampling.
• Example: A primary application is area sampling, where clusters are city blocks or other well-defined areas.

Systematic Sampling
• If a sample of size n is desired from a population containing N elements, we might sample one element for every N/n elements in the population.
• We randomly select one of the first N/n elements from the population list.
• We then select every (N/n)th element that follows in the population list.
• This method has the properties of a simple random sample, especially if the list of the population elements is in random order.

Systematic Sampling
• Advantage: The sample usually will be easier to identify than it would be if simple random sampling were used.
• Example: Selecting every 100th listing in a telephone book after the first randomly selected listing.

Convenience Sampling
• It is a nonprobability sampling technique. Items are included in the sample without known probabilities of being selected.
• The sample is identified primarily by convenience.
• Advantage: Sample selection and data collection are relatively easy.
• Disadvantage: It is impossible to determine how representative of the population the sample is.
• Example: A professor conducting research might use student volunteers to constitute a sample.

Judgment Sampling
• The person most knowledgeable on the subject of the study selects elements of the population that he or she feels are most representative of the population.
• It is a nonprobability sampling technique.
• Advantage: It is a relatively easy way of selecting a sample.
• Disadvantage: The quality of the sample results depends on the judgment of the person selecting the sample.
• Example: A reporter might sample three or four senators, judging them as reflecting the general opinion of the senate.

Simple random sampling
• Principle
– Equal chance of drawing each unit
• Procedure
– Number all units
– Randomly draw units

Simple random sampling
Example: evaluate the prevalence of tooth decay among the 1200 children attending a school
• List of children attending the school
• Children numbered from 1 to 1200
• Sample size = 100 children
• Random sampling of 100 numbers between 1 and 1200

How to randomly select?

Simple random sampling

Table of random numbers
57172 33883 77950 11607 56149 80719 93809 40950 12182 13382 38629 60728 01881 23094 15243 53501 07698 22921 68127 55309 92034 50612 81415 38461 07556 60557 42088 87680 67344 11596 55678 65101 19505 86216 59744 48076 94576 32063 99056 29831 21100 58431 24181 25930 00501 10713 90892 84077 98504 44528 24587 50031 70098 28923 10609 01796 38169 77729 82000 48161 65695 73151 48859 12431 46747 95387 48125 68149 01161 79579 37484 36439 69853 41387 32168 30953 88753 75829 11333 15659 87119 24498 47228 83949 79068 17646 83710 48724 75654 23898 08846 23917 05243 25405 01527 43488 99278 65660 06175 54107 17822 08633 71626 05622 26902 09839 15859 17009 49931 83358 45552 24164 41125 35670 17152 23683 01331 07421 16181 23463 17046 13211 28751 72554 61221 09190 49946 08049 64864 30237 29959 45817 74577 67119 94303 75230 86776 35513 14291 38453 66516 10853 88163 97869 39641 49168 31460 71120 80855 77021 76825 74305 37545 68698 54986 77795 43909 89405 42791 00614 67448 56624 48980 94057 74773 63154 78796 04038 74462 88092 36970 02048 91507 91715 02035 46279 18239 68196 47201 08759 38964 41870 49607 70743 75889 49529 31286 27549 56684 51834 66391 58116 73099 75246 14551 72201 99522 31522 16050 49881 10910 22705 47687 75634 85224 45611 83534 26300
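In place of a printed table, the same draw can be made with a pseudo-random number generator. The sketch below is only an illustration of the tooth-decay example (1200 children, sample of 100); the function name `simple_random_sample` and the `seed` argument are assumptions introduced here, not part of the original slides.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct unit numbers from 1..population_size.

    Every unit has the same chance of being drawn (sampling without
    replacement), which is the defining property of simple random sampling.
    """
    rng = random.Random(seed)  # a fixed seed only makes the draw repeatable
    units = range(1, population_size + 1)
    return sorted(rng.sample(units, sample_size))

# Children are numbered 1 to 1200; 100 of them are drawn at random.
selected = simple_random_sample(1200, 100, seed=42)
print(len(selected), selected[:5])
```

Because `random.sample` draws without replacement, no child can be selected twice.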

Systematic sampling
• N = 1200 and n = 60 ⇒ sampling interval = N/n = 1200/60 = 20 (sampling fraction = n/N = 1/20)
• List persons from 1 to 1200
• Randomly select a number between 1 and 20 (e.g. 8)
⇒ 1st person selected = the 8th on the list
⇒ 2nd person = 8 + 20 = the 28th, etc.
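The selection rule in this example can be sketched in a few lines. This is a minimal illustration that assumes N is an exact multiple of n; the function name `systematic_sample` and its `start` and `seed` arguments are hypothetical names chosen for the sketch.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every k-th unit from a list numbered 1..population_size.

    k = population_size // sample_size is the sampling interval. If no
    start is given, one is drawn at random from the first k units.
    Assumes population_size is an exact multiple of sample_size.
    """
    interval = population_size // sample_size
    if start is None:
        start = random.Random(seed).randint(1, interval)
    return [start + i * interval for i in range(sample_size)]

# N = 1200, n = 60, random start 8: units 8, 28, 48, ... are selected.
sample = systematic_sample(1200, 60, start=8)
print(sample[:3], sample[-1])  # [8, 28, 48] 1188
```

Only the starting point is random; every later selection is determined by it, which is why the ordering of the list matters.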

Systematic sampling
Example: systematic sampling
[Figure: grid of consecutively numbered units (1, 2, 3, ..., 55, ...) illustrating systematic selection]

Stratified sampling
• Principle:
– Classify population into internally homogeneous subgroups (strata)
– Draw sample in each stratum
– Combine results of all strata

Example: Stratified sampling
• Determine vaccination coverage in a country
• One sample drawn in each region
• Estimates calculated for each stratum
• Each stratum weighted to obtain estimate for country (weighted average)
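The weighting step can be sketched as a population-weighted average of the stratum estimates. The region names, population sizes and coverage figures below are invented purely for illustration, and weighting by population share is one common choice assumed here; the slides do not specify the weights.

```python
def stratified_estimate(strata):
    """Combine stratum estimates into one overall estimate.

    Each stratum is weighted by its share of the total population, so
    larger regions contribute more to the country-level figure.
    """
    total = sum(s["population"] for s in strata)
    return sum(s["population"] / total * s["estimate"] for s in strata)

# Hypothetical regions: population size and estimated vaccination coverage.
regions = [
    {"name": "North", "population": 500_000, "estimate": 0.80},
    {"name": "Centre", "population": 300_000, "estimate": 0.90},
    {"name": "South", "population": 200_000, "estimate": 0.60},
]

coverage = stratified_estimate(regions)
print(round(coverage, 2))  # 0.5*0.80 + 0.3*0.90 + 0.2*0.60 = 0.79
```

An unweighted average of the three estimates would give about 0.77 instead, which shows why the weighting matters when strata differ in size.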

Stratified sampling
• Advantages
– More precise if variable associated with strata
– All subgroups represented, allowing separate conclusions about each of them
• Disadvantages
– Sampling error difficult to measure
– Loss of precision if very small numbers sampled in individual strata

Cluster sampling
• Principle
– Random sample of groups (“clusters”) of units
– In selected clusters, all units or a proportion (sample) of units included
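A minimal sketch of the one-stage case, in which every unit of each selected cluster enters the sample. The cluster labels, the unit identifiers and the function name `cluster_sample` are hypothetical and serve only to illustrate the principle.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and include all of their units.

    clusters maps a cluster label to the list of units it contains.
    Returns the chosen labels and the combined list of sampled units.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for label in chosen for unit in clusters[label]]
    return chosen, units

# Five hypothetical sections with four units each.
sections = {
    f"Section {i}": [f"S{i}-unit{j}" for j in range(1, 5)]
    for i in range(1, 6)
}

chosen, units = cluster_sample(sections, 2, seed=1)
print(chosen, len(units))
```

Note that only a list of clusters is needed to draw the sample, not a list of every unit in the population.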

Example: Cluster sampling
[Figure: population divided into five sections (Section 1 to Section 5)]

Cluster sampling
• Advantages
– Simple, as a complete list of sampling units within the population is not required
– Less travel/resources required
• Disadvantages
– Imprecise if clusters are homogeneous, so that sample variation is greater than population variation (large design effect)
– Sampling error difficult to measure

Sample design
• The focus of the design for a sample must be on the magnitude of the standard errors of sampling rather than on an arbitrary percentage of the target population.
• The standard errors are used to calculate confidence intervals around the sample data.

Standard Error of Mean
1. Standard deviation of all possible sample means, $\bar{X}$
– Measures scatter in all sample means, $\bar{X}$
2. Less than population standard deviation
3. Formula (sampling with replacement):

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
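The formula is easy to evaluate directly. The sketch below uses σ = 10 with n = 4, 16 and 30, the values that appear in the later slides on sampling from normal and non-normal populations; the function name `standard_error` is an assumption made for this illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean, sigma / sqrt(n), for sampling with replacement."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)

# Population standard deviation 10; larger samples give smaller standard errors.
for n in (4, 16, 30):
    print(n, round(standard_error(10, n), 2))
# 4 5.0
# 16 2.5
# 30 1.83
```

Quadrupling the sample size from 4 to 16 halves the standard error, since the standard error shrinks with the square root of n.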

Properties of Sampling Distribution of Mean
1. Unbiasedness: mean of sampling distribution equals population mean
2. Efficiency: sample mean comes closer to population mean than any other unbiased estimator
3. Consistency: as sample size increases, variation of sample mean from population mean decreases

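Unbiasedness
[Figure: two sampling distributions, $P(\bar{X})$ plotted against $\bar{X}$, with $\mu$ marked on the axis; curve A is labelled Unbiased and curve C is labelled Biased]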

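Efficiency
[Figure: $P(\bar{X})$ plotted against $\bar{X}$, with $\mu$ marked on the axis; curve B is the sampling distribution of the mean and curve A is the sampling distribution of the median]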
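The efficiency property can be checked with a small simulation that compares how much the sample mean and the sample median vary from sample to sample. This is only a sketch: the normal population with mean 50 and standard deviation 10, the sample size of 25, the number of trials and the function name `estimator_spread` are all assumptions chosen for illustration.

```python
import random
import statistics

def estimator_spread(n, trials, mu=50.0, sigma=10.0, seed=0):
    """Return the standard deviations of the sample mean and sample median.

    Repeatedly draws samples of size n from a normal population and
    records both estimators, so their scatter can be compared.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pstdev(means), statistics.pstdev(medians)

sd_mean, sd_median = estimator_spread(n=25, trials=4000)
print(round(sd_mean, 2), round(sd_median, 2))
```

For a normal population the spread of the sample means should be close to 10/√25 = 2 and noticeably smaller than the spread of the sample medians.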

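Consistency
[Figure: $P(\bar{X})$ plotted against $\bar{X}$, with $\mu$ marked on the axis; curve B is for the larger sample size and curve A is for the smaller sample size]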

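Sampling from Normal Populations
• Central tendency: $\mu_{\bar{X}} = \mu$
• Dispersion (sampling with replacement): $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

[Figure: population distribution with $\mu = 50$ and $\sigma = 10$; sampling distributions of $\bar{X}$ centered at $\mu_{\bar{X}} = 50$, with $\sigma_{\bar{X}} = 5$ for $n = 4$ and $\sigma_{\bar{X}} = 2.5$ for $n = 16$]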

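Standardizing Sampling Distribution of Mean

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

[Figure: the sampling distribution of $\bar{X}$, with mean $\mu_{\bar{X}}$ and standard deviation $\sigma_{\bar{X}}$, mapped to the standardized normal distribution of $Z$, with $\mu = 0$ and $\sigma = 1$]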

Thinking Challenge
• You’re an operations analyst for AT&T. Long-distance telephone calls are normally distributed with $\mu = 8$ min. and $\sigma = 2$ min. If you select random samples of 25 calls, what percentage of the sample means would be between 7.8 and 8.2 minutes?

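Sampling Distribution Solution*

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{7.8 - 8}{2 / \sqrt{25}} = -.50$$

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{8.2 - 8}{2 / \sqrt{25}} = .50$$

[Figure: sampling distribution of $\bar{X}$ with $\sigma_{\bar{X}} = .4$, shaded between 7.8 and 8.2 around 8; standardized normal distribution with $\sigma = 1$, shaded between $-.50$ and $.50$ around 0; shaded area .1915 + .1915 = .3830]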
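The same calculation can be reproduced without a normal table. The sketch below builds the standard normal CDF from `math.erf` in the Python standard library; the helper names `normal_cdf` and `prob_sample_mean_between` are hypothetical and used only for this illustration.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_sample_mean_between(low, high, mu, sigma, n):
    """P(low < sample mean < high) when the sample mean is normally distributed."""
    se = sigma / math.sqrt(n)    # standard error of the mean
    z_low = (low - mu) / se      # standardize the lower bound
    z_high = (high - mu) / se    # standardize the upper bound
    return normal_cdf(z_high) - normal_cdf(z_low)

# Calls: mu = 8 min, sigma = 2 min, samples of n = 25 calls.
p = prob_sample_mean_between(7.8, 8.2, mu=8, sigma=2, n=25)
print(round(p, 4))  # about 0.3829
```

The result agrees with the table value .3830 up to rounding, so roughly 38% of sample means fall between 7.8 and 8.2 minutes.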

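Sampling from Non-Normal Populations
• Central tendency: $\mu_{\bar{X}} = \mu$
• Dispersion (sampling with replacement): $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

[Figure: non-normal population distribution with $\mu = 50$ and $\sigma = 10$; sampling distributions of $\bar{X}$ centered at $\mu_{\bar{X}} = 50$, with $\sigma_{\bar{X}} = 5$ for $n = 4$ and $\sigma_{\bar{X}} = 1.8$ for $n = 30$]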
Central Limit Theorem
As sample size gets large enough (n ≥ 30), the sampling distribution of $\bar{X}$ becomes almost normal, with

$$\mu_{\bar{X}} = \mu, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
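A short simulation gives a feel for the theorem. The sketch below draws samples of size 30 from an exponential population, which is strongly skewed, and inspects the resulting sample means. The choice of an exponential population with mean 1 and standard deviation 1, the number of trials and the function name `simulate_sample_means` are assumptions made for illustration only.

```python
import math
import random
import statistics

def simulate_sample_means(n, trials, seed=0):
    """Sample means of size-n samples from an exponential population.

    The exponential distribution with rate 1 has mean 1 and standard
    deviation 1, and is far from normal.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

n = 30
means = simulate_sample_means(n, trials=5000)
center = statistics.fmean(means)   # should be close to mu = 1
spread = statistics.pstdev(means)  # should be close to sigma / sqrt(n)
expected_spread = 1.0 / math.sqrt(n)

# Share of sample means within 1.96 standard errors of mu;
# for a normal distribution this share would be about 95%.
within = sum(abs(m - 1.0) < 1.96 * expected_spread for m in means) / len(means)
print(round(center, 3), round(spread, 3), round(expected_spread, 3), round(within, 3))
```

Even though individual observations are skewed, the sample means cluster around the population mean with a spread close to σ/√n.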
