
STATISTICS: basically analysis of standard errors and different kinds of sampling designs, etc.

Mar 19, 2008

© Attribution Non-Commercial (BY-NC)


Parameters and Statistics

Types of Sampling

Simple Random, Stratified, Systematic and Cluster sampling, sampling distributions

Standard errors

Sampling from normal and non-normal populations

Central Limit Theorem

Finite population Multiplier

Population

Any well-defined set (group) of objects about which a statistical enquiry is being made is called a population or universe.

The number of objects in a population is known as the size of the population, which may be finite or infinite.

For example,

All members of the cultural society of your city.

All students of mathematics of Ithaca college.

All Americans who saw 'TITANIC' last year.

Heights of all students of your school.

Weights of all citizens of the city of Lucknow above 20 years of age.

Mileages of Dunlop automobile tyres, etc.

Sample

Any part (subset) of a population is called a sample.

Many everyday conclusions are based on samples taken, though we are not aware of it. For example, having met Jackson yesterday for the first time, for an hour or two, I concluded that "Jackson is crazy", which may be wrong.

We take just a little rice from a gunny bag, judge its quality, and then purchase the whole bag.

If we want to taste milk, we just take a glassful of milk from the can and taste it.

Selecting a sample is easy when the population is uniform or homogeneous. When the population is heterogeneous (not uniform), the selection of a sample is not very easy.

Inference Process

[Diagram: a sample is drawn from the population; the sample statistic (X̄) is computed from the sample; estimates and tests based on the statistic lead back to the population.]

Population and Samples,

Why Sampling

Definition of sampling

Procedure by which some members of a given population are selected as representatives of the entire population

Problems in Sampling?

• What issues are you aware of?

• What questions do you have?

Why do we use samples?

Get information from large populations

– At minimal cost

– At maximum speed

– At increased accuracy

– Using enhanced tools

Sampling

Precision

Cost

What we need to know

• Concepts

– Representativeness

– Sampling methods

– Choice of the right design

Key Sampling Concepts

Sampling and representativeness

[Diagram labels: target population, sampling population, sample.]

Study on prevalence of gynecological infection in women in Bangalore

[Diagram labels: female population; women of 4 city wards.]

PARAMETERS AND

STATISTICS

• A parameter is a numerical quantity that describes some characteristic of a population. Parameters are often estimated since their value is generally unknown, especially when the population is large enough that it is impossible or impractical to obtain measurements for all observations. Parameters are normally represented by Greek letters. The most common parameters are the population mean and variance, denoted by µ and σ², respectively.

• A statistic is a numerical quantity calculated from the observations in a sample. Statistics are usually represented by lowercase English letters with other symbols. The sample mean and variance, two of the most common statistics derived from samples, are denoted by the symbols x̄ and s², respectively.
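The distinction between parameters and statistics can be sketched in a few lines of Python. This is only an illustration: the population of heights is simulated, and the names `population`, `mu`, `sigma2`, `x_bar` and `s2` are assumptions chosen for this example, not part of the slides.

```python
import random
import statistics

# Hypothetical population: heights (cm) of 1,000 students, simulated for illustration.
random.seed(0)
population = [random.gauss(165, 8) for _ in range(1000)]

# Parameters describe the whole population (mu, sigma^2).
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population)  # divides by N

# Statistics are computed from a sample (x-bar, s^2).
sample = random.sample(population, 50)
x_bar = statistics.fmean(sample)
s2 = statistics.variance(sample)  # divides by n - 1

print(f"parameters: mu={mu:.2f}, sigma^2={sigma2:.2f}")
print(f"statistics: x_bar={x_bar:.2f}, s^2={s2:.2f}")
```

In practice the parameters are unknown and only the sample statistics are available; here both are computed so they can be compared.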

Population and Sample

Population: use parameters to summarize features

Sample: use statistics to summarize features

Sampling Methods

• Stratified Random Sampling

• Cluster Sampling

• Systematic Sampling

• Convenience Sampling

• Judgment Sampling

Stratified Random Sampling

• The population is first divided into groups of

elements called strata.

• Each element in the population belongs to one

and only one stratum.

• Best results are obtained when the elements

within each stratum are as much alike as

possible (i.e. homogeneous group).

• A simple random sample is taken from each

stratum.

• Formulas are available for combining the

stratum sample results into one population

parameter estimate.
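One commonly used set of formulas for combining the stratum results is shown below as a sketch. The notation is an assumption introduced here: H strata, stratum population sizes N_h with total N, stratum sample sizes n_h, stratum sample means \bar{x}_h and stratum sample variances s_h^2. The standard error shown ignores the finite population correction.

```latex
W_h = \frac{N_h}{N}, \qquad
\bar{x}_{st} = \sum_{h=1}^{H} W_h \, \bar{x}_h, \qquad
\widehat{SE}(\bar{x}_{st}) = \sqrt{\sum_{h=1}^{H} W_h^2 \, \frac{s_h^2}{n_h}}
```

Each stratum mean is weighted by the share of the population that the stratum represents, so a large stratum influences the estimate more than a small one.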

Stratified Random Sampling

• Advantage: If strata are

homogeneous, this method is as

“precise” as simple random sampling

but with a smaller total sample size.

• Example: The basis for forming the

strata might be department,

location, age, industry type, etc.

Cluster Sampling

• The population is first divided into separate

groups of elements called clusters.

• Ideally, each cluster is a representative small-

scale version of the population (i.e.

heterogeneous group).

• A simple random sample of the clusters is then

taken.

• All elements within each sampled (chosen) cluster

form the sample.


Cluster Sampling

• Advantage: The close proximity of elements can be cost effective (i.e. many sample observations can be obtained in a short time).

• Disadvantage: This method generally requires a

larger total sample size than simple or stratified

random sampling.

• Example: A primary application is area

sampling, where clusters are city blocks or other

well-defined areas.
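The cluster sampling steps described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical set of ten city blocks with five households each; the function name `cluster_sample` and the data are invented for the example.

```python
import random

def cluster_sample(clusters, k, seed=None):
    """Select k whole clusters at random and return every element in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)  # simple random sample of cluster names
    return chosen, [unit for name in chosen for unit in clusters[name]]

# Hypothetical city blocks, each listing its households.
blocks = {f"block_{b}": [f"b{b}_house_{h}" for h in range(1, 6)] for b in range(1, 11)}

chosen_blocks, households = cluster_sample(blocks, k=3, seed=1)
print(chosen_blocks)
print(len(households))  # 3 blocks x 5 households each
```

Only the list of clusters is needed to draw the sample; the units inside a cluster are listed only for the clusters that were chosen.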

Systematic Sampling

• If a sample size of n is desired from a population containing N elements, we might sample one element for every N/n elements in the population.

• We randomly select one of the first N/n elements from the population list.

• We then select every (N/n)th element that follows in the population list.

• This method has the properties of a

simple random sample, especially if the

list of the population elements is a

random ordering.

Systematic Sampling

• Advantage: The sample usually will

be easier to identify than it would be

if simple random sampling were

used.

• Example: Selecting every 100th

listing in a telephone book after the

first randomly selected listing.

Convenience Sampling

• It is a nonprobability sampling technique.

Items are included in the sample without

known probabilities of being selected.

• The sample is identified primarily by

convenience.

• Advantage: Sample selection and data

collection are relatively easy.

• Disadvantage: It is impossible to determine

how representative of the population the

sample is.

• Example: A professor conducting research

might use student volunteers to constitute a

sample.

Judgment Sampling

• The person most knowledgeable on the

subject of the study selects elements of the

population that he or she feels are most

representative of the population.

• It is a nonprobability sampling technique.

• Advantage: It is a relatively easy way of

selecting a sample.

• Disadvantage: The quality of the sample

results depends on the judgment of the person

selecting the sample.

• Example: A reporter might sample three or

four senators, judging them as reflecting the

general opinion of the senate.

Simple random sampling

• Principle

– Equal chance of

drawing each unit

• Procedure

– Number all units

– Randomly draw

units

Simple random sampling

Example: evaluate the prevalence of

tooth decay among the 1200

children attending a school

• Children numbered from 1 to 1200

• Sample size = 100 children

• Random sampling of 100 numbers

between 1 and 1200
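With software, the draw in this example can be sketched as below, assuming Python's standard `random` module stands in for a random number table. The seed value is arbitrary and only makes the run repeatable.

```python
import random

N = 1200  # children, numbered 1..N
n = 100   # desired sample size

rng = random.Random(2008)  # arbitrary seed, for repeatability only
selected = sorted(rng.sample(range(1, N + 1), n))  # sampling without replacement

print(selected[:10])
print(len(selected))
```

`random.sample` draws without replacement, so no child can be selected twice and every child has the same chance of selection.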

How to randomly select?

Simple random sampling

Table of random numbers

57172 42088 70098 11333 26902 29959 43909 49607

33883 87680 28923 15659 09839 45817 89405 70743

77950 67344 10609 87119 15859 74577 42791 75889

11607 11596 01796 24498 17009 67119 00614 49529

56149 55678 38169 47228 49931 94303 67448 31286

80719 65101 77729 83949 83358 75230 56624 27549

93809 19505 82000 79068 45552 86776 48980 56684

40950 86216 48161 17646 24164 35513 94057 51834

12182 59744 65695 83710 41125 14291 74773 66391

13382 48076 73151 48724 35670 38453 63154 58116

38629 94576 48859 75654 17152 66516 78796 73099

60728 32063 12431 23898 23683 10853 04038 75246

01881 99056 46747 08846 01331 88163 74462 14551

23094 29831 95387 23917 07421 97869 88092 72201

15243 21100 48125 05243 16181 39641 36970 99522

53501 58431 68149 25405 23463 49168 02048 31522

07698 24181 01161 01527 17046 31460 91507 16050

22921 25930 79579 43488 13211 71120 91715 49881

68127 00501 37484 99278 28751 80855 02035 10910

55309 10713 36439 65660 72554 77021 46279 22705

92034 90892 69853 06175 61221 76825 18239 47687

50612 84077 41387 54107 09190 74305 68196 75634

81415 98504 32168 17822 49946 37545 47201 85224

38461 44528 30953 08633 08049 68698 08759 45611

07556 24587 88753 71626 64864 54986 38964 83534

60557 50031 75829 05622 30237 77795 41870 26300
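One common way to use such a table, sketched here as an assumption rather than as the slides' own procedure, is to read the digits in consecutive groups of four and keep each value that falls between 1 and 1200 and has not been picked already. The helper `numbers_from_table` is hypothetical; the two rows are the first two rows of the table above.

```python
def numbers_from_table(rows, width, low, high):
    """Read consecutive `width`-digit groups; keep unseen values in [low, high]."""
    digits = "".join(rows).replace(" ", "")
    picked = []
    for i in range(0, len(digits) - width + 1, width):
        value = int(digits[i:i + width])
        if low <= value <= high and value not in picked:
            picked.append(value)
    return picked

first_two_rows = [
    "57172 42088 70098 11333 26902 29959 43909 49607",
    "33883 87680 28923 15659 09839 45817 89405 70743",
]
print(numbers_from_table(first_two_rows, width=4, low=1, high=1200))
# [981, 983, 743]
```

Most four-digit groups exceed 1200 and are discarded, so many rows are needed to reach 100 children; reading continues until the sample is complete.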

Systematic sampling

• N = 1200, and n = 60

⇒ sampling interval = 1200/60 = 20 (sampling fraction = 60/1200 = 1/20)

• List persons from 1 to 1200

• Randomly select a number between 1 and 20 (e.g. 8)

⇒ 1st person selected = the 8th on the list

⇒ 2nd person = 8 + 20 = the 28th, etc.
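The selection in this example can be sketched in Python as follows. The function name `systematic_sample` is an assumption for illustration, and the sketch assumes N is an exact multiple of n.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return list positions chosen by taking every k-th unit, with k = N // n."""
    k = N // n  # sampling interval; assumes N is a multiple of n
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in 1..k
    return list(range(start, N + 1, k))

sample = systematic_sample(N=1200, n=60, start=8)
print(sample[:3], len(sample))  # [8, 28, 48] 60
```

Passing `start=8` reproduces the slide's example; leaving `start` unset draws the starting point at random between 1 and 20.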

Example: systematic sampling

[Figure: grid of the listed persons, numbered 1, 2, 3, ... in rows of 15.]

Stratified sampling

• Principle:

–Divide the population into internally homogeneous subgroups (strata)

–Draw sample in each stratum

–Combine results of all strata

Example: Stratified

sampling

• Determine vaccination coverage in

a country

• One sample drawn in each region

• Estimates calculated for each

stratum

• Each stratum weighted to obtain

estimate for country (average)
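The weighting step in this example can be sketched as below. The three regions, their population sizes and their sample coverage figures are invented for illustration, and weighting by population share is one common choice, assumed here.

```python
# Hypothetical regions: population size and vaccination coverage observed in each sample.
strata = {
    "north": {"population": 500_000, "sample_coverage": 0.80},
    "south": {"population": 300_000, "sample_coverage": 0.60},
    "east":  {"population": 200_000, "sample_coverage": 0.90},
}

def weighted_estimate(strata):
    """Weight each stratum estimate by its share of the total population."""
    total = sum(s["population"] for s in strata.values())
    return sum(s["population"] / total * s["sample_coverage"] for s in strata.values())

national = weighted_estimate(strata)
print(f"{national:.2f}")  # 0.76
```

A plain average of the three regional figures would give about 0.77 instead; the weighted figure differs because the regions are not the same size.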

Stratified sampling

• Advantages

– More precise if variable associated

with strata

– All subgroups represented,

allowing separate conclusions

about each of them

• Disadvantages

– Sampling error difficult to measure

– Loss of precision if very small

numbers sampled in individual

strata

Cluster sampling

• Principle

–Random sample of groups (“clusters”) of units

–In selected clusters, all units or proportion (sample) of units included

Example: Cluster sampling

[Figure: an area divided into five clusters, labeled Section 1 to Section 5.]

Cluster sampling

• Advantages

– Simple as complete list of sampling

units within population not required

– Less travel/resources required

• Disadvantages

– Imprecise if clusters homogeneous

and therefore sample variation

greater than population variation

(large design effect)

– Sampling error difficult to measure

Sample design

• The focus of the design for a sample must be on the magnitude of the standard errors of sampling rather than on an arbitrary percentage of the target population.

• The standard errors are used to

calculate confidence intervals around

the sample data.

Standard Error of Mean

– Measures scatter in all sample means, $\bar{X}$

$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
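The standard error formula, σ divided by the square root of n, can be checked with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with illustrative values µ = 50 and σ = 10, and the helper names `standard_error` and `simulated_se` are invented for this example.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def simulated_se(mu, sigma, n, replicates=2000, seed=0):
    """Standard deviation of many sample means drawn from a normal population."""
    rng = random.Random(seed)
    means = [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
             for _ in range(replicates)]
    return statistics.stdev(means)

print(standard_error(10, 4), standard_error(10, 16))  # 5.0 2.5
print(round(simulated_se(50, 10, 4), 2))  # close to 5
```

Quadrupling the sample size from 4 to 16 halves the standard error, which is why precision improves more slowly than sample size grows.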

Properties of Sampling Distribution of Mean

• 1. Unbiasedness

– Mean of Sampling Distribution Equals

Population Mean

• 2. Efficiency

– Sample Mean Comes Closer to Population

Mean Than Any Other Unbiased Estimator

• 3. Consistency

– As Sample Size Increases, Variation of

Sample Mean from Population Mean

Decreases

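Unbiasedness

[Figure: sampling distributions P(X̄) plotted against X̄. Curve A, labeled unbiased, is centered on µ; curve C, labeled biased, is not.]

Efficiency

[Figure: P(X̄) plotted against X̄ around µ. Curve B is the sampling distribution of the mean; curve A is the sampling distribution of the median.]

Consistency

[Figure: P(X̄) plotted against X̄ around µ. Curve B is for the larger sample size; curve A is for the smaller sample size.]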

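Sampling from Normal Populations

• Central Tendency: $\mu_{\bar{X}} = \mu$

• Dispersion: $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$

– Sampling with replacement

[Figure: population distribution with µ = 50 and σ = 10; sampling distributions of $\bar{X}$ with $\mu_{\bar{X}} = 50$, where $\sigma_{\bar{X}} = 5$ for n = 4 and $\sigma_{\bar{X}} = 2.5$ for n = 16.]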

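Standardizing Sampling Distribution of Mean

$Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

[Figure: the sampling distribution of $\bar{X}$ (mean $\mu_{\bar{X}}$, standard deviation $\sigma_{\bar{X}}$) is transformed into the standardized normal distribution of Z (µ = 0, σ = 1).]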

Thinking Challenge

• You’re an operations analyst for AT&T. Long-distance telephone calls are normally distributed with µ = 8 min. and σ = 2 min. If you select random samples of 25 calls, what percentage of the sample means would be between 7.8 and 8.2 minutes?


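Sampling Distribution Solution*

$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{2}{\sqrt{25}} = 0.4$

$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{7.8 - 8}{2 / \sqrt{25}} = -0.50$

$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{8.2 - 8}{2 / \sqrt{25}} = 0.50$

[Figure: the sampling distribution ($\sigma_{\bar{X}} = 0.4$) is mapped to the standardized normal distribution (σ = 1); the area between Z = −0.50 and Z = 0.50 is .1915 + .1915 = .3830, i.e. about 38.3% of the sample means.]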
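The same answer can be obtained without a normal table. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` is available; the variable names are chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.0, 2.0, 25
se = sigma / sqrt(n)  # standard error of the mean, 0.4

# Sampling distribution of the mean: normal with mean mu and standard deviation se.
sampling_dist = NormalDist(mu, se)
p = sampling_dist.cdf(8.2) - sampling_dist.cdf(7.8)

print(f"{p:.4f}")  # about 0.383
```

The small difference from the table value .3830 comes from rounding each half of the area to four decimals before adding.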

Sampling from Non-Normal Populations

• Central Tendency: $\mu_{\bar{X}} = \mu$

• Dispersion: $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$

– Sampling with replacement

[Figure: population distribution with µ = 50 and σ = 10; sampling distributions of $\bar{X}$ with $\mu_{\bar{X}} = 50$, where $\sigma_{\bar{X}} = 5$ for n = 4 and $\sigma_{\bar{X}} \approx 1.8$ for n = 30.]

Central Limit Theorem

As sample size gets large enough (n ≥ 30), the sampling distribution of $\bar{X}$ becomes almost normal, with

$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
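The Central Limit Theorem can be illustrated with a short simulation. This is only a sketch: the exponential population (mean 1, standard deviation 1) is an assumption chosen because it is strongly skewed, and the helper `sample_means` is invented for the example.

```python
import random
import statistics

def sample_means(draw, n, replicates, rng):
    """Means of `replicates` samples of size n, each value drawn with `draw(rng)`."""
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(replicates)]

rng = random.Random(42)

# Exponential population with rate 1: mean = 1, standard deviation = 1, strongly skewed.
def draw(r):
    return r.expovariate(1.0)

means = sample_means(draw, n=30, replicates=5000, rng=rng)
print(round(statistics.fmean(means), 2))  # near mu = 1
print(round(statistics.stdev(means), 2))  # near sigma / sqrt(30), about 0.18
```

A histogram of `means` would look close to a bell curve even though the individual values come from a skewed population.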
