Why sampling:

- Time available to take a decision
- Cost of gathering data
- Reasonable accuracy of information
- Destructive testing

Extensively used in market research (MR), quality control (QC), and economic, biological & pharmaceutical studies

Sampling Errors

[Figure: sampling error and non-sampling error against sample size, up to a census]

**Sampling errors:** variation in the mean and standard deviation of the survey sample against the population

**Non-sampling errors:**

- Inaccurate reporting by respondents
- Poor sampling design
- Misinterpretation of questions
- Respondents lying
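A small simulation can make sampling error concrete. The sketch below assumes a hypothetical population of 10,000 ratings on a 1-7 scale (not from the slides) and measures how much the sample mean varies across repeated simple random samples of two sizes.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, repeats, rng):
    """Standard deviation of the sample mean across repeated simple random samples."""
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.pstdev(means)

rng = random.Random(42)  # fixed seed so the run is repeatable
# ASSUMPTION: hypothetical population of 10,000 ratings on a 1-7 scale.
population = [rng.randint(1, 7) for _ in range(10_000)]

small = spread_of_sample_means(population, 25, 500, rng)
large = spread_of_sample_means(population, 400, 500, rng)
print(f"spread of sample means, n=25:  {small:.3f}")
print(f"spread of sample means, n=400: {large:.3f}")
```

The larger sample should show a visibly smaller spread. Non-sampling errors such as misreporting are not modelled here and do not shrink with sample size.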

Sampling Process

**Defining population to be sampled**

- Element: Unit about which information is collected (consumer, company, dealer, household)
- Population: Aggregation of elements; the relevant segment
- Sampling unit: Elements available for selection at some stage of the sampling process
- Survey population: Aggregation of elements from which the actual survey sample is chosen

Remaining steps:

- Defining frame (boundaries): Subset of the population; geographical, or within some published or available data such as Rotary and similar clubs
- Method of selecting sample units
- Decide on size of sample
- Identifying & selecting actual members of sample

**Determining Sample Size**

e = tolerable (acceptable) error

**Confidence level:**

- 90% = 1.645σ (1.645 standard deviations)
- 95% = 1.96σ
- 99% = 2.58σ

**Approximate estimate of standard deviation:**

s = (maximum - minimum)/6
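The confidence multipliers and the range rule can be checked with Python's standard library. This is a minimal sketch; the function names are illustrative, and it assumes a two-sided interval on a normal distribution.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided z multiplier for a confidence level such as 0.95."""
    return NormalDist().inv_cdf((1 + level) / 2)

def sd_from_range(minimum, maximum):
    """Rule-of-thumb standard deviation: range divided by 6."""
    return (maximum - minimum) / 6

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_for_confidence(level):.3f}")

print(sd_from_range(1, 7))  # a 1-7 scale
```

The 99% multiplier comes out at about 2.576, which is commonly rounded to 2.58.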

**Sampling distribution of sample means**

[Figure: normal curve marked at 1σ, 2σ and 3σ on either side of the mean; ±1σ = 68.27%, ±2σ = 95.45%, ±3σ = 99.73%]
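The three coverage figures follow from the normal distribution and can be reproduced as below. This is a sketch using the standard normal curve; `coverage` is an illustrative name.

```python
from statistics import NormalDist

def coverage(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    standard_normal = NormalDist()
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage(k):.2%}")
```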

**For continuous or interval-scaled variables, e.g. scales of 1-5, 1-7, 1-10**

$n = \left(\dfrac{z \cdot s}{e}\right)^2$

where:

- z = z-value for the desired confidence level: 1.645 for 90%, 1.96 for 95%, 2.58 for 99%
- s = standard deviation, approximated by (max - min)/6
- e = tolerable error in estimating the variable
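As a worked check of the formula, here is a minimal sketch for a 1 to 7 scale. Rounding up to a whole respondent is an assumption of this sketch, not something the slides state.

```python
import math

def sample_size_mean(z, s, e):
    """n = ((z * s) / e) ** 2, rounded up to a whole respondent."""
    return math.ceil(((z * s) / e) ** 2)

s = (7 - 1) / 6                          # range rule on a 1-7 scale gives s = 1
print(sample_size_mean(1.96, s, 0.10))   # 95% confidence, error 0.10 -> 385
```

Tighter error levels grow the sample quickly, because e enters the formula squared.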

Sample Sizes

Interval-scaled variable: 1 to 7

| Confidence level | e = 0.10 | e = 0.05 | e = 0.02 | e = 0.01 |
|---|---|---|---|---|
| 90% | 270 | 1080 | 6800 | 27000 |
| 95% | 385 | 1540 | 9600 | 38500 |
| 99% | 670 | 2680 | 16600 | 67000 |

**When estimating proportions**

$n = p \cdot q \cdot \left(\dfrac{z}{e}\right)^2$

where:

- p = frequency of occurrence expressed as a proportion, e.g. 1 in 4 = 0.25, 1 in 10 = 0.10. It represents things like market share, or the proportion of the target market with respect to variables like age, gender, profession etc. p is always less than 1
- q = 1 - p
- z = confidence level factor
- e = tolerable error expressed as %/100, e.g. 3% error = 0.03, 5% error = 0.05

Sample size is maximum at p = 0.50 for a given z and e, since p·q peaks at 0.25 there
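The proportion formula can be sketched as follows, again rounding up as an assumption. The loop shows the required size peaking at p = 0.50.

```python
import math

def sample_size_proportion(p, z, e):
    """n = p * q * (z / e) ** 2 with q = 1 - p, rounded up."""
    q = 1 - p
    return math.ceil(p * q * (z / e) ** 2)

for p in (0.10, 0.25, 0.50, 0.75):
    print(p, sample_size_proportion(p, z=1.96, e=0.05))
```

When p is unknown before the study, p = 0.50 gives the most conservative (largest) sample size.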

Sample Sizes

Proportions: error level e (in %) at various frequencies of occurrence (FOQ)

Sample size: 50, 100, 500, 1000, 5000

**Confidence level 90%:** FOQ 10, FOQ 20, FOQ 30

**Confidence level 95%:** FOQ 10, FOQ 20, FOQ 30

**Confidence level 99%:** FOQ 10, FOQ 20, FOQ 30

7.8 10.7 12.3 6.2 4.7 7.2 1.8 1.4 2.1 1.0 0.7 0.4 0.6 0.6

8.5 11.4 13.0 8.0 6.0 9.2 2.7 4.1 1.9 3.6 2.6

9.0 12.0 13.7 9.8 7.3 11.2 5.4 4.0 6.1 2.8 4.3 1.3 3.8 1.7

2.9 0.85 1.1

**Cell size analysis:**

- Sample size should be more than 10 times the number of cells
- Cell: one combination of the categories used to break down the market; say age = 4 groups and income category = 4 groups, giving 4 × 4 = 16 cells, so sample size > 4 × 4 × 10 = 160
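The cell-size rule is a simple product. A minimal sketch follows; the function name and the `per_cell` parameter are illustrative.

```python
import math

def minimum_sample_for_cells(groups_per_variable, per_cell=10):
    """Number of cells (product of group counts) times the per-cell minimum."""
    return math.prod(groups_per_variable) * per_cell

print(minimum_sample_for_cells([4, 4]))  # 4 age groups x 4 income groups -> 160
```

The rule asks for more than this figure, so 160 is the threshold to exceed rather than the target.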

- In multiple questions with varying interval-scaled variables, set the sample size for the major variable
- If wider geographical coverage is required, insist on a minimum sample size at each centre (if the sample size obtained from the formula becomes small)
- Time and budget constraints

Probability Sampling

Simple Random:

- Picking out of the lot at random
- Possible for smaller populations

Systematic Sampling:

- Subdivide the population by sample size and choose one at random from each unit
- Example: to select 100 out of 2000 at random, number of units = 2000/100 = 20
- For every 20 choose 1; or, if in the first unit we picked 6, keep adding 20: 26, 46, 66 etc.
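Both selection methods can be sketched in a few lines. The population here is a hypothetical list numbered 1 to 2000. The systematic version uses the random-start-then-fixed-interval variant and assumes the population size is a multiple of the sample size.

```python
import random

def simple_random_sample(population, n, rng):
    """Every element has the same chance of being picked."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random start in the first unit, then take every k-th element."""
    k = len(population) // n      # unit size, e.g. 2000 // 100 = 20
    start = rng.randrange(k)      # 0-based position within the first unit
    return [population[start + i * k] for i in range(n)]

rng = random.Random(7)
population = list(range(1, 2001))  # ASSUMPTION: hypothetical list numbered 1..2000
picked = systematic_sample(population, 100, rng)
print(picked[:4])
```

Systematic selection needs only one random draw, but it can mislead if the list has a repeating pattern that matches the interval.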

**Stratified Random Sampling:**

**Proportionate: Dividing into segments based on:**

- % of each segment ($w_i$)
- Standard deviation of each segment ($s_i$)

$n = \left(\dfrac{z}{e}\right)^2 \sum_i w_i s_i^2$

**Example: z = 1.96 for confidence level 95%; e = 0.05 error; segment distribution:**

| Segment | $w_i$ | $s_i$ | Sample size |
|---|---|---|---|
| < 25 years | 0.3 | 1.2 | 666 |
| 26 to 40 years | 0.3 | 0.9 | 375 |
| > 40 years | 0.4 | 0.7 | 300 |

**Total sample size = 1341 approx**
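The example can be re-run as a sketch. Evaluating the formula directly gives a total of about 1338, close to the approximate 1341 quoted above. The split across segments in proportion to w_i · s_i² is an assumption inferred from the quoted segment sizes, not a rule the slides state, and it lands near those figures without matching them exactly.

```python
def stratified_sample_size(z, e, weights, std_devs):
    """n = (z / e) ** 2 * sum(w_i * s_i ** 2)."""
    return (z / e) ** 2 * sum(w * s ** 2 for w, s in zip(weights, std_devs))

def split_by_variance_share(n, weights, std_devs):
    """ASSUMPTION: each segment receives n in proportion to w_i * s_i ** 2."""
    shares = [w * s ** 2 for w, s in zip(weights, std_devs)]
    total = sum(shares)
    return [n * share / total for share in shares]

weights = [0.3, 0.3, 0.4]    # under 25, 26 to 40, over 40
std_devs = [1.2, 0.9, 0.7]
n = stratified_sample_size(1.96, 0.05, weights, std_devs)
parts = split_by_variance_share(n, weights, std_devs)
print(round(n), [round(x) for x in parts])
```

The youngest segment gets the largest share despite its weight of 0.3, because its standard deviation is the highest.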

**Disproportionate: Used in special cases to balance:**

- Desired z & e
- Degree of heterogeneity
- Relevance of various strata to the study

| Type of store | % of all food stores | % of retail food sales |
|---|---|---|
| Corporate chains | 8% | 26% |
| Co-operatives | 10% | 30% |
| Large independent | 12% | 16% |
| Medium independent | 30% | 19% |
| Small independent | 40% | 9% |
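One way to read the store figures above is to compare an allocation by store count with an allocation by sales share. The sketch below assumes a hypothetical total of 500 interviews, and also assumes that weighting by sales is the disproportionate scheme intended, which the slides do not confirm.

```python
# Percentages from the store figures above: (share of stores, share of sales).
STORE_TYPES = {
    "Corporate chains": (8, 26),
    "Co-operatives": (10, 30),
    "Large independent": (12, 16),
    "Medium independent": (30, 19),
    "Small independent": (40, 9),
}

def allocate(total, column):
    """Split `total` interviews by one percentage column (0 = stores, 1 = sales)."""
    return {name: total * pcts[column] // 100 for name, pcts in STORE_TYPES.items()}

by_stores = allocate(500, 0)  # proportionate to the number of stores
by_sales = allocate(500, 1)   # ASSUMPTION: disproportionate, weighted by sales
for name in STORE_TYPES:
    print(f"{name}: {by_stores[name]} vs {by_sales[name]}")
```

Weighting by sales moves interviews from the many small independents towards the few chains and co-operatives that account for most of the volume.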

**Cluster Sampling / Area Sampling:**

- In the first stage clusters are identified and selected
- Sample elements are then selected from these clusters
- Disadvantage: members of a cluster tend to behave similarly

**Multi-stage or Combination Sampling:**

- Combining cluster sampling and stratified sampling
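The two stages of cluster sampling can be sketched as below. The areas and household IDs are hypothetical, and picking both clusters and elements by simple random selection is an assumption of this sketch.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: pick clusters at random. Stage 2: pick elements within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

rng = random.Random(3)
# ASSUMPTION: 12 hypothetical areas, each holding 50 household IDs.
clusters = {f"area_{a}": [f"a{a}_h{h}" for h in range(50)] for a in range(12)}
sample = two_stage_cluster_sample(clusters, n_clusters=4, n_per_cluster=10, rng=rng)
for area, households in sample.items():
    print(area, households[:3])
```

Fieldwork stays within 4 areas instead of 12, at the cost of missing whatever differs in the areas not selected.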

**Selecting Sample Units**

**Non-probability samples**

- Convenience Sampling: on the basis of convenience or accessibility
- Snowball Sampling: further sample units are found through referrals from earlier sample units
- Judgement Sampling: based on the recommendation of experts, or on our own assessment of the spread of the population from previous studies or data
- Quota Control Sampling: the sample conforms to chosen parameters of the population
- Census: total population
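Quota control sampling can be sketched as accepting respondents in arrival order until each category is full. The respondents, categories and quotas below are hypothetical.

```python
def quota_sample(respondents, quotas, key):
    """Accept respondents in arrival order until each category's quota is full."""
    remaining = dict(quotas)
    accepted = []
    for person in respondents:
        category = key(person)
        if remaining.get(category, 0) > 0:
            accepted.append(person)
            remaining[category] -= 1
    return accepted

# ASSUMPTION: hypothetical walk-in respondents as (id, age group) pairs.
respondents = [(i, "under 25" if i % 3 else "over 40") for i in range(30)]
picked = quota_sample(respondents, {"under 25": 4, "over 40": 2}, key=lambda r: r[1])
print(picked)
```

The sample matches the chosen age split, but who fills each quota still depends on who turns up first, so this is not a probability sample.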

