# Sampling

Dr. Mary Wolfinbarger Marketing Research

Sample vs. Census

- Census: every population member is included
- With sampling, the researcher infers population characteristics from a sample

Why sample?

- Saves money
- Saves time
- A sample can be more accurate; it has fewer “nonsampling” errors than a census

Sampling terms

Population (or Universe): the complete set of elements having a given characteristic(s) of interest

An example of population definition:

- Americans
- Registered voters
- Voters
- Swing voters

(Which is more relevant to politicians?)

Sampling terms

Element: the unit about which information is sought

Most common units in marketing:

- individuals
- households
- Sudman and Blair suggest a conceptual sample: sales dollars or potential sales dollars

Sampling terms

Sample Frame: a list of population members

- A complete listing of the population may be available, but often the population and the sample frame are different
- Example: “American recreational tennis players?”
- Differences between the sample frame and the population are called “sample frame error”

Sampling terms

- Parameter: the actual characteristic of the population, the true value of which can only be known by taking an error-free census
- Statistic: the estimate of a characteristic obtained from the sample

Sampling terms

Non-response error: error created when chosen sample members do not participate

Non-response creates two problems:

- A larger initial sample size is needed to allow for non-response
- More seriously, non-respondents may differ from respondents (“questionnaire freaks?”)
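
The first problem is simple arithmetic: if only a fraction of those contacted take part, the initial sample has to be inflated to match. A minimal sketch, assuming a hypothetical expected response rate (the function name and the numbers are illustrative, not from the slides):

```python
import math

def initial_sample_size(completes_needed, response_rate):
    """People to contact so that, at the expected response rate,
    about `completes_needed` of them actually participate."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    # Round up: a fraction of a person cannot be contacted.
    return math.ceil(completes_needed / response_rate)

# Hypothetical numbers: 400 completed surveys wanted, 25% expected to respond.
print(initial_sample_size(400, 0.25))  # 1600
```

Inflating the sample does nothing for the second problem: if non-respondents differ from respondents, a bigger pile of respondents is still biased.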

Sample Types

Two broad categories:

- Probability: each population element has a known, non-zero chance of being included in the sample
- Non-probability: the probability of a population element being included in the sample cannot be estimated mathematically

Sample Types

- Statistician’s opinion: all non-probability (N-P) samples are worthless because you cannot estimate the degree to which your results are generalizable
- So, why are N-P samples ever used?

**Non-probability Samples**

- Convenience
- Judgment
- Quota

Convenience Samples

- “Accidental samples”: those in the sample simply happen to be where the data are being collected
- One major form in marketing: the “mall intercept”
- What do statisticians think? “Rarely do samples selected on a convenience sample basis, regardless of size, prove representative, and are not recommended for descriptive or causal research.”

Convenience Samples

I agree, but... Minimizing the drawbacks of convenience samples:

- compare sample characteristics and findings to those collected on a census/random sample basis
- speculate intelligently about bias, and how it is likely to have affected results

Convenience Samples

- When possible, collect the sample where your population is likely to be (retailers collecting in-store surveys)
- Cultivate diversity in the sample (e.g. mall intercept using multiple locations)
- May be better at understanding relationships between variables than at making descriptive estimates

Judgment Samples

- Also called purposive sampling
- Sample elements are hand-picked because they are believed to be representative of some population of interest
- Typically a small sample (maybe as small as 10) in which the researcher tries to represent all groups or segments from the population

Judgment Samples

- Snowball design: a special form of judgment sample
- Appropriate for small, specialized populations
- Each respondent is asked to identify one or more other population members

Judgment Samples

Drawbacks?

- Those with more ties to sample members are more likely to be selected
- Similar people are more likely to be named

Quota Sampling

- Attempts to be representative by selecting sample elements in proportion to their known incidence in the population

Quota Sampling

Example: Surveying undergraduate students about campus food services

- Step 1: Identify the attributes the researcher believes are important, e.g. sex and class level
- Step 2: Look at the incidence of sex and class level in the population

Quota Sampling

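| Class Level | Number of students |
|---|---|
| Freshmen | 3200 |
| Sophomores | 2600 |
| Juniors | 2200 |
| Seniors | 2000 |

| Sex | Number of students |
|---|---|
| Males | 4500 |
| Females | 5500 |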

If I sample 100, how many of each type do I select?
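
One way to answer the question is to give each group a quota proportional to its share of the population. The sketch below does this for each attribute separately, using the counts from the slide; the function name is illustrative, and it sets marginal quotas only (it does not try to fill sex-by-class cells).

```python
def quota_allocation(population_counts, sample_size):
    """Allocate a sample across groups in proportion to population counts.

    Note: with less tidy numbers, rounding can make the quotas sum to
    slightly more or less than sample_size.
    """
    total = sum(population_counts.values())
    return {group: round(sample_size * count / total)
            for group, count in population_counts.items()}

class_level = {"Freshmen": 3200, "Sophomores": 2600, "Juniors": 2200, "Seniors": 2000}
sex = {"Males": 4500, "Females": 5500}

print(quota_allocation(class_level, 100))
# {'Freshmen': 32, 'Sophomores': 26, 'Juniors': 22, 'Seniors': 20}
print(quota_allocation(sex, 100))
# {'Males': 45, 'Females': 55}
```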

Quota Sampling

- Don’t be fooled: quota sampling relies on personal, subjective selection of quota attributes
- The sample can still be non-representative with respect to some other characteristic (e.g. in this example, perhaps race)
- I plead guilty. I have sinned, and will do so again... so shoot me.

Probability Sampling

- Does not guarantee representativeness, but does allow for the assessment of sampling error
- Sampling error: error that occurs because a sample rather than a census is used

**Simple Random Sampling (SRS)**

- Each population element has a known, non-zero, equal chance of being selected
- Example: lottery numbers
- Or, put everyone’s name in a hat
- Major polling firms use random digit dialing to approximate random samples
- Or, use a random numbers table (actually pseudo-random, I’m told)
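
In software, the name-in-a-hat procedure is a single call. A minimal sketch using Python's standard `random` module (itself a pseudo-random generator); the sample frame and the seed are hypothetical:

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements without replacement; each element is equally likely."""
    rng = random.Random(seed)  # seeded only so the draw can be reproduced
    return rng.sample(frame, n)

# Hypothetical sample frame of 10,000 names
frame = [f"person_{i}" for i in range(1, 10_001)]
sample = simple_random_sample(frame, 1000, seed=42)

print(len(sample), len(set(sample)))  # 1000 1000 (no one is drawn twice)
```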

Systematic Sampling

- Systematically spreads the sample through a list of population members
- Example: if a population contains 10,000 people and a sample of 1,000 is needed, select every 10th name on the list
- In nearly all practical examples, the procedure results in a sample equivalent to SRS
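
The every-10th-name procedure can be sketched as follows. The random starting position is an assumption added here (a common convention, not something the slide states), and the frame is hypothetical:

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th element of the frame, starting at a random position."""
    k = len(frame) // n  # sampling interval, e.g. 10,000 // 1,000 = 10
    if k < 1:
        raise ValueError("sample size exceeds frame size")
    start = random.Random(seed).randrange(k)  # random start in [0, k)
    return frame[start::k][:n]  # trim in case the frame is not a multiple of n

frame = list(range(10_000))  # hypothetical list of population members
sample = systematic_sample(frame, 1000, seed=1)

print(len(sample))  # 1000
```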

Systematic Sampling

- Only exception: when there are “regularities” in the list (for example, a list that repeats in a cycle matching the sampling interval)

Systematic Sampling

Another application of systematic sampling:

- choose a fixed distance in millimeters or inches down a page or column, and select the entry found at each interval (it’s easier than counting!)

Stratified Sampling

Information about subgroups in the sample frame is used to improve the efficiency of the sample plan

Stratified Sampling

Three major reasons to use it:

- Some subgroups are more homogeneous than others, so fewer elements are needed from those groups to obtain the same level of precision
- Group comparison is the purpose of the study (disproportionate stratified sampling)
- Some elements are more important in determining the outcome of research interest than are others

**How is this different from quota sampling?**

- Within strata, selection of sample elements is random, not first available
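
The difference is easy to see in code: the frame is split into strata, and a random draw is made inside each one. This is only a sketch; the frame is hypothetical, built from the class-level counts in the quota sampling example, and the function name is illustrative.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Randomly select elements within each stratum.

    frame:         list of population elements
    stratum_of:    function mapping an element to its stratum label
    n_per_stratum: dict of stratum label -> number of elements to draw
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in frame:
        strata[stratum_of(element)].append(element)
    # Random selection within each stratum, not "first available"
    return {label: rng.sample(strata[label], n)
            for label, n in n_per_stratum.items()}

counts = {"Freshmen": 3200, "Sophomores": 2600, "Juniors": 2200, "Seniors": 2000}
frame = [(level, i) for level, count in counts.items() for i in range(count)]
allocation = {"Freshmen": 32, "Sophomores": 26, "Juniors": 22, "Seniors": 20}

sample = stratified_sample(frame, lambda e: e[0], allocation, seed=7)
print({label: len(chosen) for label, chosen in sample.items()})
# {'Freshmen': 32, 'Sophomores': 26, 'Juniors': 22, 'Seniors': 20}
```

The allocation here is proportionate; a disproportionate design would simply pass different numbers in `allocation`.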

**Bad Uses of Stratification**

- To satisfy people who distrust that random sampling will be representative
- To correct for MAJOR problems with survey cooperation

Poststratification is OK

- Is done after sampling
- Corrects for MINOR differences between sample and population produced by noncooperation
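
The slides give no formula for poststratification. One common approach, assumed here, weights each group by its population share divided by its sample share. The population counts come from the quota sampling example; the achieved sample counts are hypothetical.

```python
def poststratification_weights(population_counts, sample_counts):
    """Weight per group = (population share) / (sample share)."""
    pop_total = sum(population_counts.values())
    sample_total = sum(sample_counts.values())
    return {group: (population_counts[group] / pop_total)
                   / (sample_counts[group] / sample_total)
            for group in sample_counts}

population = {"Males": 4500, "Females": 5500}
achieved = {"Males": 40, "Females": 60}  # hypothetical: men slightly under-represented

weights = poststratification_weights(population, achieved)
for group, w in weights.items():
    print(group, round(w, 3))
```

Weights close to 1, as in this case, are the minor corrections the slide has in mind; weights far from 1 signal a cooperation problem that weighting should not be trusted to repair.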

**Area (or Cluster) Sampling**

- Elements are geographically grouped into relatively homogeneous clusters (e.g. a city is divided into 40 areas)
- From these areas, 10 are randomly selected
- From these larger areas, blocks within areas will be randomly selected
- Within each block, attempt to survey each household
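
The staged selection can be sketched as below. The city layout (12 blocks per area) and the choice of 3 blocks per area are hypothetical; only the 40 areas and the 10 selected areas come from the slide.

```python
import random

def area_sample(areas, n_areas, n_blocks_per_area, seed=None):
    """Two-stage random selection: areas first, then blocks within chosen areas.

    areas: dict of area name -> list of block identifiers.
    Returns dict of chosen area -> chosen blocks; every household in a
    chosen block would then be approached.
    """
    rng = random.Random(seed)
    chosen_areas = rng.sample(sorted(areas), n_areas)
    return {area: rng.sample(areas[area], min(n_blocks_per_area, len(areas[area])))
            for area in chosen_areas}

# Hypothetical city: 40 areas with 12 blocks each
city = {f"area_{a:02d}": [f"area_{a:02d}/block_{b}" for b in range(12)]
        for a in range(40)}

plan = area_sample(city, n_areas=10, n_blocks_per_area=3, seed=0)
print(len(plan), sum(len(blocks) for blocks in plan.values()))  # 10 30
```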

**Area (or Cluster) Sampling**

- Especially useful for door-to-door personal surveys (significantly reduces costs)
- However, clustering increases sampling error (people who live close together tend to be more similar)
- Statistical formulas suggest that, in marketing research, 20-25 clusters with 20-25 observations per site are appropriate

Determining Sample Size

**Ad Hoc Methods (non-statistical)**

- Rules of thumb: collect a sample large enough that, when divided into groups, each group has a minimum sample of 100 or so (Sudman)
- Budget constraints: calculate the cost of interviewing and data analysis per respondent, then divide the total budget by this amount to get the maximum sample size
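
Both ad hoc methods reduce to one line of arithmetic each. In the sketch below the budget figures are hypothetical, and the group-size calculation is one reading of the rule of thumb: it assumes a proportionate sample, so the smallest group determines the total.

```python
def max_sample_from_budget(total_budget, cost_per_respondent):
    """Largest number of respondents the budget can pay for."""
    return int(total_budget // cost_per_respondent)

def min_sample_for_groups(group_counts, min_per_group=100):
    """Smallest proportionate sample in which the smallest group
    still reaches min_per_group."""
    total = sum(group_counts.values())
    smallest = min(group_counts.values())
    return -(-min_per_group * total // smallest)  # ceiling division on integers

# Hypothetical budget: $20,000 total, $25 per completed interview
print(max_sample_from_budget(20_000, 25))  # 800

# Class-level counts from the quota sampling example; seniors are 20% of students
class_level = {"Freshmen": 3200, "Sophomores": 2600, "Juniors": 2200, "Seniors": 2000}
print(min_sample_for_groups(class_level))  # 500
```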

**Ad Hoc Methods (non-statistical)**

- Comparable studies: find similar studies that succeeded in getting sufficiently reliable results

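**Most general formula**

- Total sampling error = $Z \cdot \sigma / \sqrt{N}$, where $Z$ is the z score for the desired confidence level, $\sigma$ is the standard deviation of the sample (SD), and $N$ is the sample size
- $\sigma / \sqrt{N}$ is the standard deviation of the distribution of sample means; multiplying it by $Z$ gives the sampling error at the chosen confidence level
- Sampling error is expressed in absolute units of the measurement scale, not as a percentage: it is the amount by which your measurement may differ from the true value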

**Re-arranging Algebraically**

$$N = \frac{Z^2 \sigma^2}{(\text{sampling error})^2}$$

Where:

- $N$ = sample size
- $Z$ = z score from the normal curve table (1.96 for a 95% confidence level)
- $\sigma$ = standard deviation (obtained from a previous survey or estimated, e.g. if 95% of responses fall between 3 and 5, that range spans about 4 SD, so 1 SD = .5)
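
The rearrangement can be spelled out step by step. This is only a sketch of the algebra; the symbol $E$ for the allowable sampling error is shorthand introduced here, not notation from the slides.

$$
E = \frac{Z\sigma}{\sqrt{N}}
\quad\Longrightarrow\quad
\sqrt{N} = \frac{Z\sigma}{E}
\quad\Longrightarrow\quad
N = \frac{Z^2\sigma^2}{E^2}
$$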

Example:

If the allowable sampling error is 0.20 (on a 7-point scale), SD = 1.34, and a 95% confidence level (significance level of .05, so Z = 1.96) is being used:

- $N = 1.96^2 \times 1.34^2 / 0.20^2$
- $N \approx 172$
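
The same calculation in code, as a quick check of the arithmetic. The function name is illustrative; the inputs are the ones from the example.

```python
import math

def required_sample_size(z, sd, error):
    """N = Z^2 * SD^2 / error^2, returned unrounded."""
    return (z ** 2) * (sd ** 2) / (error ** 2)

n = required_sample_size(z=1.96, sd=1.34, error=0.20)

print(round(n, 1))   # 172.4
print(math.ceil(n))  # 173
```

The slide's 172 is the result rounded to the nearest whole respondent; rounding up to 173 is the more conservative choice, since 172 falls very slightly short of the stated precision.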

**What this formula suggests**

- If the sample is more varied (larger SD), a larger sample is required
- If more precision is required (smaller allowable sampling error), a larger sample is necessary
- If a higher confidence level is desired (larger Z), a larger sample is necessary
- The sample size required to achieve ever more precision and confidence increases at an increasing rate!
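
The last point is easy to see numerically. The sketch below holds Z = 1.96 and SD = 1.34 fixed (the values from the example) and halves the allowable sampling error repeatedly; the error levels other than 0.20 are illustrative.

```python
def required_sample_size(z, sd, error):
    """N = Z^2 * SD^2 / error^2, returned unrounded."""
    return (z ** 2) * (sd ** 2) / (error ** 2)

# Each halving of the allowable error quadruples the required sample.
sizes = {error: round(required_sample_size(1.96, 1.34, error))
         for error in (0.40, 0.20, 0.10, 0.05)}

for error, n in sizes.items():
    print(f"error = {error:.2f}  ->  N = {n}")
# error = 0.40  ->  N = 43
# error = 0.20  ->  N = 172
# error = 0.10  ->  N = 690
# error = 0.05  ->  N = 2759
```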