
Sampling: Design and Procedures

The Sampling Design Process

1. Define the Population
2. Determine the Sampling Frame
3. Select Sampling Technique(s)
4. Determine the Sample Size
5. Execute the Sampling Process

Define the Target Population
The target population is the collection of elements or objects that possess the information sought by the researcher and about which inferences are to be made. The target population should be defined in terms of elements, sampling units, extent, and time.
- An element is the object about which or from which the information is desired, e.g., the respondent.
- A sampling unit is a unit containing the element that is available for selection at some stage of the sampling process, e.g., a household containing the individuals who live in it.
- Extent refers to the geographical boundaries.
- Time is the time period under consideration.

Determine the Sample Size
Important qualitative factors in determining the sample size:
- the importance of the decision (the more important the decision, the larger the sample size)
- the nature of the research (for exploratory research designs the sample size is small)
- the number of variables (covering a large number of variables requires a large sample)
- the nature of the analysis (multivariate analysis techniques require a large sample)
- sample sizes used in similar studies (the table on the next slide gives some indication of the sample size)
- incidence rates (the share of eligible respondents)
- completion rates (mailed questionnaires have lower completion rates)

Determine the Sample Size
Type of Study                                                    Minimum Size   Typical Range
Problem identification research (e.g., market potential)         500            1,000-2,500
Problem-solving research (e.g., pricing)                         200            300-500
Product tests                                                    200            300-500
Test marketing studies                                           200            300-500
TV, radio, or print advertising (per commercial or ad tested)    150            200-300
Test-market audits                                               10 stores      10-20 stores
Focus groups                                                     2 groups       4-12 groups

Determine the Sampling Frame
• Sampling Frame: It is the representation of the elements of the target population. It consists of a list or set of directions for identifying the target population.

Classification of Sampling Techniques
Sampling Techniques
- Non-probability Sampling Techniques
  - Convenience Sampling
  - Judgmental Sampling
  - Quota Sampling
  - Snowball Sampling
- Probability Sampling Techniques
  - Simple Random Sampling
  - Systematic Sampling
  - Stratified Sampling
  - Cluster Sampling
  - Other Sampling Techniques

Non-Probability Sampling Techniques

Convenience Sampling
Convenience sampling attempts to obtain a sample of convenient elements. Often, respondents are selected because they happen to be in the right place at the right time.
- use of students and members of social organizations
- mall intercept interviews without qualifying the respondents
- department stores using charge account lists
- "people on the street" interviews

Judgmental Sampling
Judgmental sampling is a form of convenience sampling in which the population elements are selected based on the judgment of the researcher.
- test markets
- purchase engineers selected in industrial marketing research
- expert witnesses used in court

Quota Sampling
Quota sampling may be viewed as two-stage restricted judgmental sampling.
- The first stage consists of developing control categories, or quotas, of population elements.
- In the second stage, sample elements are selected based on pro-rata allocation, convenience, or judgment.

Snowball Sampling
In snowball sampling, an initial group of respondents is selected, usually at random.
- After being interviewed, these respondents are asked to identify others who belong to the target population of interest.
- Subsequent respondents are selected based on the referrals.

Probability Sampling Techniques

Simple Random Sampling (SRS)
• Each element in the population has a known and equal probability of selection.
• Each possible sample of a given size (n) has a known and equal probability of being the sample actually selected.
• This implies that every element is selected independently of every other element.
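The SRS idea above can be sketched in a few lines of Python. This is only an illustration: the sampling frame of 100 household identifiers and the helper name `simple_random_sample` are made up for the example, and the draw is assumed to be without replacement.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements without replacement.

    random.sample gives every subset of size n the same chance
    of being the sample actually selected.
    """
    rng = random.Random(seed)  # seed only makes the example repeatable
    return rng.sample(frame, n)

# Hypothetical sampling frame: a numbered list of 100 households.
frame = [f"household_{i:03d}" for i in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```

In practice the hard part is obtaining the complete frame, not the draw itself.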

Systematic Sampling
• The sample is chosen by selecting a random starting point and then picking every ith element in succession from the sampling frame.
• The sampling interval, i, is determined by dividing the population size N by the sample size n and rounding to the nearest integer.
• When the ordering of the elements is related to the characteristic of interest, systematic sampling increases the representativeness of the sample.
• If the ordering of the elements produces a cyclical pattern, systematic sampling may decrease the representativeness of the sample.
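A minimal sketch of the systematic procedure, assuming the frame is an ordered Python list. The function name `systematic_sample` and the frame of 100 numbered elements are illustrative, not part of the slides.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start, then every i-th element of the frame."""
    N = len(frame)
    i = max(1, round(N / n))   # sampling interval i = N / n, rounded
    rng = random.Random(seed)
    start = rng.randrange(i)   # random starting point within the first interval
    return frame[start::i]

frame = list(range(1, 101))    # hypothetical frame: elements numbered 1..100
sample = systematic_sample(frame, 20, seed=1)
print(sample)
```

Because the interval is rounded, the realised sample size can differ slightly from n when N is not a multiple of n.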

Stratified Sampling
• A two-step process in which the population is partitioned into subpopulations, or strata.
• The strata should be mutually exclusive and collectively exhaustive in that every population element should be assigned to one and only one stratum and no population elements should be omitted.
• Next, elements are selected from each stratum by a random procedure, usually SRS.
• A major objective of stratified sampling is to increase precision without increasing cost.

Stratified Sampling
• The elements within a stratum should be homogeneous with respect to a particular attribute, but the elements in different strata should be as heterogeneous as possible.
• The stratification variables should also be closely related to the characteristic of interest.
• In proportionate stratified sampling, the size of the sample drawn from each stratum is proportionate to the relative size of that stratum in the total population.
• In disproportionate stratified sampling, the size of the sample from each stratum is proportionate to the relative size of that stratum and to the standard deviation of the distribution of the characteristic of interest among all the elements in that stratum. Crucially, the sampling fraction is not the same within all strata: some strata are over-sampled relative to others.
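The two allocation rules can be compared with a short sketch. The strata names, stratum sizes, and standard deviations below are invented for illustration, and the disproportionate rule is implemented as "sample size proportional to stratum size times stratum standard deviation", which is one reading of the description above.

```python
def proportionate_allocation(n, sizes):
    """Sample size per stratum proportional to stratum size N_h."""
    total = sum(sizes.values())
    return {h: round(n * N_h / total) for h, N_h in sizes.items()}

def disproportionate_allocation(n, sizes, std_devs):
    """Sample size per stratum proportional to N_h * sigma_h."""
    weights = {h: sizes[h] * std_devs[h] for h in sizes}
    total = sum(weights.values())
    return {h: round(n * w / total) for h, w in weights.items()}

# Hypothetical strata of firms; sigma is the assumed standard deviation
# of the characteristic of interest within each stratum.
sizes = {"small": 6000, "medium": 3000, "large": 1000}
std_devs = {"small": 10.0, "medium": 20.0, "large": 40.0}

prop = proportionate_allocation(400, sizes)
disp = disproportionate_allocation(400, sizes, std_devs)
print(prop)  # {'small': 240, 'medium': 120, 'large': 40}
print(disp)  # {'small': 150, 'medium': 150, 'large': 100}
```

In this made-up case the small but highly variable "large" stratum is over-sampled (100 instead of 40). With other inputs, rounding may leave the allocations a unit or two away from n.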

Stratified Sampling
• Disproportionate stratification is used for two purposes:
- To give larger than proportionate sample sizes in one or more sub-groups so that separate analyses by subgroup will be possible, and
- To increase the precision of key survey estimates.
NOTE: Disproportionate stratification will only reduce standard errors (relative to a proportionate stratified sample) if the population standard deviation for the variable of interest is higher than average within the over-sampled strata.

Cluster Sampling
• The target population is first divided into mutually exclusive and collectively exhaustive subpopulations, or clusters.
• Then a random sample of clusters is selected, based on a probability sampling technique such as SRS.
• For each selected cluster, either all the elements are included in the sample (one-stage) or a sample of elements is drawn probabilistically (two-stage).
• Elements within a cluster should be as heterogeneous as possible, but clusters themselves should be as homogeneous as possible. Ideally, each cluster should be a small-scale representation of the population.
• In probability proportionate to size sampling (PPS), the clusters are sampled with probability proportional to size. Thus in the first stage large clusters are more likely to be included. In the second stage, the probability of selecting a sampling unit in a selected cluster varies inversely with the size of the cluster.
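Writing out the selection probabilities shows why PPS pairs a size-proportional first stage with an inversely proportional second stage. The notation here is assumed, not taken from the slides: the population has M elements, cluster j contains M_j of them, c clusters are drawn at stage one, and a fixed number m of elements is drawn by SRS within each selected cluster. The stage-one probability is approximated as c M_j / M, which requires c M_j ≤ M.

```latex
\begin{align*}
P(\text{cluster } j \text{ selected}) &\approx \frac{c\,M_j}{M} \\
P(\text{element selected} \mid \text{cluster } j \text{ selected}) &= \frac{m}{M_j} \\
P(\text{element selected}) &\approx \frac{c\,M_j}{M} \cdot \frac{m}{M_j} = \frac{c\,m}{M}
\end{align*}
```

Under these assumptions the cluster size cancels, so every element ends up with the same overall chance of selection regardless of the size of its cluster.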

Types of Cluster Sampling
Cluster Sampling
- One-Stage Sampling
- Two-Stage Sampling
  - Simple Cluster Sampling
  - Probability Proportionate to Size Sampling
- Multistage Sampling

When to Use Cluster Sampling
• Cluster sampling should be used only when it is economically justified, that is, when reduced costs can be used to overcome losses in precision. This is most likely to occur in the following situations:
- Constructing a complete list of population elements is difficult, costly, or impossible. For example, it may not be possible to list all of the customers of a chain of hardware stores. However, it would be possible to randomly select a subset of stores (stage 1 of cluster sampling) and then interview a random sample of customers who visit those stores (stage 2 of cluster sampling).
- The population is concentrated in "natural" clusters (city blocks, schools, hospitals, etc.). For example, to conduct personal interviews of operating room nurses, it might make sense to randomly select a sample of hospitals (stage 1 of cluster sampling) and then interview all of the operating room nurses at that hospital. Using cluster sampling, the interviewer could conduct many interviews in a single day at a single hospital. Simple random sampling, in contrast, might require the interviewer to spend all day traveling to conduct a single interview at a single hospital.

Strengths and Weaknesses of Basic Sampling Techniques
Nonprobability Sampling
- Convenience sampling
  Strengths: Least expensive, least time-consuming, most convenient
  Weaknesses: Selection bias, sample not representative, not recommended for descriptive or causal research
- Judgmental sampling
  Strengths: Low cost, convenient, not time-consuming
  Weaknesses: Does not allow generalization, subjective
- Quota sampling
  Strengths: Sample can be controlled for certain characteristics
  Weaknesses: Selection bias, no assurance of representativeness
- Snowball sampling
  Strengths: Can estimate rare characteristics
  Weaknesses: Time-consuming
Probability Sampling
- Simple random sampling (SRS)
  Strengths: Easily understood, results projectable
  Weaknesses: Difficult to construct sampling frame, expensive, lower precision, no assurance of representativeness
- Systematic sampling
  Strengths: Can increase representativeness, easier to implement than SRS, sampling frame not necessary
  Weaknesses: Can decrease representativeness
- Stratified sampling
  Strengths: Include all important subpopulations, precision
  Weaknesses: Difficult to select relevant stratification variables, not feasible to stratify on many variables, expensive
- Cluster sampling
  Strengths: Easy to implement, cost effective
  Weaknesses: Imprecise, difficult to compute and interpret results

Choosing Nonprobability vs. Probability Sampling
Conditions favoring the use of each approach:
Factors                                                  Nonprobability sampling         Probability sampling
Nature of research                                       Exploratory                     Conclusive
Relative magnitude of sampling and nonsampling errors    Nonsampling errors are larger   Sampling errors are larger
Variability in the population                            Homogeneous (low)               Heterogeneous (high)
Statistical considerations                               Unfavorable                     Favorable
Operational considerations                               Favorable                       Unfavorable

Univariate Analysis
• Definition: Method for analyzing data on a single variable at a time.
• Sometimes, all a researcher wants to do is report the data on one variable. For example, you might want to report the number of high school dropouts in New Delhi since 2005. That is a type of univariate analysis.

Univariate Analysis
Univariate method
• Frequencies: A frequency distribution is a listing of categories of possible values for a variable, together with a tabulation of the number of observations in each category.
• An example is the following frequency distribution of the variable POLVIEWS, which asks respondents to classify their political views.

Univariate Analysis
Sample Table 1: POLVIEWS (THINK OF SELF AS LIBERAL OR CONSERVATIVE)
Label                     Value   Frequency
Extremely Liberal         1       12
Liberal                   2       63
Slightly Liberal          3       62
Moderate                  4       202
Slightly Conservative     5       71
Conservative              6       59
Extremely Conservative    7       9
Don't Know                8       21
NA                        9       1
Total                             500
The frequency column contains the number of observations that correspond to each category. The values column contains the numbers arbitrarily assigned to this ordinal variable. As you can see, the largest group of respondents consider themselves to be moderate (202 people).

Sample Table 2
Label                     Value   Frequency   Percent   Valid Percent   Cumulative Percent
Extremely Liberal         1       12          2.4       2.5             2.5
Liberal                   2       63          12.6      13.2            15.7
Slightly Liberal          3       62          12.4      13.0            28.7
Moderate                  4       202         40.4      42.3            70.9
Slightly Conservative     5       71          14.2      14.9            85.8
Conservative              6       59          11.8      12.3            98.1
Extremely Conservative    7       9           1.8       1.9             100.0
Don't Know                8       21          4.2       Missing
NA                        9       1           0.2       Missing
Total                             500         100.0     100.0

Univariate Analysis
• In Sample Table 2, the percent column provides the relative frequency for each category. The valid percent column calculates the percentages without the missing data being included. The cumulative percent column is the running total of the valid percentages.
• The relative frequency is usually much more useful than the frequency itself because it puts the raw number into perspective. Now, we can say that 42% of the respondents who gave a valid answer were moderate.
• We can use frequencies on any type of variable.
• Other methods (better suited to variables with many response categories, such as interval variables):
- Histograms
- Pie Charts
- Bar Diagrams
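The three percentage columns can be recomputed from the frequencies in Sample Table 1. The sketch below assumes, as the "Missing" entries in Sample Table 2 suggest, that "Don't Know" and "NA" are the categories treated as missing; the variable names are illustrative.

```python
# Frequencies from Sample Table 1 (POLVIEWS).
counts = {
    "Extremely Liberal": 12,
    "Liberal": 63,
    "Slightly Liberal": 62,
    "Moderate": 202,
    "Slightly Conservative": 71,
    "Conservative": 59,
    "Extremely Conservative": 9,
    "Don't Know": 21,
    "NA": 1,
}
missing = {"Don't Know", "NA"}  # assumption: these two codes are treated as missing

total = sum(counts.values())
valid_total = sum(f for label, f in counts.items() if label not in missing)

rows = []
cumulative = 0.0
for label, freq in counts.items():
    percent = 100 * freq / total              # share of all 500 respondents
    if label in missing:
        rows.append((label, freq, percent, None, None))
        continue
    valid = 100 * freq / valid_total          # share of valid answers only
    cumulative += valid                       # running total of valid percent
    rows.append((label, freq, percent, valid, cumulative))

for label, freq, percent, valid, cum in rows:
    valid_s = "Missing" if valid is None else f"{valid:.1f}"
    cum_s = "" if cum is None else f"{cum:.1f}"
    print(f"{label:<24}{freq:>5}{percent:>8.1f}{valid_s:>10}{cum_s:>8}")
```

The "Moderate" row shows the difference between the two bases: 40.4% of all respondents, but 42.3% of those with a valid answer.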

Multivariate Analysis
Multivariate testing
• In statistics, multivariate testing or multi-variable testing is a technique for testing hypotheses on complex multi-variable systems, especially used in testing market perceptions.
• In marketing, multivariate hypothesis testing is a process by which more than one component of a product/service may be tested in a live environment. It can be thought of in simple terms as numerous A/B tests performed on one page at the same time.
• A/B tests are usually performed to determine the better of two content variations; multivariate testing can theoretically test the effectiveness of limitless combinations. The only limits on the number of combinations and the number of variables in a multivariate test are the amount of time it will take to get a statistically valid sample of visitors and computational power.
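The number of combinations grows multiplicatively with the number of page elements, which is why visitor volume becomes the limiting factor. A small sketch, using invented page elements and an invented figure of 1,000 visitors per combination:

```python
from itertools import product

# Hypothetical page elements and their variations.
elements = {
    "headline": ["A", "B", "C"],
    "image": ["photo", "illustration"],
    "button_colour": ["green", "orange"],
}

# Every combination of one variation per element.
combinations = [dict(zip(elements, combo)) for combo in product(*elements.values())]

visitors_per_combination = 1000  # assumed figure, for illustration only
visitors_needed = len(combinations) * visitors_per_combination

print(len(combinations))  # 3 * 2 * 2 = 12
print(visitors_needed)    # 12000
```

Adding one more element with three variations would triple both numbers.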

Multivariate Analysis
• Multivariate hypothesis testing is usually employed to ascertain which content or creative variation produces the best improvement in the pre-defined goals.
• For example, in user experience testing for a website, a dramatic increase in user convenience can be seen through testing different copy text, form layouts, and even landing page images and background colours. However, not all elements produce the same increase in user convenience, and by looking at the results from different tests, it is possible to identify those elements that consistently tend to produce the greatest increase in user convenience.

Multivariate Analysis
• Testing can be carried out on a dynamically generated website by setting up the server to display the different variations of content in equal proportions to incoming visitors. Statistics on how each visitor went on to behave after seeing the content under test must then be gathered and presented. Outsourced services can also be used to provide multivariate testing on websites with minor changes to page coding. These services insert their content into predefined areas of a site and monitor user behavior.
• In a nutshell, multivariate testing can be seen as allowing respondents to vote for the option they prefer, which is the one most likely to lead them to a defined goal.
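Serving variations in equal proportions and tallying outcomes can be sketched as follows. Round-robin assignment is only one possible way to get equal proportions (random assignment is another), and the variation names and visitor outcomes are simulated for the example.

```python
from collections import Counter
from itertools import cycle

variations = ["A", "B", "C"]   # hypothetical content variations
assigner = cycle(variations)   # round-robin keeps the proportions equal

shown = Counter()              # how often each variation was displayed
converted = Counter()          # how often a visitor then reached the goal

def serve(reached_goal):
    """Assign the next variation and record whether the visitor reached the goal."""
    variation = next(assigner)
    shown[variation] += 1
    if reached_goal:
        converted[variation] += 1
    return variation

# Simulated visitor outcomes (True = reached the defined goal).
outcomes = [True, False, False, True, True, False, False, False, True]
for reached in outcomes:
    serve(reached)

rates = {v: converted[v] / shown[v] for v in variations}
best = max(rates, key=rates.get)
print(rates, best)
```

With so few visitors the differences are not statistically meaningful; a real test would run until each variation has a valid sample.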

Measurements of Association
• In certain situations, we want to determine whether some association exists among groups of data, and how strong that association is.
• We make use of certain statistical tools to determine whether any association exists among groups of data.
• These statistical tools are:
- Correlation Analysis
- Regression Analysis

Measurements of Association
Correlation Analysis
• It is a statistical tool with which we determine the intensity of the relationship between two or more variables.
• Two variables are said to be correlated if movements in one are accompanied by movements in the other.
• For example: As rainfall increases, up to a certain extent, the production of rice increases.
• The Correlation Coefficient is a measure of correlation and summarises the direction and degree of correlation.
• To determine correlation, we could use:
- Karl Pearson's Coefficient of Correlation
- Rank Correlation

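Measurements of Association
• Karl Pearson's Coefficient of Correlation: It is denoted by r.
• The formula used to find r is as follows:
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$$
where X and Y are the two variables under study and \bar{X} and \bar{Y} are their respective means.
• The value of r will lie between -1 and +1.
• The sign of r is meaningful: a positive r indicates that the two variables move in the same direction, and a negative r indicates that they move in opposite directions.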
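A minimal sketch of the calculation, using the standard deviation-from-the-mean form of Pearson's r. The rainfall and rice figures are invented to echo the earlier example and carry no real-world meaning.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means of X and Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: rainfall and rice production for five seasons.
rainfall = [10, 20, 30, 40, 50]
rice = [12, 15, 19, 24, 25]

r = pearson_r(rainfall, rice)
print(round(r, 3))  # 0.986
```

Here the sums are 350 for the cross-products, 1000 for X and 126 for Y, so r = 350 / sqrt(126000), a strong positive correlation.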

Measurements of Association
Rank Correlation
• It is also known as Spearman's Rank Correlation method.
• This method is useful when quantitative measures of certain factors, such as an evaluation of salesmen's leadership ability, cannot be ascertained.
• Instead, the individuals are ranked on each factor, and the Rank Correlation Coefficient is then determined from these ranks.
• The formula used is:
$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
where d = difference between the ranks of paired items in the two series X and Y, and n = number of units.
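Spearman's coefficient is commonly computed as 1 - 6 Σd² / (n(n² - 1)), where d is the difference between the two ranks of each unit. The sketch below assumes there are no tied ranks; the two sets of rankings (two judges ranking five salesmen on leadership ability) are invented.

```python
def spearman_rank_correlation(rank_x, rank_y):
    """Spearman's rank correlation for untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))  # sum of squared rank differences
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical ranks given to five salesmen by two judges.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]

rs = spearman_rank_correlation(judge_1, judge_2)
print(rs)  # 0.8
```

The rank differences are -1, 1, -1, 1, 0, so Σd² = 4 and the coefficient is 1 - 24/120 = 0.8.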

Measurements of Association
Regression Analysis
• It is a statistical tool which enables a researcher to estimate or predict the unknown values of one variable from the known values of another variable.
• Regression analysis helps us examine a presumed cause-and-effect relationship between variables, i.e., with its help a marketing researcher can determine the average probable change in one variable given a certain amount of change in another variable.
• For example: If a marketer has the values of advertising expenditure for a company's product, he can predict the amount of sales corresponding to each level of advertising expenditure, or vice versa.
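The advertising example can be sketched with a simple least-squares line. This assumes a straight-line relationship, which the slides do not state, and the expenditure and sales figures are invented.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: advertising expenditure and sales, in the same money units.
advertising = [1, 2, 3, 4, 5]
sales = [12, 14, 17, 19, 23]

intercept, slope = fit_line(advertising, sales)
predicted = intercept + slope * 6  # predicted sales for an expenditure of 6

print(round(intercept, 2), round(slope, 2), round(predicted, 2))  # 8.9 2.7 25.1
```

The slope of 2.7 is the average probable change in sales for a one-unit change in advertising expenditure in this made-up data set.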