Why Sampling

The basic logic behind sampling is that, in most cases, the underlying patterns in a population become clear after a certain section or subgroup has been examined, thus making a complete census unnecessary. In simple terms, this is the basic idea behind sampling: by studying a subgroup of a population, the characteristics of the population can be ascertained.

The benefits derived from sampling

Reduced costs
Reduced time
Greater accuracy
Greater flexibility of scope

Sampling Frame
Population, Sampling Frame, Sampling Unit, Sampling Element

Sampling Method
Sample Size, Sampling Plan, Sample Selection


Population

This is not the entire population of a given geographical area, but the pre-defined set of potential respondents (elements) in that area.

For example, a population may be defined as all mothers who buy branded baby food in a given area, all teenagers who watch MTV in the country, all adult males who have heard about or use the AQUAFRESH brand of toothpaste, or all MBA students for the Research Methodology

Primary Group of Population: MBA students and statistics students
Secondary Group of Population or Alternative Population: undergraduate management students, libraries of business schools and statistics, teachers who teach this subject

Sampling Frame
This is a subset of the defined target population, from which we can realistically select a sample for our research. Census lists, telephone directories, lists of subscribers to magazines, membership lists of associations (for example, HRD Association/AIMA) and customer databases maintained by various corporations are all examples of sampling frames.

Sampling Unit

In a survey to assess brand preferences of refrigerators, should we interview households or dealers?

Sampling Element

This is the unit about which information is sought by the marketing researcher for further analysis and action. The most common sampling element in marketing research is a human respondent, who could be a consumer, a potential consumer, a dealer, a person exposed to an advertisement, etc. Other possible elements for a study could be companies, families or households, retail stores and so on. At the last stage, we reach the individual sampling element: the respondent we wanted to meet.

In the preceding study on refrigerators, assuming that the household is identified as the sampling unit, who should be interviewed: the housewife, the head of the household, or the entire family? The number of people in a household is determined, and a random number is chosen to select a particular person as a respondent.

The Sample Size Calculation

It is not a formula alone that determines sample size in actual marketing research. Sampling in practice is based on science, but is also an art.

The basic assumptions made while computing sample sizes through the use of formulae are sometimes not met in practice. At other times, there are other factors which are influential in increasing or decreasing sample sizes obtained through the use of formulae. As a very rough rule of thumb, one percent of the population considered for research is sometimes taken as the sample size. For now, remember that sample size is decided based on use of formulae, experience of similar studies, time and budget constraints, output or analysis requirements, number of segments of the target population, number of centres where the study is conducted, etc.

There are two formulas, depending on variable type, used for computing sample size for a study. The first is used when the critical variable studied is an interval-scaled one. We will study only this formula.

Formula for Sample Size Calculation when Estimating Means (for Continuous or Interval Scaled Variables)

The formula for computing n, the sample size required to do the study, is

$$n = \left(\frac{Z s}{e}\right)^2$$

Let us examine one by one what the quantities Z, s, and e represent.

Z: The Z value represents the z score from the standard normal distribution for the confidence level desired by the researcher. For example, a 95 percent confidence level corresponds (from a standard normal distribution, for a two-sided probability value of 0.95) to a z score of 1.96. Similarly, if the researcher desires a 90 percent confidence level, the corresponding z score would be 1.645 (again, from the standard normal distribution, for a two-sided probability of 0.90). Generally, 90 or 95 percent confidence is adequate for most marketing research studies. A 100 percent confidence level is not practical, as it means we have to take a census of the entire population instead of using a sample. We will use z = 1.96, equivalent to a 95 percent confidence level, in our example.
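The z scores quoted above can be checked without printed tables. The following is a minimal sketch using Python's standard library; the function name `z_for_confidence` is only illustrative and is not part of the original material.

```python
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Return the two-sided z score for a confidence level given as a fraction."""
    if not 0.0 < confidence < 1.0:
        # A 100 percent level has no finite z score (it would require a census).
        raise ValueError("confidence must be strictly between 0 and 1")
    # Two-sided: the leftover probability is split equally between both tails.
    upper_point = 1.0 - (1.0 - confidence) / 2.0
    return NormalDist().inv_cdf(upper_point)


if __name__ == "__main__":
    for level in (0.90, 0.95):
        print(f"{level:.0%} confidence -> z = {z_for_confidence(level):.3f}")
```

For 0.90 and 0.95 this reproduces the values 1.645 and 1.960 used in the text.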

s: The s represents the standard deviation of the variable which we are trying to measure in the study. Strictly, this is an unknown quantity, since we have not taken a sample yet, so the sample standard deviation cannot be known in advance. However, we can use a rough estimate of the standard deviation for the variable being measured. This estimate can be obtained in the following ways:

If past studies have measured this variable, we can use the standard deviation of the variable from one of the studies from the recent past. It serves as a good approximation.

A very small sample can be taken as a test or pilot sample, only for the purpose of roughly estimating the standard deviation of the concerned variable.

If the minimum and maximum values of the variable can be estimated, then the range of the variable's values is known: Range = Maximum value - Minimum value. Assuming that, for practically all variables, 99.7 percent of the values lie within ±3 standard deviations of the mean, we can get an approximate value of the standard deviation by dividing the range by 6. The logic is that the range is roughly equal to 6 standard deviations for most variables. Therefore, the range, when divided by 6, should give a fairly good estimate of the standard deviation.

e: The third value required for calculating the sample size for the study is e, called the tolerable error in estimating the variable in question. This can be decided only by the researcher or the sponsor of the study. The lower the tolerable error, the larger the sample size required. The higher the tolerable error, the smaller the sample size required.

Now, let us take an example of the use of the above formula, to see how it works. Let us assume we are doing a customer satisfaction study for a washing machine. We are measuring satisfaction on a scale of 1 to 10, where 1 represents "Not at all satisfied" and 10 represents "Completely satisfied". The scale would look like this on a questionnaire:


[Figure: Customer Satisfaction Scale]

We will assume that the questionnaire consists of only 7-8 questions, all of them using this 10-point scale. Therefore, the variable we are trying to measure or estimate through the survey is Customer Satisfaction, which is being measured on a 10-point interval scale.

We will apply the formula discussed for sample size calculation, and check its usefulness.

$$n = \left(\frac{Z s}{e}\right)^2$$

is the formula for variables which are continuous, or interval scaled.

Z: Let us assume we want a 95 percent confidence level in our estimate of the customer satisfaction level from the study. Then, from the standard normal distribution tables (for a two-sided probability value of 0.95), the Z value is 1.96.

s: Let us assume that such a customer satisfaction study was not conducted in the past by us, so we have no idea of the standard deviation of the variable Customer Satisfaction. We can then use the rough approximation of range divided by 6 to estimate the standard deviation. In this case, the lowest value of customer satisfaction is 1, and the highest value is 10. Thus, the range of values for this variable is 10 - 1 = 9. Therefore, the estimated standard deviation becomes 9/6 = 1.5. We will use this value of 1.5 as s in our formula.


e: The tolerable error is expressed in the same units as the variable being measured or estimated by the study. Thus, we have to decide how much error (on a scale of 1 to 10) we can tolerate in the estimate of average customer satisfaction. Let us say we put the value at ±0.5, that is, e = 0.5. This means we would like our estimate of customer satisfaction to be within 0.5 of the actual value, with a confidence level of 95 percent (decided earlier while setting the z value).

Now, we have all 3 values required for calculating n, the sample size. So let us calculate n.

$$n = \left(\frac{Z s}{e}\right)^2 = \left(\frac{1.96 \times 1.5}{0.5}\right)^2 = (1.96 \times 3)^2 = 34.57$$

or 35 (approximately).

Therefore, a sample size of 35 would give us an estimate of customer satisfaction measured on a 1 to 10 point scale, with a 95 percent confidence level, and the error maintained within ±0.5 of the actual value. If we were to tighten our tolerance level of error (e) to ±0.25 instead of ±0.5, we would have to take a larger sample. n would then be equal to

$$n = \left(\frac{1.96 \times 1.5}{0.25}\right)^2 = (1.96 \times 6)^2 = 138.3$$

or 138 (approximately).
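The two calculations above can be reproduced with a short script. This is a minimal sketch of the formula n = (Z s / e)^2 and of the range-divided-by-6 estimate of s; the function names are illustrative only and do not come from the original material.

```python
def estimate_std_from_range(minimum: float, maximum: float) -> float:
    """Rough standard deviation, assuming the range spans about 6 standard deviations."""
    return (maximum - minimum) / 6.0


def sample_size_for_mean(z: float, s: float, e: float) -> float:
    """Unrounded sample size n = (z * s / e) ** 2 for estimating a mean."""
    if e <= 0:
        raise ValueError("tolerable error e must be positive")
    return (z * s / e) ** 2


if __name__ == "__main__":
    s = estimate_std_from_range(1, 10)  # 9 / 6 = 1.5 on the 1 to 10 scale
    for e in (0.5, 0.25):
        n = sample_size_for_mean(1.96, s, e)
        print(f"e = {e}: n = {n:.2f}, rounded to {round(n)}")
```

The script rounds to the nearest whole number, as the worked example does. Rounding up instead would be the more cautious choice, since it never leaves the sample slightly short of the computed size.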

Sampling Method

Probability: Random, Stratified, Systematic, Cluster
Non-probability: Convenience, Quota, Judgmental, Snowball


1. Simple Random

If we wish to use simple random sampling, we could make a list of the whole population, say 100 employees. Then, an identification number could be allotted to each employee. We could then write these 100 numbers on small pieces of paper, one number on each. After folding and shuffling these pieces of paper, we can draw 5 pieces out of the 100, and use these employees as our sample.
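The paper-slip draw described above can be imitated in code. The sketch below assumes the employees are simply numbered 1 to 100; the helper name `simple_random_sample` and the optional `seed` argument (used only to make a run repeatable) are illustrative.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct units, each unit being equally likely."""
    rng = random.Random(seed)
    # random.Random.sample draws without replacement, like pulling paper slips.
    return rng.sample(population, sample_size)


if __name__ == "__main__":
    employee_ids = list(range(1, 101))  # identification numbers 1..100
    chosen = simple_random_sample(employee_ids, 5, seed=42)
    print(sorted(chosen))
```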

2. Stratified

This is a special case of simple random sampling. In this case, the total population is divided into strata that are internally homogeneous with respect to the characteristic being studied and as distinct as possible from the other strata. The strata could be based on age or area.

For example, India can be divided into four regions (north, south, east and west), each treated as a stratum. A sample is then selected randomly from each stratum.
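As a rough illustration of the regional example, the sketch below groups units by stratum and then draws a simple random sample inside each one. The respondent records, the fixed `per_stratum` size and the function name are assumptions made for the example; the text does not say how the sample is allocated across strata.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, per_stratum, seed=None):
    """Group units into strata, then draw a simple random sample within each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Draw independently inside every stratum (capped at the stratum size).
    return {
        name: rng.sample(members, min(per_stratum, len(members)))
        for name, members in strata.items()
    }


if __name__ == "__main__":
    regions = ["north", "south", "east", "west"]
    # Hypothetical respondents: (respondent id, region).
    respondents = [(i, regions[i % 4]) for i in range(200)]
    picked = stratified_sample(respondents, lambda r: r[1], per_stratum=5, seed=1)
    for region, members in picked.items():
        print(region, [m[0] for m in members])
```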


3. Systematic

Systematic sampling is very similar to simple random sampling, and easier to practise. Just as we do in a simple random sample, we start with a list of all sampling units or respondents in the population. We first compute the sample size required, based on a formula, and then select the required sample.

Once the sample size (n) is decided, we divide the total population (N) into n parts of N/n units each. From the first part, we pick one sampling unit at random. Thereafter, we pick every (N/n)th item from the remaining parts. To illustrate, say we have a population of 300 students for some research, and we need a sample of 15 out of these.

We divide the list into 15 parts of 300/15 = 20 students each. Out of the first 20 students, we choose any one at random. Let us say we choose student number 7 (all students are listed). Thereafter, we choose student numbers 7+20, 7+20+20, 7+20+20+20 and so on in a systematic sampling plan. Therefore, the selected students will be numbers 7, 27, 47, 67, 87, 107, 127, 147, 167, 187, 207, 227, 247, 267 and 287. These 15 students will comprise our total sample for the study.
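The selection rule can be sketched in a few lines, using the 300/15 = 20 interval and the starting student number 7 from the illustration. The function name and the `start` and `seed` parameters are illustrative; `start` is a zero-based position in the list, so position 6 holds student number 7.

```python
import random


def systematic_sample(units, sample_size, start=None, seed=None):
    """Pick one unit from the first interval, then every interval-th unit after it."""
    interval = len(units) // sample_size  # N / n, e.g. 300 / 15 = 20
    if interval < 1:
        raise ValueError("sample_size cannot exceed the number of units")
    if start is None:
        # Random zero-based position within the first interval.
        start = random.Random(seed).randrange(interval)
    return [units[start + k * interval] for k in range(sample_size)]


if __name__ == "__main__":
    students = list(range(1, 301))  # student numbers 1..300
    print(systematic_sample(students, 15, start=6))
```

With `start=6` this yields student numbers 7, 27, 47, and so on up to 287, fifteen students in all.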


4. Cluster

A cluster is a group of sampling units or elements which can be identified and listed, and a sample of which can be chosen. Theoretically, a cluster could be formed on the basis of any criterion. But in practice, clusters tend to be found either in terms of geographical areas, or membership of some groups such as a church, a club, or a social organization.

A list of all available clusters should be prepared.
All clusters should be numbered.
A sample of clusters (number to be decided by the researcher) should be randomly drawn.
All sampling units/elements, such as households, in the selected clusters should be chosen to be a part of the sample.

Example: testing the fill of bottles

It is time consuming to pull individual bottles. It is expensive to waste an entire carton of 12 bottles just to test one bottle. If we would like to test 240 bottles, we could randomly select 20 cartons and test all 12 bottles within each carton. This reduces the time and expense required.
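The bottle-testing example can be sketched as follows: whole cartons are drawn at random and every bottle in a drawn carton is tested. The total of 500 cartons, the labels and the function name are assumptions for illustration; the text only specifies 20 cartons of 12 bottles.

```python
import random


def cluster_sample(clusters, number_of_clusters, seed=None):
    """Randomly draw whole clusters and keep every unit inside each drawn cluster.

    clusters maps a cluster id to the list of units it contains.
    """
    rng = random.Random(seed)
    chosen_ids = rng.sample(sorted(clusters), number_of_clusters)
    return {cluster_id: list(clusters[cluster_id]) for cluster_id in chosen_ids}


if __name__ == "__main__":
    # Hypothetical production run: 500 numbered cartons of 12 bottles each.
    cartons = {
        c: [f"carton{c}-bottle{b}" for b in range(1, 13)] for c in range(1, 501)
    }
    chosen = cluster_sample(cartons, 20, seed=7)
    total_bottles = sum(len(bottles) for bottles in chosen.values())
    print(len(chosen), "cartons,", total_bottles, "bottles")
```

Drawing 20 cartons in this way gives the 240 bottles mentioned in the example.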


Let us assume that a study is to be conducted in the city of Mumbai to determine the perception of second-year students about international job opportunities. Second-year marketing students may be approached in all the 25-odd business schools in the city, but this is time consuming. So, each of the classes of second-year students in the various business schools may be treated as a cluster. Alternatively, number each business school, or group the schools according to areas. Then either select from the numbered B-schools according to the required sample size, or select the required sample from each area cluster.

Non-Probability Sampling Techniques

In reality, because of various difficulties involved in obtaining reliable lists of the desired target population, it is difficult to use a textbook probability sampling prescription. Therefore, some compromises could be made, or approximately probability-type sampling procedures may be used. Some of the non-probabilistic techniques may also be used explicitly in cases where it is not feasible to use probability-based methods.

1. Convenience

Also referred to as availability sampling, convenience sampling is a method by which the respondents are selected on the basis of the interviewer's convenience or on the basis of availability. For example, students could be used as a sample by a marketing researcher who lives in a college town. They (the students) need not be representative of the target population for the study, or for the product being researched.

Other examples of convenience sampling include on-the-street interviews, interviews at any other meetings, or interviews with employees of one office block or factory. Another common example of convenience sampling is the one by TV reporters who catch any person passing by and interview him on the street.

2. Quota

This method, quota sampling, is very similar to stratified random sampling. The first step, deciding on the strata or segments into which the population is divided, is actually the same. The second step, calculating a total sample size and allocating it to the various strata, is also the same. The major difference is that random selection of respondents is not strictly adhered to. More liberty is given to the field worker to select enough respondents to complete the segment-wise quota.


In practice, unless there are untrained field workers, or the field supervision is lax, the results produced by a quota sample could be very similar to those produced by a stratified random sample. But there is no guarantee that they would be similar. Many researchers use quota sampling because it saves time compared with stratified random sampling. For example, if a household is locked, a quota sample would permit the field worker to use a substitute household in the same apartment block. But with a stratified random sample, he would be expected to make a second or third attempt at different times of the day to contact the same locked household. This would increase the time taken to complete the required quota.
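The difference from stratified random sampling can be made concrete with a small sketch: respondents are taken in whatever order the field worker happens to meet them, and each one is accepted only while the quota for that segment is still open. The function name, the segment labels and the sample data are all hypothetical.

```python
def quota_sample(available_respondents, segment_of, quotas):
    """Accept respondents in arrival order until every segment quota is filled.

    No random selection is involved, which is what separates this from a
    stratified random sample.
    """
    remaining = dict(quotas)  # copy, so the caller's quotas are not modified
    selected = []
    for respondent in available_respondents:
        segment = segment_of(respondent)
        if remaining.get(segment, 0) > 0:
            selected.append(respondent)
            remaining[segment] -= 1
        if not any(remaining.values()):
            break  # every quota is complete
    return selected, remaining


if __name__ == "__main__":
    # Hypothetical households in the order the field worker reaches them.
    households = [(1, "urban"), (2, "urban"), (3, "urban"), (4, "rural"), (5, "rural")]
    chosen, unfilled = quota_sample(households, lambda h: h[1], {"urban": 2, "rural": 1})
    print(chosen, unfilled)
```

A locked household simply never appears in the arrival order, so the next available household in the same segment takes its place.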

3. Judgemental

This is another variant of convenience sampling, where the units are selected on the basis of the interviewer's judgement to ensure a better quality of response. For example, the interviewees may be experts in a field.

4. Snowball

This technique is used when the population being sought is a small one, and the chances of finding its members by traditional means are low. For example, to find owners of Mercedes Benz cars in a city, we may go to one or two owners, and ask them if they know anyone else who owns one. They in turn are asked for more names of owners.
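The referral chain in the car-owner example can be sketched as a simple walk over "who names whom". The referral data, the owner labels and the function name below are hypothetical; in a real study the referrals are collected during the interviews themselves.

```python
from collections import deque


def snowball_sample(seeds, referrals, max_size):
    """Start from a few known respondents and follow their referrals in turn.

    referrals maps a respondent to the people that respondent names.
    """
    selected = []
    seen = set()
    queue = deque(seeds)
    while queue and len(selected) < max_size:
        person = queue.popleft()
        if person in seen:
            continue  # already interviewed via another referral
        seen.add(person)
        selected.append(person)
        queue.extend(referrals.get(person, []))
    return selected


if __name__ == "__main__":
    # Hypothetical owners and the other owners each one knows.
    known_owners = {
        "owner_A": ["owner_C", "owner_D"],
        "owner_B": ["owner_D", "owner_E"],
        "owner_C": ["owner_F"],
    }
    print(snowball_sample(["owner_A", "owner_B"], known_owners, max_size=5))
```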

Types of Errors in Marketing Research

Any research study has an error margin associated with it. No method is foolproof, as we will see, including a census. This is because there are two major types of errors associated with a research study. These are called

Sampling Error or Random Error
Non-sampling or Human Error

Sampling Error
This is the error which occurs due to the selection of some units and non-selection of other units into the sample. It is controllable if the selection of the sample is done in a random, unbiased way. In other words, if a probability sampling technique is used, it is possible to control this error. In general, this error reduces as sample size increases.
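To see how sampling error shrinks with sample size, the sample size formula n = (Z s / e)^2 can be rearranged to e = Z s / sqrt(n). The sketch below evaluates this for a few sample sizes, using Z = 1.96 and s = 1.5 from the customer satisfaction example; the function name and the extra sample size of 550 are illustrative.

```python
import math


def margin_of_error(z: float, s: float, n: int) -> float:
    """Tolerable error implied by sample size n: e = z * s / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return z * s / math.sqrt(n)


if __name__ == "__main__":
    for n in (35, 138, 550):
        print(f"n = {n:4d}: e = {margin_of_error(1.96, 1.5, n):.3f}")
```

Because the error falls only with the square root of n, halving the error requires roughly four times the sample.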


Non-sampling Error
This is the effect of various errors made in doing the study by the interviewer, the data entry operator or the researcher himself. Handling a large quantity of data is not an easy job, and errors may creep in at any stage of the research. The data entry person may interchange the columns of yes and no responses while entering or compiling data, or the interviewer may cheat by not filling up the questionnaire in the field and fudging the data instead. Or, the respondent may say one thing, but another may be recorded by mistake. These errors are usually proportionate to the sample size. That is, the larger the sample size, the larger the non-sampling error. Also, it is difficult to estimate the size of the non-sampling error. But we can use some controls on the quality of manpower, and supervise effectively to minimize it.

Total Error

1. This is the total of sampling error + non-sampling error.
2. Out of this, the sampling error can be estimated in the case of probability samples, but not in the case of non-probability samples.
3. Non-sampling errors can be controlled through hiring better field workers, qualified data entry persons, and good control procedures throughout the project.
4. One important outcome of this discussion of errors is that the total error is usually unknown. But we may have to live with higher non-sampling error in our attempt to reduce sampling error by increasing the sample size of the study, not to mention the higher cost of a larger sample.
5. Therefore, it is worthwhile to optimise total error by optimising the sample size, rather than going blindly for the largest possible sample size.