
Research With Statistical Tools
Ahmad Sina Sabawoon
KEY TERMS
Population: The entire group that a researcher wants to draw conclusions about.
Sample: A subset of the population.
Sampling: The process of choosing a sample from the population.
Sample Unit: A unit of the population that is chosen for the sample in order to be studied.
Sampling Technique: The method used for choosing sample units from the population.
Representative Sample: A sample that accurately represents the characteristics of the
population.
Sample Size: The number of subjects selected from the general population to form a sample
that is considered representative of the real population for that specific study.
Sampling Frame: The actual list of individuals that the sample will be drawn from.
Ideally, it should include the entire target population (and nobody who is not part of
that population).
Factors Affecting Sample Size
• Population Size: Normally, the bigger the population, the bigger the sample size
needed to draw meaningful conclusions from the sample.
• Heterogeneity: This refers to variation in the characteristic of interest, e.g. age or income in
the case of a human population, the life of electric bulbs in the case of a physical item, or high
school examination results in the case of a population of schools. The more heterogeneous the data,
the larger the sample required. In the case of rice being cooked, for instance, the grains are so
alike that a sample of a single grain is sufficient to draw a conclusion about the extent of cooking.
• Accuracy and Reliability: In general, results obtained from a bigger sample are more
accurate and reliable than results obtained from a smaller sample. Therefore, the more
accuracy and reliability required, the larger the sample size needed.
• Allocation of Resources: The sample size depends on the resources allocated or made
available. The more resources made available in terms of manpower, money, time, etc.,
the more the sample size can be increased.
Population Vs Sample
Census Vs Sample
• Collecting data from a whole population is called census.
• Collecting data from a sample is called sampling.
Census: Advantages and Disadvantages
Advantages:
• Accurate and reliable – however, this advantage is a myth if the population is quite large.
Disadvantages:
• Requires more resources in terms of manpower, money, time, etc.
• If the test is destructive, i.e. the item is destroyed while information about it is being
collected, this option is ruled out entirely. Examples include estimating the life of
bulbs or tubes, and testing the quality of bullets, fuses, or food.
Sampling: Advantages and Disadvantages
Advantages:
• Requires fewer resources in terms of manpower, money, time, etc.
• Highly qualified and skilled persons can be deployed for collection of data as the
manpower requirement is relatively low. This aspect assumes greater significance when
the collection of data requires special skills or knowledge.
• Indispensable if the unit is destroyed in the process of getting the desired information
about it (items like bullets or fuses), is consumed (e.g. fruits), or becomes useless
(items like an electric bulb or tubelight) once its failure time or “life” is recorded.
Disadvantages:
• Less accurate and reliable because the sample may not be a true representative of the
population. This disadvantage can be minimised by selecting a sample such that it is a
true representative of the population, but it cannot be eliminated.
Statistic Vs Parameter

Parameter: A characteristic of a population.
Statistic: A characteristic of a sample.
Examples of Parameters
• 20% of U.S. senators voted for a specific measure. Since there are
only 100 senators, you can count how each of them voted.
Examples of Statistic
• 50% of people living in the U.S. agree with the latest health care
proposal. Researchers can’t ask hundreds of millions of people if they
agree, so they survey a sample of the population and use the result to
estimate the figure for the whole population.
Calculating Sample Size
• If you take a sample from a population, you need a formula to work out
what sample size to take. Sometimes you know something
about the population, which can help you determine a sample size.
One such formula is Slovin’s formula:
• n = N / (1 + N e^2)
• Where: n = sample size, N = total population size, and e =
error tolerance (level).
Example Question
Use Slovin’s formula to find out what sample of a population of 1,000
people you need to take for a survey on their soda preferences.
• Step 1: Figure out what you want your confidence level to be. For
example, you might want a confidence level of 95 percent (giving you
an alpha level of 0.05), or you might need better accuracy at the 98
percent confidence level (alpha level of 0.02). This value is used as the
error tolerance e in the formula.
• Step 2: Plug your data into the formula. In this example, we’ll use a 95
percent confidence level (e = 0.05) with a population size of 1,000.
n = N / (1 + N e^2) = 1,000 / (1 + 1,000 × 0.05^2) = 1,000 / 3.5 = 285.714286
• Step 3: Round your answer to a whole number (because you can’t sample
a fraction of a person or thing!): 285.714286 rounds to 286.
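The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name slovin_sample_size is an assumption, not part of the slides, and it rounds up (rather than to the nearest whole number) so that the sample is never smaller than the formula asks for.

```python
import math


def slovin_sample_size(population_size: int, error_tolerance: float) -> int:
    """Sample size from Slovin's formula n = N / (1 + N * e^2), rounded up.

    The function name and the choice to round up are illustrative assumptions.
    """
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < error_tolerance < 1:
        raise ValueError("error_tolerance must be between 0 and 1")
    raw = population_size / (1 + population_size * error_tolerance ** 2)
    # A fraction of a person cannot be sampled, so move up to a whole unit.
    return math.ceil(raw)


# Worked example from the slides: N = 1,000 and e = 0.05.
sample_size = slovin_sample_size(1000, 0.05)
print(sample_size)  # 286
```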
Sampling Techniques
Sampling Techniques are broadly classified into 2:
1. Probability sampling techniques
Probability sampling means that every member of the population has a
known, non-zero chance of being selected. It is mainly used in quantitative
research. If you want to produce results that are representative of the whole
population, probability sampling techniques are the most valid choice.
2. Non-probability sampling techniques
The opposite of probability sampling: units are selected on non-random criteria,
so not every member of the population has a chance of being selected.

Probability Sampling Techniques
• Simple Random Sampling Technique
• Systematic Random Sampling Technique
• Cluster Sampling Technique
• Stratified Sampling Technique

Non-Probability Sampling Techniques
• Haphazard Sampling Technique
• Purposive Sampling Technique
• Quota Sampling Technique
• Judgement Sampling Technique
• Convenience Sampling Technique
• Snowball Sampling Technique
• Inverse Sampling
Simple Random Sampling
• In this sampling technique, every member of the population has an equal
chance of being selected.
• The sampling frame should include the whole population.
• To conduct this type of sampling, one can use tools like random number
generators or other techniques that are based entirely on chance.
Example
• You want to select a simple random sample of 100 employees of Company
X. You assign a number to every employee in the company database from 1
to 1000, and use a random number generator to select 100 numbers.
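The Company X example could be carried out with Python's standard random module, roughly as follows. The helper name simple_random_sample and the seed value are assumptions made for illustration; the seed only makes the draw repeatable.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct units, each with an equal chance of selection.

    The helper name and the optional seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    # random.sample draws without replacement, so no unit is picked twice.
    return rng.sample(population, sample_size)


# Employees numbered 1 to 1000, as in the example.
employee_ids = list(range(1, 1001))
chosen = simple_random_sample(employee_ids, 100, seed=42)
print(len(chosen))  # 100
```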
Systematic Sampling Technique
• Systematic sampling is similar to simple random sampling, but it is usually
slightly easier to conduct. Every member of the population is listed with a
number, but instead of randomly generating numbers, individuals are
chosen at regular intervals.
Example
• All employees of the company are listed in alphabetical order. From the first
10 numbers, you randomly select a starting point: number 6. From number
6 onwards, every 10th person on the list is selected (6, 16, 26, 36, and so
on), and you end up with a sample of 100 people.
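A minimal sketch of the same selection in Python is shown below. The function name systematic_sample is hypothetical, and the list of numbers 1 to 1000 stands in for the alphabetical employee list.

```python
import random


def systematic_sample(population, interval, start=None, seed=None):
    """Select every interval-th unit, beginning at a 1-based start position.

    If no start is given, one is drawn at random from the first interval
    positions. The function name and parameters are illustrative assumptions.
    """
    if start is None:
        start = random.Random(seed).randint(1, interval)
    # Convert the 1-based start position to a 0-based slice index.
    return population[start - 1::interval]


# Positions 1 to 1000 stand in for the alphabetical employee list.
employee_positions = list(range(1, 1001))
chosen = systematic_sample(employee_positions, interval=10, start=6)
print(chosen[:4])   # [6, 16, 26, 36]
print(len(chosen))  # 100
```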
Stratified Sampling
• Stratified sampling involves classifying the population into a certain number of non-
overlapping homogeneous groups called strata, and then selecting samples independently
from each stratum (singular form of strata).
• For example, in India the entire population can be classified into two strata as rural and urban,
into three strata as ‘lower income group’, ‘middle income group’, and ‘higher income group’, or
into three strata as ‘children’ (up to 18 years of age), ‘adults’ (more than 18 but up to 60 years
of age), and ‘senior citizens’ (more than 60 years of age).
Another Example
• The company has 800 female employees and 200 male employees. You want to ensure that
the sample reflects the gender balance of the company, so you sort the population into two
strata based on gender. Then you use random sampling on each group, selecting 80 women
and 20 men, which gives you a representative sample of 100 people.
• Units should be as homogeneous as possible within each stratum, and the strata should be as
heterogeneous as possible from one another.
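The gender example uses proportional allocation: each stratum contributes to the sample in proportion to its share of the population. A rough Python sketch follows; the function name stratified_sample and the employee labels such as "F1" and "M1" are invented for illustration.

```python
import random


def stratified_sample(strata, total_sample_size, seed=None):
    """Draw a simple random sample from each stratum, sized proportionally.

    strata maps a stratum name to the list of its units. The name and
    signature are illustrative assumptions. Rounding can make the shares
    miss the requested total when the proportions are not whole numbers.
    """
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        share = round(total_sample_size * len(units) / population_size)
        sample[name] = rng.sample(units, share)
    return sample


# 800 female and 200 male employees, as in the example (labels are made up).
strata = {
    "female": [f"F{i}" for i in range(1, 801)],
    "male": [f"M{i}" for i in range(1, 201)],
}
sample = stratified_sample(strata, 100, seed=1)
print(len(sample["female"]), len(sample["male"]))  # 80 20
```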
Cluster sampling
• Cluster sampling also involves dividing the population into subgroups, but each subgroup
should have similar characteristics to the whole population. Instead of sampling individuals from
each subgroup, you randomly select entire subgroups.
• If it is practically possible, you might include every individual from each sampled cluster. If
the clusters themselves are large, you can also sample individuals from within each cluster
using one of the techniques above. This is called multistage sampling.
• This method is good for dealing with large and dispersed populations, but there is more risk
of error in the sample, as there could be substantial differences between clusters. It’s difficult
to guarantee that the sampled clusters are really representative of the whole population.
Example
• The company has offices in 10 cities across the country (all with roughly the same number of
employees in similar roles). You don’t have the capacity to travel to every office to collect
your data, so you use random sampling to select 3 offices – these are your clusters.
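The office example can be sketched in Python as below. The function name cluster_sample, the city labels, and the figure of 100 employees per office are assumptions made only to give the code something to run on.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every unit inside them.

    clusters maps a cluster name to the list of its units. The name and
    signature are illustrative assumptions.
    """
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen_names}


# Ten offices with 100 made-up employees each (the office size is assumed).
offices = {
    f"City {i}": [f"City {i} employee {j}" for j in range(1, 101)]
    for i in range(1, 11)
}
selected = cluster_sample(offices, 3, seed=7)
print(len(selected))  # 3
```

For multistage sampling, a further random sample could be drawn from the units inside each selected office instead of keeping all of them.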
Haphazard Sampling
• This method of selecting a sample is in total contrast to random sampling. In such
sampling, the units from the population are selected without any set criteria.
• They are selected based on the preference, prejudice or bias of the person(s) selecting the
sample. This method is usually followed when one wants to know the opinions of people
in the crowd coming out after watching a film or a play, etc.
Purposive Sampling
• As the name implies, under this type of sampling, units of the population are selected
according to the relevance and the nature of representativeness of sampled units.
• For example, if one wants to assess the likely reaction of employees to certain new
measures contemplated in an organisation, it might be better to include in the sample those
employees who are likely to influence the thinking and actions of a vast majority of
the employees.
• Incidentally, the sample size in such cases is not fixed. We may terminate the sampling
i.e. recording of information when we feel that no further information or suggestion is
being obtained.
Quota Sampling
• Such sampling is sometimes considered a type of purposive sampling. It is usually
resorted to when a quota is fixed for the number of units to be included in the sample.
• The quota is fixed due to constraints on the availability of time and/or cost. Within the quota
stipulated, one has to select a sample which is representative of the entire population.
• For example, within the overall quota of interviewing 100 persons for some opinion poll,
one may contact some persons from various categories like college students, housewives,
shopkeepers, office-goers, daily wage earners, etc.
• Similarly, in an organisation, one might include persons from all categories of staff cadre-
wise as well as function-wise, department-wise, etc.
Judgement Sampling
• In such type of sampling, the selection of units, to be included in the sample, depends on
the judgement or assessment of the person(s) collecting the sample.
• The sample is selected based on their judgement/assessment as to what would constitute a
representative sample.
• This is especially useful when the sample size is small, because if random sampling were
adopted, the units that are most important and critical to the objective of the study might
not get included in the sample.
• For example, in a training institute, the teaching staff numbered 30. However, for urgent
academic or administrative matters, the Director used to seek the opinion of one particular
faculty member, as he was known to have balanced views, did not belong to any group, and was
frank enough to express his views. Thus, the Director used to rely on a sample of size one.
Convenience Sampling
• Such sampling is dictated by the needs of convenience rather than any other
consideration.
• For example, one may select some persons from a telephone directory, for getting their
opinion on some issue provided the views of those who own phones are relevant to the
issue.
• For instance, their views on TV programmes might be relevant, but their views on some
party or person in a general election may not be very relevant, as they represent only
a relatively affluent class of people.
• Similarly, one could select a sample of persons from the list of credit card holders.
• Another example relates to opinion polls, where one may find it easier to get the opinion of
those in shops or restaurants or walking on the pavement rather than going from house to
house.
Snowball Sampling
• Snowball sampling, also known as chain referral sampling, is considered a type of
purposive sampling.
• In such sampling, the sampling units are not fixed in advance but are decided as the
sampling proceeds. We may move on to sample the units one after the other depending on
the response received from the previous units. If the units are human beings, one
individual might refer another individual who, in turn, might refer some other
individuals. That is why it is called “chain referral” sampling. In this method,
participants or informants with whom contact has already been made use their
influence or social networks to refer the researcher to other people who could potentially
participate in or contribute to the study.
• Snowball sampling is often used to find and recruit ‘hidden populations’, that is, groups
not easily accessible to researchers through other sampling strategies.
Inverse Sampling
• In normal sampling, we take a sample of units and estimate the characteristics of the units in
the population. However, if the proportion of units of a certain type is very small, like fake notes
in circulation, then this method may not work.
• For instance, if we do not find any fake note in a sample of 1000 pieces, examined at random, can we
say that the proportion of fake notes is zero? In such cases, inverse sampling could be used.
• Inverse sampling is a method of sampling which requires that drawings of random samples shall be
continued until certain specified conditions dependent on the results of the earlier drawings have
been fulfilled, e.g. until a given number of units of specified type have been found.
• Thus, referring to the above problem of estimating fake notes, we may continue taking samples of,
say, Rs. 100 denomination notes till we reach a certain number, say 10, of fake notes. Suppose we
find 10 fake notes in a total sample of 10 lakh pieces. This implies that the chance of a note being
fake is 10/10 lakh, i.e. 1 in 1 lakh. Thus, if the number of Rs. 100 notes in circulation is 100 crore,
i.e. 10,000 lakh, then about 10,000 lakh × (1/1,00,000) = 10,000 notes are fake. It is only a rough
approximation, but better than a pure guess.
• In the case of human beings, this method may be used to estimate the number of persons with some
rare characteristic like say, having 6 fingers on a palm, or some rare disease.
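The stopping rule and the rough estimate from the fake-note example can be sketched in Python as follows. All names here (inverse_sample, estimate_rare_total, draw_note) are hypothetical, and the simulated fake rate of 1 in 100 is an assumption chosen only so that the demonstration finishes quickly.

```python
import random


def inverse_sample(draw_unit, target_hits, max_draws=10_000_000):
    """Keep drawing units until target_hits rare units have been found.

    draw_unit is a function that returns True when the drawn unit is of the
    rare type. Returns (hits, draws). max_draws is a safety cap. All names
    are illustrative assumptions.
    """
    hits = 0
    draws = 0
    while hits < target_hits and draws < max_draws:
        draws += 1
        if draw_unit():
            hits += 1
    return hits, draws


def estimate_rare_total(hits, draws, population_size):
    """Rough estimate used in the slides: population size times hits / draws."""
    return population_size * hits / draws


# Worked example from the slides: 10 fakes found in 10 lakh notes drawn,
# with 100 crore notes in circulation.
estimated_fakes = estimate_rare_total(
    hits=10, draws=1_000_000, population_size=1_000_000_000
)
print(estimated_fakes)  # 10000.0

# Small simulation with an assumed fake rate of 1 in 100.
rng = random.Random(0)


def draw_note():
    return rng.random() < 0.01


hits, draws = inverse_sample(draw_note, target_hits=10)
print(hits)  # 10
```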
Confidence Interval
• A confidence interval shows how much uncertainty there is in any
particular statistic.
• Confidence intervals are often used with a margin of error.
• It tells you how confident you can be that the results from a poll or survey reflect
what you would expect to find if it were possible to survey the entire population.
• Confidence intervals are intrinsically connected to confidence levels.
• Confidence levels are expressed as a percentage (for example, a 95% confidence
level).
• It means that if you repeated an experiment or survey over and over again, about 95
percent of the intervals you calculated would contain the true value for the population.
• A confidence interval is reported as a range of numbers around your result.
Example
• A recent article on Rasmussen Reports states that “38% of Likely U.S.
Voters now say their health insurance coverage has changed because
of Obamacare”. If you scroll down to the bottom of the article, you’ll
see this line: “The margin of sampling error is +/- 3 percentage points
with a 95% level of confidence.”
• It’s impractical to survey all 300 million+ U.S. residents, so it’s
impossible to know exactly how many people would actually respond
“yes my health insurance has changed.”
Con…
• We take a sample (say, 2,000 people) and, using good statistical techniques
like simple random sampling, take our “best guess” at what that actual figure is (we
call that unknown figure a population parameter). What a 95 percent confidence
level is saying is that if the poll or survey were repeated over and over again, the
interval calculated from each sample would contain the actual population figure
about 95 percent of the time.
What about “+/- 3 percentage points”?
• The margin of error tells us how certain (or uncertain) we are about the true figure
in the population. It is stated as a plus or minus (in this case, +/- 3 percentage
points), and applying it to the survey result gives the confidence interval. When the
interval and confidence level are put together, you get a spread of percentages. In this case,
you would expect the true figure to lie between 35 (38-3) and 41 (38+3) percent, at the
95% confidence level.
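The arithmetic behind the interval is simply the survey result minus and plus the margin of error. A tiny Python sketch, with a function name (interval_from_margin) assumed for illustration:

```python
def interval_from_margin(estimate, margin_of_error):
    """Return (lower, upper) bounds: the estimate minus and plus the margin.

    The function name is an illustrative assumption.
    """
    return estimate - margin_of_error, estimate + margin_of_error


# Poll example: 38% with a margin of error of +/- 3 percentage points.
low, high = interval_from_margin(38, 3)
print(low, high)  # 35 41
```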
