
Inferential Statistics

• Population
– Collection of all items of interest to our study

• Sample
– A subset of the population
• Example: we want to study the job prospects of the
students studying at LPU
• The sample is selected by chance
• All the elements should have an equal chance of being selected
Population
• Population: The aggregate of all the elements sharing some common set of characteristics, defined for the
purposes of the research problem.
– Example: the returns of all the stocks traded at NSE.
– Example: the proportion of consumers who are loyal to a particular brand of toothpaste.

• Information about population parameters may be obtained by taking a census or a sample.

• Census: A complete enumeration of the elements of a population or study objects.

• Sample: A subgroup of the elements of the population selected for participation in the study.
• Example: a sample of 50 stocks may be selected from all the stocks listed at NSE to represent the
population of all the stocks traded at NSE.

• Sampling Frame
– A sampling frame is a representation of the elements of the target population.
– It consists of a list or set of directions for identifying the target population.
– Examples of a sampling frame include the telephone book, an association directory listing the
firms in an industry, a customer database.
Potential Sources of Error
• Total error
– Total error is the variation between the true mean value in the population of the variable of interest and
the observed mean value obtained in the marketing research project.
• Random sampling error
– The particular sample selected is an imperfect representation of the population of interest
• Non-sampling error
– Can be attributed to sources other than sampling
– Non-response error
– Response error (respondents give inaccurate answers, or their answers are mis-recorded or mis-analysed)
• Researcher errors
– Surrogate information error (variation between the information needed for the research problem and
the information sought by the researcher)
– Measurement error (variation between the information sought and the information actually measured,
e.g. perceptions measured instead of preferences)
– Population definition error
– Sampling frame error
– Data analysis error
• Interviewer errors
– Respondent selection error
– Questioning error
– Recording error
– Cheating error
• Respondent errors
– Inability error
– Unwillingness error
Conditions favouring the use of a sample versus a census

                                   Conditions favouring the use of
Factors                            Sample         Census
Budget                             Small          Large
Time available                     Short          Long
Population size                    Large          Small
Variance in the characteristic     Small          Large
Cost of sampling errors            Low            High
Cost of non-sampling errors        High           Low
Nature of measurement              Destructive    Non-destructive
Attention to individual cases      Yes            No
The Sampling Design Process
• Define the target population
– This is the collection of elements or objects that possess the information sought
by the researcher and about which inferences are to be made.

– Defining the target population involves translating the problem definition into a
precise statement of who should and should not be included in the sample.

– The target population should be defined in terms of elements, sampling units,
extent and time.

– An element is the object about which or from which the information is desired.
In survey research, the element is usually the respondent.

– A sampling unit is an element, or a unit containing the element, that is available
for selection at some stage of the sampling process.

– Suppose that Revlon wanted to assess consumer response to a new line of lipsticks
and wanted to sample females over 25 years of age. It may be possible to sample
females over 25 directly, in which case a sampling unit would be the same as an
element. Alternatively, the sampling unit might be households. In the latter case,
households would be sampled and all females over 25 in each selected household
would be interviewed. Here, the sampling unit and the population element are
different.
• Determine the sampling frame
– A sampling frame is a representation of the elements of
the target population.
– It consists of a list or set of directions for identifying the
target population.
– Examples of a sampling frame include the telephone
book, an association directory listing the firms in an
industry, a customer database.

– A discrepancy between the population and the sampling
frame can be handled in two ways:
– One approach is to redefine the population in terms of
the sampling frame.
– Another is to account for sampling frame error by screening
the respondents in the data collection phase.
• Select a sampling technique
– Sampling with replacement or sampling without
replacement
– Probability or non-probability sampling

• Determine the sample size
– Sample size refers to the number of elements to be
included in the study.
– Important qualitative factors to be considered in
determining the sample size include
• (1) the importance of the decision,
• (2) the nature of the research,
• (3) the number of variables,
• (4) the nature of the analysis,
• (5) sample sizes used in similar studies, and
• (6) resource constraints.

• Execute the sampling process


Sampling Techniques
• Non-Probability Sampling: Non-probability sampling relies on the
personal judgment of the researcher.
– Convenience Sampling
– Judgment Sampling
– Quota Sampling
– Snowball Sampling

• Probability Sampling: A sampling procedure in which each element
of the population has a fixed probabilistic chance of being selected
for the sample.
– Simple random sampling
– Systematic sampling
– Stratified sampling
– Cluster sampling
Non-probability Sampling Techniques
• Convenience Sampling
– A non-probability sampling technique that attempts to obtain a sample of
convenient elements. The selection of sampling units is left primarily to the
interviewer.
– Often, respondents are selected because they happen to be in the right place at
the right time.

– Examples: use of students, mall-intercept interviews, people-on-the-street
interviews.

– Advantages
• Least expensive
• Least time consuming
• Sampling units are accessible, easy to measure and cooperative

– Limitations
• Self-selection bias
• Sample may not be representative of population

– Not recommended for causal research, but can be used in exploratory research for
generating ideas, insights or hypotheses.
• Judgmental Sampling
– Judgmental sampling is a form of convenience sampling in which the population elements are
selected based on the judgment of the researcher. The researcher, exercising judgment or
expertise, chooses the elements to be included in the sample because he or she believes that
they are representative of the population of interest or are otherwise appropriate.

– Examples include: (1) test markets selected to determine the potential of a new product, (2)
purchase engineers selected in industrial marketing research because they are considered to
be representative of the company, (3) supermarkets selected to test a new merchandising
display system.

– Advantages:
• Is inexpensive, convenient and quick

– Disadvantages:
• Judgmental sampling is subjective and its value depends entirely on the researcher’s
judgment, expertise and creativity.
• It does not allow direct generalizations to a specific population, usually because the
population is not defined explicitly.

– Judgment samples are frequently used in business-to-business marketing research projects,
given that in many projects the target population is relatively small.
• Quota sampling
– Quota sampling may be viewed as two-stage restricted judgmental
sampling.
– The first stage consists of developing control characteristics, or quotas, of
population elements such as age or gender. To develop these quotas, the
researcher lists relevant control characteristics and determines the
distribution of these characteristics in the target population, such as Males
49%, Females 51% (resulting in 490 men and 510 women being selected in
a sample of 1,000 respondents).
– Often, the quotas are assigned so that the proportion of the sample
elements possessing the control characteristics is the same as the
proportion of population elements with these characteristics.
– In the second stage, sample elements are selected based on convenience
or judgment.

– Advantages
• Low cost, greater convenience to the interviewers.
– Disadvantages
• Selection bias
• Snowball Sampling
– In snowball sampling, an initial group of respondents is selected,
sometimes on a random basis, but more typically targeted at a few
individuals who are known to possess the desired characteristics of the
target population.

– After being interviewed, these respondents are asked to identify others
who also belong to the target population of interest. Subsequent
respondents are selected based on the referrals. By obtaining referrals
from referrals, this process may be carried out in waves, thus leading to a
snowballing effect.

– The main objective of snowball sampling is to estimate characteristics
that are rare in the wider population.

– Examples include users of particular government or social services,
members of a scattered minority ethnic group.

– The major advantage of snowball sampling is that it substantially
increases the likelihood of locating the desired characteristic in the
population.
Probability Sampling Techniques
• Simple Random Sampling
– In simple random sampling (SRS), each element in the population has a known and equal
probability of selection.
– This method is equivalent to a lottery system in which names are placed in a container, the
container is shaken and the names of the winners are then drawn out in an unbiased manner.
– To draw a simple random sample, the researcher first compiles a sampling frame in which
each element is assigned a unique identification number. Then random numbers are
generated to determine which elements to include in the sample. The random numbers may
be generated with a computer routine or a table (see Table 1 in the Appendix of statistical
tables).
– Suppose that a sample of size 10 is to be selected from a sampling frame containing 800
elements. This could be done by starting with row 1 and column 1 of Table 1, considering the
three rightmost digits, and going down the column until 10 numbers between 1 and 800 have
been selected.
– Advantages
• It is easily understood and the sample results may be projected to the target population.
– Limitations
• It is often difficult to construct a sampling frame that will permit a simple random sample
to be drawn.
• SRS can result in samples that are very large or spread over large geographical areas, thus
increasing the time and cost of data collection.
– SRS is not widely used in marketing research. Procedures such as systematic sampling are
more popular.
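
The draw described above (10 elements from a numbered sampling frame of 800) can be sketched with a computer routine instead of a random number table. This is a minimal illustration, not a prescribed procedure: the function name `simple_random_sample` and the fixed seed are assumptions made here for reproducibility, and the sketch samples without replacement.

```python
import random


def simple_random_sample(frame_size, sample_size, seed=None):
    """Draw element IDs 1..frame_size with equal probability, without replacement."""
    rng = random.Random(seed)
    # random.sample gives every ID the same chance and never repeats an ID.
    return sorted(rng.sample(range(1, frame_size + 1), sample_size))


# Frame of 800 numbered elements, sample of size 10 (figures from the text).
srs_ids = simple_random_sample(800, 10, seed=42)
print(srs_ids)
```

Each run with a different seed yields a different set of 10 identification numbers, which is exactly the chance variation that the later slides on sampling distributions describe.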
• Systematic Sampling
– In systematic sampling, the sample is chosen by selecting a random starting
point and then picking every ith element in succession from the sampling
frame.

– The sampling interval, i, is determined by dividing the population size N by the
sample size n and rounding to the nearest whole number.

– For example, there are 100,000 elements in the population and a sample of
1,000 is desired. In this case, the sampling interval, i, is 100. A random number
between 1 and 100 is selected. If, for example, this number is 23, the sample
consists of elements 23, 123, 223, 323, 423, 523, and so on.

– For systematic sampling, the researcher assumes that the population
elements are ordered in some respect. In some cases, the ordering (e.g.
alphabetical listing in a telephone book) is unrelated to the characteristic of
interest. In other instances, the ordering is directly related to the
characteristic under investigation. For example, credit card customers may be
listed in order of outstanding balance, or firms in a given industry may be
ordered according to annual sales.

– When the ordering of the elements is related to the characteristic of interest,
systematic sampling increases the representativeness of the sample. If firms
in an industry are arranged in increasing order of annual sales, a systematic
sample will include some small and some large firms.
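
The selection rule above (interval i = N/n, random start, then every ith element) can be sketched as follows. The function name `systematic_sample` is a hypothetical helper introduced for illustration; the start value 23 is passed in explicitly to reproduce the example from the text, whereas in practice it would be drawn at random.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the sampling interval and the selected element numbers."""
    # Interval i = N / n, rounded to the nearest whole number.
    interval = round(population_size / sample_size)
    if start is None:
        # Random starting point between 1 and i.
        start = random.Random(seed).randint(1, interval)
    # Pick every i-th element in succession from the sampling frame.
    ids = list(range(start, population_size + 1, interval))[:sample_size]
    return interval, ids


# N = 100,000 and n = 1,000 as in the text; start fixed at 23 to match the example.
interval, ids = systematic_sample(100_000, 1_000, start=23)
print(interval, ids[:6])
```

Note that Python's built-in `round` resolves exact halves to the nearest even number, which only matters when N/n ends in exactly .5.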
• Stratified Sampling
– Stratified sampling is a two-step process in which the population is
partitioned into subpopulations, or strata.
– The strata should be mutually exclusive and collectively exhaustive in
that every population element should be assigned to one and only one
stratum and no population elements should be omitted.
– Next, elements are selected from each stratum by a random
procedure, usually SRS.
– Stratified sampling differs from quota sampling in that the sample
elements are selected probabilistically rather than based on
convenience or judgment.
– The variables used to partition the population into strata are referred
to as stratification variables.
– The elements within a stratum should be as homogeneous as possible,
but the elements in different strata should be as heterogeneous as
possible.
– Variables commonly used for stratification include demographic
characteristics, type of customer (e.g. credit card versus non-credit
card), size of firm, or type of industry.
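
The two steps above (partition into strata, then SRS within every stratum) can be sketched as below. Two things are assumptions of this sketch rather than statements from the slides: the sample is allocated to strata in proportion to their size, and the customer frame (600 credit card and 400 non-credit card customers) is made up for illustration.

```python
import random


def stratified_sample(strata, total_sample_size, seed=None):
    """Proportionate stratified sample: an SRS is drawn inside every stratum."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Assumed allocation rule: sample share equals population share.
        n_h = round(total_sample_size * len(members) / population_size)
        sample[name] = rng.sample(members, n_h)
    return sample


# Hypothetical frame stratified by type of customer.
strata = {
    "credit_card": [f"C{i}" for i in range(600)],
    "non_credit_card": [f"N{i}" for i in range(400)],
}
picked = stratified_sample(strata, 50, seed=1)
print({name: len(ids) for name, ids in picked.items()})
```

Unlike quota sampling, the elements inside each stratum are chosen by a random procedure, not by convenience or judgment. With other stratum sizes the rounded allocations may not add up exactly to the requested total.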
• Cluster Sampling
– In cluster sampling, the target population is first divided
into mutually exclusive and collectively exhaustive
subpopulations.
– These subpopulations or clusters are assumed to contain
the diversity of respondents held in the target population.
– A random sample of clusters is selected, based on a
probability sampling technique such as SRS.

– For each selected cluster, either all the elements are
included in the sample or a sample of elements is drawn
probabilistically.
– If all the elements in each selected cluster are included in
the sample, the procedure is called one-stage cluster
sampling.
– If a sample of elements is drawn probabilistically from
each selected cluster, the procedure is two-stage cluster
sampling.
• The key distinction between cluster sampling and stratified sampling is
that in cluster sampling only a sample of subpopulations (clusters) is
chosen, whereas in stratified sampling all the subpopulations (strata) are
selected for further sampling.

• With respect to homogeneity and heterogeneity, the criteria for forming
clusters are just the opposite of those for strata. Elements within a
cluster should be as heterogeneous as possible, but clusters themselves
should be as homogeneous as possible.

• A common form of cluster sampling is area sampling, in which the
clusters consist of geographical areas, such as counties, housing districts
or residential blocks.

• If only one level of sampling takes place in selecting the basic elements
(e.g. if the researcher samples blocks and then all the households within
the selected blocks are included in the sample), the design is called one-
stage area sampling. If two or more levels of sampling take place before
the basic elements are selected (if the researcher samples blocks and
then samples households within the sampled blocks), the design is
called two-stage (or multistage) area sampling.
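
The difference between one-stage and two-stage area sampling can be sketched in a few lines. The block and household identifiers below are invented for illustration, and `cluster_sample` is a hypothetical helper, not a standard routine.

```python
import random


def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """One-stage if per_cluster is None, otherwise two-stage cluster sampling."""
    rng = random.Random(seed)
    # Stage 1: simple random sample of clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            # One-stage: every element of a selected cluster is included.
            sample[name] = list(members)
        else:
            # Two-stage: an SRS of elements is drawn within each selected cluster.
            sample[name] = rng.sample(members, min(per_cluster, len(members)))
    return sample


# Hypothetical area frame: 20 residential blocks with 15 households each.
blocks = {f"block_{b}": [f"b{b}_h{h}" for h in range(15)] for b in range(20)}
one_stage = cluster_sample(blocks, 4, seed=7)
two_stage = cluster_sample(blocks, 4, per_cluster=5, seed=7)
print(len(one_stage), len(two_stage))
```

Only the selected blocks contribute any households, which is the key contrast with stratified sampling, where every stratum is sampled.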
Finite and Infinite Population
• Finite population
– By finite, we mean that the population has stated
or limited size, that is to say, there is a whole
number (N) that tells us how many items there are
in the population.
• Infinite population
– By infinite, we mean a population that could not
be enumerated in a reasonable period of time.
                   Population                              Sample
Definition         Collection of items being considered    Part or portion of the population chosen for study
Characteristics    Parameters                              Statistics
Symbols            Population size = N                     Sample size = n
                   Population mean = µ                     Sample mean = $\bar{x}$
                   Population standard deviation = σ       Sample standard deviation = s
• Normal Distribution…
Sampling Distributions
• If we take several samples from a population, the statistics (mean
and standard deviation) we would compute for each sample need
not be the same and most probably would vary from sample to
sample.

• A probability distribution of all the possible means of the samples
is a distribution of the sample means.

• Statisticians call this a sampling distribution of the mean.

• If we plot a probability distribution of the possible proportions of
infested trees in all the samples, we would see a distribution of the
sample proportions. This is called a sampling distribution of the
proportion.
Concept of Standard Error
• The standard error of the mean is the standard deviation of the distribution of sample means.

• Suppose we wish to learn something about the height of students
pursuing graduation at state universities.

• We could take a series of samples and calculate the mean height for each
sample.

• It is highly unlikely that all of these sample means would be the same.

• The standard deviation of the distribution of sample means measures the
extent to which we expect the means from the different samples to vary
because of this chance error in the sampling process.

• Thus, the standard deviation of the distribution of a sample statistic is
known as the standard error of the statistic.
Conventional terminology used to refer to sample statistics

When we wish to refer to the                                   We use the conventional term
Standard deviation of the distribution of sample means         Standard error of the mean
Standard deviation of the distribution of sample proportions   Standard error of the proportion
Standard deviation of the distribution of sample medians       Standard error of the median
Standard deviation of the distribution of sample ranges        Standard error of the range
• Assume that we have a population of all the
filter screens in a large industrial pollution-
control system, and that the population distribution
describes the operating hours before a screen becomes
clogged.

• The distribution of operating hours has a
mean µ (mu) and a standard deviation σ
(sigma).
The Population Distribution
• Suppose that somehow we are able to take all
the possible samples of 10 screens from the
population distribution.

• Next we calculate the mean and the standard
deviation for each of the samples.

• As a result, each sample would have its own
mean, $\bar{x}$, and its own standard deviation, s.
• As a last step, we would produce a distribution of
all the means from every sample that could be
taken.
• This distribution is called the sampling
distribution of the mean.
• This distribution of the sample means (the
sampling distribution) would have its own
mean, $\mu_{\bar{x}}$ (mu sub x bar), and its own standard
deviation, or standard error, $\sigma_{\bar{x}}$ (sigma sub x bar).
The Sampling Distribution
• In reality…
Sampling from Normal Population
• Suppose we draw samples from a normally distributed population with a
mean of 100 and standard deviation of 25, and that we start by drawing
samples of five items each and by calculating their means.

• The first mean might be 95, the second 106, the third 101, and so on.

• Because we are averaging five items to get each sample mean, very large
values in the sample would be averaged down and very small values up.

• We would reason that we would get less spread among the sample means
than we would among the individual items in the original population.

• This is the same as saying that the standard deviation of the sampling
distribution of the mean or standard error of the mean would be less than
the standard deviation of the individual items in the population (as shown in
Fig.)
• Now suppose the sample size is increased from 5 to 20.

• This would not change the standard deviation
of the items in the original population.

• But with samples of 20, we have increased the
effect of averaging in each sample and would
expect even less dispersion among the sample
means.
Properties of the Sampling Distribution of the Mean
when the Population is Normally Distributed

• The sampling distribution of the mean of a sample
taken from a normally distributed population
has the following important properties:

Property                                                                        Illustrated symbolically
The sampling distribution has a mean equal to the population mean               $\mu_{\bar{x}} = \mu$
The sampling distribution has a standard deviation (a standard error)           $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
equal to the population standard deviation divided by
the square root of the sample size
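
Both properties can be checked by simulation, using the population from the earlier example (mean 100, standard deviation 25) and sample sizes 5 and 20. This is a rough sketch: the number of replications, the seed and the function name `simulate_sampling_distribution` are arbitrary choices made here, and a simulation only approximates the theoretical values.

```python
import random
import statistics


def simulate_sampling_distribution(mu, sigma, n, replications=5000, seed=0):
    """Return the mean and standard deviation of many simulated sample means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(replications)
    ]
    return statistics.fmean(means), statistics.stdev(means)


mean_5, se_5 = simulate_sampling_distribution(100, 25, 5)
mean_20, se_20 = simulate_sampling_distribution(100, 25, 20)

# Theory: mean of sample means = 100; standard error = 25 / sqrt(n).
print(round(mean_5, 2), round(se_5, 2), round(25 / 5 ** 0.5, 2))
print(round(mean_20, 2), round(se_20, 2), round(25 / 20 ** 0.5, 2))
```

The simulated standard error for n = 20 should come out at roughly half the value for n = 5, because quadrupling the sample size halves σ/√n.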
• A bank calculates that its individual savings
accounts are normally distributed with a mean
of Rs. 2000 and a standard deviation of Rs.
600. If the bank takes a random sample of 100
accounts, what is the probability that the
sample mean will lie between Rs. 1900 and Rs.
2050?

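• Standard error of the mean: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{600}{\sqrt{100}} = 60$
• Standardizing the sample mean: $z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$
• For $\bar{x} = 1900$: $z = \frac{1900 - 2000}{60} = -1.67$
• For $\bar{x} = 2050$: $z = \frac{2050 - 2000}{60} = 0.83$
• $P(1900 < \bar{x} < 2050) = P(-1.67 < z < 0.83) = 0.4525 + 0.2967 = 0.7492$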
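
The same calculation can be sketched with Python's standard library in place of a z table. The function name `prob_sample_mean_between` is a hypothetical helper. Because the code does not round z to two decimals, its result is close to, but not identical with, the table-based 0.7492.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_between(mu, sigma, n, low, high):
    """P(low < sample mean < high) when the sampling distribution is normal."""
    standard_error = sigma / sqrt(n)  # sigma_xbar = sigma / sqrt(n)
    sampling_dist = NormalDist(mu, standard_error)
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)


# Bank example: mean Rs. 2000, standard deviation Rs. 600, n = 100.
p = prob_sample_mean_between(mu=2000, sigma=600, n=100, low=1900, high=2050)
print(round(p, 4))
```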
Sampling from Nonnormal
Populations
• Until now, we have seen that when the population is normally
distributed, the sampling distribution of the
mean is also normal.

• Decision makers must deal with many
populations that are not normally distributed.

• How does the sampling distribution of the mean
react when the population from which the
samples are drawn is not normal?
• Example: five motorcycle owners and the lives of their tires

• The population is too small to be assumed normal

• We take all of the possible samples of the
owners in groups of three, compute the sample
means ($\bar{x}$), list them, and compute the mean
of the sampling distribution ($\mu_{\bar{x}}$).

Experience of five motorcycle owners with life of tires

Owner                Carl    Debbie    Elizabeth    Frank    George
Tire life (months)   3       3         7            9        14
Total = 36 months
Mean = 36/5 = 7.2 months
Calculation of sample mean tire life
with n=3
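
The calculation named in this caption can be reproduced by enumerating every possible sample of three owners. The sketch below uses the tire lives from the table above; the variable names are illustrative only.

```python
from itertools import combinations
from statistics import fmean

# Tire life in months for the five owners (from the table above).
tire_life = {"Carl": 3, "Debbie": 3, "Elizabeth": 7, "Frank": 9, "George": 14}

# All possible samples of three owners, and the mean tire life of each sample.
samples = list(combinations(tire_life, 3))
sample_means = [fmean(tire_life[owner] for owner in owners) for owners in samples]

for owners, mean in zip(samples, sample_means):
    print(", ".join(owners), round(mean, 2))

# Mean of the sampling distribution of the mean.
mean_of_means = fmean(sample_means)
print(len(samples), round(mean_of_means, 2))
```

There are 10 possible samples, and the mean of their means equals the population mean of 7.2 months, even though this population is far from normal.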
• Now we enlarge the population size to 40 and
take samples of different sizes and plot the
sampling distributions of the mean that would
occur for the different sizes.
• This shows how quickly the sampling
distribution of the mean approaches
normality, regardless of the shape of the
population distribution.
General Conclusions that follow….
• The mean of the sampling distribution of the mean will equal the
population mean regardless of the sample size, even if the population is
not normal.

• As the sample size increases, the sampling distribution of the mean will
approach normality, regardless of the shape of the population distribution.

• This relationship between the shape of the population distribution and
the shape of the sampling distribution of the mean is called the central
limit theorem.

• It assures us that the sampling distribution of the mean approaches
normal as the sample size increases.

• The sample size does not have to be very large for the sampling distribution
of the mean to approach normal. A sample size of 30 or more is a common rule of thumb.

• The significance of the central limit theorem is that it permits us to use
sample statistics to make inferences about population parameters without
knowing anything about the shape of the frequency distribution of that
population other than what we can get from the sample.
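
The central limit theorem can be illustrated by simulation. The sketch below draws samples from a strongly skewed population and measures how the skewness of the sample means shrinks as the sample size grows. The exponential population, the replication count and the seed are illustrative assumptions, not part of the original example.

```python
import random
from statistics import fmean, pstdev


def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric, normal-looking shape."""
    m = fmean(values)
    s = pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)


def sample_means(n, replications=5000, seed=0):
    """Means of many samples of size n from a right-skewed population with mean 1."""
    rng = random.Random(seed)
    return [
        fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(replications)
    ]


skew_by_n = {n: skewness(sample_means(n)) for n in (2, 10, 30)}
for n, skew in skew_by_n.items():
    print(n, round(skew, 2))
```

The skewness falls toward zero as n increases, while the mean of the sample means stays near the population mean of 1 for every sample size.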
• The distribution of annual earnings of all bank
tellers with five years' experience is negatively
skewed.

• This distribution has a mean of Rs. 19000 and
a standard deviation of Rs. 2000. If we draw a
random sample of 30 tellers, what is the
probability that their earnings will average
more than Rs. 19750 annually?
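
The slides do not show the solution, so the following is a worked sketch. It assumes that n = 30 is large enough for the central limit theorem to justify a normal sampling distribution despite the skewed population, and it uses z rounded to two decimals as in a standard normal table.

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2000}{\sqrt{30}} \approx 365.15

z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{19750 - 19000}{365.15} \approx 2.05

P(\bar{x} > 19750) = P(z > 2.05) \approx 0.5 - 0.4798 = 0.0202
```

Under these assumptions, there is only about a 2 percent chance that the sample of 30 tellers averages more than Rs. 19750.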
Sampling distribution of proportion
• In many situations, the issue of interest is
categorical in nature, which can be classified
as occurrence or non-occurrence.
• In these situations, researcher is interested in
estimating proportion of occurrence.
• Mean: $\mu_{\bar{p}} = p$
• Standard error: $\sigma_{\bar{p}} = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$
• Hence $z = \frac{\bar{p} - p}{\sqrt{pq/n}}$
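
A short sketch of how the standard error of the proportion, √(pq/n), and the z value are used. The figures (population proportion 0.40, sample of 100, threshold 0.45) are hypothetical, chosen only for illustration, and the function names are assumptions of this sketch. It relies on the normal approximation to the sampling distribution of the proportion.

```python
from math import sqrt
from statistics import NormalDist


def proportion_standard_error(p, n):
    """Standard error of the sample proportion: sqrt(p * q / n) with q = 1 - p."""
    q = 1 - p
    return sqrt(p * q / n)


def prob_sample_proportion_above(p, n, threshold):
    """P(sample proportion > threshold) under the normal approximation."""
    z = (threshold - p) / proportion_standard_error(p, n)
    return 1 - NormalDist().cdf(z)


# Hypothetical figures: p = 0.40, n = 100, threshold 0.45.
se = proportion_standard_error(0.40, 100)
prob = prob_sample_proportion_above(0.40, 100, 0.45)
print(round(se, 4), round(prob, 4))
```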
1. In a sample of 25 observations from a normal distribution with mean 98.6 and
standard deviation 17.2,
(a) what is $P(92 < \bar{x} < 102)$?
(b) Find the corresponding probability given a sample of 36.

2. Mary Bartel, an auditor for a large credit card company, knows that, on average, the
monthly balance of any given customer is Rs. 112, and the standard deviation is Rs. 56.
If Mary audits 50 randomly selected accounts, what is the probability that the sample
average monthly balance is
(a) below Rs. 100?
(b) between Rs. 100 and Rs. 130?

3. In a sample of 16 observations from a normal distribution with a mean of 150 and a
variance of 256, what is
(a) $P(\bar{x} < 160)$?
(b) $P(\bar{x} > 142)$?
4. If, instead of 16 observations, 9 observations are taken, find
(c) $P(\bar{x} < 160)$
(d) $P(\bar{x} > 142)$

5. In a sample of 19 observations from a normal distribution with mean 18 and standard
deviation 4.8,
(a) what is $P(16 < \bar{x} < 20)$?
(b) Suppose the sample size is 48. What is the new probability in part (a)?
Answers
• 1. (a) 0.8115 (b) 0.8703
• 2. (a) 0.0643 (b) 0.9241
• 3. (a) 0.9938 (b) 0.9772
• 4. (c) 0.9699 (d) 0.9332
• 5. (a) 0.9312 (b) 0.9962
