
RESEARCH METHODOLOGY

LESSON 17:
PRINCIPLES OF STATISTICAL INFERENCE AND CONFIDENCE INTERVALS

In this chapter we shall learn to apply techniques of statistical inference to business situations. I will first introduce the basic idea underlying statistical inference and hypothesis testing. Our focus will be more on understanding key concepts intuitively and less on formulas and calculations, which can be done easily on computers.

A business manager in a typical managerial situation needs to determine whether results based on samples can be generalized to a population. Different management situations require different statistical techniques to carry out tests regarding the applicability of sample statistics to a population.

By the end of this unit you should be able to:
• Understand the nature of statistical inference
• Understand types of statistical inference
• Understand the theory behind statistical inference
• Apply sampling theory concepts to confidence intervals and estimation

Nature of Statistical Inference

What is Statistical Inference?
We have seen that all managerial/business situations involve decision making in situations with incomplete information. When a particular finding emerges from data analysis the manager asks whether the empirical findings represent the true picture or have occurred as a result of sampling accident.

Statistical inference is the process where we generalize from sample results to the population from which the sample has been drawn. Thus statistical inference is the process where we extend our knowledge obtained from a random sample, which is only a small part of the population, to the whole population.

Where do we use Statistical Inference?
Let us think of a typical managerial situation. Imagine you are a purchase manager. Your basic problem is to ensure that a consignment of aluminum sheets supplied to you by a supplier corresponds to the required specification of .04 inch thickness. How do you go about ensuring this?

One way would be to accept blindly whatever your suppliers claim. Another option would be to audit each and every item. Clearly this would be both very time consuming and expensive and would result in an unacceptably low level of productivity.

Another option is for the manager to choose a random sample of 100 aluminum sheets and measure their thickness. He finds, for example, that the sheets in the sample have an average thickness of .048 inches. On the basis of past experience with the supplier the manager believes that the sheets come from a population with a standard deviation of .004 inches. On the basis of this data he has to decide whether to accept or reject a consignment of 10,000 sheets.

To solve problems such as this we have to learn how to use characteristics of samples to test an assumption about the population from which the sample comes. This in effect is the process of statistical inference.

The issue facing the manager in the above example is:
• Could a sample of 100 aluminum sheets with an average thickness of .048 inches have come from a population with an average thickness of .04 inches?
• Does the sample estimate of thickness differ from the assumed population value due to sampling error, or is it because our fundamental assumption about the mean thickness of the underlying population is not correct?
• Suppose he believes it to be the first case and accepts the consignment: what is the risk he runs that the consignment is flawed and does not conform to the quality standard of .04 inches?

This is just one example of a very typical managerial situation where the principles of statistical inference can be put to use in solving the manager's dilemma.

Types of Statistical Inference
Broadly, statistical inference falls into two major categories: estimation and hypothesis testing. The two are actually two sides of the same coin and can be regarded as representing different aspects of one technique.

Below I briefly explain each of them.

1. Estimation
This is concerned with how we use sample statistics to estimate population parameters. It is not necessary that an estimate be based on statistical data. All managers make quick estimates based on incomplete information, gut feel and intuition. Thus an estimate of sales for the next quarter can be based on gut feel or on an analysis of past sales data for the quarter. Both represent estimates. The difference between an estimate based on intuition and one based on a random sample is that, for the latter, the principles of probability allow us to calculate how much of the error in the estimate is attributable to sampling variation.

The sample mean, for example, can be used as an estimate of the population mean. Similarly the percentage of occurrence in a sample of an attribute or event can be used to estimate the population proportion. To explain the concept a little more clearly we can look at a few examples of estimation:
• University departments make estimates of next year's enrollments on the basis of last year's enrollments in the same courses.
• Credit managers make estimates about whether a purchaser will pay his bills on the basis of the past behaviour of customers with similar characteristics or their past repayment record.
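To make the idea of estimation concrete, here is a minimal Python sketch of the two point estimates mentioned above: a sample mean used to estimate a population mean, and a sample proportion used to estimate a population proportion. The two small data sets are hypothetical and invented purely for illustration; they are not data from this lesson.

```python
from statistics import mean

# Hypothetical sample: thickness (inches) of 8 sheets.
# These values are made up for illustration only.
thickness = [0.041, 0.039, 0.043, 0.040, 0.038, 0.042, 0.040, 0.041]

# Hypothetical sample: 1 = customer repaid on time, 0 = did not.
repaid = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

# The sample mean is the point estimate of the population mean.
mean_estimate = mean(thickness)

# The sample proportion is the point estimate of the population proportion.
proportion_estimate = sum(repaid) / len(repaid)

print(f"Estimated mean thickness: {mean_estimate:.4f} inches")
print(f"Estimated repayment proportion: {proportion_estimate:.0%}")
```

A different random sample would give slightly different estimates. How much they can be expected to vary is exactly what sampling theory lets us quantify.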

© Copy Right: Rai University


2. Hypothesis Testing
If we find a difference between two samples, we would like to know whether this is a "real" difference (i.e., it is present in the population) or just a "chance" difference (i.e., it could just be the result of random sampling error).

Hypothesis testing begins with an assumption, called a hypothesis, that we make about a population parameter. We then collect sample data and calculate sample statistics such as the mean and standard deviation to decide how likely it is that our hypothesized population parameter is correct. Essentially the process involves judging whether a difference between a sample value and the assumed population value is significant or not. The smaller the difference, the greater the chance that our hypothesized value for the mean is correct.

Some examples of real world situations where we might want to test hypotheses:
• A random sample of 100 South Indian families finds that they consume more of a particular brand of Assam tea per family than a random sample of 100 North Indian families. It could be that the observed difference was caused by sampling accident and that there is actually no difference between the two populations. However if the results are not caused by pure sampling fluctuations then we have a case for the firm to take some further marketing action based on the sample finding.
• Colgate Palmolive have decided that a new TV ad campaign can only be justified if more than 55% of viewers see the ads. In this case the company requests a marketing research company to carry out a survey to assess viewership. The agency comes back with an ad penetration of 50% for a random sample of 1000. It is now the company's problem to assess whether the sample viewing proportion is consistent with the hypothesized level of viewership that the company desires, i.e. 55%. Can the difference between the two proportions be attributed to sampling error, or is the ad's true viewership actually lower?

In the next section we shall look at the theory behind statistical inference. The basis of inference remains the same irrespective of whether the managerial objective is to obtain a point or interval estimate of a population parameter or to test whether a particular hypothesis is supported by sample data or not.

Activities
1. What is statistical inference? What are the different types of inference?
2. Why do decision makers measure samples rather than entire populations? What are the disadvantages of sampling?

Theory behind Statistical Inference
We now look at the underlying theoretical basis of statistical inference. The underlying basis of statistical inference is the theory of sampling distributions.

Now we shall briefly review some concepts which have been dealt with in more detail in the earlier chapter on sampling.

What is a Sample?
A sample is a representative subset of the underlying population. For each sample that is taken from a population we can calculate various sample statistics such as the mean and variance. We can take many such samples from a population and calculate their means and standard deviations. Given the existence of sampling variation it is likely that there is also going to be some variability in the different estimates of the mean and standard deviation. This can be explained best with the help of an example:

Suppose there is a store which sells CDs. We assume it has a regular customer base. A random sample of 100 customers is taken and we find that the sample mean age of customers is 42 years, with a standard deviation of 5 years. However this is only one possible sample which could have been taken. A second, different sample may have had a mean of 45 years and a standard deviation of 6 years. To change a sample we need only change one of the customers. We would expect samples taken from a population to generate similar if not identical sample means. If we take repeated samples such that all possible samples are taken then we obtain the sampling distribution of means.

What does this Distribution Look Like?
Logically we can conceive that there is only one sample which will contain the youngest possible customers, and it will have the lowest sample mean. Similarly there will be another couple of samples containing the youngest 99 customers. These samples will have means which are slightly higher than the lowest mean. A somewhat higher number will contain the youngest 98 customers and so on.

The majority of the samples will have a cross section of all age groups and therefore there would be a clustering of sample means around what is likely to be the true population mean. The distribution of sample means will look like the normal distribution, as shown in figure 1 below.

Figure 1: Sampling distribution of sample mean values (horizontal axis centred on µ)

This result follows from the Central Limit Theorem: if we take random samples of size n from a population, the distribution of sample means will approach that of a normal probability distribution. This approximation is closer the larger n is.

We do not actually know what form our population distribution takes: it could be normal or it could be skewed. However it doesn't matter, as the sampling distribution will approximate a normal distribution as long as sufficiently large samples are taken.

Normal Distribution
We now look briefly at some of the key characteristics of the normal distribution.

The normal sampling distribution can be summarized by its two statistics:

• Mean µ
• Standard deviation σ

Logically we can see that the mean of the sampling distribution should equal the mean of the population. The standard deviation of the sampling distribution is given by σ/√n, where σ is the population standard deviation and n is the sample size. Thus the sampling distribution of the mean can be defined in terms of its mean and standard deviation.

However we should be clear that we are talking about three different sets of statistics:

                                 Mean    Standard Deviation
Sample                           x̄       s
Population                       µ       σ
Sampling distribution of mean    µ       σ/√n

The population distribution and the sampling distribution of the mean are illustrated in figure 2 below. As can be seen, the sampling distribution of the sample mean is far more concentrated than the population distribution. However both distributions have the same mean µ.

Figure 2: Sampling distribution of the mean and the population distribution

Application of sampling theory concepts to confidence intervals
Once we have calculated our sample mean we need to know where it lies in the sampling distribution of the mean in relation to the true mean of the sampling distribution, that is, the population mean. It might be higher than the population mean or lower, or it might be identical with the population mean. While we cannot know for certain where the sample mean lies in relation to the population mean, we can use probability to assess its likely position vis a vis the population mean.

From our earlier classes we know that irrespective of the values of µ and σ, for a normal probability distribution, the total area under the normal curve is 1.00. Further, specific portions of the normal curve lie between plus/minus any given number of standard deviations from the mean.

These results are summarized below:
• Approx 68% of all values in a normally distributed population lie within ±1 standard deviation from the mean. Approximately 16% of the area on either side of the population mean lies outside this range. This is illustrated in figure 3.
• Approx 95.5% of all values in a normally distributed population lie within ±2 standard deviations from the mean. Approximately 2.25% of the area on either side of the population mean lies outside this range. This is illustrated in figure 4.
• Approx 99.7% of all values in a normally distributed population lie within ±3 standard deviations from the mean. This is illustrated in figure 5. Only .15% of the area under the curve on either side of the mean lies outside this range.

Figures 3, 4 and 5

Standard Normal Distribution
However we rarely need intervals involving only one, two or three standard deviations. Statistical tables provide areas under the normal curve that are contained within any number of standard deviations (plus/minus) from the mean. We do this by constructing the standard normal distribution. All normal distributions with mean µ and standard deviation σ can be transformed into a standard normal distribution with µ = 0 and σ = 1. This transformation is done using the z statistic, where

z = (x - µ)/σ

The distribution of the z statistic represents the standard normal distribution with mean µ = 0 and standard deviation σ = 1.
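The claims made above about the sampling distribution of the mean (it is centred on the population mean µ, its standard deviation is σ/√n, and it looks roughly normal even when the population is skewed) can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the population is a hypothetical, artificially generated skewed population, and the sample size of 100 and the 2,000 repetitions are arbitrary choices.

```python
import random
from statistics import fmean, pstdev, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population of 10,000 values (exponential, mean near 40).
population = [random.expovariate(1 / 40) for _ in range(10_000)]
mu = fmean(population)       # population mean
sigma = pstdev(population)   # population standard deviation

n = 100              # size of each sample
num_samples = 2_000  # number of repeated samples

# Draw repeated random samples (with replacement) and record each sample mean.
sample_means = [
    fmean(random.choices(population, k=n)) for _ in range(num_samples)
]

mean_of_means = fmean(sample_means)
sd_of_means = stdev(sample_means)
standard_error = sigma / n ** 0.5  # theoretical value: sigma / sqrt(n)

# Share of sample means lying within one standard error of mu
# (roughly 68% if the sampling distribution is close to normal).
within_one = sum(
    abs(m - mu) <= standard_error for m in sample_means
) / num_samples

print(f"population mean        = {mu:.2f}")
print(f"mean of sample means   = {mean_of_means:.2f}")
print(f"sigma / sqrt(n)        = {standard_error:.2f}")
print(f"sd of sample means     = {sd_of_means:.2f}")
print(f"share within 1 std err = {within_one:.1%}")
```

In a run of this kind the mean of the sample means should land close to µ and their standard deviation close to σ/√n, even though the population itself is far from normal.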

With the normal table we can determine the probability that the sample mean x̄ lies within a certain distance from the population mean. The distance from the mean is defined in terms of the number of standard deviations away from the mean.
How do we do this?
This follows from the result that, irrespective of the shape of the normal curve, the area under the curve within one, two, three or any given number of standard deviations from the mean is the same across all normal curves. Therefore all intervals containing the same number of standard deviations from the mean contain the same proportion of the total area under the curve. Hence we need only one standard normal distribution.
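This property can be verified numerically. The sketch below uses NormalDist from Python's standard statistics module to compute the area within ±1, ±2 and ±3 standard deviations for three normal curves. The means and standard deviations of the three curves are hypothetical values chosen only to make the curves look very different from one another.

```python
from statistics import NormalDist

def area_within(dist, k):
    """Area under `dist` between mean - k*sd and mean + k*sd."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

# Three normal curves with different (hypothetical) means and standard deviations.
curves = [NormalDist(0, 1), NormalDist(50, 10), NormalDist(0.04, 0.0004)]

for k in (1, 2, 3):
    areas = [area_within(curve, k) for curve in curves]
    formatted = ", ".join(f"{a:.4f}" for a in areas)
    print(f"within ±{k} sd: {formatted}")
```

Each row shows the same area for all three curves (about 0.6827, 0.9545 and 0.9973), which is why a single table for the standard normal distribution is enough.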
Using the Standard normal probability distribution
Figure 6 below shows the raw scale and the standard normal transformation. The standardized normal variable is z = (x - µ)/σ.
As can be seen from figure 5 below, z actually represents a transformation or change in measurement scale on the horizontal axis. Thus in the raw scale µ = 50. In the standard normal scale this value is transformed to µ = 0.

Figure 5
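The same change of scale can be applied to a sample mean. As a hedged illustration, the sketch below returns to the aluminum sheet example from earlier in the lesson and expresses the observed sample mean in standard units. Because we are locating a sample mean rather than a single sheet, it uses the standard deviation of the sampling distribution, σ/√n, in place of σ. The variable names are my own.

```python
import math

# Figures from the aluminum sheet example earlier in the lesson.
hypothesized_mean = 0.04   # specified thickness, inches
population_sd = 0.004      # standard deviation assumed by the manager, inches
sample_size = 100
sample_mean = 0.048        # observed average thickness, inches

# Standard deviation of the sampling distribution of the mean: sigma / sqrt(n).
standard_error = population_sd / math.sqrt(sample_size)

# Distance of the sample mean from the hypothesized mean, in standard units.
z = (sample_mean - hypothesized_mean) / standard_error

print(f"standard error = {standard_error:.4f} inches")  # prints 0.0004
print(f"z = {z:.1f}")                                    # prints 20.0
```

A sample mean 20 standard units above the hypothesized mean lies far outside the ±3 range that contains about 99.7% of sample means, which suggests that sampling error alone is a very unlikely explanation for the difference.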
The standard normal probability table is organized in terms of z values. It gives areas for only one half of the curve. Because the distribution is symmetric, values which hold for one half of the distribution also hold for the other.
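As a small sketch of how a half-curve table is used, the code below computes the area between the mean and a given z value, and then uses symmetry to obtain the area between -z and +z. It assumes the table lists the area between 0 and z, as the description above suggests. The value z = 1.96 is a hypothetical choice for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def half_table_area(z):
    """Area under the standard normal curve between 0 and z (sign ignored)."""
    return standard_normal.cdf(abs(z)) - 0.5

z = 1.96  # hypothetical z value chosen for illustration

area_right = half_table_area(z)    # area between 0 and +z
area_left = half_table_area(-z)    # area between -z and 0, equal by symmetry
area_between = area_left + area_right

print(f"area between 0 and {z}: {area_right:.4f}")       # prints 0.4750
print(f"area between -{z} and {z}: {area_between:.4f}")  # prints 0.9500
```

Doubling the tabulated half-area is all that is needed to move from one side of the curve to an interval that is symmetric about the mean.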
So far we have tried to understand the theory of sampling underlying confidence intervals. We now turn to defining exactly what a confidence interval is.