
Statistical Inference: Hypothesis Testing
Learning Objectives

• Understand the hypothesis-testing procedure using one-tailed and two-tailed tests
• Understand the concepts of Type I and Type II errors in hypothesis testing
• Understand the procedure of hypothesis testing

Introduction to Hypothesis Testing

• A sample statistic is computed from sample data and is used to make an inference about a population parameter.
• In real life, a decision maker needs to collect sample data, compute the sample statistics, and use this information to ascertain the correctness of the hypothesized population parameter.
• For this purpose, a researcher develops a “hypothesis”.
• “WHAT IS HYPOTHESIS DEVELOPMENT?”

Example
Suppose the vice president (HR) of a company wants to know the effectiveness of a training programme that the company has organized for all its 70,000 employees based at 130 different locations in the country.

Sample size: 629 employees drawn from all the different locations.

The result obtained would not be the result from the entire population but only from the sample.

The VP (HR) sets an assumption: “Training has not enhanced efficiency”. This assumption will be accepted or rejected through a well-defined statistical procedure (hypothesis testing).

Introduction to Hypothesis Testing

• A statistical hypothesis is an assumption about an unknown population parameter.
• Hypothesis testing is a well-defined procedure that helps us decide objectively whether to accept or reject the hypothesis based on the information available from the sample.
• In statistical analysis, we use the concept of probability to specify a probability level at which a researcher concludes that the observed difference between the sample statistic and the hypothesized population parameter is not due to chance.

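A systematic procedure needs to be adopted for hypothesis testing.

Hypothesis Testing Procedure
Seven steps of hypothesis testing:
Step 1: Set null and alternative hypotheses
Step 2: Determine the appropriate statistical test
Step 3: Set the level of significance
Step 4: Set the decision rule
Step 5: Collect the sample data
Step 6: Analyse the data
Step 7: Arrive at a statistical conclusion and business implication
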
Hypotheses: Important Notes

• Hypotheses are about population parameters (NOT sample values or statistics)
• The null is “no effect”
• The null includes equality

Step 1: Set Null and Alternative Hypotheses

• The null hypothesis, generally denoted by H0 (H sub-zero), is the hypothesis that is tested for possible rejection under the assumption that it is true. Theoretically, a null hypothesis is set as no difference or status quo and is considered true until and unless it is proved wrong by the collected sample data.
• Symbolically, a null hypothesis is represented as:

H0: μ = μ0

where μ is the population mean and μ0 is the hypothesized value of the population mean.

Let’s take an example to understand this.

• The null hypothesis: the currently accepted value of a parameter.
• The alternative hypothesis: also called the research hypothesis; it involves the claim to be tested.
• The alternative hypothesis, generally denoted by H1 (H sub-one), is the logical opposite of the null hypothesis.
• Symbolically, the alternative hypothesis is represented as:

H1: μ ≠ μ0

Step 2: Determine the Appropriate Statistical Test

• The type, number, and level of the data provide a basis for deciding on the statistical test.

Step 3: Set the Level of Significance

• The level of significance, generally denoted by α, is the probability of rejecting a null hypothesis when it is true.
• The level of significance is also known as the size of the rejection region or the size of the critical region.
• The levels of significance generally applied by researchers are 0.01, 0.05, and 0.10.

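Type I and Type II Errors
When a researcher tests statistical hypotheses, there can be four possible outcomes as follows:
• The null hypothesis is true and is accepted: correct decision.
• The null hypothesis is true but is rejected: Type I error (probability α).
• The null hypothesis is false but is accepted: Type II error (probability β).
• The null hypothesis is false and is rejected: correct decision.
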
Step 4: Set the Decision Rule
Figure: Acceptance and rejection regions of null hypothesis (two-tailed test)

The critical value divides the area under the normal curve into two mutually exclusive regions. These regions are termed the acceptance region (where the null hypothesis is accepted) and the rejection region or critical region (where the null hypothesis is rejected).

Two-Tailed Test of Hypothesis
• Let us consider the null and alternative hypotheses as below:

H0: μ = μ0
H1: μ ≠ μ0

• Two-tailed tests contain the rejection region on both tails of the sampling distribution of a test statistic. This means a researcher will reject the null hypothesis if the computed sample statistic is significantly higher or lower than the hypothesized population parameter (considering both tails, right as well as left).

Figure: Acceptance and rejection regions (alpha = 0.05)

One-Tailed Test of Hypothesis
• Let us consider the null and alternative hypotheses as below:

H0: μ = μ0
H1: μ < μ0 (left-tailed test) or H1: μ > μ0 (right-tailed test)

• A one-tailed test contains the rejection region on one tail of the sampling distribution of a test statistic. In the case of a left-tailed test, a researcher rejects the null hypothesis if the computed sample statistic is significantly lower than the hypothesized population parameter.
• In the case of a right-tailed test, a researcher rejects the null hypothesis if the computed sample statistic is significantly higher than the hypothesized population parameter.

Figure: Acceptance and rejection regions for one-tailed (left) test (alpha = 0.05)
Figure: Acceptance and rejection regions for one-tailed (right) test (alpha = 0.05)
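
The boundaries of the rejection regions described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the test statistic follows the standard normal distribution; the function name `z_critical_values` and its `tail` parameter are hypothetical and do not come from the slides.

```python
from statistics import NormalDist


def z_critical_values(alpha, tail="two"):
    """Return the critical z value(s) that bound the rejection region."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "two":
        # alpha is split equally between the left and right tails
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    raise ValueError("tail must be 'two', 'left', or 'right'")


for tail in ("two", "left", "right"):
    values = z_critical_values(0.05, tail)
    print(tail, [round(v, 3) for v in values])
```

At alpha = 0.05 the two-tailed boundaries are close to ±1.96, while the one-tailed boundary is close to 1.645 in absolute value, which is why a one-tailed test rejects more readily in its chosen direction.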
Step 5: Collect the Sample Data
• In this stage, data are collected and the appropriate sample statistics are computed.
• The first four steps should be completed before collecting the data for the study.
• It is not advisable to collect the data first and then decide on the stages of hypothesis testing.
Step 6: Analyse the Data
• In this step, the researcher has to compute the test statistic. This involves selection of an appropriate probability distribution for a particular test.
• Some of the commonly used testing procedures are z, t, F, and χ².
Step 7: Arrive at a Statistical Conclusion and Business Implication

• In this step, the researchers draw a statistical conclusion. A statistical conclusion is a decision to accept or reject a null hypothesis.
• Statisticians present the information obtained using the hypothesis-testing procedure to the decision makers. Decisions are made on the basis of this information. Ultimately, a decision maker decides whether a statistically significant result is a substantive result that needs to be implemented to meet the organization’s goals.
Hypothesis Testing for a Single Population Mean Using the Z Statistic
• The sample size is greater than or equal to 30 (n ≥ 30).
• The population has a normal distribution.

A marketing research firm conducted a survey 10 years ago and found that the average household income of a particular geographic region is Rs 10,000. Mr. Gupta, who has recently joined the firm as a vice president, has expressed doubts about the accuracy of the data. To verify the data, the firm has decided to take a random sample of 200 households, which yields a sample mean (for household income) of Rs 11,000. Assume that the population standard deviation of the household income is Rs 1200. Verify Mr. Gupta’s doubts using the seven steps of hypothesis testing. Let α = 0.05 (5%).
Example (Solution)
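
The computation for this example can be sketched as follows. This is a minimal illustration, not the worked solution from the slides: it assumes a two-tailed test with H0: μ = 10,000 and H1: μ ≠ 10,000, and uses the statistic z = (x̄ − μ0) / (σ / √n). The variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 10_000    # hypothesized population mean (Rs), from the example
x_bar = 11_000  # sample mean (Rs)
sigma = 1_200   # population standard deviation (Rs)
n = 200         # sample size
alpha = 0.05    # level of significance

# Standard error of the sample mean and the z test statistic
standard_error = sigma / sqrt(n)
z = (x_bar - mu0) / standard_error

# Assumed two-tailed test: reject H0 if |z| exceeds the critical value
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) > z_crit

print(f"z = {z:.2f}, critical values = ±{z_crit:.2f}, reject H0: {reject_h0}")
```

The computed z (roughly 11.8) lies far beyond the critical values of about ±1.96, so under these assumptions the null hypothesis is rejected.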
Hypothesis Testing for a Single Population Mean Using the t Statistic (Case of a Small Random Sample When n < 30)
When a researcher draws a small random sample (n < 30) to estimate the population mean μ, the population standard deviation is unknown, and the population is normally distributed, the t-test can be applied.
Example
Royal Tyres has launched a new brand of tyres for tractors and claims that under normal circumstances the average life of the tyres is 40,000 km. A retailer wants to test this claim and has taken a random sample of 8 tyres. He tests the life of the tyres under normal circumstances. The results obtained are presented in Table 10.4.
Example (Solution)
Figure: Computed and critical t values for Example 10.4
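
A one-sample t statistic of the kind used in this example can be sketched as below, using t = (x̄ − μ0) / (s / √n) with n − 1 degrees of freedom. Table 10.4 is not reproduced here, so the tyre lives in the snippet are HYPOTHETICAL placeholder values, and the function name `one_sample_t` is likewise an assumption. The critical value 2.365 is the two-tailed t-table value for α = 0.05 and 7 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: mu = mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 divisor
    t = (mean(sample) - mu0) / standard_error
    return t, n - 1


# HYPOTHETICAL tyre lives in km; replace with the values from Table 10.4
hypothetical_lives_km = [41_000, 39_500, 40_500, 38_000,
                         42_000, 39_000, 40_000, 41_500]

t, df = one_sample_t(hypothetical_lives_km, mu0=40_000)
t_crit = 2.365  # two-tailed t-table value for alpha = 0.05, df = 7
print(f"t = {t:.3f}, df = {df}, reject H0: {abs(t) > t_crit}")
```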
Let’s Do It!

A cable TV network company wants to provide modern facilities to its consumers. The company has five-year-old data which reveal that the average household income is Rs 120,000. Company officials believe that due to the fast development in the region, the average household income might have increased. The company takes a random sample of 25 households to verify this assumption. From the sample, the average income of the households is calculated as Rs 125,000. From historical data, the standard deviation is obtained as Rs 1200. Use alpha = 0.05 to verify the finding.
Statistical Inference: Hypothesis Testing for Two Populations
Hypothesis Testing for the Difference Between Two Population Means Using the Z Statistic (Case of a Large Random Sample, n1, n2 > 30, When Population Standard Deviation Is Known)

When the sample sizes are large (n1, n2 > 30), the samples are independent (not related), and the population standard deviations are known, the Z statistic can be used to test the hypothesis for the difference between two population means.
Let’s Do It!

The amount of a certain trace element in blood is known to vary with a standard deviation of 14.1 ppm (parts per million) for male blood donors and 9.5 ppm for female donors. Random samples of 75 male and 50 female donors yield concentration means of 28 and 33 ppm, respectively. What is the likelihood that the population means of concentrations of the element are the same for men and women?
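
One way to approach this exercise is a two-sample z test with known population standard deviations, using z = ((x̄1 − x̄2) − (μ1 − μ2)) / √(σ1²/n1 + σ2²/n2). The sketch below assumes H0: μ1 = μ2 against a two-tailed alternative and reads "likelihood" as the two-tailed p-value; the function name `two_sample_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """z statistic for the difference between two independent sample means."""
    standard_error = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return (mean1 - mean2 - hypothesized_diff) / standard_error


# Sample 1: male donors, sample 2: female donors (values from the exercise)
z = two_sample_z(mean1=28, mean2=33, sigma1=14.1, sigma2=9.5, n1=75, n2=50)

# Two-tailed p-value under the assumed H0 of equal population means
p_value = 2 * NormalDist().cdf(-abs(z))
print(f"z = {z:.3f}, two-tailed p-value = {p_value:.4f}")
```

With these inputs z is close to −2.37 and the p-value is just under 0.02, so equal population means would be rejected at α = 0.05 but not at α = 0.01.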
Your Turn!
Dominos wanted to test their claim regarding who can eat more slices of pizza in a pizza-eating festival: males or females. For this purpose, they randomly selected 22 males and 20 females. The average number of slices eaten by males was 450 with a standard deviation of 25 (from historical data), and the average number of slices eaten by females was 550 with a standard deviation of 20. On the basis of the samples taken for the study, estimate the difference in population means taking 5% as the level of significance, and help Dominos check their claim that females eat more pizza slices than males.
Hypothesis Testing for the Difference Between Two Population Means Using the t Statistic (Case of a Small Random Sample, n1, n2 < 30, When Population Standard Deviation Is Unknown)

When the sample sizes are small (n1, n2 < 30), the samples are independent (not related), and the population standard deviations are unknown, the t statistic can be used to test the hypothesis for the difference between two population means.
Let’s Do It!
Anmol Constructions is a leading company in the construction sector in India. It wants to construct flats in Raipur and Dehradun, the capitals of the newly formed states of Chhattisgarh and Uttarakhand, respectively. The company wants to estimate the amount that customers are willing to spend on purchasing a flat in the two cities. It randomly selected 25 potential customers from Raipur and 27 customers from Dehradun and posed the question, “How much are you willing to spend on a flat?” The mean for Dehradun was 143.44 (variance = 203.64) and for Raipur it was 162.8 (variance = 273.08). On the basis of the samples taken for the study, estimate the difference in population means taking 95% as the confidence level.
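
A possible route to this estimate is the pooled-variance t procedure. The sketch below assumes that the two populations are normal with equal variances (an assumption not stated in the exercise) and that the reported variances are sample variances. The critical value 2.009 is the two-tailed t-table value for 95% confidence and 50 degrees of freedom; the variable names are hypothetical.

```python
from math import sqrt

# Summary statistics from the exercise
n1, mean1, var1 = 25, 162.8, 273.08   # Raipur
n2, mean2, var2 = 27, 143.44, 203.64  # Dehradun

# Pooled variance (assumes equal population variances)
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))

# t statistic for H0: mu1 = mu2
diff = mean1 - mean2
t = diff / standard_error

# 95% confidence interval for mu1 - mu2
t_crit = 2.009  # two-tailed t-table value for df = 50
ci = (diff - t_crit * standard_error, diff + t_crit * standard_error)

print(f"t = {t:.3f}, df = {df}")
print(f"95% CI for the difference: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Under these assumptions the interval runs from roughly 10.8 to 27.9 and excludes zero, which points to a higher mean willingness to spend in Raipur.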
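Statistical Inference About the Difference Between the Means of Two Related Populations (Matched Samples)

• For dependent samples or related samples, it is important that the two samples taken in the study are of the same size.
• t formula to test the difference between the means of two related populations (matched samples):

t = (d̄ − D) / (s_d / √n), with n − 1 degrees of freedom

where d̄ is the mean of the sample differences, D is the hypothesized mean population difference, s_d is the standard deviation of the sample differences, and n is the number of matched pairs.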
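
The matched-samples t statistic, t = (d̄ − D) / (s_d / √n) with n − 1 degrees of freedom, can be sketched as below. The before/after scores are HYPOTHETICAL values chosen only to show the mechanics, and the function name `paired_t` is an assumption. Differences are taken as after minus before.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after, hypothesized_diff=0.0):
    """t statistic and degrees of freedom for matched samples."""
    if len(before) != len(after):
        raise ValueError("matched samples must have the same size")
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)  # n - 1 divisor in stdev
    t = (mean(differences) - hypothesized_diff) / standard_error
    return t, n - 1


# HYPOTHETICAL scores for five matched individuals
before = [10, 12, 9, 11, 13]
after = [12, 13, 11, 14, 13]

t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

The computed t is then compared with the t-table value for the chosen level of significance and n − 1 degrees of freedom.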
Example
Suppose that, as a marketer for Prozac, you wish to test the effect of Prozac on the well-being of depressed individuals, using a standardised "well-being scale" that sums Likert-type items to obtain a score that could range from 0 to 20. Higher scores indicate greater well-being (that is, Prozac is having a positive effect). Assume the standard deviation to be 2.45 for both the samples. Use a 5% level of significance.
Example

An electronic goods company arranged a special training programme for one segment of its employees. The company wants to measure the change in the attitude of its employees after the training. For this purpose, it has used a well-designed questionnaire, which consists of 10 questions on a 1 to 5 rating scale (1 is strongly disagree and 5 is strongly agree). The company selected a random sample of 10 employees. The scores obtained by these employees are given in the table, with an S.D. of 4.44. Use α = 0.10 to determine whether there is a significant change in the attitude of employees after the training programme.
PUZZLE 1
A pharmaceutical company wants to diversify into the hospitality industry. The company has a notion that the average daily hotel room rates are different in Delhi and Mumbai. The company has taken a random sample of 15 hotels from Delhi and 17 hotels from Mumbai for testing its notion. The average hotel room rate for Delhi is 1572 with a sample variance of 8735, while the average hotel room rate for Mumbai is 1175 with a sample variance of 6126. Take alpha = 10% and test whether there is a difference in the average daily hotel room rates of the two cities taken for the study, using the seven steps.
PUZZLE 2
The best-selling product of a consumer durables manufacturer has reached the saturation stage in its product life cycle. The company is not willing to withdraw the product from the market and has decided to motivate its sales executives to take the personal selling route. The company organized a one-month workshop to motivate its sales executives. Three months later, the company selected 38 sales executives and collected data on the average number of productive sales calls in a day before and after the training. Assume the value of the average difference for all the salespersons to be -3.58 with a standard deviation of 1.64. Test the hypothesis of whether the training has had a positive impact on sales calls or not.
CHI SQUARE TEST

• Test related to categorical data.
• Some researchers place the chi-square technique in the category of non-parametric tests for testing of the hypothesis.
• The chi-square distribution is a family of curves, with each distribution defined by the degrees of freedom associated with it. In fact, chi-square is a continuous probability distribution with range 0 to ∞.

• The types of chi-square tests include:
  – Goodness of fit
  – Test of association or independence
• Measuring association between categorical data.
Chi Square Test of Association
The Chi Square Test of Association was derived mathematically by Karl Pearson, and is often known as Pearson's Chi Square Test of Association.

χ² = Σ (fo − fe)² / fe

fo = observed frequency
fe = expected frequency
Acceptance or rejection region in a Chi-Square test
Chi Square Table Value Case
Chi-Square Goodness-of-Fit Test
A company is concerned about the increasing violent altercations between its employees. The number of violent incidents recorded by the management during six randomly selected months is given in Table 13.2. Use level of significance = 0.05 to determine whether the data fit a uniform distribution.

Table 13.2: Record of violent incidents in six randomly selected months

Month                      Jan   Feb   Mar   April   May   June
No. of violent incidents    55    65    68      72    78     85
Computation of Expected Frequencies and
Chi-square Statistic
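
The computation named in this heading can be sketched directly from the monthly counts in Table 13.2. Under the null hypothesis of a uniform distribution, every month has the same expected frequency, and the statistic is χ² = Σ (fo − fe)² / fe with k − 1 degrees of freedom. The critical value 11.070 is the chi-square table value for α = 0.05 and 5 degrees of freedom; the variable names are hypothetical.

```python
# Observed violent incidents, January to June (Table 13.2)
observed = [55, 65, 68, 72, 78, 85]

# Uniform distribution: the total is spread equally over the six months
expected_each = sum(observed) / len(observed)

# Chi-square statistic: sum of (fo - fe)^2 / fe over all months
chi_square = sum((fo - expected_each) ** 2 / expected_each for fo in observed)

df = len(observed) - 1
chi_crit = 11.070  # chi-square table value for alpha = 0.05, df = 5
reject_h0 = chi_square > chi_crit

print(f"expected frequency per month = {expected_each}")
print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject_h0}")
```

The statistic comes to about 7.74, below the critical value, so at α = 0.05 the data are consistent with a uniform distribution.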
Chi-square Test of Independence

Example 13.2
The Vice President (Sales) of a garment company wants to determine whether sales of the company’s brand of jeans are independent of age group. He has appointed a marketing researcher for this purpose. This marketing researcher has taken a random sample of 703 consumers who have purchased jeans. The researcher conducted a survey for three brands of jeans, namely Brand 1, Brand 2, and Brand 3. The researcher has also divided the age groups into four categories: 15 to 25, 26 to 35, 36 to 45, and 46 to 55. The observations of the researcher are provided in Table 13.6:

Determine whether brand preference is independent of age group. Use alpha = 0.05.
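
The mechanics of a test of independence can be sketched as below. Table 13.6 is not reproduced here, so the counts in the snippet are a HYPOTHETICAL 2 × 2 table used only to show the steps; the function name `chi_square_independence` is also an assumption. Each expected frequency is (row total × column total) / grand total, and the degrees of freedom are (rows − 1) × (columns − 1).

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            # Expected frequency if rows and columns are independent
            fe = row_totals[i] * col_totals[j] / grand_total
            chi_square += (fo - fe) ** 2 / fe

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df


# HYPOTHETICAL counts (rows: two groups, columns: two categories)
hypothetical_table = [[30, 20],
                      [20, 30]]

chi_square, df = chi_square_independence(hypothetical_table)
print(f"chi-square = {chi_square:.3f}, df = {df}")
```

For Example 13.2, with four age groups and three brands, the degrees of freedom would be (4 − 1) × (3 − 1) = 6, and the chi-square table value at α = 0.05 is 12.592.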