
IMT-PG Programme in Management

Decision Sciences

Doubt Resolution on Hypothesis Testing

Presented by: Dr. Anuja Shukla

https://learn.upgrad.com/course/1260
Agenda
Topic Time (mins.)
Quiz 1. Framing of hypothesis https://forms.gle/JaUNBYKtgd7QgNor6 5
What is Hypothesis? Need for hypothesis in business 5
Converting of business problem into a hypothesis statement: Null and Alternate 10
Types of tail in test 10
Quiz 2. Testing of Hypothesis https://forms.gle/jLbszZKDucYstzyy5 5
Step-by-step process of hypothesis testing 5
Testing of Hypothesis: Critical Value method 15
Testing of Hypothesis: p value method 15
Types of Errors 10
Quiz 3. Practice https://forms.gle/NmAufBEXStaRagRr8 HW
Q&A: Link to data: https://drive.google.com/file/d/1Y37nO1ZA8OifzA38IEpCYTOMyXWJyYtp/view?usp=sharing 10
Total 90
Module 2: Hypothesis Testing
Hypothesis testing Z distribution
• Two tailed test
• Left tailed test
• Right tailed test
Hypothesis testing t distribution
• One sample
• Two sample
  • Paired
  • Unpaired
• A/B testing

Quiz 1. Framing of hypothesis https://forms.gle/JaUNBYKtgd7QgNor6


Hypothesis
• A research hypothesis is a specific, clear, and testable proposition or predictive
statement about the possible outcome of a scientific research study based on a
particular property of a population, such as presumed differences between groups
on a particular variable or relationships between variables.
• Decision-makers often face situations wherein they are interested in testing
hypotheses on the basis of available information and then take decisions on the
basis of such testing.
• In social science, where direct knowledge of population parameter(s) is rare,
hypothesis testing is a frequently used strategy for deciding whether sample data
offer enough support for a hypothesis that a generalisation can be made.
• A hypothesis may be defined as a proposition, or a set of propositions, set forth as an
explanation for the occurrence of some specified group of phenomena, either
asserted merely as a provisional conjecture to guide some investigation or accepted
as highly probable in the light of established facts.
Need for hypothesis in business

• An airline company claims that 90% of its flights are on time.


• A consultant claims that using just-in-time production can reduce your inventory cost per unit
by ₹10.
• A tyre manufacturer claims its tyres last 50% longer than its competitors’.

• Hypothesis testing is designed to detect significant differences: differences that did not occur
by random chance.

• Additional Reading: https://towardsdatascience.com/how-to-interpret-p-value-with-covid-19-data-edc19e8483b
Types of Hypothesis

Null Hypothesis
- A null hypothesis is a statement of the status quo, one of no difference or no effect. If the null
hypothesis is not rejected, no changes will be made.
- Represented by H0
- Equal sign (=, ≥, ≤)

Alternate Hypothesis
- An alternative hypothesis is one in which some difference or effect is expected. Accepting the
alternative hypothesis will lead to changes in opinions or actions.
- Represented by Ha
- Never equal sign (≠, <, >)

The null hypothesis refers to a specified value of the population parameter, not of a sample statistic.
A null hypothesis may be rejected, but it can never be accepted based on a single test.
Tails of test

• Two tail: We want to test that the population mean is different from 10.
  H0: Mean = 10; Ha: Mean ≠ 10
• Left tail: We want to test that the population mean is less than 10.
  H0: Mean ≥ 10; Ha: Mean < 10
• Right tail: We want to test that the population mean is greater than 10.
  H0: Mean ≤ 10; Ha: Mean > 10

Commonly used Critical z values

Significance Level (⍺) Two-Tailed Test One-Tailed Test (Left) One-Tailed Test (Right)
0.01 ±2.58 -2.326 +2.326
0.05 ±1.96 -1.645 +1.645
0.10 ±1.645 -1.282 +1.282
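
The values in this table can be reproduced from the standard normal distribution. The following is a minimal sketch using Python's standard library; the function name `critical_z` and its `tail` argument are illustrative choices, not something defined in the deck.

```python
from statistics import NormalDist

def critical_z(alpha, tail):
    """Return the critical z value(s) for a significance level and tail type."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        # Split alpha equally between the two tails
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two', 'left' or 'right'")

# Print one row per significance level: alpha, two-tailed, left, right
for alpha in (0.01, 0.05, 0.10):
    _, upper = critical_z(alpha, "two")
    print(alpha, round(upper, 3),
          round(critical_z(alpha, "left"), 3),
          round(critical_z(alpha, "right"), 3))
```

The two-tailed value at ⍺ = 0.01 comes out as 2.576, which the table rounds to ±2.58.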
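Steps Involved in Hypothesis Testing
1. Formulate H0 and Ha
2. Select Appropriate Test
3. Choose Level of Significance
4. Collect Data and Calculate Test Statistic
5. Follow one of two equivalent routes:
   a. p value method: Determine the probability (p value) associated with the test statistic and
      compare it with the level of significance. If p value ≤ alpha, reject H0.
   b. Critical value method: Determine the critical value of the test statistic and compare the
      calculated value with the tabulated value. If |Zcal| ≥ Ztab, reject H0 (for a one-tailed
      test, Zcal must also lie in the direction stated by Ha).
6. Reject or Do not Reject H0
7. Draw Statistical Conclusion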


A Broad Classification of Hypothesis Tests

Hypothesis Tests
• Tests of Association
• Tests of Differences
  • Distributions
  • Means
  • Proportions
  • Median/Rankings
Formulating the Hypotheses
• A well-known car-maker claims that one of its cars has mileage of at least 17
kilometres per litre. You want to challenge this claim. Define the null and alternative
hypotheses for the problem.

• Hypothesis Statement
Null hypothesis: The mileage is greater than or equal to 17
(as this is the default claim made by the brand )
Alternative hypothesis: The mileage is less than 17
(as this challenges the null hypothesis)

• Mathematically
H0: Mileage (mean) ≥ 17
Ha: Mileage (mean) < 17
Formulating the Hypotheses
• Let’s say you are the COO of a shoe-manufacturing company. An employee has
developed a new sole and claims that incorporating it will decrease the wear after
three years of use by more than 9%. Now, suppose you want to test this claim.
• What will be the null and alternative hypotheses for the sole developed by the
employee in this scenario?

Ho: Decrease in wear after 3 years ≤ 9%; Ha: Decrease in wear after 3 years > 9%
Formulating the Hypotheses
• Mr. Mohan of the Civil Engineering Department wants to test the load bearing
capacity of an old bridge, which must be more than 10 tons. In that case he can
state his hypotheses as under:
• Null hypothesis H0 : µ ≤ 10 tons
• Alternative Hypothesis Ha : µ > 10 tons
Formulating the Hypotheses
• The average score in an aptitude test administered at the national level
is 80. To evaluate a state’s education system, the average score of 100
of the state’s students selected on random basis was 75. The state
wants to know if there is a significant difference between the local
scores and the national scores.

• Null hypothesis H0 : µ = 80
• Alternative Hypothesis Ha : µ ≠ 80
Formulating the Hypotheses

• It is believed that the average commute time for an employee to and
from their office in Hyderabad is at least 35 minutes. Now, suppose you
want to test this claim.
• What will be the null and alternative hypotheses in this case if the
average commute time is represented by μ?
• Ho: μ ≥ 35 minutes; Ha: μ < 35 minutes
Formulating the Hypotheses
• Goodyear has launched a new tyre, which, it claims, can travel more
than 7,500 miles before it needs any replacement.
• Assuming that the ‘average distance travelled before replacement’ is
given by μ, what would be the null and alternative hypotheses in this
case?

Ho: μ ≤ 7500 miles; Ha: μ > 7500 miles


Formulating the Hypotheses
• Aashirwad company packages flour as per weight and a particular size of package is
supposed to average 10 kg. Suppose the manufacturer wants to test to determine
whether their packaging process is out of control as determined by the weight of
the flour packages.
• H0: Mean = 10 kg
• Ha: Mean ≠ 10 kg

• The null hypothesis for this experiment is that the average weight of the flour
packages is 10 kg (no problem). The alternative hypothesis is that the average is not
10 kg (process is out of control).
Testing of Hypothesis
Type of Test

One sample (normality assumption; parameter of measurement: mean)
• Population standard deviation is known, N>30: Z test
• Population standard deviation is not known, N<30: One sample t test
• Population standard deviation is not known, N>30: One sample t test

Two sample (parameter of measurement: mean)
• Paired two-sample means test (measuring same population before and after)
• Unpaired two-sample means test (measuring two different populations)

Comparing two versions (parameter of measurement: population proportion)
• A/B Testing
Results of Hypothesis
Critical value method
• Calculated z lies within range: Fail to reject null hypothesis
• Calculated z lies outside range : Reject null hypothesis
• |Zcal| ≥ Ztab: reject H0; |Zcal| < Ztab: fail to reject H0 (for a one-tailed test, Zcal must also lie in the direction stated by Ha)

P value method
If p<=alpha , Reject Ho
If p> alpha, Fail to Reject Ho
Confidence level

The confidence level or reliability is the expected percentage of times that the actual value
will fall within the stated precision limits.
The confidence level is defined for the hypothesis test according to the accuracy needed.
A higher confidence level indicates that more evidence is needed to reject the null hypothesis.
Therefore, increasing the confidence level makes it harder to reject the null hypothesis.
Inversely, a low confidence level indicates that the null hypothesis can be rejected easily.
Thus, if we take a confidence level of 95%, then we mean that there are 95 chances in 100
(or .95 in 1) that the sample results represent the true condition of the population within a
specified precision range against 5 chances in 100 (or .05 in 1) that it does not.
We can always remember that if the confidence level is 95%, then the significance level will be
(100 – 95) i.e., 5%; if the confidence level is 99%, the significance level is (100 – 99) i.e., 1%.
We should also remember that the area of the normal curve within the precision limits for the
specified confidence level constitutes the acceptance region, and the area of the curve outside
these limits in either direction constitutes the rejection regions.
Level of Significance
• The significance level is the probability of
rejecting the null hypothesis when it is true.
• For example, a significance level of 0.05 indicates
a 5% risk of concluding that a difference exists
when there is no actual difference.
• Lower significance levels indicate that you
require stronger evidence before you will reject
the null hypothesis.
• Level of significance (alpha) = 1 - confidence level
Z Score

If the z-score of the sample lies further away from the center than the critical
z-values, the null hypothesis is rejected.
Otherwise, the test fails to reject the null hypothesis.
The only two possible outcomes of a hypothesis test are ‘reject the null hypothesis’
or ‘fail to reject the null hypothesis’. This hypothesis can never be ‘accepted’.
Commonly used critical z scores

Significance Level (⍺) One-Tailed Test (Left) Two-Tailed Test One-Tailed Test (Right)
0.01 -2.326 ±2.58 +2.326
0.05 -1.645 ±1.96 +1.645
0.10 -1.282 ±1.645 +1.282
Two tail test

• Example: OnePlus
• You need to verify whether the OnePlus 6 takes 30 minutes to reach 60% charge,
since this is the popular sentiment.
• First, if the time taken is less than 30 minutes, you want to revise your claim to
boast about the better figure.
• Second, if the time taken is more, you want the engineers to fix this issue.
• What will your null and alternative hypotheses be?
• Hypothesis Statement
• Null hypothesis: The time needed to charge till 60 percent is equal to 30 minutes.
• Alternative hypothesis: The time needed to charge till 60 percent is not equal to 30 minutes.
• For Testing hypothesis
• H0: Mean = 30
• Ha: Mean ≠ 30

Assumptions
• Population is normal
• Sample size is large (n>30)
• Z score: Two tail hypothesis test
Testing hypothesis : Z score

• z score is distance of point from centre (in terms of std dev)
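
In symbols, following the verbal definition given in the Summary (sample mean minus hypothesised mean, divided by the standard error), the test statistic can be written as below. The symbols are notation introduced here for illustration: x̄ is the sample mean, μ0 the hypothesised mean, σ the population standard deviation and n the sample size.

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
```

As a purely hypothetical illustration, a sample of n = 100 charging times with x̄ = 31 minutes, μ0 = 30 and σ = 5 gives z = (31 - 30) / (5 / 10) = 2, that is, the sample mean lies two standard errors above the hypothesised mean.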


Testing hypothesis : P Value
• An alternative way of obtaining the test result is by calculating the p-value.
• The p-value can be calculated from the z-score, using a z-table or by inserting
the z-score into a p-value calculator.
• The null hypothesis can be rejected at all confidence levels below 1-p.
• The p-value is the probability, assuming the null hypothesis is true, of obtaining a
sample result at least as extreme as the one observed. It is not the probability of
the null hypothesis being true.
• It directly tells the confidence level at which the null hypothesis can be rejected.
• If the p-value is less than or equal to the significance level (α), then you can
reject the null hypothesis.

P value method
If p<=alpha , Reject Ho
If p> alpha, Fail to Reject Ho
http://courses.atlas.illinois.edu/spring2016/STAT/STAT200/pnormal.html
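
As an alternative to a z-table or an online calculator, the p-value can be computed directly from the z-score. This is a minimal sketch with Python's standard library; the function name `p_value` and the example z-score of 2.0 are hypothetical, chosen only to illustrate the rule above.

```python
from statistics import NormalDist

def p_value(z, tail):
    """Probability, under H0, of a z-score at least as extreme as the one observed."""
    cdf = NormalDist().cdf  # standard normal cumulative distribution
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1 - cdf(z)
    if tail == "two":
        # Both tails count as "extreme", so double the one-sided area
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'two', 'left' or 'right'")

alpha = 0.05
z = 2.0  # hypothetical calculated z-score
p = p_value(z, "two")
print(round(p, 4), "reject H0" if p <= alpha else "fail to reject H0")
```

The same z-score gives a smaller p-value in a one-tailed test than in a two-tailed test, which is why the tail must be fixed when the hypotheses are framed, before looking at the data.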
Right Tail test
• Example: Hypothesis test from the perspective of a OnePlus 6
customer
• Will you care if the OnePlus 6 makes the claim of “a day’s power in half
an hour” and then overperforms by taking less time to charge?
• I would care only if the phone was underperforming. Therefore, it is
often sufficient to perform the hypothesis test on only one side of the
curve, depending on the context.
• Null hypothesis : The time needed ≤ 30 minutes.
• Alternative hypothesis : The time needed is > 30 minutes.
Hypothesis test: OnePlus customer
[Figure: two tail test compared with one tail test (right tail)]

Example 2: MS Excel
[Figure: Excel output; result: fail to reject null hypothesis]


Left Tail test

• Imagine you’re the owner of a pizza company, and you claim that your pizzas are
more than 9 inches in diameter. But you’ve been receiving complaints from some
of your customers, who say that the pizzas are actually smaller. Your task is to now
find out whether your chefs are producing smaller pizzas. In this case, you will
conduct a ‘left-tailed test’ by checking whether your sample mean is significantly
less than 9 inches, since you’re checking whether the complaints about smaller
pizzas are true.
• Hypothesis Statement
• Null hypothesis : Pizza size is at least 9 inches (i.e. 9 or more).
• Alternative hypothesis : Pizza size is less than 9 inches
• Mathematically
• Null hypothesis : Pizza size ≥ 9.
• Alternative hypothesis : Pizza size is < 9.
Hypothesis testing –One sample t test
• The One Sample t Test examines whether the mean of a population is
statistically different from a known or hypothesized value. The One
Sample t Test is a parametric test.
• Applicable when the population standard deviation is unknown
• Typically used when the sample size is less than 30
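
The t statistic for a one sample t test can be sketched as follows, using only Python's standard library. The pizza diameters are invented for illustration, and the function name `one_sample_t` is a hypothetical helper. The standard library has no t distribution, so the result is compared with a value from a t-table.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, hypothesised_mean):
    """Return (t statistic, degrees of freedom) for a one sample t test."""
    n = len(sample)
    # stdev() uses the n-1 method, i.e. the sample standard deviation
    standard_error = stdev(sample) / sqrt(n)
    t = (mean(sample) - hypothesised_mean) / standard_error
    return t, n - 1

# Hypothetical pizza diameters in inches (invented data)
diameters = [8.7, 9.1, 8.9, 8.6, 9.0, 8.8, 8.5, 9.2, 8.7, 8.9]
t, df = one_sample_t(diameters, 9.0)
print(round(t, 3), df)

# Left-tailed test at alpha = 0.05 with 9 degrees of freedom:
# the t-table critical value is -1.833, so reject H0 if t < -1.833
print("reject H0" if t < -1.833 else "fail to reject H0")
```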
Two Sample t Test
• When there is a need to compare the means of two samples, a two-
sample t-test is conducted. In such a case, the formula for the t-statistic
becomes

https://learn.upgrad.com/course/1260/segment/10349/64185/187901/997831
Types of two sample test
Paired t test
• Paired t test - Paired means that both samples consist of the same test subjects.
• Paired t-tests are used when the same item or group is tested twice, which is known as a repeated
measures t-test. Some examples of instances for which a paired t-test is appropriate include:
  • The before and after effect of a pharmaceutical treatment on the same group of people.
  • Body temperature using two different thermometers on the same group of participants.
  • Standardized test results of a group of students before and after a study prep course.

Unpaired t test
• Unpaired t test - Unpaired means that both samples consist of distinct test subjects.
• An unpaired t-test is used to compare the mean between two independent groups. You
use an unpaired t-test when you are comparing two separate groups with equal variance.
Examples include:
  • Research, such as a pharmaceutical study or other treatment plan, where ½ of the subjects
  are assigned to the treatment group and ½ of the subjects are randomly assigned to the
  control group.
  • Comparing the average commuting distance traveled by New York City and San Francisco
  residents using 1,000 randomly selected participants from each city.

https://www.technologynetworks.com/informatics/articles/paired-vs-unpaired-t-test-differences-assumptions-and-hypotheses-330826
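
The difference between the two tests shows up clearly when both are applied to the same numbers. The sketch below assumes the usual textbook forms: the paired test is a one sample t test on the per-subject differences, and the unpaired test uses the pooled (equal variance) standard error. The scores and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

def paired_t(before, after):
    """Paired t statistic: a one sample t test on the per-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

def unpaired_t(group1, group2):
    """Unpaired t statistic, assuming equal variances (pooled estimate)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / standard_error, n1 + n2 - 2

# Hypothetical test scores of five students before and after a prep course
before = [62, 70, 58, 75, 66]
after = [68, 74, 63, 79, 70]

print(paired_t(before, after))    # treats each student as their own control
print(unpaired_t(after, before))  # wrongly treats the scores as two separate groups
```

With these numbers the paired statistic is far larger than the unpaired one, because pairing removes the variation between students and leaves only the variation in each student's improvement. Using the unpaired test on repeated measurements of the same subjects therefore tends to hide a real effect.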
Summary
1. Define the hypothesis statements: Your test will either ‘reject’ or ‘fail to reject’ the null hypothesis.
2. Collect as many data points as possible: The data points you collect will produce one sample. The size
of this single sample will depend on how many data points you take.
3. Measure the sample mean and the sample standard deviation: The standard deviation should be
calculated using the ‘n-1’ method. The STDEV function in Excel takes care of this.
4. Identify the distribution of the sample means: If the sample size is larger than 30, the distribution
of the sample means will be approximately normal (We’re only focusing on normal distributions for now).
5. Define the confidence level: This is the level of surety that you demand from a hypothesis test. The
higher the confidence level, the harder it is to reject the null hypothesis.
6. Find the critical z-scores of the confidence level and the test statistic or the z-score of the sample: The
z-score of the sample can be calculated by subtracting the hypothesised mean from the sample mean
and dividing the result by the standard error, i.e. the population standard deviation divided by the
square root of the sample size.
7. Compare the sample test statistic with the critical z-scores: Here, you check whether the sample
statistic is more extreme than the z-scores.
8. If the sample test statistic is more extreme than the critical z-scores, you will reject the null
hypothesis. Otherwise, you will fail to reject it.
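
The steps above can be sketched end to end as follows. This is an illustration under stated assumptions rather than a definitive implementation: the charging times are invented, the function name `z_test` is hypothetical, and the sample standard deviation stands in for the population standard deviation, which is reasonable only for a large sample.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_test(sample, hypothesised_mean, confidence_level=0.95, tail="two"):
    """Run a one sample z test and return (z, reject_h0)."""
    n = len(sample)
    x_bar = mean(sample)   # step 3: sample mean
    s = stdev(sample)      # step 3: n-1 method, like Excel's STDEV
    z = (x_bar - hypothesised_mean) / (s / sqrt(n))  # step 6: test statistic
    alpha = 1 - confidence_level                     # step 5
    std_normal = NormalDist()
    # Steps 7 and 8: is the statistic more extreme than the critical z-score?
    if tail == "two":
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    elif tail == "left":
        reject = z < std_normal.inv_cdf(alpha)
    elif tail == "right":
        reject = z > std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, reject

# Step 2: 40 hypothetical charging times in minutes (values 29 to 33, invented)
times = [29 + (i % 5) for i in range(40)]

# Step 1: H0: Mean = 30, Ha: Mean ≠ 30
z, reject = z_test(times, 30, confidence_level=0.95, tail="two")
print(round(z, 3), "reject H0" if reject else "fail to reject H0")
```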
Summary
When the test needs to check only positive or negative deviation from the null
hypothesis, a one-tailed test is performed.
When the test needs to check for deviation on either side of the null hypothesis, a
two-tailed test is performed.
When the sample size is low, a t-test is performed.
A t-test is also preferred over a z-test when the population standard deviation is
unknown.
When two sample means need to be checked for equality, a two-sample t-test is
performed.
When there is a need to check whether an entire distribution is similar to another,
a goodness of fit test is performed.
Hypothesis testing also carries some probability of committing errors. The errors
can be of two types: Type I and Type II.
A/B testing
• An A/B test tells you whether there is a statistically significant difference in the performance
of two options.
• It is a data-driven decision-making system.
• A/B tests are used in the industry whenever there is a need to compare two alternatives and
make a choice between them.
• The A/B test can be considered the most basic kind of randomized controlled experiment.
A/B testing : History
• In the 1920s statistician and biologist Ronald Fisher discovered the most
important principles behind A/B testing and randomized controlled
experiments in general.
• Fisher ran agricultural experiments, asking questions such as, What happens if I
put more fertilizer on this land? The principles persisted and in the early 1950s
scientists started running clinical trials in medicine.
• In the 1960s and 1970s the concept was adapted by marketers to evaluate
direct response campaigns (e.g., would a postcard or a letter to target
customers result in more sales?).
Areas of application
• Medicine, to understand if a drug works or not
• Economics, to understand human behaviour
• Foreign aid and charitable work (the reputable ones at least), to understand which
interventions are most effective at alleviating problems (health, poverty, etc)
• Comparing two versions of a website
• Comparing two colours, tabs or page designs
Example: A/ B testing
• Let’s say John builds a website for a free e-book and is testing out two colour variations —
red and blue. On the red website, 45 out of 100 visitors downloaded the e-book. But on
the blue website, 47 out of 100 visitors downloaded the e-book. Based on this, John may
conclude that the blue website is performing better.

• However, John’s method can backfire. This is because he did not bother to check for
statistical significance. The difference in performance observed may be due to plain old
randomness. Thus, there’s a high probability that he may end up with an inferior website
colour.

• You will tackle this problem through an A/B test


• Null hypothesis (H0): Visitors that receive Layout B will not have higher end-of-visit
conversion rates compared to visitors that receive Layout A
• Alternative hypothesis (H1): Visitors that receive Layout B will have higher end-of-visit
conversion rates compared to visitors that receive Layout A
Hypothesis: A/ B testing

• A/B testing at Amazon (“Buy Now” button vs “Shop Now” button)

H0: Performance of “Buy Now” = Performance of “Shop Now”
Ha: Performance of “Buy Now” ≠ Performance of “Shop Now”

• A/B testing at Upgrad (‘Enrol now’ button vs ‘Apply now’ button)

H0: ‘Apply now’ button gets less than or equal number of clicks as the ‘Enrol now’ button
Ha: ‘Apply now’ button gets more clicks than ‘Enrol now’ button
Example: A/ B testing
Ola launched new coupon codes for its new users. Two coupons were provided to
facilitate the commuters.
• Coupon A: Get Rs 100 off the first ride. Book online now!
• Coupon B: Get an additional Rs 100 off. Book online now.
Test the claim that option B will be liked more by customers.

              Coupon A    Coupon B
Impressions   50000       65000
Clicks        2400        2770
CTR           4.80%       4.26%

Variant B’s conversion rate (4.26%) was 11.22% lower than variant A’s conversion rate (4.80%). You can
be 95% confident that variant B will perform worse than variant A.
Power 0.00% p value 1.0000
Example: A/ B testing
Tanishq launched two ads during Diwali on YouTube to promote its products. Both
ads were measured in terms of how many people watched the ad and how many
clicked on them to visit the Tanishq store. Using the following data, determine whether
Advertisement 2 is more effective in directing the traffic.

Advertisement 1 Advertisement 2
Impressions 343490 344200
Clicks 96720 97535
CTR 28.16% 28.34%

Variant B’s conversion rate (28.34%) was 0.63% higher than variant A’s conversion rate (28.16%).
You can be 95% confident that variant B will perform better than variant A.
Power 75.27% p value 0.0499
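
The p values quoted in the two examples can be approximated by hand with a one-sided two-proportion z test. This is an assumption about the method: the calculator linked in this deck may compute its figures differently, and the function name `ab_test` is hypothetical. For the Ola example, Coupon B impressions are taken as 65,000, the figure consistent with the stated CTR of 4.26%.

```python
from math import sqrt
from statistics import NormalDist

def ab_test(clicks_a, impressions_a, clicks_b, impressions_b):
    """One-sided two-proportion z test of Ha: rate of B > rate of A.

    Returns (z, p_value).
    """
    rate_a = clicks_a / impressions_a
    rate_b = clicks_b / impressions_b
    # Under H0 both variants share one click-through rate, so pool the data
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    standard_error = sqrt(pooled * (1 - pooled)
                          * (1 / impressions_a + 1 / impressions_b))
    z = (rate_b - rate_a) / standard_error
    return z, 1 - NormalDist().cdf(z)

# Tanishq: Advertisement 1 (A) vs Advertisement 2 (B)
z_tanishq, p_tanishq = ab_test(96720, 343490, 97535, 344200)
print(round(z_tanishq, 3), round(p_tanishq, 4))

# Ola: Coupon A vs Coupon B
z_ola, p_ola = ab_test(2400, 50000, 2770, 65000)
print(round(z_ola, 3), round(p_ola, 4))
```

For Tanishq the p value falls just below 0.05, so H0 is rejected at the 5% level, but only narrowly. For Ola the p value is close to 1, so there is no evidence that Coupon B performs better than Coupon A.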
Errors in Hypothesis test
                          Decision
              Fail to reject H0             Reject H0
H0 (True)     Correct decision              Type I error (Alpha error)
H0 (False)    Type II error (Beta error)    Correct decision

Type I error: Reject H0 when H0 is true
Type II error: Fail to reject H0 when H0 is false

Framing of error: Left tail test


Null hypothesis : Pizza size ≥ 9.
Alternative hypothesis : Pizza size is < 9.

• Type 1 error: H0 is rejected when it is true. The pizza size was ≥ 9, but we rejected H0.
• Type 2 error: H0 is not rejected when it is false. The pizza size was < 9, but we failed to reject H0.
Handling Error
There are two ways of handling error:
1. Increasing the confidence level of the test
a. Reduces Type 1 error
b. Increases Type 2 error
2. Increasing the sample size
a. Reduces Type 2 error
b. Doesn’t affect Type 1 error
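
Both effects can be checked numerically. The sketch below computes the Type 2 error probability (beta) of a right-tailed z test for a given true shift in the mean; the shift of 1 unit, the standard deviation of 5 and the function name `type_2_error` are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type_2_error(true_shift, sigma, n, alpha):
    """Beta of a right-tailed z test when the true mean exceeds
    the hypothesised mean by true_shift."""
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha)
    # H0 is not rejected when z falls below the critical value; under the
    # true mean, z is centred at true_shift / (sigma / sqrt(n))
    return std_normal.cdf(z_critical - true_shift * sqrt(n) / sigma)

# (alpha, sample size): baseline, higher confidence level, larger sample
for alpha, n in [(0.05, 50), (0.01, 50), (0.05, 200)]:
    print(alpha, n, round(type_2_error(1, 5, n, alpha), 3))
```

Moving from alpha = 0.05 to alpha = 0.01 (a higher confidence level) raises beta, while moving from n = 50 to n = 200 at the same alpha lowers it. The Type 1 error stays at alpha in every case, because alpha is fixed by the analyst when the test is designed.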
• P value calculator
• http://courses.atlas.illinois.edu/spring2016/STAT/STAT200/pnormal.html

• A/B testing
• https://www.surveymonkey.com/mp/ab-testing-significance-calculator/

• Quiz3. Practice
https://forms.gle/NmAufBEXStaRagRr8
Doubts?
All the Best!

https://www.youtube.com/watch?v=Z9Gw9dIJGiA&t=86s&ab_channel=upGrad_Gmba
