STAT - Day 3 - Faculty Deck

Tests of Significance
This file is meant for personal use by gaurrohit0071@gmail.com only.

Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Sharing or publishing the contents in part or full is liable for legal action.
Agenda
● Large Sample Test
○ Two Sample
● Small Sample Test
○ One Sample
○ Two Sample
● Test for Population Proportion
○ One Sample
○ Two Sample

Large sample tests
[Diagram: Large Sample Test, branching into One Sample and Two Sample]

Large Sample Tests

Large sample tests
● One sample
● Two sample

Number of Defective Items from    Number of Defective Items from
Production Line A                 Production Line B
1                                 5
3                                 8
5                                 2
6                                 5
2                                 3
4                                 6

Can we say that Production Line B produces fewer defectives than Production Line A?
Two sample tests
● Two sample tests are used to test whether the means of two populations are equal
● Used to compare:
○ Performance of two machines
○ Performance of two portfolios

Assumptions of Two sample Z tests for Large Samples
● The two populations follow normal distributions
● The two samples are independent of each other
● The population standard deviations of the variables must be known

Two sample test - hypothesis
● Let there be two samples of sizes n1 and n2 drawn from normal populations N(µ1, σ1²) and N(µ2, σ2²) respectively
● The hypothesis to test whether the population means are equal
H0 : µ1 = µ2 against H1 : µ1 ≠ µ2
● It implies
H0: The two population means are equal (i.e. µ1 = µ2)
against H1: The two population means are not equal (i.e. µ1 ≠ µ2)
● Like the one sample tests, it is possible to test for
H0 : µ1 ≤ µ2 against H1 : µ1 > µ2
or
H0 : µ1 ≥ µ2 against H1 : µ1 < µ2
● Failing to reject H0 means there is not enough evidence against the null hypothesis; it does not prove that H0 is true

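Two sample tests - Z test statistic
● The test statistic is Z given by
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
where x̄1 and x̄2 are the sample means and µ1 - µ2 is the difference specified under H0 (0 when testing equality)
● Under H0, the test statistic follows standard normal distribution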

Two sample tests - Z test statistic
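● If σ1² and σ2² are not known then they are replaced by sample estimates s1² and s2² respectively, provided n1 ≥ 30 and n2 ≥ 30
● The test statistic is Z given by
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
where x̄1, x̄2 are the sample means and s1², s2² are the sample variances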

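The Python code to conduct a Z test for two population means is
statsmodels.stats.weightstats.ztest(Sample_1, Sample_2, value=0, alternative='two-sided')
Here value is the difference in means under H0 and alternative is one of 'two-sided', 'larger' or 'smaller'.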
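
To make the calculation behind such a call concrete, here is a minimal sketch of the two-sample Z test with the sample variances standing in for σ1² and σ2², written with the Python standard library only. The function name `two_sample_z` and the sample values are made up for illustration. Note that `ztest` in statsmodels pools the two sample variances by default (`usevar='pooled'`), so its result can differ slightly from this unpooled version when the sample variances differ.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_z(sample_1, sample_2, value=0.0, alternative="two-sided"):
    """Unpooled two-sample Z test; returns (z, p_value).

    `value` is the difference mu1 - mu2 under H0. `alternative` follows the
    statsmodels naming: 'two-sided', 'larger' (mu1 > mu2), 'smaller' (mu1 < mu2).
    """
    n1, n2 = len(sample_1), len(sample_2)
    # Standard error built from the two sample variances (n - 1 divisor).
    se = sqrt(variance(sample_1) / n1 + variance(sample_2) / n2)
    z = (mean(sample_1) - mean(sample_2) - value) / se
    cdf = NormalDist().cdf(z)
    if alternative == "two-sided":
        p_value = 2 * min(cdf, 1 - cdf)
    elif alternative == "larger":
        p_value = 1 - cdf
    elif alternative == "smaller":
        p_value = cdf
    else:
        raise ValueError("alternative must be 'two-sided', 'larger' or 'smaller'")
    return z, p_value


# Made-up measurements purely for illustration (n >= 30 in each sample).
sample_1 = [float(x) for x in range(40)]
sample_2 = [x + 5.0 for x in sample_1]
z, p_value = two_sample_z(sample_1, sample_2)
print(f"Z = {z:.4f}, p-value = {p_value:.4f}")
```
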

Two sample tests - decision rule
Test                    H1         Based on critical region
For two tailed test     µ1 ≠ µ2    Reject H0 if |Z| ≥ Zα/2
For left tailed test    µ1 < µ2    Reject H0 if Z ≤ -Zα
For right tailed test   µ1 > µ2    Reject H0 if Z ≥ Zα
Based on p-value (all tests): Reject H0 if p-value is less than or equal to the level of significance
Based on confidence interval (all tests): Reject H0 if the value of µ1 - µ2 specified under H0 does not lie in the confidence interval
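
The critical-region and p-value rules can be sketched in a few lines of Python. This is an illustrative helper, not code from the deck: the name `z_decision` and the labels 'two-sided', 'left' and 'right' are assumptions chosen here, and only the standard library is used.

```python
from statistics import NormalDist


def z_decision(z, alpha, alternative):
    """Return (critical_value, p_value, reject_h0) for a Z statistic.

    alternative: 'two-sided' (H1: mu1 != mu2), 'left' (H1: mu1 < mu2)
    or 'right' (H1: mu1 > mu2).
    """
    std_normal = NormalDist()
    if alternative == "two-sided":
        critical = std_normal.inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
        p_value = 2 * std_normal.cdf(-abs(z))
        reject = abs(z) >= critical
    elif alternative == "left":
        critical = std_normal.inv_cdf(alpha)           # -Z_alpha
        p_value = std_normal.cdf(z)
        reject = z <= critical
    elif alternative == "right":
        critical = std_normal.inv_cdf(1 - alpha)       # Z_alpha
        p_value = 1 - std_normal.cdf(z)
        reject = z >= critical
    else:
        raise ValueError("alternative must be 'two-sided', 'left' or 'right'")
    return critical, p_value, reject


critical, p_value, reject = z_decision(z=2.0, alpha=0.05, alternative="two-sided")
print(f"critical = {critical:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

For a given alternative, the critical-region rule and the p-value rule lead to the same decision, which is why the helper can report both.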
Two sample test for population mean (σ unknown)
Question:
A study was carried out to understand the amount of hemoglobin in blood for males and females. Random samples of 160 males and 180 females have mean hemoglobin of 13 g/dl and 15 g/dl respectively. The sample standard deviations are 4.1 g/dl for male donors and 3.5 g/dl for female donors. Can it be said that the population means of hemoglobin are the same for men and women? Use α = 0.01.

Solution:
X: Amount of hemoglobin in blood for males
Y: Amount of hemoglobin in blood for females
Here n1 = 160, s1 = 4.1, x̄ = 13
n2 = 180, s2 = 3.5, ȳ = 15
To test H0: µ1 = µ2 against H1: µ1 ≠ µ2

Solution:
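The test statistic
$$Z = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{13 - 15}{\sqrt{\frac{4.1^2}{160} + \frac{3.5^2}{180}}} = -4.807$$
Decision Rule: Reject H0 if |Zcalc| ≥ Zα/2
Here Zα/2 = 2.58, so the critical region is Z ≤ -2.58 or Z ≥ 2.58
Since |-4.807| = 4.807 > 2.58, reject H0.
We may conclude that males and females have different mean hemoglobin levels.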
Python solution: Calculate critical z-value
i.e. If test statistic is less than -2.58 or greater than 2.58 then we reject H0.
Python solution: Calculate test statistic
As test statistic (=-4.8068) < critical value (=-2.58), we reject H0.
Python solution: Calculate p-value
As p-value < 0.01, we reject H0.
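
The three steps above (critical value, test statistic, p-value) can be reproduced from the summary figures in the question. The sketch below is one possible way to do it with the standard library; the variable names are chosen here for illustration and are not taken from the deck.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the hemoglobin question.
n1, mean1, s1 = 160, 13.0, 4.1   # males
n2, mean2, s2 = 180, 15.0, 3.5   # females
alpha = 0.01

std_normal = NormalDist()

# Step 1: two-tailed critical value Z_{alpha/2}.
z_critical = std_normal.inv_cdf(1 - alpha / 2)

# Step 2: test statistic with sample variances in place of sigma^2.
z_calc = (mean1 - mean2) / sqrt(s1**2 / n1 + s2**2 / n2)

# Step 3: two-tailed p-value.
p_value = 2 * std_normal.cdf(-abs(z_calc))

print(f"critical value = +/-{z_critical:.2f}")
print(f"test statistic = {z_calc:.4f}")
print(f"reject H0: {abs(z_calc) >= z_critical and p_value <= alpha}")
```
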

Small sample tests

Two sample tests
● Like the two sample Z test, the two sample t test is used to compare the means of two populations for unpaired data; it is used when the population standard deviations are not known and the sample sizes are less than 30
● For unpaired data, use the t-test to compare the relative performance of two machines, two investment portfolios, two drugs meant to reduce an outcome, etc.

Two sample tests for unpaired data
● To test the difference between the means of two populations
● Let there be two different populations that follow normal distributions
● Two samples of sizes n1 and n2 are drawn from the normal populations N(µ1, σ1²) and N(µ2, σ2²) respectively
● The two samples are independent of each other

● The hypothesis to test whether the population means are equal
H0 : µ1 = µ2 against H1 : µ1 ≠ µ2
● It implies
H0: The two population means are equal (i.e. µ1 = µ2)
against H1: The two population means are not equal (i.e. µ1 ≠ µ2)
● The one sided hypotheses are
H0 : µ1 ≤ µ2 vs H1 : µ1 > µ2
or
H0 : µ1 ≥ µ2 vs H1 : µ1 < µ2
Two sample tests - test statistic
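● Recall for large samples the test statistic, Z is given by
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
when σ1² and σ2² are known
● Note: If σ1² and σ2² are not known then they are replaced by the sample estimates s1² and s2² respectively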
Two sample tests - options
● If sample sizes are small and standard deviations of the populations are unknown then we use the t-test. The t-test for small independent samples has two options
● First option is when variances are assumed equal. In this case the pooled sample variance sp² is used. Under H0, the test statistic follows t distribution with n1 + n2 - 2 df
● Second option is when variances are assumed unequal. In this case we use the sample variances s1² and s2² with degrees of freedom equal to the smaller of n1 - 1 and n2 - 1 (a conservative choice; Welch's t-test uses the Welch-Satterthwaite approximation instead)
● In either option, failing to reject H0 means there is not enough evidence that the two population means differ; it does not prove that µ1 = µ2
Two sample tests - test statistic
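● For small samples the test statistic, t is given by

Unequal Variance:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
Equal Variance:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
where sp² is the pooled sample variance given by
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$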

The Python code to conduct a t test for two unpaired population means, assuming equal population variances, is
scipy.stats.ttest_ind(Sample_1, Sample_2, equal_var=True)

When the population variances are assumed unequal, the call below performs Welch's t-test
scipy.stats.ttest_ind(Sample_1, Sample_2, equal_var=False)
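
To show what the two options compute, the sketch below evaluates both statistics by hand for the production-line defect counts listed at the start of the deck (six observations per line; the assignment of the first column to Line A is read off that table). The helper name `t_statistics` is an assumption made here, and the Welch degrees of freedom use the Welch-Satterthwaite approximation rather than the smaller-of-(n1 - 1, n2 - 1) rule. Calling `scipy.stats.ttest_ind` on the same lists with the matching `equal_var` setting should report the same statistics.

```python
from math import sqrt
from statistics import mean, variance


def t_statistics(sample_1, sample_2):
    """Return {'pooled': (t, df), 'welch': (t, df)} for two independent samples."""
    n1, n2 = len(sample_1), len(sample_2)
    m1, m2 = mean(sample_1), mean(sample_2)
    v1, v2 = variance(sample_1), variance(sample_2)   # n - 1 divisor

    # Option 1: equal variances, pooled sample variance, n1 + n2 - 2 df.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    df_pooled = n1 + n2 - 2

    # Option 2: unequal variances (Welch), Welch-Satterthwaite df.
    se_sq = v1 / n1 + v2 / n2
    t_welch = (m1 - m2) / sqrt(se_sq)
    df_welch = se_sq**2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

    return {"pooled": (t_pooled, df_pooled), "welch": (t_welch, df_welch)}


line_a = [1, 3, 5, 6, 2, 4]   # defective items, Production Line A
line_b = [5, 8, 2, 5, 3, 6]   # defective items, Production Line B

result = t_statistics(line_a, line_b)
for option, (t_value, df) in result.items():
    print(f"{option}: t = {t_value:.4f}, df = {df:.2f}")
```

Because the two samples have the same size, the pooled and Welch statistics coincide here and only the degrees of freedom differ. The statistic is negative because Line B has the larger sample mean, so these data give no support to the claim that Line B produces fewer defectives.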

Two sample test for unpaired data
Question:
An experiment was conducted to compare the hours of pain relief provided by two new medicines. Two groups of 14 and 15 patients were selected and given comparable doses. Group 1 was given medicine 1 and group 2 was given medicine 2. The following data is obtained from the two samples. Test whether the two populations give the same mean hours of relief. Assume the data comes from normal distributions with equal variance. [Use α = 0.01]

                           Medicine 1    Medicine 2
Mean of hours of relief    6.4           7.3
S.D of hours of relief     1.4           1.5
Solution:
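Let X: hours of relief for patients receiving medicine 1
Y: hours of relief for patients receiving medicine 2
To test H0: µ1 = µ2 against H1: µ1 ≠ µ2
The pooled variance is
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{13(1.4)^2 + 14(1.5)^2}{27} = 2.1104$$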

Solution:
The test statistic is
$$t = \frac{\bar{x} - \bar{y}}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{6.4 - 7.3}{\sqrt{2.1104}\,\sqrt{\frac{1}{14} + \frac{1}{15}}} = -1.667$$
Decision rule: Reject H0 if |tcal| ≥ tn1+n2-2, α/2
tn1+n2-2, α/2 = t27,0.005 = 2.771
Since 2.771 > |-1.667|, we fail to reject H0.
There is not enough evidence to conclude that the two medicines differ in mean hours of relief.
Python solution: Calculate critical t-value
If test statistic is less than -2.77 or greater than 2.77, then we reject H0.
Python solution: Calculate test statistic
As test statistic (=-1.667) > critical value (=-2.77), we fail to reject H0.
Python solution: Calculate p-value
As p-value > 0.01, we fail to reject H0.
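
The pooled-variance calculation for the medicine example can be checked from the summary figures alone. The sketch below is illustrative and uses only the standard library; since the standard library has no t distribution, the critical value 2.771 is taken from the worked solution rather than computed. If SciPy is available, `scipy.stats.ttest_ind_from_stats` with `equal_var=True` accepts the same summary figures and also returns a p-value.

```python
from math import sqrt

# Summary statistics from the medicine question.
n1, mean1, sd1 = 14, 6.4, 1.4   # medicine 1
n2, mean2, sd2 = 15, 7.3, 1.5   # medicine 2
t_critical = 2.771              # t(27, 0.005), as quoted in the worked solution

df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
t_calc = (mean1 - mean2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
reject_h0 = abs(t_calc) >= t_critical

print(f"df = {df}, pooled variance = {pooled_var:.4f}")
print(f"t = {t_calc:.3f}, reject H0: {reject_h0}")
```
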

● The test is carried out under the assumption that the samples are drawn from independent normal populations
● If the populations from which the samples are drawn do not have equal variance then the Welch test is used, which is the t-test for independent samples with unequal variances

Two sample tests for paired data
● Paired data are used to measure the effectiveness of some training/treatment
● The observations are recorded on the same individual/item twice, resulting in pairs of observations
Example:
An energy drink manufacturing company wants to test if sales increase after they advertise the drink with a life-sized picture of a well-known athlete.

● Let {(Xi, Yi), i = 1, 2, 3, …, n} be the paired data on X and Y

● Let µX, µY be the means of X and Y respectively
● Define di = yi - xi
● Let µd = µY - µX
● In paired t-test, we test for
H0 : µd = µ0 against H1 : µd ≠ µ0
H0 : µd ≤ µ0 against H1 : µd > µ0
H0 : µd ≥ µ0 against H1 : µd < µ0
● Suppose mean of di = d̄ and variance of di = sd²
● The test statistic is
$$t = \frac{\bar{d} - \mu_0}{s_d / \sqrt{n}}$$
● Under H0, the test statistic follows t-distribution with n-1 degrees of freedom
● Failing to reject H0 means there is not enough evidence against the null hypothesis; it does not prove that H0 is true

The Python code to conduct a t test for two paired population means is
scipy.stats.ttest_rel(Sample_1, Sample_2)
Note that this tests the mean of Sample_1 - Sample_2, so with d = y - x the y sample is passed first.

Two sample t-test for paired data
Question:
An energy drink distributor claims that a new advertisement poster, featuring a life-size picture of a well-known athlete, increases sales. For a random sample of 9 outlets, the following sales data was collected before and after the poster was displayed. Test the claim that the advertisement was effective in increasing the sales.
Before   33  32  38  45  37  47  48  41  45
After    42  35  31  41  37  36  49  49  48
Test the hypothesis using critical region technique. [Use α = 0.05].

Solution:
To test,
H0: The advertisement was not effective (µd ≤ 0)
against
H1: The advertisement was effective (µd > 0)
Compute di = yi - xi
Before       33  32  38  45  37   47  48  41  45
After        42  35  31  41  37   36  49  49  48
di = yi-xi    9   3  -7  -4   0  -11   1   8   3
Solution:
We have mean d̄ = 2/9 ≈ 0.22 and variance sd² ≈ 43.69 (sd ≈ 6.61), so the test statistic is
$$t = \frac{\bar{d} - 0}{s_d / \sqrt{n}} = \frac{0.22}{6.61 / \sqrt{9}} = 0.0998$$

Solution:
Decision Rule: Reject H0 if tcalc ≥ tdf,α
Here tdf,α = tn-1,α = t8,0.05 = 1.86
Since 1.86 > 0.0998, fail to reject H0
Thus, there is not enough evidence that the advertisement increased sales.

Python solution: Calculate critical t-value
If test statistic is greater than 1.86, then we reject H0.

Python solution: Calculate test statistic and p-value
The test statistic (=0.1) < critical value (=1.86), also the p-value > 0.05, thus we fail to reject H0.
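
The paired calculation can be reproduced directly from the before/after figures. This is a minimal standard-library sketch with illustrative variable names; the critical value 1.86 is taken from the worked solution rather than computed. With unrounded inputs the statistic is about 0.101; the 0.0998 in the hand solution comes from rounding the mean difference to 0.22. With SciPy, `scipy.stats.ttest_rel(after, before, alternative='greater')` is the corresponding library call (the `alternative` argument is available in recent SciPy versions).

```python
from math import sqrt
from statistics import mean, stdev

before = [33, 32, 38, 45, 37, 47, 48, 41, 45]
after = [42, 35, 31, 41, 37, 36, 49, 49, 48]
t_critical = 1.86   # t(8, 0.05), as quoted in the worked solution

# Differences d_i = y_i - x_i (after minus before).
d = [y - x for x, y in zip(before, after)]
n = len(d)
d_bar = mean(d)
s_d = stdev(d)                       # sample standard deviation, n - 1 divisor
t_calc = d_bar / (s_d / sqrt(n))     # mu_0 = 0 under H0
reject_h0 = t_calc >= t_critical     # right-tailed test

print(f"d = {d}")
print(f"mean = {d_bar:.4f}, sd = {s_d:.4f}, t = {t_calc:.4f}, reject H0: {reject_h0}")
```
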

Test for Population Proportion

Test for proportion
● For qualitative data the proportion of a desired characteristic is obtained
● Test for proportion:
○ One sample: Testing whether the population proportion (P) is equal to a specified value (P0)
○ Two sample: Testing equality of two population proportions (P1 = P2)
● Similar to the tests of population mean

Test for proportion
[Diagram: Test for proportion, branching into One Sample and Two Sample]

One sample test - hypothesis
● The hypothesis to test whether the population proportion is equal to a specified value
H0 : P = P0 against H1 : P ≠ P0
● It implies
H0: The population proportion is equal to P0
against H1: The population proportion is not equal to P0
● Failing to reject H0 means there is not enough evidence that the population proportion differs from P0

Test for proportions
● The test statistic is given by
$$Z = \frac{p - P_0}{\sqrt{\frac{P_0(1 - P_0)}{n}}}$$
where p is the sample proportion, P0 is the specified proportion and n is the sample size
● Under H0, the test statistic follows standard normal distribution

One sample test for proportion - decision rule
Test                    H1        Based on critical region
For two tailed test     P ≠ P0    Reject H0 if |Z| ≥ Zα/2
For left tailed test    P < P0    Reject H0 if Z ≤ -Zα
For right tailed test   P > P0    Reject H0 if Z ≥ Zα
Based on p-value (all tests): Reject H0 if p-value is less than or equal to the level of significance
Based on confidence interval (all tests): Reject H0 if P0 does not lie in the confidence interval

One sample test for proportion
Question:
In a sample of 361 business owners who had gone into bankruptcy due to recession, a survey found that 105 had not consulted any professional for managing their finances before opening the business. Test the null hypothesis that at most 25% of all businesses had not consulted a professional before opening the business.
Test the claim using p-value technique. [Use α = 0.05].

Solution:
The sample consists of 361 business owners who had gone into bankruptcy due to recession.
i.e. n = 361
The survey found that 105 of them had not consulted any professional for managing their finances before opening the business.
Let X: number of businesses which did not consult a professional before opening; x = 105
The sample proportion (p) = x/n = 105/361 = 0.2909

Solution:
To test: The null hypothesis that at most 25% of all businesses had not consulted a professional before opening the business
Here P0 = 0.25
To test, H0: P ≤ 0.25 against H1: P > 0.25

Solution:
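The test statistic
$$Z = \frac{p - P_0}{\sqrt{\frac{P_0(1 - P_0)}{n}}} = \frac{0.2909 - 0.25}{\sqrt{\frac{0.25 \times 0.75}{361}}} = 1.79$$
The p-value = P(Z > 1.79) = 0.0367
As p-value < 0.05, reject H0.
We may conclude that more than 25% of all businesses had not consulted a professional before starting the business.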
Python solution: Calculate test statistic
As the p-value < 0.05, we reject H0.
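
The one-sample proportion test for the bankruptcy example can be sketched as follows, using the standard library only and the hypothesized proportion P0 in the standard error, as in the hand solution. Variable names are illustrative. For comparison, `proportions_ztest` in statsmodels by default estimates the variance from the sample proportion instead; passing `prop_var=0.25` should bring it in line with this calculation.

```python
from math import sqrt
from statistics import NormalDist

x, n = 105, 361     # owners who did not consult a professional, sample size
p0 = 0.25           # proportion specified under H0
alpha = 0.05

p_hat = x / n
z_calc = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z_calc)   # right-tailed: H1 is P > 0.25
reject_h0 = p_value <= alpha

print(f"sample proportion = {p_hat:.4f}")
print(f"Z = {z_calc:.4f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```
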

Two sample tests for population proportion
● Let there be two samples of sizes n1 and n2 from different populations such that x1 and x2 are the number of specific items in each of them respectively
● Suppose these samples have proportions of specific items p1 and p2 respectively
● To test the equality of the population proportions from which these samples are chosen

● The hypothesis to test the population proportions
H0 : P1 = P2 against H1 : P1 ≠ P2
● It implies
H0: The two population proportions are equal (P1 = P2)
against H1: The two population proportions are not equal (P1 ≠ P2)
● Failing to reject H0 means there is not enough evidence that the two population proportions differ
Test for proportions
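● The test statistic is given by
$$Z = \frac{p_1 - p_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where p̄ is the proportion of the pooled sample such that
$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
● Under H0, the test statistic follows standard normal distribution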

The Python code to conduct a Z test for two population proportions is
statsmodels.api.stats.proportions_ztest(count=[x1, x2], nobs=[n1, n2])
where count holds the number of specific items in each sample and nobs holds the sample sizes.

Two sample test for proportion - decision rule
Test                    H1         Based on critical region
For two tailed test     P1 ≠ P2    Reject H0 if |Z| ≥ Zα/2
For left tailed test    P1 < P2    Reject H0 if Z ≤ -Zα
For right tailed test   P1 > P2    Reject H0 if Z ≥ Zα
Based on p-value (all tests): Reject H0 if p-value is less than or equal to the level of significance
Based on confidence interval (all tests): Reject H0 if the value of P1 - P2 specified under H0 (here 0) does not lie in the confidence interval

Two sample test for proportion
Question:
Steve owns a kiosk where he sells two magazines, A and B. In a month, he buys 100 copies of magazine A, of which 78 were sold, and 70 copies of magazine B, of which 65 were sold. Is there enough evidence to say that magazine B is more popular?
Test the claim using p-value technique. [Use α = 0.05].

Solution:
Steve owns a kiosk where he sells two magazines - A and B in a month.
Let X: the number of magazines sold

Magazine A: out of 100 copies, 78 are sold
Here, x1 = 78 and n1 = 100
Let p1 be the proportion of magazine A sold
p1 = x1/n1 = 78/100 = 0.78

Magazine B: out of 70 copies, 65 are sold
Here, x2 = 65 and n2 = 70
Let p2 be the proportion of magazine B sold
p2 = x2/n2 = 65/70 = 0.9286

Solution:
To test whether magazine B is more popular,
i.e. H0: P1 ≥ P2 against H1: P1 < P2
where
P1: denotes population proportion of magazine A sold
P2: denotes population proportion of magazine B sold

Solution:
The pooled proportion is
$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{78 + 65}{100 + 70} = \frac{143}{170} = 0.8412$$

Solution:
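The test statistic
$$Z = \frac{p_1 - p_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = -2.5905$$
where p̄ is the pooled proportion (this value uses p2 rounded to 0.928 and p̄ rounded to 0.84; unrounded inputs give Z ≈ -2.608)
The p-value = P(Z < Zcalc, under H0) = P(Z < -2.5905) = 0.0048
Since p-value < 0.05, we reject H0.
Thus there is enough evidence to conclude that magazine B is more popular.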

Python solution: Calculate test statistic and p-value
As the p-value < 0.05, we reject H0.
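
The two-sample proportion test for the magazine example can be sketched with the standard library as below; variable names are illustrative. With unrounded proportions the statistic comes out near -2.608 rather than the -2.5905 of the hand solution, which corresponds to rounding p2 to 0.928 and the pooled proportion to 0.84. The conclusion is the same either way.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 78, 100    # magazine A: copies sold, copies bought
x2, n2 = 65, 70     # magazine B: copies sold, copies bought
alpha = 0.05

p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)
z_calc = (p1 - p2) / sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
p_value = NormalDist().cdf(z_calc)   # left-tailed: H1 is P1 < P2
reject_h0 = p_value <= alpha

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, pooled = {p_pooled:.4f}")
print(f"Z = {z_calc:.4f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```
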

Summary

Parametric tests
● The tests considered so far have two features:
○ The probability distribution of the samples was assumed to be known
○ The hypothesis test was about the parameter of the probability distribution
● These tests are known as the parametric tests
● When these assumptions are not satisfied, use the non-parametric tests

Thank You

