Professional Documents
Culture Documents
DSC7215
“The secret of getting ahead is getting started.” – Mark Twain 2
When to use t-Distribution
If the sample size is small (<30), the variance of the population is not adequately
captured by the variance of the sample. Instead of z-distribution, t-distribution is
used. It is also the appropriate distribution to be used when population variance is
not known, irrespective of sample size.
̅
(#$%) ̅
(#$%)
𝑡 𝑠𝑡𝑎𝑡𝑖𝑠𝑡𝑖𝑐 𝑜𝑟 𝑡 𝑠𝑐𝑜𝑟𝑒 , 𝑡 = ! Recall 𝑧 = #
" "
DSC7215
Ref: "Comparing Normal and Student's t-Distributions" from the Wolfram Demonstrations Project, http://demonstrations.wolfram.com/ComparingNormalAndStudentsTDistributions/
Contributed by: Gary McClelland; Last accessed: July 27, 2020
“Don’t limit yourself. Many people limit themselves to what they think they can do. You can go as far as your mind lets you… 3
Properties of t-Distribution
$%& $%&
' ' $
*' '
• PDF of a t-distribution: $ 1+ , where 𝜈 is the degrees of freedom
()' ' (
!"# !"#
DSC7215
distribution (z distribution)
…What you believe, remember, you can achieve.” – Mary Kay Ash 4
Confidence Interval to Estimate 𝝁
𝑠 𝑠
𝑥̅ − 𝑡!"#$#!%& ≤ 𝜇 ≤ 𝑥̅ + 𝑡!"#$#!%& Recall: 𝑡 𝑠𝑡𝑎𝑡𝑖𝑠𝑡𝑖𝑐 𝑜𝑟 𝑡 𝑠𝑐𝑜𝑟𝑒 , 𝑡 =
̅
(#$%)
!
𝑛 𝑛 "
– Sample mean, standard deviation and sample size can be calculated from
the data; Critical t value corresponding to the Significance Level, 𝛼, can be
obtained from software.
𝑝 𝑣𝑎𝑙𝑢𝑒
DSC7215
𝑡203415
−𝑡,-./.,01 𝑡,-./.,01
𝑡=0
“The best time to plant a tree was 20 years ago. The second best time is now.” – Chinese Proverb 5
t-Test in Pharmaceutical Industry
The labeled potency of a tablet dosage form is 100 mg. As per the
quality control specifications, 10 tablets are randomly assayed.
A researcher wants to estimate the interval for the true mean of the
batch of tablets with 95% confidence. Assume the potency is normally
distributed.
DSC7215
“Only the paranoid survive.” – Andy Grove 6
t-Test (Confidence Intervals) in Pharmaceutical Industry
Mean, 𝑥̅ = 99.69 mg
Standard deviation, s = 0.41
n = 10 0.975
𝜈 = 10 − 1 = 9
0.025
'
At 95% level, 𝛼 = 0.05, and ∴, = 0.025
(
qnorm(0.975,0,1) = 1.96
95% Confidence Level |z| = 1.96
qnorm(0.025,0,1) = -1.96
qt(0.975,9) = 2.262
DSC7215
95% Confidence Level |t| = 2.262
qt(0.025,9) = -2.262
“It’s hard to beat a person who never gives up.” – Babe Ruth 7
t-Test (Confidence Intervals) in Pharmaceutical Industry
Mean, 𝑥̅ = 99.69 mg, Standard deviation, s = 0.41
n = 10, 𝜈 = 10 − 1 = 9
𝑠 𝑠
𝑥̅ − 𝑡 ∗ ≤ 𝜇 ≤ 𝑥̅ + 𝑡 ∗
𝑛 𝑛
0.41 0.41
99.69 − 2.262 ∗ ≤ 𝜇 ≤ 99.69 + 2.262 ∗
10 10
99.40 ≤ 𝜇 ≤ 99.98
The batch (sample) mean is 99.69 mg with an error of +/-0.29 mg. The
researcher is 95% confident that the average potency of the batch (population)
of tablets is between 99.40 mg and 99.98 mg.
DSC7215
That is, it cannot be equal to 100 mg (if you are picky about decimal places).
“If people are doubting how far you can go, go so far that you can’t hear them anymore.” – Michele Ruiz 8
t-Test (Hypothesis Testing) in Pharmaceutical Industry
The labeled potency of a tablet dosage form is 100 mg.
What are null and alternate hypotheses?
𝐻6: 𝜇7 = 100; 𝐻7: 𝜇7 ≠ 100
Is it a one-tailed test or a two-tailed test? 0.95
DSC7215
𝑡= 𝑠 = = −2.3784 R: qt(0.025,9)
0.41
𝑛 10 Python: t.ppf(0.025,9)
“Write it. Shoot it. Publish it. Crochet it, sauté it, whatever. MAKE.” – Joss Whedon 9
t-Test in Pharmaceutical Industry – R
dosage <- c(99.2, 100.1, 100, 100, 99.5, 99.4, 99.3, 100.3, 99.9, 99.2)
R code: t.test(dosage, mu = 100)
𝑥̅ − 𝜇 99.69 − 100
𝑡= 𝑠 = = −2.3784
0.41
𝑛 10
HYPOTHESIS TESTING APPROACH
DSC7215
R code: pt(-2.3784,9) = 0.02067
All software output double the calculated value for 2-tailed tests to
'
enable easy check against the significance level (𝛼) instead of with .
(
“We need to accept that we won’t always make the right decisions, that we’ll screw up royally sometimes... 10
t-Test in Retail/eCommerce Industry
Amount Purchased (INR)
An eCommerce company believes people are spending Rs 5003 3400
2,000 on an average on vitamin supplements and immunity 2345 3476
2341 453
boosters during these pandemic affected days. They took a 3489 5643
random sample of 19 people and looked at their purchases. 321 5643
236 349
What is your conclusion at 95% confidence interval (or 712 761
equivalently, 5% significance level)? 2190 4321
324 987
451 549
2315 579
713 5612
453 651
678 670
902 654
910 2398
2110 549
DSC7215
312 982
1035 1129
837
1057
…– understanding that failure is not the opposite of success, it’s part of success.” – Arianna Huffington 11
t-Test (Confidence Intervals) in Retail/eCommerce Industry
Mean, 𝑥̅ = ₹ 1,412.63
Standard deviation, s = ₹ 1,287.38
n = 19 0.975
𝜈 = 19 − 1 = 18
0.025
'
At 95% level, 𝛼 = 0.05, and ∴, = 0.025
(
norm.ppf(0.975,0,1) = 1.96
95% Confidence Level |z| = 1.96
norm.ppf(0.025,0,1) = -1.96
t.ppf(0.975,18) = 2.10
DSC7215
95% Confidence Level |t| = 2.1
t.ppf(0.025,18) = -2.10
“It’s no use going back to yesterday, because I was a different person then.” – Lewis Carroll 12
t-Test (Confidence Intervals) in Retail/eCommerce Industry
Mean, 𝑥̅ = ₹ 1,412.63, Standard deviation, s = ₹ 1,287.38
n = 19, 𝜈 = 19 − 1 = 18
𝒔 𝒔
J−𝒕∗
𝒙 ≤𝛍≤𝒙 J+𝒕∗
𝒏 𝒏
1287.38 1287.38
1412.63 − 2.1 ∗ ≤ 𝜇 ≤ 1412.63 + 2.1 ∗
19 19
792.13 ≤ 𝜇 ≤ 2033.13
The batch mean is ₹ 1,412.63 with an error of ₹ 620.50.
The researcher is 95% confident that the average purchase amount of ALL
shoppers in the vitamins and supplements section is between ₹ 792.13 and
DSC7215
₹ 2,033.13.
“Smart people learn from everything and everyone, average people from their experiences, stupid people already have all the answers.” – Socrates 13
t-Test (Hypothesis Testing) in Retail/eCommerce Industry
What are null and alternate hypotheses?
𝐻6: 𝜇7 = 2,000; 𝐻7: 𝜇7 ≠ 2,000
Is it a one-tailed test or a two-tailed test?
0.95
Two-tailed.
0.025 0.025
What are the degrees of freedom?
𝜈 = 19 − 1 = 18
What is the t-score (or t-value) of the sample?
𝜇 = ₹2000
𝑥̅ − 𝜇 1412.63 − 2000
𝑡= 𝑠 = = −1.99 -2.1
1287.38
DSC7215
𝑛 19
“Do what you feel in your heart to be right – for you’ll be criticized anyway.” – Eleanor Roosevelt 14
t-Test in Retail/eCommerce Industry - Python
DSC7215
“Happiness is not something ready made. It comes from your own actions.” – Dalai Lama XIV 15
TWO-SAMPLE t-TEST FOR MEANS
DSC7215
“Whatever you are, be a good one.” – Abraham Lincoln 16
▪ Do two samples come from the same population?
▪ If they come from different populations, what is the difference in the means of the
two populations?
▪ Is one version of web page leading to better product sales in an A/B testing for an
eCommerce company?
▪ Does the average cost of a two-bedroom flat differ between Delhi and Mumbai? What is the
difference?
▪ What is the difference in the strength of steel produced under two different temperatures?
▪ Does the effectiveness of Head & Shoulders anti-dandruff shampoo differ from Pantene anti-
dandruff shampoo?
▪ What is the difference in the productivity of men and women on an assembly line under
certain conditions?
DSC7215
▪ Does an antibiotic affect the efficacy of another drug being taken by a patient?
“The same boiling water that softens the potato hardens the egg. It’s what you’re made of. Not the circumstances.” – Anonymous 17
Two-Sample t-Test (Student’s and Welch’s t-tests)
In 2-sample t-test, Student’s t-test Welch’s t-test
difference in means of !!" = !""
# (&'()*()
!!" ≠ !""
# (&'()*()
Assumption and Check
the two samples is Assumed true if # !(#,'&&*() ≤ 2
#"" (#,'&&*()
Assumed true if # !(#,'&&*() > 2
#"" (#,'&&*()
studied. Null Hypothesis '-: *! = *" '-: *! = *"
Alternative Hypothesis '!: *! ≠ *" '!: *! ≠ *"
-̅ − -̅" − (*! − *")
,= ! = -̅! − -̅" − (*! − *")
1 1 ,=
Test Statistic, t 2. + 2!" 2""
4! 4"
" /!0! #!""1(/"0!)#""" 4! + 4"
where 2. = / 0! 1(/ 0!)
/!!0! 1(/""0!)
"
2!" 2""
+
4! 4"
6=
Degrees of freedom, 6 6 = (4!+ 4" − 2) 2!" " 2"" "
4! 4
𝑥̅% − 𝑥̅* + "
𝜇% − 𝜇* 4! − 1 4" − 1
DSC7215
R Code (95% t.test(data1, data2, var.equal =
Confidence Interval) TRUE) t.test(data1, data2)
ttest_ind(data1, data2,
Python Code (95% CI) ttest_ind(data1, data2)
equal_var = False)
“If we have the attitude that it’s going to be a great day it usually is.” – Catherine Pulsifier 18
DSC7215
The Times of India, Hyderabad edition, October 25, 2020
“You can either experience the pain of discipline or the pain of regret. The choice is yours.” – Anonymous 19
t-Test in Pharmaceutical / Healthcare Industry
Antibiotic rifampicin increases the amount of
drug metabolizing enzyme present in the
liver. This causes increase in the rate of
elimination of a lot of other drugs.
An experiment was conducted to study
whether rifampicin affects the metabolic
removal of the anti-asthma drug
theophylline. A high elimination rate would
mean inadequate treatment of the patient’s Image Source: Deccan Chronicle, Hyderabad edition, May 04,
2016
asthma.
DSC7215
“Impossible is just an opinion.” – Paulo Coelho 20
t-Test in Pharmaceutical / Healthcare Industry
Two groups of 15 subjects were pre-treated with oral rifampicin (600 mg daily
for 10 days) and a placebo, respectively. All of them were then given
intravenous injection of theophylline (3 mg/kg of body weight).
Drug content was then measured from the blood samples and efficiency of
removal of theophylline reported as clearance (in ml/min/kg).
DSC7215
“Your passion is waiting for your courage to catch up.” – Isabelle Lafleche 21
t-Test in Pharmaceutical / Healthcare Industry
Clearance of theophylline (ml/min/kg)
Control Subjects Treated Subjects
0.81 0.56 0.46 1.15 1.15 0.92
1.06 0.45 0.43 1.28 0.72 0.67
0.43 0.88 0.37 1.00 0.79 0.76
0.54 0.73 0.73 0.95 0.67 0.82
0.68 0.43 0.93 1.06 1.21 0.82
𝑛( = 15 𝑛; = 15
𝑥( = 0.633 𝑥; = 0.931
𝑠( = 0.216 𝑠; = 0.202
𝑠9:;*<:= 0.216
DSC7215
= = 1.07, 𝑖. 𝑒. , < 𝟐
𝑠><?@*?A 0.202
Hence, Student’s t-test is used assuming 𝜎7+ = 𝜎++
“Magic is believing in yourself. If you can make that happen, you can make anything happen.” – Johann Wolfgang Von Goethe 22
t-Test in Pharmaceutical / Healthcare Industry
Hypothesis Testing Approach
DSC7215
Rifampicin decreases theophylline clearance.
“If something is important enough, even if the odds are stacked against you, you should still do it.” – Elon Musk 23
t-Test in Pharmaceutical / Healthcare Industry
At α = 0.05, determine if there is a significant difference between the two groups.
DSC7215
simply compare with 0.05 and not worry about having to compare with 0.025.
“Don’t be afraid to give up the good to go for the great.” – John D Rockefeller 24
t-Test in Pharmaceutical / Healthcare Industry
Hypothesis Testing Approach
DSC7215
“People who wonder if the glass is half empty or full miss the point. The glass is refillable.” – Anonymous 25
t-Test in Pharmaceutical / Healthcare Industry
Confidence Intervals Approach
𝑥7 − 𝑥+ − (𝜇7 − 𝜇+)
𝑡=
1 1
𝑠B 𝑛7 + 𝑛+
Rewriting:
1 1 1 1
𝑥7 − 𝑥+ − 𝑡 ∗ 𝑠B + ≤ 𝜇7 − 𝜇+ ≤ 𝑥7 − 𝑥+ + 𝑡 ∗ 𝑠B +
𝑛7 𝑛+ 𝑛7 𝑛+
0.298 − 2.048 ∗ 0.0763 ≤ 𝜇7 − 𝜇+ ≤ 0.298 + 2.048 ∗ 0.0763
95% CI: (0.142, 0.454)
qt(0.025, 28) qt(0.975, 28)
DSC7215
Note zero difference (null hypothesis) is unlikely as at 95% Confidence Level, the
difference ranges between 0.142 and 0.454 ml/min/kg, with a point estimate for the
difference in mean clearance being 0.298 ml/min/kg.
“Things may come to those who wait, but only the things left by those who hustle.” – Abraham Lincoln 26
t-Test in Pharmaceutical / Healthcare Industry
R
HYPOTHESIS TEST
CONFIDENCE INTERVAL
DSC7215
All software output double the calculated value for 2-tailed tests to
'
enable easy check against the significance level (𝛼) instead of with ( .
The eCommerce company already tested if the average spend on Version A of the
site was ₹2,000 or not using 1-sample t-test. Now they conducted another test.
200 people (100 male and 100 female) were randomly shown a different version (A
DSC7215
or B) of the webpage and their purchases tracked. Is there a significant difference
in the purchase amounts between the 2 variants?
DSC7215
312 982
1035 1129
837
1057
“Work like there is someone working twenty four hours a day to take it away from you.” – Mark Cuban 29
t-Test: A/B Testing in eCommerce Industry
What is the null hypothesis?
𝐻6: 𝜇7 − 𝜇+ = 0 (Variant A is not different from Variant B)
DSC7215
“Hustle in silence and let your success make the noise.” – Anonymous 30
t-Test: A/B Testing in eCommerce Industry
Python: ttest_ind(VariantA, VariantB)
HYPOTHESIS TEST
CONFIDENCE INTERVAL
DSC7215
Can true difference in means be equal to 0?
R code: pt(-1.0111,38) = 0.1592
All software output double the calculated value for 2-tailed tests to
'
enable easy check against the significance level (𝛼) instead of with .
(
“We are what we repeatedly do. Excellence, then, is not an act, but a habit.” – Aristotle 31
t-Test: A/B Testing in eCommerce Industry
Will you reject the null hypothesis or fail to do so?
DSC7215
“If you’re offered a seat on a rocket ship, don’t ask what seat! Just get on.” – Sheryl Sandberg 32
t-Test in Manufacturing Industry
A machine produces metal sheets with 22mm thickness. There is variability in
thickness due to machines, operators, manufacturing environment, raw material,
etc. The company wants to know the consistency of two machines and randomly
samples 10 sheets from machine 1 and 12 sheets from machine 2. Thickness
measurements are taken. Assume sheet thickness is normally distributed in the
population.
The company wants to know if the two samples come from the same population
(population means are equal) or from different populations (population means are
unequal).
DSC7215
How do you test this?
“If you hear a voice within you say ‘you cannot paint,’ then by all means paint and that voice will be silenced.” – Vincent Van Gogh 33
t-Test in Manufacturing Industry
Machine 1 Machine 2
22.3 21.9 22.0 21.7
21.8 22.4 22.1 21.9
22.3 22.5 21.8 22.0
21.6 22.2 21.9 22.1
21.8 21.6 22.2 21.9
22.0 22.1
𝜇 = 22.040 𝑛 = 10 𝜇 = 21.975 𝑛 = 12
𝑠R@STO;?7 0.34
= = 2.37, 𝑖. 𝑒. , > 𝟐
DSC7215
𝑠R@STO;?+ 0.14
Hence, Welch’s t-test is used assuming 𝜎7+ ≠ 𝜎++
“Some people want it to happen, some wish it would happen, others make it happen.” – Michael Jordan 34
t-Test: A/B Testing in Manufacturing Industry
What is the null hypothesis?
𝐻6: 𝜇7 − 𝜇+ = 0 (Machine 1 is not different from Machine 2)
DSC7215
“Great things are done by a series of small things brought together.” – Vincent Van Gogh 35
t-Test in Manufacturing Industry
Python: ttest_ind(VariantA, VariantB, equal_var = False)
R: t.test(Machine1, Machine2)
HYPOTHESIS TEST
CONFIDENCE INTERVAL
DSC7215
R code: pt(0.5687,11.655, lower.tail = FALSE) = 0.2901765
All software output double the calculated value for 2-tailed tests to
'
enable easy check against the significance level (𝛼) instead of with .
(
“Leaders can let you fail and yet not let you be a failure.” – Stanley McChrystal 36
t-Test in Manufacturing Industry
Will you reject the null hypothesis or fail to do so?
DSC7215
“It’s not the load that breaks you down, it’s the way you carry it.” – Lou Holtz 37
Learning Outcomes •
•
These are the expected outcomes from your learning today.
If you can confidently and fluently answer the below, you have understood the key aspects.
• Before asking questions, you must first revise the material and the videos of today’s session.
of Today’s Session • Your questions must clearly indicate your understanding of the topic before we clarify.
▪ LO14.4 – Be able to provide a simple definition and/or explanation of degrees of freedom and compute
them in various problems
▪ LO14 – Be able to explain t-test and give some real-world examples of its use
▪ LO14.2 – Be able to explain the differences and similarities between the t and the z-distribution
▪ LO14.3 – Be able to use one-sample and two-sample t-tests and take decisions using both Confidence
Intervals and Hypothesis Testing approaches based on the software output
DSC7215
“The hard days are what make you stronger.” – Aly Raisman 38
Email: info@insofe.edu.in Website: www.insofe.edu.in
This presentation may contain references to findings of various reports available in the public domain. INSOFE makes no representation as to their accuracy or that the organization subscribes to those findings. 39