
MATH 006B Statistical Analysis with Software Applications

Module 3
The t Distribution
The Normal Distribution
• A symmetric continuous distribution with two parameters
➢ The mean
➢ The standard deviation (std)
• A Standard Normal Distribution
➢ Mean = 0
➢ Std = 1
The Standard Normal Distribution
It is symmetric about zero with a fixed standard deviation = 1

The t Distribution
a.k.a. the Student’s t Distribution
• A symmetric distribution centered at zero
• Has a single parameter called the degrees of freedom or df

The t Distribution, degrees of freedom
• It is the parameter of the t-distribution
• A mathematical constraint imposed by the data
• Linked to the size of the data being used
• A larger set of data has greater degrees of freedom
• As the degrees of freedom increase, the t-distribution becomes closer
and closer to the standard normal distribution

The t Distribution
• In a t-distribution, as the degrees of freedom increase, the
t-distribution becomes closer to the Standard Normal Distribution.
• Does not have any stand-alone business application
• Used as an interim tool to calculate confidence intervals and
hypothesis testing
• Has an associated probability density function
• Associated Excel functions are:
➢ = T.DIST
➢ = T.INV

The t Distribution, T.DIST function
• The T.DIST Function calculates the probability to the left of any
particular point in a t distribution.
Syntax: = T.DIST(x, df, TRUE)
Example: = T.DIST(1.3, 21, TRUE) = 0.89616

The t Distribution, T.INV Function
• The T.INV Function is an inverse of the T.DIST Function
Syntax: = T.INV(probability, df)
Example: = T.INV(0.05, 15) = -1.75305
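For readers working outside Excel, the following is a minimal sketch of what T.DIST and T.INV compute, using only the Python standard library. The function names t_pdf, t_cdf and t_inv are illustrative, not part of any library. The left-tail probability is approximated with Simpson's rule and the inverse is found by bisection, which is an assumption of this sketch and not necessarily how Excel evaluates these functions.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """Probability to the left of x, like T.DIST(x, df, TRUE).

    Uses symmetry: 0.5 plus the signed area between 0 and x,
    approximated with Simpson's rule (steps must be even).
    """
    if x == 0:
        return 0.5
    h = x / steps  # negative when x < 0, so the area is subtracted
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_inv(p, df):
    """Point with left-tail probability p, like T.INV(p, df).

    Bisection on [-50, 50]; assumes the answer lies in that range.
    """
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_cdf(1.3, 21), 5))   # compare with = T.DIST(1.3, 21, TRUE)
print(round(t_inv(0.05, 15), 5))  # compare with = T.INV(0.05, 15)
```

Note that T.INV with a probability below 0.5 returns a negative value, which is why the cutoff formulas in later modules wrap it in an absolute value.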



• Example:
Philippine Presidential Election, predicting the proportion of votes
for a popular candidate
Confidence interval for the ‘Population Proportion’
• Example:
Average starting salary of all business students who graduated last
year in TIP
Confidence interval for the ‘Population Mean’
Gosset and Guinness
The z statistic and the t statistic
Central Limit Theorem (CLT)

William Sealy Gosset, 1876 - 1937
Confidence interval
• It is an ‘interval’ with some ‘confidence’ or probability attached to
it
• An interval for some unknown characteristic of the population data
• Example.
Predicting the actual share of votes of a particular candidate in a bi-
party Philippine presidential election.
Philippine Presidential Election
➢ Only two candidates, A and B
➢ Wish to predict vote percentages for candidate A
➢ A random selection of 500 potential voters surveyed.
➢ 300 voters will vote for A and 200 for B
➢ That is, 60% of surveyed voters will vote for A
Will candidate A get 60% of all votes in the actual election?
• A more realistic scenario is where the population standard deviation
σ is not known.
Confidence Interval
Example.
Philippine Presidential Election
A 95% confidence interval for the vote share of candidate A:
[55.7%, 64.3%]
There is a 0.95 probability that the actual share for A will be between
55.7% and 64.3%
More about the z-scores or z-statistic
• A data point’s distance from the population mean, measured in
standard deviations.
• Also called the Standard Score
• In the finance world, the Altman Z-score can indicate whether a
company is near bankruptcy.
• In the quality management world, z-scores are used in Six Sigma
Performance Evaluations.
The Altman Z-Score

Confidence Interval Construction
A stylized example …
A random sample of 20 observations from a population had a
mean equal to 70. The standard deviation of the population data is 10.
Find an 85% confidence interval for the population mean.
The probability outside the confidence interval is referred to as α
… and we wish to construct a (1 – α) confidence interval for the
population mean.
A (1 – α) confidence interval for the population mean …
$$\bar{x} - |z_{\alpha/2}| \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + |z_{\alpha/2}| \frac{\sigma}{\sqrt{n}}$$
$$\text{Margin of Error} = |z_{\alpha/2}| \frac{\sigma}{\sqrt{n}}$$
z α/2: = NORM.INV(0.15/2, 0, 1)
= -1.4395
$$\text{Margin of Error} = |-1.4395| \frac{10}{\sqrt{20}} = 3.22$$
$$66.78 < \mu < 73.22$$
… an 85% Confidence Interval for the population mean

What if we want a 95% Confidence Interval for the population mean?
$$65.62 < \mu < 74.38$$
What if we want a 99% Confidence Interval for the population mean?
$$64.24 < \mu < 75.76$$
Trade-off between ‘Precision’ and ‘Uncertainty’
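The stylized example can be reproduced outside Excel. The sketch below assumes Python's standard-library statistics.NormalDist, whose inv_cdf method plays the role of NORM.INV(probability, 0, 1); the function name z_confidence_interval is illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence):
    """(1 - alpha) confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = abs(NormalDist().inv_cdf(alpha / 2))  # |z_(alpha/2)|
    margin = z * sigma / sqrt(n)              # margin of error
    return sample_mean - margin, sample_mean + margin

# Sample of 20 observations, sample mean 70, population std 10.
for level in (0.85, 0.95, 0.99):
    low, high = z_confidence_interval(70, 10, 20, level)
    print(f"{level:.0%}: {low:.2f} < mu < {high:.2f}")
```

Running it for the three confidence levels shows the trade-off directly: a higher confidence level widens the interval, so more certainty comes at the cost of less precision.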
Module 4
Confidence Interval
A (1 – α) confidence interval for the population mean …
$$\bar{x} - |z_{\alpha/2}| \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + |z_{\alpha/2}| \frac{\sigma}{\sqrt{n}}$$
Where:
α = the probability outside the Confidence Interval
μ = the unknown population mean
x̄ = the sample mean

In the past lessons we introduced the calculation of a confidence interval
for the population mean.
When the population standard deviation (σ) is not known,
➢ we replace it by the sample standard deviation (s)
➢ the z-statistic (z α/2) gets replaced by the t-statistic (t α/2)

Example
A person wishes to explore the size of single family houses that are
typically available for purchase in a particular neighborhood of
Marikina Heights in Marikina City. She manages to get hold of a list
containing house sizes of a sample of 100 houses that were sold in
the past two years. The data is provided in the Excel data file
Home_Sizes.xlsx, and given this data she wishes to assess the
average size of houses typically available for purchase in this
neighborhood.
➢ Population standard deviation is not known
➢ Need to use the confidence interval calculation based on the t
statistic
A 95% confidence interval for the average (population mean) house
size: [3048.2, 3428.3] square feet

Confidence Interval for the Population Proportion
Example:
A consultancy firm surveyed a randomly selected set of 210 CEOs of
‘fast growing small companies’ in the US and Europe. Only 51% of
these executives had a management succession plan in place; the
remaining executives did not have one.
Use this information to compute a 90% confidence interval to estimate
the proportion of all ‘fast growing companies’ that have a management
succession plan.
$$\hat{p} - |z_{\alpha/2}| \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + |z_{\alpha/2}| \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Where:
p = unknown population proportion
p̂ = sample proportion
|z α/2| = absolute value of NORM.INV(probability, mean, std), with
probability = α/2, mean = 0 and std = 1

Answer:
The 90% confidence interval is [0.453, 0.567] or [45.3%, 56.7%]
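The CEO survey answer can be checked with a short sketch. It assumes the usual normal-approximation interval for a proportion, sample proportion plus or minus |z α/2| times the square root of p̂(1 − p̂)/n, and uses statistics.NormalDist from the Python standard library in place of NORM.INV. The function name is illustrative. The same function also reproduces the election interval from Module 3 (300 of 500 surveyed voters, 95% confidence).

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    alpha = 1 - confidence
    z = abs(NormalDist().inv_cdf(alpha / 2))   # |z_(alpha/2)|
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# CEO succession-plan survey: 51% of 210 CEOs, 90% confidence.
ceo_low, ceo_high = proportion_confidence_interval(0.51, 210, 0.90)
print(f"CEO survey: [{ceo_low:.3f}, {ceo_high:.3f}]")

# Election survey: 300 of 500 voters for candidate A, 95% confidence.
vote_low, vote_high = proportion_confidence_interval(300 / 500, 500, 0.95)
print(f"Election:   [{vote_low:.3f}, {vote_high:.3f}]")
```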

Confidence Interval
• For the unknown Population Mean

• For the unknown Population Proportion

Sample Size Calculation


How big a sample to take?
• A pollster wanting to make a prediction about a particular
candidate’s vote share in the Philippine Presidential Election.
• A quality control manager at a battery manufacturer wanting to
estimate the average number of defective batteries contained in a
box shipped by the company.
Different industries (disciplines) may have different rule-of-thumb
strategies for sample size selection.
The Statistics behind it.
Typically, we are interested in,
➢ The Population Mean
➢ The Population Proportion
… and we build confidence intervals for these unknown quantities.
• The pollster may want to have a Margin of Error +/- 3% with a
confidence level of 95%.
• The quality control manager may want to assess the average
number of defectives in a box with a margin of error of +/- 0.3
batteries and a confidence level of 95%.
Example:
A quality control manager at a battery manufacturer wants to estimate
the average number of defective batteries contained in a box shipped
by the company.
How many boxes does she need to open to figure out the average
number of defective batteries contained in a box?
Margin of Error : +/- 0.3 batteries
Confidence Level : 95%
Population Std. Dev : ? → 0.9 batteries
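A sketch of the calculation follows. It assumes the sample size is obtained by solving the margin-of-error formula for n, which gives n = (|z α/2| σ / E)² for a mean and n = |z α/2|² p̂(1 − p̂) / E² for a proportion, rounded up to the next whole unit. For the pollster's case the sketch assumes the conservative choice p̂ = 0.5, which maximizes p̂(1 − p̂). Function names are illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n with |z| * sigma / sqrt(n) <= margin."""
    z = abs(NormalDist().inv_cdf((1 - confidence) / 2))
    return ceil((z * sigma / margin) ** 2)

def sample_size_for_proportion(margin, confidence, p_hat=0.5):
    """Smallest n with |z| * sqrt(p_hat * (1 - p_hat) / n) <= margin.

    p_hat = 0.5 is the conservative (largest-n) assumption.
    """
    z = abs(NormalDist().inv_cdf((1 - confidence) / 2))
    return ceil(z ** 2 * p_hat * (1 - p_hat) / margin ** 2)

# Battery example: margin of error 0.3 batteries, 95% confidence, std 0.9.
boxes = sample_size_for_mean(sigma=0.9, margin=0.3, confidence=0.95)
print("Boxes to open:", boxes)

# Pollster example: margin of error 3%, 95% confidence.
voters = sample_size_for_proportion(margin=0.03, confidence=0.95)
print("Voters to survey:", voters)
```

Rounding up is deliberate: rounding down would give a margin of error slightly larger than the one requested.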
Sample Size Calculation
How big a sample to take?
A pollster wants to make a prediction about a particular candidate’s
vote share in the Philippine Presidential Election.
How many voters should the pollster survey?
The pollster wants the prediction to have a Margin of Error of +/- 3%
with a confidence level of 95%.
Use a conservative estimate of p̂.
For n to be maximum, p̂(1 − p̂) has to be maximized, so p̂ has to be 0.5.
Many industries use rule-of-thumb strategies / heuristics.

Hypothesis Testing
• Hypothesis tests are an important tool to analyze data and make
some inferences from it.
• All hypothesis tests follow a basic logic…
1. An assumption or claim is made.
2. If your data contradicts this assumption or claim then you
conclude that the claim or assumption made must be wrong.
• Example:
As I drove to work one day recently, I assumed that the road that I
normally take to school would, as usual, take me to school.
There is the assumption, part 1 of the hypothesis.
But I reached a construction barricade; the road was closed.
There is the data.
It contradicts my assumption. So my hypothesis is wrong.
• Another Example:
You are the production manager at a beverage manufacturing firm and
you receive a bottling unit that has been recently re-adjusted so that it
puts 200 milliliters of beverage in disposable plastic bottles.
There is the assumption. You assume that the unit actually puts
200 milliliters of beverage in the bottles.
• Another Example Continuation:
Next, rather than accepting this assumption, you decide to test it using
data. You fill up 10 bottles using the unit at different times so as to
obtain a random sample and very carefully measure the amount of
beverage inside each bottle.
This is your data. A random sample of 10 observations on the
amount of beverage in the bottles.
• What would you conclude…
If the average amount of beverage per bottle across these 10 bottles
is 170 milliliters?
Easy to conclude that the bottling unit is not properly adjusted.
• What would you conclude…
If the average amount of beverage per bottle across these 10 bottles
is 200 milliliters?
Again, the conclusion seems easy given the evidence.
• What would you conclude…
If the average amount of beverage per bottle across these 10 bottles
is 199.9 milliliters or 200.1 milliliters?
Perhaps, you would conclude that the unit is properly adjusted.
• What would you conclude…
If the average amount per bottle in the sample turns out to be 199.1
ml? 198 ml? 202 ml? …?
At what point would you start rejecting the assumption that the
unit puts in 200 ml of beverage?
• Use your ‘gut feeling’
• A more scientific procedure is to use hypothesis testing
• It takes into account…
➢ Size of the sample
➢ Variability in the sample
➢ Level of ‘significance’ you desire in your conclusion
A hypothesis test is a scientific tool to aid your decision making.
It is a widely used procedure to test a variety of claims…
• Testing the fuel efficiency claim of a car manufacturer.
• Testing the claims of efficacy made by a new drug.
• Testing the claim that the defective rate in your production process
is greater than the acceptable limit.
• Testing many other claims.
Module 5
The Logic of Hypothesis Testing
• All hypothesis tests follow a basic logic…
1. An assumption or claim is made.
2. If your data contradicts this assumption or claim then you
conclude that the claim or assumption made must be wrong.
• Example:
You are the production manager at a beverage manufacturing firm and
you receive a bottling unit that has recently been re-adjusted so that it
puts 200 milliliters of beverage in disposable plastic bottles.
You need to test that the bottling unit puts in 200 milliliters of beverage.
For that you fill up 10 bottles using the unit at different times so as to
obtain a random sample and very carefully measure the amount of
beverage inside each bottle.
Population mean claimed by the bottling unit: μ = 200 ml
Using the Central Limit Theorem
If the claim is correct:
$$\bar{x} \sim \text{Normal}\left(200, \frac{\sigma}{\sqrt{n}}\right)$$
[Figure: distribution of the sample mean if the claim is correct, with
higher probability around the mean and lower probability in the tails]
If the sample mean falls in the rejection region, we reject the claim.
If the sample mean does not fall in the rejection region, we do not
reject the claim.

Distribution of sample mean
Sample mean falling in the rejection region
≡ The z-statistic falling in the rejection region
Sample mean falling in the rejection region
≡ The t-statistic falling in the rejection region

Guidelines for Framing the Null Hypothesis
• Null hypothesis should not have a strict inequality
Null hypothesis cannot have: ≠ > <
Null hypothesis can only have: = ≥ ≤

Recap of formulas
1. Calculating the t-statistic
$$t\text{-statistic} = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
2. Calculating rejection regions for the t-statistic

Example for when the Null hypothesis turns out to have a strict
inequality.
We wish to test a claim that the average age of men MBA students
across various MBA programs in Metro Manila is greater than 28
years. For this we collect data on average ages of men MBA students
across a sample of 40 MBA programs in Metro Manila.
Hypothesis Testing Steps 1
Steps in Hypothesis Testing
• Example:
You are the production manager at a beverage manufacturing firm and
you receive a bottling unit that has recently been re-adjusted so that it
puts 200 milliliters of beverage in disposable plastic bottles.
You need to test that the bottling unit puts in 200 milliliters of
beverage.
For that you fill up 10 bottles using the unit at different times so as to
obtain a random sample and very carefully measure the amount of
beverage inside each bottle. (Standard deviation = 0.8 ml)
Let us assume that:
n = 10, x̄ = 199 ml, s = 0.8 ml
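With these numbers the test can be sketched as below. The source does not state a significance level for this example, so the sketch assumes a two-tailed test with α = 0.05 and df = n − 1 = 9, for which the t-cutoff is about 2.262 (what = T.INV(0.05/2, 9) returns in absolute value). The sketch also converts the cutoff back into milliliters to illustrate that "the sample mean falls in the rejection region" and "the t-statistic falls in the rejection region" are the same statement.

```python
from math import sqrt

n, sample_mean, s = 10, 199.0, 0.8   # values given in the example
claimed_mean = 200.0                 # the claim being tested (null hypothesis)
t_cutoff = 2.262                     # ASSUMPTION: two-tailed, alpha = 0.05, df = 9

# Step 2: t-statistic = (sample mean - claimed mean) / (s / sqrt(n))
standard_error = s / sqrt(n)
t_statistic = (sample_mean - claimed_mean) / standard_error

# Step 4, stated on the t scale
reject_by_t = abs(t_statistic) > t_cutoff

# Step 4, stated on the scale of the sample mean (in ml)
lower = claimed_mean - t_cutoff * standard_error
upper = claimed_mean + t_cutoff * standard_error
reject_by_mean = sample_mean < lower or sample_mean > upper

print(f"t-statistic = {t_statistic:.2f}, cutoff = +/- {t_cutoff}")
print(f"Do-not-reject range for the sample mean: [{lower:.2f}, {upper:.2f}] ml")
print("Reject the claim:", reject_by_t, reject_by_mean)
```

Under these assumptions both checks agree and the claim of 200 ml is rejected. A different α would change the cutoff and possibly the decision.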

H0: μ ≤ 28 (Do not reject)
HA: μ > 28 (Reject)
➢ Do not reject the Null hypothesis
➢ Reject the Alternate hypothesis
Conclusion
The average age of men MBA students across various MBA
programs in Metro Manila is less than or equal to 28 years.

• Marketing Example:
Subaru of America rates the customer satisfaction of its dealers on a
weekly basis on its Purchase Experience Survey, and demands that
dealers achieve a 93% satisfaction score, or the dealers are required
to take additional training to improve their customer satisfaction
scores. Suppose that you have selected a random sample of rating
forms submitted by new car purchasers (either online or through the
mail) for the St. Louis Subaru dealer from a recent week and that you
have prepared the hypothetical table Business_Case.xlsx (Mktg
Customer Satisfaction worksheet).
Hypothesis Testing of Surveyed Items
Step 1: Formulate the hypothesis
Null hypothesis: H0: μ = 4
Alternate hypothesis: HA: μ ≠ 4
Step 2: Calculate the t-statistic
$$t\text{-statistic} = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{5.09 - 4}{1.51 / \sqrt{22}} = 3.39$$
Step 3: Calculate the t-cutoff
Two Tail Test
t-cutoff = +/-|T.INV(α/2,df)|
t-cutoff = +/-|T.INV(0.05/2, 21)|
t-cutoff = +/- 2.08
Step 4: Check if the t-statistic falls within the rejection region.
The t-statistic falls within the rejection region
Conclusion:
Since the t-statistic falls within the rejection region, we should reject
the Null hypothesis. We can say that at 95% confidence level, the new
car buyers at St. Louis Subaru agree that the salesperson was
knowledgeable about the Subaru model line.

• Human Resources Example:
Suppose that you work in the Human Resources department of your
company and that top management has asked your department to
conduct a Morale Survey of managers to determine their attitude
toward working in this company. To check your Excel skills, you have
drawn a random sample of the results of the survey from the managers
on one question, and the data from a certain item in the instrument
appear. Open Business_Case.xlsx (HRM Morale Survey worksheet)
Step 1: Formulate the hypothesis
Null hypothesis: H0: μ = 5
Alternate hypothesis: HA: μ ≠ 5
Step 2: Calculate the t-statistic
$$t\text{-statistic} = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{4.72 - 5}{1.90 / \sqrt{25}} = -0.74$$
Step 3: Calculate the t-cutoff
Two Tail Test
t-cutoff = +/-|T.INV(α/2,df)|
t-cutoff = +/-|T.INV(0.05/2, 24)|
t-cutoff = +/- 2.06
Step 4: Check if the t-statistic falls within the rejection region.
The t-statistic does not fall within the rejection region
Conclusion:
Since the t-statistic does not fall within the rejection region, we
fail to reject the Null hypothesis. We can say that at 95% confidence
level, the managers rate the intellectual challenge provided by their
jobs as neither high nor low.

Hypothesis Test for Population Proportion
• Example:
TIP introduces a new lunch facility on campus on a trial basis. TIP
operates the lunch facility for a few months and then decides to survey
the student body. Based on the survey, TIP would make this facility a
permanent fixture or do away with it. Specifically, if 70% or more of
the student body approves of it, the facility would be made permanent;
otherwise it would be shut down.
TIP conducts a survey with 750 randomly selected students on
campus and finds that 510 of these students (or 68% of the sampled
students) approve of the new facility and the remaining 240 students
(or 32%) do not approve of it.
Based on the criteria set by TIP, should the facility be made
permanent?
• Population Proportion rather than the Population Mean
• The facility would be made permanent if ≥ 70% of the entire student
body approves it.
• TIP has a sample of 750 responses
Step 1: Formulate the hypothesis
Null hypothesis H0: p ≥ 0.70
Alternate hypothesis HA: p < 0.70
Step 2: Calculate the z-statistic (instead of the t-statistic)
$$z\text{-statistic} = \frac{\bar{p} - p}{\sqrt{\frac{p(1 - p)}{n}}} = \frac{0.68 - 0.70}{\sqrt{\frac{0.70(1 - 0.70)}{750}}} = -1.1952$$
Where: p̄ is the sample proportion, p is the population proportion, and
n is the sample size
The following conditions should be met
$$n\bar{p} > 5 \text{ and } n(1 - \bar{p}) > 5$$
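The z-statistic for the TIP lunch facility example (510 approvals out of 750 students, tested against p = 0.70) can be checked with a short sketch. It assumes a one-tailed test with the rejection region on the left and α = 0.05, and uses statistics.NormalDist from the Python standard library in place of NORM.INV. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 750          # students surveyed
approve = 510    # students who approve of the facility
p_null = 0.70    # boundary value in H0: p >= 0.70
alpha = 0.05     # significance level

p_bar = approve / n  # sample proportion, 0.68

# Success/failure conditions for the normal approximation
conditions_ok = n * p_bar > 5 and n * (1 - p_bar) > 5

# Step 2: z-statistic
z_statistic = (p_bar - p_null) / sqrt(p_null * (1 - p_null) / n)

# Step 3: left-tail cutoff, the counterpart of NORM.INV(alpha, 0, 1)
z_cutoff = NormalDist().inv_cdf(alpha)

# Step 4: reject H0 only if the z-statistic lies to the left of the cutoff
reject_null = z_statistic < z_cutoff

print(f"conditions met: {conditions_ok}")
print(f"z-statistic = {z_statistic:.4f}, cutoff = {z_cutoff:.3f}")
print("Reject H0:", reject_null)
```

Because the z-statistic lies to the right of the left-tail cutoff, the Null hypothesis p ≥ 0.70 is not rejected.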
Success/Failure Condition (Approximately Normal Condition)
Sometimes with a binomial experiment, we can use a normal distribution to
approximate a binomial. This can make finding probabilities easier for some
experiments, especially when you’re dealing with large samples. The
Success/Failure condition is used to figure out if a sample size in a binomial
experiment is large enough to use the normal approximation.
The following conditions should be met
$$n\bar{p} > 5 \text{ and } n(1 - \bar{p}) > 5$$
Step 3: Cutoff values for the z-statistic, α = 0.05
[One tail test, rejection region on the left hand side]
Step 4: Check whether the z-statistic falls in the rejection region
Conclusion
• Do not reject the Null hypothesis.
• The lunch facility should be made permanent.

• Another Example:
In a survey of independent grocery owners, they were asked to consider what
they believed to be their biggest competitive threat. Some noted large-chain
supermarkets, wholesale clubs, and other independent grocers. The common
response, given by 78 of the 151 owners, was that they viewed Wal-Mart as
their biggest competitive threat. The claim is that half of the independent
grocery owners view Wal-Mart as their biggest threat.
Step 1: Formulate the hypothesis
Null hypothesis H0: p = .50
Alternate hypothesis HA: p ≠ .50
Step 2: Calculate the z-statistic
$$z\text{-statistic} = \frac{\bar{p} - p}{\sqrt{\frac{p(1 - p)}{n}}}$$
Where: p̄ = sample proportion; 78/151 = 0.5166
p = population proportion; .50
n = sample size; 151
$$z\text{-statistic} = \frac{0.5166 - 0.50}{\sqrt{\frac{0.50(1 - 0.50)}{151}}} = 0.40797$$
Step 3: Calculate the z-cutoff values, α = 0.05
Two-Tailed Test
z-cutoff = +/-|NORM.INV(α/2, 0, 1)|
z-cutoff = +/-ABS(NORM.INV(0.05/2, 0, 1))
z-cutoff = +/- 1.96
Step 4: Check if the z-statistic falls within the rejection regions.
(the orange-colored areas are the rejection regions)
The z-statistic does not fall within the rejection regions.
Fail to reject the Null hypothesis.
Conclusion:
Since the z-statistic does not fall within the rejection regions, we fail to reject
the Null hypothesis. In other words, the data do not contradict the claim that
half of the independent grocery owners consider Wal-Mart as their
biggest competitive threat.

Type I and II Errors
Two Types of Possible Error
• Type I Error
• Type II Error
Example
Your friend Sam claims that he can shoot 40 or more baskets in an hour from
the 3-point line in a basketball court. So, Sam is making a claim about a
population parameter, in this case his true shooting ability from the
3-point line. This can be likened to the population mean μ. Thus Sam is
claiming that the population mean μ of his shooting ability is greater than
or equal to 40 baskets in an hour from the 3-point line.
Next, you decide to test this claim. For that, you take Sam to the basketball
court every day for 10 days and make him shoot baskets from the 3-point line
for an hour every day. You end up with 10 data points, which are the number
of baskets Sam shot on each of those 10 days. You can calculate the sample
mean and the sample standard deviation from these ten observations.
Step 1: Formulate the hypothesis
Null hypothesis H0: μ ≥ 40
Alternate hypothesis HA: μ < 40
Step 2: Calculate the t-statistic
Step 3: Cutoff values for the t-statistic
Step 4: Check whether the t-statistic falls in the rejection region

Suppose that Sam’s true ability is indeed ≥ 40.
However, the 10 days were not good for Sam.
He gave a low sample average.
You reject the Null hypothesis.
Type I Error: Rejecting the Null hypothesis when it is true.
(Would it be possible for you to tell if a Type I Error has occurred?)
‘α’, the significance level, is also known as the probability of a Type I Error.

Two Types of Possible Error
Suppose that Sam’s true ability is NOT ≥ 40.
However, the 10 days were lucky for Sam.
He gave a high sample average.
You DID NOT reject the Null hypothesis.
Type II Error: NOT Rejecting the Null hypothesis when it is false.

Reducing the probability of Type I and Type II Errors
• The probability of Type I Error is set by our choice of α.
• Typically, α = 0.05 or 0.01
• The probability of Type II Error can be reduced by taking a larger sample
size.
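The statement that α is the probability of a Type I Error can be illustrated with a small simulation. The sketch below is not part of the course material: it reuses the setting of the Wal-Mart example (n = 151, Null hypothesis p = 0.50, two-tailed test at α = 0.05), repeatedly draws samples for which the Null hypothesis is actually true, and counts how often the test rejects it anyway. The function names, the number of trials and the random seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(successes, n, p_null, alpha=0.05):
    """Return the z-statistic and whether H0: p = p_null is rejected."""
    p_bar = successes / n
    z = (p_bar - p_null) / sqrt(p_null * (1 - p_null) / n)
    cutoff = abs(NormalDist().inv_cdf(alpha / 2))
    return z, abs(z) > cutoff

def type_one_error_rate(n, p_null, alpha=0.05, trials=5000, seed=1):
    """Share of samples drawn under a TRUE null that the test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Each surveyed owner independently answers "yes" with probability p_null.
        successes = sum(rng.random() < p_null for _ in range(n))
        _, rejected = two_tailed_z_test(successes, n, p_null, alpha)
        rejections += rejected
    return rejections / trials

z, rejected = two_tailed_z_test(78, 151, 0.50)
print(f"Wal-Mart example: z = {z:.4f}, reject H0: {rejected}")

rate = type_one_error_rate(n=151, p_null=0.50)
print(f"Simulated Type I error rate: {rate:.3f}")
```

The z-statistic computed here uses the unrounded sample proportion 78/151, so it differs slightly from the 0.40797 obtained with the rounded value 0.5166. The simulated rejection rate should land close to α, though not exactly on it, because the number of "yes" answers is discrete and the simulation has sampling noise.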
