Module 3
The t Distribution
The Normal Distribution
• A symmetric continuous distribution, has two parameters
➢ The mean
➢ The standard deviation (std)
• A Standard Normal Distribution
➢ Mean = 0
➢ Std = 1
The Standard Normal Distribution
It is symmetric about zero with a fixed standard deviation = 1
The t Distribution
a.k.a. the Student’s t Distribution
• A symmetric distribution centered at zero
• Has a single parameter called the degrees of freedom or df
Example: = T.DIST(1.3, 21, TRUE) = 0.89616
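The cumulative probability that Excel's T.DIST(1.3, 21, TRUE) returns can be approximated without a statistics package. The sketch below is only an illustration, not how Excel computes it: the helper names `t_pdf` and `t_cdf` are hypothetical, and the integral is evaluated numerically with Simpson's rule, using the symmetry of the t distribution about zero.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x): 0.5 plus or minus the area between 0 and |x|.

    The area is computed with Simpson's rule; steps must be even.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# Same inputs as the Excel example: x = 1.3, df = 21, cumulative
print(round(t_cdf(1.3, 21), 4))
```

The printed value should agree with the 0.89616 from T.DIST to about three decimal places; a dedicated statistics library would normally be used for exact work.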
• Example:
Philippine Presidential Election, predicting the proportion of votes
for a popular candidate
Confidence interval for the ‘Population Proportion’
• Example:
Average starting salary of all business students who graduated last year in TIP
Confidence interval for the ‘Population Mean’
Gosset and Guinness
The z statistic and the t statistic
Central Limit Theorem (CLT)
William Sealy Gosset, 1876-1937
Confidence interval
• It is an ‘interval’ with some ‘confidence’ or probability attached to
it
• An interval for some unknown characteristic of the population data
• Example.
Predicting the actual share of votes of a particular candidate in a two-party Philippine presidential election.
Philippine Presidential Election
➢ Only two candidates, A and B
➢ Wish to predict vote percentages for candidate A
➢ A random selection of 500 potential voters surveyed.
➢ 300 voters will vote for A and 200 for B
➢ That is, 60% of surveyed voters will vote for A
Will candidate A get 60% of all votes in the actual election?
• A more realistic scenario is where the population standard deviation σ is not known.
Confidence Interval
Example.
Philippine Presidential Election
A 95% confidence interval for the vote share of candidate A:
[55.7%, 64.3%]
We are 95% confident that the actual share for A will be between 55.7% and 64.3%; that is, intervals constructed this way capture the actual share about 95% of the time.
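The interval for candidate A can be reproduced from the survey counts (300 of 500). This is a minimal sketch that assumes the usual normal-approximation formula, sample proportion plus or minus z times sqrt(p̂(1 − p̂)/n), which these notes do not spell out; the function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # z cutoff that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(300, 500, 0.95)
print(f"[{low:.1%}, {high:.1%}]")  # prints [55.7%, 64.3%]
```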
MATH 006B Statistical Analysis with Software Applications
Module 3
More about the z-scores or z-statistic
• A data point’s distance from the population mean, measured in standard deviations.
• Standard Score
• In the finance world, it can tell if a company is near bankruptcy using the Altman Z-score.
• In the quality management world, z-scores are used in Six Sigma Performance Evaluations.
The Altman Z-Score
$\text{Margin of Error} = |z_{\alpha/2}| \frac{\sigma}{\sqrt{n}}$
$z_{\alpha/2}$ = NORM.INV(0.15/2, 0, 1)
= -1.4395
$\text{Margin of Error} = |-1.4395| \frac{10}{\sqrt{20}} \approx 3.22$
$66.78 < \mu < 73.22$
… an 85% Confidence Interval for the population mean
What if we want a 95% Confidence Interval for the population mean?
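The margin-of-error calculation can be repeated for any confidence level. The sketch below uses Python's `statistics.NormalDist` in the role of NORM.INV; the function name `margin_of_error` is hypothetical, and the sample mean of 70 is an assumption inferred from the midpoint of the interval 66.78 to 73.22, since it is not stated in these notes.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, confidence):
    """|z_(alpha/2)| * sigma / sqrt(n) for a known population std sigma."""
    alpha = 1 - confidence
    z = abs(NormalDist(0, 1).inv_cdf(alpha / 2))  # same role as |NORM.INV(alpha/2, 0, 1)|
    return z * sigma / sqrt(n)

x_bar = 70  # ASSUMPTION: midpoint of the interval 66.78 to 73.22
for confidence in (0.85, 0.95):
    me = margin_of_error(sigma=10, n=20, confidence=confidence)
    print(f"{confidence:.0%}: {x_bar - me:.2f} < mu < {x_bar + me:.2f}")
# 85%: 66.78 < mu < 73.22
# 95%: 65.62 < mu < 74.38
```

Raising the confidence level from 85% to 95% increases |z| from about 1.44 to about 1.96, so the interval widens.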
Module 4
Confidence Interval
A (1 – α) confidence interval for the population mean …
Where:
α = the probability outside the Confidence Interval
𝜇 = the unknown population mean
𝑥̅ = the sample mean
Where:
p = the unknown population proportion
$\hat{p}$ = the sample proportion
$|z_{\alpha/2}|$ = |NORM.INV(probability, mean, std)|
Example
A person wishes to explore the size of single family houses that are
typically available for purchase in a particular neighborhood of
Marikina Heights in Marikina City. She manages to get hold of a list
containing house sizes of a sample of 100 houses that were sold in
the past two years. The data is provided in the Excel data file
Home_Sizes.xlsx, and given this data she wishes to assess the
average size of houses typically available for purchase in this
neighborhood.
➢ Population standard deviation is not known
➢ Need to use calculation for confidence interval based on t
statistic
Answer:
The 90% confidence interval is [0.453, 0.567] or [45.3%, 56.7%]
Confidence Interval
• For the unknown Population Mean
If the claim is correct: $\bar{x} \sim \text{Normal}\left(200, \frac{\sigma}{\sqrt{n}}\right)$
[Figure: distribution of the sample mean if the claim is correct, with higher probability around the mean and lower probability in the tails]
Recap of formulas
1. Calculating the t-statistic
$t\text{-statistic} = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
2. Calculating rejection regions for the t-statistic
Example for when the Null hypothesis turns out to have a strict inequality.
We wish to test a claim that the average age of men MBA students across various MBA programs in Metro Manila is greater than 28 years. For this we collect data on average ages of men MBA students across a sample of 40 MBA programs in Metro Manila.
H0: μ ≤ 28 (Do not reject)
HA: μ > 28 (Reject)
➢ Do not reject the Null hypothesis
➢ Reject the Alternate hypothesis
Conclusion
The average age of men MBA students across various MBA programs in Metro Manila is less than or equal to 28 years.
• Marketing Example:
Subaru of America rates the customer satisfaction of its dealers on a weekly basis on its Purchase Experience Survey, and demands that dealers achieve a 93% satisfaction score, or the dealers are required to take additional training to improve their customer satisfaction scores. Suppose that you have selected a random sample of rating forms submitted by new car purchasers (either online or through the mail) for the St. Louis Subaru dealer from a recent week and that you have prepared the hypothetical table Business_Case.xlsx (Mktg Customer Satisfaction worksheet).
Hypothesis Testing of Surveyed Items
Step 1: Formulate the hypothesis
Null hypothesis: H0: μ = 4
Alternate hypothesis: HA: μ ≠ 4
Module 5
Step 2: Calculate the t-statistic
$t\text{-statistic} = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{5.09 - 4}{1.51 / \sqrt{22}} = 3.39$
Step 3: Calculate the t-cutoff
Two Tail Test
t-cutoff = +/-|T.INV(α/2, df)|
t-cutoff = +/-|T.INV(0.05/2, 21)|
t-cutoff = +/- 2.08
Step 4: Check if the t-statistic falls within the rejection region.
The t-statistic falls within the rejection region
Conclusion:
Since the t-statistic falls within the rejection region, we should reject the Null hypothesis. We can say that at 95% confidence level, the new car buyers at St. Louis Subaru agree that the salesperson was knowledgeable about the Subaru model line.
• Human Resources Example:
Suppose that you work in the Human Resources department of your company and that top management has asked your department to conduct a Morale Survey of managers to determine their attitude toward working in this company. To check your Excel skills, you have drawn a random sample of the results of the survey from the managers on one question, and the data from a certain item in the instrument appear. Open Business_Case.xlsx (HRM Morale Survey worksheet)
Conclusion:
Since the t-statistic does not fall within the rejection region, we should fail to reject the Null hypothesis. We can say that at 95% confidence level, the managers rate the intellectual challenge provided by their jobs as neither high nor low.
Hypothesis Test for Population Proportion
• Example:
TIP introduces a new lunch facility on campus on a trial basis. TIP operates the lunch facility for a few months and then decides to survey the student body. Based on the survey, TIP would make this facility a permanent fixture or do away with it. Specifically, if at least 70% of the student body approves of it then the facility would be made permanent; otherwise it would be shut down.
TIP conducts a survey with 750 randomly selected students on campus and finds that 510 of these students (or 68% of the sampled students) approve of the new facility and the remaining 240 students (or 32%) do not approve of it.
Based on the criteria set by TIP should the facility be made permanent?
• Population Proportion rather than the Population Mean
• The facility would be made permanent if ≥ 70% of the entire student body approves it.
• TIP has a sample of 750 responses
Step 1: Formulate the hypothesis
Null hypothesis H0: p ≥ 0.70
Alternate hypothesis HA: p < 0.70
Step 2: Calculate the z-statistic
$z\text{-statistic} = \frac{\bar{p} - p}{\sqrt{\frac{p(1 - p)}{n}}}$
Where: $\bar{p}$ is the sample proportion, $p$ is the population proportion, and $n$ is the sample size
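For the TIP lunch facility example (510 approvals out of 750, with H0: p ≥ 0.70 against HA: p < 0.70), the z-statistic and a lower-tail cutoff can be sketched as follows. The significance level α = 0.05 is an assumption, chosen to match the 95% level used in the other examples, and the function name `one_sided_proportion_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_proportion_z_test(successes, n, p0, alpha):
    """Lower-tail z test of H0: p >= p0 against HA: p < p0."""
    p_bar = successes / n
    z = (p_bar - p0) / sqrt(p0 * (1 - p0) / n)
    cutoff = NormalDist().inv_cdf(alpha)  # rejection region is z < cutoff
    return z, cutoff, z < cutoff

# ASSUMPTION: alpha = 0.05 is not stated for this example in the notes
z, cutoff, reject = one_sided_proportion_z_test(510, 750, 0.70, alpha=0.05)
print(f"z = {z:.3f}, cutoff = {cutoff:.3f}, reject H0: {reject}")
# z = -1.195, cutoff = -1.645, reject H0: False
```

Under that assumed α, the z-statistic does not fall in the rejection region, so the sample does not provide enough evidence to reject H0: p ≥ 0.70.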
Step 4: Check if the z-statistic falls within the rejection regions.
(the orange-colored areas are the rejection regions)
Success/Failure Condition (Approximately Normal Condition)
Sometimes with a binomial experiment, we can use a normal distribution to
approximate a binomial; This can make finding probabilities easier for some
experiments — especially when you’re dealing with large samples. The
Success/Failure condition is used to figure out if a sample size in a binomial
experiment is large enough to use the normal approximation.
• Another Example:
In a survey of independent grocery owners, they were asked to consider what
they believed to be their biggest competitive threat. Some noted large-chain
supermarkets, wholesale clubs, and other independent grocers. The common
response, given by 78 of the 151 owners was that they viewed Wal-Mart as
their biggest competitive threat. The claim is that half of the independent grocery owners view Wal-Mart as their biggest threat.
Step 1: Formulate the hypothesis
Null hypothesis H0: p = 0.50
Alternate hypothesis HA: p ≠ 0.50
Step 2: Calculate the z-statistic
$z\text{-statistic} = \frac{\bar{p} - p}{\sqrt{\frac{p(1 - p)}{n}}}$
Where: $\bar{p}$ = sample proportion; 78/151 = 0.5166
p = population proportion; 0.50
n = sample size; 151
$z\text{-statistic} = \frac{0.5166 - 0.50}{\sqrt{\frac{0.50(1 - 0.50)}{151}}} = 0.40797$
Step 2: Calculate the t-statistic
Step 3: Cutoff values for the t-statistic
Step 4: Check whether the t-statistic falls in the rejection region
Suppose that Sam’s true ability is indeed ≥ 40.
However, the 10 days were not good for Sam.
He gave a low sample average.
You reject the Null hypothesis.
Type I Error: Rejecting the Null hypothesis when it is true.
(Would it be possible for you to tell if a Type I Error has occurred?)
‘α’, the significance level, is also known as the probability of a Type I Error.
Two Types of Possible Error
Suppose that Sam’s true ability is NOT ≥ 40.
However, the 10 days were lucky for Sam.
He gave a high sample average.
You DID NOT reject the Null hypothesis.
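Returning to the grocery-owner survey (78 of 151 owners, claimed proportion 0.50), the Success/Failure check and the two-tailed z test can be sketched as below. Two things here are assumptions rather than part of the notes: the threshold of 10 expected successes and failures (a common rule of thumb) and the significance level α = 0.05.

```python
from math import sqrt
from statistics import NormalDist

successes, n, p0 = 78, 151, 0.50
alpha = 0.05  # ASSUMPTION: significance level not stated for this example

# Success/Failure condition; the threshold of 10 is a rule-of-thumb assumption
condition_ok = n * p0 >= 10 and n * (1 - p0) >= 10

p_bar = successes / n
z = (p_bar - p0) / sqrt(p0 * (1 - p0) / n)

# Two-tailed test: reject H0 when |z| exceeds the cutoff
cutoff = abs(NormalDist().inv_cdf(alpha / 2))
reject = abs(z) > cutoff

print(f"condition ok: {condition_ok}, z = {z:.3f}, cutoff = +/-{cutoff:.3f}, reject H0: {reject}")
# condition ok: True, z = 0.407, cutoff = +/-1.960, reject H0: False
```

The unrounded sample proportion gives z of about 0.4069; the 0.40797 in the worked calculation comes from rounding the sample proportion to 0.5166 before dividing. Either way the statistic is well inside the cutoffs.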