BUSINESS SCHOOL
9th Edition
MANAGEMENT LAB MODULE
ADVANCED
BUSINESS STATISTICS
Head of Programme
Dr. Valentino Budhidarma, S.Kom., M.M.
Subject Coordinator
Dr. Vina Christina Nugroho, S.E., M.M.
Prepared By
Lab Assistant Coordinator
Darren Kimi
Lab Assistant Team
Christoforus Axel Billikusuma
Ferren Aurelia
Patricia Angelica
TABLE OF CONTENTS
FOREWORD
INTRODUCTION
INSTRUCTION PROGRAM OUTLINE (SAP)
MODULE 1 : ESTIMATION AND CONFIDENCE INTERVAL
REVIEW QUESTIONS
MODULE 2 : ESTIMATION AND CONFIDENCE INTERVAL
REVIEW QUESTIONS
MODULE 3 : ONE-SAMPLE TEST HYPOTHESIS
REVIEW QUESTIONS
MODULE 4 : HYPOTHESIS TESTING
REVIEW QUESTIONS
MODULE 5 : ANALYZING THE DIFFERENCE IN TWO POPULATION
REVIEW QUESTIONS
MODULE 6 : TWO SAMPLE TEST OF HYPOTHESIS
REVIEW QUESTIONS
MODULE 7 : ANALYSIS OF VARIANCE
REVIEW QUESTIONS
MODULE 8 : CORRELATION AND LINEAR REGRESSION
REVIEW QUESTIONS
MODULE 9 : ESTIMATING Y VALUE
REVIEW QUESTIONS
MODULE 10 : MULTIPLE REGRESSION ANALYSIS
REVIEW QUESTIONS
MODULE 11 : NONPARAMETRIC : GOODNESS OF FIT TEST
REVIEW QUESTIONS
MODULE 12 : NONPARAMETRIC : ANALYSIS OF ORDINAL DATA
REVIEW QUESTIONS
APPENDIX
FOREWORD
Introduction to Business Statistics is the foundation for many statistics classes. In this subject,
students learn how to apply statistical equations and diagrams in business settings. The students
will embark on a statistical journey of understanding why and how the world works, with examples
from past and current situations.
This class lets the students learn the basics of Business Statistics, namely the elements that we
need in order to estimate and to test hypotheses. We will learn how to find the mean, median,
mode, standard deviation, variance, and many other basics that are essential in creating our own
hypotheses.
This module has been compiled in a way that these purposes might be achieved. It contains the key
elements of each chapter, followed by specifically chosen exercises to be solved and discussed in
each meeting with the lab assistant.
The aim of this module is to help students understand more about Intermediate to Business Statistics
in order for the students to be able to move into a more advanced statistics education. We will try
our best for the students to be able to apply the knowledge from Intermediate to Business Statistics
for the cases that will be given and also for more advanced statistical subjects.
I would like to express a special thanks to Universitas Pelita Harapan and Dra. Gracia Shinta S.
Ugut, M.B.A., Ph.D., as The Head of Management Department, and Mrs. Sylvia Samuel, M.IBL as
The Lab Assistant Coordinator for giving us the opportunity to create this module for the student to
learn and understand more about this course subject.
I also want to thank Christoforus Axel Billikusuma, Ferren Aurelia, and Patricia Angelica, who
contributed a great deal to this module. This module wouldn’t exist without them.
To the students: I hope that we can learn well together with your laboratory assistants. Don’t
hesitate to ask questions and give feedback, because the laboratory assistants are still university
students themselves and are also learning from the experience of teaching a class. It is our
wish that you can use these statistical techniques as tools in the future. May God bless all of
you and grant you wisdom throughout the course. On behalf of the Laboratory Assistant Team, I
wish you all great success and a great journey. Good Luck!
A. Description
The laboratory subject is inseparable from the main (theory) subject. The purpose of
laboratory subjects is to help students understand the concepts of the subject by practising
on problems and cases. All laboratory subjects carry 0 credits, but each class lasts
120 minutes, which is equivalent to 2 credits.
C. Lecture Activities
The students are expected to be actively involved in the class learning process.
1. To facilitate the learning process, the students must read the chapter of the reference
book that is related to the class material. Students can also read the brief theory
that is provided in each module.
2. The questions provided in this module cover only part of the material that has been
taught in the theory subject.
3. Students must work on the module questions individually, following the instructions of
the laboratory assistant, take the quizzes that are held, and sit the laboratory mid-exam
and final exam according to the given schedule.
D. Class Rules
1. Attendance
2. Lateness
3. Permission Exception
E. The final grade is the sum of the student’s theory and lab score with a composition of
80% theory class and 20% lab course.
Quiz 1: 10%
Quiz 2: 10%
Assignment: 10%
Score Grade
90 – 100 A
85 – 89.99 A-
80 – 84.99 B+
75 – 79.99 B
70 – 74.99 B-
65 – 69.99 C+
60 – 64.99 C
55 – 59.99 C-
0 – 54.99 F
INSTRUCTION PROGRAM OUTLINE
(SAP)
WEEK | MODULE | CHAPTER | MATERIAL
1 | Module 1, 2 | Chapter 9 | Point estimate; confidence interval for the population mean (standard deviation known/not known); population correction factors; confidence interval for the population proportion; sample size; finite correction factor
2 | Module 3 | Chapter 10, 15 | Hypothesis testing: population mean; one- and two-tailed hypothesis testing; Type I error and Type II error; z and t distributions
3 | Module 4 | Chapter 10, 15 | Hypothesis testing: population proportion; one- and two-tailed hypothesis testing; p value
4 | Module 5 | Chapter 11, 15 | Two sample test; population mean; independent and dependent samples (variances known/not known)
5 | Module 5 | Chapter 11, 15 | Two sample test; population mean; independent and dependent samples (variances known/not known)
6 | Module 6 | Chapter 11, 15 | Two sample test; population proportion; population variances
7 | MID-TERM EXAM
8 | Module 7 | Chapter 12 | F distribution; comparing two population variances; assumptions in ANOVA; ANOVA testing
9 | Module 8 | Chapter 13 | Correlation analysis; correlation coefficient; determination coefficient; significance test; assumptions of regression analysis
10 | Module 9 | Chapter 13 | Estimation for a single value of Y
To compute a confidence interval for a population mean, we will consider two situations:
● We use sample data to estimate 𝜇 with 𝑥̅, and the population standard deviation (𝜎) is
known.
● We use sample data to estimate 𝜇 with 𝑥̅, and the population standard deviation (𝜎) is
unknown. In this case, we substitute the sample standard deviation (s) for the population
standard deviation (𝜎).
Population standard deviation (𝝈) known
A confidence interval is computed using two statistics: the sample mean (𝑥̅) and the standard
deviation (𝜎). The standard deviation is used to compute the limits of the confidence interval.
A confidence interval for the population mean, when the population follows the normal
distribution and the population standard deviation is known, is computed by:
$$\bar{x} \pm z\,\frac{\sigma}{\sqrt{n}}$$
Example:
Del Monte foods distributes diced peaches in 4.51 ounce plastic cups. To ensure that
each cup contains at least the required amount, Del Monte sets the filling operation to
dispense 4.51 ounces of peaches and gel in each cup. From historical data, Del Monte knows
that 0.04 ounce is a standard deviation of the filling process and follows the normal
probability distribution. The quality control technician selects a sample of 64 cups at the start
of each shift. This morning the sample of 64 cups had a sample mean of 4.507 ounces. Construct
a 95% confidence interval for the population mean.
● Step 1
𝑥̅ = 4.507 ounce
𝜎 = 0.04 ounce
n = 64 cups
confidence interval = 95%
● Step 2
Use the confidence level to find the z value:
1. First, we divide the confidence level in half, so 0.95/2 = 0.4750.
2. Find the value 0.4750 in the body of the z table.
3. Locate the corresponding row value in the left margin, which is 1.9, and the column
value in the top margin, which is 0.06. Adding the row and column values gives us a
z value of 1.96.
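● Step 3
Compute the limits of the confidence interval:
$$\bar{x} \pm z\,\frac{\sigma}{\sqrt{n}} = 4.507 \pm 1.96\,\frac{0.04}{\sqrt{64}} = 4.507 \pm 0.0098$$
The 95% confidence interval for the population mean is from 4.4972 to 4.5168 ounces.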
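The three steps of the Del Monte example can be checked with a short Python sketch. The function name `z_confidence_interval` is only illustrative, and the z value is computed by the standard library instead of being read from the table.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence):
    """Return the (lower, upper) limits of x_bar ± z * sigma / sqrt(n)."""
    # Two-sided critical value: a 0.95 confidence level gives z of about 1.96.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Values from the Del Monte example: x̄ = 4.507, σ = 0.04, n = 64, 95% confidence.
lower, upper = z_confidence_interval(x_bar=4.507, sigma=0.04, n=64, confidence=0.95)
print(round(lower, 4), round(upper, 4))
```

Because the population standard deviation is known, the z distribution is used here rather than the t distribution.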
Population standard deviation (𝝈) unknown
To develop a confidence interval for the population mean using the t distribution, we use:
$$\bar{x} \pm t\,\frac{s}{\sqrt{n}}$$
Example:
A tire manufacturer wishes to investigate the tread life of its tires. A sample of 10 tires driven
50,000 miles revealed a sample mean of 0.32 inch of tread remaining with a standard
deviation of 0.09 inch. Construct a 95% confidence interval for the population mean. Would
it be reasonable for the manufacturer to conclude that after 50,000 miles the population mean
amount of tread remaining is 0.3 inch?
● Step 1
𝑥̅ = 0.32 inch
𝑠 = 0.09 inch
n = 10 tires
Confidence interval = 95%
● Step 2
Df = n – 1
= 10 – 1
Df = 9
Confidence interval = 95%
From the t distribution table (df = 9, 95% confidence level), the value of t is 2.262.
● Step 3
$$\bar{x} \pm t\,\frac{s}{\sqrt{n}} \rightarrow 0.32 - 2.262\,\frac{0.09}{\sqrt{10}} \text{ and } 0.32 + 2.262\,\frac{0.09}{\sqrt{10}}$$
The manufacturer can be reasonably sure (95% confident) that the mean remaining tread
depth is between 0.256 and 0.384 inch. Because the value of 0.30 is in this interval, it is
possible that the population mean is 0.30 inch.
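The tire example can be reproduced with the sketch below. Python's standard library has no t distribution, so the critical value 2.262 is passed in as read from the t table; the function name is an assumption made for illustration.

```python
from math import sqrt


def t_confidence_interval(x_bar, s, n, t_crit):
    """Return the (lower, upper) limits of x_bar ± t * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Tire example: x̄ = 0.32, s = 0.09, n = 10, t = 2.262 (table value, df = 9).
lower, upper = t_confidence_interval(x_bar=0.32, s=0.09, n=10, t_crit=2.262)
print(round(lower, 3), round(upper, 3))

# The hypothesized mean of 0.30 inch lies inside the interval.
contains_hypothesized_mean = lower <= 0.30 <= upper
```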
REVIEW QUESTIONS
Problem 1.1
Nijisanji Limited has calculated its employee activity score for the last 15 days. The
information below shows the result for the activity:
Problem 1.2
A report from a nearby neighborhood regarding tax payment was issued. A random sample of
65 of these reports showed the mean amount of tax was $45.000 with a sample standard
deviation of $8.000. What is a 98% confidence interval for the mean amount of the tax
payment?
Problem 1.3
A quarterly financial statement discusses issues of Ligma Effect with economic conditions in
nearby companies. In a sample of 30 statements, the mean was $23,456 and the sample standard
deviation was $3,456.
a. Based on the information above, develop a 90% confidence interval for the population
mean.
Sample Proportion
$$p = \frac{x}{n}$$
Confidence interval for a population proportion
$$p \pm z\,\sqrt{\frac{p(1-p)}{n}}$$
Example :
The owner of Shell wishes to determine the proportion of customers who use a credit card or
debit card to pay at the pump. She surveys 40 customers and finds that 20 paid at the pump.
(using 95 percent confidence interval)
a) Estimate the value of the population proportion
b) Develop a 95 percent confidence interval for the population proportion
Solution :
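a) $p = \frac{x}{n} = \frac{20}{40} = 0.5$
b) $p \pm z\,\sqrt{\frac{p(1-p)}{n}} = 0.5 \pm 1.96\,\sqrt{\frac{0.5(1-0.5)}{40}} = 0.5 \pm 0.155$
= 0.345 and 0.655
So, the 95% confidence interval estimates that the value of the population proportion is
between 0.345 and 0.655.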
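As a quick check of the Shell example, here is a minimal Python sketch of the proportion interval. The function name is illustrative, and the z value comes from the standard library rather than the table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence):
    """Return (p, lower, upper) for p ± z * sqrt(p(1 - p) / n)."""
    p = x / n                                   # sample proportion
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p, p - margin, p + margin


# Shell example: 20 of 40 customers paid at the pump, 95% confidence.
p, lower, upper = proportion_confidence_interval(x=20, n=40, confidence=0.95)
print(p, round(lower, 3), round(upper, 3))
```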
To estimate a population mean, we can express the interaction among these three factors (the
margin of error, the level of confidence, and the variation in the population) and the sample
size in the following formula. Notice that this formula is the margin of error used to
calculate the endpoints of confidence intervals to estimate a population mean, solved for n:
$$n = \left(\frac{z\sigma}{E}\right)^2$$
where:
n is the size of the sample.
z is the standard normal value corresponding to the desired level of confidence.
σ is the population standard deviation.
E is the maximum allowable error.
Example :
A population is estimated to have a standard deviation of 10. We want to estimate the
population mean within 2, with a 99% level of confidence. How large a sample is
required?
Solution:
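The sample size formula for a mean can be sketched in Python as follows. The function name and the optional `z` argument are assumptions made for illustration. Note that the result depends on how z is rounded: the exact z for 99% confidence is about 2.576, while a table value rounded to 2.58 gives one more observation.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma, error, confidence, z=None):
    """Return n = (z * sigma / E)^2, always rounded up to a whole observation."""
    if z is None:
        # Exact two-sided critical value for the requested confidence level.
        z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z * sigma / error) ** 2)


# Example above: sigma = 10, E = 2, 99% confidence.
n_exact_z = sample_size_mean(sigma=10, error=2, confidence=0.99)
n_table_z = sample_size_mean(sigma=10, error=2, confidence=0.99, z=2.58)
print(n_exact_z, n_table_z)
```

Rounding up is deliberate: rounding down would give a margin of error slightly larger than the one requested.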
b.) Sample Size to Estimate a Population Proportion
To determine the sample size for a proportion, the same three variables need to be specified:
1. The margin of error.
2. The desired level of confidence.
3. The variation or dispersion of the population being studied.
For the binomial distribution, the margin of error is:
$$E = z\,\sqrt{\frac{\pi(1-\pi)}{n}}$$
Solving this equation for n gives the sample size:
$$n = \pi(1-\pi)\left(\frac{z}{E}\right)^2$$
where:
n is the size of the sample.
z is the standard normal value corresponding to the desired level of confidence.
E is the maximum allowable error.
𝜋 is the population proportion.
Note :
Example :
Suppose the President of the United States wants an estimate of the proportion of the
population who support his current policy toward revision in the international market system.
The President wants the estimate to be within 0,6 of the true proportion. Assume a 95% level
of confidence. The President's political advisor estimated the proportion supporting the
current policy to be 0,4. How large a sample is required?
Solution :
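A minimal Python sketch of the sample size formula for a proportion is given below. The inputs (π = 0.5, E = 0.05, 95% confidence) are hypothetical values chosen for illustration and are not taken from the example above; π = 0.5 is the usual conservative choice when no estimate of the proportion is available.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(pi, error, confidence):
    """Return n = pi(1 - pi)(z / E)^2, rounded up to a whole observation."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil(pi * (1 - pi) * (z / error) ** 2)


# Hypothetical inputs: no prior estimate (pi = 0.5), E = 0.05, 95% confidence.
n_required = sample_size_proportion(pi=0.5, error=0.05, confidence=0.95)
print(n_required)
```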
Finite-Population Correction Factor
We apply the finite-population correction factor when the sample size is equal to or greater
than 5% of the population (n/N ≥ 0.05).
If we wish to develop a confidence interval for the mean from a finite population and the
population standard deviation is unknown, we adjust the formula as follows:
$$\bar{x} \pm t\,\frac{s}{\sqrt{n}}\,\sqrt{\frac{N-n}{N-1}}$$
Example:
There are 173 families in Seoul, Korea. A random sample of 35 of these families revealed the
mean annual sanitary contribution was $399 and the standard deviation was $69. What is the
best estimate of population mean? (90% confidence interval)
Solution :
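The Seoul example can be sketched in Python as follows. The critical value t = 1.691 (df = 34, 90% confidence) is an assumed table value supplied by hand, since the standard library has no t distribution, and the function name is illustrative.

```python
from math import sqrt


def finite_population_interval(x_bar, s, n, population_size, t_crit):
    """Return (lower, upper) for x_bar ± t * (s / sqrt(n)) * sqrt((N - n) / (N - 1))."""
    correction = sqrt((population_size - n) / (population_size - 1))
    margin = t_crit * (s / sqrt(n)) * correction
    return x_bar - margin, x_bar + margin


# Seoul example: N = 173, n = 35, x̄ = 399, s = 69.
# 35 / 173 is about 20%, above the 5% threshold, so the correction applies.
# t_crit = 1.691 is an assumed table value for df = 34 at 90% confidence.
lower, upper = finite_population_interval(399, 69, 35, population_size=173, t_crit=1.691)
print(round(lower, 2), round(upper, 2))
```

The point estimate of the population mean is the sample mean, $399; the interval describes the uncertainty around it.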
Problem 2.1
Suppose a market research company is hired to estimate the percentage of adults who live in
big cities that have PCs. 400 adult residents randomly selected in this city were surveyed to
determine if they had a PC. Of the 400 people surveyed, 265 said yes, they have a PC. Using
a 95% confidence level, calculate the estimated confidence interval for the actual proportion
of adult residents in this city who have PCs.
Problem 2.2
An analyst in a flight company wants to estimate the mean amount that pilots in small cities earn
per month. The error in estimating the mean is to be less than $155,000 with a 90
percent level of confidence. The analyst found a report that estimated the standard deviation
to be $190,000. What is the required sample size?
Problem 2.3
A retailer would like to estimate the proportion of their customers who bought an item after
viewing their online website. The retailer wants the margin of error to be within 0.65 of the
population proportion, the desired level of confidence is 95%, and no estimate is available of
the population proportion. What is the required sample size ?
Problem 2.4
Thirty people from a population of 300 were asked how much they had in savings. The
sample mean was $150,000 with a sample standard deviation of $90,000. Construct a 95%
confidence interval estimate for the population mean.
MODULE 3
ONE-SAMPLE TESTS OF HYPOTHESIS (1)
By Darren Kimi
What is a Hypothesis?
A hypothesis is a statement about a population parameter subject to verification.
Business researchers often develop hypotheses that can be studied and explored to find
answers. Hypotheses are tentative explanations of a principle operating in nature.
The Formula
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
Example :
A survey conducted across Asia reported that the average net income for the electronic
industry is $85,621. A random sample of 132 is taken and shows a sample mean of $95,874.
Assume the population standard deviation of net income is $15,250 and α = 5%.
If the 𝑍𝑣𝑎𝑙𝑢𝑒 is not between -1.96 and +1.96, reject the null hypothesis (H0). If the 𝑍𝑣𝑎𝑙𝑢𝑒 is
between -1.96 and +1.96, do not reject the null hypothesis (H0).
Step 5: Make a decision
x̅ = $95,874 , n = 132
σ = $ 15,250 , µ = $ 85,621
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{95{,}874 - 85{,}621}{15{,}250/\sqrt{132}} = 7.7245$$
We get the Z value of 7.7245. Since the value of 7.7245 does not fall between the values of -
1.96 and +1.96, then we can conclude that the null hypothesis is false (reject the null
hypothesis).
Step 6: Interpret the result
H0 is rejected, so the average net income for the electronic industry is not equal to $85,621.
A two-tailed test only tells us that it differs; it can be higher or lower.
Example:
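Using the previous case, suppose in addition that there are only 500 companies in the
electronic industry in Asia, and the sample consists of 132 of them. Because the sample is
more than 5% of the population (132/500 = 26.4%), the finite-population correction factor
is applied to the standard error:
N = 500
$$z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}}$$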
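Both versions of the net income example (without and with a finite population) can be checked with the sketch below. The function name and the `population_size` argument are assumptions for illustration; the correction is applied only when the sample is at least 5% of the population.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(x_bar, mu, sigma, n, population_size=None):
    """Return z = (x_bar - mu) / standard error, with optional finite correction."""
    standard_error = sigma / sqrt(n)
    if population_size is not None and n / population_size >= 0.05:
        standard_error *= sqrt((population_size - n) / (population_size - 1))
    return (x_bar - mu) / standard_error


alpha = 0.05
critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed, about 1.96

z_plain = z_statistic(95_874, 85_621, 15_250, 132)
z_finite = z_statistic(95_874, 85_621, 15_250, 132, population_size=500)

# Reject H0 when |z| exceeds the critical value.
reject_plain = abs(z_plain) > critical
reject_finite = abs(z_finite) > critical
print(round(z_plain, 4), round(z_finite, 4), reject_plain, reject_finite)
```

The correction shrinks the standard error, so the z value for the finite population is larger in absolute value than the uncorrected one.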
The Formula
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
df = n – 1
df = degree of freedom
𝑋̅ is the sample mean.
Example :
Known:
α = 5%, n = 20,
df = n-1 = 19
𝐻0 : µ = 25
𝐻1 : µ ≠ 25
Two-tailed; alpha must be split, which yields α/2 = 0.025
t0.025, 19 = ±2.093 → t table
tvalue = 0.53
(By comparing ttable and tvalue) the observed t is 0.53, which is between -2.093
and +2.093, so the null hypothesis (𝐻0) is not rejected.
It means there is not enough evidence to conclude that the population average weight of the
Air Conditioner differs from 25.
One tailed Test
In the first case, we wanted to know whether there was a difference in the mean number
assembled, but now we want to know whether there has been an increase or decrease. Because
we are investigating different questions, we will set our hypothesis differently. The biggest
difference occurs in the alternate hypothesis. Before, we stated the alternate hypothesis as
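“different from”; now we want to state it as “greater than” or “less than”. In symbols, with 𝜇0 as the hypothesized value:
$$H_0: \mu \le \mu_0,\ H_1: \mu > \mu_0 \quad \text{or} \quad H_0: \mu \ge \mu_0,\ H_1: \mu < \mu_0$$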
The critical values for a one-tailed test are different from a two-tailed test at the same
significance level. The formula to test the hypothesis is the same as the formula we
previously discussed. The difference is in how α is used to find the critical value.
In the previous example, we split the significance level (α) in half and put half in the lower
tail and half in the upper tail. In a one-tailed test, we put all the rejection region in one tail.
Example :
A sample of 32 observations is selected from a normal population. The sample mean is 14,
and the population standard deviation is 4. Conduct the following test of hypothesis using the
.05 significance level.
𝐻0: mean is less than or equal to 10
𝐻1: mean is greater than 10
Solution :
𝐻0: µ ≤ 10
𝐻1: µ > 10 α = 0.05
Calculate 0.5 - 0.05 = 0.45. By using this 0.45 area and the z table, the critical value can be
obtained. Because 𝐻1 points to the right, only the upper tail is used.
𝑍α = +1.65 → 𝒛 𝒕𝒂𝒃𝒍𝒆
x̅ = 14 , n = 32
σ = 4 , µ = 10
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{14 - 10}{4/\sqrt{32}} = 5.66$$
Because the 𝑍𝑣𝑎𝑙𝑢𝑒 is higher than +1.65, reject the null hypothesis (H0).
It means the population mean is greater than 10.
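The difference between one-tailed and two-tailed critical values can be illustrated with a short Python sketch. The helper `critical_z` and its `tail` argument are hypothetical names; the exact one-tailed value is about 1.645, which the z table rounds to 1.65.

```python
from math import sqrt
from statistics import NormalDist


def critical_z(alpha, tail):
    """Return the positive critical z for a 'one' or 'two' tailed test."""
    if tail == "two":
        alpha = alpha / 2          # split alpha between the two tails
    return NormalDist().inv_cdf(1 - alpha)


one_tailed = critical_z(0.05, "one")   # about 1.645
two_tailed = critical_z(0.05, "two")   # about 1.960

# Example above: x̄ = 14, µ = 10, σ = 4, n = 32, H1: µ > 10.
z_value = (14 - 10) / (4 / sqrt(32))
reject_h0 = z_value > one_tailed
print(round(one_tailed, 3), round(two_tailed, 3), round(z_value, 2), reject_h0)
```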
REVIEW QUESTIONS
Problem 3.1
The mean work weeks for an accountant in LPH is believed to be about 70 hours. A newly
hired accountant hopes that the length is shorter. She asks 10 of her accountant friends in other
firms for the lengths of their mean work weeks. Below is the data (lengths of mean work
week).
55 60 55 60 65 66 66 70 50 45
Based on the data above, should she count on the mean work week to be shorter than 70
hours? Use the .01 significance level.
Problem 3.2
Better Supply Co. manufactures and assembles wooden tables in several plants in Jakarta.
Plant Z produced 150 wooden tables every week. Plant Z follows a normal probability
distribution with a mean of 300 and a standard deviation of 80. Recently, because of market
expansion, new production methods have been introduced and new employees were hired.
The head manager of the manufacturing department would like to investigate whether there
has been a change in the weekly production of the wooden tables. Is the mean number of
tables produced by Plant Z different from 250 at the .05 significance level?
Problem 3.3
Given the following hypothesis:
H0 : µ ≤ 15
H1 : µ ˃ 15
A random sample of 30 observations is selected from a normal population. The sample
mean was 17 and the sample standard deviation 8. Using the .05 significance level:
a. State the decision rule.
b. Compute the value of the test statistic.
c. What is your decision regarding the null hypothesis?
MODULE 4 HYPOTHESIS TESTING
by Patricia Angelica
To test a hypothesis about a population proportion, a random sample is chosen from the
population. Some assumptions must be made and conditions met before testing a population
proportion:
1. The sample data collected are the result of counts.
2. The outcome of an experiment is classified into one of two mutually exclusive
categories, a “success” or a “failure”.
3. The probability of a success is the same for each trial.
4. The trials are independent, meaning the outcome of one trial does not affect the
outcome of any other trial.
The test we will conduct shortly is appropriate when both n𝜋 and n(1-𝜋) are at least 5, where
n is the sample size and 𝜋 is the population proportion. It takes advantage of the fact that a
binomial distribution can be approximated by the normal distribution.
We can determine the formula to calculate proportion hypothesis testing as follows:
$$z = \frac{p - \pi}{\sqrt{\frac{\pi(1-\pi)}{n}}}$$
Where:
𝜋: population proportion
p: sample proportion
n: sample size
One-tail proportion hypothesis testing
Suppose prior elections in a certain state indicated it is necessary for a candidate for governor
to receive at least 80 percent of the vote in the northern section of the state to be elected. The
incumbent governor is interested in assessing his chances of returning to office. A survey of
2,000 registered voters in the northern section of the state revealed that 1,550 planned to
vote for the incumbent governor. Using the hypothesis-testing procedure, assess the
governor’s chances of reelection.
Step 1: State the null hypothesis and the alternate hypothesis
H0: 𝜋 ≥ .80
H1: 𝜋 < .80
Step 2: Select level of significance
One-tail hypothesis testing with level of significance or 𝛼 of 0.05.
Step 3: Select the test statistics
z is the appropriate statistic,
Step 4: Formulate decision rule
This is a one-tailed test and the alternate hypothesis points to the left, so only the left side
of the curve is used. The significance level is 0.05, so the area between zero and the critical
value is 0.45 (0.5 - 0.05), and the critical value of z is -1.65. The decision rule is therefore
to reject the null hypothesis if the computed value falls to the left of -1.65.
Step 5: Make decision and interpret result
$$z = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}} = \frac{0.775 - 0.80}{\sqrt{0.80(0.20)/2000}} = -2.80$$
The computed value of z (-2.80) is in the rejection region, so the null hypothesis is
rejected at the 0.05 level. It indicates that the evidence at this point does not support the claim
that the incumbent governor will return to the governor’s mansion for another four years.
H0: 𝜋 = .40
H1: 𝜋 ≠.40
A sample of 120 observations revealed that p =.30. At the .05 significance level, can the null
hypothesis be rejected?
H0: 𝜋 = .40
H1: 𝜋 ≠.40
Step 2: select level of significance
Two-tail hypothesis testing with a significance level (𝛼) of 0.05.
Step 3: select the test statistics
z is the appropriate statistic,
Step 4: formulate decision rule
This alternate hypothesis does not state a direction, so this is a two-tailed test. Both sides of
the curve are used. The significance level is split into 0.025 (0.05/2) in each tail, so the area
between zero and the critical value is 0.475 (0.5 - 0.025), and the critical value of z is ±1.96.
The decision rule is therefore to reject the null hypothesis if the computed value is less than
-1.96 or greater than +1.96.
Step 5: make decision and interpret result
$$z = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}} = \frac{0.30 - 0.40}{\sqrt{0.40(0.60)/120}} = -2.24$$
The computed value of z (-2.24) is in the rejection region, so the null hypothesis is
rejected at the 0.05 level. It indicates that the population proportion is not equal to 0.4.
smaller than the significance level, H0 is rejected. If it is larger than the significance level, H0
is not rejected.
Determining the p-value not only results in a decision regarding H0, but it gives us additional
insight into the strength of the decision. A very small p-value, such as .0001, indicates that
there is little likelihood the H0 is true. On the other hand, a p-value of .2033 means that H0 is
inside table). So, we know that p-value = 0.025 (0.5 - 0.475). If alpha = 0.05, H0 is rejected
because p-value < alpha. But if alpha = 0.01, H0 is not rejected because p-value > alpha.
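The two proportion examples in this module, together with the p-value rule, can be sketched in Python as follows. The function name and the `tail` labels are assumptions for illustration, and the p-values are computed from the normal distribution directly instead of being read from the table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(p, pi0, n, tail):
    """Return (z, p_value) for a one-sample proportion test.

    tail is 'left', 'right' or 'two', matching the direction of H1.
    """
    z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)
    area_below = NormalDist().cdf(z)
    if tail == "left":
        p_value = area_below
    elif tail == "right":
        p_value = 1 - area_below
    else:
        p_value = 2 * min(area_below, 1 - area_below)
    return z, p_value


# Governor example: 1,550 of 2,000 voters, H1: pi < 0.80.
z_governor, p_governor = proportion_z_test(1550 / 2000, 0.80, 2000, tail="left")

# Two-tailed example: p = 0.30, n = 120, H1: pi != 0.40.
z_two, p_two = proportion_z_test(0.30, 0.40, 120, tail="two")

# H0 is rejected whenever the p-value is smaller than alpha.
print(round(z_governor, 2), p_governor < 0.05)
print(round(z_two, 2), p_two < 0.05, p_two < 0.01)
```

In the two-tailed case the p-value lies between 0.01 and 0.05, so the decision depends on the significance level chosen.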
REVIEW QUESTIONS
PROBLEM 4.1
Research done at Clarabell Company showed that 35 percent of its workers had worked from
home this year. Clarabell Company employed 100 workers last year, and only 20 of them worked
from home. Use the five-step hypothesis-testing procedure at the 0.1 significance level to test
whether this data contradicts the research report. What is the p-value, and what does it
imply?
PROBLEM 4.2
A poll done at Sera Institution indicates that 20 percent of workers had their turnover in
the first year of their job. A random sample of 225 workers revealed that 70 had their
turnover after the first year. Has there been a significant decrease in the
proportion of workers who had their turnover after the first year? Use the .02
level of significance.
PROBLEM 4.3
The following hypotheses are given
H0: 𝜋 ≥ .45
H1: 𝜋 <.45
A sample of 280 observations revealed that p =.80 . At the .05 significance level, can the null
hypothesis be rejected?
a. State the decision rule.
b. Compute the value of the test statistic.
c. What is your decision regarding the null hypothesis?
MODULE 5
ANALYZING THE DIFFERENCES IN TWO POPULATION
by Patricia Angelica
DEMONSTRATION PROBLEM I
As part of a study of corporate employees, the director of human resources for PNC Inc.
wants to compare the distance traveled to work by employees at its office in downtown
Cincinnati with the distance for those in downtown Pittsburgh. A sample of 35 Cincinnati
employees showed they travel a mean of 370 miles per month. A sample of 40 Pittsburgh
employees showed they travel a mean of 380 miles per month. The population standard
deviation for the Cincinnati and Pittsburgh employees are 30 and 26 miles, respectively. At
the 0.05 significance level, is there a difference in the mean number of miles traveled per
month between Cincinnati and Pittsburgh employees?
SOLUTION :
Step 1 : State the null hypothesis and the alternate hypothesis
H0: 𝜇1 = 𝜇2
H1: 𝜇1 ≠ 𝜇2
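$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{370 - 380}{\sqrt{\frac{30^2}{35} + \frac{26^2}{40}}} = -1.532$$
The absolute value of the computed z (1.532) is smaller than the critical value 1.96, so z falls
between -1.96 and +1.96. Our decision is not to reject the null hypothesis (H0).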
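Demonstration Problem I can be verified with the following Python sketch of the two-sample z test with known population standard deviations. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x_bar1, sigma1, n1, x_bar2, sigma2, n2):
    """Return z = (x̄1 - x̄2) / sqrt(σ1²/n1 + σ2²/n2)."""
    return (x_bar1 - x_bar2) / sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


# PNC example: Cincinnati (370, σ = 30, n = 35) vs Pittsburgh (380, σ = 26, n = 40).
z_value = two_sample_z(370, 30, 35, 380, 26, 40)

critical = NormalDist().inv_cdf(1 - 0.05 / 2)        # two-tailed, about 1.96
p_value = 2 * NormalDist().cdf(-abs(z_value))        # two-tailed p-value
reject_h0 = abs(z_value) > critical
print(round(z_value, 3), reject_h0)
```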
SOLUTION
Step 1: State H0 and H1
We know that the hypothesis is one-tailed, because we are trying to prove whether the
proportion of consumers holding this belief is significantly higher than the proportion of
CEOs. Thus we have :
H0 : π1 ≤ π2
H1 : π1 > π2
Where :
π1 = the proportion of consumers
π2 = the proportion of CEOs
In a one-tailed test (the population mean difference is higher or lower than some value), the
rejection region is α in the respective tail (left or right). In a two-tailed test (the population
mean difference is different from some value), the rejection regions are α/2 in both the left
and right tails (α = 100% - confidence level). After that, we must find the critical t value to
know where the calculated t is located (in the rejection area or not).
SOLUTIONS
a. We first compute the mean and the standard deviation of the sample differences.
b. The value of the test statistic is computed from the following formula :
$$t = \frac{\bar{d}}{s_d/\sqrt{n}}$$
𝑑𝑓 = 𝑛 − 1
Where :
𝑑̅ = mean sample difference
Sd = standard deviation of sample difference
n = number of pairs
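The two steps above (first the mean and standard deviation of the differences, then the test statistic) can be sketched in Python. The before and after values below are hypothetical numbers invented for illustration; they do not come from any problem in this module.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, df) for dependent samples, with d = after - before."""
    differences = [a - b for a, b in zip(after, before)]
    d_bar = mean(differences)        # mean sample difference
    s_d = stdev(differences)         # sample standard deviation (divisor n - 1)
    n = len(differences)             # number of pairs
    return d_bar / (s_d / sqrt(n)), n - 1


# Hypothetical paired observations (illustration only).
before = [10, 12, 9, 11]
after = [12, 15, 10, 13]
t_value, df = paired_t(before, after)
print(round(t_value, 3), df)
```

The computed t is then compared with the t table value for df = n − 1 at the chosen significance level.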
DEMONSTRATION PROBLEM IV
The management of Discount Furniture, a chain of discount furniture stores in the North-east,
designed an incentive plan for salespeople. To evaluate this innovative plan, 12 salespeople
were selected at random, and their weekly incomes before and after the plan were recorded.
Was there a significant increase in the typical salesperson’s weekly income due to the
innovative incentive plan? Use the .05 significance level. Interpret your answer.
SOLUTION
Step 1 : State H0 and H1, defining d as income after the plan minus income before the plan
H0 : μd ≤ 0
H1 : μd > 0
Step 2 : Calculate d and sd
Step 5: Make a decision and interpret your answer
The observed value is greater than the t table value, so the decision is to reject the null
hypothesis. The incentive plan resulted in an increase in weekly income.
REVIEW QUESTIONS
PROBLEM 5.1
There are 270 women who have tested the newly launched salted egg rice box, and 100 of them
like the taste and the packaging. Meanwhile, from a group of 350 men, 150 of them like the
taste and the packaging. At the 0.10 significance level, can we conclude that there is a
significant difference in the proportion of women and men who like the taste and the packaging
of the newly launched salted egg rice box? Determine the p value!
PROBLEM 5.2
Lisa observes the difference in sales between group Papoy and group Pipoy. A 70-day
sample shows that group Papoy sold 1700 smartphones on average per day. Meanwhile, an
80-day sample shows that group Pipoy sold 1800 smartphones on average per day. The population
standard deviation is $270 for group Papoy and $320 for group Pipoy. At the 0.05 significance
level, can Lisa conclude that the average sales of group Pipoy are greater than group Papoy’s?
Determine the p value!
PROBLEM 5.3
A vegan pizza advertisement claims that it can help weight loss. A random sample of 8
influencers show their before and after consumption weight in a table.
Name Alpha Bane Claude Diggie Estes Fanny Gord Harley
Before 65 80 64 76 85 54 83 86
After 62 72 74 68 67 42 77 75
At the 0.01 significance level, can we conclude that the vegan pizza effectively helps
weight loss?
MODULE 6
To conduct the test, we assume each sample is large enough that the normal
distribution will serve as a good approximation of the binomial distribution. The test statistic
follows the standard normal distribution. We compute the value of z from the following
formula:
$$z = \frac{p_1 - p_2}{\sqrt{\frac{p_c(1-p_c)}{n_1} + \frac{p_c(1-p_c)}{n_2}}}$$
pc is the pooled proportion possessing the trait in the combined samples. It is called the
pooled estimate of the population proportion and is computed from the following formula.
$$p_c = \frac{x_1 + x_2}{n_1 + n_2}$$
Example:
The null and alternate hypotheses are:
A sample of 100 observations from the first population revealed X1 is 70. A sample of 150
observations from the second population revealed X2 is 90. Use the 0.05 significance level to
test the hypothesis.
Solution:
Step 2: Find the pooled estimate of the population proportion for combined samples
Pc = (70+90)/(100+150) = 0.64
Do not reject H0
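The pooled proportion and the resulting z value for this example can be checked with the Python sketch below. The function name is illustrative. Since the hypotheses are not reproduced above, the code only computes the statistic and compares it with both the one-tailed (1.65) and two-tailed (1.96) critical values at the 0.05 level; the decision is the same in either case.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Return (p_c, z) for the two-sample test of proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)                       # pooled proportion
    standard_error = sqrt(p_c * (1 - p_c) / n1 + p_c * (1 - p_c) / n2)
    return p_c, (p1 - p2) / standard_error


# Example above: X1 = 70 of n1 = 100, X2 = 90 of n2 = 150.
p_c, z_value = two_proportion_z(70, 100, 90, 150)
print(round(p_c, 2), round(z_value, 2))

# z is below both 1.65 and 1.96, so H0 is not rejected at the 0.05 level.
reject_one_tailed = z_value > 1.65
reject_two_tailed = abs(z_value) > 1.96
```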
EQUAL POPULATION STANDARD DEVIATION
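The following formula is used to pool the sample standard deviations. Notice that two
factors are involved: the number of observations in each sample and the sample standard
deviations themselves.
$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$$
The pooled variance is then used in the test statistic, with n1 + n2 − 2 degrees of freedom:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$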
Example:
Solution:
$$s_p^2 = \frac{(10-1)(4)^2 + (8-1)(5)^2}{10+8-2} = 19.9375$$
Ignore the sign of the t-value, so it will be 1.416. For a two-tailed test with df = 16, the
critical t-value is 1.746 at the 0.10 significance level and 1.337 at the 0.20 significance
level. Because 1.416 lies between these two critical values, the p-value is greater than 0.10
and less than 0.20.
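The pooled variance and t value can be sketched in Python as follows. The sample sizes and standard deviations (10 and 4, 8 and 5) come from the solution above. The sample means 23 and 26 are assumed values, since the problem statement is not reproduced here; they were chosen because they give the t-value of 1.416 in absolute terms.

```python
from math import sqrt


def pooled_variance(n1, s1, n2, s2):
    """Return s_p² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)


def pooled_t(x_bar1, x_bar2, n1, n2, s_p2):
    """Return t = (x̄1 - x̄2) / sqrt(s_p² (1/n1 + 1/n2))."""
    return (x_bar1 - x_bar2) / sqrt(s_p2 * (1 / n1 + 1 / n2))


s_p2 = pooled_variance(n1=10, s1=4, n2=8, s2=5)

# Assumed sample means (23 and 26), not stated in the text above.
t_value = pooled_t(23, 26, 10, 8, s_p2)
df = 10 + 8 - 2
print(s_p2, round(t_value, 3), df)
```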
UNEQUAL POPULATION STANDARD DEVIATION
In the previous section, it was necessary to assume that the populations had equal standard
deviations. If the standard deviations cannot be assumed equal, we use a statistic very much
like the one in the previous section. The sample standard deviations, s1 and s2, are used in
place of the respective population standard deviations. In addition, the degrees of freedom are
adjusted downward by a rather complex approximation formula. The effect is to reduce the number
of degrees of freedom in the test, which will require a larger value of the test statistic to
reject the null hypothesis.
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$$
Example:
A random sample of 15 items from the first population showed a mean of 50 and a standard
deviation of 5. A sample of 12 items for the second population showed a mean of 46 and a
standard deviation of 15. Assume the sample populations do not have equal standard deviations
and use the 0.05 significance level: (step 1) determine the number of degrees of
freedom, (step 2) state the decision rule, (step 3) compute the value of the test statistic, and
(step 4) state your decision about the null hypothesis.
Solution:
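$$df = \frac{\left(\frac{25}{15} + \frac{225}{12}\right)^2}{\frac{(25/15)^2}{15-1} + \frac{(225/12)^2}{12-1}} = \frac{416.8403}{0.1984 + 31.9602} = 12.96$$
The degrees of freedom are rounded down to 12.
Fail to reject H0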
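The degrees of freedom and the test statistic for this example can be verified with the Python sketch below. The function name is illustrative, and the critical value 2.179 (two-tailed, 0.05 level, df = 12) is an assumed t table value entered by hand.

```python
from math import floor, sqrt


def unequal_variance_t(x_bar1, s1, n1, x_bar2, s2, n2):
    """Return (t, df) when the population standard deviations are not assumed equal."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (x_bar1 - x_bar2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, floor(df)              # df is rounded down to a whole number


# Example above: n1 = 15, x̄1 = 50, s1 = 5; n2 = 12, x̄2 = 46, s2 = 15.
t_value, df = unequal_variance_t(50, 5, 15, 46, 15, 12)

t_critical = 2.179                   # assumed table value: two-tailed, alpha 0.05, df 12
reject_h0 = abs(t_value) > t_critical
print(round(t_value, 3), df, reject_h0)
```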
REVIEW QUESTIONS
PROBLEM 6.1
A study was conducted to investigate the effectiveness of hypnotism in reducing pain. Results
for randomly selected subjects are shown in the table below. The “before” value is matched
to an “after” value, and the differences are calculated. The differences have a normal
distribution
This table shows the before and after values of the data in our sample
PROBLEM 6.2
Two college instructors are interested in whether or not there is any variation in the way they
grade math exams. They each grade the same set of 30 exams. The first instructor’s grades
have a variance of 52.3. The second instructor’s grades have a variance of 89.9. Test the
claim that the first instructor’s variance is smaller. (In most colleges, it is desirable for the
variances of exam grades to be nearly the same among instructors.) The level of significance
is 10%.
PROBLEM 6.3
A random sample of 10 hot drinks from Dispenser A had a mean volume of 203 ml and a
standard deviation (divisor (n −1)) of 3 ml. A random sample of 15 hot drinks from Dispenser B
gave corresponding values of 206 ml and 5 ml. The amount dispensed by each machine may be
assumed to be normally distributed. Test, at the 5% significance level, the hypothesis that
there is no difference in the variability of the volume dispensed by the two machines.
MODULE 7
ANALYSIS OF VARIANCE
By Ferren Aurelia
ANOVA
Analysis of Variance (ANOVA) is based on the F distribution. The F distribution is used to
test whether two samples are from populations having equal variances, and it is also applied
when we want to compare several population means simultaneously.
ANOVA Assumptions
Another use of the F distribution is the analysis of variance (ANOVA) technique in which we
compare three or more population means to determine whether they could be equal. To use
ANOVA, we assume the following:
1. The populations follow the normal distribution.
2. The populations have equal standard deviations
3. The populations are independent.
When these conditions are met, F is used as the distribution of the test statistic.
F Distribution
The F distribution is used to test the hypothesis that the variance of one normal population
equals the variance of another normal population. One application of the F distribution is
therefore to compare two population variances.
Example :
Lammers Limos offers limousine service from Government Center in downtown Toledo,
Ohio, to Metro Airport in Detroit. Sean Lammers, president of the company, is considering
two routes. One is via U.S. 25 and the other via I-75. He wants to study the time it takes to
drive to the airport using each route and then compare the results. He collected the following
sample data, which is reported in minutes. Using the .10 significance level, is there a
difference in the variation in the driving times for the two routes?
U.S. Route 25    Interstate 75
     52               59
     67               60
     56               61
     45               51
     70               56
     54               63
     64               57
                      65
Step 1: State the null hypothesis and the alternate hypothesis. The test is two-tailed because
we are looking for a difference in the variation of the two routes. We are not trying to show
that one route has more variation than the other.
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$
Step 5: Find the variance ratio (F value) for the two samples. First compute the standard
deviation of each sample, then square it to obtain the variance.
*The larger sample variance becomes the numerator and the smaller becomes the denominator.
Thus the calculated variance ratio (F) is never smaller than 1.00. The same ordering applies
when determining the numerator and denominator degrees of freedom.*
Variance ratio or F calculated
If F calculated > F table, reject the null hypothesis.
The calculated F value is 4.23, which is bigger than the critical (table) F value of 3.87. So, we
reject the null hypothesis.
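The variance ratio for the Lammers Limos data can be checked with a short script. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the critical value 3.87 is simply the table value quoted in the text.

```python
from statistics import variance

us_25 = [52, 67, 56, 45, 70, 54, 64]
i_75 = [59, 60, 61, 51, 56, 63, 57, 65]

var_us_25 = variance(us_25)  # sample variance (divisor n - 1)
var_i_75 = variance(i_75)

# The larger variance goes in the numerator, together with its degrees of freedom.
if var_us_25 >= var_i_75:
    f_calc = var_us_25 / var_i_75
    df_num, df_den = len(us_25) - 1, len(i_75) - 1
else:
    f_calc = var_i_75 / var_us_25
    df_num, df_den = len(i_75) - 1, len(us_25) - 1

F_CRITICAL = 3.87  # table value quoted in the text
reject_h0 = f_calc > F_CRITICAL

print(f"F = {f_calc:.2f} with ({df_num}, {df_den}) df, reject H0: {reject_h0}")
```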
ANOVA Testing
One-Way ANOVA
We use one-way ANOVA to compare the means of three or more samples when the data have only
one independent variable or category (for example: comparison by gender, race, color, age, or type).
𝐻0: 𝜇1 = 𝜇2 = 𝜇3 = ⋯ = 𝜇𝑘
ANOVA Table
Where :
X is each sample observation
Recently airlines cut services, such as meals and snacks during flights, and started
charging for checked luggage. A group of four carriers hired Brunner Marketing Research
Inc. to survey passengers regarding their level of satisfaction with a recent flight. The survey
included questions on ticketing, boarding, in-flight service, baggage handling, pilot
communication, and so forth. Twenty-five questions offered a range of possible answers :
excellent, good, fair, or poor. A response of excellent was given a score of 4, good a 3, fair a
2, and poor a 1. These responses were then totalled, so the total score was an indication of the
satisfaction with the flight. The greater the score, the higher level of satisfaction with the
service. The highest possible score was 100.
Brunner randomly selected and surveyed passengers from the four airlines. Below is
the sample information. Is there a difference in the mean satisfaction level among the four
airlines? Use the .01 significance level.
Northern    WTA    Pocono    Branson
   94        75      70        68
   90        68      73        70
   85        77      76        72
   80        83      78        65
             88      80        74
                     68        65
                     65
𝐻0: 𝜇𝑁 = 𝜇𝑊 = 𝜇𝑃 = 𝜇𝐵
The mean scores are the same for the four airlines.
The first passenger rated Northern a 94, so (X − X̅N)² = (94 − 87.25)² = 45.5625. The first
passenger in the WTA group responded with a total score of 75, so (X − X̅W)² = (75 − 78.20)² =
10.24. The detail for all the passengers follows.
96.04 50.98 25
23.62 16
61.78
Insert the sums of squares into an ANOVA table and compute the value of F as follows.
The computed value of F is 8.99, which is greater than the critical (table) value of F, 5.09, so
the null hypothesis is rejected.
Step 5 : Interpret the result. The conclusion is that the population means are not all equal.
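The ANOVA table for the airline example can be reproduced from the raw scores. The sketch below is illustrative only: the function name is hypothetical, and the label "Pocono" for the third airline is taken from the textbook version of this example and is an assumption here. SST is the treatment (between-airline) sum of squares and SSE is the error (within-airline) sum of squares.

```python
airlines = {
    "Northern": [94, 90, 85, 80],
    "WTA": [75, 68, 77, 83, 88],
    "Pocono": [70, 73, 76, 78, 80, 68, 65],   # label assumed, see note above
    "Branson": [68, 70, 72, 65, 74, 65],
}

def one_way_anova(groups):
    """Return (SST, SSE, F, (df numerator, df denominator)) for a list of samples."""
    all_obs = [x for g in groups for x in g]
    n, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n
    # Treatment variation: each group mean against the grand mean.
    sst = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Error variation: each observation against its own group mean.
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    f_value = (sst / (k - 1)) / (sse / (n - k))
    return sst, sse, f_value, (k - 1, n - k)

sst, sse, f_calc, dfs = one_way_anova(list(airlines.values()))
F_CRITICAL = 5.09  # table value quoted in the text (.01 level)

print(f"SST = {sst:.2f}, SSE = {sse:.2f}, F = {f_calc:.2f}, df = {dfs}")
print("Reject H0" if f_calc > F_CRITICAL else "Do not reject H0")
```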
Example :
Let us compute the 95% confidence interval for the difference between the mean scores of
passengers on Northern and Branson (the SSE is 594.4). The mean for Northern is 87.25 and the
mean for Branson is 69.00, and the sample sizes for Northern and Branson are 4 and 6,
respectively.
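The worked solution for this interval is not shown here, so the following is only a sketch of the usual calculation, (X̅1 − X̅2) ± t·√(MSE(1/n1 + 1/n2)). It assumes MSE = SSE/(n − k) with n = 22 passengers and k = 4 airlines from the airline example, and a t value of 2.101 read from a t table for 18 degrees of freedom at the 95% level. The function name is hypothetical.

```python
import math

def mean_difference_interval(mean1, mean2, n1, n2, mse, t_value):
    """Confidence interval for the difference between two treatment means."""
    margin = t_value * math.sqrt(mse * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - margin, diff + margin

SSE = 594.4
N_TOTAL, K = 22, 4          # passengers and airlines in the example
mse = SSE / (N_TOTAL - K)   # mean square error with 18 df
T_VALUE = 2.101             # assumption: t table, 95% confidence, df = 18

lower, upper = mean_difference_interval(87.25, 69.00, 4, 6, mse, T_VALUE)
print(f"95% CI for the difference: {lower:.2f} to {upper:.2f}")
```

Both endpoints are positive, so zero is not in the interval, which is consistent with the means of these two airlines being different.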
WARTA, the Warren Area Regional Transit Authority, is expanding bus service from the suburb
of Starbrick into the central business district of Warren. There are four routes being considered
from Starbrick to downtown Warren : (1) via U.S. 6, (2) via the West End, (3) via the Hickory
Street Bridge, and (4) via Route 59. WARTA conducted several tests to determine whether there
was a difference in the mean travel times along the four routes. Because there will be many
different drivers, the test was set up so each driver drove along each of the four routes. Below is
the travel time, in minutes, for each driver-route combination.
Travel Time from Starbrick to Warren (minutes)
Driver      U.S. 6   West End   Hickory St.   Rte. 59
Deans         18        17          21           22
Snaverly      16        23          23           22
Ormson        21        21          26           22
Zollaco       23        22          29           25
Fillbeck      25        24          28           28
At the .05 significance level, is there a difference in the mean travel time along the four
routes? If we remove the effect of the drivers, is there a difference in the mean travel time?
Step 1 : State the null hypothesis and the alternate hypothesis.
1. H0 : The mean travel time is the same for the four routes from Starbrick to Warren
(µ1 = µ2 = µ3 = µ4).
H1 : The mean travel time is not the same for all routes from Starbrick to Warren.
2. H0 : The mean travel time is the same for all drivers (µD = µS = µO = µZ = µF).
H1 : The mean travel time is not the same for all drivers.
Step 2 : Select the level of significance. We selected the .05 significance level.
Step 3 : Formulate the decision rule.
1. Test the hypothesis concerning the treatment means. There are (k-1) = (4-1) = 3
degrees of freedom in the numerator and (b-1) (k-1) = (5-1) (4-1) = 12 degrees of
freedom in the denominator. Using the .05 significance level, the critical value of F is
3.49. The null hypothesis that the mean times for the four routes are the same is
rejected if the F ratio exceeds 3.49.
2. Test the hypothesis concerning the block means. The degrees of freedom in the
numerator for blocks are (b-1) = (5-1) = 4. The degrees of freedom for the
denominator are the same as before : (b-1) (k-1) = (5-1) (4-1) = 12. The null
hypothesis that the driver mean travel times are the same is rejected if the F ratio
exceeds 3.26.
Step 4 : Select the sample, perform the calculations, and make a decision.
Travel Time from Starbrick to Warren (minutes)
Driver      U.S. 6   West End   Hickory St.   Rte. 59   Driver Sums   Driver Means
Deans         18        17          21           22          78          19.50
Snaverly      16        23          23           22          84          21.00
Ormson        21        21          26           22          90          22.50
Zollaco       23        22          29           25          99          24.75
Fillbeck      25        24          28           28         105          26.25
$$SS\ total = \sum (X - \bar{X}_G)^2 = 229.2$$
$$SSB = k \sum (\bar{X}_b - \bar{X}_G)^2 = 119.7$$
$$SST = b \sum (\bar{X}_t - \bar{X}_G)^2 = 72.8$$
$$SSE = SS\ total - SST - SSB = 229.2 - 72.8 - 119.7 = 36.7$$
Treatment : The null hypothesis is rejected. The mean travel time is not the same for all routes
from Starbrick to Warren. F calculated > F table (7.93 > 3.49)
Block : The null hypothesis is rejected. The driver mean travel time is not the same. F
calculated > F table (9.78 > 3.26)
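The sums of squares and both F ratios in the WARTA example can be reproduced with the sketch below. It is an illustration under the same setup as the text (routes as treatments, drivers as blocks); the variable names are hypothetical, and the critical values are the table values quoted above.

```python
times = {
    # driver: [U.S. 6, West End, Hickory St., Rte. 59]
    "Deans":    [18, 17, 21, 22],
    "Snaverly": [16, 23, 23, 22],
    "Ormson":   [21, 21, 26, 22],
    "Zollaco":  [23, 22, 29, 25],
    "Fillbeck": [25, 24, 28, 28],
}

rows = list(times.values())
b, k = len(rows), len(rows[0])            # b blocks (drivers), k treatments (routes)
grand_mean = sum(map(sum, rows)) / (b * k)

route_means = [sum(col) / b for col in zip(*rows)]
driver_means = [sum(row) / k for row in rows]

ss_total = sum((x - grand_mean) ** 2 for row in rows for x in row)
sst = b * sum((m - grand_mean) ** 2 for m in route_means)    # treatments
ssb = k * sum((m - grand_mean) ** 2 for m in driver_means)   # blocks
sse = ss_total - sst - ssb

mse = sse / ((b - 1) * (k - 1))
f_treatment = (sst / (k - 1)) / mse
f_block = (ssb / (b - 1)) / mse

print(f"SS total = {ss_total:.1f}, SST = {sst:.1f}, SSB = {ssb:.1f}, SSE = {sse:.1f}")
print(f"F (routes) = {f_treatment:.2f} vs 3.49, F (drivers) = {f_block:.2f} vs 3.26")
```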
REVIEW QUESTIONS
PROBLEM 7.1
A study was conducted to determine whether there are differences in the amount of instant
coffee consumed. The data below show the amount of instant coffee that households drink during
a month. Four types of instant coffee are compared:
13 15 16 8
12 17 5 17
10 7 13 10
At the 0.05 significance level, is there a difference in the amount of instant coffee consumed
by brand?
PROBLEM 7.2
Sainz Company is a T-shirt manufacturer that sells three clothing sizes: small,
medium, and large. Sales, in millions of dollars, for the past 5 months are given in the
following table. Using the .01 significance level, test whether mean sales differ among
the three clothing sizes and among the months.
Sales ( $ million)
January 12 8 9
February 6 12 4
March 10 14 11
April 8 5 13
May 7 10 6
MODULE 8
CORRELATION AND LINEAR REGRESSION
By Ferren Aurelia
Correlation Analysis
A group of techniques to measure the relationship between two variables. It provides a
quantitative measure of the strength of the relationship between two variables, for example,
whether the stocks of two airlines rise and fall in any related manner.
Correlation Coefficient
Describes the strength of the relationship between two sets of interval-scaled or ratio-scaled
variables. It can take any value from -1 to +1, inclusive.
where :
𝑠𝑦 = standard deviation y
𝑠𝑥 = standard deviation x
𝑛 = number of data
Example :
Suppose that 7 doctors at the Heart Hospital examined patients over one month. Each doctor
wrote prescriptions, and the patients received medicine. We obtained the following results
and want to know if there is any relationship between the measured variables.
REVIEW QUESTIONS
PROBLEM 8.1
The manufacturer of Car Tire wants to study the relationship between the number of months
since the tire was purchased and the length of time the car tire was used last week. Determine
the correlation coefficient.
PROBLEM 8.2
The city council of Pixie Hollow is considering increasing the number of police in an effort to
reduce crime. Before making a final decision, the council asked the chief of police to survey
other cities of similar size to determine the relationship between the number of police and the
number of crimes reported. The chief gathered the following sample information.
City          Police   Number of Crimes
Thneedville 16 6
Axiom 7 11
Grytt 11 8
Auburn 19 10
Blyworth 23 7
Bartons 8 5
Use the data above to compute a correlation coefficient (r) to determine the correlation between
police and number of crimes, and conduct a test of hypothesis to determine if it is reasonable to
conclude that the population correlation is greater than zero. Use the 0.05 significance level.
PROBLEM 8.3
The production department of Astro International wants to explore the relationship between
the number of employees who assemble a subassembly and the number produced. As an
experiment, three employees were assigned to assemble the subassemblies. They produced 20
during a one-hour period. Then five employees assembled them. They produced 30 during a
one-hour period. The complete set of paired observations follows.
Number of Assemblers   One-Hour Production (Units)
3 20
5 30
1 5
7 45
2 15
6 35
a. Compute the correlation coefficient between the two variables. At the 0.05 significance level,
conduct a test of hypothesis to determine if the population correlation is greater than zero.
b. Determine the regression equation.
MODULE 9
ESTIMATING Y VALUE
By Ferren Aurelia
STEP 1 : HYPOTHESIS
Two-tailed test :     One-tailed test :     One-tailed test :
H0 : β = 0            H0 : β ≥ 0            H0 : β ≤ 0
H1 : β ≠ 0            H1 : β < 0            H1 : β > 0
STEP 5 : CONCLUSION
Using a t-test, find out whether more sales calls result in more copiers sold (use a
5% significance level).
STEP 1 : HYPOTHESIS
H0 : β ≤ 0
𝐻1 ∶ 𝛽 > 0
b = 1.18421
Sb = 0.35914
STEP 5 : CONCLUSIONS
So, at the 95% level of confidence, the more calls a sales representative makes, the more
copiers are sold. There is a positive relationship between the number of sales calls and the
number of copiers sold.
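The test statistic behind this conclusion is t = (b − 0) / Sb, with n − 2 degrees of freedom. A minimal sketch of that calculation follows, using the slope and standard error given above. The function name is hypothetical, and the critical value is left as an input because it has to be read from the t table for the sample size of the data set.

```python
def slope_t_statistic(b, s_b, hypothesized_slope=0.0):
    """t statistic for testing a regression slope against a hypothesized value."""
    return (b - hypothesized_slope) / s_b

def reject_right_tailed(t_calc, t_critical):
    """Decision rule for H0: beta <= 0 against H1: beta > 0."""
    return t_calc > t_critical

b = 1.18421     # slope from the regression output
s_b = 0.35914   # standard error of the slope

t_calc = slope_t_statistic(b, s_b)
print(f"t = {t_calc:.3f}")
```

The result is then passed to reject_right_tailed together with the one-tailed table value for n − 2 degrees of freedom.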
*Where :
Σ(Y − Ŷ)² = SSE
n = number of observation
Example :
STEP 1 : FIND THE REGRESSION LINE
From the data we know that :
a = 18.9474
b = 1.18421
So, 𝑌̂ = 18.9474 + 1.18421 𝑋
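Once the regression line is known, a predicted value is obtained by substituting X into the equation. The sketch below evaluates the line at X = 25 sales calls; the function name is illustrative, and the coefficients are the rounded values given above, so the last decimal place may differ slightly from software output.

```python
A = 18.9474   # intercept a
B = 1.18421   # slope b

def predict_copiers(calls):
    """Estimated number of copiers sold for a given number of sales calls."""
    return A + B * calls

y_hat_25 = predict_copiers(25)
print(f"Estimated copiers sold for 25 calls: {y_hat_25:.4f}")
```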
STEP 4 : CONCLUSION
If the standard error of estimate is small, the regression equation can be used to predict Y with
little error. If the standard error of estimate is large, the regression equation cannot be used
to predict Y reliably.
B. COEFFICIENT OF DETERMINATION
The proportion of the total variation in the dependent variable Y that is explained, or
accounted for, by the variation in the independent variable X.
The coefficient of determination is the correlation coefficient squared (r²).
In Picture 9.1, the correlation coefficient is shown by Multiple R = 0.759. If we square r,
we get r² = 0.759² = 0.576. To interpret the coefficient of determination, we convert it to a
percentage, so 0.576 x 100% = 57.6%.
The closer the coefficient of determination is to 100%, the closer the regression equation
comes to making perfect predictions. Here, the conclusion is that only 57.6% of the variation in the
number of copiers sold is explained, or accounted for, by the variation in the number of sales
calls.
Soni 30 70 8 64
Total 0 760
STEP 5 : CONCLUSION
If a sales representative makes 25 calls, the expected number of copiers sold is 48.5526,
and the sales will range from 40.9170 to 56.1882 copiers.
REVIEW QUESTIONS
PROBLEM 9.1
A recent article in Economic Times Magazine listed the “Best Start-up Company.” We are
interested in the current results of the companies’ sales and earnings. A random sample of 10
companies was selected and the sales and earnings, in millions of dollars, are reported below.
a. Conduct a test of hypothesis to show whether there is a relationship between sales and
earnings. Show that the slope of the regression is different from zero.
b. Determine the coefficient of determination. Interpret this value.
c. Determine the standard error of estimate. About 95 percent of the residuals will be
between what two values?
PROBLEM 9.2
A nutritionist performed a regression analysis of the relationship between people’s lifespan and
their lifestyle. The regression analysis is lifespan = 11.04 + 0.9372385 (lifestyle). Some
additional output is:
Analysis of Variance
Source DF SS MS F P
Regression 1 1539 1539 52.52 2.766
Residual Error 10 293 29
Total 11 1832
PROBLEM 9.3
Norris Estate, a Real Estate Company is planning to sell 8 houses. Data of the prices and sizes of
the houses are listed below:
Prices ($ million) Sizes
85 110
55 85
65 100
50 85
125 120
100 90
80 95
40 65
Multiple regression analysis is a statistical tool that uses a mathematical model to
predict a dependent variable from two or more independent variables (or from at least one
nonlinear predictor).
y = value of the dependent variable (response variable)
b0 = regression constant
The regression constant (b0) and the partial regression coefficients (b1, b2, ..., bk) are
population values that are unknown. These values can be estimated by using sample information.
Estimating y with sample information is shown below under The Model Fit.
y = b0 +
THE MODEL FIT
The formulas for the multiple regression coefficients are determined with calculus by
minimizing the sum of squares of error for the regression model, which results in k + 1
equations with k + 1 unknowns (b0 and the k values b1, ..., bk). Solving these equations by hand
is time-consuming, so in practice researchers use a statistical software package.
Example:
Shown below are the electability data for 10 candidates for the next presidency. Determine the
multiple regression equation. What is the estimated winning index of a candidate whose track
record index is 15, capability index is 5, and leadership index is 10?
Name          Track Record Index   Capability Index   Leadership Index   Winning Index (%)
Jokowi               89                   80                 75                 50
Megawati             50                   60                 19                 13
Ridwan               33                   20                 26                 10.5
Susi                 32                   25                 83                 34
Prabowo              62                   23                 47                 33
Anies                37                   50                 65                 45
Fahri                21                   43                 21                 20
Gatot                29                   64                 76                 43
Tito                 50                   74                 88                 22
Sri Mulyani          77                   83                 65                 39
Ŷ = 4.908+0.158X1+0.006X2 + 0.321X3
We can now estimate the winning index of a candidate whose track record index is 15, capability
index is 5, and leadership index is 10:
Ŷ = 4.908+0.158(15)+0.006(5) + 0.321(10)
Ŷ = 10.518
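The substitution into the multiple regression equation can be checked with a few lines of Python. This is a minimal sketch; the function and dictionary names are hypothetical, and the coefficients are the rounded values from the equation above. With these coefficients the estimate evaluates to about 10.518.

```python
COEFFICIENTS = {
    "constant": 4.908,
    "track_record": 0.158,
    "capability": 0.006,
    "leadership": 0.321,
}

def predict_winning_index(track_record, capability, leadership):
    """Estimated winning index (%) from the fitted multiple regression equation."""
    return (COEFFICIENTS["constant"]
            + COEFFICIENTS["track_record"] * track_record
            + COEFFICIENTS["capability"] * capability
            + COEFFICIENTS["leadership"] * leadership)

y_hat = predict_winning_index(track_record=15, capability=5, leadership=10)
print(f"Estimated winning index: {y_hat:.3f}")
```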
H0 : β1 = β2 = ... = βk = 0
dferr = N-k-1
dferr = N – k – 1 = 10 – 3 – 1 = 6
F value : 4.76
F observed :
The F-statistic should then be compared with the critical F value. The P-value reported in the
output measures how well the sample data support the argument that the null hypothesis is
true. In this case, the F-statistic is 2.23 and the F-observed is also 2.23, which means they
are equal. Because 2.23 is smaller than the critical F value of 4.76, the null hypothesis is not
rejected.
H0 : β1 = 0 H0 : β2 = 0 H0 : β3 = 0
H1 : β1 ≠ 0 H1 : β2 ≠ 0 H1 : β3 ≠ 0
df = N – k – 1 = 6
Refer to the previous example and use the t distribution table to determine the critical value
(tα/2;N-k-1) :
$$\frac{\alpha}{2} = \frac{0.05}{2} = 0.025$$
β1 = 0,7476
β2 = 0,0275
β3 = 1,9978
If the value of the t-statistic is not provided in the table, we can calculate it with the formula below.
The t-statistic is compared with the critical t value to decide whether to reject the null
hypothesis. It tests the independent variables individually to determine whether the net
regression coefficients differ from zero.
If |tobserved| ≤ |tα|, do not reject H0.
The observed t value for furnace age (β3) is smaller than its critical value, so the null
hypothesis is not rejected. The null hypothesis is rejected for the observed t values of
temperature (β1) and insulation (β2). In other words, both of these variables are significant
predictors in estimating the heating cost for a home, and researchers should drop furnace age
(β3) from the analysis.
REVIEW QUESTIONS
PROBLEM 10.1
Source Df SS MS F
PROBLEM 10.2
The following regression output was obtained from a study of botanical garden firms. The
dependent variable is the total amount of the fees in millions of dollars.
SE
Predictor Coefficient Coefficient t P-value
Source DF SS MS F F
Total 51 6357.38
a. How large is the sample? How many independent variables are there ? How many
dependent variables are there?
b. Conduct a global test of hypothesis to see if any of the set of regression coefficients could be
different from 0. Use the 0.05 significance level. What is your conclusion?
c. Conduct a test of hypothesis for each independent variable. Use the 0.05 significance
level. Which variable would you consider eliminating first?
d. Outline a strategy for deleting independent variables in this case.
PROBLEM 10.3
Performance on the new menu is designated Y.
By Tiffany
A six-sided die is rolled 30 times and the number 1 through 6 appears as shown in the
following frequency distribution. Can we conclude that the die is fair?
Outcome Frequency
1 3
2 6
3 2
4 3
5 9
6 7
K is number of categories
(Note : for Equal Expected Frequencies, Expected Frequencies are the same for
each cell)
$$f_e = \frac{\sum f_o}{k} = \frac{3 + 6 + \cdots + 7}{6} = \frac{30}{6} = 5$$
Outcome   fo   fe   (fo − fe)²   (fo − fe)²/fe
   1       3    5       4            0.8
   2       6    5       1            0.2
   3       2    5       9            1.8
   4       3    5       4            0.8
   5       9    5      16            3.2
   6       7    5       4            0.8
Do not reject H0. We cannot reject the hypothesis that all six outcomes are equally likely.
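The chi-square statistic for the die example is the sum of the last column of the table. The sketch below recomputes it from the observed frequencies. The significance level for this example is not shown in the surviving text, so the critical value 11.070 (chi-square table, 5 degrees of freedom, 0.05 level) is an assumption made only for illustration.

```python
observed = [3, 6, 2, 3, 9, 7]          # frequencies for outcomes 1 to 6

k = len(observed)                      # number of categories
expected = sum(observed) / k           # equal expected frequency per cell

chi_square = sum((fo - expected) ** 2 / expected for fo in observed)
df = k - 1

CHI_SQUARE_CRITICAL = 11.070           # assumption: alpha = 0.05, df = 5
reject_h0 = chi_square > CHI_SQUARE_CRITICAL

print(f"fe = {expected}, chi-square = {chi_square:.1f}, df = {df}, reject H0: {reject_h0}")
```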
Note :
From Experience, the bank credit department of Carolina Bank knows that 5% of its card
holders have had some high school, 15% have completed high school, 25% have had some
college, and 55% have completed college. Of the 500 card holders whose cards have been
called in for failure to pay their charges this month, 50 had some high school, 100 had
completed high school, 190 had some college, and 160 had completed college. Can we
conclude that the distribution of card holders who do not pay their charges is different from
all others?
Step 1 : State the null hypothesis and the alternative hypothesis
3. Limitations of Chi-Square
If there is an unusually small expected frequency in a cell, chi-square (if applied) might result
in an erroneous conclusion. This can happen because fe appears in the denominator, and
dividing by a very small number makes the quotient quite large! Two generally accepted
policies regarding small cell frequencies are:
● If there are only two cells, the expected frequency in each cell should be at least 5.
● For more than two cells, chi-square should not be used if more than 20 percent of the cells
have expected frequencies less than 5.
A goodness-of-fit test can also be used to determine whether a sample of observations is from
a normal population.
First, calculate the mean and standard deviation of the sample data and group the data into a
frequency distribution. Convert the class limits to z values and find the standard normal
probability for each class. For each class, find the expected frequency by multiplying that
probability by the total number of observations. Calculate the chi-square goodness-of-fit
statistic based on the observed and expected class frequencies. If we use the sample mean and
the sample standard deviation from the sample data, the degrees of freedom are k - 3. But if we
know the mean and the standard deviation of the population, the degrees of freedom are k - 1.
Example :
The IRS is interested in the number of individual tax forms prepared by small accounting
firms. The IRS randomly sampled 50 public accounting firms with 10 or fewer employees in
the Dallas-Fort Worth area. The following frequency tables reports the result of the study.
Assume the sample mean is 44.8 clients and the sample standard deviation is 9.37 clients. Is it
reasonable to conclude that the sample data are from a population that follows a normal
probability distribution? Use the 0.05 Significance level.
20 up to 30 1
30 up to 40 15
40 up to 50 22
50 up to 60 8
60 up to 70 4
To test for a normal distribution, we need to find the expected frequencies for
each class in the distribution. Start with the normal distribution by calculating
probabilities for each class, converting each class limit to a z value:
$$z = \frac{x - \bar{x}}{s}$$
For the class limit x = 30:
$$z = \frac{x - \bar{x}}{s} = \frac{30 - 44.8}{9.37} = -1.58$$
Do not Reject 𝐻0, These data could be from a normal distribution, because
0.6493 is not greater than 5.991.
A contingency table is used to test whether two traits or characteristics are related
For Example :
The Director of advertising for the Carolina Sun Times, the largest newspaper in the Carolinas, is studying
the relationship between the type of community in which a subscriber resides and the section of the
newspaper he or she reads first. For a sample of readers, she collected the sample information in the
following table.
At the 0.05 significance level, can we conclude there is a relationship between the type of
community where the person resides and the section of the paper read first?
Step 1 : State the null hypothesis and the alternative hypothesis
H0: There is no relationship between community size and section read
H1: There is a relationship
$$f_e = \frac{(\text{Row total})(\text{Column total})}{\text{Grand total}}$$
= 104.25
= 157.5 = 122.25
Do not reject 𝐻0, there is no relationship between community size and section
read
REVIEW QUESTIONS
PROBLEM 11.1
In Korea, three dramas are airing now. According to a report in this morning’s local
newspaper, a random sample of 100 viewers last night revealed that 82 people watched Hospital
Playlist, 76 people watched Itaewon Class, and 52 people watched Dream High. At the 0.02
significance level, is there a difference in the proportion of viewers watching the three dramas?
PROBLEM 11.2
A junior high school principal wants to make a uniform T-shirt for all students. A survey
investigated the color that students in each grade want for the uniform T-shirt. According to
last year's results, 46% of the students wanted black, 10% wanted yellow, 14% wanted red, and
30% wanted white. Listed below is a breakdown of a sample of 550 responses randomly selected
from all student responses in the school this month. At the 0.1 significance level, does the
distribution of last year's responses still reflect the students' preferences this year?
Color Frequency
Black 241
Yellow 92
Red 94
White 123
TOTAL 550
PROBLEM 11.3
EXO Manufacturing Company believes that their hourly wages follow a normal probability
distribution. To confirm this, 100 employees were sampled and the result organized into the
following frequency distribution. The sample mean is 7.321 and the sample standard deviation is
3. At the 0.02 significance level, is it reasonable to conclude that the distribution of hourly wages
follows a normal distribution?
1.00 up to 1.50 17
2.00 up to 2.50 21
3.00 up to 3.50 12
4.00 up to 4.50 39
5.00 up to 5.50 11
TOTAL 100
PROBLEM 11.4
A survey investigated the public’s opinion toward Boba price. Each sampled citizen was
classified as to whether he or she felt the Boba seller should reduce the price or increase the price
, or if the individual had no opinion. The sample results of the study by gender are reported
below.
Female 84 61 99
Male 42 52 21
In this module, you will learn about the hypothesis tests used for nonparametric data (data
that are not assumed to follow a normal distribution and that are analyzed through a ranking or
ordering of the observations rather than their numerical values). There are 4 testing methods
that you’ll be required to know for this module:
- The Sign Test.
- Wilcoxon Signed-Rank Test.
- Kruskal-Wallis Test.
- Rank-order Correlation Test.
This module will discuss all of them, starting from The Sign Test.
The managers were then given a 3-month training program and were rated by the same panel of
experts again. The following table compares the managers’ old and new knowledge:
Name Before After Sign of Difference
Tyler Good Outstanding +
Sue Fair Excellent +
James Excellent Good -
Jackson Poor Good +
Andy Excellent Excellent 0*
Sarah Good Outstanding +
Antonia Poor Fair +
Jean Excellent Outstanding +
Coy Good Poor -
Troy Poor Good +
Virginia Good Outstanding +
Juan Fair Excellent +
Candy Good Fair -
Arthur Good Outstanding +
Sandy Poor Good +
* Andy’s knowledge neither improved nor declined after the training, so his sign of difference is
zero and therefore has been excluded from the test.
Solution
Step 1: State the Null and Alternative Hypothesis
H0: π ≤ 0.50
There has been no change in the computer knowledge of the managers after the computer-
training program.
H1: π >0.50
There has been an increase in the computer knowledge of the manager after the computer-
training program
- The π symbol refers to the proportion of the population that has a specific characteristic.
This test is one-tailed.
- In this case, there are only two outcomes: “success” and “failure”, and the probability
of each outcome is 0.50 in every observation.
- The number of trials in this observation is fixed (n=15) and each trial is independent of
the others.
Step 2: Select a level of significance
The level of significance for this test is 0.10
Step 3: Decide on the test statistic.
We’re using the number of plus signs that resulted from the observation of the change in the
manager’s knowledge level.
Step 4: Formulate a decision rule.
- Fifteen managers were entered into the training, but Andy showed no change in his
knowledge level, so his sign difference is 0 and therefore excluded from the test. So,
n=14.
- From the binomial probability distribution table for n=14 and π=0.50, we know that:
No. of Successes Probability of Success Cumulative Probability
0 0.000 1.000
1 0.001 0.999
2 0.006 0.998
3 0.022 0.992
4 0.061 0.970
5 0.122 0.909
6 0.183 0.787
7 0.209 0.604
8 0.183 0.395
9 0.122 0.212
10 0.061 0.090
11 0.022 0.029
12 0.006 0.007
13 0.001 0.001
14 0.000 0.000
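The decision rule and the outcome of the sign test can be sketched with exact binomial probabilities. This is a minimal sketch, not the module's own solution: it counts the plus signs from the table of managers, drops the zero difference, and uses the 0.10 significance level selected in Step 2. The helper name is hypothetical. The table above sums rounded probabilities, so its cumulative column can differ from the exact values in the third decimal place.

```python
from math import comb

signs = ["+", "+", "-", "+", "0", "+", "+", "+", "-", "+", "+", "+", "-", "+", "+"]

usable = [s for s in signs if s != "0"]   # Andy's zero difference is excluded
n = len(usable)
plus_count = usable.count("+")

def upper_tail(k, n, p=0.5):
    """Exact P(X >= k) for a binomial distribution with n trials."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k, n + 1))

ALPHA = 0.10
# Smallest number of plus signs whose upper-tail probability does not exceed alpha.
critical_count = min(k for k in range(n + 1) if upper_tail(k, n) <= ALPHA)

p_value = upper_tail(plus_count, n)
reject_h0 = plus_count >= critical_count

print(f"n = {n}, plus signs = {plus_count}, reject H0 if plus signs >= {critical_count}")
print(f"P(X >= {plus_count}) = {p_value:.3f}, reject H0: {reject_h0}")
```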
Solution
Step 1: State the Null and Alternate Hypothesis
H0: There is no difference in the ratings of the two flavors.
H1: The ratings for the new flavor are higher.
Since it’s either no difference or the new flavor having higher ratings, the test is one-tailed.
Step 2: Identify the Level of significance (0.05)
Rural 130 90 88
Where :
n1 is the number of observations from the first population.
n2 is the number of observations from the second population.
U is the sum of the ranks from the first population
Kruskal-Wallis Test
This test is an alternative to the one-way ANOVA (analysis of variance). It should be used when:
- The data do not follow a normal distribution.
- The population standard deviations and/or variances are unequal.
- The samples selected from the populations are independent.
It uses the Chi-Square Table for its critical values.
Formula for the Kruskal-Wallis Test
$$H = \frac{12}{n(n+1)}\left[\frac{(\Sigma R_1)^2}{n_1} + \frac{(\Sigma R_2)^2}{n_2} + \cdots + \frac{(\Sigma R_k)^2}{n_k}\right] - 3(n+1)$$
With k-1 degrees of freedom (k is the number of populations), where:
1. ΣR1, ΣR2, ..., ΣRk are the sums of the ranks of samples 1, 2, ..., k respectively.
2. n1, n2, n3, ..., nk are the sizes of samples 1, 2, ..., k respectively, and n is the combined
number of observations for all samples.
Example of application using the 6-step hypothesis testing procedure.
Sample Problem
The director of a Hospital Systems company is concerned about the emergency treatment waiting
times for patients in the 3 hospitals around the city that it operates. To find out, the director
selected random samples of patients in the three locations and the following data was collected:
Solution
Step 1: State the Null and Alternate Hypothesis
H0: The Population distributions of waiting times are the same for the three hospitals.
H1: The Population distributions are not all the same for the three hospitals.
Step 2: Determine the level of significance and degree of freedom
As given in the problem, the level of significance is 0.05. The degrees of freedom are k-1, where k is the
number of populations used in the test, which is 3. So, the degrees of freedom are 3-1 = 2. The combined
number of observations is 21 (n=21).
Step 3: Determine the critical value for the test
For Df = 2 and significance level of 0.05, the critical value from the Chi-Square table is 5.991.
Null hypothesis isn’t rejected for test value equal to or less than 5.991 and the null hypothesis
should be rejected for test value more than 5.991.
Step 4: Conduct the Kruskal-Wallis Test
1. Rank the waiting times of all three hospitals combined, from the shortest to the longest:
St. Luke’s          Swedish             Piedmont
Time   Rank         Time   Rank         Time   Rank
 56     9           103    20            42     5.5*
 39     4            87    16            38     2.5*
 48     7            51     8            89    17.5*
 38     2.5*         95    19            75    15
 73    14            68    13            35     1
 60    10            42     5.5*         61    11
 62    12           107    21
                     89    17.5*
*Tied values share an average rank. For example, there are two 38s in the data after the
number 35 (rank 1), so those two 38s would occupy ranks 2 and 3 respectively. Because
they are the same value, each is given the average rank (2+3)/2 = 2.5.
● ΣR1, ΣR2, ΣR3 are the sums of the ranks of each sample to be used for the test:
ΣR1 = 58.5, ΣR2 = 120, ΣR3 = 52.5.
2. Substitute these values into the Kruskal-Wallis formula to conduct the test.
$$H = \frac{12}{21(21+1)}\left[\frac{(58.5)^2}{7} + \frac{(120)^2}{8} + \frac{(52.5)^2}{6}\right] - 3(21+1) = 5.38$$
The test result yields a value of H= 5.38 which will be compared with the critical value found
from the chi-square table to determine whether the null hypothesis should be rejected or not.
Step 5: Make a decision regarding the null hypothesis.
The test yields a value of 5.38, which is less than the critical value of 5.991, so the decision
should be to fail to reject the null hypothesis.
Step 6: Interpretation of the result.
Since the null hypothesis is not rejected, there is not enough evidence to conclude that the
distributions of waiting times differ among the three hospitals.
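The ranking and the H statistic for the hospital example can be reproduced with the sketch below. It is an illustration only: the helper names are hypothetical, ties receive the average of the ranks they occupy (as in the note to the table), and no correction for ties is applied, which matches the formula used in this module.

```python
def average_ranks(values):
    """Map each distinct value to its rank; tied values share the average rank."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Sorted positions i..j-1 hold ranks i+1..j, so the average is (i + 1 + j) / 2.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks

def kruskal_wallis_h(samples):
    """Return (H, rank sums) for a list of independent samples."""
    pooled = [x for sample in samples for x in sample]
    n = len(pooled)
    rank_of = average_ranks(pooled)
    rank_sums = [sum(rank_of[x] for x in sample) for sample in samples]
    h = (12 / (n * (n + 1))
         * sum(r ** 2 / len(s) for r, s in zip(rank_sums, samples))
         - 3 * (n + 1))
    return h, rank_sums

st_lukes = [56, 39, 48, 38, 73, 60, 62]
swedish = [103, 87, 51, 95, 68, 42, 107, 89]
piedmont = [42, 38, 89, 75, 35, 61]

h_value, rank_sums = kruskal_wallis_h([st_lukes, swedish, piedmont])
CHI_SQUARE_CRITICAL = 5.991   # chi-square table, df = 2, alpha = 0.05

print(f"Rank sums = {rank_sums}, H = {h_value:.2f}")
print("Reject H0" if h_value > CHI_SQUARE_CRITICAL else "Do not reject H0")
```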
Sample Problem
Recent studies focus on the relationship between the age of online shoppers and the
number of minutes spent browsing on the internet. The table shows a sample of 15 online shoppers
who actually made a purchase last week. Included are their age and the time, in minutes, spent
browsing on the internet last week.
SHOPPERS    AGE    BROWSING TIME (MINUTES)
SPINA 28 342
GORDON 50 125
SCHNUR 44 121
ALVEAR 32 257
MYERS 55 56
LYONS 60 225
HARBIN 38 185
BOBKO 22 141
KOPPEL 21 342
ROWATTI 45 169
MONAHAN 52 218
LANOUE 33 241
ROLL 19 583
GOODALL 17 394
BRODERICK 21 249
Solution
SHOPPERS    AGE    AGE RANK    BROWSING TIME (MINUTES)    BROWSING RANK    D    D2
SPINA 28 6.0 342 12.5 -6.50 42.25
The only additional component to this test is testing the significance of rs, using the t
distribution for its critical value. The hypothesis test for rank correlation uses the formula:
$$t = r_s\sqrt{\frac{n-2}{1-r_s^2}}$$
Example:
Using a significance level of 0.05, an rs of -0.724 and n of 15, conduct the hypothesis test of rank
correlation!
Solution:
H0: The rank correlation in the population is zero
H1: There is a negative correlation amongst the variables in the population.
Df = 15 − 2 = 13
Using the Df and the significance level, we can find the critical value for this test in t distribution
table for one tailed test, which yields a value of -1.771
$$t = (-0.724)\sqrt{\frac{15-2}{1-(-0.724)^2}} = -3.784$$
The t-test yields a result of -3.784, which is less than the critical value of -1.771, so the
conclusion would be to reject the null hypothesis, meaning that there is a negative correlation
amongst the variables in the population.
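The calculation in this example can be verified with a few lines of code. This is a minimal sketch: the function name `rank_correlation_t` is illustrative, and the critical value is taken from the text rather than computed.

```python
import math


def rank_correlation_t(rs, n):
    """t = rs * sqrt((n - 2) / (1 - rs^2)), with n - 2 degrees of freedom."""
    return rs * math.sqrt((n - 2) / (1 - rs ** 2))


rs, n = -0.724, 15
t = rank_correlation_t(rs, n)

critical = -1.771        # one-tailed t value for df = 13, alpha = 0.05 (from the text)
reject = t < critical    # left-tailed test: reject H0 if t falls below the critical value

print(f"t = {t:.3f}, reject H0: {reject}")
```

Because H1 states a negative correlation, the rejection region lies in the left tail, so the comparison is `t < critical` rather than a comparison of absolute values.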
REVIEW QUESTIONS
PROBLEM 12.1
Many new stockbrokers resist giving presentations to bankers and certain other groups.
Sensing this lack of self-confidence, management arranged to have a confidence-building
seminar for a sample of new stockbrokers and enlisted Career Boosters for a three-week course.
Before the first session, Career Boosters measured the level of confidence of each participant. It
was measured again after the three-week seminar. The before and after levels of self-confidence
for the 14 participants in the course are shown below. Self-confidence was classified as being either
negative, low, high, or very high.
The purpose of this study is to find whether Career Boosters was effective in raising the
self-confidence of the new stockbrokers. That is, was the level of self-confidence higher after the
seminar than before it? Use the .05 significance level.
PROBLEM 12.2
The assembly area of Matthew Product was recently redesigned. Installing a new lighting
system and purchasing a new workbench were two features of the redesign. The production
supervisor would like to know if the changes resulted in improved worker productivity. To
investigate, she selected a sample of 11 workers and determined the production rate before and
after the changes. The sample information is reported below.
A. B. 25 15
S. Z. 20 17
B. B. 22 25
M. F. 30 24
S. S. 16 16
C. L. 23 21
H. R. 26 23
M. N. 18 17
S. N. 19 20
E. L. 28 10
(a) Use the Wilcoxon signed-rank test to determine whether the new procedures actually
increased production. Use the .05 level and a one-tailed test.
(b) What assumption are you making about the distribution of the differences in production
before and after redesign?
PROBLEM 12.3
The regional bank manager of Capital Financial Bank is interested in the number of
transactions occurring in personal checking accounts at four of the bank’s branches. Each branch
randomly samples a number of personal checking accounts and records the number of
transactions made in each account over the last six months. The results are in the table below.
Using the .01 level and the Kruskal-Wallis test, determine whether there is a difference in the
number of personal checking account transactions among the four branches.
EASTERN WEST SIDE NORTHERN SOUTH SIDE
BRANCH BRANCH BRANCH BRANCH
340 100 296 80
180 99 91 86
103 199
PROBLEM 12.4
A sample of individuals applying for manufacturing jobs at Kevin Enterprises revealed
the following scores on an eye perception test (X) and a mechanical aptitude test (Y):
01 682 40
02 840 42
03 777 62
04 805 30
05 810 28
06 777 55
07 820 51
08 777 70
09 820 60
10 805 23
(a) Compute the coefficient of rank correlation between eye perception and mechanical aptitude.
(b) At the .05 significance level, can we conclude that the correlation in the population is
different from 0?
Appendix 1 : Z TABLE
Appendix 2 : t TABLE
Appendix 3 : F TABLE (0.5)
Appendix 4 : F TABLE (0.1)
Appendix 5 : CHI SQUARE TABLE
Appendix 6 : WILCOXON T VALUE