[Figure: a sample is a subset of the population]
STEPS FOR TEST OF HYPOTHESIS
1) Construction of hypotheses
2) Level of significance
3) Test statistic
4) Decision rule
5) Conclusion
1/5 Construction of hypotheses
[Null and Alternative Hypotheses]
A Statistical Hypothesis is an assumption made about the
population parameter which may or may not be true.
H0: μ = 50    H1: μ ≠ 50
Example:
A major west coast city provides one of the most
comprehensive emergency medical services in the
world. The service goal is to respond to medical
emergencies with a mean time of 12 minutes or less.
The director of medical services wants to
formulate a hypothesis test that could use a sample
of emergency response times to determine whether
or not the service goal of 12 minutes is being
achieved.
1/5 Construction of hypotheses
[Construction of Hypotheses]
• Null and Alternative Hypotheses
Hypotheses      Conclusion and Action
H0: μ ≤ 12      The emergency service is meeting the response goal;
                no follow-up action is necessary.
H1: μ > 12      The emergency service is not meeting the response goal;
                appropriate follow-up action is necessary.
Where: μ = mean response time for the population
of medical emergency requests.
2/5 Level of significance
[Type I and Type II errors]
Whenever sample evidence is used to draw a conclusion about a population, there
is a risk of making a wrong decision because of sampling variability.
Such errors are called inferential errors, because they entail drawing an
incorrect inference from the sample about the value of the population parameter.
On the basis of sample information, we may reject a true statement about the
population, or fail to reject a false one.
Type I error = Reject H0 when in fact H0 is true
Type II error = Don't reject H0 when in fact H0 is false
• Significance Level
The probability of committing a Type I error is called the level of
significance, denoted by α.
The level of significance is also called the size of the test.
By α = 5% we mean that there are 5 chances in 100 of incorrectly
rejecting a true null hypothesis.
Put another way, when H0 is true we make the correct decision
(not rejecting it) 95% of the time.
• Level of Confidence
The probability of not committing a Type I error, (1 − α), is called
the level of confidence, or confidence coefficient.
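The meaning of α can be checked by simulation. The sketch below is only an illustration: it assumes a normal population with μ = 50 (so H0: μ = 50 is true), and the values σ = 10, n = 25, the number of trials and the seed are hypothetical choices, not taken from the slides.

```python
import random
from statistics import NormalDist

def type_one_error_rate(alpha=0.05, mu0=50.0, sigma=10.0, n=25,
                        trials=20000, seed=1):
    """Estimate how often a two-sided z-test rejects H0 when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical value Z_{alpha/2}
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population in which H0 (mu = mu0) really holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / n ** 0.5)
        if abs(z) >= z_crit:
            rejections += 1        # rejecting a true H0 is a Type I error
    return rejections / trials

rate = type_one_error_rate()
print(f"estimated Type I error rate: {rate:.3f}")
```

The estimated rate should land close to the chosen α, and the remaining proportion of trials (about 1 − α) are correct non-rejections.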
3/5 Test Statistic
• A test statistic is a rule or formula on which the decision to
reject or not reject the null hypothesis is based.
• In hypothesis testing, the test statistic is calculated
on the assumption that the null hypothesis is true.
• The calculated value of the test statistic measures the evidence
in the sample data against the null hypothesis.
• If there is sufficient evidence against H0, we reject the null
hypothesis; otherwise we don't reject it.
• Different test statistics are available (Z-test, t-test, F-test etc.).
• Which test to use depends on the objective and the available
information.
4/5 Decision Rule
• Critical region / Rejection region
The critical region (CR) is the part of the sampling
distribution of the test statistic for which H0 is
rejected. The null hypothesis is rejected if the
value of the test statistic is not consistent with
H0. The CR is associated with H1: its location follows the direction of H1.
• Non-rejection Region
The non-rejection region is the part of the sampling
distribution of the test statistic for which H0 is not
rejected.
• Critical Values
The values that separate the rejection and non-rejection
regions are called critical values.
[Figure: sampling distribution with the non-rejection region (AR), the rejection region (RR) and the critical value marked]
5/5 Conclusion
Two approaches:
a. Critical value approach (table values)
Reject H0 if the calculated value of the test statistic falls in
the rejection region; otherwise don't reject H0.
b. p-value (probability value) approach
Reject H0 if p-value ≤ level of significance (α); otherwise don't reject H0.
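The two approaches lead to the same decision. A minimal sketch for a Z test, assuming a standard normal sampling distribution; the function name `z_decision` and the Z values in the loop are hypothetical and serve only as illustration.

```python
from statistics import NormalDist

def z_decision(z_cal, alpha=0.05, tail="two"):
    """Return (reject by critical value, p-value, reject by p-value)."""
    std = NormalDist()                      # standard normal distribution
    if tail == "right":
        reject_cv = z_cal >= std.inv_cdf(1 - alpha)
        p_value = 1 - std.cdf(z_cal)
    elif tail == "left":
        reject_cv = z_cal <= std.inv_cdf(alpha)
        p_value = std.cdf(z_cal)
    else:                                   # two-sided test
        reject_cv = abs(z_cal) >= std.inv_cdf(1 - alpha / 2)
        p_value = 2 * (1 - std.cdf(abs(z_cal)))
    return reject_cv, p_value, p_value <= alpha

for z in (-3.0, 1.2, 2.5):                  # hypothetical calculated values
    by_cv, p, by_p = z_decision(z, tail="two")
    print(f"Z = {z:+.1f}: p-value = {p:.4f}, reject by CV = {by_cv}, reject by p = {by_p}")
```

The p-value approach has the practical advantage that the same number can be compared with any α without looking up a new table value.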
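Selection of Test Statistic for
single population mean
• Population variance σ² known? Yes:
  $Z = \dfrac{\bar{X} - \mu_0}{\sqrt{\sigma^2 / n}}$
• No, but sample size large (n > 30)? Yes:
  $Z = \dfrac{\bar{X} - \mu_0}{\sqrt{S^2 / n}}$
• No (σ² unknown and n ≤ 30):
  $t = \dfrac{\bar{X} - \mu_0}{\sqrt{S^2 / n}}$ with (n − 1) degrees of freedom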
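The selection chart can be written as a small function. This is a sketch only: the function name `one_sample_statistic` and the numbers in the usage lines are hypothetical, and the n > 30 cut-off is the one used on the slide.

```python
import math

def one_sample_statistic(xbar, mu0, n, sigma2=None, s2=None):
    """Pick Z or t for a single mean and return (name, calculated value)."""
    if sigma2 is not None:
        # Population variance known: Z with sigma^2.
        return "Z", (xbar - mu0) / math.sqrt(sigma2 / n)
    if s2 is None:
        raise ValueError("either sigma2 or s2 must be supplied")
    # Population variance unknown: Z for large samples, t (n - 1 df) otherwise.
    name = "Z" if n > 30 else "t"
    return name, (xbar - mu0) / math.sqrt(s2 / n)

# Hypothetical inputs, chosen so that each branch is exercised.
print(one_sample_statistic(52, 50, 36, sigma2=36))
print(one_sample_statistic(52, 50, 64, s2=64))
print(one_sample_statistic(52, 50, 16, s2=16))
```

The formula is the same in the last two branches; only the reference distribution used for the critical value changes.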
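Criteria for rejecting H0
Hypotheses                  Type          Rejection Region
H0: μ ≤ μ0, H1: μ > μ0      Right sided   Reject H0 if $Z_{cal} \ge Z_{\alpha}$
H0: μ ≥ μ0, H1: μ < μ0      Left sided    Reject H0 if $Z_{cal} \le -Z_{\alpha}$
H0: μ = μ0, H1: μ ≠ μ0      Two sided     Reject H0 if $Z_{cal} \ge Z_{\alpha/2}$ or $Z_{cal} \le -Z_{\alpha/2}$,
                                          i.e. $|Z_{cal}| \ge Z_{\alpha/2}$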
Construction of hypotheses
H0: μ ≥ 9.63
H1: μ < 9.63
Level of significance: α = 5%
Population: σ = 1.40
Sample: n = 36, $\bar{X} = 8.93$
Test Statistic
$Z = \dfrac{\bar{X} - \mu_0}{\sqrt{\sigma^2 / n}} = \dfrac{8.93 - 9.63}{\sqrt{(1.40)^2 / 36}} = -3$
Decision Rule: Reject H0 if $Z_{cal} \le -Z_{\alpha}$
Result: As $Z_{cal} = -3 < -Z_{.05} = -1.645$, reject H0 and hence
we conclude that the thread has become inferior.
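The arithmetic of the thread example can be reproduced as follows. A sketch using only the numbers shown on the slide (μ0 = 9.63, σ = 1.40, n = 36, sample mean 8.93); the p-value is an extra that the slide does not report.

```python
import math
from statistics import NormalDist

mu0, sigma, n, xbar = 9.63, 1.40, 36, 8.93   # values from the example
alpha = 0.05

z_cal = (xbar - mu0) / (sigma / math.sqrt(n))   # calculated test statistic
z_crit = NormalDist().inv_cdf(alpha)            # left-tail critical value, -Z_alpha
p_value = NormalDist().cdf(z_cal)               # left-tail p-value
reject = z_cal <= z_crit

print(f"Z_cal = {z_cal:.2f}, -Z_alpha = {z_crit:.3f}, p-value = {p_value:.4f}")
print("reject H0" if reject else "do not reject H0")
```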
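EXAMPLE: A random sample of 40 sunflower plants from a large
population of plants showed a mean of 57 days to
flowering, with variance 2 days².
On the basis of this sample information, can we conclude at the 5% level of
significance that the average number of days to flowering in the sunflower
population is not more than 56 days?
Sample: n = 40, $\bar{X} = 57$, $S^2 = 2$
Construction of hypotheses
H0: μ ≤ 56
H1: μ > 56
Level of significance: α = 5%
Test Statistic
$Z = \dfrac{\bar{X} - \mu_0}{\sqrt{S^2 / n}} = \dfrac{57 - 56}{\sqrt{2 / 40}} = 4.47$
Decision Rule: Reject H0 if $Z_{cal} \ge Z_{\alpha}$
Result: As $Z_{cal} = 4.47 > Z_{.05} = 1.645$, reject H0; the sample does not
support the claim that the average is at most 56 days.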
Construction of hypotheses
H0: μ = 6
H1: μ ≠ 6
Level of significance: α = 5%
Sample: n = 10, $\bar{X} = 6.1$, S = 0.25
Test Statistic
$t = \dfrac{\bar{X} - \mu_0}{\sqrt{S^2 / n}} = \dfrac{6.1 - 6}{\sqrt{(0.25)^2 / 10}} = 1.26$
Decision Rule: Reject H0 if $|t_{cal}| > t_{\alpha/2}(n-1)$
Result: As $|t_{cal}| = 1.26 < t_{.025}(9) = 2.262$, don't reject H0
and conclude that the process is in control.
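A short check of the t example. This sketch takes the critical value 2.262 directly from the slide rather than computing it, because the Python standard library has no t distribution.

```python
import math

n, xbar, s, mu0 = 10, 6.1, 0.25, 6.0   # sample values from the example
t_crit = 2.262                         # t_.025(9), table value quoted on the slide

t_cal = (xbar - mu0) / math.sqrt(s ** 2 / n)
reject = abs(t_cal) > t_crit

print(f"t_cal = {t_cal:.2f}, critical value = {t_crit}")
print("reject H0" if reject else "do not reject H0")
```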
POPULATION PARAMETER ?
INTERVAL ESTIMATE
Interval estimation of a population
parameter is a rule for determining an interval
in which the parameter is likely to fall. The
interval obtained from a particular sample is called
an interval estimate. A level of confidence (such as
90% or 95%) is usually attached to the interval
estimate when it is formed.
Example: The following data represent the daily milk production
of a random sample of 10 cows from a particular breed:
12, 15, 11, 13, 16, 19, 15, 16, 18, 15.
Construct a 90% C.I. for the average milk production of all the cows of
that particular breed.
$\bar{X} \pm t_{\alpha/2}(n-1) \sqrt{\dfrac{S^2}{n}}$
where $S^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$ is the unbiased sample variance.
Sample: n = 10, $\bar{X} = 15$, $S^2 = 6.22$
$15 \pm t_{.05}(9) \sqrt{\dfrac{6.22}{10}}$
$15 \pm (1.833)(0.789)$
90% C.I.: (13.55, 16.45)
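The interval can be recomputed from the raw data. A sketch using the standard library; the table value t.05(9) = 1.833 is taken from the slide, and the variable names are illustrative only.

```python
import math
from statistics import mean, variance

milk = [12, 15, 11, 13, 16, 19, 15, 16, 18, 15]   # daily milk production data
n = len(milk)
xbar = mean(milk)
s2 = variance(milk)          # unbiased sample variance (divides by n - 1)
t_crit = 1.833               # t_.05(9), table value quoted on the slide

half_width = t_crit * math.sqrt(s2 / n)
lower, upper = xbar - half_width, xbar + half_width
print(f"mean = {xbar}, S^2 = {s2:.2f}, 90% C.I. = ({lower:.2f}, {upper:.2f})")
```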
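Selection of Test Statistic
Comparing two population means
• Population variances known? Yes:
  $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
• No, but both samples large (n1 and n2 > 30)? Yes:
  $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$
• No:
  $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$
  where $S_p^2$ is the pooled sample variance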
EXAMPLE: A new chemical fertilizer yielded an average of 20,400 pounds of
tomatoes from 40 randomly selected acres of farmland. On another 100
randomly selected acres, the standard organic fertilizer produced a mean yield
of 19,000 pounds. Do the results of the comparison indicate that the
chemical fertilizer really produces larger yields than the organic? Assume that
the population standard deviations are known to be 1200 and 1000
respectively.
μ1 = Average yield using the chemical fertilizer
μ2 = Average yield using the standard organic fertilizer
Population: σ1 = 1200, σ2 = 1000
Sample: n1 = 40, $\bar{X}_1 = 20{,}400$; n2 = 100, $\bar{X}_2 = 19{,}000$
Construction of hypotheses
H0: μ1 ≤ μ2, i.e. μ1 − μ2 ≤ 0
H1: μ1 > μ2, i.e. μ1 − μ2 > 0
Test Statistic
$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{(20400 - 19000) - 0}{\sqrt{\dfrac{1200^2}{40} + \dfrac{1000^2}{100}}} = 6.53$
Decision Rule: Reject H0 if $Z_{cal} > Z_{\alpha}$
Result: As $Z_{cal} = 6.53 > Z_{.05} = 1.645$, reject H0 and hence
we conclude that the chemical fertilizer produces a better
average yield than the standard organic fertilizer.
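The two-sample Z statistic for the fertilizer example can be evaluated as below. A sketch in which the helper name `two_sample_z` is hypothetical; the inputs are the figures given in the problem statement. With those figures the statistic comes out at roughly 6.5, and H0 is rejected for any calculated value above 1.645.

```python
import math
from statistics import NormalDist

def two_sample_z(xbar1, xbar2, var1, var2, n1, n2, delta0=0.0):
    """Z statistic for H0: mu1 - mu2 = delta0 with known population variances."""
    se = math.sqrt(var1 / n1 + var2 / n2)     # standard error of the difference
    return (xbar1 - xbar2 - delta0) / se

z_cal = two_sample_z(20400, 19000, 1200 ** 2, 1000 ** 2, 40, 100)
z_crit = NormalDist().inv_cdf(0.95)           # right-tail critical value Z_.05
print(f"Z_cal = {z_cal:.2f}, Z_.05 = {z_crit:.3f}")
print("reject H0" if z_cal > z_crit else "do not reject H0")
```

The `delta0` argument allows a hypothesised difference other than zero, as in a claim that one mean exceeds the other by a fixed amount.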
Example: The strength of ropes made of cotton yarn and of
coir gave the following values on measurement:
Cotton   7.5   5.4   10.6   9.0   6.1   10.2   7.9   9.7   7.1   8.5
Coir     8.3   6.1   9.6   10.4   6.4   10.0   7.9   8.9   7.5   9.7
Test whether there is a significant difference in the strength of
the two types of ropes at the 5% level of significance.
Construction of hypotheses
H0: μ1 = μ2
H1: μ1 ≠ μ2
Pooled variance
$S_p^2 = \dfrac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)} = \dfrac{26.78 + 20.236}{9 + 9} = 2.612$
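The pooled variance and the resulting t statistic can be computed from the rope data. A sketch only: the critical value t.025(18) = 2.101 is a standard table value that does not appear on the slide and is supplied here as an assumption.

```python
import math
from statistics import mean, variance

cotton = [7.5, 5.4, 10.6, 9.0, 6.1, 10.2, 7.9, 9.7, 7.1, 8.5]
coir = [8.3, 6.1, 9.6, 10.4, 6.4, 10.0, 7.9, 8.9, 7.5, 9.7]
n1, n2 = len(cotton), len(coir)

ss1 = (n1 - 1) * variance(cotton)      # sum of squared deviations, cotton
ss2 = (n2 - 1) * variance(coir)        # sum of squared deviations, coir
sp2 = (ss1 + ss2) / (n1 + n2 - 2)      # pooled variance

t_cal = (mean(cotton) - mean(coir)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
t_crit = 2.101                         # assumed table value t_.025(18)
reject = abs(t_cal) > t_crit

print(f"S_p^2 = {sp2:.3f}, t_cal = {t_cal:.3f}")
print("reject H0" if reject else "do not reject H0")
```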
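If the population variances for the new and old batteries are known to be 25 and 36
respectively, test the engineer's claim at the 5% level of significance.
Population: $\sigma_1^2 = 25$, $\sigma_2^2 = 36$
Sample: n1 = 9, $\bar{X}_1 = 151.33$; n2 = 9, $\bar{X}_2 = 136.67$
Construction of hypotheses
H0: μ1 − μ2 ≥ 7
H1: μ1 − μ2 < 7
Test Statistic
$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{(151.33 - 136.67) - 7}{\sqrt{\dfrac{25}{9} + \dfrac{36}{9}}} = 2.94$
Decision Rule: Reject H0 if $Z_{cal} \le -Z_{\alpha}$
Result: As $Z_{cal} = 2.94 > -Z_{.05} = -1.645$, don't reject H0; the data give
no evidence against the claim that the new battery will operate continuously
for at least 7 minutes longer than the old battery.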
Your research aims to find out whether a certain diet is effective in reducing LDL
cholesterol levels (the bad kind of cholesterol).
• Randomly select some individuals (say 20) and measure their LDL
cholesterol levels.
• Put them on the diet for some period of time (say three months).
• At the end of the three months, measure the LDL cholesterol levels of
the same 20 individuals.
• There are two data sets: one before and one after the diet.
Your research aims to compare a new automated procedure for
determining glucose in serum with the established method.
• Randomly select some individuals (say 20) and take a serum sample from each.
• Divide each serum sample into two halves.
• Measure glucose in one half using the new procedure and in the other
half using the established procedure.
• There are two sets of glucose measurements (data sets), one
from each method.
Repeated Measure Data
Data in which the same subject / individual is measured more than once are
called repeated measure data. The simplest example is
paired data, in which the same subject is measured twice.
Other names for paired data are dependent samples or correlated
samples.
The test for comparing the means of dependent samples is called the
Paired t-test.
EXAMPLE: A new automated procedure for determining glucose
in serum (Method A) is to be compared with the established
method (Method B). Both methods were performed on serum from the
same six patients in order to eliminate patient-to-patient
variability. Do the following results confirm a difference in the readings
of the two methods at the 5% level?
μ1 = Average glucose measured by Method A
μ2 = Average glucose measured by Method B
Patient   Method A             Method B
          Glucose mg/L (X1)    Glucose mg/L (X2)
1         1044                 1028
2         720                  711
3         845                  820
4         800                  795
5         957                  935
6         650                  639
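H0: μ1 − μ2 = 0, i.e. μd = 0
H1: μ1 − μ2 ≠ 0, i.e. μd ≠ 0
where d = X1 − X2 is the difference between the two readings for each patient.
$\bar{d} = \dfrac{\sum d}{n} = \dfrac{88}{6} = 14.67$
$S_d^2 = \dfrac{\sum (d - \bar{d})^2}{n - 1} = \dfrac{301.33}{5} = 60.27$
Test Statistic
$t = \dfrac{\bar{d} - \mu_d}{\sqrt{S_d^2 / n}}$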
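The paired t statistic for the glucose data can be computed from the differences. A sketch only: the critical value t.025(5) = 2.571 is a standard table value that is not shown on the slide and is supplied here as an assumption.

```python
import math
from statistics import mean, variance

method_a = [1044, 720, 845, 800, 957, 650]   # Method A, glucose mg/L
method_b = [1028, 711, 820, 795, 935, 639]   # Method B, glucose mg/L

d = [a - b for a, b in zip(method_a, method_b)]   # per-patient differences
n = len(d)
d_bar = mean(d)                  # mean difference
s2_d = variance(d)               # unbiased variance of the differences

t_cal = (d_bar - 0) / math.sqrt(s2_d / n)   # H0: mu_d = 0
t_crit = 2.571                   # assumed table value t_.025(5)
reject = abs(t_cal) > t_crit

print(f"d_bar = {d_bar:.2f}, S_d^2 = {s2_d:.2f}, t_cal = {t_cal:.2f}")
print("reject H0" if reject else "do not reject H0")
```

Working on the differences removes patient-to-patient variability, which is why the paired test is used here instead of the two-sample t-test.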
STAT-600
Analysis of Variance (ANOVA)
Analysis of Variance is a procedure that
partitions the total variability in the data into
distinct components.
Each component represents the variation due to a
recognized source of variation. In addition, one
component represents the variation due to
uncontrolled factors and random errors
associated with the response measurements.
Total variation = Explained variation + Unexplained variation
Example: The milk butterfat percentage of 4 breeds of cows is to be
compared. A random sample of 6 mature cows from each of the 4
breeds was taken and the following data were recorded.
Test the hypothesis that the average milk butterfat percentage is the same
for the four breeds.
        Breed 1   Breed 2   Breed 3   Breed 4   Total
        3.6       4.6       3.7       5.8
        4.1       4.9       3.6       5.0
        4.0       5.7       3.8       5.3
        3.9       5.9       3.2       5.2
        3.2       4.3       3.9       4.9
        4.3       5.1       3.2       5.8
Total   23.1      30.5      21.4      32.0      107.0
Explained variation: between breeds
Unexplained variation: within breeds
Graphical View of the data
[Figure: dot plot and boxplot of butterfat percentage by breed (Breed 1 to Breed 4)]
The average milk butterfat percentages of Breeds 2 and 4 are almost similar, as are
those of Breeds 1 and 3. Although Breed 2 has the largest variability in the data,
the variability is about the same across the four breeds.
Statistical Analysis by One Way ANOVA
H0: μ1 = μ2 = μ3 = μ4
(the average milk butterfat percentages are the same for the 4 breeds)
H1: At least two means are different
Test Statistic
$F = \dfrac{S_b^2}{S_w^2}$
Within Breed SS = Total SS − Between Breed SS = 17.8383 − 13.919 = 3.92
(S.O.V)          DF   SS        MSS=SS/df            Fcal
Between Breed    3    13.919    4.640 (S²b)          23.67*
Within Breed     20   3.920     0.196 (S²w, MSE)
TOTAL            23   17.8383
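The one-way ANOVA sums of squares can be recomputed from the butterfat data with the correction-factor method. A sketch only: carried out without intermediate rounding, the between-breed SS comes out near 13.93 and F near 23.7, slightly different from the slide's 13.919 and 23.67, which appear to stem from rounded breed means. The table value F.05(3, 20) = 3.10 is not on the slide and is supplied here as an assumption.

```python
breeds = {
    "Breed 1": [3.6, 4.1, 4.0, 3.9, 3.2, 4.3],
    "Breed 2": [4.6, 4.9, 5.7, 5.9, 4.3, 5.1],
    "Breed 3": [3.7, 3.6, 3.8, 3.2, 3.9, 3.2],
    "Breed 4": [5.8, 5.0, 5.3, 5.2, 4.9, 5.8],
}

all_values = [x for group in breeds.values() for x in group]
N, k = len(all_values), len(breeds)

cf = sum(all_values) ** 2 / N                          # correction factor
ss_total = sum(x * x for x in all_values) - cf
ss_between = sum(sum(g) ** 2 / len(g) for g in breeds.values()) - cf
ss_within = ss_total - ss_between

ms_between = ss_between / (k - 1)                      # S_b^2
ms_within = ss_within / (N - k)                        # S_w^2 (MSE)
f_cal = ms_between / ms_within
f_tab = 3.10                                           # assumed table value F_.05(3, 20)

print(f"SS total = {ss_total:.4f}, between = {ss_between:.3f}, within = {ss_within:.3f}")
print(f"F_cal = {f_cal:.2f}:", "reject H0" if f_cal > f_tab else "do not reject H0")
```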
Breed     Mean
Breed 1   3.850
Breed 2   5.083
Breed 3   3.567
Breed 4   5.333
[Figure: mean plot for butterfat percentage by breed]
TWO WAY ANOVA
The effective life (in hours) of batteries is compared across three material types
and three temperature levels.
Batteries are randomly selected from each material type and are then randomly allocated
to each temperature level. The resulting life of all batteries is shown below:
          Type I   Type II   Type III   Total
Low       180      188       160        528
Medium    215      210       190        615
High      82       90        80         252
Total     477      488       430        1395
Explained variation: due to material type and due to different temperatures
Unexplained variation: error
Correction Factor (CF) = (G.T)²/Obs = (1395)²/9 = 216225
Total SS = (180)² + (215)² + … + (80)² − CF = 240993 − 216225 = 24768
Between Material SS = $\dfrac{(477)^2}{3} + \dfrac{(488)^2}{3} + \dfrac{(430)^2}{3} - CF = 632.67$
Between Temp SS = $\dfrac{(528)^2}{3} + \dfrac{(615)^2}{3} + \dfrac{(252)^2}{3} - CF = 23946$
Error SS = Total − Material − Temp = 24768 − 632.67 − 23946 = 189.33
(S.O.V)     DF   SS        MSS=SS/df        Fcal       Ftab
Material    2    632.67    316.33 (S²M)     6.68ns     F.05(2,4) = 6.94
Temp        2    23946     11973 (S²T)      252.95*    F.05(2,4) = 6.94
Error       4    189.33    47.33 (MSE)
TOTAL       8    24768
ns = not significant, * = significant at the 5% level
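The two-way ANOVA table can be rebuilt from the battery data. A sketch assuming one observation per material and temperature combination, as in the data shown; the tabulated F.05(2, 4) = 6.94 is the value quoted in the table.

```python
life = {                      # rows: temperature; columns: material Type I, II, III
    "Low":    [180, 188, 160],
    "Medium": [215, 210, 190],
    "High":   [82, 90, 80],
}
rows = list(life.values())
cols = list(zip(*rows))       # transpose to get one tuple per material type

n_obs = sum(len(r) for r in rows)
grand_total = sum(sum(r) for r in rows)
cf = grand_total ** 2 / n_obs                              # correction factor

ss_total = sum(x * x for r in rows for x in r) - cf
ss_temp = sum(sum(r) ** 2 / len(r) for r in rows) - cf
ss_material = sum(sum(c) ** 2 / len(c) for c in cols) - cf
ss_error = ss_total - ss_material - ss_temp

df_temp, df_material = len(rows) - 1, len(cols) - 1
df_error = df_temp * df_material
mse = ss_error / df_error

f_material = (ss_material / df_material) / mse
f_temp = (ss_temp / df_temp) / mse
f_tab = 6.94                                               # F_.05(2, 4) from the table

for name, f in (("Material", f_material), ("Temp", f_temp)):
    verdict = "significant" if f > f_tab else "not significant"
    print(f"{name}: F_cal = {f:.2f} ({verdict} at 5%)")
```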
          Low   Medium   High
Means     176   205      84
[Figure: plot of average battery life by temperature level]