Professional Documents
Culture Documents
BY
DR.B.V.DHANDRA
PROFESSOR
CHRIST (DEEMED TO BE) UNIVERSITY
BANGALORE
Testing of Hypotheses:
Parameter and Statistic: The statistical constants of the population such as mean, (µ),
variance (σ2), correlation coefficient (ρ) and proportion (P) are called ‘Parameters’.
The function of the sample data which is free from unknown parameters is called
statistic
E.g Sample mean is statistic, where as sample mean minus population unknown
mean is not a statistic.
Sampling Distribution: The distribution of all possible values which can be assumed
by some statistic measured from samples of same size ‘n’ randomly drawn from the
same population of size N, is called as sampling distribution of the statistic
The standard deviation of the sampling distribution of a statistic is known as its
standard error. It is abbreviated as S.E. For example, the standard deviation of the
sampling distribution of the mean x-bar known as the standard error of the mean.
Testing Hypotheses:
Null Hypothesis and Alternative Hypothesis:
Hypothesis testing begins with an assumption called a Hypothesis, that we make about a population
parameter. A hypothesis is a supposition made as a basis for reasoning. The conventional approach
to hypothesis testing is not to construct a single hypothesis about the population parameter but
rather to set up two different hypothesis. So that of one hypothesis is accepted, the other is rejected
and vice versa.
Null Hypothesis: A hypothesis of no difference is called null hypothesis and is usually denoted by
Ho “ Null hypothesis is the hypothesis” which is tested for possible rejection under the assumption
that it is true.
For example: If we want to find out whether the special classes (for MCA. Students) after college
hours has benefited the students or not. We shall set up a null hypothesis that “Ho: special classes
after college hours has not benefited the students”.
single hypothesis about the population parameter but rather to set up two different hypothesis. So that of one hypothesis is accepted, the other is rejected and vice versa.
Testing of Hypotheses:
Alternative Hypothesis: Any hypothesis, which is complementary to the null hypothesis, is called
an alternative hypothesis, usually denoted by H1,
For example, if we want to test the null hypothesis that the population has a specified mean µ0
(say),
i.e., : Step 1: null hypothesis H0: µ = µ0 then
2. Alternative hypothesis may be
i) H1 : µ ≠ µ0 (ie µ > µ0 or µ < µ0)
ii) H1 : µ > µ0
iii) H1 : µ < µ0
Testing of Hypotheses
.05/2=.025
.01/2= .005
Testing of Hypotheses
Note: Critical Region: A region in the sample space S which amounts to rejection of H0 is termed
as critical region or region of rejection.
Critical Value: The value of test statistic which separates the critical (or rejection) region and the
acceptance region is called the critical value or significant value. It depends upon
i) the level of significance used and
ii) the alternative hypothesis, whether it is two-tailed or single-tailed
One tailed test: A test of any statistical hypothesis where the alternative hypothesis is one tailed
(right tailed or left tailed) is called a one tailed test.
Rough Work
n=4
S={0,1,|2,3,4}
Critical Region=C={x|x=2,3,4}, A={0,1}
T(x)2, Reject Ho.
P(T(x)|Ho}=
X
T(X1,X2,….Xn)=Xbar=Sample mean
Ho: against H1:
Xbar=70.09, Xo= 75.9 C={x|x>75.9}= (75.9, ) , A=(-
)
Testing of Hypotheses
For example, for testing the mean of a population H0: µ = µ0, against the alternative
hypothesis H1: µ > µ0 (right – tailed) or H1 : µ <µ0 (left –tailed)is a single tailed test
. In the right – tailed test H1: µ > µ0 the critical region lies entirely in right tail of the sampling
distribution of x ,
while for the left tailed test H1: µ < µ0 the critical region is entirely in the left of the distribution
of x.
Testing of Hypotheses
For example, suppose that there are two population brands of washing machines, are
manufactured by standard process(with mean warranty period µ1) and the other
manufactured by some new technique (with mean warranty period µ2): If we want to test
if the washing machines differ significantly then our null hypothesis is H0 : µ1 = µ2 and
alternative will be H1: µ1 ≠ µ2 thus giving us a two tailed test.
However if we want to test whether the average warranty period produced by some new
technique is more than those produced by standard process, then we have H0 : µ1 = µ2
and H1 : µ1 < µ2 thus giving us a left-tailed test.
Testing of Hypotheses
Similarly, for testing if the product of new process is inferior to that of standard process then we have,
H0 : µ1 = µ2 and H1 : µ1>µ2 thus giving us a right-tailed test. Thus the decision about applying a two –
tailed test or a single –tailed (right or left) test will depend on the problem under study.
Type I and Type II Errors:
When a statistical hypothesis is tested there are four possibilities.
1. The null hypothesis is true but our test rejects it ( Type I error)
2. The null hypothesis is false but our test accepts it (Type II error)
3. The null hypothesis is true and our test accepts it (correct decision)
4. The null hypothesis is false and our test rejects it (correct decision
Testing of Hypotheses:
Test Procedure : Steps for testing hypothesis is given below. (for both large sample and small
sample tests)
1. Null hypothesis : set up null hypothesis H0.
2. Alternative Hypothesis: Set up alternative hypothesis H1, which is complementary to H0
which will indicate whether one tailed (right or left tailed) or two tailed test is to be applied.
3. Level of significance : Choose an appropriate level of significance (α), α is fixed in advance.
4. Use appropriate test statistic to test the null hypothesis
t-Statistic
t - statistic definition:
If x1, x2 ,……x n is a random sample of size n from a normal population with mean µ
and variance , then Student’s t-statistic is
defined as
t-Statistic
Applications of t-distributio
t-Statistic
The t-distribution has a number of applications in statistics, of which we shall discuss in the
following:
(i) t-test for significance of single mean, population variance being unknown.
(ii) t-test for significance of the difference between two sample means, the population variances
being equal but unknown.
(a) Independent samples
(b) Related samples: paired t-test
Table of Student’s t-statistic
Test of significance for Mean:
H0: µ = µ0;There is no significant difference between the sample mean and population Mean
H1: µ ≠ µ0 ( µ < µ0 (or) µ > µ0) Level of significance: 5% or 1%
Calculation of statistic: Under H0 the test statistic is
Test of significance for Mean:
Inference :
If t-cal ≤ t it falls in the acceptance region and the null hypothesis is accepted
and
if t-cal > tα/2 the null hypothesis H0 may be rejected at the given level of
significance α.
Since we are testing for two-sided alternative hypothesis
Example-1
A reading coordinator in a large public school system suspects that poor readers
may test lower in IQ than children whose reading is satisfactory. He draws a
random sample of 30 fifth grade students who are poor readers. Historically fifth
grade students in the school system have had an average IQ of 105. The sample
of 30 has XBAR = 101.5 and S(XBAR) = 1.42. Test the appropriate hypothesis at
the 5% level.
Solution: Given Data: x̄ = 101.5, Std(x̄ ) = 1.42 and μ0 =105
Ho: μ =105 against H1: μ <105 one sided alternative
Level of Significance= α=0.05
Test Procedure:
Here the population standard deviation is unknown, so we use the t-test for
testing with an assumption of normal population.
Test Statistic is t = || , S(x̄ )= s/
|t-cal|
The table value of student’s t-statistic for n -1=26-1=25 d.f at 5% los is t25,0.05 = 1.7
Inference: Since |t0 | = 2.19 =|t-cal |> t25,0.05 = 1.7, Ho is rejected at 5% level of significance.
Hence, we conclude that advertisement is certainly effective in increasing the sales.
Test of significance for difference between two means:
=
=121.6
Under H0 the test statistic is
|tcal| =| = = = 1.7
Table value of t for (5+7-2=10) d.f at 5% l.o.s is 1.812
Thus, tcal = 1.7 < 1.812 . Hence, accept Ho
Inference: Since tcal = 1.7 < t10, 0.05 = 1.812, it is not significant. Hence H0 is accepted and
we conclude that the medicines A and B do not differ significantly as regards their effect
on increase in weight.
Related samples –Paired t-test:
Related samples –Paired t-test: In the t-test for difference of means, the
two samples were independent of each other. Let us now take a particular
situations where,
(i) The sample sizes are equal; i.e., n1 = n2 = n(say), and
(ii) The sample observations (x1, x 2, ……..xn) and (y1, y2, …….yn) are not
completely independent but they are dependent in pairs.
Paired t-test
Paired t-test
Inference:
If |tcal|> tn-1,α/2, we reject the null hypothesis.
If the |tcal |< tα/2, we accept the null hypothesis
Thus, by comparing |tcal| and tα/2 at the desired level of
significance, usually α=5% or 1%, we reject or accept the null
hypothesis.
Example:
To test the desirability of a certain modification in typists desks, 9 typists were given
two tests of as nearly as possible the same nature, one on the desk in use and the
other on the new type. The following difference in the number of words typed per
minute were recorded:
Test for population variance: Suppose we want to test if the given normal population has a specified
variance =
Null Hypothesis: Ho :
Let x1, x2 …xn be a random sample of size n.
Level of significance:
Let α = 5% or 1%
F-Test for testing the equality of variances
F – Statistic Definition: If and are two independent variates with and d.f.
respectively, then F - statistic is defined below
F = follows F-distribution with (, ) d.f
i.e. F - statistic is the ratio of two independent chi-square variates divided by their
respective degrees of freedom.
This statistic follows G.W. Snedocor’s F-distribution with (, ) d.f.
Testing the ratio of variances:
Suppose we are interested to test whether the two normal population have same
variance or not.
Let , , ….. , be a random sample of size , from the first population with variance
and , , … , be random sample of size form the second population with a
variance . Obviously the two samples are independent.
Null hypothesis: : =
i.e. population variances are the same.
In other words is that the two independent estimates of the common population
variance do not differ significantly.
Calculation of statistics:
Under H0, the test statistic is
= , where = =
and = =
It should be noted that numerator is always greater than the denominator in F-ratio
and F = Variance Larger/ Variance Smaller.
ν1 = d.f for sample having larger variance
ν2 = d.f for sample having smaller variance
Inference:
The calculated value of F is compared with the table value of for ν1 and ν2 at α
% ( 5% or 1% ) level of significance If F-cal > F α then we reject H0.
On the other hand if F-cal < Fα we accept the null hypothesis and it is a inferred
that both the samples have come from the population having same variance.
Since F- test is based on the ratio of variances it is also known as the variance
Ratio test. The ratio of two variances follows a distribution called the F
distribution named after the famous statisticians R.A. Fisher .
Example
Use F test to determine whether variance of the two blocks differ significantly?
Solution: We are given that = 8 =6 XAbar = 60 XBbar = 51 =50 = 40
Null hypothesis: : = ie there is no difference in the variances of yield of wheat.
Alternative Hypothesis: : (two tailed test)
Level of significance: Let α = 0.05
Test Procedure:
= = 57.14
and = = 48
Since > , = = = 1.19
Table value of F for (7,5) d.f at 5% los = 4.88
Inference: Since < (7,5), we accept the null hypothesis and hence infer that there is
no difference in the variances of yield of wheat.
Large Sample Tests:
Inference : Since Zcal > table value, we reject our null hypothesis at 5% level of
significance.
Say that the data indicate a significant difference of the opinion on the matter
between men and women students.
Example: In a certain city 125 men in a sample of 500 are found to be self
employed. In another city, the number of self employed are 375 in a random
sample of 1000. Does this indicate that there is a greater population of self
employed in the second city than in the first?
Example-2
In a certain city 125 men in a sample of 500 are found to be self employed. In another city,
the number of self employed are 375 in a random sample of 1000. Does this indicate that
there is a greater population of self employed in the second city than in the first?
Test of significance for mean:
Test of significance for mean: Let Xi (i = 1,2…..n) be a random sample of size n from a population
with variance , then the sample mean x is given by
=
V() = V(
=
==
Test Procedure: Null and Alternative Hypotheses: H0:µ = µo. H1:µ ≠ µo (µ > µo or µ < µo) -Two
sided alternative hypothesis
Level of significance: Let α = 0.05 or 0.01
Calculation of statistic: Under H0, the test statistic is
=
Compare the calculated value of Z, say table value of Z at .
Inference :
For two sided alternative hypothesis: If , we accept our null hypothesis and
conclude that the sample is drawn from a population with mean µ = µ0
If > ,we reject H0 in favor of H1 and conclude that the sample is not drawn from
a population with mean µ = µ0
For one sided alternative hypothesis: If , we accept our null hypothesis and
conclude that the sample is drawn from a population with mean µ = µ0
If > ,we reject H0 in favor of H1 and conclude that the sample is not drawn from
a population with mean µ = µ0
Example
The mean lifetime of 100 fluorescent light bulbs produced by a company is computed
to be 1570 hours with a standard deviation of 120 hours. If µ is the mean lifetime of all
the bulbs produced by the company, test the hypothesis µ=1600 hours against the
alternative hypothesis µ ≠ 1600 hours using a 5% level of significance.
Solution: Given data: n=100, Xbar=1570, and σ^ = s = 120 and µo = 1600 .
Test Procedure:
Null and Alternative Hypotheses: H0:µ = 1600. H1:µ ≠ 1600 (µ > 1600 or µ < 1600)
Level of Significance: 0.05
Test statistic under Ho is = = = = 2.5
Table value of = 1.96
Now, = 2.5 > = 1.96, so reject the null hypothesis H0:µ = 1600 in favor of alternative
hypothesis H1:µ 1600. We conclude that the mean life time of light bulb is not 1600
time units.
Test the significance of two population means:
Suppose we wish to compare the means of two distinct populations. Each population has a mean and a
standard deviation
. We arbitrarily label one population as Population 1 and the other as Population 2, and subscript the
parameters with the numbers 1 and 2 to tell them apart. We draw a random sample from Population 1 and
label the sample statistics it yields with the subscript 1. Without reference to the first sample we draw
a sample from Population 2 and label its sample statistics with the subscript 2.
Fig1. sampling from
Two independent
Normal poplns
100(1−α)%100(1−α)% Confidence Interval for the Difference Between Two Population
Means: Large, Independent Samples
Test of independence
Let us suppose that the given population consisting of N items is divided into r mutually
disjoint (exclusive) and exhaustive classes , , …, with respect to the attribute A so that
randomly selected item belongs to one and only one of the attributes , , …, .Similarly
let us suppose that the same population is divided into c mutually disjoint and exhaustive
classes , , …, w.r.t another attribute B so that an item selected at random possess one
and only one of the attributes , , …, . The frequency distribution of the items belonging
to the classes , , …, and , , …, can be represented in the following r × c manifold
contingency table
rxc contingency tsble
Contingency Table:
(Ai) is the number of persons possessing the attribute Ai, (i=1,2,…r), (Bj) is the
number of persons possessing the attribute Bj,(j=1,2,3,….c) and (Ai Bj) is the
number of persons possessing both the attributes Ai and Bj(i=1,2,…r ,j=1,2,…c).
Also ΣAi= ΣBj= N
Contd…
Under the null hypothesis that the two attributes A and B are independent, the expected frequency
for (AiBj) is given, the under null hypothesis of the independence of attributes, the expected
frequencies for each of the cell frequencies of the above table can be obtained on using the formula
= ,
where = , i=1, 2, ...r; j=1,2….c
Compare value with that of table value of at (r-1)x(c-1) d.f
Inference: If > , we reject the null hypothesis in favour of alternative
hypotheses. at 100α% level of significance.
Otherwise accept the null hypothesis.
Example-1
Two researchers adopted different sampling techniques while investigating the same
group of students to find the number of students falling in different intelligence levels.
The results are as follows:
Would you say that the sampling techniques adopted by the two researchers are
independent?
Interval Estimation:
If mean is known, then is distribution. Therefore the 100(1- confidence interval for is
Confidence interval for Population Proportion: