
Two important problems in statistical inference are

1. Estimation
2. Tests of Hypotheses

Assume that some characteristic of the elements in a population can be represented by a random variable X whose density function is f(x, θ), where the form of the density is assumed known except that it contains an unknown parameter θ (if the value of θ were known, the density function would be completely specified, and there would be no need to make inferences about it). Further assume that a random sample x1, x2, ..., xn from f(x, θ) can be observed. On the basis of the observed sample values x1, x2, ..., xn it is desired to estimate the value of the unknown parameter θ.

To estimate the unknown θ we use a statistic: a function of the observed values x1, x2, ..., xn whose value is used to estimate the parameter θ.

Statistic: A statistic is a function of observable random variables, which is itself an observable random variable and which does not contain any unknown parameters. (Observable means that we can observe the value of the random variable.) We intend to use a statistic to make inferences about the density function of the random variable. If random variables are not observable they are of no use in making inferences.

Example: If x1, x2, ..., xn is a random sample from a density f(x, θ) (θ unknown) then

x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ

is a statistic (provided x1, x2, ..., xn are observable).

½ {min(x1, x2, ..., xn) + max(x1, x2, ..., xn)} is also a statistic.

But x̄ − θ is not a statistic, since it depends on θ, which is unknown.
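For illustration, here is a minimal Python sketch (with hypothetical observed values) computing the two statistics mentioned above, the sample mean and the midrange.

# Hypothetical observed sample values x1, ..., xn
sample = [4.2, 3.9, 5.1, 4.7, 4.4]

# Sample mean: (1/n) * sum of the observations
x_bar = sum(sample) / len(sample)

# Midrange: half the sum of the smallest and largest observations
midrange = 0.5 * (min(sample) + max(sample))

print(x_bar, midrange)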

Estimator: Any statistic (a known function of observable random variables that is itself a random variable) whose values are used to estimate θ is defined to be an estimator of the unknown parameter θ.

“Every estimator is a statistic, but not every statistic is an estimator.”

Estimation admits two problems:

1. To devise some means of obtaining a statistic to use as an estimator.

2. To select criteria and techniques to define and find a 'best' estimator among the many possible estimators.
Important methods of finding estimators are:

1. Method of Moments
2. Method of Maximum Likelihood
3. Minimum Chi-square Method
4. Method of Least Squares
5. Bayes Estimation Method
(A Bayes estimator is typically given as the mean of the posterior probability distribution, or is obtained from decision-theoretic considerations.)
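As an illustration of the first two methods, here is a minimal Python sketch (with hypothetical 0/1 data) that estimates the parameter p of a Bernoulli density f(x, p) = p^x (1 − p)^(1−x) by the method of moments and by maximum likelihood, the latter found numerically over a grid.

import math

data = [1, 0, 1, 1, 0, 1, 0, 1]  # hypothetical 0/1 observations

# Method of moments: equate the first sample moment to E[X] = p,
# which gives the sample mean as the estimate.
p_mom = sum(data) / len(data)

# Maximum likelihood: maximise the log-likelihood over a grid of
# candidate values of p (the closed-form answer is again the sample mean).
def log_likelihood(p, xs):
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in xs)

candidates = [i / 1000 for i in range(1, 1000)]
p_mle = max(candidates, key=lambda p: log_likelihood(p, data))

print(p_mom, p_mle)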

Optimum properties that an estimator may possess are:

1. Unbiasedness
2. Consistency
3. Efficiency and
4. Sufficiency

In experimental research, the object is sometimes merely to estimate parameters. More often, however, the ultimate purpose involves some use of the estimate. One may wish, for example, to compare the yield of a new hybrid variety of corn with that of a standard variety, and perhaps recommend that the new variety replace the standard one if it appears superior.

This is a common situation in research: one may wish to determine whether a new serum is really effective in curing a disease, whether one educational procedure is better than another, or whether one method of preserving foods is better than another insofar as retention of vitamins is concerned.

The purpose of hypothesis testing is to evaluate claims about the values of population parameters.

Null and Alternative Hypotheses

A statistical hypothesis is an assumption or statement about one or more parameters of a probability distribution. The hypothesis is accepted or rejected on the basis of information taken from a sample from the population. Hypothesis testing is a procedure whereby we decide whether to accept or reject a hypothesis.

There are two different hypotheses, called the null and alternative hypotheses. These are so constructed that if one is correct the other is wrong.

The null hypothesis asserts that there is no real difference between the sample statistic and the population parameter, and that whatever difference there is can be attributed to sampling error. It is chosen in such a way that it always includes an equality. The null hypothesis is usually denoted by Ho.

The negation of the null hypothesis is called the alternative hypothesis. It is set up in such a way that rejection of the null hypothesis means acceptance of the alternative hypothesis. Adequate care is taken to ensure that there is no overlap in the values covered by the two hypotheses. The alternative hypothesis is usually denoted by H1.

Example: 1
A car dealer advertises that its new petrol model gives an average of 20 km/liter. Let μ be the mean of the mileage distribution of these cars. We want to test whether the claim of the car dealer is true, i.e. the null hypothesis is

Ho: μ = 20 km/liter

The claim of the dealer is rejected if μ < 20, hence the alternative hypothesis is

H1: μ < 20

Example: 2

A company manufactures ball bearings for precision machines; the average diameter of a certain type of ball bearing should be 6.0 mm. To check that the average diameter is correct, the company decides to formulate a statistical test.
The company wants to test whether the mean diameter of the ball bearings is μ = 6.0 mm. Therefore,

Ho: μ = 6.0 mm

An error in either direction could occur and would be serious, therefore

H1: μ ≠ 6.0 (μ is either smaller than or larger than 6.0)

Example: 3
A package delivery service claims it takes 12 days to send a package from Delhi to New York. An independent consumer agency is doing a study to test the truth of this claim. Seven complaints have led the agency to suspect that the delivery time is longer than 12 days.

Ho: μ = 12 days
Since the delivery service would not underrate itself, the only reasonable alternative hypothesis is
H1: μ > 12

Type I and Type II Errors

If we reject the null hypothesis Ho when it is in fact true, we have made an error that is called a Type I error. On the other hand, if we accept the null hypothesis when it is in fact false, we have made an error that is called a Type II error.

Type I and Type II Errors

--------------------------------------------------------------------------------
                               Our decision
--------------------------------------------------------------------------------
Truth of Ho          If we accept Ho                 If we reject Ho
--------------------------------------------------------------------------------
If Ho is true        Correct decision (no error)     Type I error
If Ho is false       Type II error                   Correct decision (no error)
--------------------------------------------------------------------------------

In order for tests of hypotheses to be good they must be designed to minimize possible errors of decision. (Often we do not know if an error has been made, and therefore we can only talk about the probability of making an error.) Usually, for a given sample size, an attempt to reduce the probability of one type of error results in an increase in the probability of the other type of error.

In practical situations the Type I error is the more serious error. If we increase the sample size it is possible to reduce both types of errors, but increasing the sample size may not always be possible.

P [Rejecting Ho | Ho is true] = α = probability of making a Type I error

P [Accepting Ho | Ho is false] = β = probability of making a Type II error

In the standard hypothesis testing procedure we fix α and try to minimize β. α, the probability of rejecting Ho when Ho is true, is also known as the level of significance of the test. In good statistical practice, α is specified in advance, before any samples are drawn, so that the choice of α is not influenced by the observed sample values.

In practice we generally choose the level of significance α = .05 or .01, although other values are also used. If, for example, the .05 or 5% level of significance is chosen in designing a test of hypothesis, then we are 95% confident that we have made the right decision; in other words, we could be wrong with probability 0.05, i.e. there is a 5% chance that we reject the null hypothesis Ho when it should be accepted.

The quantity 1 − β is called the power of the test and represents the probability of rejecting Ho when it is false.

1. The power of the test increases as α increases. A test performed at the α = .05 level has more power than one at α = 0.01.
2. The power of the test increases as the sample size n increases (both effects are illustrated in the sketch below).
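The following is a minimal Python sketch (with hypothetical numbers) that computes the power of a right-tail Z-test of Ho: μ = μo against a specific alternative μ1 > μo, showing that power rises with α and with n.

from statistics import NormalDist

std_normal = NormalDist()

def power(mu0, mu1, sigma, n, alpha):
    """Power of the right-tail Z-test of Ho: mu = mu0 when the true mean is mu1 > mu0."""
    z_crit = std_normal.inv_cdf(1 - alpha)        # rejection cutoff on the Z scale
    cutoff = mu0 + z_crit * sigma / n ** 0.5      # the same cutoff on the x-bar scale
    # Probability that x-bar exceeds the cutoff when the true mean is mu1
    return 1 - NormalDist(mu1, sigma / n ** 0.5).cdf(cutoff)

# Power rises with alpha (for fixed n) ...
print(power(50, 52, 10, 25, 0.01), power(50, 52, 10, 25, 0.05))
# ... and with the sample size n (for fixed alpha)
print(power(50, 52, 10, 25, 0.05), power(50, 52, 10, 100, 0.05))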

One tailed and two tailed tests

Right Tail Test

Ho: μ = 50 (say)
H1: μ > 50

H1 carries >

Two Tail Test

Ho: μ = μo (50)
H1: μ ≠ μo (two-sided test)

H1 carries ≠

Left Tail Test

Ho: μ = μo (50)
H1: μ < μo

H1 carries <

Note: Notice that the null hypothesis is written in the same way regardless of what the alternative hypothesis is.

Test for Means

Test statistic based on large samples (n > 30)

Z-test

Test statistic based on small samples (n < 30)

t-test

Tests with sample size n > 30 are called large-sample tests. For large samples (n > 30), the sampling distributions of many statistics are approximately normal.

For sample size n < 30, the test is called a small-sample test. The normal approximation is not good and becomes worse with decreasing n, so appropriate modifications must be made.

Test of significance based on large samples (n > 30)

Z-statistic for
1. One-sample test of means
(a) Population variance known
(b) Population variance unknown
2. Two-sample test of means

Example: A manufacturer claims that its radial tyres have an average life of 40000 km. A sample of n = 49 tyres was inspected, and the sample mean tyre life was found to be 38000 km. It is also known that the population of tyre mileage has a standard deviation of 3500 km. Test the claim of the manufacturer at α = .05.

In this case we are given n = 49, x̄ = 38000 km and σ = 3500 km.

Since n > 30, the sampling distribution of the mean is approximately normal.

The population is N(μ, σ²), where μ is the population mean and σ² is the population variance. The population variance σ² is known; it is given as 3500².

The null hypothesis is Ho: μ = 40000

H1: μ < 40000 (one-tail test, left tail)

The Z-statistic is given by

Z = (x̄ − μ) / (σ/√n)    (Z has the standard normal distribution N(0,1))

Tabulated values of Zα

--------------------------------------------------------------------------------
                       Level of significance (α)
                       1%               5%               10%
--------------------------------------------------------------------------------
Two-tail test          Z = ±2.58        Z = ±1.96        Z = ±1.645
Right-tail test        Z = 2.33         Z = 1.645        Z = 1.28
Left-tail test         Z = −2.33        Z = −1.645       Z = −1.28
--------------------------------------------------------------------------------

Z = (38000 − 40000) / (3500/√49) = −4.0

Since Z = −4.0 < −1.645, Z lies in the rejection region; we reject Ho and conclude that the manufacturer's claim is not supported.
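A minimal Python sketch of this left-tail Z-test, using the figures from the example above:

from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 38000, 40000, 3500, 49, 0.05

z = (x_bar - mu0) / (sigma / n ** 0.5)   # = -4.0
z_crit = NormalDist().inv_cdf(alpha)     # left-tail cutoff, about -1.645

if z < z_crit:
    print(f"Z = {z:.2f} < {z_crit:.3f}: reject Ho")
else:
    print(f"Z = {z:.2f}: accept Ho")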

Example: 2
Records for several years of applicants for admission to an engineering college show that their mean test score is 115. An administrator is interested in knowing whether the caliber of recent applicants has changed. For the purpose of testing this hypothesis, the scores of a sample of 100 students from among the recent applicants are obtained from the admission office. The mean for this sample turned out to be 118 with standard deviation 28, which may also be assumed for the population as a whole. Test at the 5% level of significance.

1. n is large, the population standard deviation is given as σ = 28, and x̄ = 118

2. Ho: μ = 115
   H1: μ ≠ 115
The test statistic is the Z-statistic,

Z = (x̄ − μ) / (σ/√n)    (Z has the standard normal distribution N(0,1))

Z = (118 − 115) / (28/√100) ≈ 1.07

Since |Z| = 1.07 < 1.96, Z lies in the acceptance region; we accept Ho and conclude that the caliber of the applicants has not changed.

If σ is unknown (n > 30)
If the population S.D. is unknown, it must be estimated from the sample data using the sample standard deviation. The test statistic in this case is given by

Z = (x̄ − μ) / (Sx/√n)

where Sx is the standard deviation of the sample.

Example: Suppose a sample of 36 observations is taken, and the resulting sample mean life of radial tyres is 41200 with Sx = 3000. Test the claim of the manufacturer that the average life of the radial tyres is 40000 when the population standard deviation is unknown. Use α = .05.

Ho: μ = 40000
H1: μ < 40000

Z = (x̄ − μ) / (Sx/√n) = (41200 − 40000) / (3000/√36) = 2.4

Since Z = 2.4 does not fall in the left-tail rejection region (Z < −1.645), we accept Ho.

Two-Sample Test of Means

Now we have samples from two populations f(μ1, σ1²) and f(μ2, σ2²). Our concern is to test the following hypotheses on the basis of samples x1, x2, ..., xn1 and y1, y2, ..., yn2:

Ho: μ1 = μ2
H1: μ1 ≠ μ2 or μ1 > μ2 or μ1 < μ2

The test statistic is given by

Z = (x̄ − ȳ) / √(σ1²/n1 + σ2²/n2)    when σ1 and σ2 are known

Z = (x̄ − ȳ) / √(S1²/n1 + S2²/n2)    when σ1 and σ2 are unknown

where S1² is the sample variance of the sample x1, x2, ..., xn1 and S2² is the sample variance of the sample y1, y2, ..., yn2.
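A minimal Python sketch (the summary figures are hypothetical) of the two-sample Z-test with unknown population variances:

from statistics import NormalDist

# Hypothetical summary figures: sample mean, sample variance, sample size
x_bar, s1_sq, n1 = 52.3, 16.0, 40
y_bar, s2_sq, n2 = 50.1, 20.0, 50
alpha = 0.05

z = (x_bar - y_bar) / (s1_sq / n1 + s2_sq / n2) ** 0.5
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-tail cutoff, about 1.96

decision = "reject Ho" if abs(z) > z_crit else "accept Ho"
print(f"Z = {z:.2f}, cutoff = ±{z_crit:.2f}: {decision}")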

Test of Proportions

One-sample test of proportions (large sample, n > 30)

Tests of proportions are appropriate when the data being analyzed consist of counts or frequencies of items in two or more classes. The purpose of such tests is to evaluate claims about a population proportion (or percentage). The tests are based on the premise that a sample proportion (i.e. x occurrences in n observations, or x/n) will equal the true population proportion if allowance is made for sampling variability.

Example
A sample of 142 spare parts from a large shipment is inspected and 8% are found to be defective. The supplier from whom the spare parts have been purchased has guaranteed that no more than 6% of the parts in any shipment will be defective. The question to be answered by significance testing is whether the vendor's claim is true. Use α = .05.

Ho: p = 6% (true percent defective as claimed)

H1: p > 6% (percent defective is greater than 6%)

The test statistic is given by

Z = (sample proportion − claimed proportion) / (standard deviation of the proportion)

Z = (x/n − p) / √(p(1 − p)/n) = (.08 − .06) / √(.06 × .94 / 142) ≈ 1.0

Since Z = 1.0 < 1.645, Z lies in the acceptance region, so we accept Ho.
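A minimal Python sketch of this one-sample proportion test, using the figures from the example above:

from statistics import NormalDist

p_hat, p0, n, alpha = 0.08, 0.06, 142, 0.05

z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5   # about 1.0
z_crit = NormalDist().inv_cdf(1 - alpha)        # right-tail cutoff, about 1.645

decision = "reject Ho" if z > z_crit else "accept Ho"
print(f"Z = {z:.2f}, cutoff = {z_crit:.3f}: {decision}")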

Example 2: A manufacturer claims a shipment of finishing nails contains less than 1% defective. A random sample of 200 nails contains 4 (i.e. 2%) defective. Test this claim at α = .05.

Ho: p = 1% (true percent defective as claimed)

H1: p > 1% (percent defective is greater than 1%)

The test statistic is given by

Z = (sample proportion − claimed proportion) / (standard deviation of the proportion)

Z = (x/n − p) / √(p(1 − p)/n) = (.02 − .01) / √(.01 × .99 / 200) ≈ 1.43

Since Z = 1.43 < 1.645, Z lies in the acceptance region, so we accept Ho.

Example 3: A survey claims that 90 out of 100 doctors (i.e. 90%) recommend aspirin for their patients who have children. Test this claim, at the 5% level of significance, against the alternative that the actual proportion of doctors who do this is less than 90%, if a random sample of 100 doctors results in 80 who indicate that they recommend aspirin.

Ho: p = 90%
H1: p < 90%

The test statistic is given by

Z = (x/n − p) / √(p(1 − p)/n) = (.8 − .9) / √(.9 × .1 / 100) ≈ −3.33

Since Z = −3.33 < −1.645, Z lies in the rejection region, so we reject Ho and conclude that the proportion of doctors who recommend aspirin is less than 90%.

Example 4: A newspaper states that approximately 25% of the adults in its circulation area are illiterate according to government standards. Test this claim against the alternative that the true percentage is not 25%, using a 5% probability of a Type I error. A sample of 740 persons indicates that 20% would be judged illiterate using the same standards.

Ho: p = 25%
H1: p ≠ 25%

Z = (x/n − p) / √(p(1 − p)/n) = (.2 − .25) / √(.25 × .75 / 740) ≈ −3.1

Since |Z| = 3.1 > 1.96, Z lies in the rejection region, so we reject Ho and conclude that the actual percentage is less than 25%.

Two-Sample Test of Proportions (n1, n2 > 30)

The purpose of a two-sample test is to decide whether two independent samples have been drawn from populations that both have the same proportion of items with a certain characteristic. The test statistic is compared to a table value of the normal distribution in order to decide whether to accept or reject Ho.

The null hypothesis in a two-sample test is

Ho: p1 = p2

The possible alternative hypotheses are

H1: p1 ≠ p2, or H1: p1 > p2, or H1: p1 < p2

The combined (pooled) estimate of p and the standard deviation of the difference in proportions can be computed as

p̄ = (X1 + X2) / (n1 + n2)

σp = √( p̄(1 − p̄)(1/n1 + 1/n2) )

where X1 = the number of successes in sample 1
      X2 = the number of successes in sample 2
      n1 = the number of observations in sample 1
      n2 = the number of observations in sample 2

The Z-statistic for the two-sample test of proportions is given by

Z = (X1/n1 − X2/n2) / σp

Example: Consider the following situation: voters in two cities are asked whether they are for or against passage of a bill on electoral reform that is currently before the state legislature. To determine whether the voters in the two cities differ in terms of the percentage who favour passage of the bill, a sample of 100 voters is taken in each city. Thirty in one city favour passage while 20 in the other favour passage.

The null hypothesis is

Ho: p1 = p2
H1: p1 ≠ p2

(A two-tail test is called for because the problem does not specify that the percentage in one city is thought to be larger or smaller than the percentage in the other city.) The test statistic is

p̄ = (X1 + X2) / (n1 + n2) = (30 + 20) / (100 + 100) = .25

Z = (.3 − .2) / √( .25 × .75 × (1/100 + 1/100) ) ≈ 1.63

Because the test statistic Z has a value within the acceptance region, we accept Ho. The two cities do not differ in terms of the percentage who favour passage of the bill.
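A minimal Python sketch of this two-sample proportion test, using the counts from the example above:

from statistics import NormalDist

x1, n1 = 30, 100    # city 1: number who favour passage, sample size
x2, n2 = 20, 100    # city 2: number who favour passage, sample size
alpha = 0.05

p_pool = (x1 + x2) / (n1 + n2)                              # pooled proportion = .25
se = (p_pool * (1 - p_pool) * (1 / n1 + 1 / n2)) ** 0.5     # std. deviation of the difference
z = (x1 / n1 - x2 / n2) / se                                # about 1.63

z_crit = NormalDist().inv_cdf(1 - alpha / 2)                # two-tail cutoff, about 1.96
decision = "reject Ho" if abs(z) > z_crit else "accept Ho"
print(f"Z = {z:.2f}, cutoff = ±{z_crit:.2f}: {decision}")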

Example: Suppose in the preceding example that it had been hypothesized (H1) that p1 was greater than p2. The test would have been set up in the following manner:

Ho: p1 = p2
H1: p1 > p2
